
2020 Big data in R with Tidyverse and Spark

"SEMINAR: Big data analysis with Tidyverse and Spark: use in official statistics"
Monday, 14 September, 16:00-17:30, plus questions

1.1. Intro slides

1.1.1. Public money - public code


There is a Government policy for open source and agile methodologies


1.1.2. R & Big Data

https://github.com/rstudio/webinars/blob/master/14-Work-with-big-data/14-Work-with-big-data.pdf

1.1.3. Tidyverse

1.1.4. Spark (sparklyr)

https://github.com/rstudio/webinars/blob/master/42-Introduction%20to%20sparklyr/Introducing%20sparklyr%20-%20Webinar.pdf
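
As a minimal sketch of what working with Spark through sparklyr looks like, the example below connects to a local Spark instance, copies a small built-in data frame to it, and runs a dplyr pipeline that Spark executes. It is not taken from the slides: the local master, the use of `mtcars`, and the table name are assumptions chosen only to keep the example self-contained, and a local Spark installation (for instance via `sparklyr::spark_install()`) is assumed.

```r
library(sparklyr)
library(dplyr)

# Assumption: Spark is installed locally; a cluster would use a different master
sc <- spark_connect(master = "local")

# Copy a small built-in data frame to Spark for illustration only.
# Real big data would normally be read directly by Spark,
# e.g. with spark_read_csv() or spark_read_parquet().
mtcars_tbl <- copy_to(sc, mtcars, name = "mtcars", overwrite = TRUE)

# dplyr verbs are translated to Spark SQL and evaluated lazily in Spark
summary_tbl <- mtcars_tbl %>%
  group_by(cyl) %>%
  summarise(n = n(), mean_mpg = mean(mpg, na.rm = TRUE)) %>%
  arrange(cyl)

# collect() brings only the small aggregated result back into R
result <- collect(summary_tbl)
print(result)

spark_disconnect(sc)
```

The point of the pattern is that the heavy work (grouping and aggregating) stays in Spark, and only the summary crosses into the R session.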

1.1.5. Speeding up Spark via R via Arrow

https://arrow.apache.org/blog/2019/01/25/r-spark-improvements/
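
According to the blog post linked above, loading the arrow package alongside sparklyr is enough for sparklyr to use Arrow when moving data between R and Spark. The sketch below times a `copy_to()` and a `collect()` on a synthetic data frame. The data size, column names, and local master are assumptions made for illustration, and the timings will depend on the machine and on the installed sparklyr, arrow, and Spark versions.

```r
library(arrow)     # loading arrow lets sparklyr use Arrow for data transfer
library(sparklyr)
library(dplyr)

# Assumption: a local Spark installation compatible with Arrow
sc <- spark_connect(master = "local")

# Synthetic data, large enough that transfer time is measurable
big_df <- data.frame(id = seq_len(1e5), x = rnorm(1e5))

# Time the R -> Spark transfer
system.time(big_tbl <- copy_to(sc, big_df, name = "big_df", overwrite = TRUE))

# Time the Spark -> R transfer
system.time(back <- collect(big_tbl))

spark_disconnect(sc)
```

Running the same script without `library(arrow)` gives a rough baseline for comparison.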

1.2. Intro Exercise

See:
https://gitlab.com/radup/curs-r-introduccio/blob/master/codi/extra.tips.bigdata.R

1.3. References:

  • Some online tutorials
    • 30 GB dataset
    • Text mining using sparklyr

