In edgararuiz/bigdataclass: Setup of the Big Data with R class

# Skip the Spark chunks unless the GLOBAL_EVAL environment variable is set
eval_sparklyr <- FALSE
if (Sys.getenv("GLOBAL_EVAL") != "") eval_sparklyr <- as.logical(Sys.getenv("GLOBAL_EVAL"))

Intro to `sparklyr`

library(dplyr)
library(sparklyr)

New Spark session

Learn to open a new Spark session

Load the sparklyr library: `library(sparklyr)`
Use spark_connect() to create a new local Spark session: `sc <- spark_connect(master = "local")`
Click on the Spark button to view the current Spark session's UI
Click on the Log button to see the message history
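
The same checks can be made from the console. The following is a minimal sketch, assuming a local Spark installation is already available (for example, one downloaded with `spark_install()`). `spark_web()` and `spark_log()` are the sparklyr functions that correspond to the Spark and Log buttons.

```r
library(sparklyr)

# Assumes Spark is installed locally; spark_install() can download a copy
sc <- spark_connect(master = "local")

# TRUE while the session is alive
spark_connection_is_open(sc)

# Version of Spark behind the connection
spark_version(sc)

# Console counterparts of the Spark and Log buttons
# spark_web(sc)         # opens the Spark UI in a browser
spark_log(sc, n = 10)   # last 10 lines of the session log

# spark_disconnect(sc)  # closes the session; run only after the remaining exercises
```

The disconnect call is left commented out because the exercises that follow reuse `sc`.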

Data transfer

Practice uploading data to Spark

Load the dplyr library: `library(dplyr)`
Copy the mtcars dataset into the session: `spark_mtcars <- copy_to(sc, mtcars, "my_mtcars")`
In the Connections pane, expand the my_mtcars table
Go to the Spark UI, note the new jobs
In the UI, click the Storage button, note the new table
Click on the In-memory table my_mtcars link
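
The upload can also be confirmed in code rather than through the UI. This is a sketch under the same assumption of a local Spark installation; `overwrite = TRUE` is an addition not present in the exercise, included so the call can be re-run without a "table already exists" error.

```r
library(sparklyr)
library(dplyr)

# Skip this line if `sc` from the previous exercise is still open
sc <- spark_connect(master = "local")

# overwrite = TRUE is an addition so the call can be repeated safely
spark_mtcars <- copy_to(sc, mtcars, "my_mtcars", overwrite = TRUE)

# Tables registered in the session; should include "my_mtcars"
src_tbls(sc)

# Row count computed by Spark rather than in R (mtcars has 32 rows)
sdf_nrow(spark_mtcars)

# A reference to the same table can be recreated from its name
tbl(sc, "my_mtcars") %>% head(3)
```

Note that `spark_mtcars` is only a reference to the table held in Spark; the data is not pulled back into R until it is printed or collected.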

Spark and `dplyr`

See how Spark handles dplyr commands

Run the following code snippet: `spark_mtcars %>% group_by(am) %>% summarise(mpg_mean = mean(mpg, na.rm = TRUE))`
Go to the Spark UI and click the SQL button
Click on the top item inside the Completed Queries table
At the bottom of the diagram, expand Details
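
The query that appears in the SQL tab can also be inspected from R. The sketch below, again assuming a local Spark installation, uses `show_query()` to print the SQL that dplyr generates and `collect()` to bring the summarised result into R. The names `mpg_by_am` and `local_result` are illustrative only.

```r
library(sparklyr)
library(dplyr)

# Skip these two lines if `sc` and `spark_mtcars` already exist
sc <- spark_connect(master = "local")
spark_mtcars <- copy_to(sc, mtcars, "my_mtcars", overwrite = TRUE)

# Building the pipeline does not run anything yet; it is evaluated lazily
mpg_by_am <- spark_mtcars %>%
  group_by(am) %>%
  summarise(mpg_mean = mean(mpg, na.rm = TRUE))

# Print the SQL statement that dplyr translated the pipeline into
show_query(mpg_by_am)

# Run the query in Spark and return the result as a local tibble
local_result <- collect(mpg_by_am)
local_result
```

Only the two summary rows, one per value of `am`, travel back to R; the grouping and averaging happen inside Spark.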

edgararuiz/bigdataclass documentation built on Jan. 3, 2020, 6:46 p.m.
