```r
eval_caching <- FALSE
if (Sys.getenv("GLOBAL_EVAL") != "") eval_caching <- Sys.getenv("GLOBAL_EVAL")
```

```r
library(sparklyr)
library(dplyr)
library(readr)
library(purrr)
```
See the mechanics of how Spark is able to use files as a data source
Examine the contents of the `/usr/share/class/files` folder
Load the `sparklyr` library
```r
library(sparklyr)
```
Use `spark_connect()` to create a new local Spark session
```r
sc <- spark_connect(master = "local")
```
Load the `readr` and `purrr` libraries
```r
library(readr)
library(purrr)
```
Read the top 5 rows of the `transactions_1` CSV file
```r
top_rows <- read_csv("/usr/share/class/files/transactions_1.csv", n_max = 5)
```
Create a named list with one item per column name, each holding "character" as its value. Name the variable `file_columns`
```r
# Lower-case the column names, then replace every column with the string "character"
file_columns <- top_rows %>%
  rename_all(tolower) %>%
  map(function(x) "character")
```
Preview the contents of the `file_columns` variable
```r
head(file_columns)
```
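To see what this column specification looks like without access to the class files, here is a minimal sketch that runs the same steps against a small temporary CSV. The file and its column names (`Order_ID`, `Date`, `Price`) are made up for illustration and are not taken from `transactions_1.csv`.

```r
library(readr)
library(dplyr)
library(purrr)

# Hypothetical stand-in for transactions_1.csv; the column names are invented
csv_path <- tempfile(fileext = ".csv")
write_lines(
  c("Order_ID,Date,Price", "1,2016-01-01,9.99", "2,2016-01-02,4.50"),
  csv_path
)

# Read only the first rows; the column names are all that is needed
top_rows <- read_csv(csv_path, n_max = 5)

# Lower-case the names, then map every column to the string "character"
file_columns <- top_rows %>%
  rename_all(tolower) %>%
  map(function(x) "character")

# A named list: one "character" entry per column
str(file_columns)
```

The result is a named list keyed by the lower-cased column names, which is the shape sparklyr's readers accept for their `columns` argument.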
Use `spark_read()` to "map" the file's structure and location to the Spark context. Assign it to the `spark_lineitems` variable
```r
```
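One possible way to approach this step is sketched below, assuming sparklyr's CSV reader `spark_read_csv()` is the intended function and that a local Spark installation is available. To keep the sketch self-contained, it uses a temporary folder and a hand-written column list (`demo_columns`) in place of the class folder and `file_columns`; both are hypothetical stand-ins, as is the table name `"orders"`.

```r
library(sparklyr)

# Assumption: Spark is installed locally (for example via spark_install())
sc <- spark_connect(master = "local")

# Hypothetical stand-in for the class folder: a temp folder holding one small CSV
csv_dir <- file.path(tempdir(), "transactions_demo")
dir.create(csv_dir, showWarnings = FALSE)
writeLines(
  c("order_id,date,price",
    "1,2016-01-01,9.99",
    "2,2016-01-01,4.50",
    "3,2016-01-02,3.25"),
  file.path(csv_dir, "transactions_1.csv")
)

# Every column declared as character, mirroring the shape of file_columns
demo_columns <- list(order_id = "character", date = "character", price = "character")

spark_lineitems <- spark_read_csv(
  sc,
  name = "orders",
  path = csv_dir,          # a folder path maps every file inside it
  memory = FALSE,          # map the files only; do not cache them in Spark memory
  columns = demo_columns,  # supply the column names and types explicitly
  infer_schema = FALSE     # skip the schema-inference pass over the data
)

# Row count computed by Spark from the mapped files
sdf_nrow(spark_lineitems)
```

With `memory = FALSE` and an explicit column list, Spark does not need to scan the files up front, so creating the mapping stays cheap even when the folder holds a lot of data.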
In the Connections pane, click on the table icon by the `transactions` variable
Verify that the new variable pointer works by using `tally()`
```r
```
Learn how to cache a subset of the data in Spark
Create a subset of the orders table object. Summarize by date, creating a total price and the number of items sold.
```r
daily_orders <-
```
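The shape of this summary can be tried on a small local tibble first. The sketch below assumes the table has `date` and `price` columns stored as character; those names, the sample values, and the output names `total_sales` and `no_items` are illustrative assumptions, not taken from the class data.

```r
library(dplyr)

# Hypothetical sample rows; the real column names in the orders table may differ
local_orders <- tibble(
  date  = c("2016-01-01", "2016-01-01", "2016-01-02"),
  price = c("9.99", "4.50", "3.25")  # character, as declared in the column list
)

daily_orders_local <- local_orders %>%
  mutate(price = as.numeric(price)) %>%      # convert before summing
  group_by(date) %>%
  summarise(
    total_sales = sum(price, na.rm = TRUE),  # total price per day
    no_items    = n()                        # number of items sold per day
  )

daily_orders_local
```

Applied to a Spark table instead of a local tibble, the same verbs are translated to Spark SQL and evaluated lazily, so nothing runs in Spark until the result is printed, collected, or cached.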
Use `compute()` to extract the data into Spark memory
```r
```
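A minimal sketch of the caching step follows, assuming a local Spark installation. It copies a tiny made-up data frame into Spark with `copy_to()` rather than using the class files, and the table names `"demo_orders"` and `"daily"` as well as the variable `cached_orders` are hypothetical.

```r
library(sparklyr)
library(dplyr)

# Assumption: Spark is installed locally (for example via spark_install())
sc <- spark_connect(master = "local")

# Hypothetical sample data copied into Spark for illustration
demo_orders <- copy_to(
  sc,
  data.frame(
    date  = c("2016-01-01", "2016-01-01", "2016-01-02"),
    price = c(9.99, 4.50, 3.25)
  ),
  name = "demo_orders",
  overwrite = TRUE
)

# Lazy summary: this only builds a query, nothing is computed yet
daily_orders <- demo_orders %>%
  group_by(date) %>%
  summarise(
    total_sales = sum(price, na.rm = TRUE),
    no_items    = n()
  )

# compute() runs the query and stores the result as a named table in Spark
cached_orders <- compute(daily_orders, name = "daily")

head(cached_orders)
```

Later queries against `cached_orders` read the stored result instead of re-running the aggregation over the source data.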
Confirm that the new variable pointer works
```r
```
Go to the Spark UI
Click the Storage button
Notice that "orders" is now cached into Spark memory