```r
eval_vroom <- FALSE
if(Sys.getenv("GLOBAL_EVAL") != "") eval_vroom <- Sys.getenv("GLOBAL_EVAL")
```
```r
library(vroom)
library(fs)
library(purrr)
library(dplyr)
```
vroom
vroom basics
Load data into R using vroom
Load the vroom library
```r
library(vroom)
```
Use the vroom() function to read the transactions_1.csv file from the /usr/share/class/files folder
```r
vroom("/usr/share/class/files/transactions_1.csv")
```
Use the id argument to add a column holding the source file name to the data frame. Set the argument's value to "file_name", which becomes the name of the new column
```r
vroom("/usr/share/class/files/transactions_1.csv", id = "file_name")
```
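To see what the id argument does without the class files, here is a minimal sketch that uses a small temporary CSV. The file contents and the names `tmp` and `sample_rows` are made up for illustration and are not part of the exercise data.

```r
library(vroom)

# Hypothetical sample data written to a temporary file (not the class files)
tmp <- tempfile(fileext = ".csv")
writeLines(c("order_id,price", "1,10.5", "2,3.25"), tmp)

# id = "file_name" adds a column recording the file each row was read from
sample_rows <- vroom(tmp, delim = ",", id = "file_name")

names(sample_rows)
unique(sample_rows$file_name)
```

The extra column becomes useful once several files are read in a single call, because every row keeps a record of its source file.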
Assign the result of the prior command to a variable called vr_transactions
```r
vr_transactions <- vroom("/usr/share/class/files/transactions_1.csv", id = "file_name")
vr_transactions
```
Save the column specification to a variable called vr_spec, using the spec() function
```r
vr_spec <- spec(vr_transactions)
vr_spec
```
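The sketch below illustrates the same idea on a throwaway file: extract the column specification from one read, then pass it as col_types on a later read so the column types do not have to be guessed again. The sample data and the names `first_read`, `sample_spec`, and `second_read` are hypothetical.

```r
library(vroom)

# Hypothetical sample data in a temporary file
tmp <- tempfile(fileext = ".csv")
writeLines(c("order_id,customer_name,price", "1,Ann,10.5", "2,Bob,3.25"), tmp)

# First read: vroom guesses the column types
first_read <- vroom(tmp, delim = ",")

# spec() returns the column specification that was used for that read
sample_spec <- spec(first_read)
sample_spec

# Second read: reuse the specification instead of guessing again
second_read <- vroom(tmp, delim = ",", col_types = sample_spec)
```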
Load the fs and dplyr libraries
```r
library(fs)
library(dplyr)
```
List files in the /usr/share/class/files folder using the dir_ls() function
```r
dir_ls("/usr/share/class/files")
```
In the dir_ls() function, use the glob argument to pass a wildcard pattern that lists CSV files only. Assign the result to a variable named files
```r
files <- dir_ls("/usr/share/class/files", glob = "*.csv")
```
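As a quick check of how the glob filter behaves, this sketch builds a temporary folder with two CSV files and one text file, then lists only the CSVs. The folder and file names are invented for the example.

```r
library(fs)

# Hypothetical folder with a mix of file types
sample_dir <- path(tempdir(), "glob_example")
dir_create(sample_dir)
file_create(path(sample_dir, c("a.csv", "b.csv", "notes.txt")))

# glob = "*.csv" keeps only the paths that end in .csv
csv_files <- dir_ls(sample_dir, glob = "*.csv")
csv_files
```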
Pass the files variable to vroom. Set the n_max argument to 1,000 to limit the data load for now
```r
vroom(files, n_max = 1000)
```
Add a col_types argument with vr_spec as its value
```r
vroom(files, n_max = 1000, col_types = vr_spec)
```
Use the col_select argument to pass a list object containing the following variables: order_id, date, customer_name, and price
```r
vroom(files, n_max = 1000, col_types = vr_spec,
      col_select = list(order_id, date, customer_name, price)
)
```
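The following sketch shows a multi-file read with column selection on made-up data. It assumes two small temporary files that share the same header; the names `paths` and `combined` are hypothetical. It uses c() for the selection, which vroom accepts in the same way as list().

```r
library(vroom)

# Hypothetical files that share the same columns
sample_dir <- file.path(tempdir(), "vroom_multi_example")
dir.create(sample_dir, showWarnings = FALSE)
paths <- file.path(sample_dir, c("t1.csv", "t2.csv"))
writeLines(c("order_id,customer_name,price,notes",
             "1,Ann,10.5,x",
             "2,Bob,3.25,y"), paths[1])
writeLines(c("order_id,customer_name,price,notes",
             "3,Cy,7,z"), paths[2])

# A vector of paths is read into one data frame;
# col_select keeps only the named columns
combined <- vroom(paths, delim = ",", n_max = 1000,
                  col_select = c(order_id, price))
combined
```

Selecting columns at read time means the unselected columns never need to be materialized, which helps when only a few fields are required for the analysis.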
For files that are too large to hold in memory, keep only a summary of each file
Use a for() loop to print each element of files
```r
for(i in seq_along(files)) {
  print(files[i])
}
```
Replace the print() call with a vroom() call that uses the same arguments as before, except for the file name: pass files[i] instead of the whole files variable. Assign the result to a variable called transactions.
```r
for(i in seq_along(files)) {
  transactions <- vroom(files[i], n_max = 1000, col_types = vr_spec,
                        col_select = list(order_id, date, customer_name, price))
}
```
Group transactions by order_id, then compute the sum of price and the number of records, naming them total_sales and no_items respectively. Assign the result to a new variable called orders
```r
for(i in seq_along(files)) {
  transactions <- vroom(files[i], n_max = 1000, col_types = vr_spec,
                        col_select = list(order_id, date, customer_name, price))
  orders <- transactions %>%
    group_by(order_id) %>%
    summarise(total_sales = sum(price), no_items = n())
}
```
Define the orders variable as NULL prior to the for loop and add a bind_rows() step to orders to preserve each summarized view.
```r
orders <- NULL
for(i in seq_along(files)) {
  transactions <- vroom(files[i], n_max = 1000, col_types = vr_spec,
                        col_select = list(order_id, date, customer_name, price))
  orders <- transactions %>%
    group_by(order_id) %>%
    summarise(total_sales = sum(price), no_items = n()) %>%
    bind_rows(orders)
}
```
Remove the transactions variable at the end of each cycle
```r
orders <- NULL
for(i in seq_along(files)) {
  transactions <- vroom(files[i], n_max = 1000, col_types = vr_spec,
                        col_select = list(order_id, date, customer_name, price))
  orders <- transactions %>%
    group_by(order_id) %>%
    summarise(total_sales = sum(price), no_items = n()) %>%
    bind_rows(orders)
  rm(transactions)
}
```
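The loop pattern above can be tried end to end on small temporary files. This is a sketch under stated assumptions: the data, the folder, and the names `paths`, `chunk`, `sample_orders`, and `final_orders` are all hypothetical. The final re-aggregation step is not part of the exercise; it is only needed if the same order_id could appear in more than one file, since each file is summarized separately.

```r
library(vroom)
library(dplyr)

# Hypothetical per-file transaction data
sample_dir <- file.path(tempdir(), "vroom_loop_example")
dir.create(sample_dir, showWarnings = FALSE)
paths <- file.path(sample_dir, c("t1.csv", "t2.csv"))
writeLines(c("order_id,price", "1,10", "1,5", "2,7"), paths[1])
writeLines(c("order_id,price", "3,2", "3,4"), paths[2])

sample_orders <- NULL
for (i in seq_along(paths)) {
  # Read one file at a time; "dd" declares both columns as double
  chunk <- vroom(paths[i], delim = ",", col_types = "dd")
  # Summarize the file, then stack the summary on top of earlier ones
  sample_orders <- chunk %>%
    group_by(order_id) %>%
    summarise(total_sales = sum(price), no_items = n()) %>%
    bind_rows(sample_orders)
  # Drop the raw rows before the next file is read
  rm(chunk)
}

# Optional: combine summaries in case an order spans several files
final_orders <- sample_orders %>%
  group_by(order_id) %>%
  summarise(total_sales = sum(total_sales), no_items = sum(no_items))
final_orders
```

Only one file's raw rows are held at any time, so peak memory use depends on the largest single file plus the accumulated summaries rather than on the full data set.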
Preview the orders variable
```r
orders
```