In rstudio-conf-2020/big-data: Content and setup files for the Big Data with R class

```r
# Setup: chunks are not evaluated unless the GLOBAL_EVAL environment variable is set
eval_dtplyr <- FALSE
if (Sys.getenv("GLOBAL_EVAL") != "") eval_dtplyr <- Sys.getenv("GLOBAL_EVAL")

library(data.table)
library(dtplyr)
library(dplyr)
library(lobstr)
library(fs)
library(purrr)
```

Introduction to `dtplyr`

`dtplyr` basics

Load data into R via data.table, and then wrap it with dtplyr

Load the data.table, dplyr, dtplyr, purrr and fs libraries
```r
library(data.table)
library(dplyr)
library(dtplyr)
library(purrr)
library(fs)
```
Read the transactions CSV data from the /usr/share/class/files folder. The code lists every *.csv file there with dir_ls(), loads each one with fread(), and combines the results with rbindlist() into a variable called transactions
```r
transactions <- dir_ls("/usr/share/class/files", glob = "*.csv") %>%
  map(fread) %>%
  rbindlist()
```
Preview the data using glimpse()
```r

```
Use lazy_dt() to "wrap" the transactions variable into a new variable called dt_transactions
```r

```
View the structure of dt_transactions with glimpse()
```r

```
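
The wrapping step above can be tried without the class files. The following is a minimal sketch that assumes a small, made-up table called `toy` (a hypothetical stand-in; the real transactions data may have different columns). It shows that `lazy_dt()` returns a step object rather than a data.table, and that converting the wrapper gives back the same rows.

```r
library(data.table)
library(dplyr)
library(dtplyr)

# Hypothetical stand-in for the class data; the real columns may differ
toy <- data.table(
  date_month = c(1L, 1L, 2L, 2L),
  date_day   = c(1L, 2L, 1L, 2L),
  price      = c(10, 20, 30, 40)
)

# Wrap the data.table; nothing is computed at this point
dt_toy <- lazy_dt(toy)

# The wrapper is a dtplyr step object, not a data.table
is_step <- inherits(dt_toy, "dtplyr_step")

# Converting to a tibble forces evaluation and returns the same rows
toy_tbl <- as_tibble(dt_toy)
```

Keeping the wrapper in its own variable, as the exercise does with dt_transactions, leaves the original data.table available for plain data.table code.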

Object sizes

Confirm that dtplyr is not making copies of the original data.table

Load the lobstr library
```r
library(lobstr)
```
Use obj_size() to obtain the size of transactions in memory
```r

```
Use obj_size() to obtain the size of dt_transactions in memory
```r

```
Use obj_size() to obtain the combined size of dt_transactions and transactions in memory
```r

```
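
One way to check the "no copies" claim is to compare the individual sizes with the combined size. This sketch uses a made-up table `toy` (hypothetical, not the class data). `obj_size()` counts memory shared between its arguments only once, so if the wrapper merely points at the original table, the combined size should be smaller than the sum of the two individual sizes.

```r
library(data.table)
library(dtplyr)
library(lobstr)

# Hypothetical data, large enough for the sizes to be easy to compare
toy <- data.table(id = 1:100000, price = runif(100000))
dt_toy <- lazy_dt(toy)

size_toy  <- obj_size(toy)          # the data.table alone
size_wrap <- obj_size(dt_toy)       # the wrapper, which references toy
size_both <- obj_size(toy, dt_toy)  # shared memory is counted once

# TRUE when the wrapper shares the data instead of duplicating it
shares_memory <- as.numeric(size_both) <
  as.numeric(size_toy) + as.numeric(size_wrap)
```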

How `dtplyr` works

An under-the-hood view of how dtplyr operates on data.table objects

Use dplyr verbs on top of dt_transactions to obtain the total sales by month
```r
dt_transactions %>%
  group_by(date_month) %>%
  summarise(total_sales = sum(price))
```
Load the above code into a variable called by_month
```r

```
Use show_query() to see the data.table code that by_month actually runs
```r

```
Use glimpse() to view how by_month, instead of modifying the data, only adds steps that will later be executed by data.table
```r

```
Create a new column using mutate()
```r
dt_transactions %>%
  mutate(new_field = price / 2)
```
Use show_query() to see the copy() command being used
```r

```
Check to confirm that the new column did not persist in dt_transactions
```r

```
Use lazy_dt() with the immutable argument set to FALSE to avoid the copy
```r
m_transactions <- lazy_dt(copy(transactions), immutable = FALSE)
```
```r
m_transactions
```
Create a new_field column in m_transactions using mutate()
```r
m_transactions %>%
  mutate(new_field = price / 2)
```
Use show_query() to see that copy() is no longer being used
```r

```
Inspect m_transactions to see that new_field has persisted
```r

```
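
The difference between the default (immutable) wrapper and `immutable = FALSE` can be seen on a small example. This is a sketch under stated assumptions: `toy` and `toy_copy` are hypothetical tables, and `as.data.table()` is used here only to force the lazy steps to run. With the default setting the generated code works on a copy, so the source table is untouched; with `immutable = FALSE` the new column is added to the wrapped table by reference.

```r
library(data.table)
library(dplyr)
library(dtplyr)

# Hypothetical stand-in for the class data
toy <- data.table(date_month = c(1L, 1L, 2L), price = c(10, 20, 40))

# Lazy aggregation: only steps are recorded until the result is collected
by_month <- lazy_dt(toy) %>%
  group_by(date_month) %>%
  summarise(total_sales = sum(price))
show_query(by_month)            # shows the generated data.table call
monthly <- as_tibble(by_month)  # runs it

# Default (immutable = TRUE): mutate() runs against a copy of toy
res_copy <- as.data.table(lazy_dt(toy) %>% mutate(new_field = price / 2))
source_untouched <- !("new_field" %in% names(toy))

# immutable = FALSE: the wrapped table itself is modified when the step runs
toy_copy <- copy(toy)
res_inplace <- as.data.table(
  lazy_dt(toy_copy, immutable = FALSE) %>% mutate(new_field = price / 2)
)
column_persisted <- "new_field" %in% names(toy_copy)
```

Wrapping `copy(transactions)` rather than `transactions`, as the exercise does, keeps the original table safe while still allowing in-place updates on the wrapped copy.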

Working with `dtplyr`

Learn data conversion and basic visualization techniques

Use as_tibble() to convert the results of by_month into a tibble
```r
by_month %>%
  as_tibble()
```
Load the ggplot2 library
```r
library(ggplot2)
```
Use as_tibble() to convert before creating a line plot
```r
by_month %>%

  ggplot() + geom_line(aes(date_month, total_sales))
```
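
The conversion step matters because ggplot2 expects a data frame rather than a lazy dtplyr object. The sketch below, built on a hypothetical `toy` table, collects the aggregated result with `as_tibble()` first and then builds the plot object from the collected data.

```r
library(data.table)
library(dplyr)
library(dtplyr)
library(ggplot2)

# Hypothetical stand-in for the class data
toy <- data.table(
  date_month = c(1L, 1L, 2L, 3L),
  price      = c(10, 20, 40, 25)
)

# Aggregate lazily, then collect the small result into a tibble
monthly <- lazy_dt(toy) %>%
  group_by(date_month) %>%
  summarise(total_sales = sum(price)) %>%
  as_tibble()

# Build the line plot from the collected tibble
p <- ggplot(monthly) +
  geom_line(aes(date_month, total_sales))
```

Aggregating before collecting means only the summary rows, not the full table, are handed to ggplot2.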

Pivot data

Review a simple way to aggregate data faster, and then pivot it as a tibble

Load the tidyr library
```r
library(tidyr)
```
Group dt_transactions by date_month and date_day, then aggregate price into total_sales
```r
dt_transactions %>%
  group_by(date_month, date_day) %>%
  summarise(total_sales = sum(price))
```
Copy the aggregation code above, collect it into a tibble, and then use pivot_wider() to make the date_day values the column headers.
```r
dt_transactions %>%
  group_by(date_month, date_day) %>%
  summarise(total_sales = sum(price)) %>%

  pivot_wider(names_from = date_day, values_from = total_sales)
```
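
The aggregate-then-pivot pattern can be sketched end to end on a hypothetical `toy` table. The aggregation runs through data.table, the result is collected with `as_tibble()`, and `pivot_wider()` then spreads the date_day values into columns. The `ungroup()` call is an addition made here for clarity; it is not part of the exercise text.

```r
library(data.table)
library(dplyr)
library(dtplyr)
library(tidyr)

# Hypothetical stand-in for the class data
toy <- data.table(
  date_month = c(1L, 1L, 1L, 2L, 2L),
  date_day   = c(1L, 1L, 2L, 1L, 2L),
  price      = c(10, 5, 20, 30, 40)
)

wide <- lazy_dt(toy) %>%
  group_by(date_month, date_day) %>%
  summarise(total_sales = sum(price)) %>%
  as_tibble() %>%   # collect the aggregated rows
  ungroup() %>%     # drop any remaining grouping before reshaping
  pivot_wider(names_from = date_day, values_from = total_sales)
```

The result has one row per date_month and one column per date_day value.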

rstudio-conf-2020/big-data documentation built on Feb. 4, 2020, 5:24 p.m.
