eval_dtplyr <- FALSE if(Sys.getenv("GLOBAL_EVAL") != "") eval_dtplyr <- Sys.getenv("GLOBAL_EVAL")
library(data.table) library(dtplyr) library(dplyr) library(lobstr) library(fs) library(purrr)
dtplyrdtplyr basicsLoad data into R via data.table, and then wrap it with dtplyr
Load the data.table, dplyr, dtplyr, purrr and fs libraries
r
library(data.table)
library(dplyr)
library(dtplyr)
library(purrr)
library(fs)
Read the transactions.csv file, from the /usr/share/class/files folder. Use the fread() function to load the data into a variable called transactions
r
transactions <- dir_ls("/usr/share/class/files", glob = "*.csv") %>%
map(fread) %>%
rbindlist()
Preview the data using glimpse()
```r
```
Use lazy_dt() to "wrap" the transactions variable into a new variable called dt_transactions
```r
```
View dt_transactions structure with glimpse()
```r
```
Confirm that dtplyr is not making copies of the original data.table
Load the lobstr library
r
library(lobstr)
Use obj_size() to obtain transactions's size in memory
```r
```
Use obj_size() to obtain dt_transactions's size in memory
```r
```
Use obj_size() to obtain dt_transactions and transactions size in memory together
```r
```
dtplyr worksUnder the hood view of how dtplyr operates data.table objects
Use dplyr verbs on top of dt_transactions to obtain the total sales by month
r
dt_transactions %>%
group_by(date_month) %>%
summarise(total_sales = sum(price))
Load the above code into a variable called by_month
```r
```
Use show_query() to see the data.table code that by_month actually runs
```r
```
Use glimpse() to view how by_month, instead of modifying the data, only adds steps that will later be executed by data.table
```r
```
Create a new column using mutate()
r
dt_transactions %>%
mutate(new_field = price / 2)
Use show_query() to see the copy() command being used
```r
```
Check to confirm that the new column did not persist in dt_transactions
```r
```
Use lazy_dt() with the immutable argument set to FALSE to avoid the copy
r
m_transactions <- lazy_dt(copy(transactions), immutable = FALSE)
r
m_transactions
Create a new_field column in m_transactions using mutate()
r
m_transactions %>%
mutate(new_field = price / 2)
Use show_query() to see that copy() is no longer being used
```r
```
Inspect m_transactions to see that new_field has persisted
```r
```
dtplyrLearn data conversion and basic visualization techniques
Use as_tibble() to convert the results of by_month into a tibble
r
by_month %>%
as_tibble()
Load the ggplot2 library
r
library(ggplot2)
Use as_tibble() to convert before creating a line plot
```r
by_month %>%
ggplot() + geom_line(aes(date_month, total_sales)) ```
Review a simple way to aggregate data faster, and then pivot it as a tibble
Load the tidyr library
r
library(tidyr)
Group db_transactions by date_month and date_day, then aggregate price into total_sales
r
dt_transactions %>%
group_by(date_month, date_day) %>%
summarise(total_sales = sum(price))
Copy the aggregation code above, collect it into a tibble, and then use pivot_wider() to make the date_day the column headers.
```r
dt_transactions %>%
group_by(date_month, date_day) %>%
summarise(total_sales = sum(price)) %>%
pivot_wider(names_from = date_day, values_from = total_sales) ```
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.