dtplyr
dtplyr basics
Load data into R via data.table, and then wrap it with dtplyr
Load the data.table, dplyr, dtplyr, purrr, and fs libraries
r
library(data.table)
library(dplyr)
library(dtplyr)
library(purrr)
library(fs)
Read the transactions data from the CSV files in the /usr/share/class/files folder. Use dir_ls() to list the files, fread() to load each one, and rbindlist() to combine the results into a variable called transactions
r
transactions <- dir_ls("/usr/share/class/files", glob = "*.csv") %>%
  map(fread) %>%
  rbindlist()
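The same read-and-combine pattern can be tried without the class files. The sketch below writes two small CSV files to a temporary folder and loads them the same way. The folder name, file names, and columns are made up for illustration.

```r
library(data.table)
library(dplyr)   # provides %>%
library(fs)
library(purrr)

# Hypothetical sample data written to a temporary folder (not the class files)
tmp_dir <- file.path(tempdir(), "dtplyr_csv_demo")
dir.create(tmp_dir, showWarnings = FALSE)
fwrite(data.table(date_month = 1L, price = c(10, 20)), file.path(tmp_dir, "part_1.csv"))
fwrite(data.table(date_month = 2L, price = c(5, 15)), file.path(tmp_dir, "part_2.csv"))

# List every CSV file, read each one with fread(), and stack the results
toy_transactions <- dir_ls(tmp_dir, glob = "*.csv") %>%
  map(fread) %>%
  rbindlist()

nrow(toy_transactions)
```

Because rbindlist() stacks the tables by column, this approach assumes every file shares the same columns.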
Preview the data using glimpse()
r
glimpse(transactions)
Use lazy_dt() to "wrap" the transactions variable into a new variable called dt_transactions
r
dt_transactions <- lazy_dt(transactions)
View the structure of dt_transactions with glimpse()
r
glimpse(dt_transactions)
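To see what the wrapper is without the full data set, here is a minimal sketch using a small made-up table (toy and lazy_toy are hypothetical names, not part of the exercise). It only checks the class of the wrapped object.

```r
library(data.table)
library(dtplyr)

# Hypothetical toy table standing in for transactions
toy <- data.table(date_month = c(1L, 1L, 2L), price = c(10, 20, 5))

# lazy_dt() wraps the data.table; no computation happens at this point
lazy_toy <- lazy_dt(toy)

# The wrapper is a dtplyr step object rather than a data.table
class(lazy_toy)
inherits(lazy_toy, "data.table")
```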
Confirm that dtplyr is not making copies of the original data.table
Load the lobstr library
r
library(lobstr)
Use obj_size() to obtain the size of transactions in memory
r
obj_size(transactions)
Use obj_size() to obtain the size of dt_transactions in memory
r
obj_size(dt_transactions)
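Use obj_size() to obtain the combined size in memory of dt_transactions and transactions. When several objects are passed, obj_size() counts shared memory only once, so a combined size close to the size of transactions alone indicates that no copy was made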
r
obj_size(transactions, dt_transactions)
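The size comparison can be reduced to a single ratio. The sketch below uses a made-up table (toy, lazy_toy, and the size_* names are hypothetical) that is large enough for the wrapper's own overhead to be negligible.

```r
library(data.table)
library(dtplyr)
library(lobstr)

# Hypothetical toy table, large enough that the wrapper overhead is negligible
toy <- data.table(id = seq_len(1e5), price = runif(1e5))
lazy_toy <- lazy_dt(toy)

size_dt   <- obj_size(toy)
size_both <- obj_size(toy, lazy_toy)

# If lazy_dt() had copied the data, this ratio would be close to 2
as.numeric(size_both) / as.numeric(size_dt)
```

A ratio near 1 suggests the wrapper points at the same columns as the original table.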
How dtplyr works
An under-the-hood view of how dtplyr operates on data.table objects
Use dplyr verbs on top of dt_transactions to obtain the total sales by month
r
dt_transactions %>%
  group_by(date_month) %>%
  summarise(total_sales = sum(price))
Assign the result of the above code to a variable called by_month
r
by_month <- dt_transactions %>%
  group_by(date_month) %>%
  summarise(total_sales = sum(price))
Use show_query() to see the data.table code that by_month actually runs
r
show_query(by_month)
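The translation step can be explored on a small made-up table. In this sketch (toy and toy_by_month are hypothetical names), show_query() prints the generated data.table call, and the data is only computed when the result is explicitly converted.

```r
library(data.table)
library(dplyr)
library(dtplyr)

# Hypothetical toy table standing in for transactions
toy <- data.table(date_month = c(1L, 1L, 2L), price = c(10, 20, 5))

toy_by_month <- lazy_dt(toy) %>%
  group_by(date_month) %>%
  summarise(total_sales = sum(price))

# Prints the data.table call that dtplyr generated; nothing has run yet
show_query(toy_by_month)

# Evaluation only happens on an explicit conversion
toy_result <- as.data.table(toy_by_month)
toy_result
```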
Use glimpse() to see that by_month does not modify the data. It only records steps that data.table will execute later
r
glimpse(by_month)
Create a new column using mutate()
r
dt_transactions %>%
  mutate(new_field = price / 2)
Use show_query() to see the copy() command being used
r
dt_transactions %>%
  mutate(new_field = price / 2) %>%
  show_query()
Check to confirm that the new column did not persist in dt_transactions
r
dt_transactions
Use lazy_dt() with the immutable argument set to FALSE to avoid the copy. The code wraps copy(transactions) so that later in-place changes do not alter transactions itself
r
m_transactions <- lazy_dt(copy(transactions), immutable = FALSE)
r
m_transactions
Create a new_field column in m_transactions using mutate()
r
m_transactions %>%
  mutate(new_field = price / 2)
Use show_query() to see that copy() is no longer being used
r
m_transactions %>%
  mutate(new_field = price / 2) %>%
  show_query()
Inspect m_transactions to see that new_field has persisted
r
m_transactions
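The difference between the two modes can be checked side by side. The sketch below uses two small made-up tables (safe_toy and mutable_toy are hypothetical names) and forces evaluation with as.data.table(), so it does not rely on printing to run the query.

```r
library(data.table)
library(dplyr)
library(dtplyr)

# Two hypothetical toy tables, one per mode
safe_toy    <- data.table(price = c(10, 20))
mutable_toy <- data.table(price = c(10, 20))

# Default (immutable = TRUE): dtplyr copies the data before adding the column
lazy_dt(safe_toy) %>%
  mutate(new_field = price / 2) %>%
  as.data.table()

# immutable = FALSE: the generated code modifies mutable_toy by reference
lazy_dt(mutable_toy, immutable = FALSE) %>%
  mutate(new_field = price / 2) %>%
  as.data.table()

# Compare the column names of the two source tables afterwards
names(safe_toy)
names(mutable_toy)
```

Only mutable_toy should end up with the new column, which is the trade-off of skipping the copy: less memory use, at the cost of changing the source table.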
dtplyr
Learn data conversion and basic visualization techniques
Use as_tibble() to convert the results of by_month into a tibble
r
by_month %>%
  as_tibble()
Load the ggplot2 library
r
library(ggplot2)
Use as_tibble() to convert by_month before creating a line plot
r
by_month %>%
  as_tibble() %>%
  ggplot() +
  geom_line(aes(date_month, total_sales))
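The convert-then-plot pattern also works on a small made-up table. This sketch (toy and toy_plot are hypothetical names) stores the plot in a variable so that it can be inspected or printed later.

```r
library(data.table)
library(dplyr)
library(dtplyr)
library(ggplot2)

# Hypothetical toy table standing in for transactions
toy <- data.table(date_month = c(1L, 1L, 2L, 3L), price = c(10, 20, 5, 25))

toy_plot <- lazy_dt(toy) %>%
  group_by(date_month) %>%
  summarise(total_sales = sum(price)) %>%
  as_tibble() %>%   # run the data.table code and hand ggplot() a tibble
  ggplot() +
  geom_line(aes(date_month, total_sales))

toy_plot
```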
Review a simple way to aggregate data faster, and then pivot it as a tibble
Load the tidyr library
r
library(tidyr)
Group dt_transactions by date_month and date_day, then aggregate price into total_sales
r
dt_transactions %>%
  group_by(date_month, date_day) %>%
  summarise(total_sales = sum(price))
Copy the aggregation code above, collect it into a tibble, and then use pivot_wider() to make the date_day values the column headers.
r
dt_transactions %>%
  group_by(date_month, date_day) %>%
  summarise(total_sales = sum(price)) %>%
  as_tibble() %>%
  pivot_wider(names_from = date_day, values_from = total_sales)
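The aggregate-then-pivot flow is easier to follow on a table small enough to read in full. In the sketch below, toy and toy_wide are hypothetical names and the values are made up. The aggregation runs through data.table, and the pivot runs on the resulting tibble.

```r
library(data.table)
library(dplyr)
library(dtplyr)
library(tidyr)

# Hypothetical toy table: two months, two days each
toy <- data.table(
  date_month = c(1L, 1L, 2L, 2L),
  date_day   = c(1L, 2L, 1L, 2L),
  price      = c(10, 20, 5, 15)
)

toy_wide <- lazy_dt(toy) %>%
  group_by(date_month, date_day) %>%
  summarise(total_sales = sum(price)) %>%
  as_tibble() %>%   # run the data.table code and return a tibble
  pivot_wider(names_from = date_day, values_from = total_sales)

toy_wide
```

The result has one row per date_month and one column per date_day value.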