eval_dtplyr <- FALSE
if (Sys.getenv("GLOBAL_EVAL") != "") eval_dtplyr <- Sys.getenv("GLOBAL_EVAL")
library(data.table)
library(dtplyr)
library(dplyr)
library(lobstr)
library(fs)
library(purrr)
dtplyr
dtplyr basics
Load data into R via data.table, and then wrap it with dtplyr
Load the data.table, purrr, and fs libraries
r
library(data.table)
library(purrr)
library(fs)
Read the transactions.csv file from the {{files}} folder. Use the fread() function to load the data into a variable called transactions
r
transactions <- dir_ls("{{files}}", glob = "*.csv") %>%
  map(fread) %>%
  rbindlist()
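As a minimal, self-contained sketch of the same read-and-combine pattern, the following writes two small CSV files to a temporary folder and loads them into a single data.table. The folder name, file names, columns, and values are all made up for illustration; they only stand in for the workshop files.

```r
library(data.table)
library(fs)
library(purrr)

# Hypothetical stand-in for the workshop folder: a temporary directory
tmp_dir <- file.path(tempdir(), "dtplyr-demo")
dir_create(tmp_dir)

# Two made-up CSV files that share the same columns
fwrite(data.table(date_month = 1L, price = c(10, 20)), file.path(tmp_dir, "part1.csv"))
fwrite(data.table(date_month = 2L, price = c(30, 40)), file.path(tmp_dir, "part2.csv"))

# List every CSV file, read each one with fread(), then stack the results
demo_transactions <- dir_ls(tmp_dir, glob = "*.csv") %>%
  map(fread) %>%
  rbindlist()

nrow(demo_transactions)
```

Because dir_ls() returns every matching file, the same pipeline works whether the folder holds one CSV file or many.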
Preview the data using str()
r
str(transactions)
Load the dplyr and dtplyr libraries
r
library(dplyr)
library(dtplyr)
Use lazy_dt() to "wrap" the transactions variable into a new variable called dt_transactions
r
dt_transactions <- lazy_dt(transactions)
View the structure of the dt_transactions variable with str()
r
str(dt_transactions)
Confirm that dtplyr is not making copies of the original data.table
Load the lobstr library
r
library(lobstr)
Use obj_size() to obtain the size of transactions in memory
r
obj_size(transactions)
Use obj_size() to obtain the size of dt_transactions in memory
r
obj_size(dt_transactions)
Use obj_size() to obtain the combined size of dt_transactions and transactions in memory
r
obj_size(transactions, dt_transactions)
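The comparison above can be sketched on synthetic data. This is an illustration only: demo_dt and its columns are made-up stand-ins for transactions. The idea is that, if the wrapper shares the underlying data rather than copying it, the combined size should be well below the sum of the two individual sizes.

```r
library(data.table)
library(dtplyr)
library(lobstr)

# Made-up data standing in for transactions
demo_dt <- data.table(id = 1:100000, price = runif(100000))
demo_lazy <- lazy_dt(demo_dt)

size_dt <- obj_size(demo_dt)
size_lazy <- obj_size(demo_lazy)
size_both <- obj_size(demo_dt, demo_lazy)

# obj_size() counts shared memory once, so shared data keeps the
# combined size below the sum of the individual sizes
shares_memory <- as.numeric(size_both) < as.numeric(size_dt) + as.numeric(size_lazy)
shares_memory
```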
How dtplyr works
An under-the-hood view of how dtplyr operates on data.table objects
Use dplyr verbs on top of dt_transactions to obtain the total sales by month
r
dt_transactions %>%
  group_by(date_month) %>%
  summarise(total_sales = sum(price))
Assign the above code to a variable called by_month
r
by_month <- dt_transactions %>%
  group_by(date_month) %>%
  summarise(total_sales = sum(price))
Use show_query() to see the data.table code that by_month actually runs
r
show_query(by_month)
Use str() to see that by_month does not modify the data; it only records steps that data.table will execute later
r
str(by_month)
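The lazy behavior described above can be sketched on a tiny synthetic table. The names demo_dt, demo_by_month, and demo_result and the values are made up for illustration. Nothing is computed until the result is collected, here with as.data.table().

```r
library(data.table)
library(dtplyr)
library(dplyr)

# Made-up data standing in for transactions
demo_dt <- data.table(date_month = c(1L, 1L, 2L), price = c(10, 20, 30))

demo_by_month <- lazy_dt(demo_dt) %>%
  group_by(date_month) %>%
  summarise(total_sales = sum(price))

# Still a lazy step object rather than a data.table: only steps are recorded
is_lazy <- inherits(demo_by_month, "dtplyr_step")

# Print the data.table translation of the recorded steps
show_query(demo_by_month)

# Collecting runs the translated code and returns the actual result
demo_result <- as.data.table(demo_by_month)
demo_result
```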
dtplyr
Learn data conversion and basic visualization techniques
Use as_tibble() to convert the results of by_month into a tibble
r
by_month %>%
  as_tibble()
Load the ggplot2 library
r
library(ggplot2)
Use as_tibble() to convert the results before creating a line plot
r
by_month %>%
  as_tibble() %>%
  ggplot() +
  geom_line(aes(date_month, total_sales))
Review a simple way to aggregate data faster, and then pivot it as a tibble
Load the tidyr library
r
library(tidyr)
Group dt_transactions by date_month and date_day, then aggregate price into total_sales
r
dt_transactions %>%
  group_by(date_month, date_day) %>%
  summarise(total_sales = sum(price))
Copy the aggregation code above, collect it into a tibble, and then use pivot_wider() to make the date_day values the column headers
r
dt_transactions %>%
  group_by(date_month, date_day) %>%
  summarise(total_sales = sum(price)) %>%
  as_tibble() %>%
  pivot_wider(names_from = date_day, values_from = total_sales)
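To make the shape of the pivoted result concrete, here is a small sketch of the same aggregate-then-pivot pattern on made-up data. The table demo_dt and its values are assumptions for illustration only. The aggregation runs in data.table, and only the small summarized result is converted to a tibble before pivoting.

```r
library(data.table)
library(dtplyr)
library(dplyr)
library(tidyr)

# Made-up data: two months, two days each
demo_dt <- data.table(
  date_month = c(1L, 1L, 1L, 2L, 2L),
  date_day   = c(1L, 1L, 2L, 1L, 2L),
  price      = c(10, 20, 30, 40, 50)
)

demo_wide <- lazy_dt(demo_dt) %>%
  group_by(date_month, date_day) %>%
  summarise(total_sales = sum(price)) %>%  # aggregated by data.table
  as_tibble() %>%                          # collect the summarized rows
  pivot_wider(names_from = date_day, values_from = total_sales)

# One row per month, one column per day
demo_wide
```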
mutate() verb
See how dtplyr creates a copy of the original data.table object in order to make the mutate() operation work the same as it does in dplyr
Use mutate() and show_query() to see the copy() command being used
r
dt_transactions %>%
  mutate(new_field = price / 2) %>%
  show_query()
Use lazy_dt() with the immutable argument set to FALSE to avoid the copy
r
lazy_dt(transactions, immutable = FALSE) %>%
  mutate(new_field = price / 2) %>%
  show_query()
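The trade-off of skipping the copy can be sketched as follows. This assumes dtplyr's documented behavior that immutable = FALSE allows the generated code to modify the input object; the tables demo_safe and demo_shared are made up for illustration. Once the query is collected, the original data.table passed with immutable = FALSE is expected to gain the new column, while the default wrapper leaves its input untouched.

```r
library(data.table)
library(dtplyr)
library(dplyr)

# Two made-up tables with identical contents
demo_safe <- data.table(price = c(10, 20))
demo_shared <- data.table(price = c(10, 20))

# Default (immutable = TRUE): the generated code copies before adding the column
res_safe <- lazy_dt(demo_safe) %>%
  mutate(new_field = price / 2) %>%
  as.data.table()

# immutable = FALSE: the column is added to the input by reference
res_shared <- lazy_dt(demo_shared, immutable = FALSE) %>%
  mutate(new_field = price / 2) %>%
  as.data.table()

names(demo_safe)    # original columns only
names(demo_shared)  # original columns plus new_field
```

Avoiding the copy saves memory on large tables, but any other code that still refers to the original object will see the added column.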