eval_adv <- FALSE if(Sys.getenv("GLOBAL_EVAL") != "") eval_adv <- Sys.getenv("GLOBAL_EVAL")
library(connections) library(dplyr) library(rlang) library(config) library(dbplyr) library(purrr)
Load the connections
and dplyr
libraries
r
library(connections)
library(dplyr)
library(dbplyr)
library(config)
Use connection_open()
to open a Database connection
r
con <- connection_open(
RPostgres::Postgres(),
host = "localhost",
user = get("user"),
password = get("pwd"),
port = 5432,
dbname = "postgres",
bigint = "integer"
)
Create a variable that points to the v_orders table
r
orders <- tbl(con, in_schema("retail", "v_orders"))
Create a simple dplyr
call that gets the average of all order totals
r
orders %>%
summarise(mean = mean(order_total, na.rm = TRUE))
Load the rlang
library
r
library(rlang)
Create a new function call my_mean()
that will take an argument, x
, and then returns the results of enquo(x)
r
my_mean <- function(x){
enquo(x)
}
Test the new function. It should return the same variable name, but inside quosure. Use order_total
as its argument's value to test
r
my_mean(order_total)
In the function, re-assign x
to the result of enquo(x)
, and then return x
r
my_mean <- function(x){
x <- enquo(x)
x
}
Test the same way again, the output should match to what it was as before
r
my_mean(order_total)
Remove the last line that has x
, add the contents of the function with the initial dplyr
code from step 3. Then replace order_total
with !! x
r
my_mean <- function(x){
x <- enquo(x)
orders %>%
summarise(mean = mean(!! x, na.rm = TRUE))
}
Test the new function by passing order_total
as x
r
my_mean(order_total)
In the summarise()
step, replace the name mean
, with !! as_label(x)
, also replace the =
sign, with :=
r
my_mean <- function(x){
x <- enquo(x)
orders %>%
summarise(!! as_label(x) := mean(!! x, na.rm = TRUE))
}
Run the function again, the name of the column should match the argument value
r
my_mean(order_total)
Test the function by passing a formula, such as order_total / order_qty
r
my_mean(order_total / order_qty)
Make the function generic, add a new argument called: .data
. Inisde the function, replace orders
with .data
r
my_mean <- function(.data, x){
x <- enquo(x)
.data %>%
summarise(!! as_label(x) := mean(!! x, na.rm = TRUE))
}
The function now behaves more like a dplyr
verb. Start with orders
and then pipe into the function
r
orders %>%
my_mean(order_total)
Clean up the code by removing the pipe that inside the function
r
my_mean <- function(.data, x){
x <- enquo(x)
summarise(
.data,
!! as_label(x) := mean(!! x, na.rm = TRUE)
)
}
Confirm that there is no change in the behavior of the function
r
orders %>%
my_mean(order_total)
Add a show_query()
step to preview the resulting SQL statement
r
orders %>%
my_mean(order_total) %>%
show_query()
Try the function with a non-DB backed variable, such as mtcars
. Use mpg
as the aggregating variable
r
mtcars %>%
my_mean(mpg)
Create functions that handle a variable number of arguments. The goal of the exercise is to create an anti-select()
function.
Load the purrr
package
r
library(purrr)
Use ... as the second argument of a function called de_select()
. Inside the function use enquos()
to parse it
r
de_select <- function(.data, ...){
vars <- enquos(...)
vars
}
Test the function using orders
r
orders %>%
de_select(order_id, date)
Add a step to the function that iterates through each quosure and prefixes a minus sign to tell select()
to drop that specific field. Use map()
for the iteration, and quo()
to create the prefixed expression.
r
de_select <- function(.data, ...){
vars <- enquos(...)
vars <- map(vars, ~ quo(- !! .x))
vars
}
Run the same test to view the new results
r
orders %>%
de_select(order_id, date)
Add the select()
step. Use !!! to parse the vars variable inside select()
r
de_select <- function(.data, ...){
vars <- enquos(...)
vars <- map(vars, ~ quo(- !! .x))
select(.data, !!! vars)
}
Run the test again, this time the operation will take place.
r
orders %>%
de_select(order_id, date)
Add a show_query()
step to see the resulting SQL
r
orders %>%
de_select(order_id, date) %>%
show_query()
Test the function with a different data set, such as mtcars
r
mtcars %>%
de_select(mpg, wt, am)
Suggested approach to avoid passing multiple, and similar, queries to the database
Create a simple dplyr
piped operation that returns the mean of order_total for the months of January, February and March as a group
r
orders %>%
filter(date_month %in% c(1,2,3)) %>%
summarise(mean = mean(order_total, na.rm = TRUE))
Assign the first operation to a variable called a, and create copy of the operation but changing the selected months to January, March and April. Assign the second one to a variable called b. ```r a <- orders %>% filter(date_month %in% c(1,2,3)) %>% summarise(mean = mean(order_total, na.rm = TRUE))
b <- orders %>% filter(date_month %in% c(1,3,4)) %>% summarise(mean = mean(order_total, na.rm = TRUE)) ```
Use union() to pass a and b at the same time to the database
r
union(a, b)
Pipe the previous instruction to show_query()
to confirm that the resulting query is a single one
r
union(a, b) %>%
show_query()
Assign to a new variable called months an overlapping set of months
r
months <- list(
c(1,2,3),
c(1,3,4),
c(2,4,6)
)
Use map()
to cycle through each set of overlapping months. Notice that it returns three separate results, meaning that it went to the database three times
r
months %>%
map(
~ orders %>%
filter(date_month %in% .x) %>%
summarise(mean = mean(order_total, na.rm = TRUE))
)
Add a reduce()
operation and use union()
command to create a single query
r
months %>%
map(
~ orders %>%
filter(date_month %in% .x) %>%
summarise(mean = mean(order_total, na.rm = TRUE))
) %>%
reduce(function(x, y) union(x, y))
Use show_query()
to see the resulting single query sent to the database
r
months %>%
map(
~ orders %>%
filter(date_month %in% .x) %>%
summarise(mean = mean(order_total, na.rm = TRUE))
) %>%
reduce(function(x, y) union(x, y)) %>%
show_query()
Create a table with a from and to ranges
r
ranges <- tribble(
~ from, ~to,
1, 4,
2, 5,
3, 7
)
See how map2()
works by passing the two variables as the x and y arguments, and adding them as the function
r
map2(ranges$from, ranges$to, ~.x + .y)
Replace x + y with the dplyr
operation from the previous exercise. In it, re-write the filter to use x and y as the month ranges
r
map2(
ranges$from,
ranges$to,
~ orders %>%
filter(date_month >= .x & date_month <= .y) %>%
summarise(mean = mean(order_total, na.rm = TRUE))
)
Add the reduce()
operation
r
map2(
ranges$from,
ranges$to,
~ orders %>%
filter(date_month >= .x & date_month <= .y) %>%
summarise(mean = mean(order_total, na.rm = TRUE))
) %>%
reduce(function(x, y) union(x, y))
Add a show_query()
step to see how the final query was constructed.
r
map2(
ranges$from,
ranges$to,
~ orders %>%
filter(date_month >= .x & date_month <= .y) %>%
summarise(mean = mean(order_total, na.rm = TRUE))
) %>%
reduce(function(x, y) union(x, y)) %>%
show_query()
Create two character variables. One with the name of a field in flights and another with a new name to be given to the field
r
my_field <- "new"
orders_field <- "order_total"
Add a mutate()
step that adds the new field. And then another step selecting just the new field
r
orders %>%
mutate(my_field = !! orders_field) %>%
select(my_field)
Add a mutate()
step that adds the new field. And then another step selecting just the new field
r
orders %>%
mutate(!! my_field := !! orders_field) %>%
select(my_field)
Wrap orders_field
inside a sym()
function
r
orders %>%
mutate(!! my_field := !! sym(orders_field)) %>%
select(my_field)
Pipe the code into show_query()
r
orders %>%
mutate(!! my_field := !! sym(orders_field)) %>%
select(my_field) %>%
show_query()
connection_close(con)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.