eval_adv <- FALSE
if(Sys.getenv("GLOBAL_EVAL") != "") eval_adv <- Sys.getenv("GLOBAL_EVAL")
  library(connections)
  library(dplyr)
  library(rlang)

Advanced Operations

Simple wrapper function

  1. Load the connections and dplyr libraries r library(connections) library(dplyr)

  2. Use connection_open() to open a Database connection r con <- connection_open(RSQLite::SQLite(), "database/local.sqlite")

  3. Create a variable that points to the v_orders table r orders <- tbl(con, "v_orders")

  4. Create a simple dplyr call that gets the average of all order totals r orders %>% summarise(mean = mean(order_total, na.rm = TRUE))

  5. Load the rlang library r library(rlang)

  6. Create a new function call my_mean() that will take an argument, x, and then returns the results of enquo(x) r my_mean <- function(x){ enquo(x) }

  7. Test the new function. It should return the same variable name, but inside quosure. Use order_total as its argument's value to test r my_mean(order_total)

  8. In the function, re-assign x to the result of enquo(x), and then return x r my_mean <- function(x){ x <- enquo(x) x }

  9. Test the same way again, the output should match to what it was as before r my_mean(order_total)

  10. Remove the last line that has x, add the contents of the function with the initial dplyr code from step 3. Then replace order_total with !! x r my_mean <- function(x){ x <- enquo(x) orders %>% summarise(mean = mean(!! x, na.rm = TRUE)) }

  11. Test the new function by passing order_total as x r my_mean(order_total)

  12. In the summarise() step, replace the name mean, with !! as_label(x), also replace the = sign, with := r my_mean <- function(x){ x <- enquo(x) orders %>% summarise(!! as_label(x) := mean(!! x, na.rm = TRUE)) }

  13. Run the function again, the name of the column should match the argument value r my_mean(order_total)

  14. Test the function by passing a formula, such as order_total / order_qty

    r my_mean(order_total / order_qty)

  15. Make the function generic, add a new argument called: .data. Inisde the function, replace orders with .data r my_mean <- function(.data, x){ x <- enquo(x) .data %>% summarise(!! as_label(x) := mean(!! x, na.rm = TRUE)) }

  16. The function now behaves more like a dplyr verb. Start with orders and then pipe into the function r orders %>% my_mean(order_total)

  17. Clean up the code by removing the pipe that inside the function r my_mean <- function(.data, x){ x <- enquo(x) summarise( .data, !! as_label(x) := mean(!! x, na.rm = TRUE) ) }

  18. Confirm that there is no change in the behavior of the function r orders %>% my_mean(order_total)

  19. Add a show_query() step to preview the resulting SQL statement r orders %>% my_mean(order_total) %>% show_query()

  20. Try the function with a non-DB backed variable, such as mtcars. Use mpg as the aggregating variable r mtcars %>% my_mean(mpg)

Multiple variables

Multiple queries

Characters to field names



edgararuiz/bigdataclass documentation built on Jan. 3, 2020, 6:46 p.m.