
Data Visualizations

Simple plot

Practice pushing the calculations to the database

  1. Load the connections, dplyr, dbplyr, and config libraries

     ```r
     library(connections)
     library(dplyr)
     library(dbplyr)
     library(config)
     ```

  2. Use connection_open() to open a database connection

     ```r
     con <- connection_open(
       RPostgres::Postgres(),
       host = "localhost",
       user = get("user", config = "dev"),
       password = get("pwd", config = "dev"),
       port = 5432,
       dbname = "postgres",
       bigint = "integer"
     )
     ```

  3. Use tbl() to create a pointer to the v_orders table

     ```r
     orders <-
     ```

  4. Use collect() to bring the aggregated results back into a "pass-through" variable called by_year

     ```r
     by_year <- orders %>%
       count(date_year) %>%
       collect()
     ```

  5. Preview the by_year variable

     ```r

     ```

  6. Load the ggplot2 library

     ```r
     library(ggplot2)
     ```

  7. Plot results using ggplot2

     ```r
     ggplot(by_year) +
       geom_col(aes(date_year, n))
     ```

  8. Using the code in this section, create a single piped expression that also creates the plot

     ```r

     ```
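
As one possible way to approach the last step, the following is a minimal sketch rather than the workshop's reference answer. It assumes a small local tibble (with made-up values) standing in for the remote v_orders table, so it runs without a database. Against the real connection, orders would be the pointer created with tbl(), count() would be translated to SQL, and collect() would download only the aggregated rows.

```r
library(dplyr)
library(ggplot2)

# Assumption: a hypothetical local stand-in for the remote v_orders table.
# With a database, use the `orders` pointer created with tbl() instead.
orders <- tibble(
  date_year   = c(2016, 2016, 2017, 2017, 2017, 2018),
  order_total = c(10, 20, 15, 25, 30, 40)
)

by_year_plot <- orders %>%
  count(date_year) %>%  # aggregate first; on a database this runs as SQL
  collect() %>%         # bring only the aggregated rows into R
  ggplot() +
  geom_col(aes(date_year, n))

by_year_plot
```

Because the aggregation happens before collect(), only one row per year crosses the network, regardless of how many orders the table holds.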

Plot in one code segment

Practice going from dplyr to ggplot2 without using a pass-through variable, which is convenient for exploratory data analysis (EDA)

  1. Summarize the order totals in a new variable called sales

     ```r
     orders %>%
       summarise(sales = sum(order_total))
     ```

  2. Summarize the order totals grouped by date_year in a new variable called sales

     ```r
     orders %>%
       group_by(date_year) %>%
       summarise(sales = sum(order_total))
     ```

  3. Summarize the order totals grouped by date_year in a new variable called sales and plot the results

     ```r

     ```

  4. Switch the calculation to the average order total

     ```r

     ```
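
The last two steps can be sketched as a single pipeline that ends in a plot. This is an illustrative sketch, not the official solution: the local tibble and its values are assumptions standing in for the remote table, and the explicit collect() is a deliberate choice to make the hand-off from the database to R visible.

```r
library(dplyr)
library(ggplot2)

# Assumption: a hypothetical local stand-in for the remote v_orders table.
orders <- tibble(
  date_year   = c(2016, 2016, 2017, 2017, 2017, 2018),
  order_total = c(10, 20, 15, 25, 30, 40)
)

avg_sales_plot <- orders %>%
  group_by(date_year) %>%
  # na.rm = TRUE mirrors SQL aggregates, which ignore missing values
  summarise(sales = mean(order_total, na.rm = TRUE)) %>%
  collect() %>%  # only one row per year is brought into R
  ggplot() +
  geom_col(aes(date_year, sales))

avg_sales_plot
```

Replacing mean() with sum() gives the totals plot from the previous step; nothing else in the pipeline needs to change.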

Create a histogram

Use the dbplot package to easily create a histogram

  1. Load the dbplot package

     ```r
     library(dbplot)
     ```

  2. Use dbplot_histogram() to build the histogram

     ```r
     orders %>%
       dbplot_histogram(order_total)
     ```

  3. Adjust the binwidth to 10

     ```r

     ```
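
A minimal sketch of the binwidth step follows. It assumes a small local tibble with made-up order totals in place of the remote table; dbplot functions also accept local data frames, which makes the example runnable without a connection. On a database table the bin calculation would be pushed to the database instead.

```r
library(dplyr)
library(dbplot)

# Assumption: a hypothetical local stand-in for the remote v_orders table.
orders <- tibble(
  order_total = c(5, 12, 18, 23, 27, 31, 36, 44, 52, 58)
)

hist_plot <- orders %>%
  dbplot_histogram(order_total, binwidth = 10)  # bins 10 units wide

hist_plot
```

Setting binwidth fixes the width of each bin, whereas the default behavior splits the range into a fixed number of bins.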

Raster plot

Use dbplot's raster graph

  1. Use dbplot_raster() to visualize order_qty versus order_total

     ```r
     orders %>%
       dbplot_raster(order_qty, order_total)
     ```

  2. Change the plot's resolution to 10

     ```r

     ```
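
The resolution step might look like the sketch below. The local tibble and its values are assumptions used only so the example runs without a database; with the real orders pointer, the grid aggregation would be computed in the database.

```r
library(dplyr)
library(dbplot)

# Assumption: a hypothetical local stand-in for the remote v_orders table.
orders <- tibble(
  order_qty   = c(1, 2, 2, 3, 4, 5, 6, 8, 9, 10),
  order_total = c(8, 15, 18, 22, 35, 41, 50, 66, 70, 85)
)

raster_plot <- orders %>%
  dbplot_raster(order_qty, order_total, resolution = 10)  # 10 x 10 grid

raster_plot
```

A lower resolution produces fewer, larger squares, which also means fewer rows returned from the database.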

Using the compute functions

  1. Use the db_compute_raster2() function to get the underlying results that feed the plot

     ```r
     locations <- orders %>%
       db_compute_raster2(customer_lon, customer_lat, resolution = 10)
     ```

  2. Preview the locations variable

     ```r
     locations
     ```

  3. Load the leaflet library

     ```r
     library(leaflet)
     ```

  4. Pipe locations into the leaflet() function, and then pipe that into the addTiles() function

     ```r
     locations %>%
       leaflet() %>%
       addTiles()
     ```

  5. Add the addRectangles() function using the longitude and latitude variables

     ```r
     locations %>%
       leaflet() %>%
       addTiles() %>%
       addRectangles(
         ~customer_lon,
         ~customer_lat,
         ~customer_lon_2,
         ~customer_lat_2
       )
     ```

  6. Add the fillOpacity argument to the addRectangles() step, using the column named `n()` as its value

     ```r
     locations %>%
       leaflet() %>%
       addTiles() %>%
       addRectangles(
         ~customer_lon,
         ~customer_lat,
         ~customer_lon_2,
         ~customer_lat_2,
         fillOpacity = ~`n()`
       )
     ```

  7. Modify fillOpacity to be calculated as a percentage against the maximum number of orders

     ```r
     locations %>%
       leaflet() %>%
       addTiles() %>%
       addRectangles(
         ~customer_lon,
         ~customer_lat,
         ~customer_lon_2,
         ~customer_lat_2,
         fillOpacity = ~(`n()` / max(`n()`))
       )
     ```

  8. Add the popup argument with the following instruction as its value: ~paste0("<p>No of orders: ", `n()`, "</p>")

     ```r
     locations %>%
       leaflet() %>%
       addTiles() %>%
       addRectangles(
         ~customer_lon,
         ~customer_lat,
         ~customer_lon_2,
         ~customer_lat_2,
         fillOpacity = ~(`n()` / max(`n()`)),
         popup = ~paste0("<p>No of orders: ", `n()`, "</p>")
       )
     ```

  9. Disconnect from the database using connection_close()

     ```r
     connection_close(con)
     ```
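
The mapping steps above can be tried end to end without a database using the sketch below. The coordinates are made up, and the local tibble is an assumption standing in for the remote table. Renaming the `n()` column to orders_n (a hypothetical name) is an optional convenience that avoids backticks in the leaflet formulas; it is not part of the original exercise.

```r
library(dplyr)
library(dbplot)
library(leaflet)

# Assumption: hypothetical customer coordinates standing in for v_orders.
orders <- tibble(
  customer_lon = c(-122.4, -122.3, -118.2, -118.3, -87.6, -74.0, -74.1, -73.9),
  customer_lat = c(37.8, 37.7, 34.1, 34.0, 41.9, 40.7, 40.8, 40.6)
)

locations <- orders %>%
  db_compute_raster2(customer_lon, customer_lat, resolution = 10) %>%
  rename(orders_n = `n()`)  # friendlier name for the count column

order_map <- locations %>%
  leaflet() %>%
  addTiles() %>%
  addRectangles(
    ~customer_lon,    # lower-left corner of each grid cell
    ~customer_lat,
    ~customer_lon_2,  # opposite corner added by db_compute_raster2()
    ~customer_lat_2,
    fillOpacity = ~(orders_n / max(orders_n)),  # scaled to the 0-1 range
    popup = ~paste0("<p>No of orders: ", orders_n, "</p>")
  )

order_map
```

Dividing by the maximum keeps fillOpacity between 0 and 1, so the busiest cell is fully opaque and the rest fade in proportion.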


rstudio-conf-2020/big-data documentation built on Feb. 4, 2020, 5:24 p.m.