gmcginnis/AirVizR: Visualization Tools for Air Quality Data

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  eval = FALSE,
  tidy = TRUE
)

# Import the necessary package:
library(AirVizR)

Importing using PurpleAir API

The following example retrieves PurpleAir data using functions derived from those in the AirSensor package. Because this process depends on external functions, changes to the PurpleAir API structure could lead to deprecation or cause this document to fail to compile.

Inputs

Many functions rely on the same inputs and therefore default to the same defined objects in the environment. It is highly recommended to specify inputs upfront in their own chunk so that later function calls stay simple. Environment clean-up steps appear throughout this document to show where and when these objects can be discarded.
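
As a rough illustration of how a function can default to objects defined in the environment, the sketch below uses a hypothetical helper, describe_range(), which is not part of AirVizR. It assumes the package functions use ordinary R default arguments that look up the input_* objects when no value is passed.

```r
# Hypothetical helper (not an AirVizR function): its default arguments refer
# to objects looked up from the environment where the function was defined
# (here, the global environment).
describe_range <- function(startdate = input_startdate,
                           enddate = input_enddate) {
  paste("Data from", startdate, "through", enddate)
}

# Define the inputs once, upfront
input_startdate <- "2020-07-01"
input_enddate <- "2020-07-07"

# No arguments: the defaults are evaluated when the function runs
range_default <- describe_range()

# An explicit argument overrides the environment object for this call only
range_custom <- describe_range(startdate = "2020-07-03")
```

Calling describe_range() without arguments after removing input_startdate would raise an "object not found" error, so inputs should only be discarded once no later call depends on them.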

### REQUIRED inputs

# REQUIRED: Start and end dates of interest. Please use the following format: "YYYY-MM-DD", with leading zeros where appropriate
# Data will be pulled from the start of the start date through the end of the end date
input_startdate <- "2020-07-01"
input_enddate <- "2020-07-07"

# Drop monitors flagged as reporting high values?
# TRUE is recommended, but keep in mind that EPA correction factors can account for high values and A/B monitor disparity
input_drop_hi <- TRUE

## REQUIRED: Inside/Outside argument. At least one must be TRUE.
# Grab data for outdoor sensors?
include_outside <- TRUE
# Grab data for indoor sensors?
include_inside <- TRUE


### RECOMMENDED inputs
# Set to NULL if not interested in filtering by the following.

# State of monitors to include. If NULL/unspecified, all monitors within the boundaries below will be included regardless of state
input_stateCode <- "OR"

# Boundaries (in degrees) of monitors to include. North/south or east/west pairs of bounds can also be supplied on their own.
# Helpful website: bboxfinder.com
input_west <- -122.854
input_south <- 45.4
input_east <- -122.58
input_north <- 45.6

# Labels of monitors to include. If NULL/unspecified, all monitors will be included.
# The labels are based on string detection, so for instance having "STAR" will pull all that have the word "STAR" in the label.
# \\b is added where we want only values that report those words on their own (ex. STAR Lab and not MYSTARSYSTEM1)
# An example format is below. Capitalization matters!
input_labels <- c("se", "SE", "Se", "\\bSTAR\\b", "\\bPSU\\b", "(C|c)ollege", "(R|r)ow", "Richmond")


## Optional grouping settings (which will NOT impact raw data output, but can be useful when graphing!)

## DATE GROUPING
# Run date grouping? If set to "FALSE", the date inputs below will not matter (but will still appear in the environment)
run_date_grouping <- TRUE
# If TRUE, specify the groupings below as respective items in their lists. Dates that are not included in these ranges will be dropped.
# (i.e. the first item in each list is one "group", the second item in each list is another "group", etc.)
# Date grouping categories:
input_date_tags <- c("Before", "Independence Day", "After")
# Start dates (in a list, "YYYY-MM-DD")
input_date_starts <- c("2020-07-01", "2020-07-04", "2020-07-05")
# End dates (in a list, "YYYY-MM-DD")
input_date_ends <- c("2020-07-03", "2020-07-04", "2020-07-07")

## HOUR GROUPING
# Run hour grouping? If set to "FALSE", the hour inputs below will not matter (but will still appear in the environment)
run_hour_grouping <- TRUE
# If TRUE, specify the groupings below as respective items in their lists
# (i.e. the first item in each list is one "group", the second item in each list is another "group", etc.)
# Hour grouping categories:
input_hour_tags <- c("Morning", "Afternoon", "Evening", "Night")
# Start hours (in a list, using full hours in 24 hour format)
input_hour_starts <- c(5, 12, 17, 21)
# End hours will be automatically generated; no need to add "end" hours. This also means that no hours will be dropped.
# If you would like to drop specific hours, consider using the filter_df() function
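
To picture how the grouping inputs pair up, the sketch below tags a few dates and hours with base R. The helpers tag_dates() and tag_hours() are hypothetical stand-ins, not the package's apply_date_tags() and apply_hour_tags(). Wrapping the hours before the first start hour into the last group is an assumption based on the note that no hours are dropped.

```r
# Grouping inputs, repeated here so the sketch is self-contained
input_date_tags <- c("Before", "Independence Day", "After")
input_date_starts <- c("2020-07-01", "2020-07-04", "2020-07-05")
input_date_ends <- c("2020-07-03", "2020-07-04", "2020-07-07")
input_hour_tags <- c("Morning", "Afternoon", "Evening", "Night")
input_hour_starts <- c(5, 12, 17, 21)

# Hypothetical helper: the i-th start, end, and tag form one group.
# Dates outside every range get NA (and could then be dropped).
tag_dates <- function(dates, starts, ends, tags) {
  dates <- as.Date(dates)
  starts <- as.Date(starts)
  ends <- as.Date(ends)
  out <- rep(NA_character_, length(dates))
  for (i in seq_along(tags)) {
    out[dates >= starts[i] & dates <= ends[i]] <- tags[i]
  }
  out
}

# Hypothetical helper: each group runs from its start hour up to the next
# start hour. Hours before the first start wrap into the last group
# (assumed behavior).
tag_hours <- function(hours, starts, tags) {
  idx <- findInterval(hours, starts) # 0 for hours before the first start
  idx[idx == 0] <- length(starts)
  tags[idx]
}

date_groups <- tag_dates(
  c("2020-07-02", "2020-07-04", "2020-07-06", "2020-07-10"),
  input_date_starts, input_date_ends, input_date_tags
)
hour_groups <- tag_hours(
  c(0, 5, 11, 12, 20, 23),
  input_hour_starts, input_hour_tags
)
```

In this sketch "2020-07-10" falls outside every date range and is left untagged, while hour 0 is assigned to "Night" because it precedes the first start hour.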

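The label patterns in the recommended inputs behave like regular expressions. As a sketch of that string detection, the example below checks a few made-up monitor labels with base R's grepl(). Whether the package matches labels in exactly this way is an assumption.

```r
# Label patterns from the inputs, repeated so the sketch is self-contained
input_labels <- c("se", "SE", "Se", "\\bSTAR\\b", "\\bPSU\\b",
                  "(C|c)ollege", "(R|r)ow", "Richmond")

# Made-up labels for illustration only
example_labels <- c("STAR Lab", "MYSTARSYSTEM1", "PSU Library",
                    "Reed College", "Downtown")

# Combine the patterns so that a label matching any one of them is kept
pattern <- paste(input_labels, collapse = "|")
matched <- grepl(pattern, example_labels)
names(matched) <- example_labels

# Without \\b, "STAR" also matches labels that merely contain those letters
matched_unanchored <- grepl("STAR", example_labels)
names(matched_unanchored) <- example_labels
```

Here "MYSTARSYSTEM1" is excluded by the word-boundary pattern but would be kept by the plain "STAR" pattern.
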
Loading Data

The data import process iterates over a loop and is the slowest step. The more IDs and the longer the time span chosen, the longer it will take to load. If more than 100 IDs are detected, a message will appear in the console to warn of a large data load; run example(get_ids) to see an example of this message. This is also why get_ids() is not automatically integrated into get_area_pat().
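
Because load time grows with the number of IDs, it can help to check the count before starting the slow step. The sketch below uses a made-up example_ids vector in place of the output of get_ids(). The threshold of 100 simply mirrors the warning described above, and the message text is illustrative, not the package's own.

```r
# Made-up IDs standing in for the result of get_ids()
example_ids <- sprintf("sensor_%03d", 1:150)

# Threshold mirroring the 100-ID warning described above
id_threshold <- 100

large_load <- length(example_ids) > id_threshold
if (large_load) {
  message(
    "Large data load: ", length(example_ids),
    " IDs selected (threshold: ", id_threshold, ")."
  )
}
```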

pas_area <- get_area_pas()

ids <- get_ids()

# The following step will take the longest time
results <- get_area_pat()

# For ease of use, separate the results into their own respective objects
raw_meta <- results$raw_meta
raw_data <- results$raw_data

# OPTIONAL - clean the (digital) environment
remove(input_stateCode, input_north, input_east, input_south, input_west, input_labels, ids, pas_area, results, include_inside, include_outside)

Wrangling Data

Now that we have the raw data imported, we will wrangle them by applying basic quality control, time adjustments, and correction factors.

# Wrangling the data
pa_meta <- wrangle_meta(raw_meta)
pa_full <- wrangle_data(raw_data)

# Creating data sets for each grouping option
pa_hourly <- apply_functions(pa_full, by_day = TRUE, by_hour = TRUE)
pa_daily <- apply_functions(pa_full, by_day = TRUE, by_hour = FALSE)
pa_diurnal <- apply_functions(pa_full, by_day = FALSE, by_hour = TRUE)

# Apply date and/or hour tags to the original data set, if specified in the inputs
if (isTRUE(run_date_grouping)) {
  pa_full <- apply_date_tags(pa_full)
}
if (isTRUE(run_hour_grouping)) {
  pa_full <- apply_hour_tags(pa_full)
}

# OPTIONAL
# Cleaning the global environment
remove(input_drop_hi, run_date_grouping, run_hour_grouping)
remove(list = c(ls(pattern = "input_(hour|date)_")))
# It is recommended to keep raw_meta and raw_data since they take the longest to load,
# as well as the input start & end dates (since the diurnal set does not contain date information)
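
The three grouped data sets created above differ only in which time components they keep. As a rough picture, the sketch below averages a toy data set by date and hour, by date only, and by hour only using base R. The column names, the use of the mean, and the toy values are assumptions for illustration; apply_functions() may aggregate differently.

```r
# Toy readings (made-up values), with timestamps in UTC
toy <- data.frame(
  datetime = as.POSIXct(
    c("2020-07-01 05:10:00", "2020-07-01 05:40:00",
      "2020-07-01 12:00:00", "2020-07-02 05:20:00"),
    tz = "UTC"
  ),
  pm25 = c(10, 20, 30, 40)
)

# Split each timestamp into a date and an hour of day
toy$date <- as.Date(toy$datetime)
toy$hour <- as.integer(format(toy$datetime, "%H"))

# Analogous to by_day = TRUE, by_hour = TRUE: one row per date and hour
toy_hourly <- aggregate(pm25 ~ date + hour, data = toy, FUN = mean)

# Analogous to by_day = TRUE, by_hour = FALSE: one row per date
toy_daily <- aggregate(pm25 ~ date, data = toy, FUN = mean)

# Analogous to by_day = FALSE, by_hour = TRUE: one row per hour of day,
# averaged across all dates
toy_diurnal <- aggregate(pm25 ~ hour, data = toy, FUN = mean)
```

The diurnal result has no date column, which is why the start and end date inputs are worth keeping alongside it.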