In MazamaScience/AirSensor2: Process and Display Data from Air Quality Sensors

knitr::opts_chunk$set(fig.width = 7, fig.height = 5)

Introduction

This tutorial demonstrates how to create pas, pat and monitor objects for PurpleAir sensors in a particular community. Target audiences include grad students, researchers, air quality professionals and any member of the public concerned about air quality and comfortable working with R and RStudio.

Goal

Our goal in this tutorial is to create pas, pat and monitor objects (data structures) for the Methow Valley -- a community in north-central Washington state. Clean Air Methow operates as a project of the Methow Valley Citizens Council and began deploying PurpleAir Sensors in 2018:

In the summer of 2018, Clean Air Methow launched the Clean Air Ambassador Program, an exciting citizen science project, and one of the largest, rural networks of low-cost sensors in the world!

This tutorial will demonstrate how to access and work with data from this collection of sensors.

Data Structures

pas objects in the AirSensor package contain per-instrument metadata for a collection of PurpleAir sensors. This will include both "spatial metadata" like longitude, latitude, timezone, etc. as well as each instrument's sensor_index, allowing us to request timeseries data from the PurpleAir API.

pat objects in the AirSensor package contain time-invariant spatial metadata for a single sensor as well as the time-dependent measurements made by that sensor.

monitor objects in the AirMonitor package contain time-invariant spatial metadata for multiple sensors as well as the time-dependent PM2.5 measurements made by those sensors.

Create a PurpleAirSynoptic (pas) object

To find the sensors we wish to investigate, we must first create a pas object with metadata for all PurpleAir senors in our target area. The Methow Valley valley is located entirely within Okanogan County, WA, so we can create an Okanogan-only pas object as our starting point.

# AirSensor2 package
library(AirSensor2)

cat(getwd())

# Set user's PurpleAir_API_READ_KEY
source('global_vars.R')
setAPIKey("PurpleAir-read", PurpleAir_API_READ_KEY)

# Initialize spatial datasets
initializeMazamaSpatialUtils()

# Create a new 'pas' object for Okanogan county
okanogan_pas <-
  pas_createNew(
    countryCodes = "US",
    stateCodes = "WA",
    counties = c("Okanogan"),
    lookbackDays = 365,         # all sensors from the past year
    location_type = 0           # outdoor sensors only
  )

# Interactive map
okanogan_pas %>% pas_leaflet()

Explore available sensors

Clicking on some of the sensors in the Methow Valley, it quickly becomes apparent that many of those sensors have a label that associates them with the "Clean Air Ambassador" program. Unfortunately, the naming is not consistent.

A quick review of sort(pas$locationName) reveals:

"Clean Air Ambassador @..."
"MV Ambassador @..."
"MV Ambassadors@..."
"MV Clean Air Abassador @..."
"MV Clean Air Ambassador @..."
"MV Clean Air Ambassador-..."
"MVCC Ambassador @..."
"MVCC Ambassador@..."

Clearly, some effort was made to systematize the naming even if it wasn't entirely successful. Nevertheless, we can filter for all location names that begin with "MV" or "Clean Air" to create a MVCAA-only pas object

mvcaa_pas <-
  okanogan_pas %>%
  pas_filter(stringr::str_detect(locationName, "^MV|^Clean Air"))

# Interactive map
mvcaa_pas %>% pas_leaflet()

Create a PurpleAirTimeseries (pat) object

A pat object contains time series data for a specific sensor. The pat_create() function downloads all data records for a sensor -- "raw data" typically measured every 2 minutes. A similar function, pat_createHourly(), downloads hourly aggregated data as provided by the PurpleAir API.

Raw Data

The pat_create() function has a fields argument that lets you specify which data fields should be included in the result. But default, it uses all those defined in PurpleAir_PAT_QC_FIELDS:

PurpleAir_PAT_QC_FIELDS %>%
  stringr::str_split_1(',')

Clicking on the leaflet map above, we identify the sensor_index for the "Winthrop Library" sensor as "13681". The following chunk of code creates a raw pat object for this sensor:

# Create raw pat object
pat <-
  pat_create(
    api_key = PurpleAir_API_READ_KEY,
    pas = mvcaa_pas,
    sensor_index = "13681",
    startdate = "2023-07-16",
    enddate = "2023-07-18",
    timezone = "UTC",
    verbose = TRUE
  )

# Pull out data
tbl <- pat$data

# Review parameters
names(tbl)

We can now use the standard behavior of the base plot() function to review all parameters and look for any interesting correlations among them.

In the plots below, we see that temperature and humidity (aka "relative humidity") are inversely correlated, that pm2.5_atm_a and pm2.5_atm_b are strongly correlated and that variables pm2.5_atm and pm2.5_cf are essentially identical.

# NOTE:  Using "pch = 15" greatly improves the speed of drawing

# Sensor Electronics
plot(tbl[,c(1,2:5)], pch = 15, cex = 0.5, main = "Sensor Electronics")

# Atmospheric Variables
plot(tbl[,c(1,6:8)], pch = 15, cex = 0.5, main = "Atmospheric Variables")

# PM2.5 "atm"
plot(tbl[,c(1,12:14)], pch = 15, cex = 0.5, main = "PM2.5 'atm'")

# PM2.5 comparison
plot(tbl[,c(1,9,12,15)], pch = 15, cex = 0.5, main = "PM2.5 'alt' vs 'atm' vs 'cf1'")

Create a 'monitor' object

For use cases involving comparison with regulatory monitors, calculating daily averages or informing the public, it is imperative to use hourly aggregated data that has had a correction equation applied. (PurpleAir sensors tend to report pm2.5 values that are higher than those reported by EPA regulatory monitors.)

The PurpleAir_createMonitor() function works similarly to pat_createHourly() but only downloads those parameters typically used in QC and correction functions. A correction_FUN is applied to the hourly pat object and returns corrected PM2.5 values. By default, an EPA vetted correction function is applied that brings PurpleAir values in line with EPA data even in the smoky conditions seen with wildfire smoke. (See the documentation for pat_applyCorrection() for details.)

PurpleAir_createMonitor() returns a monitor object ready to use with the AirMonitor and AirMonitorPlots packages.

Below, we create a monitor object for a period in August, 2021 when wildfire smoke severely impacted the Methow Valley.

NOTE: Many sensors will not have data going back multiple years.

# "Mazama Trailhead" during the Cedar Creek and Cub Creek fires of 2021
monitor <-
  PurpleAir_createNewMonitor(
    api_key = PurpleAir_API_READ_KEY,
    pas = mvcaa_pas,
    sensor_index = "95227",           # Mazama Trailhead
    startdate = "2021-07-15",
    enddate = "2021-08-01",
    timezone = "America/Los_Angeles", # timestamps interpreted in this time zone
    verbose = TRUE
  )

# Check to see that we have data
dim(monitor$data)

We can now use functions from the AirMonitor and AirMonitorPlots packages to manipulate and visualize this data.

NOTE: Values are in units of µg/m³ not AQI.

# Create a basic timeseries plot
AirMonitor::monitor_timeseriesPlot(
  monitor,
  shadedNight = TRUE, 
  addAQI = TRUE
)
AirMonitor::addAQILegend("topleft", cex = 0.8, bg = "white")

# Create a daily barplot
AirMonitor::monitor_dailyBarplot(
  monitor,
)
AirMonitor::addAQILegend("topleft", cex = 0.8, bg = "white")

# Extract regulatory daily averages using "Local Standard Time"
monitor %>%
  AirMonitor::monitor_dailyStatistic(
    FUN = mean,
    minHours = 18,
    dayBoundary = "LST"
  ) %>%
  AirMonitor::monitor_getData()

# Use AirMonitorPlots to create a "diurnal" plot
monitor %>%
  AirMonitorPlots::monitor_ggDailyByHour_archival()