library(AirSensor)
library(dplyr)
library(ggplot2)

knitr::opts_chunk$set(fig.width = 7, fig.height = 5)

# NOTE: Use example PAS data for vignettes to avoid long wait-times
data("example_pas")
pas <- example_pas
Synoptic data provides a synopsis - a comprehensive view of something at a moment in time. This vignette demonstrates an example workflow for exploring air quality synoptic data using the AirSensor R package and data captured by PurpleAir air quality sensors.
PurpleAir sensor readings are uploaded to the cloud every 120 seconds. (Every 80 seconds prior to a May 31, 2019 firmware upgrade.) Data are processed by PurpleAir and a version of the data is displayed on the PurpleAir website.
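As a rough sense of scale, the upload intervals above translate into a per-sensor count of uploads per day. The helper name below is hypothetical and used only for this arithmetic sketch:

```r
# Uploads per sensor per day implied by a given upload interval.
# 'uploads_per_day' is a hypothetical helper, not part of AirSensor.
uploads_per_day <- function(interval_seconds) {
  seconds_per_day <- 24 * 60 * 60
  seconds_per_day / interval_seconds
}

current_rate <- uploads_per_day(120)  # 120-second interval
earlier_rate <- uploads_per_day(80)   # 80-second interval before the firmware upgrade

current_rate
earlier_rate
```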
You can generate a current PurpleAir Synoptic (PAS) object (hereafter called a pas) by using the pas_createNew() function. A pas object is just a large dataframe with 45 data columns and a record for each PurpleAir sensor channel (2 channels per sensor).
The pas_createNew() function performs the following tasks under the hood:
Download a raw dataset of the entire PurpleAir network that includes both metadata and recent PM2.5 averages for each deployed sensor across the globe. See downloadParseSynopticData() for more info.
Subset and enhance the raw dataset by replacing variables with more consistent, human-readable names and adding spatial metadata for each sensor, including the nearest official air quality monitor. For a more in-depth explanation, see enhanceSynopticData().
To create a new pas object you must first properly initialize the MazamaSpatialUtils package. The following example will create a brand new pas object with up-to-the-minute data:
NOTE: This can take up to a minute to process.
library(AirSensor)
library(dplyr)
library(ggplot2)

# Define PURPLE_AIR_API_READ_KEY in a .gitignore protected file
source("global_vars.R")
setAPIKey("PurpleAir-read", PURPLE_AIR_API_READ_KEY)

# Initialize spatial data processing
library(MazamaSpatialUtils)
initializeMazamaSpatialUtils()

# Create a 'pas' object with current data
pas <- pas_createNew(
  countryCodes = "US",
  stateCodes = "CA",
  counties = "Los Angeles"
)
It is also possible to load pre-generated pas objects from a data archive. These objects are updated regularly throughout each day and are typically used by other package functions primarily for the location metadata they contain. Archived pas objects from previous days will thus contain data from near midnight of that date.
The archived pas objects can be loaded very quickly with the pas_load() function, which obtains pas objects from the archive specified with setArchiveBaseUrl(). When used without specifying the datestamp argument, pas_load() will obtain the most recently processed pas object, typically less than an hour old.
# Load packages
library(AirSensor)
library(dplyr)
library(ggplot2)

# Set location of pre-generated data files
setArchiveBaseUrl("https://airfire-data-exports.s3-us-west-2.amazonaws.com/PurpleAir/v1")

# Load the most recent archived 'pas' object
pas <- pas_load()
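To load the archived object for a particular day instead, a value can be passed to the datestamp argument of pas_load(). The sketch below only builds the datestamp string. The YYYYmmdd format and the chosen date are assumptions made for illustration, and the archive may not hold an object for that day, so check the pas_load() documentation before relying on either.

```r
# Build a datestamp string for a specific day.
# ASSUMPTION: pas_load() expects the form YYYYmmdd; the date is illustrative only.
target_date <- as.Date("2020-06-01")
datestamp <- format(target_date, "%Y%m%d")

# With AirSensor loaded and the archive base URL set, the call would look like:
# pas_archived <- pas_load(datestamp = datestamp)
```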
The pas dataset contains 45 columns, and each row corresponds to a different PurpleAir sensor. For the data analysis examples we will focus on the columns labeled stateCode, pm2.5_*, humidity, pressure, temperature, and pwfsl_closestDistance.
The complete list of columns is given below. Names in ALL_CAPS have been retained from the PurpleAir .json file. Other columns have been renamed for human readability.
names(pas)
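The ALL_CAPS convention makes it easy to separate retained names from renamed ones. A minimal sketch follows, in which is_all_caps() is a hypothetical helper (not part of AirSensor) and the name vector is illustrative rather than the actual column list:

```r
# Classify column names as ALL_CAPS (retained) or renamed.
# 'is_all_caps' is a hypothetical helper, not an AirSensor function.
is_all_caps <- function(x) {
  grepl("^[A-Z0-9_]+$", x)
}

# Illustrative names only; substitute names(pas) for a real 'pas' object
column_names <- c("ID", "DEVICE_LOCATIONTYPE", "stateCode", "humidity", "pm2.5_6hour")

retained <- column_names[is_all_caps(column_names)]
renamed  <- column_names[!is_all_caps(column_names)]
```

Applied to a real object, is_all_caps(names(pas)) would give a logical vector that can be used to index the columns in the same way.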
Let's take a quick peek at some of the PM2.5 data:
# Extract and round just the PM2.5 data
pm25_data <-
  pas %>%
  select(starts_with("pm2.5_")) %>%
  round(1)

# Combine sensor label and pm2.5 data
bind_cols(label = pas$locationName, pm25_data) %>%
  head(10) %>%
  knitr::kable(
    col.names = c("location name", "10 min", "30 min", "1 hr", "6 hr", "1 day", "1 wk"),
    caption = "PAS PM2.5 Values"
  )
pas PM2.5 Data

To visually explore a region, we can use our pas data with the pas_leaflet() function to plot an interactive leaflet map. By default, pas_leaflet() will map the coordinates of each PurpleAir sensor and the hourly PM2.5 data. Clicking on a sensor will show sensor metadata.
pas %>% pas_leaflet(parameter = "pm2.5_60minute")
If we want to narrow our selection, we can look at which locations have a moderate to unhealthy 6-hour average air quality rating with the following short script that uses the %>% "pipe" operator:
pas %>%
  pas_filter(pm2.5_6hour >= 15.0) %>%
  pas_leaflet(parameter = "pm2.5_6hour")
This code pipes our pas data into pas_filter() where we can set our selection criteria. The pm2.5_6hour >= 15.0 filter selects those records where the 6-hour average is at or above 15.0. To restrict the results to a single state, add a condition on stateCode, the ISO 3166-2 state code; for example, stateCode == "CA" subsets for only those stations in California. The final function in the pipeline plots the remaining sensors colored by pm2.5_6hour.
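Selection criteria can be combined. The following sketch uses a small invented data frame and dplyr::filter() to show how comma-separated conditions act as a logical AND. It assumes pas_filter() accepts conditions written in the same style, which is worth confirming in the pas_filter() documentation.

```r
library(dplyr)

# Toy stand-in for a 'pas' object; all values are invented for illustration
toy_pas <- data.frame(
  locationName = c("Site A", "Site B", "Site C", "Site D"),
  stateCode    = c("CA", "CA", "WA", "CA"),
  pm2.5_6hour  = c(8.2, 15.0, 30.1, 22.4),
  stringsAsFactors = FALSE
)

# Comma-separated conditions must all be TRUE for a row to be kept
elevated_ca <- toy_pas %>%
  filter(stateCode == "CA", pm2.5_6hour >= 15.0)

elevated_ca$locationName
```

Only the California rows at or above the threshold remain; the Washington site is dropped even though its value is the highest.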
pas Auxiliary Data

We can also explore and utilize other PurpleAir sensor data. Check the pas_leaflet() documentation for all supported parameters.
Here is an example of humidity data captured from PurpleAir sensors across the state of California.
pas %>% pas_leaflet(parameter = "humidity")
Happy Exploring!
Mazama Science