This vignette explores the
ws_monitor data model used throughout the PWFSLSmoke package to store
and work with monitoring data.
The PWFSLSmoke package is designed to provide a compact, full-featured suite of utilities for
working with PM 2.5 data used to monitor wildfire smoke. A uniform data model
provides consistent data access across monitoring data available from different agencies.
The core data model in this package is defined by the
ws_monitor object used to store data associated
with groups of individual monitors.
To work efficiently with the package it is important to understand the structure of this
data object and which functions operate on it. Package functions that begin with monitor_
expect objects of class
ws_monitor as their first argument. ('ws_' stands for 'wildfire smoke'.)
Monitoring data will typically be obtained from an agency charged with archiving data acquired at monitoring sites. For wildfire smoke, the primary pollutant is PM 2.5, and the sources archiving this data include EPA, AirNow, AIRSIS and WRCC.
The data model for monitoring data consists of an R list with two dataframes, data and meta.
The data dataframe contains all hourly measurements, organized with rows (the 'unlimited' dimension) as unique timesteps and columns as unique monitor deployments. The very first column is always named datetime and contains the time of each measurement in Coordinated Universal Time (UTC).
The meta dataframe contains all metadata associated with monitor deployment sites and is organized with rows as unique monitor deployments and columns as site attributes. The following columns are guaranteed to exist:
monitorID -- unique ID for each site-instrument combination
longitude -- decimal degrees East
latitude -- decimal degrees North
elevation -- meters above sea level
timezone -- Olson timezone
countryCode -- ISO 3166-1 alpha-2 code
stateCode -- two-character state/province part of the ISO 3166-2 code (e.g. 'WA')
(The MazamaSpatialUtils package is used to assign timezones and state and country codes.)
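As a rough illustration of the structure described above, the following base R sketch builds a toy object by hand. It does not use any PWFSLSmoke function. The monitor IDs and values are invented, and the POSIXct datetime column and the class vector are assumptions made for the example.

```r
# Toy example built by hand; NOT created by any PWFSLSmoke function.
# Monitor IDs and all values are invented for illustration.
monitorIDs <- c("A_01", "B_01")

# 'meta': one row per monitor deployment, one column per site attribute
meta <- data.frame(
  monitorID   = monitorIDs,
  longitude   = c(-120.1, -117.4),
  latitude    = c(48.4, 47.7),
  elevation   = c(500, 600),
  timezone    = "America/Los_Angeles",
  countryCode = "US",
  stateCode   = "WA",
  stringsAsFactors = FALSE
)
rownames(meta) <- meta$monitorID

# 'data': one row per hourly timestep (UTC), one column per monitor
datetime <- seq(as.POSIXct("2015-08-01 00:00:00", tz = "UTC"),
                by = "hour", length.out = 4)
data <- data.frame(
  datetime = datetime,
  A_01 = c(5.0, 7.5, NA, 12.0),
  B_01 = c(20.0, 18.5, 17.0, 15.5)
)

toy <- list(meta = meta, data = data)
class(toy) <- c("ws_monitor", "list")   # assumed class vector

str(toy, max.level = 1)
```

Real objects carry more metadata columns; this sketch only includes the guaranteed ones listed above.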
Starting with version 1.0 of the package, the following additional columns (mostly for internal use) will always exist:
siteName -- familiar name for a monitoring site
countyName -- county/province name
msaName -- US Census Bureau 'Metropolitan/Micropolitan Statistical Area'
agencyName -- agency responsible for collecting the data
monitorType -- broad instrument categories for E-Sampler, EBAM or BAM-1020
siteID -- unique identifier for each site
instrumentID -- sequential identifier for each instrument at a single site
aqsID -- AQS site identifier (often used as the siteID)
pwfslID -- PWFSL site identifier (used as the siteID for temporary monitors)
pwfslDataIngestSource -- identifier for the source of monitoring data (e.g. AIRNOW, AIRSIS, WRCC_DUMPFILE, etc.)
telemetryAggregator -- data provider for temporary monitors (e.g. 'wrcc' or 'usfs.airsis')
telemetryUnitID -- unique ID for each monitoring site used within the telemetryAggregator
These additional columns of information are much more variable and, depending on the source of data, may include many missing values.
It is important to note that the
monitorID acts as a unique key that connects data with metadata. The
monitorID is used for column names in the
data dataframe and for row names in the meta dataframe.
So the following will always be true:
    rownames(ws_monitor$meta) == ws_monitor$meta$monitorID
    colnames(ws_monitor$data) == c('datetime', ws_monitor$meta$monitorID)
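The key relationship can be wrapped in a small check. The sketch below is one possible way to do it in base R. The helper check_monitor_keys() is hypothetical and not part of PWFSLSmoke, and the toy object uses invented IDs and values.

```r
# Hypothetical helper (not part of PWFSLSmoke): TRUE when 'monitorID' links
# 'meta' rows to 'data' columns as described above.
check_monitor_keys <- function(ws_monitor) {
  meta <- ws_monitor$meta
  data <- ws_monitor$data
  identical(rownames(meta), as.character(meta$monitorID)) &&
    identical(colnames(data), c("datetime", as.character(meta$monitorID)))
}

# Toy object with invented IDs and values
meta <- data.frame(monitorID = c("A_01", "B_01"), stringsAsFactors = FALSE)
rownames(meta) <- meta$monitorID
data <- data.frame(
  datetime = as.POSIXct(c("2015-08-01 00:00:00", "2015-08-01 01:00:00"),
                        tz = "UTC"),
  A_01 = c(5, 7),
  B_01 = c(20, 18)
)
toy <- list(meta = meta, data = data)

ok <- check_monitor_keys(toy)
ok                                   # TRUE

# Reordering the data columns without reordering 'meta' breaks the link
broken <- toy
broken$data <- broken$data[, c("datetime", "B_01", "A_01")]
bad <- check_monitor_keys(broken)
bad                                  # FALSE
```

A check like this is mainly useful after manipulating the dataframes by hand.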
Example 1: Exploring ws_monitor objects
We will use the built-in "Northwest_Megafires" dataset and the
monitor_subset() function to subset a ws_monitor object
which we can then explore.
    suppressPackageStartupMessages(library(PWFSLSmoke))

    # Get some airnow data for Washington state in the summer of 2015
    # NOTE: 'tlim' is interpreted as UTC unless we specify 'timezone'
    N_M <- monitor_subset(Northwest_Megafires, tlim=c(20150801,20150831),
                          timezone="America/Los_Angeles")
    WA <- monitor_subset(N_M, stateCodes='WA')

    # 'ws_monitor' objects can be identified by their class
    class(WA)

    # Examine the 'meta' dataframe
    dim(WA$meta)
    rownames(WA$meta)
    colnames(WA$meta)

    # Examine the 'data' dataframe
    dim(WA$data)
    colnames(WA$data)

    # This should always be true
    all(rownames(WA$meta) == colnames(WA$data[,-1]))
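The note about 'tlim' and 'timezone' matters because the same calendar date marks a different instant in UTC than in local time. The base R sketch below illustrates that offset only. It makes no claim about how monitor_subset() parses 'tlim' internally.

```r
# The same calendar date parsed as UTC and as Pacific local time
start_utc   <- as.POSIXct("2015-08-01 00:00:00", tz = "UTC")
start_local <- as.POSIXct("2015-08-01 00:00:00", tz = "America/Los_Angeles")

# Local midnight falls later than UTC midnight (daylight saving time in August)
offset_hours <- as.numeric(difftime(start_local, start_utc, units = "hours"))
offset_hours                                     # 7

# Express the local start time in UTC, the timezone of the 'datetime' column
local_in_utc <- format(start_local, tz = "UTC", usetz = TRUE)
local_in_utc                                     # "2015-08-01 07:00:00 UTC"
```

Without a timezone, a window that is meant to cover local days would therefore be shifted by several hours.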
Example 2: Manipulating ws_monitor objects
The PWFSLSmoke package has numerous functions that can work with
ws_monitor objects,
all of which begin with
monitor_. If you need to do something that the package functions do not
provide you can manipulate
ws_monitor objects directly as long as you retain the structure
of the data model.
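Retaining the structure mostly means changing meta and data together. The following base R sketch shows one way to subset by hand on a toy object. The IDs and values are invented, and in practice monitor_subset() would normally be the better choice for this task.

```r
# Toy object with invented IDs and values (not real monitoring data)
meta <- data.frame(monitorID = c("A_01", "B_01", "C_01"),
                   stateCode = c("WA", "WA", "OR"),
                   stringsAsFactors = FALSE)
rownames(meta) <- meta$monitorID
data <- data.frame(
  datetime = as.POSIXct("2015-08-01 00:00:00", tz = "UTC") + 3600 * 0:2,
  A_01 = c(5, 7, 9),
  B_01 = c(20, 18, 16),
  C_01 = c(1, 2, 3)
)
toy <- list(meta = meta, data = data)

# Keep only Washington monitors, updating 'meta' and 'data' together
keepIDs <- toy$meta$monitorID[toy$meta$stateCode == "WA"]
subset_toy <- toy
subset_toy$meta <- toy$meta[keepIDs, , drop = FALSE]
subset_toy$data <- toy$data[, c("datetime", keepIDs), drop = FALSE]

# The key relationship still holds
still_linked <- all(rownames(subset_toy$meta) == colnames(subset_toy$data)[-1])
still_linked
```

Dropping a column from data without dropping the matching row from meta (or the reverse) would leave the object inconsistent.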
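Functions that accept and return
ws_monitor objects include monitor_subset(), monitor_collapse(), monitor_rollingMean() and monitor_dailyStatistic(), all of which appear in the examples in this vignette.
These functions can be used with the magrittr package
%>% pipe as in the following example: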
    # Calculate daily means for the Methow from monitors in Twisp and Winthrop
    TwispID <- '530470009_01'
    WinthropID <- '530470010_01'
    Methow_Valley_JulyMeans <- Northwest_Megafires %>%
      monitor_subset(monitorIDs=c(TwispID,WinthropID)) %>%
      monitor_collapse(monitorID='MethowValley') %>%
      monitor_subset(tlim=c(20150701,20150731), timezone='America/Los_Angeles') %>%
      monitor_dailyStatistic(minHours=18)

    # Look at the first week
    Methow_Valley_JulyMeans$data[1:7,]
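To give a feel for what a daily statistic with a minimum-hours threshold involves, here is a base R sketch. The helper daily_mean() is hypothetical and is not the package implementation. It assumes that days are delimited in local time and that a day with fewer than minHours valid values yields NA.

```r
# Hypothetical helper (not the package implementation): mean of hourly values
# per local calendar day, NA when fewer than 'minHours' values are valid.
daily_mean <- function(datetime, values, timezone, minHours = 18) {
  day <- format(datetime, "%Y-%m-%d", tz = timezone)
  sapply(split(values, day), function(x) {
    if (sum(!is.na(x)) >= minHours) mean(x, na.rm = TRUE) else NA_real_
  })
}

# Two local days of invented hourly data, stored in UTC
# (07:00 UTC is local midnight in America/Los_Angeles in July)
datetime <- as.POSIXct("2015-07-01 07:00:00", tz = "UTC") + 3600 * 0:47
values <- rep(c(10, 20), each = 24)
values[25:34] <- NA     # the second day keeps only 14 valid hours

result <- daily_mean(datetime, values, "America/Los_Angeles")
result
```

With these inputs the first day has a mean of 10 and the second day is NA, because 14 valid hours fall below the threshold of 18.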
The following code mixes use of package functions with direct manipulation of the ws_monitor object:
    # Use special knowledge of AirNow IDs to subset airnow data for Spokane county monitors
    SpokaneCountyIDs <- N_M$meta$monitorID[stringr::str_detect(N_M$meta$monitorID, "^53063")]
    Spokane <- monitor_subset(N_M, monitorIDs=SpokaneCountyIDs)

    # Apply 3-hr rolling mean
    Spokane_3hr <- monitor_rollingMean(Spokane, 3, align="center")

    # 1) Replace data columns with their squares (exponentiation is not supplied by the package)
    Spokane_3hr_squared <- Spokane_3hr
    Spokane_3hr_squared$data[,-1] <- (Spokane_3hr$data[,-1])^2 # exclude the 'datetime' column
    # NOTE: Exponentiation is only used as an example. It does not generate a meaningful result.

    # Create a daily averaged 'ws_monitor' object
    Spokane_daily_3hr <- monitor_dailyStatistic(Spokane_3hr)

    # 2) Check out the correlation between monitors (correlation is not supplied by the package)
    data <- Spokane_daily_3hr$data[,-1] # exclude the 'datetime' column
    cor(data, use='complete.obs')
This introduction to the
ws_monitor data model should be enough to get you started.
Lots more documentation and examples are available in the package documentation.
Best of luck exploring and understanding PM 2.5 values associated with wildfire smoke!