knitr::opts_chunk$set(fig.width = 7, fig.height = 5)
This vignette explores the mts_monitor data model used throughout the AirMonitor package to store and work with monitoring data.
The AirMonitor package is designed to provide a compact, full-featured suite of utilities for working with PM2.5 data. A uniform data model provides consistent data access across monitoring data available from different agencies. The core data model in this package is defined by the mts_monitor object used to store data associated with groups of individual monitors.
To work efficiently with the package it is important that you understand the structure
of this data object and the functions that operate on it. Package functions whose
names begin with monitor_
, expect objects of class mts_monitor as their first
argument. ('mts' stands for 'Multiple Time Series')
The AirMonitor package uses the mts data model defined in the MazamaTimeSeries package.
In this data model, each unique time series is referred to as a
"device-deployment" -- a time series collected by a particular device at a
specific location. Multiple device-deployments are stored in memory as a
mts_monitor object, typically called monitor
. Each monitor
is just an \pkg{R}
list with two dataframes.
monitor$meta
-- rows = unique device-deployments; cols = device/location metadata
monitor$data
-- rows = UTC times; cols = device-deployment data (plus an additional datetime
column)
A key feature of this data model is the use of the deviceDeploymentID
as a
"foreign key" that allows data
columns to be mapped onto the associated
spatial and device metadata in a meta
row. The following will always be true:
identical(names(monitor$data), c('datetime', monitor$meta$deviceDeploymentID))
Each column of monitor$data
represents a time series associated with a particular
device-deployment while each row of monitor$data
represents a synoptic snapshot of all
measurements made at a particular time.
In this manner, software can create both time series plots and maps from a single
monitor
object in memory.
The data
dataframe contains all hourly measurements organized with rows
(the 'unlimited' dimension) as unique timesteps and columns as unique
device-deployments. The very first column is always named datetime
and contains
the POSIXct
datetime in Coordinated Universal Time (UTC). This time axis is
guaranteed to be a regular hourly axis with no gaps.
The meta
dataframe contains all metadata associated with device-deployments
and is organized with rows as unique device-deployments and columns containing
both location and device metadata. The following columns are guaranteed to exist
in the meta
dataframe. Those marked with "(optional)" may contain NA
s.
Additional columns may also be present depending on the data source.
deviceDeploymentID
-- unique ID associated with a time seriesdeviceID
-- unique location IDdeviceType
-- (optional) device typedeviceDescription
-- (optional) human readable device descriptiondeviceExtra
-- (optional) additional human readable device information pollutant
-- pollutant name from AirMonitor::pollutantNames
units
-- one of "PPM|PPB|UG/M3"
dataIngestSource
-- (optional) source of datadataIngestURL
-- (optional) URL used to access datadataIngestUnitID
-- (optional) instrument identifier used at dataIngestSource
dataIngestExtra
-- (optional) human readable data ingest informationdataIngestDescription
-- (optional) human readable data ingest instructionslocationID
-- unique location ID from MazamaLocationUtils::location_createID()
locationName
-- human readable location namelongitude
-- longitudelatitude
-- latitudeelevation
-- (optional) elevationcountryCode
-- ISO 3166-1 alpha-2 country codestateCode
-- ISO 3166-2 alpha-2 state codecountyName
-- US county nametimezone
-- Olson time zonehouseNumber
-- (optional) street
-- (optional)city
-- (optional)zip
-- (optional)AQSID
-- (optional) EPA AQS unique identifierfullAQSID
-- (optional) EPA AQS unique identifierExample 1: Exploring mts_monitor objects
We will use the built-in "NW_Megafires" dataset and various monitor_filter~()
functions to subset a mts_monitor object which we then examine.
library(AirMonitor) # Recipe to select Washington state monitors in August of 2014: monitor <- # 1) start with NW Megafires NW_Megafires %>% # 2) filter to only include Washington state monitor_filter(stateCode == "WA") %>% # 3) filter to only include August monitor_filterDate(20150801, 20150901) %>% # 4) remove monitors with all missing values monitor_dropEmpty() # 'mts_monitor' objects can be identified by their class class(monitor) # They alwyas have two elements called 'meta' and 'data' names(monitor) # Examine the 'meta' dataframe dim(monitor$meta) names(monitor$meta) # Examine the 'data' dataframe dim(monitor$data) # This should always be true identical(names(monitor$data), c('datetime', monitor$meta$deviceDeploymentID))
Example 2: Basic manipulation of mts_monitor objects
The AirMonitor package has numerous functions that work with
mts_monitor objects, all of which begin with monitor_
. If you need to do
something that the package functions do not provide, you can manipulate
mts_monitor objects directly as long as you retain the structure of the data
model.
Functions that accept and return mts_monitor objects include:
monitor_aqi()
monitor_collapse()
monitor_combine()
monitor_dailyStatistic()
monitor_dailyThreshold()
monitor_dropEmpty()
monitor_filter()
( aka monitor_filterMeta()
)monitor_filterByDistance()
monitor_filterDate()
monitor_filterDatetime()
monitor_mutate()
monitor_nowcast()
monitor_replaceValues()
monitor_select()
( aka monitor_reorder()
)monitor_selectWhere()
monitor_trimDate()
These functions can be used with the magrittr package pipe operator (%>%
)
as in the following example:
# First, Obtain the monitor ids by clicking on dots in the interactive map: NW_Megafires %>% monitor_leaflet()
# Calculate daily means for the Methow Valley from monitors in Twisp and Winthrop TwispID <- "99a6ee8e126ff8cf_530470009_04" WinthropID <- "123035bbdc2bc702_530470010_04" # Recipe to calculate Methow Valley August Means: Methow_Valley_AugustMeans <- # 1) start with NW Megafires NW_Megafires %>% # 2) select monitors from Twisp and Winthrop monitor_select(c(TwispID, WinthropID)) %>% # 3) average them together hour-by-hour monitor_collapse(deviceID = 'MethowValley') %>% # 4) restrict data to August monitor_filterDate(20150801, 20150901) %>% # 5) calculate daily mean monitor_dailyStatistic(mean, minHours = 18) %>% # 6) round data to one decimal place monitor_mutate(round, 1) # Look at the first week Methow_Valley_AugustMeans$data[1:7,]
Example 3: Advanced manipulation of mts_monitor objects
The following code demonstrates user creation of a custom function to
manipulate the data
tibble from a mts_monitor object with monitor_mutate()
.
# Monitors within 100 km of Spokane, WA Spokane <- NW_Megafires %>% monitor_filterByDistance(-117.42, 47.70, 100000) %>% monitor_filterDate(20150801, 20150901) %>% monitor_dropEmpty() # Show the daily statistic for one week Spokane %>% monitor_filterDate(20150801, 20150808) %>% monitor_dailyStatistic(mean) %>% monitor_getData() # Custom function to convert from metric ug/m3 to imperial grain/gallon my_FUN <- function(x) { return( x * 15.43236 / 0.004546 ) } Spokane %>% monitor_filterDate(20150801, 20150808) %>% monitor_mutate(my_FUN) %>% monitor_dailyStatistic(mean) %>% monitor_getData()
Understanding that monitor$data
is a just a dataframe of measurements
prepended with a datetime
column, we can pull out the measurements and do analyses
independent of the mts_monitor data model. Here we look for correlations among
the PM2.5 time series.
# Pull out the time series data to calculate correlations Spokane_data <- Spokane %>% monitor_getData() %>% dplyr::select(-1) # omit 'datetime' column # Provide human readable names names(Spokane_data) <- Spokane$meta$locationName # Find correlation among monitors cor(Spokane_data, use = "complete.obs")
This introduction to the mts_monitor data model should be enough to get you started. Lots more examples are available in the package documentation.
Best of luck exploring and understanding PM 2.5 air quality data!
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.