Loading Monitoring Data

knitr::opts_chunk$set(fig.width = 7, fig.height = 5)
knitr::opts_chunk$set(error = TRUE)
library(dplyr)
library(PWFSLSmoke)
library(magrittr)

This article describes pre-generated ws_monitor objects available from a USFS-maintained archive and how to load them with various monitor_load~() functions.

Pre-Generated Data Files

The USFS Pacific Wildland Fire Sciences Lab AirFire team works to model wildland fire emissions and maintains an up-to-date database of data from regulatory-type monitors (FRM or FEM).

Instruments include permanent monitors that form part of the AirNow network as well as temporary monitors that are deployed during wildfire events. Quality-controlled data from permanent monitors comes directly from AirNow. Data from temporary monitors is provided in raw form by either AIRSIS or the Western Regional Climate Center (WRCC) and is aggregated and quality controlled by AirFire in real time.

As of February 2021, pre-generated ws_monitor objects containing this data are publicly available at https://haze.airfire.org/monitoring/.

Four types of archive files — hourly, daily, monthly, and annual

The data archive is maintained by AirFire. For each data provider, the archive offers datasets that are updated in real time as well as on daily, monthly, and annual schedules. The following types of files are available:

Hourly files -- These "real-time" files are updated multiple times an hour (~ every two minutes) and contain data for the previous 10 days. They are used to generate graphics for the Monitoring v4 site.

Daily files -- Daily files are updated once a day, shortly after midnight, and contain data for the previous 45 days. This time span is long enough to cover a full month plus the first 10 days of the following month, at which point AirNow stops backfilling data that straggles in due to power outages, etc.

Monthly files -- Monthly files are updated a few days after the 10th of the following month, once data updates have ceased. For instance, a July file will be updated in mid-August.

Annual files -- Annual files are updated mid-month, after the monthly file update, by appending the just-updated monthly file (covering the previous month) to the growing annual file. Thus, an annual file will always cover the period from the beginning of the year to the end of the previous month.

NOTE: The date and time of annual and monthly updates are subject to change. When exploring the data archive with a web browser, the exact date and time appear next to the file of interest, under the column Last modified. For instance, in the WRCC 2021 archive you can see that the file wrcc_PM2.5_2021.RData was generated on February 14, 2021 at 01:24 am.
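
The coverage windows described above suggest a simple rule of thumb for picking a file type. The following is a minimal sketch, not part of PWFSLSmoke: the helper name choose_file_type() is hypothetical, and the 10-day and 45-day cutoffs are taken from the descriptions of the hourly and daily files.

```r
# Hypothetical helper (not a PWFSLSmoke function): pick the smallest archive
# file type whose coverage window reaches back to 'startdate'.
choose_file_type <- function(startdate, today = Sys.Date()) {
  daysBack <- as.numeric(
    difftime(as.Date(today), as.Date(startdate), units = "days")
  )
  if (daysBack <= 10) {
    "hourly"           # "real-time" files cover the previous 10 days
  } else if (daysBack <= 45) {
    "daily"            # daily files cover the previous 45 days
  } else {
    "monthly/annual"   # older data lives in the monthly and annual files
  }
}

choose_file_type("2021-02-10", today = "2021-02-14")
```

Because monthly and annual files lag behind by design, a request that reaches back only slightly more than 45 days may still need to wait for the next monthly update.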

Archive Structure

The AirFire-maintained data archive uses a consistent and predictable structure:

├── AIRSIS
│   └── RData
│       ├── 2004
│       ├── ...
│       └── 2021
├── AirNow
│   └── RData
│       ├── 2016
│       ├── ...
│       ├── 2020
│       │   ├── 01
│       │   ├── ...
│       │   └── 12
│       └── 2021
│           ├── 01
│           └── 02
├── EPA
│   └── RData
│       ├── 1998
│       ├── ...
│       └── 2017
├── WRCC
│   └── RData
│       ├── 2010
│       ├── ...
│       └── 2021
└── latest
    └── RData

Top level -- There is a dedicated folder for each agency (e.g. AIRSIS/, AirNow/, WRCC/). Additionally, at this level, the latest data (hourly and daily updates) for all agencies is grouped together and can be quickly accessed from the latest/RData/ subdirectory.

Second level -- Each agency has the RData/ folder (e.g. AirNow/RData/) where all R binary files are stored. Other data types like csv/ and geojson/ may also exist.

Third level -- Each data provider has annual folders (e.g. AirNow/RData/2021/) and a latest/ folder (e.g. AirNow/RData/latest/). The latter contains hourly (e.g. airnow_PM2.5_latest10.RData) and daily (e.g. airnow_PM2.5_latest45.RData) updates (this is just a copy of the same data found in latest/RData/).

Fourth level -- Each data provider has annual datasets (e.g. airnow_PM2.5_2021.RData). Additionally, because of the size of its files, AirNow has monthly folders (e.g. AirNow/RData/2021/01/) and therefore a fifth level containing monthly datasets (e.g. airnow_PM2.5_2021_01.RData).

NOTE: No annual data files exist for AirNow prior to 2016.
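
Because the layout is predictable, file URLs can be assembled programmatically. The sketch below is illustrative only: archive_url() is a hypothetical helper, not a PWFSLSmoke function, and it assumes that file names always use the lowercase provider name and follow the patterns of the examples quoted above.

```r
# Hypothetical helper (not part of PWFSLSmoke): build the URL of an annual or
# monthly archive file from the folder layout described in this section.
archive_url <- function(provider, year, month = NULL,
                        parameter = "PM2.5",
                        baseUrl = "https://haze.airfire.org/monitoring") {
  prefix <- tolower(provider)  # assumption: file names use lowercase provider
  year <- as.integer(year)
  if (is.null(month)) {
    # Annual file, e.g. WRCC/RData/2021/wrcc_PM2.5_2021.RData
    sprintf("%s/%s/RData/%d/%s_%s_%d.RData",
            baseUrl, provider, year, prefix, parameter, year)
  } else {
    # Monthly file (described above for AirNow only)
    month <- as.integer(month)
    sprintf("%s/%s/RData/%d/%02d/%s_%s_%d_%02d.RData",
            baseUrl, provider, year, month, prefix, parameter, year, month)
  }
}

archive_url("WRCC", 2021)
archive_url("AirNow", 2021, month = 1)
```

Whether a given URL actually exists still depends on the provider and year, so it is worth confirming in a web browser before relying on it.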

Directly loading datasets

With knowledge of the structure of the data archives, it is possible to load datasets directly from the data archive with base R functions by using get(load(url(…))). For instance, if you want to load the latest data (hourly updates) from AirNow, you can just type:

AirNow_hourly <- get(load(url("https://haze.airfire.org/monitoring/latest/RData/airnow_PM2.5_latest10.RData")))

Loading functions

For each agency (AirNow, AIRSIS, WRCC), the PWFSLSmoke package offers three loading functions:

~_loadLatest() - for hourly updates covering the most recent 10 days

airnow_loadLatest(
  parameter = 'PM2.5',
  baseUrl = "https://haze.airfire.org/monitoring/latest/RData",
  dataDir = NULL
)

~_loadDaily() - for daily updates covering the most recent 45 days

airnow_loadDaily(
  parameter = 'PM2.5',
  baseUrl = "https://haze.airfire.org/monitoring/latest/RData",
  dataDir = NULL
)

~_loadAnnual() - for data extending more than 45 days into the past

airnow_loadAnnual(
  year = NULL,
  parameter = 'PM2.5',
  baseUrl = "https://haze.airfire.org/monitoring",
  dataDir = NULL
)

These functions are typically used with default settings for parameter, baseUrl and dataDir:

airnow_latest <- airnow_loadLatest()
wrcc_daily <- wrcc_loadDaily()

NOTE: If dataDir is specified, data will be preferentially loaded from a local directory rather than from baseUrl.
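
The "local first" idea behind dataDir can be illustrated with a few lines of base R. This is only a sketch of the concept, not the package's implementation: resolve_source() is a hypothetical helper, and its fallback to baseUrl when the local file is missing is an assumption, since the note above does not say what the package does in that case.

```r
# Hypothetical helper (not a PWFSLSmoke function): decide where a data file
# would be read from, preferring a local copy when one exists.
resolve_source <- function(filename, baseUrl, dataDir = NULL) {
  if (!is.null(dataDir)) {
    localPath <- file.path(dataDir, filename)
    if (file.exists(localPath)) {
      return(localPath)            # local copy found, use it
    }
  }
  # Assumption: fall back to the remote archive when no local copy exists
  paste0(baseUrl, "/", filename)
}

resolve_source(
  "airnow_PM2.5_latest10.RData",
  baseUrl = "https://haze.airfire.org/monitoring/latest/RData"
)
```

Keeping a local copy of large annual files in this way avoids repeated downloads when the same data is loaded many times.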

epa_load()

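The epa_load() and epa_loadAnnual() functions have an additional parameterCode argument which must be specified. EPA parameter codes include "88101" (PM2.5 FRM/FEM mass), used in the example at the end of this article, and "88502" (PM2.5 non-FRM/FEM mass).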

monitor_load()

monitor_load(
  startdate = NULL,
  enddate = NULL,
  monitorIDs = NULL,
  parameter = "PM2.5",
  baseUrl = "https://haze.airfire.org/monitoring",
  dataDir = NULL,
  aqsPreference = "airnow"
)

The preferred function for general data loading is monitor_load(), which loads all available monitoring data for a given time range. Data from AirNow, AIRSIS and WRCC are combined into a single ws_monitor object. Archival datasets are joined with 'daily' and 'latest' datasets as needed to satisfy the requested date range.

NOTE: Because monitor_load() performs many validation tasks as it joins annual, monthly and recent data, it is recommended that you load no more than ~3 months of consecutive data using this function. Longer timeseries can be obtained using the ~_loadAnnual() functions.

NOTE: The PWFSLSmoke package does not yet create timeseries crossing year boundaries because “joining” timeseries together can take a long time. This capability will be enabled at some point in the future.
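
One way to work within the year-boundary limitation is to split a requested date range into pieces that each stay inside a single calendar year and request them separately. The sketch below uses only base R; split_range_by_year() is a hypothetical helper, not part of PWFSLSmoke, and the date format passed to monitor_load() in the commented-out call is an assumption.

```r
# Hypothetical helper (not a PWFSLSmoke function): split a date range into
# pieces that never cross a calendar-year boundary.
split_range_by_year <- function(startdate, enddate) {
  start <- as.Date(startdate)
  end <- as.Date(enddate)
  stopifnot(start <= end)
  years <- seq(as.integer(format(start, "%Y")), as.integer(format(end, "%Y")))
  pieces <- lapply(years, function(y) {
    data.frame(
      start = max(start, as.Date(sprintf("%d-01-01", y))),  # clip to Jan 1
      end = min(end, as.Date(sprintf("%d-12-31", y)))       # clip to Dec 31
    )
  })
  do.call(rbind, pieces)
}

pieces <- split_range_by_year("2020-11-15", "2021-02-10")
print(pieces)

# Each row could then be requested separately. This requires PWFSLSmoke and
# network access, so it is left commented out here:
# monitor_load(
#   startdate = format(pieces$start[1], "%Y%m%d"),
#   enddate = format(pieces$end[1], "%Y%m%d")
# )
```

Each piece should also respect the ~3 month recommendation above; longer pieces are better served by the ~_loadAnnual() functions.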

Examples using PWFSLSmoke loading functions

Timeseries plot using the latest data for Washington State

# Recipe to get latest AirNow data for WA
airnow_wa <-
  airnow_loadLatest() %>%
  monitor_subset(stateCodes = 'WA')

# Timeseries plot
monitor_timeseriesPlot(airnow_wa, style = 'gnats')
addAQIStackedBar()
addAQILines()
title("Washington State Latest AirNow Data")

Timeseries plot using 2015 annual data for the Pacific Northwest

# EPA FRM/FEM data
epa <-
  epa_loadAnnual(2015, parameterCode = "88101") %>%
  monitor_subset(
    stateCodes = c("OR", "ID", "WA"),
    tlim = c(20150701, 20151001)
  )

# AIRSIS temporary monitors
airsis <-
  airsis_loadAnnual(2015) %>%
  monitor_subset(
    stateCodes = c("OR", "ID", "WA"),
    tlim = c(20150701, 20151001)
  )

# Combine EPA and AIRSIS data into a single ws_monitor object
pnw <-
  monitor_combine(list(epa, airsis)) %>%
  monitor_nowcast()

# Timeseries plot using NowCast smoothing
monitor_timeseriesPlot(pnw, style = 'gnats')
addAQIStackedBar()
addAQILines()
title("Pacific Northwest 2015 Megafires -- NowCast Smoothed")

Best of luck gathering data for your research!




PWFSLSmoke documentation built on Nov. 23, 2021, 5:06 p.m.