knitr::opts_chunk$set(
  dpi = 150,
  collapse = TRUE,
  comment = "#>",
  out.width = "100%",
  fig.width = 8,
  warning = FALSE,
  message = FALSE
)

The rcaaqs package strives to automate the calculation of CAAQS metrics outlined by the Canadian Council of Ministers of the Environment. All calculations are derived from the Guidance Document on Achievement Determination Canadian Ambient Air Quality Standards for Fine Particulate Matter and Ozone and Methods for SO~2~ and NO~2~ outlined by Environmental Reporting BC

See the section "Under the hood" for a better idea of how rcaaqs performs the underlying calculations.

CAAQS Metrics

There are 7 CAAQS metrics for 4 different pollutants, in the rcaaqs package each has a corresponding function:

While the underlying calculations are different, in general the procedure to calculate any given metric is similar. Therefore we will focus on Annual PM2.5.

Let's start by loading some helpful packages

# For plotting
library(ggplot2)  

# For data manipulation
library(dplyr)    
library(tidyr)

# For calculations
library(rcaaqs)

The rcaaqs package contains several sample data sets, let's take a look at the Ozone (O~3~) sample data:

#head(o3_sample_data)
head(pm25_sample_data)

Data Preparation

The clean_neg() and date_fill() are useful helper functions for ensuring all the pollutant values are positive or zero, and that you are starting with a complete dataframe (i.e. no missing hourly readings):

pm25_sample_data_clean <- pm25_sample_data %>% 
  mutate(value = clean_neg(value, type = "pm25"))

pm25_sample_data_clean <- pm25_sample_data_clean %>% 
  group_by(ems_id, site) %>% 
  do(., date_fill(., date_col = "date_time",
                  fill_cols = c("ems_id", "site"),
                  interval = "1 hour")) %>% 
  ungroup()

head(pm25_sample_data_clean)

Calculate Ambient CAAQS Metric & CAAQS Achievement

Calculating a CAAQS metric is done using one of the 7 the caaqs_metrics functions. Note you must include grouping variables (i.e. station information in our case here):

pm <- pm_annual_caaqs(pm25_sample_data_clean, 
                     by = c("ems_id", "site"))
pm

The resulting object is of class caaqs plus an additional class denoting the parameter analyzed:

class(pm)

The object contains all of the intermediate analysis results as well as the final caaqs results. There are accessor/extractor functions to get the different components. See ?get_caaqs for the different functions. We can get the caaqs results using the get_caaqs() function:

get_caaqs(pm)

Besides the actual CAAQS metric values (metric_value), there are several important columns in the caaqs results dataframe.

CAAQS details

CAAQS values

Data completeness

Note that in all cases values of NA indicate that flagging didn't apply to this metric (see CAAQS Documentation for more details)

Calculate Management CAAQS Metric Values and Management Levels

Prior to determining management levels, jurisdictions have the option of adjusting their air zone metric values to remove external influences over which jurisdictions have little or no control, such as those related to transboundary flows and exceptional events. These arrangements aim to ensure that jurisdictions are responsible for managing only the emissions sources they can control.

Excluding Dates based on Exceptional Events or Transboundary Flows

First let's take a look at the daily averages. To do this, we need to get the intermediate daily analysis from the CAAQS results useing the extractor functions:

pm_daily_avg <- get_daily(pm)
pm_daily_avg
ggplot(data = pm_daily_avg, aes(x = date, y = avg_24h)) +
  geom_line() +
  facet_wrap(~ site)

We can also plot the time series with CAAQS results using the built-in plot_ts function, providing an id (and the column in which to look) of the station we wish to plot:

plot_ts(pm, id = "0310172", id_col = "ems_id", rep_yr = 2013)

Let's say, for illustration purposes, that all high values in the summer of 2012 were due to forest fires and we want to exclude them.

We'll filter() the daily data to just these dates and select() only the columns that reflect the grouping of our data and the dates.

high_dates <- pm_daily_avg %>%
  filter(date >= as.Date("2012-05-01"),
         date < as.Date("2012-10-01"),
         avg_24h > 10) %>%
  select(ems_id, site, date)
high_dates

Determine Air Quality Management System Management Levels Now we can pass this excluded data on to the caaqs_management()function via the exclude_df argument, which specifies the data frame, and the exclude_df_dt which specifies the name of the column holding the dates.

pm_excl <- caaqs_management(pm, exclude_df = high_dates, exclude_df_dt = "date")
pm_excl
pm_excl_caaqs <- get_caaqs(pm_excl)
pm_excl_caaqs

Note that there are some new columns in the caaqs plus managament results data frame:

We can also define excluded dates with date ranges, but here we must specify both start and end column names:

high_dates <- data.frame(ems_id = "0310162",
                         site = "Port Moody Rocky Point Park",
                         start = as.Date(c("2012-06-11", "2012-07-01")),
                         end = as.Date(c("2012-06-25", "2012-08-14")))
high_dates

pm_excl <- caaqs_management(pm, exclude_df = high_dates,
                            exclude_df_dt = c("start", "end"))

pm_excl_caaqs <- get_caaqs(pm_excl)
pm_excl_caaqs

Under the hood

To understand what's happening within the CAAQS functions we can take a look at this function flow chart:

knitr::include_graphics(system.file('function_overview.png', package = 'rcaaqs'))

Each step from the documentation has a corresponding function (or functions) in the rcaaqs package, but most of these are hidden. The green boxes reflect the wrapper CAAQS functions that perform all the underlying steps. For example, the o3_caaqs() function performs all the steps outlined in the second green column.

Airzones

We can match our data to an airzone by using the assign_airzones() function but you must have the following:

For the following examples, we'll use our pm CAAQS data, but only for 2013. To get the management levels, we must run caaqs_management() on the caaqs object, even if we are not adding a data frame of dates to exclude.

pm <- caaqs_management(pm)
pm

pm_caaqs <- get_caaqs(pm) %>%
  filter(caaqs_year == 2013)
pm_caaqs

We will get our map file from the bcmaps package (note that you must also install the bcmaps.rdata package).

library(bcmaps)
az <- bcmaps::airzones()
class(az)
az

For the purposes of this vignette, we'll get our stations coordinates from a file included in the package. You can obtain the most up to date stations data from the BC Data Catalogue.

stations_file <- system.file("air_stations_2018-11-07.csv", package = "rcaaqs")

stations <- readr::read_csv(stations_file, na = "N/A") %>%  # Read csv file
  rename_all(tolower) %>%                                   # Rename columns to lower case
  filter(ems_id %in% unique(pm_caaqs$ems_id)) %>%           # Filter to ids in our pm data
  select(ems_id, latitude, longitude) %>%                   # Select only the important columns
  mutate(latitude = as.numeric(latitude),                   # Transform lat and lon to numeric
         longitude = as.numeric(longitude)) %>%
  distinct()                                                # Remove any duplicates

stations

Now we'll merge them into our CAAQS data

pm_caaqs <- left_join(pm_caaqs, stations, by = "ems_id")
pm_caaqs

Now we have our airzones map and our station coordinates, we can match stations to airzones:

pm_az <- assign_airzone(pm_caaqs, airzones = az, coords = c("longitude", "latitude"))
pm_az
pm_az2 <- airzone_metric(pm_az)
pm_az2

Mapping airzones

library(sf)
library(gridExtra)

az_plot <- st_transform(az, crs = "+proj=longlat") %>%
  left_join(pm_az2, by = c("Airzone" = "airzone"))

g1 <- ggplot() +
  geom_sf(data = az_plot, aes(fill = caaqs_ambient)) +
  geom_sf(data = st_as_sf(pm_az, coords = c("lon", "lat"), crs = "+proj=longlat"),
          shape = 21, colour = "black", aes(fill = caaqs_ambient)) +
  scale_fill_manual(values = get_colours(type = "achievement"), drop = FALSE) +
  labs(title = "Achievement Status: PM2.5 Annual CAAQS") +
  theme(legend.position = c(0.85, 0.85))

g2 <- ggplot() +
  geom_sf(data = az_plot, aes(fill = mgmt_level)) +
  geom_sf(data = st_as_sf(pm_az, coords = c("lon", "lat"), crs = "+proj=longlat"),
          shape = 21, colour = "black", aes(fill = mgmt_level)) +
  scale_fill_manual(values = get_colours(type = "management"), drop = FALSE) +
  labs(title = "Management Status: PM2.5 Annual CAAQS") + 
  theme(legend.position = c(0.85, 0.85))

grid.arrange(g1, g2, ncol = 1)


bcgov/rcaaqs documentation built on Dec. 12, 2023, 9:21 a.m.