In nabilabd/hybridSA: Implements Hybrid Source Apportionment

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "figures/README-"
)

Hybrid Source Apportionment

This package implements a novel method for source apportionment of fine particulate matter (i.e., PM2.5). It is also an attempt at, um, actual science.

Documentation is still (clearly) ongoing. Stay tuned!

Installation

To install the package, you can run the following commands:

library(devtools)
install_github("nabilabd/hybridSA")

Motivation

Observations from monitoring sites are typically spatially sparse. Simulated concentrations of particulate matter, while perhaps much denser, can suffer from occasional numerical inaccuracies. By combining the two, it might be possible to produce estimates better than either source gives on its own.

Introduction

suppressPackageStartupMessages({
  library(dplyr)
  library(ggplot2)
  library(magrittr)
})
load("data/csn_site_index2.rda")

This package is largely built around two functions: hybridsa and get_optim.

Data

There are two inputs for the hybrid optimization:

Observed concentrations
Simulated concentrations

The observed concentrations consist of speciated PM2.5 (i.e., a breakdown of particulate matter into the different chemical elements it consists of or comes from) as well as PM2.5 mass. These values are obtained from the EPA Air Quality System (AQS) (under: "Particulates"). The data-raw directory is to contain a script that downloads and consolidates the concentrations of these substances for 2005-2012 into a single file. Details on how this data was obtained, how the uncertainty calculations are performed, and more, are to be included in a forthcoming vignette.

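# State outlines; map_data() needs the maps package installed
states <- map_data("state") %>% as_tibble()

# Base layer reused in both maps (the rows are already in drawing order)
us_base <- geom_path(data = states, aes(long, lat, group = group))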

csn_site_index2 %>% 
  ggplot(aes(LON, LAT)) + geom_point() + 
  geom_point(color = "green", size = 1.5)  + 
  us_base + coord_map() + theme_bw() + ggtitle("CSN Monitoring Network - Sites Used")
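
The consolidation step mentioned above (yearly observation tables for 2005-2012 merged into a single file) might look roughly like the sketch below. This is not the package's data-raw script: `read_year()`, the column names, the site labels, and the output file name are all hypothetical, and the concentrations are left as `NA` placeholders rather than real values.

```r
years <- 2005:2012

# Hypothetical reader: a real script would download or read one year's
# AQS file here. The columns and site labels are placeholders.
read_year <- function(yr) {
  data.frame(
    year = yr,
    site_id = c("site_A", "site_B"),
    species = "PM25",
    conc = NA_real_,
    stringsAsFactors = FALSE
  )
}

# Stack the yearly tables into one data frame
all_obs <- do.call(rbind, lapply(years, read_year))

# Write the combined table to a single file (a temp path for illustration)
out_file <- file.path(tempdir(), "obs_2005_2012.rds")
saveRDS(all_obs, out_file)
```

Keeping the per-year reader in its own function means the stacking and saving steps do not change if the file format or download source does.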

The simulated concentrations are generated from CMAQ modeling. Each value is the daily average over a 36 km x 36 km grid cell and is associated with the center of that cell. The entire spatial domain extends beyond the contiguous US and consists of 112 x 148 grid cells.

source("R/make_grid.R")
source("R/spatial_grid.R")
library(sp)
spatial_grid <- make_grid()

ggplot(spatial_grid, aes(long, lat)) + geom_point(size = .025) + 
  us_base + coord_map() + ggtitle("CMAQ Values Spatial Domain") + theme_bw()
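
To make the grid description concrete, here is a minimal sketch of how the cell centers of a regular 36 km grid can be computed in projected coordinates. It is independent of the package's own make_grid(). The origin at (0, 0) is hypothetical, and reading 112 x 148 as 112 rows by 148 columns is an assumption.

```r
cell_km <- 36    # cell width and height, in km
n_rows  <- 112   # assumed orientation: 112 rows ...
n_cols  <- 148   # ... by 148 columns

# Hypothetical lower-left corner of the domain, in km
x_origin_km <- 0
y_origin_km <- 0

# One record per grid cell, with column index varying fastest
centers <- expand.grid(col = seq_len(n_cols), row = seq_len(n_rows))

# A cell's center sits half a cell in from its lower-left corner
centers$x_km <- x_origin_km + (centers$col - 0.5) * cell_km
centers$y_km <- y_origin_km + (centers$row - 0.5) * cell_km

n_cells <- nrow(centers)   # 112 * 148 = 16576 cells
```

Each simulated daily value would then be attached to one row of `centers`, which is what lets grid output be compared with point observations at monitoring sites.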

Tutorials

Since working with spatial (and spatio-temporal) data in R might seem intimidating, especially for those with less exposure to the language, I'm in the process of writing some tutorials to help ease the learning curve and make some of the code used in this package more readily accessible. So far, there is:

References

Here are two papers which used an earlier version of the approach presented here:

This paper was also useful for background on methods used and some of the motivation:

Source apportionment of PM2.5 in the southeastern United States by Marmur et al.

Although this paper used a much smaller dataset than the first two, it makes up for it by presenting more of the intuition and rationale for incorporating certain components of the present implementation.

nabilabd/hybridSA documentation built on May 23, 2019, 12:03 p.m.
