In MazamaScience/PWFSLSmoke: Utilities for Working with Air Quality Monitoring Data

knitr::opts_chunk$set(echo = TRUE)
options(knitr.table.format = "html")

library(PWFSLSmoke)

Introduction

In this vignette, we take a look at how the PWFSLSmoke package works with generic air quality data, given a data source and some basic configuration details telling PWFSLSmoke how to interpret the data.

Downloading Data

Determining an Acceptable Data Format

First, we need to acquire some example data. AIRSIS provides access to raw data in a *.csv format, one example of which can be downloaded here. These AIRSIS files aren't yet in a format generic_parseData() accepts, so we do some basic data cleaning to ensure each row corresponds to one observation.

## Note:
#  This section should only be run if a pre-cleaned version of the AIRSIS
#  example data does not already exist.

rawData <- readr::read_csv("localData/airsis_ebam_example-raw.csv")

fillCols <- list("Latitude", "Longitude", "Sys. Volts")

# The raw data consists of data records with no location information interspersed
# with 'GPS' records that have only location and voltage. This next step does
# the following:
#  1) replicate data associated with fillCols so that location information exists
#     in every row
#  2) remove any rows that are missing the "Type" field found in every data record

# NOTE:  '!!!' is tidy unquoting for lists
cleanData <- rawData %>%
  tidyr::fill(!!!fillCols, .direction = "down") %>%
  tidyr::fill(!!!fillCols, .direction = "up") %>%
  dplyr::filter(!is.na(Type))  # namespaced so stats::filter() is not picked up by mistake

# Wow! Using tidyverse packages makes for very expressive code!

readr::write_csv(cleanData, "localData/airsis_ebam_example-clean.csv")

In general, PWFSLSmoke expects generic data to adhere to a few specific rules:

Each row corresponds to an observation at a single point in time
All of the required columns from the configuration list exist
Header rows only exist at the top of the file
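The second rule is easy to check before parsing. The following is a minimal sketch in base R, not part of PWFSLSmoke: the helper name missingRequiredColumns() and the toy data frame are assumptions made purely for illustration.

```r
# Hypothetical helper (not part of PWFSLSmoke): return the required column
# names from a configuration list that are absent from a data frame.
missingRequiredColumns <- function(df, configList) {
  required <- unlist(configList$column_names, use.names = FALSE)
  setdiff(required, names(df))
}

# Toy data standing in for a cleaned file: one row per observation
exampleData <- data.frame(
  "Date/Time/GMT" = c("07/01/2018 00:00:00", "07/01/2018 01:00:00"),
  "ConcHr" = c(0.012, 0.015),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

exampleConfig <- list(
  column_names = list(datetime = "Date/Time/GMT", pm25 = "ConcHr")
)

# All required columns present: nothing is reported as missing
missingOK <- missingRequiredColumns(exampleData, exampleConfig)

# Drop the datetime column: the helper reports it as missing
missingBad <- missingRequiredColumns(exampleData["ConcHr"], exampleConfig)
```

An empty result means every column named under column_names was found in the data.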

Get the Data

generic_downloadData() accepts a file path as an argument. This path can be either a local file path or a remote connection (such as http://, https://, ftp://, or ftps://). For this example, we will use a local file.

filePath <- system.file(
  "extdata", "generic_data_example.csv",
  package = "PWFSLSmoke",
  mustWork = TRUE
)

fileString <- generic_downloadData(filePath)

The output of this function is the file, stored as a single string ready to be parsed by generic_parseData().
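To make "a single string" concrete, here is a small base R sketch that reads a toy file into a character vector of length one. It only illustrates the shape of the result; it is an assumption, not a description of how generic_downloadData() is implemented, and the file contents are invented for the example.

```r
# Write a small toy CSV to a temporary file
tmpPath <- tempfile(fileext = ".csv")
writeLines(
  c("datetime,pm25", "2018-07-01 00:00:00,12", "2018-07-01 01:00:00,15"),
  tmpPath
)

# Read the whole file and join its lines into one string
toyString <- paste(readLines(tmpPath), collapse = "\n")

# The result is a character vector of length one, with embedded newlines
is.character(toyString)
length(toyString)
```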

Configuration List Details:

configList can be either an R list or a JSON file, but both must follow the same nested key-value format:

 configList
  |_ column_names
  |   |_ datetime = [string],
  |   |_ pm25 = [string]
  |
  |_ station_meta
  |   |_ latitude = [numeric],
  |   |_ longitude = [numeric],
  |   |_ site_name = [optional | string]
  |
  |_ parsing_info
      |_ header_rows = [integer],
      |_ column_types = [string],
      |_ datetime_format = [string],
      |_ tz = [optional | string | default: "UTC"],
      |_ decimal_mark = [optional | string | default: "."],
      |_ grouping_mark = [optional | string | default: ","],
      |_ delimiter = [optional | string | default: ","],
      |_ encoding = [optional | string | default: "UTF-8"]

The required parameters are the top-level keys of the configuration list. Each one controls a specific aspect of parsing:

"column_names" contains entries that map column names in the input file to the column names needed to build a ws_monitor object.
"station_meta" contains entries for metadata about the station that generated the input file, which allows PWFSLSmoke to augment the ws_monitor object later in the pipeline. Some entries are required, others are optional.
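"parsing_info" contains entries that describe how the input file should be read, such as the number of header rows, the column types, the datetime format, and the delimiter. Some entries are required, others are optional.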

Extra keys in column_names and station_meta are currently ignored.
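The structure above can be checked, and its optional entries filled in, with a few lines of base R. This is a sketch under stated assumptions: checkConfigList() is a hypothetical helper that is not part of PWFSLSmoke, and the default values are simply copied from the outline above.

```r
# Hypothetical helper (not part of PWFSLSmoke): verify the three top-level
# keys exist and fill in defaults for optional parsing_info entries.
checkConfigList <- function(configList) {
  requiredKeys <- c("column_names", "station_meta", "parsing_info")
  missingKeys <- setdiff(requiredKeys, names(configList))
  if (length(missingKeys) > 0) {
    stop(
      "configList is missing required keys: ",
      paste(missingKeys, collapse = ", ")
    )
  }

  # Defaults for the optional entries, as listed in the outline
  parsingDefaults <- list(
    tz = "UTC",
    decimal_mark = ".",
    grouping_mark = ",",
    delimiter = ",",
    encoding = "UTF-8"
  )

  # Values supplied by the user override the defaults
  configList$parsing_info <- modifyList(parsingDefaults, configList$parsing_info)
  configList
}

# A configuration with only some optional entries set
minimalConfig <- list(
  column_names = list(datetime = "Date/Time/GMT", pm25 = "ConcHr"),
  station_meta = list(latitude = 35.75506, longitude = -118.4175),
  parsing_info = list(
    header_rows = 5,
    column_types = "ccdddddddddddcdcc",
    datetime_format = "mdy T",
    delimiter = ";"
  )
)

fullConfig <- checkConfigList(minimalConfig)
```

Here fullConfig$parsing_info keeps the user-supplied delimiter and gains the default tz, decimal_mark, grouping_mark, and encoding.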

Define a Configuration List

The configuration list can be read from a JSON file, or created as a list directly.

# configListPath <- system.file(
#   "extdata", "generic_configList_example.json",
#   package = "PWFSLSmoke",
#   mustWork = TRUE
# )
# configList <- jsonlite::fromJSON(configListPath)

configList <- list(
  column_names = list(
    datetime = "Date/Time/GMT",
    pm25 = "ConcHr"
  ),
  station_meta = list(
    latitude = 35.75506,
    longitude = -118.4175,
    site_name = "1026 Kernville EBAM"
  ),
  parsing_info = list(
    header_rows = 5,
    column_types = "ccdddddddddddcdcc",
    datetime_format = "mdy T",
    tz = "UTC",
    delimiter = ","
  )
)
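Because the same nested structure can live in a JSON file, it may help to see what a list like this looks like once serialized. The sketch below round-trips a shortened configuration through jsonlite. Whether the JSON example file shipped with the package uses exactly this layout is an assumption; exampleConfig is defined here only to keep the block self-contained.

```r
library(jsonlite)

# A shortened configuration list, mirroring the structure defined above
exampleConfig <- list(
  column_names = list(datetime = "Date/Time/GMT", pm25 = "ConcHr"),
  station_meta = list(latitude = 35.75506, longitude = -118.4175),
  parsing_info = list(header_rows = 5, datetime_format = "mdy T", tz = "UTC")
)

# auto_unbox writes length-one vectors as JSON scalars rather than arrays;
# digits = NA keeps full numeric precision for the coordinates
configJSON <- toJSON(exampleConfig, auto_unbox = TRUE, pretty = TRUE, digits = NA)
cat(configJSON)

# Reading the JSON back yields a nested list with the same keys
roundTrip <- fromJSON(configJSON)
names(roundTrip)
```

Without auto_unbox = TRUE, each value would be written as a one-element array, and without digits = NA the coordinates would be rounded to jsonlite's default precision.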

Parse Data

parsedData <- generic_parseData(fileString, configList)

MazamaScience/PWFSLSmoke documentation built on July 3, 2023, 11:03 a.m.
