knitr::opts_chunk$set(echo = TRUE)
options(knitr.table.format = "html")

library(PWFSLSmoke)

Introduction

In this vignette, we take a look at how the PWFSLSmoke package works with generic air quality data, given a data source and some basic configuration details telling PWFSLSmoke how to interpret the data.

Downloading Data

Determining an Acceptable Data Format

First, we need to acquire some example data. AIRSIS provides access to raw data in a *.csv format, one example of which can be downloaded here. These AIRSIS files aren't yet in a format generic_parseData() accepts, so we do some basic data cleaning to ensure each row corresponds to one observation.

## Note:
#  This section should only be run if a pre-cleaned version of the AIRSIS
#  example data does not already exist.

rawData <- readr::read_csv("localData/airsis_ebam_example-raw.csv")

fillCols <- list("Latitude", "Longitude", "Sys. Volts")

# The raw data consists of data records with no location information interspersed
# with 'GPS' records that have only location and voltage. This next step does
# the following:
#  1) replicate data associated with fillCols so that location information exists
#     in every row
#  2) remove any rows that are missing the "Type" field found in every data record

# NOTE:  '!!!' is tidy unquoting for lists
cleanData <- rawData %>%
  tidyr::fill(!!!fillCols, .direction = "down") %>%
  tidyr::fill(!!!fillCols, .direction = "up") %>%
  filter(!is.na(Type))

# Wow! Using tidyverse packages makes for very expressive code!

readr::write_csv(cleanData, "localData/airsis_ebam_example-clean.csv")

In general, PWFSLSmoke expects generic data to adhere to a few specific rules:

Get the Data

generic_downloadData() accepts a file path an argument. This path can be either a local file path or a remote connection (such as http://, https://, ftp://, or ftps://). For this example, we will use a local file.

filePath <- filePath <- system.file(
  "extdata", "generic_data_example.csv",
  package = "PWFSLSmoke", 
  mustWork = TRUE
)

fileString <- generic_downloadData(filePath)

The output of this function is the file, stored as a single string ready to be parsed by generic_parseData().

Configuration List Details:

configList can be either an R list or a JSON file, but both must follow the same nested key-value format:

 configList
  |_ column_names
  |   |_ datetime = [string],
  |   |_ pm25 = [string]
  |
  |_ station_meta
  |   |_ latitude = [numeric],
  |   |_ longitude = [numeric],
  |   |_ site_name = [optional | string]
  |
  |_ parsing_info
      |_ header_rows = [integer]
      |_ column_types = [string],
      |_ datetime_format = [string],
      |_ tz = [optional | string | default: "UTC"],
      |_ decimal_mark = [optional | string | default: "."],
      |_ grouping_mark = [optional | string | default: ","],
      |_ delimiter = [optional | string | default: ","],
      |_ encoding = [optional | string | default: "UTF-8"]

Required parameters are the top level keys of the configuration list. Each parameter alters a specific aspect of parsing:

Extra keys in columnNames and stationMeta are currently ignored.

Define a Configuration List

The configuration list can be read from a JSON file, or created as a list directly.

# configListPath <- system.file(
#   "extdata", "generic_configList_example.json",
#   package = "PWFSLSmoke",
#   mustWork = TRUE
# )
# configList <- jsonlite::fromJSON(configPath)

configList <- list(
  column_names = list(
    datetime = "Date/Time/GMT",
    pm25 = "ConcHr"
  ),
  station_meta = list(
    latitude = 35.75506,
    longitude = -118.4175,
    site_name = "1026 Kernville EBAM"
  ),
  parsing_info = list(
    header_rows = 5,
    column_types = "ccdddddddddddcdcc",
    datetime_format = "mdy T",
    tz = "UTC",
    delimiter = ","
  )
)

Parse Data

parsedData <- generic_parseData(fileString, configList)


MazamaScience/PWFSLSmoke documentation built on July 3, 2023, 11:03 a.m.