knitr::opts_chunk$set(echo = TRUE) options(knitr.table.format = "html") library(PWFSLSmoke)
In this vignette, we take a look at how the PWFSLSmoke
package works with
generic air quality data, given a data source and some basic configuration
details telling PWFSLSmoke
how to interpret the data.
First, we need to acquire some example data. AIRSIS provides access to raw data
in a *.csv
format, one example of which can be downloaded here.
These AIRSIS files aren't yet in a format generic_parseData()
accepts, so we
do some basic data cleaning to ensure each row corresponds to one observation.
## Note: # This section should only be run if a pre-cleaned version of the AIRSIS # example data does not already exist. rawData <- readr::read_csv("localData/airsis_ebam_example-raw.csv") fillCols <- list("Latitude", "Longitude", "Sys. Volts") # The raw data consists of data records with no location information interspersed # with 'GPS' records that have only location and voltage. This next step does # the following: # 1) replicate data associated with fillCols so that location information exists # in every row # 2) remove any rows that are missing the "Type" field found in every data record # NOTE: '!!!' is tidy unquoting for lists cleanData <- rawData %>% tidyr::fill(!!!fillCols, .direction = "down") %>% tidyr::fill(!!!fillCols, .direction = "up") %>% filter(!is.na(Type)) # Wow! Using tidyverse packages makes for very expressive code! readr::write_csv(cleanData, "localData/airsis_ebam_example-clean.csv")
In general, PWFSLSmoke
expects generic data to adhere to a few specific rules:
generic_downloadData()
accepts a file path an argument. This path can be
either a local file path or a remote connection (such as http://
, https://
,
ftp://
, or ftps://
). For this example, we will use a local file.
filePath <- filePath <- system.file( "extdata", "generic_data_example.csv", package = "PWFSLSmoke", mustWork = TRUE ) fileString <- generic_downloadData(filePath)
The output of this function is the file, stored as a single string ready to be
parsed by generic_parseData()
.
configList
can be either an R
list
or a JSON file, but
both must follow the same nested key-value format:
configList |_ column_names | |_ datetime = [string], | |_ pm25 = [string] | |_ station_meta | |_ latitude = [numeric], | |_ longitude = [numeric], | |_ site_name = [optional | string] | |_ parsing_info |_ header_rows = [integer] |_ column_types = [string], |_ datetime_format = [string], |_ tz = [optional | string | default: "UTC"], |_ decimal_mark = [optional | string | default: "."], |_ grouping_mark = [optional | string | default: ","], |_ delimiter = [optional | string | default: ","], |_ encoding = [optional | string | default: "UTF-8"]
Required parameters are the top level keys of the configuration list. Each parameter alters a specific aspect of parsing:
"column_names" contains entries for mapping column names from the input
file to ws_monitor
columns names needed to built a ws_monitor
object.
"station_meta" contains entries for metadata about the station that
generated the input file, which allows PWFSLSmoke
to augment the
ws_monitor
object later in the pipeline. Some entries are required,
others are optional.
"parsing_info"
Extra keys in columnNames
and stationMeta
are currently ignored.
The configuration list can be read from a JSON file, or created as a list directly.
# configListPath <- system.file( # "extdata", "generic_configList_example.json", # package = "PWFSLSmoke", # mustWork = TRUE # ) # configList <- jsonlite::fromJSON(configPath) configList <- list( column_names = list( datetime = "Date/Time/GMT", pm25 = "ConcHr" ), station_meta = list( latitude = 35.75506, longitude = -118.4175, site_name = "1026 Kernville EBAM" ), parsing_info = list( header_rows = 5, column_types = "ccdddddddddddcdcc", datetime_format = "mdy T", tz = "UTC", delimiter = "," ) )
parsedData <- generic_parseData(fileString, configList)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.