resample: Resampling data
In Ostluft/rOstluft: Tools for handling air quality data by Ostluft

Description

Aggregate data over different time periods, following these steps:

- split the data into series
  - pad each series (needed to calculate the capture threshold and to detect gaps)
  - group the series by the new interval with lubridate::floor_date()
  - apply a statistical method or a user-provided function (the user can provide a list per parameter)
- combine the resampled series
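
The steps above can be sketched for a single series as follows. This is a minimal illustration, not the package's implementation: the data frame `series`, its column names `starttime` and `value`, and the use of base R `merge()` and `aggregate()` are assumptions made for the example; only lubridate::floor_date() is named in the description.

```r
library(lubridate)

# Hypothetical 30-minute series with one missing time point (01:00)
series <- data.frame(
  starttime = as.POSIXct(
    c("2013-01-01 00:00", "2013-01-01 00:30", "2013-01-01 01:30"),
    tz = "UTC"
  ),
  value = c(10, 20, 40)
)

# Pad: build the complete time axis and join, so gaps become explicit NA rows
full <- data.frame(
  starttime = seq(min(series$starttime), max(series$starttime), by = "30 min")
)
padded <- merge(full, series, by = "starttime", all.x = TRUE)

# Group: floor every time stamp to the start of the new (hourly) interval
padded$interval <- floor_date(padded$starttime, unit = "hour")

# Apply: one statistic per interval; na.pass keeps the padded NA rows
# so that the statistic itself decides how to treat them
hourly <- aggregate(
  value ~ interval,
  data = padded,
  FUN = mean,
  na.rm = TRUE,
  na.action = na.pass
)
```

Padding before grouping is what makes the number of expected records per interval known, which the capture threshold and gap detection depend on.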

It is possible to supply different methods for different parameters. The argument statistic can be a named list: each name stands for a parameter, and the value can be a function to apply, the name of a method, or a list of method names. Some methods rename the parameter and change the unit. A list of method names can contain only one non-renaming method.
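
As an illustration of the forms described above, a statistic list might be built like this. The parameter names and the custom function `robust_mean` are hypothetical, chosen only for the example; whether a custom function receives NA values is not stated here, so the function handles them defensively.

```r
# Hypothetical custom statistic: takes a numeric vector, returns a single value
robust_mean <- function(x) mean(x, trim = 0.1, na.rm = TRUE)

statistics <- list(
  "default_statistic" = "drop",    # applied to every parameter not listed
  "RainDur" = "sum",               # a single method name
  "O3" = list("mean", "max", "n"), # a list of method names ("n" renames, "mean" and "max" are listed as given in the Examples)
  "NO2" = robust_mean              # a user-provided function
)

# The list would then be passed as the `statistic` argument, for example:
# resample(data, statistics, "d1")
```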

Usage

resample(
  data,
  statistic = "mean",
  new_interval,
  data_thresh = NULL,
  max_gap = NULL,
  rename_parameter = TRUE,
  percentile = 0.95,
  skip_padding = FALSE,
  start_date = NULL,
  end_date = NULL,
  drop_last = FALSE
)

Arguments

`data`	A tibble in rOstluft long format
`statistic`	Statistical method(s) to apply when aggregating the data. Can be a single string with the name of the method, or a function with one argument. It can also be a list with the parameter as name and the statistical method as value (a function or the name of a method), or a list with the parameter as name and a list of statistical methods as value. All methods must support renaming the parameter. A default statistic for all parameters not in the list can be defined with the name "default_statistic". See the section Statistical methods and the examples.
`new_interval`	New interval. Must be longer than the current interval (this is not checked)
`data_thresh`	optional minimum data capture threshold to use (the examples pass a fraction such as 0.8)
`max_gap`	optional maximum number of consecutive NA values
`rename_parameter`	optional rename parameter
`percentile`	The percentile level used when statistic = "percentile". The default is 0.95
`skip_padding`	don't pad the data before applying statistics. Default FALSE
`start_date`	optional start date for padding. Default min date in series floored to the new interval
`end_date`	optional end date for padding. Default max date in series ceiled to the new interval
`drop_last`	optional; drop the last time point added by padding. Default FALSE; TRUE if no end_date is provided and max date != ceiled max date.
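
To make the roles of data_thresh and max_gap concrete, the following sketch applies both checks to the values of one interval. The helper `mean_with_checks` is hypothetical, and the exact comparison rules (a capture fraction below the threshold, or a gap longer than max_gap, yields NA) are assumptions for illustration, not the package's confirmed behaviour.

```r
# Hypothetical helper: mean of one padded interval, or NA if a check fails
mean_with_checks <- function(x, data_thresh = NULL, max_gap = NULL) {
  is_na <- is.na(x)

  # Assumption: data_thresh is a fraction of valid records (e.g. 0.8)
  if (!is.null(data_thresh) && mean(!is_na) < data_thresh) {
    return(NA_real_)
  }

  # Assumption: a run of more than max_gap consecutive NA values fails
  if (!is.null(max_gap)) {
    runs <- rle(is_na)
    longest_gap <- max(c(0, runs$lengths[runs$values]))
    if (longest_gap > max_gap) {
      return(NA_real_)
    }
  }

  mean(x, na.rm = TRUE)
}

mean_with_checks(c(1, NA, 3, 5), data_thresh = 0.8) # capture is 0.75
mean_with_checks(c(1, NA, NA, 4), max_gap = 1)      # longest gap is 2
mean_with_checks(c(1, NA, 3), max_gap = 1)          # both checks pass
```

Both checks only make sense on padded data, since missing time points must be present as NA to be counted.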

Value

tibble with resampled data

Statistical methods

The statistical method is a function with a numeric vector as argument and returns a single value.

"mean" average value
"median" median value
"sd" standard deviation of values
"sum" sum over all values
"max" maximum value
"min" minimum value
"n" number of valid records, renames parameter, changes unit
"coverage" percentage of valid records, renames parameter, changes unit
"percentile" calculates the percentile. Use the argument percentile to specify the level, renames parameter
"perc95" 95% percentile, renames parameter
"perc98" 98% percentile, renames parameter
"n>5" number of values > 5 (WHO PM2.5 y1 limit), renames parameter, changes unit
"n>8" number of values > 8 (CO d1 limit), renames parameter, changes unit
"n>10" number of values > 10 (PM2.5 y1 limit), renames parameter, changes unit
"n>15" number of values > 15 (WHO PM10 limit), renames parameter, changes unit
"n>25" number of values > 25 (WHO NO2 d1 limit), renames parameter, changes unit
"n>30" number of values > 30 (NO2, SO2 y1 limit), renames parameter, changes unit
"n>40" number of values > 40 (WHO SO2 d1 limit), renames parameter, changes unit
"n>45" number of values > 45 (WHO PM10 d1 limit), renames parameter, changes unit
"n>50" number of values > 50 (PM10 d1 limit), renames parameter, changes unit
"n>60" number of values > 60 (y1 limit), renames parameter, changes unit
"n>65" number of values > 65 (O3 d1 indicator), renames parameter, changes unit
"n>80" number of values > 80 (NO2 d1 limit), renames parameter, changes unit
"n>100" number of values > 100 (SO2 d1 limit), renames parameter, changes unit
"n>120" number of values > 120 (O3 h1 limit), renames parameter, changes unit
"n>160" number of values > 160 (O3 h1 indicator), renames parameter, changes unit
"n>180" number of values > 180 (O3 h1 indicator), renames parameter, changes unit
"n>200" number of values > 200 (O3 h1 indicator), renames parameter, changes unit
"n>240" number of values > 240 (O3 h1 indicator), renames parameter, changes unit
"drop" drops the parameter from the result, useful for persons too lazy to filter the input data
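
Each method above maps a numeric vector to a single value. A few of them could be written as plain R functions like this; the function names are hypothetical and the NA handling is an assumption, so the built-in methods may differ in detail.

```r
# Number of valid (non-NA) records, as in "n"
n_valid <- function(x) sum(!is.na(x))

# Percentile at a given level, as in "percentile" with percentile = 0.95
percentile_of <- function(x, level = 0.95) {
  unname(stats::quantile(x, probs = level, na.rm = TRUE))
}

# Factory for exceedance counters, as in "n>120"
n_greater <- function(limit) {
  function(x) sum(x > limit, na.rm = TRUE)
}

o3 <- c(100, 130, NA, 121, 90)
n_valid(o3)
n_greater(120)(o3)
percentile_of(o3)
```

Counting methods return a number of records rather than a concentration, which is why they are listed as renaming the parameter and changing the unit.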

Wind

Wind is a special case. For vector averaging, the methods need two inputs (direction and speed). To resample wind data, it is necessary to specify three parameters with the methods "wind.direction", "wind.speed_vector" and "wind.speed_scalar", even if the scalar or the vector speed isn't present. A missing speed parameter is substituted by the other.

Important: Wind calculations are standalone, and wind parameters don't support multiple methods via a list (see the note in the examples). It is possible to calculate multiple methods for non-wind parameters.
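
The reason direction and speed are needed together can be seen in a sketch of standard vector averaging. This is the textbook approach, assumed here for illustration; the function `wind_vector_mean` and its return names are hypothetical and not taken from the package.

```r
# Hypothetical helper: vector and scalar wind averages for one interval
# wd: wind direction in degrees, ws: wind speed
wind_vector_mean <- function(wd, ws) {
  rad <- wd * pi / 180

  # Decompose every observation into east (x) and north (y) components
  x <- mean(ws * sin(rad), na.rm = TRUE)
  y <- mean(ws * cos(rad), na.rm = TRUE)

  list(
    direction = (atan2(x, y) * 180 / pi) %% 360, # mean direction in degrees
    speed_vector = sqrt(x^2 + y^2),              # length of the mean vector
    speed_scalar = mean(ws, na.rm = TRUE)        # plain average of the speeds
  )
}

# North and east wind of equal speed average to a north-east direction;
# the vector speed is lower than the scalar speed because the
# directions partly cancel
wind_vector_mean(wd = c(0, 90), ws = c(1, 1))
```

A plain arithmetic mean of the directions would fail for values on either side of north (for example 350 and 10 degrees), which is why direction cannot be resampled independently of speed.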

TODO

AOT40 statistic?
some from https://github.com/davidcarslaw/openair/blob/master/R/aqStats.R?

Examples

min30 <- system.file("extdata", "Zch_Stampfenbachstrasse_min30_2013_Jan.csv",
                     package = "rOstluft.data", mustWork = TRUE)

airmo_min30 <- read_airmo_csv(min30)

# filter out volume concentrations, only use mass concentrations
airmo_min30 <- dplyr::filter(airmo_min30, !(.data$unit == "ppb" | .data$unit == "ppm"))

d1_statistics <- list(
  "default_statistic" = "drop",
  "Hr" = "mean",
  "RainDur" = "sum",
  "O3" = list("mean", "max", "min", "n")
)
resample(airmo_min30, d1_statistics, "d1", data_thresh = 0.8)

# Note: wind parameters don't support multiple methods via list!
h1_statistics <- list(
  "default_statistic" = "drop",
  "WD" = "wind.direction",
  "WVs" = "wind.speed_scalar",
  "WVv" = "wind.speed_vector",
  "RainDur" = "sum",
  "NO" = list("coverage", "mean")
)
resample(airmo_min30, h1_statistics, "h1", data_thresh = 0.8)

# Note: all resulting values should be NA -> gap is too big (480 * min30 = 10 days)
y1_statistics <- list(
  "default_statistic" = "drop",
  "O3" = list("mean", "perc98", "n", "max", "min")
)
resample(airmo_min30, y1_statistics, "y1", max_gap = 480)

Ostluft/rOstluft documentation built on Feb. 6, 2024, 1:26 a.m.