resample | R Documentation
Aggregates data to a different time period. The function follows these steps:
split the data into series
pad each series (needed to calculate the data capture threshold and to detect gaps)
group each series by the new interval with lubridate::floor_date()
apply a statistical method or a user-provided function (a list of methods can be given per parameter)
combine the resampled series
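The steps above can be sketched in base R for fixed-length intervals. This is a simplified illustration, not the rOstluft implementation: the function name resample_sketch, the columns starttime, parameter and value, the 30-minute input step (base_secs) and the plain second arithmetic used instead of lubridate::floor_date() are all assumptions. Calendar intervals such as months or years would still need lubridate::floor_date().

```r
# Hypothetical sketch of the split / pad / floor / apply / combine steps.
# Assumes UTC POSIXct timestamps in `starttime` and fixed-length intervals.
resample_sketch <- function(data, interval_secs, base_secs = 1800,
                            fun = function(x) mean(x, na.rm = TRUE)) {
  # 1. split the data into one series per parameter
  series <- split(data, data$parameter)
  out <- lapply(series, function(s) {
    t <- as.numeric(s$starttime)
    # 2. pad the series to a regular grid; missing time points become NA
    grid <- seq(min(t), max(t), by = base_secs)
    values <- s$value[match(grid, t)]
    # 3. floor every time point to the start of its new interval
    bucket <- floor(grid / interval_secs) * interval_secs
    # 4. apply the statistic to each group
    agg <- tapply(values, bucket, fun)
    data.frame(
      starttime = as.POSIXct(as.numeric(names(agg)), origin = "1970-01-01", tz = "UTC"),
      parameter = s$parameter[1],
      value = as.vector(agg)
    )
  })
  # 5. combine the resampled series
  do.call(rbind, unname(out))
}

# Toy input: 30-minute values; NO2 has no record at 01:00
t0 <- as.POSIXct("2013-01-01 00:00:00", tz = "UTC")
toy <- data.frame(
  starttime = t0 + c(0, 1800, 3600, 5400, 0, 1800, 5400),
  parameter = c("O3", "O3", "O3", "O3", "NO2", "NO2", "NO2"),
  value = c(1, 3, 5, 7, 10, 20, 40)
)
res <- resample_sketch(toy, interval_secs = 3600)
res
```

Padding before grouping is what makes the missing NO2 record visible as an NA inside its hourly window, which a capture threshold or gap check could then act on.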
Different methods can be supplied for different parameters: the argument statistic can be a named list, where each name is a parameter and each value is a function to apply, the name of a method, or a list of method names. Some methods rename the parameter and change its unit. A list of method names may contain only one non-renaming method.
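How a named statistic list might be resolved for each parameter, including the "default_statistic" fallback, can be sketched as follows. methods_for is a hypothetical helper, not an rOstluft function; raising an error when neither the parameter nor "default_statistic" is listed is an assumption of this sketch.

```r
# Hypothetical helper (not part of rOstluft): which method(s) apply to `parameter`?
methods_for <- function(statistic, parameter) {
  # a single method name or function applies to every parameter
  if (!is.list(statistic)) return(list(statistic))
  m <- statistic[[parameter]]
  # parameters missing from the list fall back to "default_statistic"
  if (is.null(m)) m <- statistic[["default_statistic"]]
  if (is.null(m)) stop("no statistic for parameter ", parameter)
  # normalise to a list so single methods and method lists are handled alike
  if (is.list(m)) m else list(m)
}

d1_statistics <- list(
  "default_statistic" = "drop",
  "Hr" = "mean",
  "O3" = list("mean", "max", "min", "n")
)
methods_for(d1_statistics, "O3")   # -> list("mean", "max", "min", "n")
methods_for(d1_statistics, "NO2")  # -> list("drop"), via default_statistic
methods_for("median", "PM10")      # -> list("median")
```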
resample(
data,
statistic = "mean",
new_interval,
data_thresh = NULL,
max_gap = NULL,
rename_parameter = TRUE,
percentile = 0.95,
skip_padding = FALSE,
start_date = NULL,
end_date = NULL,
drop_last = FALSE
)
data
A tibble in rOstluft long format.
statistic
Statistical method(s) to apply when aggregating the data. This can be a string with the name of a method or a function with one argument; a list with parameters as names and statistical methods (function or method name) as values; or a list with parameters as names and lists of statistical methods as values. All methods must support renaming the parameter. A default statistic for all parameters not in the list can be defined with the name "default_statistic". See the sections Statistical methods and Examples.
new_interval
The new interval. Must be longer than the current interval (this is not checked).
data_thresh
Optional minimum data capture threshold to apply, e.g. 0.8 as in the examples.
max_gap
Optional maximum number of consecutive NA values.
rename_parameter
Optional; whether to rename the parameter. Default TRUE.
percentile
The percentile level used when statistic = "percentile". Default 0.95.
skip_padding
Don't pad the data before applying the statistics. Default FALSE.
start_date
Optional start date for padding. Defaults to the minimum date in the series, floored to the new interval.
end_date
Optional end date for padding. Defaults to the maximum date in the series, ceiled to the new interval.
drop_last
Optional; drop the last time point added by padding. Default FALSE, but TRUE if no end_date is provided and the maximum date differs from the ceiled maximum date.
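The interplay of data_thresh and max_gap can be illustrated with a hypothetical helper, window_valid, that checks one padded aggregation window. It is not an rOstluft function. Treating data_thresh as a fraction of non-NA values, using inclusive comparisons, and setting the result of a failing window to NA are assumptions of this sketch.

```r
# Hypothetical helper (not the rOstluft implementation): is one padded window valid?
window_valid <- function(x, data_thresh = NULL, max_gap = NULL) {
  ok <- TRUE
  if (!is.null(data_thresh)) {
    # data capture: share of non-NA values in the padded window
    ok <- ok && mean(!is.na(x)) >= data_thresh
  }
  if (!is.null(max_gap)) {
    # longest run of consecutive NA values
    runs <- rle(is.na(x))
    longest <- max(c(0, runs$lengths[runs$values]))
    ok <- ok && longest <= max_gap
  }
  ok
}

x <- c(1, NA, NA, 4, 5, NA, 7, 8, 9, 10)         # 7 of 10 values present
window_valid(x, data_thresh = 0.8)               # FALSE: capture is 0.7
window_valid(x, max_gap = 2)                     # TRUE: longest gap is 2
window_valid(x, data_thresh = 0.6, max_gap = 1)  # FALSE: gap of 2 exceeds 1
```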
A tibble with the resampled data.
A statistical method is a function that takes a numeric vector as its argument and returns a single value. The following methods can be selected by name:
"mean"
average value
"median"
median value
"sd"
standard deviation of values
"sum"
sum over all values
"max"
maximum value
"min"
minimum value
"n"
number of valid records, renames parameter, changes unit
"coverage"
percentage of valid records, renames parameter, changes unit
"percentile"
calculates the percentile. Use the argument percentile to specify the level, renames parameter
"perc95"
95% percentile, renames parameter
"perc98"
98% percentile, renames parameter
"n>5"
number of values > 5 (WHO PM2.5 y1 limit), renames parameter, changes unit
"n>8"
number of values > 8 (CO d1 limit), renames parameter, changes unit
"n>10"
number of values > 10 (PM2.5 y1 limit), renames parameter, changes unit
"n>15"
number of values > 15 (WHO PM10 limit), renames parameter, changes unit
"n>25"
number of values > 25 (WHO NO2 d1 limit), renames parameter, changes unit
"n>30"
number of values > 30 (NO2, SO2 y1 limit), renames parameter, changes unit
"n>40"
number of values > 40 (WHO SO2 d1 limit), renames parameter, changes unit
"n>45"
number of values > 45 (WHO PM10 d1 limit), renames parameter, changes unit
"n>50"
number of values > 50 (PM10 d1 limit), renames parameter, changes unit
"n>60"
number of values > 60 (y1 limit), renames parameter, changes unit
"n>65"
number of values > 65 (O3 d1 indicator), renames parameter, changes unit
"n>80"
number of values > 80 (NO2 d1 limit), renames parameter, changes unit
"n>100"
number of values > 100 (SO2 d1 limit), renames parameter, changes unit
"n>120"
number of values > 120 (O3 h1 limit), renames parameter, changes unit
"n>160"
number of values > 160 (O3 h1 indicator), renames parameter, changes unit
"n>180"
number of values > 180 (O3 h1 indicator), renames parameter, changes unit
"n>200"
number of values > 200 (O3 h1 indicator), renames parameter, changes unit
"n>240"
number of values > 240 (O3 h1 indicator), renames parameter, changes unit
"drop"
drops the parameter from the result; a convenient alternative to filtering the input data beforehand
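Several of the named methods reduce to one-line R functions. The following re-implementations are hypothetical illustrations of what "n", "coverage", "n>50" and "percentile" compute on one window; they do not rename parameters or change units, and the quantile definition (R's default, type 7) is an assumption, since rOstluft may use a different one. A one-argument function of this shape can also be passed directly as a statistic.

```r
# Hypothetical equivalents of some named methods, applied to one window.
x <- c(12, 55, NA, 48, 61, 30)

n_valid  <- function(x) sum(!is.na(x))          # like "n"
coverage <- function(x) 100 * mean(!is.na(x))   # like "coverage", in percent
n_above  <- function(limit) {                   # like "n>50" for limit = 50
  force(limit)
  function(x) sum(x > limit, na.rm = TRUE)
}
perc <- function(level) {                       # like "percentile"
  force(level)
  function(x) quantile(x, level, na.rm = TRUE, names = FALSE)
}

n_valid(x)       # 5
coverage(x)      # 83.33333
n_above(50)(x)   # 2 (55 and 61)
perc(0.95)(x)    # 59.8 with R's default quantile type 7

# A user function could be supplied per parameter in the same way, e.g.
# resample(data, list("default_statistic" = "drop", "O3" = n_above(120)), "d1")
```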
Wind is a special case: vector averaging needs two inputs (direction and speed). To resample wind data, three parameters must be specified, with the methods "wind.direction", "wind.speed_vector" and "wind.speed_scalar", even if the scalar or the vector speed isn't present in the data; a missing speed parameter is substituted by the other one.
Important: wind calculations are standalone. Multiple methods per parameter are only possible for non-wind parameters.
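Vector averaging can be illustrated with the common u/v component approach. This is a hypothetical sketch (wind_average is not an rOstluft function). It assumes directions in degrees, measured clockwise from north, giving the direction the wind blows from; whether rOstluft weights the components by speed exactly like this is an assumption.

```r
# Hypothetical sketch of vector-averaged wind (not the rOstluft code).
# wd: direction in degrees the wind blows FROM; ws: speed.
wind_average <- function(wd, ws) {
  ok <- !is.na(wd) & !is.na(ws)     # use only pairs where both are present
  wd <- wd[ok]
  ws <- ws[ok]
  rad <- wd * pi / 180
  u <- mean(-ws * sin(rad))         # mean east-west component of the motion
  v <- mean(-ws * cos(rad))         # mean north-south component of the motion
  list(
    direction = (atan2(u, v) * 180 / pi + 180) %% 360,  # back to "blowing from"
    speed_vector = sqrt(u^2 + v^2),                     # length of the mean vector
    speed_scalar = mean(ws)                             # plain mean of the speeds
  )
}

res <- wind_average(wd = c(350, 10), ws = c(2, 2))
res$direction      # close to 0 (or 360), not the arithmetic mean 180
res$speed_vector   # 2 * cos(10 degrees), about 1.97
res$speed_scalar   # 2
```

The example shows why direction cannot be averaged like an ordinary parameter: 350 and 10 degrees both point roughly north, and only the vector mean reflects that.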
Open to-dos: an AOT40 statistic?
Adopt some statistics from https://github.com/davidcarslaw/openair/blob/master/R/aqStats.R?
library(rOstluft)
min30 <- system.file("extdata", "Zch_Stampfenbachstrasse_min30_2013_Jan.csv",
package = "rOstluft.data", mustWork = TRUE)
airmo_min30 <- read_airmo_csv(min30)
# filter out volume concentrations (ppb, ppm), keep only mass concentrations
airmo_min30 <- dplyr::filter(airmo_min30, !(.data$unit == "ppb" | .data$unit == "ppm"))
d1_statistics <- list(
"default_statistic" = "drop",
"Hr" = "mean",
"RainDur" = "sum",
"O3" = list("mean", "max", "min", "n")
)
resample(airmo_min30, d1_statistics, "d1", data_thresh = 0.8)
# Note: wind parameters don't support multiple methods via list!
h1_statistics <- list(
"default_statistic" = "drop",
"WD" = "wind.direction",
"WVs" = "wind.speed_scalar",
"WVv" = "wind.speed_vector",
"RainDur" = "sum",
"NO" = list("coverage", "mean")
)
resample(airmo_min30, h1_statistics, "h1", data_thresh = 0.8)
# Note: all resulting values should be NA -> the gap is too big (max_gap = 480 * min30 = 10 days)
y1_statistics <- list(
"default_statistic" = "drop",
"O3" = list("mean", "perc98", "n", "max", "min")
)
resample(airmo_min30, y1_statistics, "y1", max_gap = 480)
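The arithmetic behind the last example can be checked directly. This assumes the default start_date and end_date pad the January series to the full calendar year 2013 and that the January file runs until 31 January 23:30; if it ends earlier, the trailing gap is even longer.

```r
# Half-hourly (min30) slots per day
slots_per_day <- 24 * 2
max_gap <- 480
max_gap / slots_per_day                  # 10 days, as stated in the example comment

jan_slots  <- 31 * slots_per_day         # January 2013: 1488 half-hours
year_slots <- 365 * slots_per_day        # 2013 is not a leap year: 17520 half-hours
trailing_gap <- year_slots - jan_slots   # 16032 padded NA values after January
trailing_gap > max_gap                   # TRUE, so the yearly values are NA
```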