clean_wqdata: Clean Water Quality Data

View source: R/clean-wqdata.R

clean_wqdata    R Documentation

Clean Water Quality Data

Description

Cleans water quality data. After standardization using standardize_wqdata, replicates (two or more readings for the same variable on the same date) are summarized using FUN (the mean function by default). Readings for the same variable on the same date but at different levels of the columns specified in by are not considered replicates. The clean_wqdata function is automatically called by calc_limits prior to calculating limits.

Usage

clean_wqdata(
  x,
  by = NULL,
  max_cv = Inf,
  sds = 10,
  ignore_undetected = TRUE,
  large_only = TRUE,
  delete_outliers = FALSE,
  remove_blanks = FALSE,
  messages = getOption("wqbc.messages", default = TRUE),
  FUN = mean
)

Arguments

x

The data.frame to clean.

by

A character vector of the columns in x to perform the cleaning by. If you have multiple stations, specify the name of the column that contains the station IDs.

max_cv

A number indicating the maximum permitted coefficient of variation for replicates.

sds

The number of standard deviations above which a value is considered an outlier.

ignore_undetected

A flag indicating whether to ignore undetected values when calculating the average deviation and identifying outliers.

large_only

A flag indicating whether only large values that exceed sds should be identified as outliers.

delete_outliers

A flag indicating whether to delete outliers or merely flag them.

remove_blanks

A flag indicating whether to remove blanks. Blanks are assumed to be denoted by a value of "Blank..." in the SAMPLE_CLASS column. Default FALSE.

messages

A flag indicating whether to print messages.

FUN

The function to use for summaries, e.g. median, mean, or max. Default mean.

Details

If there are three or more replicates with a coefficient of variation (CV) exceeding max_cv, then the replicate with the highest absolute deviation is dropped until the CV is less than or equal to max_cv or only two values remain. By default (max_cv = Inf) no replicates are dropped and all values are averaged.
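
The dropping rule described above can be sketched in a few lines of base R. This is an illustration only, not the package's internal code: the helper names cv and drop_high_cv are hypothetical, and the sketch assumes that "absolute deviation" means the absolute distance from the mean of the current replicates.

```r
# Hypothetical helper: coefficient of variation (sample sd divided by mean).
cv <- function(x) sd(x) / mean(x)

# Hypothetical sketch of the dropping rule; not part of wqbc.
# Assumption: deviation is measured from the mean of the remaining values.
drop_high_cv <- function(x, max_cv = Inf) {
  # Only drop while three or more values remain and the CV exceeds max_cv.
  # isTRUE() guards against an undefined CV (e.g. all values zero).
  while (length(x) > 2 && isTRUE(cv(x) > max_cv)) {
    x <- x[-which.max(abs(x - mean(x)))]
  }
  x
}

kept <- drop_high_cv(c(1, 1, 10), max_cv = 1.29)  # the value 10 is dropped
all_kept <- drop_high_cv(c(1, 1, 10))             # default Inf keeps all three
mean(kept)
```

With max_cv left at Inf the loop never runs, which matches the default behaviour of summarizing every replicate.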

A max_cv value of 1.29 is exceeded by two zeros and one positive value (CV = 1.73) or by two identical positive values and a third value an order of magnitude greater (CV = 1.30). It is not exceeded by one zero and two identical positive values (CV = 0.87).
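
The CV values quoted above can be checked directly in base R, assuming the CV is the sample standard deviation divided by the mean. The helper cv below is a hypothetical name used for illustration and is not a function exported by wqbc.

```r
# Hypothetical helper: coefficient of variation (sample sd divided by mean).
cv <- function(x) sd(x) / mean(x)

cv_two_zeros <- cv(c(0, 0, 1))   # two zeros and one positive value
cv_tenfold   <- cv(c(1, 1, 10))  # two identical values and one ten times greater
cv_one_zero  <- cv(c(0, 1, 1))   # one zero and two identical positive values

# Rounded to two decimals these are 1.73, 1.3 and 0.87.
round(c(cv_two_zeros, cv_tenfold, cv_one_zero), 2)
```

Because the CV is scale invariant for positive multipliers, the same results hold for any positive value in place of 1.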

See Also

calc_limits and standardize_wqdata

Examples

clean_wqdata(wqbc::dummy, messages = TRUE)

bcgov/wqbc documentation built on Feb. 11, 2023, 11:15 p.m.