TADA_AggregateMeasurements: Aggregate multiple result values to a min, max, or mean
In USEPA/TADA: EPA TADA (Tools for Automated Data Analysis) R Package

TADA_AggregateMeasurements

R Documentation

Description

This function groups TADA data by user-defined columns and aggregates TADA.ResultMeasureValue within each group to a minimum, maximum, or mean value.

Usage

TADA_AggregateMeasurements(
  .data,
  grouping_cols = c("ActivityStartDate", "TADA.MonitoringLocationIdentifier",
    "TADA.ComparableDataIdentifier", "ResultDetectionConditionText", "ActivityTypeCode"),
  agg_fun = c("max", "min", "mean"),
  clean = TRUE
)

Arguments

`.data`	A TADA dataframe
`grouping_cols`	The column names used to group the data
`agg_fun`	The aggregation function applied to the grouped data. Must be one of 'min', 'max', or 'mean'.
`clean`	Boolean. Determines whether the other measurements in each aggregated group are removed from or kept in the dataframe. If clean = TRUE, they are removed. If clean = FALSE, they are kept and marked in TADA.ResultValueAggregation.Flag as "Used in aggregation function but not selected".
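The effect of `clean` can be illustrated on a toy data frame. The following is a minimal base R sketch, not the package's implementation: the function `aggregate_max`, the columns `site`, `day`, `value`, and `flag`, and the two labels marked "hypothetical" are all assumptions made for illustration. Only the flag text "Used in aggregation function but not selected" comes from the documentation above.

```r
# Toy data with hypothetical column names (not the TADA schema).
toy <- data.frame(
  site  = c("A", "A", "A", "B"),
  day   = c("2024-05-01", "2024-05-01", "2024-05-01", "2024-05-01"),
  value = c(1.2, 3.4, 2.0, 5.0)
)

# Illustrative helper (hypothetical): keep the maximum value per group.
aggregate_max <- function(df, grouping_cols, clean = TRUE) {
  # One factor level per unique combination of the grouping columns
  g <- interaction(df[grouping_cols], drop = TRUE)

  # Group maximum and group size, aligned to the original rows
  group_max <- ave(df$value, g, FUN = max)
  group_n   <- ave(df$value, g, FUN = length)

  # Select the first row in each group that equals the group maximum
  is_max   <- df$value == group_max
  selected <- is_max & !duplicated(data.frame(g, is_max))

  # Flag every row; the first two labels are hypothetical
  df$flag <- ifelse(
    group_n == 1, "Single measurement (hypothetical label)",
    ifelse(selected, "Selected (hypothetical label)",
           "Used in aggregation function but not selected")
  )

  # clean = TRUE drops the rows that were used but not selected
  if (clean) {
    df <- df[group_n == 1 | selected, , drop = FALSE]
  }
  df
}

toy_clean <- aggregate_max(toy, c("site", "day"), clean = TRUE)
toy_all   <- aggregate_max(toy, c("site", "day"), clean = FALSE)
```

In this sketch `toy_clean` holds one row per group, while `toy_all` keeps all four rows and relies on the flag column to distinguish the selected measurement from the others.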

Value

A TADA dataframe with aggregated values combined into one row. If agg_fun is 'min' or 'max', the function selects the row matching the aggregation condition and flags it as the selected measurement. If agg_fun is 'mean', the function selects a random row from the aggregated rows to carry the metadata associated with the mean value and gives that row a unique ResultIdentifier: the original ResultIdentifier with the prefix "TADA-". The function adds a TADA.ResultValueAggregation.Flag column to indicate which rows have been aggregated.
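The 'mean' behavior described above can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's code: `aggregate_mean` and the toy columns `site` and `value` are hypothetical, and the sketch corresponds roughly to clean = TRUE because only the representative row is returned for each group.

```r
# Toy data with hypothetical columns; ResultIdentifier mirrors the documented name.
toy <- data.frame(
  ResultIdentifier = c("r1", "r2", "r3", "r4"),
  site  = c("A", "A", "A", "B"),
  value = c(1, 2, 6, 5)
)

# Illustrative helper (hypothetical): one mean row per group.
aggregate_mean <- function(df, grouping_cols) {
  g <- interaction(df[grouping_cols], drop = TRUE)

  pieces <- lapply(split(df, g), function(grp) {
    # A group with a single measurement needs no aggregation
    if (nrow(grp) == 1) {
      return(grp)
    }
    # Pick one row at random to carry the group's metadata
    rep_row <- grp[sample(nrow(grp), 1), , drop = FALSE]
    # Replace its value with the group mean
    rep_row$value <- mean(grp$value)
    # Mark the identifier as derived by adding the "TADA-" prefix
    rep_row$ResultIdentifier <- paste0("TADA-", rep_row$ResultIdentifier)
    rep_row
  })

  out <- do.call(rbind, pieces)
  rownames(out) <- NULL
  out
}

set.seed(1) # makes the random row choice repeatable
toy_mean <- aggregate_mean(toy, "site")
```

Because the representative row is chosen at random, metadata on the mean row (other than the grouping columns) can differ between runs unless a seed is set.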

Examples

# Load example dataset
data(Data_6Tribes_5y)
# Select maximum value per day, site, comparable data identifier, result detection condition,
# and activity type code. Clean all non-maximum measurements from grouped data.
Data_6Tribes_5y_agg <- TADA_AggregateMeasurements(Data_6Tribes_5y,
  grouping_cols = c(
    "ActivityStartDate", "TADA.MonitoringLocationIdentifier",
    "TADA.ComparableDataIdentifier", "ResultDetectionConditionText",
    "ActivityTypeCode"
  ),
  agg_fun = "max", clean = TRUE
)

# Calculate a mean value per day, site, comparable data identifier, result detection condition,
# and activity type code. Keep all measurements used to calculate mean measurement.
Data_6Tribes_5y_agg <- TADA_AggregateMeasurements(Data_6Tribes_5y,
  grouping_cols = c(
    "ActivityStartDate", "TADA.MonitoringLocationIdentifier",
    "TADA.ComparableDataIdentifier", "ResultDetectionConditionText",
    "ActivityTypeCode"
  ),
  agg_fun = "mean", clean = FALSE
)

USEPA/TADA documentation built on April 12, 2025, 1:47 p.m.