TADA_AggregateMeasurements: Aggregate multiple result values to a min, max, or mean

View source: R/Utilities.R

TADA_AggregateMeasurements    R Documentation

Aggregate multiple result values to a min, max, or mean

Description

This function groups TADA data by user-defined columns and aggregates the TADA.ResultMeasureValue to a minimum, maximum, or average value.

Usage

TADA_AggregateMeasurements(
  .data,
  grouping_cols = c("ActivityStartDate", "TADA.MonitoringLocationIdentifier",
    "TADA.ComparableDataIdentifier", "ResultDetectionConditionText", "ActivityTypeCode"),
  agg_fun = c("max", "min", "mean"),
  clean = TRUE
)

Arguments

.data

A TADA dataframe

grouping_cols

The column names used to group the data

agg_fun

The aggregation function applied to the grouped data. Must be one of 'min', 'max', or 'mean'.

clean

Boolean. Determines whether the other measurements in each aggregated group are removed from the dataframe (clean = TRUE) or kept (clean = FALSE). If clean = FALSE, the additional measurements are marked in the TADA.ResultValueAggregation.Flag column as "Used in aggregation function but not selected".

Value

A TADA dataframe with aggregated values combined into one row. If agg_fun is 'min' or 'max', the function selects the row matching the aggregation condition and flags it as the selected measurement. If agg_fun is 'mean', the function selects a random row from the aggregated rows to carry the metadata associated with the mean value and gives that row a unique ResultIdentifier: the original ResultIdentifier with the prefix "TADA-". The function also adds a TADA.ResultValueAggregation.Flag column to indicate which rows have been aggregated.
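The behavior described above can be illustrated on a toy data frame in base R. This is only a sketch of the idea, not the TADA implementation: the columns Date, Site, Value, and Flag are simplified stand-ins, the flag texts "No aggregation needed" and "Selected as max" are hypothetical, and the mean case keeps the first row of each group where the real function is documented to pick a random one.

```r
# Toy data: two measurements share a day and site, a third stands alone.
# Column names are simplified stand-ins, not the real TADA columns.
toy <- data.frame(
  ResultIdentifier = c("r1", "r2", "r3"),
  Date = c("2020-01-01", "2020-01-01", "2020-01-02"),
  Site = c("A", "A", "A"),
  Value = c(2, 5, 7),
  stringsAsFactors = FALSE
)

# One key per group, built from the grouping columns.
key <- paste(toy$Date, toy$Site, sep = "|")

# Group size and group maximum, repeated on every row of the group.
n_in_group <- ave(toy$Value, key, FUN = length)
group_max <- ave(toy$Value, key, FUN = max)

# agg_fun = "max", clean = FALSE analogue: keep every row and flag it.
# Only the "not selected" text comes from the documentation; the other two
# flag texts are hypothetical. Tied maxima would all be flagged as selected.
not_selected <- "Used in aggregation function but not selected"
toy$Flag <- ifelse(
  n_in_group == 1, "No aggregation needed",
  ifelse(toy$Value == group_max, "Selected as max", not_selected)
)

# clean = TRUE analogue: drop the rows that were not selected.
toy_clean <- toy[toy$Flag != not_selected, ]

# agg_fun = "mean" analogue: one representative row per group carries the
# group mean. The first row is used here for reproducibility (an assumption;
# the documentation says the real function selects a random row).
group_mean <- ave(toy$Value, key, FUN = mean)
first_in_group <- !duplicated(key)
toy_mean <- toy[first_in_group, c("ResultIdentifier", "Date", "Site", "Value")]
toy_mean$Value <- group_mean[first_in_group]

# Prefix the identifier of rows that now represent more than one measurement.
multi <- n_in_group[first_in_group] > 1
toy_mean$ResultIdentifier[multi] <- paste0("TADA-", toy_mean$ResultIdentifier[multi])
```

In this sketch toy keeps all three rows with a flag on each, toy_clean keeps only the selected and unaggregated rows, and toy_mean holds one row per group with the averaged value.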

Examples

# Load example dataset
data(Data_6Tribes_5y)
# Select maximum value per day, site, comparable data identifier, result detection condition,
# and activity type code. Clean all non-maximum measurements from grouped data.
Data_6Tribes_5y_agg <- TADA_AggregateMeasurements(Data_6Tribes_5y,
  grouping_cols = c(
    "ActivityStartDate", "TADA.MonitoringLocationIdentifier",
    "TADA.ComparableDataIdentifier", "ResultDetectionConditionText",
    "ActivityTypeCode"
  ),
  agg_fun = "max", clean = TRUE
)

# Calculate a mean value per day, site, comparable data identifier, result detection condition,
# and activity type code. Keep all measurements used to calculate the mean.
Data_6Tribes_5y_agg <- TADA_AggregateMeasurements(Data_6Tribes_5y,
  grouping_cols = c(
    "ActivityStartDate", "TADA.MonitoringLocationIdentifier",
    "TADA.ComparableDataIdentifier", "ResultDetectionConditionText",
    "ActivityTypeCode"
  ),
  agg_fun = "mean", clean = FALSE
)

USEPA/TADA documentation built on April 12, 2025, 1:47 p.m.