metric.stats: Calculate metric statistics
In leppott/BioMonTools: Tools for Biomonitoring and Bioassessment

Description

This function calculates metric statistics for use with developing a multi-metric index.

Inputs are a data frame with metric values and columns identifying the sample, reference status, and data type (see Arguments).

Usage

metric.stats(
  fun.DF,
  col_metrics,
  col_SampID = "SAMPLEID",
  col_RefStatus = "Ref_Status",
  RefStatus_Ref = "Ref",
  RefStatus_Str = "Str",
  RefStatus_Oth = "Oth",
  col_DataType = "Data_Type",
  DataType_Cal = "Cal",
  DataType_Ver = "Ver",
  col_Subset = NULL,
  Subset_Value = NULL
)

Arguments

`fun.DF`	Data frame.
`col_metrics`	Column names for metrics.
`col_SampID`	Column name for unique sample identifier. Default = "SAMPLEID".
`col_RefStatus`	Column name for Reference Status. Default = "Ref_Status".
`RefStatus_Ref`	Reference Status name for Reference used in col_RefStatus. Default = "Ref". Use NULL if you don't use this value.
`RefStatus_Str`	Reference Status name for Stressed used in col_RefStatus. Default = "Str". Use NULL if you don't use this value.
`RefStatus_Oth`	Reference Status name for Other used in col_RefStatus. Default = "Oth". Use NULL if you don't use this value.
`col_DataType`	Column name for Data Type (Validation vs. Calibration). Default = "Data_Type".
`DataType_Cal`	Datatype name for Calibration used in col_DataType. Default = "Cal". Use NULL if you don't use this value.
`DataType_Ver`	Datatype name for Verification used in col_DataType. Default = "Ver". Use NULL if you don't use this value.
`col_Subset`	Column name to subset the data and run on each subset. Default = NULL. If NULL then no subset will be generated.
`Subset_Value`	Subset name to be used for creating subset. Default = NULL.

Details

Summary statistics for the data are calculated.

The data are filtered on the column given by col_Subset, keeping only the single value given by Subset_Value. If further subsets are needed, re-run the function once per subset. If no subset is given, the entire data set is used.
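
The subsetting step can be pictured with base R. This is a minimal sketch under stated assumptions, not the package's internal code: the data frame `df_toy`, its values, and the metric column `nt_total` are made up for illustration.

```r
# Hypothetical toy data; column values are illustrative only
df_toy <- data.frame(
  SAMPLEID    = c("S1", "S2", "S3", "S4"),
  INDEX_CLASS = c("CENTRALHILLS", "CENTRALHILLS", "COASTAL", "COASTAL"),
  nt_total    = c(25, 30, 12, 18)
)

col_Subset   <- "INDEX_CLASS"
Subset_Value <- "CENTRALHILLS"

# Keep only rows matching the single subset value;
# with no subset column, use the whole data set
if (is.null(col_Subset)) {
  df_sub <- df_toy
} else {
  df_sub <- df_toy[df_toy[[col_Subset]] %in% Subset_Value, , drop = FALSE]
}

nrow(df_sub)
```

To cover several subsets, the same filter would be repeated once per value, which mirrors re-running the function for each subset.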

Statistics will be generated for up to 6 combinations for RefStatus (Ref, Oth, Str) and DataType (Cal, Ver).
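
The count of 6 comes from crossing the three reference status labels with the two data type labels. The sketch below only enumerates those pairs using the default labels from Usage; how the function builds its groups internally is not shown here and is an assumption.

```r
# Default labels from the Usage section
ref_status <- c("Ref", "Str", "Oth")
data_type  <- c("Cal", "Ver")

# All RefStatus x DataType pairs: 3 x 2 = 6 rows
combos <- expand.grid(RefStatus = ref_status,
                      DataType  = data_type,
                      stringsAsFactors = FALSE)
nrow(combos)
```

Setting a label to NULL removes it from the crossing (for example, `c("Ref", NULL, "Oth")` has length 2), which is presumably why the text says "up to" 6.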

The resulting data frame has the statistics in columns, preceded by 4 identifying columns: INDEX_CLASS (if col_Subset is not provided), col_RefStatus, col_DataType, and Metric_Name.

The following statistics are generated with na.rm = TRUE.

* n = number

* min = minimum

* max = maximum

* mean = mean

* median = median

* range = range (max - min)

* sd = standard deviation

* cv = coefficient of variation (sd/mean)

* q05 = quantile, 5th percentile

* q10 = quantile, 10th percentile

* q25 = quantile, 25th percentile

* q50 = quantile, 50th percentile

* q75 = quantile, 75th percentile

* q90 = quantile, 90th percentile

* q95 = quantile, 95th percentile
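
For one metric within one RefStatus/DataType group, the statistics listed above can be reproduced with base R. This is a sketch, not the package's implementation. Two assumptions are labeled in the code: `n` is taken as the count of non-missing values, and quantiles use the default method of `quantile()`. The vector `x` is made-up data.

```r
# Hypothetical metric values for a single group (includes one NA)
x <- c(10, 12, NA, 15, 20, 22, 30)

# Assumption: default quantile type (type = 7)
probs <- c(0.05, 0.10, 0.25, 0.50, 0.75, 0.90, 0.95)
q <- quantile(x, probs = probs, na.rm = TRUE, names = FALSE)

stats <- c(
  n      = sum(!is.na(x)),  # assumption: n counts non-missing values
  min    = min(x, na.rm = TRUE),
  max    = max(x, na.rm = TRUE),
  mean   = mean(x, na.rm = TRUE),
  median = median(x, na.rm = TRUE),
  range  = max(x, na.rm = TRUE) - min(x, na.rm = TRUE),
  sd     = sd(x, na.rm = TRUE),
  cv     = sd(x, na.rm = TRUE) / mean(x, na.rm = TRUE),
  setNames(q, c("q05", "q10", "q25", "q50", "q75", "q90", "q95"))
)

round(stats, 2)
```

Note that q50 and median describe the same value, so they should agree for any input.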

Value

A data frame of metrics (rows) and statistics (columns), in long format with columns for INDEX_CLASS, RefStatus, and DataType.

Examples

# Load package (provides data_mmi_dev, metric.values, and metric.stats)
library(BioMonTools)

# data, benthos
df_bugs <- data_mmi_dev

# Munge Names
names(df_bugs)[names(df_bugs) %in% "BenSampID"] <- "SAMPLEID"
names(df_bugs)[names(df_bugs) %in% "TaxaID"] <- "TAXAID"
names(df_bugs)[names(df_bugs) %in% "Individuals"] <- "N_TAXA"
names(df_bugs)[names(df_bugs) %in% "Exclude"] <- "EXCLUDE"
names(df_bugs)[names(df_bugs) %in% "Class"] <- "INDEX_CLASS"
names(df_bugs)[names(df_bugs) %in% "Unique_ID"] <- "SITEID"

# Calc Metrics
cols_keep <- c("Ref_v1", "CalVal_Class4", "SITEID", "CollDate", "CollMeth")
# INDEX_NAME and INDEX_CLASS kept by default
df_metval <- metric.values(df_bugs, "bugs", fun.cols2keep = cols_keep)

# Calc Stats
col_metrics   <- names(df_metval)[9:ncol(df_metval)]
col_SampID    <- "SAMPLEID"
col_RefStatus <- "REF_V1"
RefStatus_Ref <- "Ref"
RefStatus_Str <- "Strs"
RefStatus_Oth <- "Other"
col_DataType  <- "CALVAL_CLASS4"
DataType_Cal  <- "cal"
DataType_Ver  <- "verif"
col_Subset    <- "INDEX_CLASS"
Subset_Value  <- "CENTRALHILLS"
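# Pass arguments by name so the call does not depend on argument order
df_stats <- metric.stats(fun.DF = df_metval
                         , col_metrics = col_metrics
                         , col_SampID = col_SampID
                         , col_RefStatus = col_RefStatus
                         , RefStatus_Ref = RefStatus_Ref
                         , RefStatus_Str = RefStatus_Str
                         , RefStatus_Oth = RefStatus_Oth
                         , col_DataType = col_DataType
                         , DataType_Cal = DataType_Cal
                         , DataType_Ver = DataType_Ver
                         , col_Subset = col_Subset
                         , Subset_Value = Subset_Value)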

## Not run: 
# Save Results
write.table(df_stats
            , file.path(tempdir(), "metric.stats.tsv")
            , col.names = TRUE
            , row.names = FALSE
            , sep = "\t")

## End(Not run)

leppott/BioMonTools documentation built on June 10, 2025, 9:41 a.m.