metric.stats: Calculate metric statistics

View source: R/metric_stats.R

metric.stats    R Documentation

Calculate metric statistics

Description

This function calculates summary statistics for metrics, for use in developing a multi-metric index.

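The input is a data frame containing the metric columns along with columns for the sample identifier, Reference Status, and Data Type.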

Usage

metric.stats(
  fun.DF,
  col_metrics,
  col_SampID = "SAMPLEID",
  col_RefStatus = "Ref_Status",
  RefStatus_Ref = "Ref",
  RefStatus_Str = "Str",
  RefStatus_Oth = "Oth",
  col_DataType = "Data_Type",
  DataType_Cal = "Cal",
  DataType_Ver = "Ver",
  col_Subset = NULL,
  Subset_Value = NULL
)

Arguments

fun.DF

Data frame.

col_metrics

Column names for metrics.

col_SampID

Column name for unique sample identifier. Default = "SAMPLEID".

col_RefStatus

Column name for Reference Status. Default = "Ref_Status".

RefStatus_Ref

Reference Status name for Reference used in col_RefStatus. Default = "Ref". Use NULL if you don't use this value.

RefStatus_Str

Reference Status name for Stressed used in col_RefStatus. Default = "Str". Use NULL if you don't use this value.

RefStatus_Oth

Reference Status name for Other used in col_RefStatus. Default = "Oth". Use NULL if you don't use this value.

col_DataType

Column name for Data Type (Calibration vs. Verification). Default = "Data_Type".

DataType_Cal

Data Type name for Calibration used in col_DataType. Default = "Cal". Use NULL if you don't use this value.

DataType_Ver

Data Type name for Verification used in col_DataType. Default = "Ver". Use NULL if you don't use this value.

col_Subset

Column name to subset the data and run on each subset. Default = NULL. If NULL then no subset will be generated.

Subset_Value

Subset name to be used for creating subset. Default = NULL.

Details

Summary statistics for the data are calculated.

The data are filtered on the column given by col_Subset, keeping only the single value given by Subset_Value. If further subsets are needed, re-run the function for each one. If no subset is given, the entire data set is used.
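The filtering step described above can be sketched in base R. This is only an illustration of the idea, not the package's internal code: the helper filter_subset, the toy data frame, and the "BLUERIDGE" class are hypothetical.

```r
# Hypothetical toy data; column names follow the function defaults.
df_toy <- data.frame(
  SAMPLEID    = c("S1", "S2", "S3", "S4"),
  INDEX_CLASS = c("CENTRALHILLS", "CENTRALHILLS", "BLUERIDGE", "BLUERIDGE"),
  nt_total    = c(21, 35, 18, 27),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep rows matching one subset value.
# If either input is NULL, the full data set is returned unchanged.
filter_subset <- function(df, col_Subset = NULL, Subset_Value = NULL) {
  if (is.null(col_Subset) || is.null(Subset_Value)) {
    return(df)
  }
  df[df[[col_Subset]] %in% Subset_Value, , drop = FALSE]
}

df_sub <- filter_subset(df_toy, "INDEX_CLASS", "CENTRALHILLS")  # one class only
df_all <- filter_subset(df_toy)                                 # no subset given
```

To get statistics for a second class, the same call would be repeated with a different Subset_Value.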

Statistics are generated for up to 6 combinations of RefStatus (Ref, Oth, Str) and DataType (Cal, Ver).
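As a rough illustration of where the "up to 6" comes from, the combinations can be enumerated with expand.grid. The variable names are hypothetical, and treating a NULL value as simply dropping that level is an assumption about the behavior, not something confirmed by the source.

```r
# All three Reference Status values and both Data Type values in use.
ref_status <- c("Ref", "Oth", "Str")
data_type  <- c("Cal", "Ver")
combos_all <- expand.grid(Ref_Status = ref_status,
                          Data_Type  = data_type,
                          stringsAsFactors = FALSE)

# Assumption: a value set to NULL is not used, so its combinations drop out.
# c() silently discards NULL, leaving two Reference Status values here.
ref_status_no_oth <- c("Ref", NULL, "Str")
combos_no_oth <- expand.grid(Ref_Status = ref_status_no_oth,
                             Data_Type  = data_type,
                             stringsAsFactors = FALSE)

nrow(combos_all)     # 3 x 2 combinations
nrow(combos_no_oth)  # 2 x 2 combinations
```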

The resulting data frame has the statistics in columns. The first 4 columns are: INDEX_CLASS (if col_Subset is not provided), col_RefStatus, col_DataType, and Metric_Name.

The following statistics are generated with na.rm = TRUE, so missing values are ignored.

* n = number

* min = minimum

* max = maximum

* mean = mean

* median = median

* range = range (max - min)

* sd = standard deviation

* cv = coefficient of variation (sd/mean)

* q05 = quantile, 5th percentile

* q10 = quantile, 10th percentile

* q25 = quantile, 25th percentile

* q50 = quantile, 50th percentile

* q75 = quantile, 75th percentile

* q90 = quantile, 90th percentile

* q95 = quantile, 95th percentile
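The statistics listed above can be reproduced for a single metric with base R, which may help when checking results by hand. This is a minimal sketch, not the package's implementation: the helper calc_stats is hypothetical, counting only non-missing values for n is an assumption, and the quantiles use R's default method (type 7), which may or may not match the package.

```r
# Hypothetical helper: summary statistics for one numeric vector.
calc_stats <- function(x) {
  x_min  <- min(x, na.rm = TRUE)
  x_max  <- max(x, na.rm = TRUE)
  x_mean <- mean(x, na.rm = TRUE)
  x_sd   <- stats::sd(x, na.rm = TRUE)
  # names = FALSE so the result names below are used as-is.
  q <- stats::quantile(x,
                       probs = c(0.05, 0.10, 0.25, 0.50, 0.75, 0.90, 0.95),
                       na.rm = TRUE,
                       names = FALSE)
  c(n      = sum(!is.na(x)),   # assumption: n counts non-missing values
    min    = x_min,
    max    = x_max,
    mean   = x_mean,
    median = stats::median(x, na.rm = TRUE),
    range  = x_max - x_min,
    sd     = x_sd,
    cv     = x_sd / x_mean,
    q05 = q[1], q10 = q[2], q25 = q[3], q50 = q[4],
    q75 = q[5], q90 = q[6], q95 = q[7])
}

# Toy metric values with one missing value.
x   <- c(2, 4, 4, NA, 10)
res <- calc_stats(x)
res[c("n", "mean", "median", "range")]
```

Note that cv is undefined (division by zero) when the mean of a metric is 0, so such values should be checked before use.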

Value

A data frame of metrics (rows) and statistics (columns). The result is in long format, with columns for INDEX_CLASS, RefStatus, and DataType.

Examples

# data, benthos
df_bugs <- data_mmi_dev

# Munge Names
names(df_bugs)[names(df_bugs) %in% "BenSampID"] <- "SAMPLEID"
names(df_bugs)[names(df_bugs) %in% "TaxaID"] <- "TAXAID"
names(df_bugs)[names(df_bugs) %in% "Individuals"] <- "N_TAXA"
names(df_bugs)[names(df_bugs) %in% "Exclude"] <- "EXCLUDE"
names(df_bugs)[names(df_bugs) %in% "Class"] <- "INDEX_CLASS"
names(df_bugs)[names(df_bugs) %in% "Unique_ID"] <- "SITEID"

# Calc Metrics
cols_keep <- c("Ref_v1", "CalVal_Class4", "SITEID", "CollDate", "CollMeth")
# INDEX_NAME and INDEX_CLASS kept by default
df_metval <- metric.values(df_bugs, "bugs", fun.cols2keep = cols_keep)

# Calc Stats
col_metrics   <- names(df_metval)[9:ncol(df_metval)]
col_SampID    <- "SAMPLEID"
col_RefStatus <- "REF_V1"
RefStatus_Ref <- "Ref"
RefStatus_Str <- "Strs"
RefStatus_Oth <- "Other"
col_DataType  <- "CALVAL_CLASS4"
DataType_Cal  <- "cal"
DataType_Ver  <- "verif"
col_Subset    <- "INDEX_CLASS"
Subset_Value  <- "CENTRALHILLS"
df_stats <- metric.stats(df_metval
                         , col_metrics
                         , col_SampID
                         , col_RefStatus
                         , RefStatus_Ref
                         , RefStatus_Str
                         , RefStatus_Oth
                         , col_DataType
                         , DataType_Cal
                         , DataType_Ver
                         , col_Subset
                         , Subset_Value)

## Not run: 
# Save Results
write.table(df_stats
            , file.path(tempdir(), "metric.stats.tsv")
            , col.names = TRUE
            , row.names = FALSE
            , sep = "\t")

## End(Not run)

leppott/BioMonTools documentation built on March 1, 2025, 7:18 a.m.