CSCI: Score samples using the CSCI tool
In SCCWRP/CSCI: Tools for California Freshwater Biotic Assessment

CSCI	R Documentation

Score samples using the CSCI tool

Description

A function that aggregates many of the steps involved in scoring the California Stream Condition Index (CSCI) into a single call. These steps include data quality flagging, conversion of taxonomic names, iterative subsampling (20 iterations), metric calculations, prediction of expected taxa and metric values, scoring, and aggregation into a final index. Input data include sample-wise raw, unprocessed taxonomy in a flat format and station-wise predictor data in a crosstab format. See the example data (bugs_stations) for reference. A complete description of the index is provided in Mazor et al. (in review). The O/E component of this function is adapted from John Van Sickle's RIVPACS model-building scripts.

Usage

CSCI(bugs, stations, rand = sample.int(10000, 1), distinct = FALSE)

Arguments

`bugs`	A data frame with BMI data (see details)
`stations`	A data frame with environmental data, one row per station (see details)
`rand`	An integer that sets the random number generator (RNG) seed for the subsampling. Defaults to `sample.int(10000, 1)`.
`distinct`	A logical value. If `TRUE`, the `Distinct` column in `bugs` is overwritten with `NA` values. The default (`FALSE`) leaves the column as is.

Details

A valid "bugs" data frame consists of the following columns: StationCode, SampleID, FinalID (i.e., taxon names), LifeStageCode ("A", "L", "P", or "X"), BAResult (i.e., taxon counts), and Distinct (a positive integer where the taxonomist has indicated distinctiveness, otherwise blank or 0). Values for FinalID and LifeStageCode must conform to values from the SWAMP lookup tables (http://swamp.mpsl.mlml.calstate.edu/). See the CSCI guidance document for details on these fields.

A valid "stations" data frame consists of the following columns: StationCode (must match the same column in the "bugs" data frame), BDH_AVE, ELEV_RANGE, KFCT_AVE, P_MEAN, LogWSA, New_Lat, New_Long, PPT_00_09, SITE_ELEV, SumAve_P, and TEMP_00_09. See the CSCI guidance document for details on these fields.

The data frames are also subject to the following constraints: no missing or blank cells in any field of either data frame (except for the Distinct column); every StationCode value in the "bugs" data frame must also appear under StationCode in the "stations" data frame; every SampleID must be associated with a single StationCode; and no duplicated data in either data frame (e.g., every combination of SampleID, FinalID, LifeStageCode, and Distinct should be unique in the "bugs" data frame).
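The constraints above can be checked before scoring. The following is a minimal sketch in base R, not part of the CSCI package: `check_csci_inputs` is a hypothetical helper, and the toy data frames hold made-up placeholder values used only to exercise the checks.

```r
# Hypothetical helper (not part of the CSCI package): checks the input
# constraints described above and returns a character vector of problems.
check_csci_inputs <- function(bugs, stations) {
  bug_cols <- c("StationCode", "SampleID", "FinalID",
                "LifeStageCode", "BAResult", "Distinct")
  station_cols <- c("StationCode", "BDH_AVE", "ELEV_RANGE", "KFCT_AVE",
                    "P_MEAN", "LogWSA", "New_Lat", "New_Long",
                    "PPT_00_09", "SITE_ELEV", "SumAve_P", "TEMP_00_09")
  problems <- character(0)

  # Required columns must be present before anything else can be checked.
  missing_cols <- c(setdiff(bug_cols, names(bugs)),
                    setdiff(station_cols, names(stations)))
  if (length(missing_cols) > 0) {
    return(paste("Missing column:", missing_cols))
  }

  # TRUE if any cell is NA or an empty/whitespace-only string.
  has_blank <- function(df) {
    any(vapply(df,
               function(x) any(is.na(x) | trimws(as.character(x)) == ""),
               logical(1)))
  }
  # Distinct is the only column allowed to be blank.
  if (has_blank(bugs[setdiff(bug_cols, "Distinct")])) {
    problems <- c(problems, "bugs has missing or blank cells")
  }
  if (has_blank(stations[station_cols])) {
    problems <- c(problems, "stations has missing or blank cells")
  }

  # Every StationCode in bugs must appear in stations.
  if (!all(bugs$StationCode %in% stations$StationCode)) {
    problems <- c(problems, "bugs has StationCode values absent from stations")
  }

  # Each SampleID may belong to only one StationCode.
  n_stations <- tapply(bugs$StationCode, bugs$SampleID,
                       function(x) length(unique(x)))
  if (any(n_stations > 1, na.rm = TRUE)) {
    problems <- c(problems, "a SampleID maps to more than one StationCode")
  }

  # No duplicated records: one row per taxon key in bugs, one row per station.
  key <- bugs[c("SampleID", "FinalID", "LifeStageCode", "Distinct")]
  if (any(duplicated(key))) {
    problems <- c(problems, "duplicated taxon rows in bugs")
  }
  if (any(duplicated(stations$StationCode))) {
    problems <- c(problems, "duplicated StationCode in stations")
  }
  problems
}

# Toy inputs with placeholder values (not real monitoring data).
bugs <- data.frame(
  StationCode = c("S1", "S1"),
  SampleID = c("S1_2010", "S1_2010"),
  FinalID = c("Baetis", "Simulium"),
  LifeStageCode = c("L", "L"),
  BAResult = c(12, 7),
  Distinct = c(NA, NA),
  stringsAsFactors = FALSE
)
stations <- data.frame(
  StationCode = "S1", BDH_AVE = 1.5, ELEV_RANGE = 500, KFCT_AVE = 0.2,
  P_MEAN = 0.1, LogWSA = 1.2, New_Lat = 34.1, New_Long = -118.2,
  PPT_00_09 = 500, SITE_ELEV = 300, SumAve_P = 40, TEMP_00_09 = 150,
  stringsAsFactors = FALSE
)

ok_problems <- check_csci_inputs(bugs, stations)

# Break two constraints at once: an unknown station, and a SampleID
# that now spans two stations.
bad <- bugs
bad$StationCode[2] <- "S2"
bad_problems <- check_csci_inputs(bad, stations)
print(bad_problems)
```

The helper collects every violation instead of stopping at the first one, so a data set can be cleaned in a single pass. It does not check FinalID or LifeStageCode against the SWAMP lookup tables.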

To produce replicable results, control the RNG seed with the rand argument. Any integer may be supplied; it is passed to set.seed.
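The effect of a fixed seed can be illustrated in base R. The sketch below is a stand-in for a single subsampling step and is not the package's internal code: `subsample_once`, the taxon counts, and the subsample size of 500 are all assumptions chosen for illustration.

```r
# Made-up counts for three taxa, expanded to one entry per individual.
counts <- c(Baetis = 300, Simulium = 250, Chironomidae = 450)
individuals <- rep(names(counts), times = counts)

# Hypothetical stand-in for one subsampling iteration.
subsample_once <- function(seed, size = 500) {
  set.seed(seed)                      # same seed gives the same RNG stream
  table(sample(individuals, size))    # draw `size` individuals without replacement
}

run1 <- subsample_once(seed = 42)
run2 <- subsample_once(seed = 42)
identical(run1, run2)  # TRUE: the two draws match exactly
```

Because the default for rand is itself a random draw, two calls that omit rand will generally subsample differently. Recording the seed that was used is what makes a run repeatable.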

Value

A list of data frames that serve as reports in varying detail:

`core`	A summary of the CSCI results, and data quality flags, averaged across 20 iterations.
`Suppl1_mmi`	A detailed breakdown of the results of the MMI component of the CSCI, averaged across 20 iterations.
`Suppl1_grps`	Probability of biotic group membership, in a SampleID-by-Group format.
`Suppl1_OE`	A detailed breakdown of the results of the O/E component of the CSCI, averaged across 20 iterations. Capture probabilities and mean abundances of each OTU are provided.
`Suppl2_mmi`	Similar to Suppl1_mmi, except broken down by iteration.
`Suppl2_OE`	Similar to Suppl1_OE, except broken down by iteration. Iteration-wise O/E scores are also provided.
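As a quick check on the returned list, the sketch below compares the component names documented above with what a run actually returns. It assumes the CSCI package and its dependencies are installed and skips the run otherwise; `expected_reports` is a name introduced here for illustration, and the seed value 42 is arbitrary.

```r
# Component names as listed in the Value section.
expected_reports <- c("core", "Suppl1_mmi", "Suppl1_grps",
                      "Suppl1_OE", "Suppl2_mmi", "Suppl2_OE")

# Only run the scoring if the package is available.
if (requireNamespace("CSCI", quietly = TRUE)) {
  library(CSCI)
  data(bugs_stations)
  results <- CSCI(bugs = bugs_stations[[1]],
                  stations = bugs_stations[[2]],
                  rand = 42)  # fixed seed so repeated runs agree

  # Which documented components are present in the returned list?
  print(expected_reports %in% names(results))

  # Dimensions of each report that is present.
  print(lapply(results[intersect(expected_reports, names(results))], dim))
}
```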

Author(s)

Mark Engeln marke@sccwrp.org

Raphael Mazor raphaelm@sccwrp.org

References

R. D. Mazor, A. Rehn, P. R. Ode, M. Engeln, K. Schiff. (2013) Development of a bioassessment tool for streams in heterogeneous regions: Accommodating environmental complexity through site specificity in the California Stream Condition Index. In review.

J. Van Sickle. (2010) R code to make predictions of O/E for a new set of sites based on a Random Forest predictive model (Version 4.2) [R script].

Examples

library(CSCI)
data(bugs_stations) # a list of two data frames: bugs and stations
results <- CSCI(bugs = bugs_stations[[1]], stations = bugs_stations[[2]])
ls(results) # see all the components of the report
results$core # see the core report

SCCWRP/CSCI documentation built on June 12, 2025, 11:40 p.m.