knitr::opts_chunk$set(collapse = TRUE , comment = "#>")
MBSStools
was created by Tetra Tech, Inc. in 2017 to meet the needs of Maryland Department of Natural Resources (DNR), Maryland Biological Stream Survey (MBSS) staff to complete data manipulations in an efficient and reproducible manner.
It is envisioned that this library will be a living program and will add additional functions and routines in the future.
Many of the examples in this vignette are included in the example sections in the corresponding functions. Each function in the MBSStools
library includes example data from MBSS.
The R library is saved on GitHub (https://github.com/leppott/MBSStools) for ease of updating and distribution. Issues can be tracked, fixed, and code is available for download. Current users of MBSStools
can update the library using the same code used to install the library (see below). Devtools is required to download from GitHub. At this time, there are no plans to submit MBSStools
to CRAN (R's library repository).
install.packages("devtools") library(devtools) install_github("leppott/MBSStools")
To contact the author directly email Erik.Leppo@tetratech.com.
Several packages were used to build the functionality in MBSStools
.
FlowSum; no extra packages
IonContrib; no extra packages
MapTaxaObs; readxl
and rgdal
metric.scores; dplyr
metric.values; dplyr
PHIcalc; no extra packages
These packages should install automatically when MBSStools
installs. But if
you encounter issues with a function not working ensure that the necessary
package dependencies are installed.
To install packages follow the example of the code below for installing dplyr
.
install.packages("dplyr")
There are several functions included in the library each with a particular focus on a dataset and the necessary calculations for data analysis.
Index of Biological Integrity (IBI) calculations for fish and benthic macroinvertebrates.
Taxonomic distribution maps for fish.
Stream discharge calculation.
Ion contribution to conductivity.
Physical Habitat Index (PHI) calculation.
Example data are included with each function.
MBSStools
calculates both the benthic macroinvertebrate IBI (BIBI) and fish
IBI (FIBI). The IBIs are described in the document Southerland et al. 2005 that
is included in the /extdata folder of the library.
There are two functions; metric.values() to calculate the metrics (e.g., total individuals or total taxa) and metric.scores() that converts the metric values to the appropriate score (e.g., 1, 3, or 5). The functions are the same for fish and benthic macroinvertebrates. Only the argument fun.Community needs to be changed to "bugs" or "fish".
MBSStools
does not aggregate or compile taxa lists. Taxa data with
site/sample information, counts, phylogenetic (phylum, class, order, etc), and
autecological (habit, tolerance value, and function feeding group) information
is required as an input to the functions. Fish data needs the additional values
of watershed (catchment) area (in acres), average stream width, and reach
length.
A background table (metrics_scoring
) is included and provides a table with the
metric names and scoring thresholds for each IBI and index region. This table
is populated with the appropriate values for the MBSS IBIs. This table can be
modified.
The functions metric.values() and metric.scores() require the dplyr
library.
Calculates the fish IBI.
The adjustment of metrics based on watershed area is automatic for fish metrics.
Fish metric values assumes the following fields (all upper case)
SITE = MBSS sample identifier.
FIBISTRATA = Fish region (COASTAL, COLD, EPIEDMONT, HIGHLAND).
ACREAGE = Catchment area in acres.
LEN_SAMP = Length of stream sampled; typically 75-meters.
AVWID = Average stream width (meters) for sampled site.
SPECIES = MBSS fish taxa name.
TOTAL = Number of sampled individuals corresponding to specified taxon.
TYPE = Fish group identifier (ALL CAPS); SCULPIN, DARTER, MADTOM, etc.
TROPHIC_MBSS = MBSS tropic status designations (ALL CAPS); OM, GE, IS, IV, etc.
PTOLR = Pollution tolerance level (ALL CAPS); T, I, NO TYPE.
** Sites with a single record indicating "No Fish Observed" receive a default score of 1.00.
# Metrics, Fish library(MBSStools) #(generate values then scores) myIndex <- "MBSS.2005.Fish" # Thresholds thresh <- metrics_scoring # get metric names for myIndex (myMetrics.Fish <- as.character(unique(thresh[thresh[,"Index.Name"]==myIndex,"MetricName.Other"]))) # Taxa Data myDF.Fish <- taxa_fish myMetric.Values.Fish <- metric.values(myDF.Fish, "fish", myMetrics.Fish, TRUE) knitr::kable(head(myMetric.Values.Fish)) # SCORE Metrics.Fish.Scores <- metric.scores(myMetric.Values.Fish , myMetrics.Fish , "Index.Name" , "FIBISTRATA" , thresh) # View Results knitr::kable(head(Metrics.Fish.Scores)) #Write results to csv file #write.csv(Metrics.Fish.Scores, "Filename.csv")
Bug metric values assumes the following fields (all upper case)
SITE = MBSS sample identifier.
TAXON = MBSS benthic macroinvertebrate name.
N_TAXA = Number of sampled individuals corresponding to specified taxon.
EXCLUDE = Non-unique taxa (i.e., parent taxon with one or more children taxa present in sample). "Y" = do not include in taxa richness metrics.
STRATA_R = Benthic macroinvertebrate region (COASTAL, EPIEDMONT, or HIGHLAND).
Phylogenetic fields:
PHYLUM, CLASS, ORDER, FAMILY, GENUS, OTHER_TAXA, TRIBE
Autoecological fields:
Valid values for FFG: COLLECTOR, FILTERER, PIERCER, PREDATOR, SCRAPER, SHREDDER
Valid values for HABIT: BU, CB, CN, SP, SW
Maryland Stream Waders (MSW) data should be first combined to family level and EXCLUDE recalculated.
Additional fields needed:
In addition to the metric scores a QC check on samples with too many (>120) or too few (<60) organisms is returned in the results. Those with more than the maximum (120) should be subsampled down to within the target range.
# Metrics, Index, Benthic Macroinvertebrates, genus # (generate values then scores) myIndex <- "MBSS.2005.Bugs" # Thresholds thresh <- metrics_scoring # get metric names for myIndex (myMetrics.Bugs.MBSS <- as.character(unique(thresh[thresh[,"Index.Name"]==myIndex,"MetricName.Other"]))) # Taxa Data myDF.Bugs.MBSS <- taxa_bugs_genus myMetric.Values.Bugs.MBSS <- metric.values(myDF.Bugs.MBSS, "bugs", myMetrics.Bugs.MBSS) # SCORE Metrics.Bugs.Scores.MBSS <- metric.scores(myMetric.Values.Bugs.MBSS, myMetrics.Bugs.MBSS, "INDEX.NAME", "STRATA_R", thresh) #Write results to csv file #write.csv(Metrics.Bugs.Scores.MBSS, "Filename.csv") # QC bug count Metrics.Bugs.Scores.MBSS[Metrics.Bugs.Scores.MBSS[,"totind"]>120,"QC_Count"] <- "LARGE" Metrics.Bugs.Scores.MBSS[Metrics.Bugs.Scores.MBSS[,"totind"]<60,"QC_Count"] <- "SMALL" Metrics.Bugs.Scores.MBSS[is.na(Metrics.Bugs.Scores.MBSS[,"QC_Count"]),"QC_Count"] <- "OK" # table of QC_Count knitr::kable(head(Metrics.Bugs.Scores.MBSS))
The Maryland Stream Wader index is from Stribling et al. 1999. Metric values and scores shown below for example data.
SITE = Sample identifier.
TAXON = MSW benthic macroinvertebrate name (Family-level identification).
N_TAXA = Number of taxon collected in sample.
EXCLUDE = Non-unique taxa (i.e., parent taxon with one or more children taxa present in sample).
"Y" = do not include in taxa richness metrics.
STRATA_R =This index has only two bioregions (CP=Coastal Plain, NCP=Non-Coastal Plain).
Phylogenetic fields:
PHYLUM, CLASS, ORDER, FAMILY
Autoecological field:
# Metrics, MSW Index, Benthic Macroinvertebrates, family myIndex <- "MSW.1999.Bugs" # Thresholds thresh <- metrics_scoring # get metric names for myIndex (myMetrics.Bugs.MSW <- as.character(unique(thresh[thresh[,"Index.Name"]==myIndex,"MetricName.Other"]))) # Taxa Data myDF.Bugs.MSW <- taxa_bugs_family myMetric.Values.Bugs.MSW <- metric.values(myDF.Bugs.MSW, "bugs", myMetrics.Bugs.MSW) knitr::kable(head(myMetric.Values.Bugs.MSW)) # SCORE Metrics.Bugs.Scores.MSW <- metric.scores(myMetric.Values.Bugs.MSW, myMetrics.Bugs.MSW, "INDEX.NAME", "STRATA_R", thresh) # View Results knitr::kable(head(Metrics.Bugs.Scores.MSW))
The intent of this function is to recreate the taxa distribution maps on the MBSS website: http://eyesonthebay.dnr.maryland.gov/mbss/fishes.cfm. The maps for all taxa can be generated from a single line of code. The maps generated with this function use a 'crosswalk' table that converts taxa names to map names.
The function MapTaxaObs() requires the readxl
and rgdal
packages.
The user will need GIS files for the state of Maryland for State, County, and Water.
The inputs for this function, besides the three GIS shapefiles, are taxa observations and a taxa name crosswalk (translation) table between taxa names and map names.
All inputs are included in the MBSStools
library and can be copied to the current directory with the code below.
# Set Working Directory wd <- getwd() # Create Example Data if Needed ## Create Directories myDir.create <- file.path(wd,"Data") ifelse(dir.exists(myDir.create)==FALSE,dir.create(myDir.create),"Directory already exists") myDir.create <- file.path(wd,"GIS") ifelse(dir.exists(myDir.create)==FALSE,dir.create(myDir.create),"Directory already exists") myDir.create <- file.path(wd,"Maps") ifelse(dir.exists(myDir.create)==FALSE,dir.create(myDir.create),"Directory already exists") ## Create Data ### Taxa Data myFiles <- c("AllFish_95to16.xls", "TaxaMapsCrossWalk20170731.xlsx") file.copy(file.path(path.package("MBSStools"),"extdata",myFiles),file.path(wd,"Data",myFiles)) ### GIS unzip(file.path(path.package("MBSStools"),"extdata","MD_GIS.zip"),exdir=file.path(wd,"GIS"))
Observation data with three columns needed for the tool to work (common name, latitude, and longitude). Additional columns can be appended (e.g., SiteYr) and will not affect the functionality.
# library library(readxl) # data files obs <- "AllFish_95to16.xls" xWalk <- "TaxaMapsCrossWalk20170731.xlsx" all.taxa <- system.file("extdata", obs, package = "MBSStools") taxa.lu <- system.file("extdata", xWalk, package = "MBSStools") # Read in Taxon observations df.taxa.obs <- readxl::read_excel(all.taxa, sheet=1,col_names=TRUE,skip=0) df.taxa.obs <- as.data.frame(df.taxa.obs) df.taxa.obs[,1] <- tolower(df.taxa.obs[,1]) #df.taxa.obs <- as.data.frame(cbind(tolower(df.taxa.obs[,"TaxaName"]), df.taxa.obs[,2:3])) colnames(df.taxa.obs)[1] <- "CommonName" head(df.taxa.obs)
Crosswalk data with common name and map name.
# library library(readxl) # data files obs <- "AllFish_95to16.xls" xWalk <- "TaxaMapsCrossWalk20170731.xlsx" all.taxa <- system.file("extdata", obs, package = "MBSStools") taxa.lu <- system.file("extdata", xWalk, package = "MBSStools") # Read in TaxaMapsCrossWalk.xlsx df.lu.taxa <- readxl::read_excel(taxa.lu,sheet=1,col_names=TRUE,skip=0) df.lu.taxa <- as.data.frame(df.lu.taxa[,c("CommonName","MapName")]) df.lu.taxa[,"CommonName"] <- tolower(df.lu.taxa[,"CommonName"]) # df.lu.taxa <- as.data.frame(cbind(tolower(df.lu.taxa[,"CommonName"]),df.lu.taxa[,"MapName"])) # colnames(df.lu.taxa)[1:2] <- c("CommonName","MapName") head(df.lu.taxa)
The code below can be used to generate the maps (n=131) associated with the 1995 to 2016 data. Any taxa without a match in the crosswalk table (n=54) are listed in a CSV file and the number reported to the user in the R console, but maps are not generated (by default). If the user wants all maps then the parameter 'onlymatches' should be set to 'FALSE' (as in the example below).
Updated in 2024 to replace the rgdal
package (deprecated in 2023) with the
sf
package.
# Set Working Directory wd <- getwd() # Inputs Obs <- "AllFish_95to16.xls" XWalk <- "TaxaMapsCrossWalk20170731.xlsx" wd <- getwd() # Create maps MapTaxaObs(Obs, XWalk, wd, onlymatches = FALSE)
The intent behind this function was to recreate the fish taxa distribution maps on the DNR website. The data inputs are not specific to fish and can be used for benthic macroinvertebrate data.
The taxa distribution map for brown trout is shown below.
Points with latitude and longitude are imported as WGS84 (CRS = 4326) then
converted to NAD83 Maryland (meters) (CRS = 26985) before plotting using the
sf
package.
# example map. # include only brown trout # brntrout.png # assume already created directories # Packages library(MBSStools) library(readxl) library(sf) # Set Working Directory wd <- tempdir() # Create Example Data if Needed ## Create Directories ## Create a “Data” folder, a “GIS” folder, and a “Maps” folder myDir.create <- file.path(wd, "Data") ifelse(dir.exists(myDir.create) == FALSE, dir.create(myDir.create), "Directory already exists") myDir.create <- file.path(wd, "GIS") ifelse(dir.exists(myDir.create) == FALSE, dir.create(myDir.create), "Directory already exists") myDir.create <- file.path(wd, "Maps") ifelse(dir.exists(myDir.create) ==FALSE, dir.create(myDir.create), "Directory already exists") ## Create Data ## Make copies of datafiles and place into “Data” folder ### Taxa Data myFiles <- c("AllFish_95to16.xls", "TaxaMapsCrossWalk20170731.xlsx") file.copy(file.path(path.package("MBSStools"), "extdata", myFiles), file.path(wd, "Data", myFiles)) ### GIS # Unzip GIS file in MBSS tools “extdata” folder and place in “GIS” folder unzip(file.path(path.package("MBSStools"), "extdata","MD_GIS.zip"), exdir = file.path(wd, "GIS")) # data files obs <- "AllFish_95to16.xls" xWalk <- "TaxaMapsCrossWalk20170731.xlsx" dirMain <- wd dirData <- "Data" dirGIS <- "GIS" verbose <- TRUE onlymatches <- TRUE all.taxa <- system.file("extdata", obs, package = "MBSStools") taxa.lu <- system.file("extdata", xWalk, package = "MBSStools") # Read in Taxon observations df.taxa.obs <- readxl::read_excel(all.taxa, sheet = 1, col_names = TRUE, skip = 0) df.taxa.obs <- as.data.frame(df.taxa.obs) df.taxa.obs[,1] <- tolower(df.taxa.obs[, 1]) colnames(df.taxa.obs)[1] <- "CommonName" ## Trim to browntrout df.taxa.obs <- df.taxa.obs[df.taxa.obs[, "CommonName"] == "brown trout", ] # Read in TaxaMapsCrossWalk.xlsx df.lu.taxa <- readxl::read_excel(taxa.lu, sheet = 1, col_names = TRUE, skip = 0) df.lu.taxa <- as.data.frame(df.lu.taxa[, c("CommonName", "MapName")]) df.lu.taxa[,"CommonName"] <- tolower(df.lu.taxa[, "CommonName"]) # 4. Munge Data #### # Compare TaxaName to CommonName taxa.names <- as.vector(df.lu.taxa[, "CommonName"]) matches <- as.vector(df.taxa.obs[, "CommonName"]) %in% taxa.names df.taxa.nomatch <- as.data.frame( unique(sort(df.taxa.obs[,"CommonName"][!matches]))) colnames(df.taxa.nomatch)[1] <- "CommonName" df.taxa.match <- as.data.frame( unique(sort(df.taxa.obs[, "CommonName"][matches]))) colnames(df.taxa.match)[1] <- "CommonName" #if (onlymatches == TRUE) { # Create a data frame of common names and filenames for matching taxa map.taxa <- subset(df.lu.taxa, CommonName %in% df.taxa.match[,"CommonName"], select = c(CommonName,MapName)) # 5. Mapping #### ppi <- 72 dsn <- file.path(dirMain, dirGIS) state <- sf::st_read(dsn = dsn, layer = "MD_State_Boundary") coastline <- sf::st_read(dsn = dsn, layer = "MD_Coast_Hydrology") counties <- sf::st_read(dsn = dsn, layer = "MD_Boundary_County_Detailed") i<-1 taxon <- as.character(map.taxa$CommonName[i]) filename <- map.taxa$MapName[i] df.taxon.sites <- subset(df.taxa.obs, df.taxa.obs[, "CommonName"] == taxon) df.taxon.sites <- subset(df.taxon.sites, !is.na(df.taxon.sites["Latitude83"])) plot(sf::st_geometry(state), col = "white", border = "gray") plot(sf::st_geometry(coastline), add = TRUE, col = "light blue", border = FALSE) plot(sf::st_geometry(counties), add = TRUE, col = "white", border = "darkslategray", lwd = 0.5) ## sites # 4326, WGS 84 # 6487, NAD83(2011)/Maryland in meters # 6488, NAD83(2011)/Maryland in US survey feet # 26985, NAD83/Maryland *this one* # 2248 sites_wgs84 <- sf::st_as_sf(df.taxon.sites, coords = c("Longitude83", "Latitude83"), crs = 4326) sites_nad83md_m <- sf::st_transform(sites_wgs84, crs = 26985) ### plot plot(sites_nad83md_m, pch = 21, col = "black", bg = "green", cex = 1.0, type = "p", add = TRUE) # open temp dir (Windows only) #shell.exec(tempdir())
Calculate stream discharge based on field measurements. Side channels that are properly identified in the data will be included.
The function FlowSum() requires no additional packages.
library(MBSStools) # data MBSS.flow <- MBSS.flow # calculate flow flow.cell <- FlowSum(MBSS.flow, returnType="cell") flow.sample <- FlowSum(MBSS.flow) # examine data View(flow.cell) View(flow.sample) # Example Save file (tab delimited text file) datetime <- format(Sys.time(),"%Y%m%d_%H%M%S") myYear <- "15" #write.table(flow.sample,paste0("SumFlow",myYear,"_",datetime,".tab"),row.names=FALSE,sep="\t")
The IonContrib() function calculates conductivity contribution from provided ion concentrations using Kohlrausch's Law. If a total conductivity measurement is not provided, the total conductivity is calculated as the sum of all ions present in the data. If the user provides a conductivity measurement, then ion contributions are a percentage of that number. In the latter case, "Other" is added as an ion to capture any percentage of total conductivity not represented by the provided ions. Plotting is done outside of this function.
Data will need to be in "wide" format. That is, a single record for each sample and the ions to be analyzed in columns.
A reference table of ions and their equivalent ionic conductance at infinite dilution is provided with the function as "MBSS.Ion.Ref". The function allows for input of a user supplied data frame if this table needs updating with more ions.
The function IonContrib() requires no additional packages.
knitr::kable(MBSS.Ion.Ref)
The columns for the ions need to match the reference table names. The function IonContrib() includes a reference table that is used by default but the user can supply their own reference table.
knitr::kable(head(MBSS.Ion.Data))
Example code is below. The user would then save the output (e.g., write.csv()).
library(MBSStools) # Load Data # MBSS.Ion.Data = example dataset data.ion <- MBSS.Ion.Data # Load Reference Table ref.ion <- MBSS.Ion.Ref # Calculate Percent Contribution by Ion (data table only) contrib.ion <- IonContrib(data.ion) knitr::kable(head(contrib.ion))
Dot charts are a good way to display the data. But any chart of the user's preference can be used.
# Load Data data.ion <- MBSS.Ion.Data # Load Reference Table ref.ion <- MBSS.Ion.Ref # Calculate Percent Contribution by Ion (data table only) contrib.ion <- IonContrib(data.ion) knitr::kable(contrib.ion) # Calculate Percent Contribution by Ion (data table and reference table) contrib.ion.2 <- IonContrib(data.ion, ref.ion) knitr::kable(contrib.ion.2) # Calculate Percent Contribution by Ion (data table, reference table, and conductivity values) # ## add dummy conductivity values for each sample myCond <- "Cond" data.ion[,myCond] <- 4000 contrib.ion.3 <- IonContrib(data.ion, ref.ion, myCond) knitr::kable(contrib.ion.3) # Save Results (use write.table()) # Plot Results myIons <- c("Chloride (mg/L)", "Bromide (mg/L)", "Nitrate-N (mg/L)", "Sulfate (mg/L)", "Sodium (mg/L)", "Potassium (mg/L)", "Magnesium (mg/L)", "Calcium (mg/L)" ) myIons.Contrib <- paste0("PctContrib.",myIons) mySite <- "BELK-101-X" data.plot <- subset(contrib.ion, contrib.ion[,"Site"]==mySite, select=c("Site","Cond.Total",myIons.Contrib,myIons)) ## Plot, one site (with cond value) dotchart(as.numeric(as.vector(data.plot[,myIons.Contrib])), labels=myIons, main=mySite, xlab="Percent Contribution", pch=19, pt.cex=1.2) mtext(paste0("Conductivity (uS/cm) = ",round(data.plot[,"Cond.Total"],1))) ## Plot all sites using panel.dotplot() in the lattice package # ## Plot as a horizontal bar plot # need to change margins to fit labels #par(mai=c(1,2,1,1)) barplot(as.numeric(as.vector(data.plot[,myIons.Contrib])), names.arg=myIons, main=mySite, xlab="Percent Contribution" , axes=TRUE, horiz=TRUE, las=1) mtext(paste0("Conductivity (uS/cm) = ",round(data.plot[,"Cond.Total"],1)))
The PHIcalc() function calculates the MBSS Physical Habitat Index (PHI), Paul et al. 2003. The report is included in the /extdata folder of the library.
There are two versions of the calculation based on pre- and post-2000. The function determines the correct calculation based on the data provided.
The input is a data frame with column names matching the variables collected in the field along with SampleID, bioregion, and area (acres).
The function PHIcalc() requires no additional packages.
x <- MBSS.PHI knitr::kable(head(x))
The function returns a dataframe of results that need to be saved (e.g., write.table()). The result table from the example data is shown below.
library(MBSStools) # data # MBSS.PHI = example dataset myData <- MBSS.PHI # calculate PHI PHI <- PHIcalc(myData) knitr::kable(head(PHI))
Included in the package is an example Shiny package to calculate the three IBI's for MBSS. The the function runShiny() requires the “shinyjs” package.
library(MBSStools) library(shinyjs) runShiny()
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.