Disclaimer

The Benthos package is currently in the early stages of development. Be aware that the Benthos package will be constantly updated and major changes may occur. Additionally, errors may exist. I ask that if you do find an error please send me an email (zsmith@icprb.org) or visit my GitHub page (https://github.com/zsmith27). If you are R savvy, the easiest way to communicate issues and solutions is to add an issue on my GitHub page.

Description

The Benthos package is useful for the general assessment of benthic macroinvertebrate assemblages. More than 80 functions have been written to quickly assess common assemblage metrics. This document provides an overview of the functions contained within the package and their intended use. \s\s

Installing the Package

Currently, the package is only available on GitHub. GitHub (https://github.com/) provides version control for R-packages and allows the package to be easily shared or copied. Once the Benthos package has been thoroughly vetted the ultimate goal will be to add the package to CRAN.

To install the Benthos package you will first need to install the package dplyr.

install.packages("dplyr")

You only need to install dplyr once to your computer. After you have installed dplyr, you will need to load it into your library and then use the function install_github() to install the Benthos package.

library(devtools)
install_github("zsmith27/Benthos", build_vignettes = TRUE)

I recommend that you run the script above each time you use the Benthos package to insure that you have the most up-to-date version.

Overview of Workflow

  1. Prepare data in R or another program to meet the formatting requirements (See Data Preparation).
  2. Use the check_data function to identify potential formatting issues or issues merging the taxa count data with the master (See Check Data).
  3. Calculate macroinvertebrate metrics.

Master Taxa List

The Master Taxa List (MTL) refers to a table containing taxonomic hierarchy information and taxonomic attributes (e.g., tolerance values, functional feeding groups, and habits). The Benthos package relies entirely on the MTL (master) to calculate metrics, which is embedded within the package. Use the following script to load the MTL:

data(master)

Many metrics require that a taxon be rolled up to lower taxonomic resolution. For example, if all of the taxa are identified to the genus-level, the counts must be rolled up and aggregated at the order-level to calculate the percentage of Ephemeroptera taxa. The master includes all of the necessary information for aggregating the reported taxa at various taxonomic levels.

The master also includes the available taxonomic attributes for calculating tolerance metrics, functional feeding group metrics, and habit metrics. During metric calculations the master is referenced to assign the appropriate taxonomic attribute.

If a taxon or taxa in your dataset are not present in the master, metrics cannot be correctly calculated. Therefore, you will receive an error if you attempt to include a taxon in your dataset that is not present in the master. Users are welcome to modify the master in R or to export the file as a .csv for modifications in other programs. To export as a .csv:

write.csv(master, "master.csv", row.names = FALSE)

If the master is missing a taxon or taxa that you would like to be included, please email me and I will add the taxon to the list as soon as possible. Additionally, it is possible to append accessory taxonomic attributes assigned by your agency to the master. I urge you to send me your agencies taxonomic attributes to be included in the publically available master. My goal is to create a comprehensive table of all of the available taxonomic attributes. This will allow agencies to fill gaps in their attribute assignments or to compare their attribute assignments to those of other agencies. If you would like your agencies attributes to be publically available in the Benthos package, please send me an email with the attributes in a .csv or .xlsx file. Please try to match the format of the master as closely as possible before sending your attribute table.

Data Preparation

Preparing your data for the Benthos package will be the most time consuming and difficult aspect of using the package. It is too difficult to accommodate various data formats. Therefore, I have created a strict data format that must be followed to use this package. For government agencies, I will consider adding a specific function to make data preparation quick and easy.

It is required that your data is in a long format with the following eight columns:

  1. UNIQUE_ID - A name that represents each unique sample. Data will be aggregate by the UNIQUE_ID during metric calculations. Therefore, you must make sure that each sample you want treated separately has a unique UNIQUE_ID. If you do not already have an UNIQUE_ID, I recommend that you concatenate STATION_ID, AGENCY_CODE, DATE, and SAMPLE_NUMBER columns. This can easily be done in R with the paste function. Example:
my.df$UNIQUE_ID <- paste(my.df$STATION_ID, my.df$AGENCY, my.df$DATE, my.df$REPLICATE, sep = "_")
  1. STATION_ID - Typically a one word or string of characters that represents a particular sampling location. These may be duplicated because samples were collected from the station multiple times or if replicates were collected.
  2. AGENCY_CODE - For most users this column will be unnecessary and you can fill it with any constant variable that you choose. However, some users may have data from multiple agencies or projects. This column allows you to carry that information through the metric calculation process.
  3. DATE - The date that the sample was collected. I do not convert this column to R's class Date. Therefore, the DATE column can be in any date format you prefer and time can be included as well.
  4. METHOD - The collection method.
  5. SAMPLE_NUMBER - Indicates the number of samples collected during a sampling event. If you do not have any replicates, then fill this column with 1's.
  6. FINAL_ID - The taxon collected.
    • Must be a single string of characters. To include species as the final ID, genus and species must be concatenated with an underscore separation (e.g., HAGENIUS_BREVISTYLUS).
    • No special or additional characters (e.g., "Caenis sp." or "Undetermined Chironomidae"). The final ID will be matched with a row in the master. Therefore, special or additional characters will prevent the taxon from being properly merged with the appropriate taxonomic hierarchy and attributes.
      • Advanced users can modify the master to include final IDs that they prefer or to add additional taxa to the list. If you do find that a taxon or taxa are missing from the master, please send me an email with as much information on the taxon as possible and I will add it to the master contained within this package.
  7. REPORTING_VALUE - The number of each taxon collected.

All of the column names and the FINAL_ID are capitalized to enable quick identification of typos that I find difficult with the Camel Case format. The prep_caps() function will automatically convert the column names and the characters in the FINAL_ID column to all capital letters.

Importing Taxa Counts into R

To import your data into R, I recommend you save you table as a .csv file and use the following script:

# Set working directory. 
# This should specify where your taxonomic counts table is located on your computer.
# Note: must be separated by forward slashes (/).
setwd("C:/Users/zsmith/Desktop")

# Specify the file you want to import and assign the dataframe a name.
# In this case I named the dataframe "taxa.df".
taxa.df <- read.csv("my_taxa.csv", stringsAsFactors = FALSE)

Data Check

Calculate All Metrics

The user has the ability to calculate metrics individually but it is recommended that the user calculate all metrics at once. The function all_metrics() does all of the work necessary to transform the data and calculate all of the available metrics in the Benthos package.

metrics.df <- all_metrics(long.df = taxa.df, master.df = master, taxa.rank = "FAMILY", pct_un = 0)
  1. long.df = Taxonomic counts in a long data format.
  2. master.df = The master taxa list contains taxonomic ranks from phylum to species and known taxonomic attributes.
    • If you modified the Master Taxa List (master), then you need to specify the modified Master Taxa List here.
  3. taxa.rank = The lowest taxonomic rank ("ORDER", "FAMILY", or "GENUS") used to calculate the metrics. If the majority of your taxa are identified to the family level, then it would be inappropriate to perform metric calculations at the genus level.

The wide function



zsmith27/Benthos documentation built on May 5, 2019, 2:38 a.m.