knitr::opts_chunk$set(echo=TRUE)
library(knitr)
library(CASTfxn)

Introduction

CASTfxn was created by Tetra Tech, Inc. for in 2018 to support the Causal Assessment Screening Tool. This tool aides data managers to more quickly analyze large amounts of data for causal assessment. It is not meant to be the final arbiter. But it can aid in the filtering the data.

It is envisioned that this library can be a living package and will add additional functions and routines in the future.

Many of the examples in this vignette are included in the examples in the corresponding functions. Each function in the CASTfxn library includes example data.

The typical user of this package is envisioned as a data manager familiar with the structure of the data being used as well as the process of causal assessment. A basic understanding in the use of R is recommended.

Installation

The R library is saved on GitHub for ease of updating and distribution. Issues can be tracked, fixed, and code made available for download. Current users can update the library using the same code used to install the library (see below). Devtools is required for the installation.

install.packages("devtools")
library(devtools)
install_github("leppott/CASTfxn")

To install a specific version of the tool the release can be added to the install command. For example to install version 9.2 (not a valid version at this point) the code would be install_github("leppott/CASTfxn@v9.2").

To contact the authors directly email Erik.Leppo@tetratech.com or Ann.Lincoln@tetratech.com.

Contents

  1. Preparing Raw Data

  2. CASTfxn Operations

Preparing Raw Data

The CASTfxn library makes some assumptions on file names and structure.

File Names

The details section of each function names the files and required fields necessary to run that function.

File Formats

The package does not import any files so there is no standard file format. The user is expected to have the ability and skill to import the files into R.

File Structure

The first line are the field (column) headers.

CASTfxn Functions

There is no single overall function but a series of functions.

Operations

The operations of the CASTfxn package are listed below but will be explained in more detail in their own section. The functions are listed not in alphabetical order but in the order the functions are intended to run. That is, some functions require the output from other functions.

The library assumes that there are the following folders in the working directory:

The folder "Results" is for some of the outputs.

Some of the functions assume that some datasets not included in the function parameters are loaded into the global environment. These are mentioned in the Details section of each function. It is anticipated that a future version of the package will explicitly list these datasets in the parameters of each function.

getSiteInfo

Output is a jpg map to the "Results" directory of the working directory. And a summary list; SiteInfo, Samps , BMImetrics, AlgMetrics, ReachInfo, COMID, ClustIDs, impair, and mods.

The summary list is returned to the console. It is not saved to file. The map is saved as SitID.map.jpg, e.g., SRCKN001.61.map.jpg.

TargetSiteID <- "SRCKN001.61"
dir_results <- file.path(getwd(), "Results")

CurrentDir <- getwd()
myDir.Data <- paste(CurrentDir,"data/",sep="/")

# data, example included with package
data.Stations.Info <- data_Sites
data.SampSummary   <- data_SampSummary
data.303d.ComID    <- data_303d
data.bmi.metrics   <- data_BMIMetrics
data.algae.metrics <- data_AlgMetrics
data.cluster       <- data_Cluster_Hi
data.mod           <- data_ReachMod

# Map data
map_flowline  <- data_GIS_Flow_HI
map_flowline2 <- data_GIS_Flow_LO
map_outline   <- data_GIS_AZ_Outline
# Project site data to USGS Albers Equal Area
usgs.aea <- "+proj=aea +lat_1=29.5 +lat_2=45.5 +lat_0=23
              +lon_0=-96 +x_0=0 +y_0=0 +datum=NAD83
              +units=m +no_defs +ellps=GRS80 +towgs84=0,0,0"
# projection for outline
my.aea <- "+proj=aea +lat_1=20 +lat_2=60 +lat_0=40 +lon_0=-96 +x_0=0 +y_0=0 
           +datum=NAD83 +units=m +no_defs +ellps=GRS80 +towgs84=0,0,0"
map_proj <- my.aea

# Run getSiteInfo
list.SiteSummary <- getSiteInfo(TargetSiteID
                                , dir_results
                                , data.Stations.Info
                                , data.SampSummary
                                , data.303d.ComID
                                , data.bmi.metrics
                                , data.algae.metrics
                                , data.cluster
                                , data.mod
                                , map_proj
                                , map_outline
                                , map_flowline)

The structure of the data is below.

str(list.SiteSummary)

The map (shown below) includes: state border (black line) NHDplus flowlines (light blue lines) all sites (dark gray close circles) cluster sites (cyan closed circles) reference sites (blue open circles) target site (red closed triangle)

TargetSiteID <- "SRCKN001.61"
myJPG <- paste0(TargetSiteID,".map.jpg")
fn.img <- file.path(getwd(), "Results", TargetSiteID, myJPG)
include_graphics(fn.img)

getChemDataSubsets

Get chemical data subsets.

Requires the output of:

The return is a summary list; ref.sites, ref.reaches, cluster.samps, chem.info , all.chems, cluster.chem, and site.chem.

TargetSiteID <- "SRCKN001.61"

CurrentDir<-getwd()
myDir.Data <- paste(CurrentDir,"data/",sep="/")
dir_results <- file.path(getwd(), "Results")

# Run getSiteInfo
# data, example included with package
data.Stations.Info <- data_Sites
data.SampSummary   <- data_SampSummary
data.303d.ComID    <- data_303d
data.bmi.metrics   <- data_BMIMetrics
data.algae.metrics <- data_AlgMetrics
data.mod           <- data_ReachMod
#
#' # Cluster based on elevation category  # need for getSiteInfo and getChemDataSubsets
elev_cat <- toupper(data.Stations.Info[data.Stations.Info[,"StationID_Master"]==TargetSiteID, "ElevCategory"])
if(elev_cat=="HI"){
   data.cluster <- data_Cluster_Hi
} else if(elev_cat=="LO") {
   data.cluster <- data_Cluster_Lo
}
#
# Map data
# AZ
map_flowline  <- data_GIS_Flow_HI
map_flowline2 <- data_GIS_Flow_LO
map_outline   <- data_GIS_AZ_Outline
# Project site data to USGS Albers Equal Area
usgs.aea <- "+proj=aea +lat_1=29.5 +lat_2=45.5 +lat_0=23
              +lon_0=-96 +x_0=0 +y_0=0 +datum=NAD83
              +units=m +no_defs +ellps=GRS80 +towgs84=0,0,0"
# projection for outline
my.aea <- "+proj=aea +lat_1=20 +lat_2=60 +lat_0=40 +lon_0=-96 +x_0=0 +y_0=0 
           +datum=NAD83 +units=m +no_defs +ellps=GRS80 +towgs84=0,0,0"
map_proj <- my.aea

#
list.SiteSummary <- getSiteInfo(TargetSiteID
                                , dir_results
                                , data.Stations.Info
                                , data.SampSummary
                                , data.303d.ComID
                                , data.bmi.metrics
                                , data.algae.metrics
                                , data.cluster
                                , data.mod
                                , map_proj
                                , map_outline
                                , map_flowline)

# Data getChemDataSubset
site.COMID <- list.SiteSummary$COMID
site.Clusters <- list.SiteSummary$ClustIDs
data.chem.raw      <- data_Chem
data.chem.info     <- data_ChemInfo

# Run getChemDataSubsets
list.data <- getChemDataSubsets(TargetSiteID
                                , comid=site.COMID
                                , cluster=site.Clusters
                                , data.cluster=data.cluster
                                , data.Stations.Info=data.Stations.Info
                                , data.chem.raw=data.chem.raw
                                , data.chem.info=data.chem.info)

The structure of the data is below.

str(list.data)

getStressorList

Box plots of each stressor, grouped by category.

Requires the output of:

Returns a jpeg in the "Results" subdirectory of the working directory with box plots. Also returns a list of stressors; stressors and site.stressor.pctrank.

The plots are named as SiteID.boxes.category.jpg, e.g., SRCKN001.61.boxes.Nutrients.jpg.

TargetSiteID <- "SRCKN001.61"

CurrentDir<-getwd()
myDir.Data <- paste(CurrentDir,"data/",sep="/")
dir_results <- file.path(getwd(), "Results")

# Data getSiteInfo
# data, example included with package
data.Stations.Info <- data_Sites        # need for getSiteInfo and getChemDataSubsets
data.SampSummary   <- data_SampSummary
data.303d.ComID    <- data_303d
data.bmi.metrics   <- data_BMIMetrics
data.algae.metrics <- data_AlgMetrics
data.mod           <- data_ReachMod

# Cluster based on elevation category  # need for getSiteInfo and getChemDataSubsets
elev_cat <- toupper(data.Stations.Info[data.Stations.Info[,"StationID_Master"]==TargetSiteID, "ElevCategory"])
if(elev_cat=="HI"){
   data.cluster <- data_Cluster_Hi
} else if(elev_cat=="LO") {
   data.cluster <- data_Cluster_Lo
}

# Map data
# AZ
map_flowline  <- data_GIS_Flow_HI
map_flowline2 <- data_GIS_Flow_LO
map_outline   <- data_GIS_AZ_Outline
# Project site data to USGS Albers Equal Area
usgs.aea <- "+proj=aea +lat_1=29.5 +lat_2=45.5 +lat_0=23
              +lon_0=-96 +x_0=0 +y_0=0 +datum=NAD83
              +units=m +no_defs +ellps=GRS80 +towgs84=0,0,0"
# projection for outline
my.aea <- "+proj=aea +lat_1=20 +lat_2=60 +lat_0=40 +lon_0=-96 +x_0=0 +y_0=0 
           +datum=NAD83 +units=m +no_defs +ellps=GRS80 +towgs84=0,0,0"
map_proj <- my.aea

# Run getSiteInfo
list.SiteSummary <- getSiteInfo(TargetSiteID, dir_results, data.Stations.Info
                                , data.SampSummary, data.303d.ComID
                                , data.bmi.metrics, data.algae.metrics
                                , data.cluster, data.mod
                                , map_proj, map_outline, map_flowline)

# Data getChemDataSubsets
# data import, example 
# data.chem.raw <- read.delim(paste(myDir.Data,"data.chem.raw.tab",sep=""),na.strings = c(""," "))
# data.chem.info <- read.delim(paste(myDir.Data,"data.chem.info.tab",sep=""))
# data, example included with package
site.COMID <- list.SiteSummary$COMID
site.Clusters <- list.SiteSummary$ClustIDs
data.chem.raw <- data_Chem
data.chem.info <- data_ChemInfo

# Run getChemDataSubsets
list.data <- getChemDataSubsets(TargetSiteID, comid=site.COMID, cluster=site.Clusters
                                , data.cluster=data.cluster, data.Stations.Info=data.Stations.Info
                                , data.chem.raw=data.chem.raw, data.chem.info=data.chem.info)

# datasets getStressorList
chem.info <- list.data$chem.info
cluster.chem <- list.data$cluster.chem
cluster.samps <- list.data$cluster.samps
ref.sites <- list.data$ref.sites
site.chem <- list.data$site.chem

# set cutoff for possible stressor identification
probsLow <- 0.10
probsHigh <- 0.90 

# Run getStressorList
list.stressors <- getStressorList(TargetSiteID, site.Clusters, chem.info, cluster.chem
                                 , cluster.samps, ref.sites, site.chem
                                 , probsHigh, probsLow)

The structure of the returned data is below.

str(list.stressors)

One text file is created.

And plots are created and saved per stressor group. The "nutrient" plot is shown below.

TargetSiteID <- "SRCKN001.61"
# filename
fn.pctrank <- file.path(getwd(), "Results", TargetSiteID
                        , paste0(TargetSiteID, ".chem.pctrank.txt"))
# read
df.pctrank <- read.delim(fn.pctrank)
# Structure
str(df.pctrank)
# create table
# first 6 rows and columns
kable(df.pctrank[1:6, 1:6], caption="partial table")
# box plot
myJPG <- paste0(TargetSiteID,".boxes.Nutrients.jpg")
fn.img <- file.path(getwd(), "Results", TargetSiteID, myJPG)
include_graphics(fn.img)

getClusterInfo

Summary cluster information.

Requires the output of:

The return is a jpeg in the "Results" subdirectory of the working directory.
The plots are names with SiteID.cluster.parameter.jpg, e.g., SRCKN001.61.cluster.ElevWs.jpg.

In this case example 95 plots are created.

TargetSiteID <- "SRCKN001.61"

CurrentDir<-getwd()
myDir.Data <- paste(CurrentDir,"data/",sep="/")
dir_results <- file.path(getwd(), "Results")

# Data getSiteInfo
# data, example included with package
data.Stations.Info <- data_Sites       # need for getSiteInfo and getChemDataSubsets
data.SampSummary   <- data_SampSummary
data.303d.ComID    <- data_303d
data.bmi.metrics   <- data_BMIMetrics
data.algae.metrics <- data_AlgMetrics
data.mod           <- data_ReachMod

# Cluster based on elevation category  # need for getSiteInfo and getChemDataSubsets
elev_cat <- toupper(data.Stations.Info[data.Stations.Info[,"StationID_Master"]==TargetSiteID, "ElevCategory"])
if(elev_cat=="HI"){
   data.cluster <- data_Cluster_Hi
} else if(elev_cat=="LO") {
   data.cluster <- data_Cluster_Lo
}

# Map data
# AZ
map_flowline  <- data_GIS_Flow_HI
map_flowline2 <- data_GIS_Flow_LO
map_outline   <- data_GIS_AZ_Outline
# Project site data to USGS Albers Equal Area
usgs.aea <- "+proj=aea +lat_1=29.5 +lat_2=45.5 +lat_0=23
              +lon_0=-96 +x_0=0 +y_0=0 +datum=NAD83
              +units=m +no_defs +ellps=GRS80 +towgs84=0,0,0"
# projection for outline
my.aea <- "+proj=aea +lat_1=20 +lat_2=60 +lat_0=40 +lon_0=-96 +x_0=0 +y_0=0 
           +datum=NAD83 +units=m +no_defs +ellps=GRS80 +towgs84=0,0,0"
map_proj <- my.aea

# Run getSiteInfo
list.SiteSummary <- getSiteInfo(TargetSiteID, dir_results, data.Stations.Info
                                , data.SampSummary, data.303d.ComID
                                , data.bmi.metrics, data.algae.metrics
                                , data.cluster, data.mod
                                , map_proj, map_outline, map_flowline)

# Data getChemDataSubsets
# data, example included with package
data.chem.raw <- data_Chem
data.chem.info <- data_ChemInfo
site.COMID <- list.SiteSummary$COMID
site.Clusters <- list.SiteSummary$ClustIDs

#
# Run getChemDataSubsets
list.data <- getChemDataSubsets(TargetSiteID, comid=site.COMID, cluster=site.Clusters
                                , data.cluster=data.cluster, data.Stations.Info=data.Stations.Info
                                , data.chem.raw=data.chem.raw, data.chem.info=data.chem.info)

# Data getClusterInfo
ref.reaches <- list.data$ref.reaches
refSiteCOMIDs <- list.data$ref.reaches

# Run getClusterInfo
getClusterInfo(site.COMID, site.Clusters, ref.reaches, dir_results)

For each variable in the test data set a "Ws" (Watershed) and "Cat" (Catchment) version are created. This is how the variables exist in the data.

TargetSiteID <- "SRCKN001.61"
myJPG <- paste0(TargetSiteID,".cluster.ElevWs.jpg")
fn.img <- file.path(getwd(), "Results", TargetSiteID, myJPG)
include_graphics(fn.img)

getBioMatches, BMI

Get BMI and chem sample matches.

Requires the output of:

The return is a summary list; all.b.str, cl.b.str, site.b.str, all.b.rsp , cl.b.rsp, and site.b.rsp. No files are created or saved.

TargetSiteID <- "SRCKN001.61"

CurrentDir<-getwd()
myDir.Data <- paste(CurrentDir,"data/",sep="/")
dir_results <- file.path(getwd(), "Results")

# Data getSiteInfo
# data, example included with package
data.Stations.Info <- data_Sites        # need for getSiteInfo and getChemDataSubsets
data.SampSummary   <- data_SampSummary
data.303d.ComID    <- data_303d
data.bmi.metrics   <- data_BMIMetrics
data.algae.metrics <- data_AlgMetrics
data.cluster       <- data_Cluster_Hi   # need for getSiteInfo and getChemDataSubsets
data.mod           <- data_ReachMod

# Map data
# AZ
map_flowline  <- data_GIS_Flow_HI
map_flowline2 <- data_GIS_Flow_LO
map_outline   <- data_GIS_AZ_Outline
# Project site data to USGS Albers Equal Area
usgs.aea <- "+proj=aea +lat_1=29.5 +lat_2=45.5 +lat_0=23
              +lon_0=-96 +x_0=0 +y_0=0 +datum=NAD83
              +units=m +no_defs +ellps=GRS80 +towgs84=0,0,0"
# projection for outline
my.aea <- "+proj=aea +lat_1=20 +lat_2=60 +lat_0=40 +lon_0=-96 +x_0=0 +y_0=0 
           +datum=NAD83 +units=m +no_defs +ellps=GRS80 +towgs84=0,0,0"
map_proj <- my.aea

# Run getSiteInfo
list.SiteSummary <- getSiteInfo(TargetSiteID, dir_results, data.Stations.Info
                                , data.SampSummary, data.303d.ComID
                                , data.bmi.metrics, data.algae.metrics
                                , data.cluster, data.mod
                                , map_proj, map_outline, map_flowline)

# Data getChemDataSubsets
site.COMID <- list.SiteSummary$COMID
site.Clusters <- list.SiteSummary$ClustIDs
# data, example included with package
data.chem.raw <- data_Chem
data.chem.info <- data_ChemInfo

# Run getChemDataSubsets
list.data <- getChemDataSubsets(TargetSiteID, comid=site.COMID, cluster=site.Clusters
                                , data.cluster=data.cluster, data.Stations.Info=data.Stations.Info
                                , data.chem.raw=data.chem.raw, data.chem.info=data.chem.info)

# Data getStressorList
chem.info <- list.data$chem.info
cluster.chem <- list.data$cluster.chem
cluster.samps <- list.data$cluster.samps
ref.sites <- list.data$ref.sites
site.chem <- list.data$site.chem

# set cutoff for possible stressor identification
probsLow <- 0.10
probsHigh <- 0.90 

# Run getStressorList
list.stressors <- getStressorList(TargetSiteID, site.Clusters, chem.info, cluster.chem
                                 , cluster.samps, ref.sites, site.chem
                                 , probsHigh, probsLow)

# Data getBioMatches
## remove "none"
stressors <- list.stressors$stressors[list.stressors$stressors != "none"]


# Run getBioMatches
biocomm <- "BMI"
data.bio.metrics <- data_BMIMetrics
list.MatchBMIData <- getBioMatches(stressors, list.data,  list.SiteSummary, data.SampSummary, data.chem.raw, data.bio.metrics, biocomm)

The structure of the returned data is below.

str(list.MatchBMIData)

getBioStressorResponses

Get biologica stressor responses.

Requires the output of:

The return is jpg in "Results" folder of working directory for each parameter. And a tab-delimited text file of stressor correlations.

Each parameter is plotted and scored. SCORING = ???????

The jpg files are in teh format of SiteID.BMI.SR.parameter.jpg, e.g., SRCKN001.61BMI.SR.Arsenic_inorg_uf_ug_L_DipTaxSPL_Sc.jpg.

The text file is in the format of ......

predint <- 0.75
varLegLoc <- "topright"

TargetSiteID <- "SRCKN001.61"

CurrentDir<-getwd()
myDir.Data <- paste(CurrentDir,"data/",sep="/")
dir_results <- file.path(getwd(), "Results")

# datasets getSiteInfo
# data, example included with package
data.Stations.Info <- data_Sites       # need for getSiteInfo and getChemDataSubsets
data.SampSummary   <- data_SampSummary
data.303d.ComID    <- data_303d
data.bmi.metrics   <- data_BMIMetrics
data.algae.metrics <- data_AlgMetrics
data.mod           <- data_ReachMod

# Cluster based on elevation category  # need for getSiteInfo and getChemDataSubsets
elev_cat <- toupper(data.Stations.Info[data.Stations.Info[,"StationID_Master"]==TargetSiteID, "ElevCategory"])
if(elev_cat=="HI"){
   data.cluster <- data_Cluster_Hi
} else if(elev_cat=="LO") {
   data.cluster <- data_Cluster_Lo
}

# Map data
# AZ
map_flowline  <- data_GIS_Flow_HI
map_flowline2 <- data_GIS_Flow_LO
map_outline   <- data_GIS_AZ_Outline
# Project site data to USGS Albers Equal Area
usgs.aea <- "+proj=aea +lat_1=29.5 +lat_2=45.5 +lat_0=23
              +lon_0=-96 +x_0=0 +y_0=0 +datum=NAD83
              +units=m +no_defs +ellps=GRS80 +towgs84=0,0,0"
# projection for outline
my.aea <- "+proj=aea +lat_1=20 +lat_2=60 +lat_0=40 +lon_0=-96 +x_0=0 +y_0=0 
           +datum=NAD83 +units=m +no_defs +ellps=GRS80 +towgs84=0,0,0"
map_proj <- my.aea

# Run getSiteInfo
list.SiteSummary <- getSiteInfo(TargetSiteID, dir_results, data.Stations.Info
                                , data.SampSummary, data.303d.ComID
                                , data.bmi.metrics, data.algae.metrics
                                , data.cluster, data.mod
                                , map_proj, map_outline, map_flowline)

# Data getChemDataSubsets
# data import, example 
site.COMID <- list.SiteSummary$COMID
site.Clusters <- list.SiteSummary$ClustIDs
# data, example included with package
data.chem.raw <- data_Chem
data.chem.info <- data_ChemInfo

# Run getChemDataSubsets
list.data <- getChemDataSubsets(TargetSiteID, comid=site.COMID, cluster=site.Clusters
                                , data.cluster=data.cluster, data.Stations.Info=data.Stations.Info
                                , data.chem.raw=data.chem.raw, data.chem.info=data.chem.info)

# Data getStressorList
chem.info <- list.data$chem.info
cluster.chem <- list.data$cluster.chem
cluster.samps <- list.data$cluster.samps
ref.sites <- list.data$ref.sites
site.chem <- list.data$site.chem

# set cutoff for possible stressor identification
probsLow <- 0.10
probsHigh <- 0.90 

# Run getStressorList
list.stressors <- getStressorList(TargetSiteID, site.Clusters, chem.info, cluster.chem
                                 , cluster.samps, ref.sites, site.chem
                                 , probsHigh, probsLow)

# Data getBioMatches
## remove "none"
stressors <- list.stressors$stressors[list.stressors$stressors != "none"]
stressors_logtransf <- list.stressors$stressors_LogTransf[list.stressors$stressors != "none"]
LogTransf <- stressors_logtransf

# Run getBioMatches
biocomm <- "BMI"
list.MatchBioData <- getBioMatches(stressors, list.data, biocomm)     

# Data getBMIStressorResponses   
biocomm <- "BMI"    
BioResp <- c("IBI", "TotalTaxSPL_Sc", "DipTaxSPL_Sc"
             , "IntolTaxSPL_Sc", "HBISPL_Sc", "PlecoPct_Sc", "ScrapPctSPL_Sc"
             , "TrichTax_Sc", "EphemTax_Sc", "EphemPct_Sc", "Dom01PctSPL_Sc") 

# Run getBMIStressorResponses                 
getBioStressorResponses(TargetSiteID, stressors, BioResp, list.MatchBioData
                        , LogTransf, ref.sites, predint, varLegLoc, biocomm)

The structure of the returned data is below.

str(list.MatchBMIData)

getBioMatches, Algae

Get Algae and chem sample matches.

Requires the output of:

The return is a summary list; all.b.str, cl.b.str, site.b.str, all.b.rsp, cl.b.rsp , and site.b.rsp.

The example data included with the package does not include algal data so no outputs are produced. The output is similar to getBMImatches.

TargetSiteID <- "LCBEN002.57"

CurrentDir<-getwd()
myDir.Data <- paste(CurrentDir,"data/",sep="/")
dir_results <- file.path(getwd(), "Results")

# Data getSiteInfo
# data, example included with package
data.Stations.Info <- data_Sites        # need for getSiteInfo and getChemDataSubsets
data.SampSummary   <- data_SampSummary
data.303d.ComID    <- data_303d
data.bmi.metrics   <- data_BMIMetrics
data.algae.metrics <- data_AlgMetrics
data.mod           <- data_ReachMod

# Cluster based on elevation category  # need for getSiteInfo and getChemDataSubsets
elev_cat <- toupper(data.Stations.Info[data.Stations.Info[,"StationID_Master"]==TargetSiteID, "ElevCategory"])
if(elev_cat=="HI"){
   data.cluster <- data_Cluster_Hi
} else if(elev_cat=="LO") {
   data.cluster <- data_Cluster_Lo
}

# Map data
# San Diego
#flowline <- rgdal::readOGR(dsn = "data_gis/NHDv2_Flowline_Ecoreg85", layer = "NHDv2_eco85_Project")
#outline <- rgdal::readOGR(dsn = "data_gis/Eco85", layer = "Ecoregion85")
# AZ
map_flowline  <- data_GIS_Flow_HI
map_flowline2 <- data_GIS_Flow_LO
map_outline   <- data_GIS_AZ_Outline
# Project site data to USGS Albers Equal Area
usgs.aea <- "+proj=aea +lat_1=29.5 +lat_2=45.5 +lat_0=23
              +lon_0=-96 +x_0=0 +y_0=0 +datum=NAD83
              +units=m +no_defs +ellps=GRS80 +towgs84=0,0,0"
# projection for outline
my.aea <- "+proj=aea +lat_1=20 +lat_2=60 +lat_0=40 +lon_0=-96 +x_0=0 +y_0=0 
           +datum=NAD83 +units=m +no_defs +ellps=GRS80 +towgs84=0,0,0"
map_proj <- my.aea

# Run getSiteInfo
list.SiteSummary <- getSiteInfo(TargetSiteID, dir_results, data.Stations.Info
                                , data.SampSummary, data.303d.ComID
                                , data.bmi.metrics, data.algae.metrics
                                , data.cluster, data.mod
                                , map_proj, map_outline, map_flowline)

# Data getChemDataSubsets
# data, example included with package
data.chem.raw <- data_Chem
data.chem.info <- data_ChemInfo
site.COMID <- list.SiteSummary$COMID
site.Clusters <- list.SiteSummary$ClustIDs

# Run getChemDataSubsets
list.data <- getChemDataSubsets(TargetSiteID, comid=site.COMID, cluster=site.Clusters
                                , data.cluster=data.cluster, data.Stations.Info=data.Stations.Info
                                , data.chem.raw=data.chem.raw, data.chem.info=data.chem.info)

# Data getStressorList
chem.info <- list.data$chem.info
cluster.chem <- list.data$cluster.chem
cluster.samps <- list.data$cluster.samps
ref.sites <- list.data$ref.sites
site.chem <- list.data$site.chem

# set cutoff for possible stressor identification
probsLow <- 0.10
probsHigh <- 0.90 

# Run getStressorList
list.stressors <- getStressorList(TargetSiteID, site.Clusters, chem.info, cluster.chem
                                 , cluster.samps, ref.sites, site.chem
                                 , probsHigh, probsLow)

# Data getBioMatches
## remove "none"
stressors <- list.stressors$stressors[list.stressors$stressors != "none"]

# Run getBioMatches
biocomm <- "algae"
list.MatchAlgData <- getBioMatches(stressors, list.data, biocomm)

The structure of the returned data is below.

str(list.MatchAlgData)

getBioStressorResponses, Algae

Algae stressor regressions.

Requires the output of:

The return is a jpg in "Results" folder of working directory for each paramter. And a tab-delimited text file of stressor correlations.

The example data included with the package does not include algal data so no outputs are produced. The output is similar to getBMIStressorResponses

predint <- 0.75
varLegLoc <- "topright"

TargetSiteID <- "LCBEN002.57"

CurrentDir<-getwd()
myDir.Data <- paste(CurrentDir,"data/",sep="/")
dir_results <- file.path(getwd(), "Results")

# datasets getSiteInfo
# data, example included with package
data.Stations.Info <- data_Sites       # need for getSiteInfo and getChemDataSubsets
data.SampSummary   <- data_SampSummary
data.303d.ComID    <- data_303d
data.bmi.metrics   <- data_BMIMetrics
data.algae.metrics <- data_AlgMetrics
data.mod           <- data_ReachMod

#' # Cluster based on elevation category  # need for getSiteInfo and getChemDataSubsets
elev_cat <- toupper(data.Stations.Info[data.Stations.Info[,"StationID_Master"]==TargetSiteID, "ElevCategory"])
if(elev_cat=="HI"){
   data.cluster <- data_Cluster_Hi
} else if(elev_cat=="LO") {
   data.cluster <- data_Cluster_Lo
}

# Map data
# San Diego
# AZ
map_flowline  <- data_GIS_Flow_HI
map_flowline2 <- data_GIS_Flow_LO
map_outline   <- data_GIS_AZ_Outline
# Project site data to USGS Albers Equal Area
usgs.aea <- "+proj=aea +lat_1=29.5 +lat_2=45.5 +lat_0=23
              +lon_0=-96 +x_0=0 +y_0=0 +datum=NAD83
              +units=m +no_defs +ellps=GRS80 +towgs84=0,0,0"
# projection for outline
my.aea <- "+proj=aea +lat_1=20 +lat_2=60 +lat_0=40 +lon_0=-96 +x_0=0 +y_0=0 
           +datum=NAD83 +units=m +no_defs +ellps=GRS80 +towgs84=0,0,0"
map_proj <- my.aea

# Run getSiteInfo
list.SiteSummary <- getSiteInfo(TargetSiteID, dir_results, data.Stations.Info
                                , data.SampSummary, data.303d.ComID
                                , data.bmi.metrics, data.algae.metrics
                                , data.cluster, data.mod
                                , map_proj, map_outline, map_flowline)

# Data getChemDataSubsets
# data, example included with package
data.chem.raw <- data_Chem
data.chem.info <- data_ChemInfo
site.COMID <- list.SiteSummary$COMID
site.Clusters <- list.SiteSummary$ClustIDs

# Run getChemDataSubsets
list.data <- getChemDataSubsets(TargetSiteID, comid=site.COMID, cluster=site.Clusters
                                , data.cluster=data.cluster, data.Stations.Info=data.Stations.Info
                                , data.chem.raw=data.chem.raw, data.chem.info=data.chem.info)

# Data getStressorList
chem.info <- list.data$chem.info
cluster.chem <- list.data$cluster.chem
cluster.samps <- list.data$cluster.samps
ref.sites <- list.data$ref.sites
site.chem <- list.data$site.chem

# set cutoff for possible stressor identification
probsLow <- 0.10
probsHigh <- 0.90 

# Run getStressorList
list.stressors <- getStressorList(TargetSiteID, site.Clusters, chem.info, cluster.chem
                                 , cluster.samps, ref.sites, site.chem
                                 , probsHigh, probsLow)

# Data getBioMatches
## remove "none"
stressors <- list.stressors$stressors[list.stressors$stressors != "none"]
stressors_logtransf <- list.stressors$stressors_LogTransf[list.stressors$stressors != "none"]
LogTransf <- stressors_logtransf

# Run getBioMatches
biocomm <- "algae"
list.MatchBioData <- getBioMatches(stressors, list.data, biocomm)

# Data getBioStressorResponses
data.algae.metrics <- data_AlgMetrics
BioResp <- colnames(data.algae.metrics[6:48])
biocomm <- "algae"

# Run getAlgStressorResponses
getBioStressorResponses(TargetSiteID, stressors, BioResp, list.MatchBioData
                       , LogTransf, ref.sites, predint, varLegLoc, biocomm)

The structure of the returned data is below.

str(list.MatchAlgData)

getStressorSpecificRegressions

Get stressor specific regressions.

Requires the output of:

The return is Jpeg files to "Results" folder in working directory. And a tab-delimited text file.

predint <- 0.75
varLegLoc <- "topright"

TargetSiteID <- "SRCKN001.61"

CurrentDir <- getwd()
myDir.Data <- paste(CurrentDir,"data/",sep="/")

# datasets getSiteInfo
# data, example included with package
data.Stations.Info <- data_Sites
data.SampSummary   <- data_SampSummary
data.303d.ComID    <- data_303d
data.bmi.metrics   <- data_BMIMetrics
data.algae.metrics <- data_AlgMetrics
data.cluster       <- data_Cluster_Hi
data.mod           <- data_ReachMod
#
# Run getSiteInfo
list.SiteSummary <- getSiteInfo(TargetSiteID)

# datasets getChemDataSubsets
site.COMID <- list.SiteSummary$COMID
site.Clusters <- list.SiteSummary$ClustIDs

# data import, example 
# data.chem.raw <- read.delim(paste(myDir.Data,"data.chem.raw.tab",sep=""),na.strings = c(""," "))
# data.chem.info <- read.delim(paste(myDir.Data,"data.chem.info.tab",sep=""))

# data, example included with package
data.chem.raw <- data_Chem
data.chem.info <- data_ChemInfo

# Run getChemDataSubsets
list.data <- getChemDataSubsets(TargetSiteID, site.COMID, site.Clusters)

# datasets getStressorList
chem.info <- list.data$chem.info
cluster.chem <- list.data$cluster.chem
cluster.samps <- list.data$cluster.samps
ref.sites <- list.data$ref.sites
site.chem <- list.data$site.chem

# set cutoff for possible stressor identification
probsLow <- 0.10
probsHigh <- 0.90 

# Run getStressorList
list.stressors <- getStressorList(TargetSiteID, site.Clusters, chem.info, cluster.chem
                                 , cluster.samps, ref.sites, site.chem
                                 , probsHigh, probsLow)

# datasets getBMIMatches
## remove "none"
stressors <- list.stressors$stressors[list.stressors$stressors != "none"]

# Run getBMIMatches
list.MatchBMIData <- getBMIMatches(stressors, list.data)  

# datasets getStressorSpecificRegressions
# data import, example
# data.bmi.taxa.raw <- read.delim(paste(myDir.Data,"data.bmi.taxa.raw.tab",sep=""))
# data.SSTV.totabund <- read.delim(paste(myDir.Data,"data.totabund.bySample.tab",sep=""))
#
# data, example included with package
data.bmi.taxa.raw <- data_BMIcounts
data.SSTV.totabund <- data_BMIRelAbund

# Run getStressorSpecificRegressions
getStressorSpecificRegressions(list.MatchBMIData)

The structure of the returned data is below.

str(list.MatchAlgData)

getSSDplot

Species Sensitify Distribution (SSD) plots

Generates plots of the proportion of species affected at different exposure levels in laboratory toxicity tests.

Plots are generated from a single set of inputs. They are created with ggplot and are returned to the console as a variable (e.g., "p1"). It is up to the user to save the output.

The plot below reproduces the plot contained in the ssd_generator_v1.xlsm file on the CADDIS website.

# Example 3 
# https://www.epa.gov/caddis-vol4/caddis-volume-4-data-analysis-download-software
# ssd_generator_v1.xlsm
myDF <- data_SSD_generator
myRT   <- "ResponseType"
myTaxa <- "Taxa"
myExp  <- "Exposure"
# Run function
p3 <- getSSDplot(myDF, myRT, myTaxa, myExp)
p3

For information on interpretation see the CADDIS website.

https://www.epa.gov/caddis-vol4/caddis-volume-4-data-analysis-advanced-analyses-controlling-for-natural-variability#tab-5

getCoOccur

Generates a PDF with 2 plots per site for co-occurrence of measured chem and bio data.

Saves PDF of plots and a scores files (tab separated) to user defined directory. Each SiteID is a different directory.

The example data for this function is from a different state from the other example data so SiteID's and stressor data will not match.

#Load Data
df.data <- data_CoOccur
#
col.Group     <- "Group"
col.Bio       <- "CSCI"
col.Stressors <- c("DO_uf_mg_L", "SpecCond_uf_µS_cm", "TN_uf_mg_L", "TP_mg_L")
col.ID        <- c("StationID_Master")
#
Bio.Nar.Brk <- c(-2, 0.62, 0.799, 0.919, 2)
Bio.Nar.Lab <- c("very likely altered", "likely altered"
                , "possibly altered ", "likely intact")
Bio.Deg.Brk <- c(-2, 0.799, 2)
Bio.Deg.Lab <- c("Degraded", "Good")
dir.plots <- file.path(getwd(), "Results")
#
ID.plot <- c("SMC08335", "901SJSJC9", "911TCAM01", "403STC004")
#
getCoOccur(df.data, ID.plot, col.ID, col.Group, col.Bio, col.Stressors
        , Bio.Nar.Brk, Bio.Nar.Lab, Bio.Deg.Brk, Bio.Deg.Lab 
        , dir.plots
        )

The output is a PDF with a box plot of within cluster data for a single parameter where the biological score was better than the test site. Points are the data that contribute to the box plot but are color coded by biological condition; Degraded (gray) and Good (cyan). The red dashed line represents the parameter value of the test site. If there are multiple lines then there are multiple samples for the test site. The x-axis has the name of the parameter, the number of comparator sites (n), and the cooccurrence score for the site.

Samples are scored:

  1. Supports the case for candidate cause. Stressor levels at the test sites are above the 75th percentile of comparator sites having higher biological quality.

  2. Indeterminate. Stressor levels at the test site are below the 50th percentile of comparator sites having higher biological quality.

-1. Weakens the case for the candidate cause. Stressor levels at the test sites are between the 50th and 75th percentile of comparator sites having higher biological quality.

The second plot is the relative probablity of degraded condition. The red dashed line is the value of the parameter at the test site. The lines are the 20th and 50th percentiles.

The scores text file has the biological score, cluster, parameter, value,
statistics for the cluster (number of samples, 25th-, 50th-, and 75th-percentiles), and the cooccurence score.

(In ext data folder) 901SJSJC9.CoOccurrence_Scores_20180802_123825.tsv

901SJSJC9_CoOccurrence_20180802_123825.pdf

system('open "901SJSJC9_CoOccurrence_20180802_123825.pdf")

file.path(path.package("CASTfxn"), "extdata")

dir.pkg <- file.path(path.package("CASTfxn"), "extdata")
fn.pdf <- file.path(dir.pkg, "901SJSJC9_CoOccurrence_20180802_123825.pdf")
fn.tsv <- file.path(dir.pkg, "901SJSJC9.CoOccurrence_Scores_20180802_123825.tsv")
fn.jpg <- file.path(dir.pkg, "901SJSJC9_CoOccurrence_DO.jpg")
#
# Open PDF
#system(paste0("open ",fn.pdf, ""))
#
# data table
df.tsv <- read.delim(fn.tsv)
kable(df.tsv, caption="CoOccurence Scores")
#
# image
#include_graphics(fn.jpg)
# Output

It is up to the user to interpret the output of CAST.  CAST is meant as a 
screening tool so the user will need to use best professional judgment to draw
any conclusions.

## Code
Example code is below to generate the outputs from each of the CAST functions.
This example uses a mix of the internal example data files and user files.
```r
# Install CASTfxn ####
# library(devtools)
# install_github("ALincolnTt/CASTfxn")

# Set up ####
library(CASTfxn)
library(readxl)

#Dir.Base <- "C:/Users/ann.lincoln/OneDrive - Tetra Tech, Inc/_ActiveProjects/AZ/RCodeData/ChemData"
#Dir.Base <- "P:/Current/OtherGov/City of San Diego/FY2018/CAST_2018/QC_20180705/data"
Dir.Base <- "C:/Users/Erik.Leppo/OneDrive - Tetra Tech, Inc/MyDocs_OneDrive/GitHub/CASTfxn/vignettes/Results"
#setwd("C:\\Users\\Erik.Leppo\\OneDrive - Tetra Tech, Inc\\MyDocs_OneDrive\\GitHub\\CASTfxn_Ann\\CASTfxn")

#Dir.Base <- "C:\\Users\\Erik.Leppo\\OneDrive - Tetra Tech, Inc\\MyDocs_OneDrive\\SanDiego\\Results"
setwd(Dir.Base)

# Site Selection ####
TargetSiteID <- "LCNUT011.29"  # "VRELL009.02"
TargetElevation <- "HI"

# Read Files ####
#Read all data files
data.SampSummary <- data_SampSummary
data.mod         <- data_ReachMod
data.chem.info   <- data_ChemInfo
data.303d.ComID  <- data_303d

# Subset ####
# These need subsetting
if (TargetElevation == "HI") {
    data.cluster   <- data_Cluster_Hi
    fn.coOccur.hi  <- file.path(Dir.Base,"AZCoOccurData_HI.tab")
    data.coOccur   <- read.table(fn.coOccur.hi, header = TRUE, sep = "\t")
    ibi.thresholds <- c(45,52)
} else {
    data.cluster   <- data_Cluster_Lo
    fn.coOccur.lo  <- file.path(Dir.Base, "AZCoOccurData_LO.tab")
    data.coOccur   <- read.table(fn.coOccur.lo, header = TRUE, sep = "\t")
    ibi.thresholds <- c(39,50)
}

# CAST, Stations ####
data.Stations.Info <- data_Sites
data.Stations.Info <- data.Stations.Info[data.Stations.Info$ElevCategory == TargetElevation,]

# CAST, Chem ####
## Use data rather than example file
data.chem.raw <- data_Chem
#
fn.cheminfo2   <- file.path(Dir.Base, "AZStressorInfoFinal.xlsx")
data.chem.info2 <- read_excel(fn.cheminfo2, sheet = 1, skip = 0)
# #
# fn.chem <- file.path(Dir.Base, "AZStressorDataFinal.tab")
# data.chem.raw <- read.delim(fn.chem, na.strings = "NA")
#
analytes      <- data.chem.info$StdParamName[data.chem.info$UseInStressorID == 1]
data.chem.raw <- data.chem.raw[data.chem.raw$StdParamName %in% analytes,]
data.chem.raw <- data.chem.raw[data.chem.raw$ElevCategory == TargetElevation,]

# CAST, BMI, metrics ####
data.bmi.metrics <- data_BMIMetrics
data.bmi.metrics <- data.bmi.metrics[data.bmi.metrics$ElevCategory == TargetElevation,]
data.bmi.metrics <- data.bmi.metrics[data.bmi.metrics$StationID_Master != "VROAK042.78",]
data.bmi.metrics <- data.bmi.metrics[data.bmi.metrics$StationID_Master != "SRHAG007.47",]
data.bmi.metrics <- data.bmi.metrics[,c("StationID_Master","BMISampID",
                                        "BMI.Metrics.SampID","CollDate",
                                        "ElevCategory","NarRat","IBI",
                                        "TotalTaxSPL_Sc","DipTaxSPL_Sc",
                                        "IntolTaxSPL_Sc","HBISPL_Sc","PlecoPct_Sc",
                                        "ScrapPctSPL_Sc","ScrapTaxSPL_Sc",
                                        "TrichTax_Sc","EphemTax_Sc","EphemPct_Sc",
                                        "Dom01PctSPL_Sc")]
data.bmi.metrics <- data.bmi.metrics[, unlist(lapply(data.bmi.metrics, 
                                      function(x) !all(is.na(x))))]

# CAST, Alg, metrics ####
fn.alg.metrics     <- file.path(Dir.Base, "AZAlgaeMetrics.tab")
data.algae.metrics <- read.table(fn.alg.metrics, header = TRUE, sep = "\t")
data.algae.metrics <- data.algae.metrics[!is.na(data.algae.metrics),]

# CAST, misc ####
fn.bmi.raw        <- file.path(Dir.Base, "AZBenthicCountsFinal.tab")
data.bmi.taxa.raw <- read.table(fn.bmi.raw, header = TRUE, sep = "\t")

fn.MT.bmi   <- file.path(Dir.Base, "AZBenthicMasterTaxa.tab")
data.MT.bmi <- read.table(fn.MT.bmi, header = TRUE, sep = "\t", 
                          stringsAsFactors = FALSE)

# read LkpDir (.tab file)
# fn.lkpdir <- "C:/Users/ann.lincoln/OneDrive - Tetra Tech, Inc/_ActiveProjects/AZ/RCodeData/ChemData/AZLkpDir.tab"
# data.lkp.dir <- read.table(fn.lkpdir, header = TRUE, sep = "\t", 
#                            col.names = c("StdParamName","DipTaxSPL_Sc",
#                                          "Dom01PctSPL_Sc","EphemPct_Sc",
#                                          "EphemTax_Sc","HBISPL_Sc","IBI",
#                                          "IntolTaxSPL_Sc","PlecoPct_Sc",
#                                          "ScrapPctSPL_Sc","ScrapTaxSPL_Sc",
#                                          "TotalTaxSPL_Sc","TrichTax_Sc"),
#                            colClasses = c("character","numeric","numeric", 
#                                           "numeric","numeric","numeric","numeric",
#                                           "numeric","numeric","numeric",
#                                           "numeric","numeric","numeric"))
# data.lkp.dir <- data_LkpDir

list.SiteSummary <- getSiteInfo(TargetSiteID)
# Returns: mySiteSummary <- list(SiteInfo = mySiteInfo, Samps = mySamps, 
                                            # BMImetrics = myBMImetrics, 
                                            # AlgMetrics = myAlgaeMetrics, 
                                            # ReachInfo = myReachInfo, 
                                            # COMID = myCOMID, 
                                            # ClustIDs = myClustIDs)
site.COMID <- list.SiteSummary$COMID
site.Clusters <- list.SiteSummary$ClustIDs
SiteInfo <- list.SiteSummary$SiteInfo

list.data <- getChemDataSubsets(TargetSiteID, site.COMID, site.Clusters)
# mySubsets <- list(ref.sites = refSiteIDs, 
                    # ref.reaches = refSiteCOMIDs, 
                    # cluster.samps = cluster.chem.samps, 
                    # chem.info = chems.groups.sort, 
                    # all.chems = all.chems3, 
                    # cluster.chem = cluster.chem.tab5, 
                    # site.chem = site.chem4)
ref.sites <- list.data$ref.sites
ref.reaches <- list.data$ref.reaches
cluster.samps <- list.data$cluster.samps
cluster.chem <- list.data$cluster.chem
site.chem <- list.data$site.chem
chem.info <- list.data$chem.info

# Get Cluster Info
getClusterInfo(site.COMID, site.Clusters, ref.reaches, dir_results)

# CAST, Stressors ####
# Get Stressor List
list.stressors <- getStressorList(TargetSiteID, site.Clusters, chem.info, 
                                  cluster.chem, cluster.samps, ref.sites, 
                                  site.chem, probsHigh=0.75, probsLow=0.25)
# myStressors <- list(stressors = stressorlist, site.stressor.pctrank = site.pctrank)

stressors <- list.stressors$stressors

if ((length(stressors) == 1) && stressors[1] == "none") {
    # No stressors returned
    print(paste("No stressors identified for site", TargetSiteID, sep = " "))
    utils::flush.console()
} else {
  #
    stressors <- c(stressors[2:length(stressors)])
    # Get BMI matches
    list.MatchBMIData <- getBMIMatches(stressors, list.data)
    # myBMIMatchData <- list(all.b.str = all.mbmi.stress
    #                        , cl.b.str = cl.mbmi.stress
    #                        , site.b.str = site.mbmi.stress
    #                        , all.b.rsp = all.mbmi.resp
    #                        , cl.b.rsp = cl.mbmi.resp
    #                        , site.b.rsp = site.mbmi.resp)

    # Get BMI Stressor Responses
    getBMIStressorResponses(stressors, list.MatchBMIData)

    if (TargetSiteID %in% unique(data.coOccur$StationID_Master)) {
        getCoOccur(data.coOccur, ID.plot=TargetSiteID, col.ID="StationID_Master"
                , col.Group="Group", col.Bio="IBI", col.Stressors=c(stressors)
                , Bio.Nar.Brk=c(0, ibi.thresholds, 100)
                , Bio.Nar.Lab=c("Impaired","Marginal","Good")
                , Bio.Deg.Brk=c(0, min(ibi.thresholds), 100)
                , Bio.Deg.Lab=c("Impaired", "Good"))
    }

    # Get Stressor-specific regressions
    predint <- 0.75
    varLegLoc <- "topright"
    myDir.Data <- getwd()
    # read in relative abundance file
    fn.RelAbund <- "AZ_Benthics_RelAbund.xlsx"
    BMIRelAbund <- read_excel(fn.RelAbund, sheet=1)
    #data.SSTV.totabund <- data_BMIRelAbund
    data.SSTV.totabund <- BMIRelAbund

    getStressorSpecificRegressions(list.MatchBMIData, predint, varLegLoc)
    # }

    # Erik, 20180712, commented out since no data
    # list.MatchAlgData <- getAlgMatches(stressors, list.data)
    # # myAlgMatchData <- list(all.a.str = all.malg.stress
    # #                        , cl.a.str = cl.malg.stress
    # #                        , site.a.str = site.malg.stress
    # #                        , all.a.rsp = all.malg.resp
    # #                        , cl.a.rsp = cl.malg.resp
    # #                        , site.a.rsp = site.malg.resp)
    # 
    # # Get Algal Stressor Responses
    # getAlgStressorResponses(stressors, list.MatchAlgData)

    # getSSDs
    #getSSDplot(Data, ResponseType, Taxa, Exposure)
    myDF <- data_SSD_generator
    myRT   <- "ResponseType"
    myTaxa <- "Taxa"
    myExp  <- "Exposure"
    # Run function
    p3 <- getSSDplot(myDF, myRT, myTaxa, myExp)

}

EXTRA

Describe for each output type: 1. Format and number. 2. describe the format (e.g., plot). 3. how to interpret



leppott/CASTfxn documentation built on Sept. 6, 2019, 11:04 p.m.