knitr::opts_chunk$set(echo=TRUE) library(knitr) library(CASTfxn)
CASTfxn
was created by Tetra Tech, Inc. for in 2018 to support the Causal Assessment Screening Tool. This tool aides data managers to more quickly analyze large amounts of data for causal assessment. It is not meant to be the final arbiter. But it can aid in the filtering the data.
It is envisioned that this library can be a living package and will add additional functions and routines in the future.
Many of the examples in this vignette are included in the examples in the corresponding functions. Each function in the CASTfxn
library includes example data.
The typical user of this package is envisioned as a data manager familiar with the structure of the data being used as well as the process of causal assessment. A basic understanding in the use of R is recommended.
The R library is saved on GitHub for ease of updating and distribution. Issues can be tracked, fixed, and code made available for download. Current users can update the library using the same code used to install the library (see below). Devtools is required for the installation.
install.packages("devtools") library(devtools) install_github("leppott/CASTfxn")
To install a specific version of the tool the release can be added to the install command. For example to install version 9.2 (not a valid version at this point) the code would be install_github("leppott/CASTfxn@v9.2").
To contact the authors directly email Erik.Leppo@tetratech.com or Ann.Lincoln@tetratech.com.
Preparing Raw Data
CASTfxn Operations
The CASTfxn
library makes some assumptions on file names and structure.
The details section of each function names the files and required fields necessary to run that function.
The package does not import any files so there is no standard file format. The user is expected to have the ability and skill to import the files into R.
The first line are the field (column) headers.
There is no single overall function but a series of functions.
The operations of the CASTfxn
package are listed below but will be explained
in more detail in their own section. The functions are listed not in
alphabetical order but in the order the functions are intended to run. That is,
some functions require the output from other functions.
The library assumes that there are the following folders in the working directory:
The folder "Results" is for some of the outputs.
Some of the functions assume that some datasets not included in the function parameters are loaded into the global environment. These are mentioned in the Details section of each function. It is anticipated that a future version of the package will explicitly list these datasets in the parameters of each function.
Output is a jpg map to the "Results" directory of the working directory. And a summary list; SiteInfo, Samps , BMImetrics, AlgMetrics, ReachInfo, COMID, ClustIDs, impair, and mods.
The summary list is returned to the console. It is not saved to file. The map is saved as SitID.map.jpg, e.g., SRCKN001.61.map.jpg.
TargetSiteID <- "SRCKN001.61" dir_results <- file.path(getwd(), "Results") CurrentDir <- getwd() myDir.Data <- paste(CurrentDir,"data/",sep="/") # data, example included with package data.Stations.Info <- data_Sites data.SampSummary <- data_SampSummary data.303d.ComID <- data_303d data.bmi.metrics <- data_BMIMetrics data.algae.metrics <- data_AlgMetrics data.cluster <- data_Cluster_Hi data.mod <- data_ReachMod # Map data map_flowline <- data_GIS_Flow_HI map_flowline2 <- data_GIS_Flow_LO map_outline <- data_GIS_AZ_Outline # Project site data to USGS Albers Equal Area usgs.aea <- "+proj=aea +lat_1=29.5 +lat_2=45.5 +lat_0=23 +lon_0=-96 +x_0=0 +y_0=0 +datum=NAD83 +units=m +no_defs +ellps=GRS80 +towgs84=0,0,0" # projection for outline my.aea <- "+proj=aea +lat_1=20 +lat_2=60 +lat_0=40 +lon_0=-96 +x_0=0 +y_0=0 +datum=NAD83 +units=m +no_defs +ellps=GRS80 +towgs84=0,0,0" map_proj <- my.aea # Run getSiteInfo list.SiteSummary <- getSiteInfo(TargetSiteID , dir_results , data.Stations.Info , data.SampSummary , data.303d.ComID , data.bmi.metrics , data.algae.metrics , data.cluster , data.mod , map_proj , map_outline , map_flowline)
The structure of the data is below.
str(list.SiteSummary)
The map (shown below) includes: state border (black line) NHDplus flowlines (light blue lines) all sites (dark gray close circles) cluster sites (cyan closed circles) reference sites (blue open circles) target site (red closed triangle)
TargetSiteID <- "SRCKN001.61" myJPG <- paste0(TargetSiteID,".map.jpg") fn.img <- file.path(getwd(), "Results", TargetSiteID, myJPG) include_graphics(fn.img)
Get chemical data subsets.
Requires the output of:
The return is a summary list; ref.sites, ref.reaches, cluster.samps, chem.info , all.chems, cluster.chem, and site.chem.
TargetSiteID <- "SRCKN001.61" CurrentDir<-getwd() myDir.Data <- paste(CurrentDir,"data/",sep="/") dir_results <- file.path(getwd(), "Results") # Run getSiteInfo # data, example included with package data.Stations.Info <- data_Sites data.SampSummary <- data_SampSummary data.303d.ComID <- data_303d data.bmi.metrics <- data_BMIMetrics data.algae.metrics <- data_AlgMetrics data.mod <- data_ReachMod # #' # Cluster based on elevation category # need for getSiteInfo and getChemDataSubsets elev_cat <- toupper(data.Stations.Info[data.Stations.Info[,"StationID_Master"]==TargetSiteID, "ElevCategory"]) if(elev_cat=="HI"){ data.cluster <- data_Cluster_Hi } else if(elev_cat=="LO") { data.cluster <- data_Cluster_Lo } # # Map data # AZ map_flowline <- data_GIS_Flow_HI map_flowline2 <- data_GIS_Flow_LO map_outline <- data_GIS_AZ_Outline # Project site data to USGS Albers Equal Area usgs.aea <- "+proj=aea +lat_1=29.5 +lat_2=45.5 +lat_0=23 +lon_0=-96 +x_0=0 +y_0=0 +datum=NAD83 +units=m +no_defs +ellps=GRS80 +towgs84=0,0,0" # projection for outline my.aea <- "+proj=aea +lat_1=20 +lat_2=60 +lat_0=40 +lon_0=-96 +x_0=0 +y_0=0 +datum=NAD83 +units=m +no_defs +ellps=GRS80 +towgs84=0,0,0" map_proj <- my.aea # list.SiteSummary <- getSiteInfo(TargetSiteID , dir_results , data.Stations.Info , data.SampSummary , data.303d.ComID , data.bmi.metrics , data.algae.metrics , data.cluster , data.mod , map_proj , map_outline , map_flowline) # Data getChemDataSubset site.COMID <- list.SiteSummary$COMID site.Clusters <- list.SiteSummary$ClustIDs data.chem.raw <- data_Chem data.chem.info <- data_ChemInfo # Run getChemDataSubsets list.data <- getChemDataSubsets(TargetSiteID , comid=site.COMID , cluster=site.Clusters , data.cluster=data.cluster , data.Stations.Info=data.Stations.Info , data.chem.raw=data.chem.raw , data.chem.info=data.chem.info)
The structure of the data is below.
str(list.data)
Box plots of each stressor, grouped by category.
Requires the output of:
Returns a jpeg in the "Results" subdirectory of the working directory with box plots. Also returns a list of stressors; stressors and site.stressor.pctrank.
The plots are named as SiteID.boxes.category.jpg, e.g., SRCKN001.61.boxes.Nutrients.jpg.
TargetSiteID <- "SRCKN001.61" CurrentDir<-getwd() myDir.Data <- paste(CurrentDir,"data/",sep="/") dir_results <- file.path(getwd(), "Results") # Data getSiteInfo # data, example included with package data.Stations.Info <- data_Sites # need for getSiteInfo and getChemDataSubsets data.SampSummary <- data_SampSummary data.303d.ComID <- data_303d data.bmi.metrics <- data_BMIMetrics data.algae.metrics <- data_AlgMetrics data.mod <- data_ReachMod # Cluster based on elevation category # need for getSiteInfo and getChemDataSubsets elev_cat <- toupper(data.Stations.Info[data.Stations.Info[,"StationID_Master"]==TargetSiteID, "ElevCategory"]) if(elev_cat=="HI"){ data.cluster <- data_Cluster_Hi } else if(elev_cat=="LO") { data.cluster <- data_Cluster_Lo } # Map data # AZ map_flowline <- data_GIS_Flow_HI map_flowline2 <- data_GIS_Flow_LO map_outline <- data_GIS_AZ_Outline # Project site data to USGS Albers Equal Area usgs.aea <- "+proj=aea +lat_1=29.5 +lat_2=45.5 +lat_0=23 +lon_0=-96 +x_0=0 +y_0=0 +datum=NAD83 +units=m +no_defs +ellps=GRS80 +towgs84=0,0,0" # projection for outline my.aea <- "+proj=aea +lat_1=20 +lat_2=60 +lat_0=40 +lon_0=-96 +x_0=0 +y_0=0 +datum=NAD83 +units=m +no_defs +ellps=GRS80 +towgs84=0,0,0" map_proj <- my.aea # Run getSiteInfo list.SiteSummary <- getSiteInfo(TargetSiteID, dir_results, data.Stations.Info , data.SampSummary, data.303d.ComID , data.bmi.metrics, data.algae.metrics , data.cluster, data.mod , map_proj, map_outline, map_flowline) # Data getChemDataSubsets # data import, example # data.chem.raw <- read.delim(paste(myDir.Data,"data.chem.raw.tab",sep=""),na.strings = c(""," ")) # data.chem.info <- read.delim(paste(myDir.Data,"data.chem.info.tab",sep="")) # data, example included with package site.COMID <- list.SiteSummary$COMID site.Clusters <- list.SiteSummary$ClustIDs data.chem.raw <- data_Chem data.chem.info <- data_ChemInfo # Run getChemDataSubsets list.data <- getChemDataSubsets(TargetSiteID, comid=site.COMID, cluster=site.Clusters , data.cluster=data.cluster, data.Stations.Info=data.Stations.Info , data.chem.raw=data.chem.raw, data.chem.info=data.chem.info) # datasets getStressorList chem.info <- list.data$chem.info cluster.chem <- list.data$cluster.chem cluster.samps <- list.data$cluster.samps ref.sites <- list.data$ref.sites site.chem <- list.data$site.chem # set cutoff for possible stressor identification probsLow <- 0.10 probsHigh <- 0.90 # Run getStressorList list.stressors <- getStressorList(TargetSiteID, site.Clusters, chem.info, cluster.chem , cluster.samps, ref.sites, site.chem , probsHigh, probsLow)
The structure of the returned data is below.
str(list.stressors)
One text file is created.
And plots are created and saved per stressor group. The "nutrient" plot is shown below.
TargetSiteID <- "SRCKN001.61" # filename fn.pctrank <- file.path(getwd(), "Results", TargetSiteID , paste0(TargetSiteID, ".chem.pctrank.txt")) # read df.pctrank <- read.delim(fn.pctrank) # Structure str(df.pctrank) # create table # first 6 rows and columns kable(df.pctrank[1:6, 1:6], caption="partial table") # box plot myJPG <- paste0(TargetSiteID,".boxes.Nutrients.jpg") fn.img <- file.path(getwd(), "Results", TargetSiteID, myJPG) include_graphics(fn.img)
The stressor categories are:
The data ploted is only within cluster data (denoted on the x-axis).
Stressors are on the y-axis and correspond to column names in the data.
The x-axis is percent rank (0-1).
Box and whisker plots have the following attributes:
Summary cluster information.
Requires the output of:
The return is a jpeg in the "Results" subdirectory of the working directory.
The plots are names with SiteID.cluster.parameter.jpg, e.g.,
SRCKN001.61.cluster.ElevWs.jpg.
In this case example 95 plots are created.
TargetSiteID <- "SRCKN001.61" CurrentDir<-getwd() myDir.Data <- paste(CurrentDir,"data/",sep="/") dir_results <- file.path(getwd(), "Results") # Data getSiteInfo # data, example included with package data.Stations.Info <- data_Sites # need for getSiteInfo and getChemDataSubsets data.SampSummary <- data_SampSummary data.303d.ComID <- data_303d data.bmi.metrics <- data_BMIMetrics data.algae.metrics <- data_AlgMetrics data.mod <- data_ReachMod # Cluster based on elevation category # need for getSiteInfo and getChemDataSubsets elev_cat <- toupper(data.Stations.Info[data.Stations.Info[,"StationID_Master"]==TargetSiteID, "ElevCategory"]) if(elev_cat=="HI"){ data.cluster <- data_Cluster_Hi } else if(elev_cat=="LO") { data.cluster <- data_Cluster_Lo } # Map data # AZ map_flowline <- data_GIS_Flow_HI map_flowline2 <- data_GIS_Flow_LO map_outline <- data_GIS_AZ_Outline # Project site data to USGS Albers Equal Area usgs.aea <- "+proj=aea +lat_1=29.5 +lat_2=45.5 +lat_0=23 +lon_0=-96 +x_0=0 +y_0=0 +datum=NAD83 +units=m +no_defs +ellps=GRS80 +towgs84=0,0,0" # projection for outline my.aea <- "+proj=aea +lat_1=20 +lat_2=60 +lat_0=40 +lon_0=-96 +x_0=0 +y_0=0 +datum=NAD83 +units=m +no_defs +ellps=GRS80 +towgs84=0,0,0" map_proj <- my.aea # Run getSiteInfo list.SiteSummary <- getSiteInfo(TargetSiteID, dir_results, data.Stations.Info , data.SampSummary, data.303d.ComID , data.bmi.metrics, data.algae.metrics , data.cluster, data.mod , map_proj, map_outline, map_flowline) # Data getChemDataSubsets # data, example included with package data.chem.raw <- data_Chem data.chem.info <- data_ChemInfo site.COMID <- list.SiteSummary$COMID site.Clusters <- list.SiteSummary$ClustIDs # # Run getChemDataSubsets list.data <- getChemDataSubsets(TargetSiteID, comid=site.COMID, cluster=site.Clusters , data.cluster=data.cluster, data.Stations.Info=data.Stations.Info , data.chem.raw=data.chem.raw, data.chem.info=data.chem.info) # Data getClusterInfo ref.reaches <- list.data$ref.reaches refSiteCOMIDs <- list.data$ref.reaches # Run getClusterInfo getClusterInfo(site.COMID, site.Clusters, ref.reaches, dir_results)
For each variable in the test data set a "Ws" (Watershed) and "Cat" (Catchment) version are created. This is how the variables exist in the data.
Ws = watershed = entire upstream area
Cat = catchment = local polygon associated with reach
TargetSiteID <- "SRCKN001.61" myJPG <- paste0(TargetSiteID,".cluster.ElevWs.jpg") fn.img <- file.path(getwd(), "Results", TargetSiteID, myJPG) include_graphics(fn.img)
The data ploted is between cluster data (clusters are denoted on the x-axis).
Stressors are on the y-axis and correspond to column names in the data.
The x-axis is the range of values in the data.
Box and whisker plots have the following attributes:
Get BMI and chem sample matches.
Requires the output of:
The return is a summary list; all.b.str, cl.b.str, site.b.str, all.b.rsp , cl.b.rsp, and site.b.rsp. No files are created or saved.
TargetSiteID <- "SRCKN001.61" CurrentDir<-getwd() myDir.Data <- paste(CurrentDir,"data/",sep="/") dir_results <- file.path(getwd(), "Results") # Data getSiteInfo # data, example included with package data.Stations.Info <- data_Sites # need for getSiteInfo and getChemDataSubsets data.SampSummary <- data_SampSummary data.303d.ComID <- data_303d data.bmi.metrics <- data_BMIMetrics data.algae.metrics <- data_AlgMetrics data.cluster <- data_Cluster_Hi # need for getSiteInfo and getChemDataSubsets data.mod <- data_ReachMod # Map data # AZ map_flowline <- data_GIS_Flow_HI map_flowline2 <- data_GIS_Flow_LO map_outline <- data_GIS_AZ_Outline # Project site data to USGS Albers Equal Area usgs.aea <- "+proj=aea +lat_1=29.5 +lat_2=45.5 +lat_0=23 +lon_0=-96 +x_0=0 +y_0=0 +datum=NAD83 +units=m +no_defs +ellps=GRS80 +towgs84=0,0,0" # projection for outline my.aea <- "+proj=aea +lat_1=20 +lat_2=60 +lat_0=40 +lon_0=-96 +x_0=0 +y_0=0 +datum=NAD83 +units=m +no_defs +ellps=GRS80 +towgs84=0,0,0" map_proj <- my.aea # Run getSiteInfo list.SiteSummary <- getSiteInfo(TargetSiteID, dir_results, data.Stations.Info , data.SampSummary, data.303d.ComID , data.bmi.metrics, data.algae.metrics , data.cluster, data.mod , map_proj, map_outline, map_flowline) # Data getChemDataSubsets site.COMID <- list.SiteSummary$COMID site.Clusters <- list.SiteSummary$ClustIDs # data, example included with package data.chem.raw <- data_Chem data.chem.info <- data_ChemInfo # Run getChemDataSubsets list.data <- getChemDataSubsets(TargetSiteID, comid=site.COMID, cluster=site.Clusters , data.cluster=data.cluster, data.Stations.Info=data.Stations.Info , data.chem.raw=data.chem.raw, data.chem.info=data.chem.info) # Data getStressorList chem.info <- list.data$chem.info cluster.chem <- list.data$cluster.chem cluster.samps <- list.data$cluster.samps ref.sites <- list.data$ref.sites site.chem <- list.data$site.chem # set cutoff for possible stressor identification probsLow <- 0.10 probsHigh <- 0.90 # Run getStressorList list.stressors <- getStressorList(TargetSiteID, site.Clusters, chem.info, cluster.chem , cluster.samps, ref.sites, site.chem , probsHigh, probsLow) # Data getBioMatches ## remove "none" stressors <- list.stressors$stressors[list.stressors$stressors != "none"] # Run getBioMatches biocomm <- "BMI" data.bio.metrics <- data_BMIMetrics list.MatchBMIData <- getBioMatches(stressors, list.data, list.SiteSummary, data.SampSummary, data.chem.raw, data.bio.metrics, biocomm)
The structure of the returned data is below.
str(list.MatchBMIData)
Get biologica stressor responses.
Requires the output of:
The return is jpg in "Results" folder of working directory for each parameter. And a tab-delimited text file of stressor correlations.
Each parameter is plotted and scored. SCORING = ???????
The jpg files are in teh format of SiteID.BMI.SR.parameter.jpg, e.g., SRCKN001.61BMI.SR.Arsenic_inorg_uf_ug_L_DipTaxSPL_Sc.jpg.
The text file is in the format of ......
predint <- 0.75 varLegLoc <- "topright" TargetSiteID <- "SRCKN001.61" CurrentDir<-getwd() myDir.Data <- paste(CurrentDir,"data/",sep="/") dir_results <- file.path(getwd(), "Results") # datasets getSiteInfo # data, example included with package data.Stations.Info <- data_Sites # need for getSiteInfo and getChemDataSubsets data.SampSummary <- data_SampSummary data.303d.ComID <- data_303d data.bmi.metrics <- data_BMIMetrics data.algae.metrics <- data_AlgMetrics data.mod <- data_ReachMod # Cluster based on elevation category # need for getSiteInfo and getChemDataSubsets elev_cat <- toupper(data.Stations.Info[data.Stations.Info[,"StationID_Master"]==TargetSiteID, "ElevCategory"]) if(elev_cat=="HI"){ data.cluster <- data_Cluster_Hi } else if(elev_cat=="LO") { data.cluster <- data_Cluster_Lo } # Map data # AZ map_flowline <- data_GIS_Flow_HI map_flowline2 <- data_GIS_Flow_LO map_outline <- data_GIS_AZ_Outline # Project site data to USGS Albers Equal Area usgs.aea <- "+proj=aea +lat_1=29.5 +lat_2=45.5 +lat_0=23 +lon_0=-96 +x_0=0 +y_0=0 +datum=NAD83 +units=m +no_defs +ellps=GRS80 +towgs84=0,0,0" # projection for outline my.aea <- "+proj=aea +lat_1=20 +lat_2=60 +lat_0=40 +lon_0=-96 +x_0=0 +y_0=0 +datum=NAD83 +units=m +no_defs +ellps=GRS80 +towgs84=0,0,0" map_proj <- my.aea # Run getSiteInfo list.SiteSummary <- getSiteInfo(TargetSiteID, dir_results, data.Stations.Info , data.SampSummary, data.303d.ComID , data.bmi.metrics, data.algae.metrics , data.cluster, data.mod , map_proj, map_outline, map_flowline) # Data getChemDataSubsets # data import, example site.COMID <- list.SiteSummary$COMID site.Clusters <- list.SiteSummary$ClustIDs # data, example included with package data.chem.raw <- data_Chem data.chem.info <- data_ChemInfo # Run getChemDataSubsets list.data <- getChemDataSubsets(TargetSiteID, comid=site.COMID, cluster=site.Clusters , data.cluster=data.cluster, data.Stations.Info=data.Stations.Info , data.chem.raw=data.chem.raw, data.chem.info=data.chem.info) # Data getStressorList chem.info <- list.data$chem.info cluster.chem <- list.data$cluster.chem cluster.samps <- list.data$cluster.samps ref.sites <- list.data$ref.sites site.chem <- list.data$site.chem # set cutoff for possible stressor identification probsLow <- 0.10 probsHigh <- 0.90 # Run getStressorList list.stressors <- getStressorList(TargetSiteID, site.Clusters, chem.info, cluster.chem , cluster.samps, ref.sites, site.chem , probsHigh, probsLow) # Data getBioMatches ## remove "none" stressors <- list.stressors$stressors[list.stressors$stressors != "none"] stressors_logtransf <- list.stressors$stressors_LogTransf[list.stressors$stressors != "none"] LogTransf <- stressors_logtransf # Run getBioMatches biocomm <- "BMI" list.MatchBioData <- getBioMatches(stressors, list.data, biocomm) # Data getBMIStressorResponses biocomm <- "BMI" BioResp <- c("IBI", "TotalTaxSPL_Sc", "DipTaxSPL_Sc" , "IntolTaxSPL_Sc", "HBISPL_Sc", "PlecoPct_Sc", "ScrapPctSPL_Sc" , "TrichTax_Sc", "EphemTax_Sc", "EphemPct_Sc", "Dom01PctSPL_Sc") # Run getBMIStressorResponses getBioStressorResponses(TargetSiteID, stressors, BioResp, list.MatchBioData , LogTransf, ref.sites, predint, varLegLoc, biocomm)
The structure of the returned data is below.
str(list.MatchBMIData)
Get Algae and chem sample matches.
Requires the output of:
The return is a summary list; all.b.str, cl.b.str, site.b.str, all.b.rsp, cl.b.rsp , and site.b.rsp.
The example data included with the package does not include algal data so no outputs are produced. The output is similar to getBMImatches.
TargetSiteID <- "LCBEN002.57" CurrentDir<-getwd() myDir.Data <- paste(CurrentDir,"data/",sep="/") dir_results <- file.path(getwd(), "Results") # Data getSiteInfo # data, example included with package data.Stations.Info <- data_Sites # need for getSiteInfo and getChemDataSubsets data.SampSummary <- data_SampSummary data.303d.ComID <- data_303d data.bmi.metrics <- data_BMIMetrics data.algae.metrics <- data_AlgMetrics data.mod <- data_ReachMod # Cluster based on elevation category # need for getSiteInfo and getChemDataSubsets elev_cat <- toupper(data.Stations.Info[data.Stations.Info[,"StationID_Master"]==TargetSiteID, "ElevCategory"]) if(elev_cat=="HI"){ data.cluster <- data_Cluster_Hi } else if(elev_cat=="LO") { data.cluster <- data_Cluster_Lo } # Map data # San Diego #flowline <- rgdal::readOGR(dsn = "data_gis/NHDv2_Flowline_Ecoreg85", layer = "NHDv2_eco85_Project") #outline <- rgdal::readOGR(dsn = "data_gis/Eco85", layer = "Ecoregion85") # AZ map_flowline <- data_GIS_Flow_HI map_flowline2 <- data_GIS_Flow_LO map_outline <- data_GIS_AZ_Outline # Project site data to USGS Albers Equal Area usgs.aea <- "+proj=aea +lat_1=29.5 +lat_2=45.5 +lat_0=23 +lon_0=-96 +x_0=0 +y_0=0 +datum=NAD83 +units=m +no_defs +ellps=GRS80 +towgs84=0,0,0" # projection for outline my.aea <- "+proj=aea +lat_1=20 +lat_2=60 +lat_0=40 +lon_0=-96 +x_0=0 +y_0=0 +datum=NAD83 +units=m +no_defs +ellps=GRS80 +towgs84=0,0,0" map_proj <- my.aea # Run getSiteInfo list.SiteSummary <- getSiteInfo(TargetSiteID, dir_results, data.Stations.Info , data.SampSummary, data.303d.ComID , data.bmi.metrics, data.algae.metrics , data.cluster, data.mod , map_proj, map_outline, map_flowline) # Data getChemDataSubsets # data, example included with package data.chem.raw <- data_Chem data.chem.info <- data_ChemInfo site.COMID <- list.SiteSummary$COMID site.Clusters <- list.SiteSummary$ClustIDs # Run getChemDataSubsets list.data <- getChemDataSubsets(TargetSiteID, comid=site.COMID, cluster=site.Clusters , data.cluster=data.cluster, data.Stations.Info=data.Stations.Info , data.chem.raw=data.chem.raw, data.chem.info=data.chem.info) # Data getStressorList chem.info <- list.data$chem.info cluster.chem <- list.data$cluster.chem cluster.samps <- list.data$cluster.samps ref.sites <- list.data$ref.sites site.chem <- list.data$site.chem # set cutoff for possible stressor identification probsLow <- 0.10 probsHigh <- 0.90 # Run getStressorList list.stressors <- getStressorList(TargetSiteID, site.Clusters, chem.info, cluster.chem , cluster.samps, ref.sites, site.chem , probsHigh, probsLow) # Data getBioMatches ## remove "none" stressors <- list.stressors$stressors[list.stressors$stressors != "none"] # Run getBioMatches biocomm <- "algae" list.MatchAlgData <- getBioMatches(stressors, list.data, biocomm)
The structure of the returned data is below.
str(list.MatchAlgData)
Algae stressor regressions.
Requires the output of:
The return is a jpg in "Results" folder of working directory for each paramter. And a tab-delimited text file of stressor correlations.
The example data included with the package does not include algal data so no outputs are produced. The output is similar to getBMIStressorResponses
predint <- 0.75 varLegLoc <- "topright" TargetSiteID <- "LCBEN002.57" CurrentDir<-getwd() myDir.Data <- paste(CurrentDir,"data/",sep="/") dir_results <- file.path(getwd(), "Results") # datasets getSiteInfo # data, example included with package data.Stations.Info <- data_Sites # need for getSiteInfo and getChemDataSubsets data.SampSummary <- data_SampSummary data.303d.ComID <- data_303d data.bmi.metrics <- data_BMIMetrics data.algae.metrics <- data_AlgMetrics data.mod <- data_ReachMod #' # Cluster based on elevation category # need for getSiteInfo and getChemDataSubsets elev_cat <- toupper(data.Stations.Info[data.Stations.Info[,"StationID_Master"]==TargetSiteID, "ElevCategory"]) if(elev_cat=="HI"){ data.cluster <- data_Cluster_Hi } else if(elev_cat=="LO") { data.cluster <- data_Cluster_Lo } # Map data # San Diego # AZ map_flowline <- data_GIS_Flow_HI map_flowline2 <- data_GIS_Flow_LO map_outline <- data_GIS_AZ_Outline # Project site data to USGS Albers Equal Area usgs.aea <- "+proj=aea +lat_1=29.5 +lat_2=45.5 +lat_0=23 +lon_0=-96 +x_0=0 +y_0=0 +datum=NAD83 +units=m +no_defs +ellps=GRS80 +towgs84=0,0,0" # projection for outline my.aea <- "+proj=aea +lat_1=20 +lat_2=60 +lat_0=40 +lon_0=-96 +x_0=0 +y_0=0 +datum=NAD83 +units=m +no_defs +ellps=GRS80 +towgs84=0,0,0" map_proj <- my.aea # Run getSiteInfo list.SiteSummary <- getSiteInfo(TargetSiteID, dir_results, data.Stations.Info , data.SampSummary, data.303d.ComID , data.bmi.metrics, data.algae.metrics , data.cluster, data.mod , map_proj, map_outline, map_flowline) # Data getChemDataSubsets # data, example included with package data.chem.raw <- data_Chem data.chem.info <- data_ChemInfo site.COMID <- list.SiteSummary$COMID site.Clusters <- list.SiteSummary$ClustIDs # Run getChemDataSubsets list.data <- getChemDataSubsets(TargetSiteID, comid=site.COMID, cluster=site.Clusters , data.cluster=data.cluster, data.Stations.Info=data.Stations.Info , data.chem.raw=data.chem.raw, data.chem.info=data.chem.info) # Data getStressorList chem.info <- list.data$chem.info cluster.chem <- list.data$cluster.chem cluster.samps <- list.data$cluster.samps ref.sites <- list.data$ref.sites site.chem <- list.data$site.chem # set cutoff for possible stressor identification probsLow <- 0.10 probsHigh <- 0.90 # Run getStressorList list.stressors <- getStressorList(TargetSiteID, site.Clusters, chem.info, cluster.chem , cluster.samps, ref.sites, site.chem , probsHigh, probsLow) # Data getBioMatches ## remove "none" stressors <- list.stressors$stressors[list.stressors$stressors != "none"] stressors_logtransf <- list.stressors$stressors_LogTransf[list.stressors$stressors != "none"] LogTransf <- stressors_logtransf # Run getBioMatches biocomm <- "algae" list.MatchBioData <- getBioMatches(stressors, list.data, biocomm) # Data getBioStressorResponses data.algae.metrics <- data_AlgMetrics BioResp <- colnames(data.algae.metrics[6:48]) biocomm <- "algae" # Run getAlgStressorResponses getBioStressorResponses(TargetSiteID, stressors, BioResp, list.MatchBioData , LogTransf, ref.sites, predint, varLegLoc, biocomm)
The structure of the returned data is below.
str(list.MatchAlgData)
Get stressor specific regressions.
Requires the output of:
The return is Jpeg files to "Results" folder in working directory. And a tab-delimited text file.
predint <- 0.75 varLegLoc <- "topright" TargetSiteID <- "SRCKN001.61" CurrentDir <- getwd() myDir.Data <- paste(CurrentDir,"data/",sep="/") # datasets getSiteInfo # data, example included with package data.Stations.Info <- data_Sites data.SampSummary <- data_SampSummary data.303d.ComID <- data_303d data.bmi.metrics <- data_BMIMetrics data.algae.metrics <- data_AlgMetrics data.cluster <- data_Cluster_Hi data.mod <- data_ReachMod # # Run getSiteInfo list.SiteSummary <- getSiteInfo(TargetSiteID) # datasets getChemDataSubsets site.COMID <- list.SiteSummary$COMID site.Clusters <- list.SiteSummary$ClustIDs # data import, example # data.chem.raw <- read.delim(paste(myDir.Data,"data.chem.raw.tab",sep=""),na.strings = c(""," ")) # data.chem.info <- read.delim(paste(myDir.Data,"data.chem.info.tab",sep="")) # data, example included with package data.chem.raw <- data_Chem data.chem.info <- data_ChemInfo # Run getChemDataSubsets list.data <- getChemDataSubsets(TargetSiteID, site.COMID, site.Clusters) # datasets getStressorList chem.info <- list.data$chem.info cluster.chem <- list.data$cluster.chem cluster.samps <- list.data$cluster.samps ref.sites <- list.data$ref.sites site.chem <- list.data$site.chem # set cutoff for possible stressor identification probsLow <- 0.10 probsHigh <- 0.90 # Run getStressorList list.stressors <- getStressorList(TargetSiteID, site.Clusters, chem.info, cluster.chem , cluster.samps, ref.sites, site.chem , probsHigh, probsLow) # datasets getBMIMatches ## remove "none" stressors <- list.stressors$stressors[list.stressors$stressors != "none"] # Run getBMIMatches list.MatchBMIData <- getBMIMatches(stressors, list.data) # datasets getStressorSpecificRegressions # data import, example # data.bmi.taxa.raw <- read.delim(paste(myDir.Data,"data.bmi.taxa.raw.tab",sep="")) # data.SSTV.totabund <- read.delim(paste(myDir.Data,"data.totabund.bySample.tab",sep="")) # # data, example included with package data.bmi.taxa.raw <- data_BMIcounts data.SSTV.totabund <- data_BMIRelAbund # Run getStressorSpecificRegressions getStressorSpecificRegressions(list.MatchBMIData)
The structure of the returned data is below.
str(list.MatchAlgData)
Species Sensitify Distribution (SSD) plots
Generates plots of the proportion of species affected at different exposure levels in laboratory toxicity tests.
Plots are generated from a single set of inputs. They are created with ggplot and are returned to the console as a variable (e.g., "p1"). It is up to the user to save the output.
The plot below reproduces the plot contained in the ssd_generator_v1.xlsm file on the CADDIS website.
# Example 3 # https://www.epa.gov/caddis-vol4/caddis-volume-4-data-analysis-download-software # ssd_generator_v1.xlsm myDF <- data_SSD_generator myRT <- "ResponseType" myTaxa <- "Taxa" myExp <- "Exposure" # Run function p3 <- getSSDplot(myDF, myRT, myTaxa, myExp) p3
For information on interpretation see the CADDIS website.
https://www.epa.gov/caddis-vol4/caddis-volume-4-data-analysis-advanced-analyses-controlling-for-natural-variability#tab-5
Generates a PDF with 2 plots per site for co-occurrence of measured chem and bio data.
Saves PDF of plots and a scores files (tab separated) to user defined directory. Each SiteID is a different directory.
The example data for this function is from a different state from the other example data so SiteID's and stressor data will not match.
#Load Data df.data <- data_CoOccur # col.Group <- "Group" col.Bio <- "CSCI" col.Stressors <- c("DO_uf_mg_L", "SpecCond_uf_µS_cm", "TN_uf_mg_L", "TP_mg_L") col.ID <- c("StationID_Master") # Bio.Nar.Brk <- c(-2, 0.62, 0.799, 0.919, 2) Bio.Nar.Lab <- c("very likely altered", "likely altered" , "possibly altered ", "likely intact") Bio.Deg.Brk <- c(-2, 0.799, 2) Bio.Deg.Lab <- c("Degraded", "Good") dir.plots <- file.path(getwd(), "Results") # ID.plot <- c("SMC08335", "901SJSJC9", "911TCAM01", "403STC004") # getCoOccur(df.data, ID.plot, col.ID, col.Group, col.Bio, col.Stressors , Bio.Nar.Brk, Bio.Nar.Lab, Bio.Deg.Brk, Bio.Deg.Lab , dir.plots )
The output is a PDF with a box plot of within cluster data for a single parameter where the biological score was better than the test site. Points are the data that contribute to the box plot but are color coded by biological condition; Degraded (gray) and Good (cyan). The red dashed line represents the parameter value of the test site. If there are multiple lines then there are multiple samples for the test site. The x-axis has the name of the parameter, the number of comparator sites (n), and the cooccurrence score for the site.
Samples are scored:
Supports the case for candidate cause. Stressor levels at the test sites are above the 75th percentile of comparator sites having higher biological quality.
Indeterminate. Stressor levels at the test site are below the 50th percentile of comparator sites having higher biological quality.
-1. Weakens the case for the candidate cause. Stressor levels at the test sites are between the 50th and 75th percentile of comparator sites having higher biological quality.
The second plot is the relative probablity of degraded condition. The red dashed line is the value of the parameter at the test site. The lines are the 20th and 50th percentiles.
The scores text file has the biological score, cluster, parameter, value,
statistics for the cluster (number of samples, 25th-, 50th-, and 75th-percentiles),
and the cooccurence score.
(In ext data folder) 901SJSJC9.CoOccurrence_Scores_20180802_123825.tsv
901SJSJC9_CoOccurrence_20180802_123825.pdf
system('open "901SJSJC9_CoOccurrence_20180802_123825.pdf")
file.path(path.package("CASTfxn"), "extdata")
dir.pkg <- file.path(path.package("CASTfxn"), "extdata") fn.pdf <- file.path(dir.pkg, "901SJSJC9_CoOccurrence_20180802_123825.pdf") fn.tsv <- file.path(dir.pkg, "901SJSJC9.CoOccurrence_Scores_20180802_123825.tsv") fn.jpg <- file.path(dir.pkg, "901SJSJC9_CoOccurrence_DO.jpg") # # Open PDF #system(paste0("open ",fn.pdf, "")) # # data table df.tsv <- read.delim(fn.tsv) kable(df.tsv, caption="CoOccurence Scores") # # image #include_graphics(fn.jpg)
# Output It is up to the user to interpret the output of CAST. CAST is meant as a screening tool so the user will need to use best professional judgment to draw any conclusions. ## Code Example code is below to generate the outputs from each of the CAST functions. This example uses a mix of the internal example data files and user files. ```r # Install CASTfxn #### # library(devtools) # install_github("ALincolnTt/CASTfxn") # Set up #### library(CASTfxn) library(readxl) #Dir.Base <- "C:/Users/ann.lincoln/OneDrive - Tetra Tech, Inc/_ActiveProjects/AZ/RCodeData/ChemData" #Dir.Base <- "P:/Current/OtherGov/City of San Diego/FY2018/CAST_2018/QC_20180705/data" Dir.Base <- "C:/Users/Erik.Leppo/OneDrive - Tetra Tech, Inc/MyDocs_OneDrive/GitHub/CASTfxn/vignettes/Results" #setwd("C:\\Users\\Erik.Leppo\\OneDrive - Tetra Tech, Inc\\MyDocs_OneDrive\\GitHub\\CASTfxn_Ann\\CASTfxn") #Dir.Base <- "C:\\Users\\Erik.Leppo\\OneDrive - Tetra Tech, Inc\\MyDocs_OneDrive\\SanDiego\\Results" setwd(Dir.Base) # Site Selection #### TargetSiteID <- "LCNUT011.29" # "VRELL009.02" TargetElevation <- "HI" # Read Files #### #Read all data files data.SampSummary <- data_SampSummary data.mod <- data_ReachMod data.chem.info <- data_ChemInfo data.303d.ComID <- data_303d # Subset #### # These need subsetting if (TargetElevation == "HI") { data.cluster <- data_Cluster_Hi fn.coOccur.hi <- file.path(Dir.Base,"AZCoOccurData_HI.tab") data.coOccur <- read.table(fn.coOccur.hi, header = TRUE, sep = "\t") ibi.thresholds <- c(45,52) } else { data.cluster <- data_Cluster_Lo fn.coOccur.lo <- file.path(Dir.Base, "AZCoOccurData_LO.tab") data.coOccur <- read.table(fn.coOccur.lo, header = TRUE, sep = "\t") ibi.thresholds <- c(39,50) } # CAST, Stations #### data.Stations.Info <- data_Sites data.Stations.Info <- data.Stations.Info[data.Stations.Info$ElevCategory == TargetElevation,] # CAST, Chem #### ## Use data rather than example file data.chem.raw <- data_Chem # fn.cheminfo2 <- file.path(Dir.Base, "AZStressorInfoFinal.xlsx") data.chem.info2 <- read_excel(fn.cheminfo2, sheet = 1, skip = 0) # # # fn.chem <- file.path(Dir.Base, "AZStressorDataFinal.tab") # data.chem.raw <- read.delim(fn.chem, na.strings = "NA") # analytes <- data.chem.info$StdParamName[data.chem.info$UseInStressorID == 1] data.chem.raw <- data.chem.raw[data.chem.raw$StdParamName %in% analytes,] data.chem.raw <- data.chem.raw[data.chem.raw$ElevCategory == TargetElevation,] # CAST, BMI, metrics #### data.bmi.metrics <- data_BMIMetrics data.bmi.metrics <- data.bmi.metrics[data.bmi.metrics$ElevCategory == TargetElevation,] data.bmi.metrics <- data.bmi.metrics[data.bmi.metrics$StationID_Master != "VROAK042.78",] data.bmi.metrics <- data.bmi.metrics[data.bmi.metrics$StationID_Master != "SRHAG007.47",] data.bmi.metrics <- data.bmi.metrics[,c("StationID_Master","BMISampID", "BMI.Metrics.SampID","CollDate", "ElevCategory","NarRat","IBI", "TotalTaxSPL_Sc","DipTaxSPL_Sc", "IntolTaxSPL_Sc","HBISPL_Sc","PlecoPct_Sc", "ScrapPctSPL_Sc","ScrapTaxSPL_Sc", "TrichTax_Sc","EphemTax_Sc","EphemPct_Sc", "Dom01PctSPL_Sc")] data.bmi.metrics <- data.bmi.metrics[, unlist(lapply(data.bmi.metrics, function(x) !all(is.na(x))))] # CAST, Alg, metrics #### fn.alg.metrics <- file.path(Dir.Base, "AZAlgaeMetrics.tab") data.algae.metrics <- read.table(fn.alg.metrics, header = TRUE, sep = "\t") data.algae.metrics <- data.algae.metrics[!is.na(data.algae.metrics),] # CAST, misc #### fn.bmi.raw <- file.path(Dir.Base, "AZBenthicCountsFinal.tab") data.bmi.taxa.raw <- read.table(fn.bmi.raw, header = TRUE, sep = "\t") fn.MT.bmi <- file.path(Dir.Base, "AZBenthicMasterTaxa.tab") data.MT.bmi <- read.table(fn.MT.bmi, header = TRUE, sep = "\t", stringsAsFactors = FALSE) # read LkpDir (.tab file) # fn.lkpdir <- "C:/Users/ann.lincoln/OneDrive - Tetra Tech, Inc/_ActiveProjects/AZ/RCodeData/ChemData/AZLkpDir.tab" # data.lkp.dir <- read.table(fn.lkpdir, header = TRUE, sep = "\t", # col.names = c("StdParamName","DipTaxSPL_Sc", # "Dom01PctSPL_Sc","EphemPct_Sc", # "EphemTax_Sc","HBISPL_Sc","IBI", # "IntolTaxSPL_Sc","PlecoPct_Sc", # "ScrapPctSPL_Sc","ScrapTaxSPL_Sc", # "TotalTaxSPL_Sc","TrichTax_Sc"), # colClasses = c("character","numeric","numeric", # "numeric","numeric","numeric","numeric", # "numeric","numeric","numeric", # "numeric","numeric","numeric")) # data.lkp.dir <- data_LkpDir list.SiteSummary <- getSiteInfo(TargetSiteID) # Returns: mySiteSummary <- list(SiteInfo = mySiteInfo, Samps = mySamps, # BMImetrics = myBMImetrics, # AlgMetrics = myAlgaeMetrics, # ReachInfo = myReachInfo, # COMID = myCOMID, # ClustIDs = myClustIDs) site.COMID <- list.SiteSummary$COMID site.Clusters <- list.SiteSummary$ClustIDs SiteInfo <- list.SiteSummary$SiteInfo list.data <- getChemDataSubsets(TargetSiteID, site.COMID, site.Clusters) # mySubsets <- list(ref.sites = refSiteIDs, # ref.reaches = refSiteCOMIDs, # cluster.samps = cluster.chem.samps, # chem.info = chems.groups.sort, # all.chems = all.chems3, # cluster.chem = cluster.chem.tab5, # site.chem = site.chem4) ref.sites <- list.data$ref.sites ref.reaches <- list.data$ref.reaches cluster.samps <- list.data$cluster.samps cluster.chem <- list.data$cluster.chem site.chem <- list.data$site.chem chem.info <- list.data$chem.info # Get Cluster Info getClusterInfo(site.COMID, site.Clusters, ref.reaches, dir_results) # CAST, Stressors #### # Get Stressor List list.stressors <- getStressorList(TargetSiteID, site.Clusters, chem.info, cluster.chem, cluster.samps, ref.sites, site.chem, probsHigh=0.75, probsLow=0.25) # myStressors <- list(stressors = stressorlist, site.stressor.pctrank = site.pctrank) stressors <- list.stressors$stressors if ((length(stressors) == 1) && stressors[1] == "none") { # No stressors returned print(paste("No stressors identified for site", TargetSiteID, sep = " ")) utils::flush.console() } else { # stressors <- c(stressors[2:length(stressors)]) # Get BMI matches list.MatchBMIData <- getBMIMatches(stressors, list.data) # myBMIMatchData <- list(all.b.str = all.mbmi.stress # , cl.b.str = cl.mbmi.stress # , site.b.str = site.mbmi.stress # , all.b.rsp = all.mbmi.resp # , cl.b.rsp = cl.mbmi.resp # , site.b.rsp = site.mbmi.resp) # Get BMI Stressor Responses getBMIStressorResponses(stressors, list.MatchBMIData) if (TargetSiteID %in% unique(data.coOccur$StationID_Master)) { getCoOccur(data.coOccur, ID.plot=TargetSiteID, col.ID="StationID_Master" , col.Group="Group", col.Bio="IBI", col.Stressors=c(stressors) , Bio.Nar.Brk=c(0, ibi.thresholds, 100) , Bio.Nar.Lab=c("Impaired","Marginal","Good") , Bio.Deg.Brk=c(0, min(ibi.thresholds), 100) , Bio.Deg.Lab=c("Impaired", "Good")) } # Get Stressor-specific regressions predint <- 0.75 varLegLoc <- "topright" myDir.Data <- getwd() # read in relative abundance file fn.RelAbund <- "AZ_Benthics_RelAbund.xlsx" BMIRelAbund <- read_excel(fn.RelAbund, sheet=1) #data.SSTV.totabund <- data_BMIRelAbund data.SSTV.totabund <- BMIRelAbund getStressorSpecificRegressions(list.MatchBMIData, predint, varLegLoc) # } # Erik, 20180712, commented out since no data # list.MatchAlgData <- getAlgMatches(stressors, list.data) # # myAlgMatchData <- list(all.a.str = all.malg.stress # # , cl.a.str = cl.malg.stress # # , site.a.str = site.malg.stress # # , all.a.rsp = all.malg.resp # # , cl.a.rsp = cl.malg.resp # # , site.a.rsp = site.malg.resp) # # # Get Algal Stressor Responses # getAlgStressorResponses(stressors, list.MatchAlgData) # getSSDs #getSSDplot(Data, ResponseType, Taxa, Exposure) myDF <- data_SSD_generator myRT <- "ResponseType" myTaxa <- "Taxa" myExp <- "Exposure" # Run function p3 <- getSSDplot(myDF, myRT, myTaxa, myExp) }
Stressor Response
Text files
Describe for each output type: 1. Format and number. 2. describe the format (e.g., plot). 3. how to interpret
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.