knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.path = "man/figures/README-" )
watertools
: An R-package containing functions used in watershed assessments, TMDLs, and water quality modeling. Specifically:
Download EPA Echo DMR reports
Flow and Load Duration Curves
* More to be added as I get time...
You can install watertools from github with:
install.packages("devtools") devtools::install_github("mps9506/watertools")
To access EPA ECHO wastewater discharge data you will need the NPDES permit number, date range of interest, and parameter number of interest. The echoGetEffluent
function will output a tibble
with various statisical reports (typically mean and max) the plant reported to the EPA during the requested period of interest.
## Load required packages library(watertools) library(ggplot2) library(tibble) ## The following variables are needed by the function wwtp <- "tx0119407" #NPDES permit number parameter <- "50050" #Parameter code, this is flow start <- c("2010-01-01") #Start date in yyyy-mm-dd format end <- c("2017-01-01") #End date in yyyy-mm-dd format df <- as.data.frame(echoGetEffluent(wwtp, parameter, start, end)) knitr::kable(head(df), format = "markdown")
Typically we are interested in the average discharge, to quickly plot this with ggplot
:
df <- df[df$StatisticalBaseDesc == "DAILY AV",] ggplot(df) + geom_line(aes(x = MonitoringPeriodEndDate, y = DMRValueNmbr)) + ylab("Average Discharge (MGD)") + xlab("Date")
Gathering data for multiple plants at once has not been included in the function yet. For the time being, a simple loop can be used. Note, I have not tried this with a large number of plants at one time.
# List of WWTPs wwtp <- c("tx0119407", "tx0113859", "tx0116785") # Parameters (Flow) parameter <- "50050" # Start start <- c("2010-01-01") # End end <- c("2017-01-01") # Create a data frame to populate x <- data_frame() # The for loop iterates through the wwtp list, and runs the function to retrieve data for each permit. The last line binds the retreived data into one dataframe for (i in 1:length(wwtp)) { df1 <- as.data.frame(echoGetEffluent(wwtp[i], parameter, start, end)) x <- rbind(x, df1) } knitr::kable(head(x), format = "markdown") # filters only records that report the daily average x <- x[x$StatisticalBaseDesc == "DAILY AV",] # assuming everything worked so far, this plots the values ggplot(x, aes(x = MonitoringPeriodEndDate, y = DMRValueNmbr)) + geom_line(aes(color=ID)) + ylab("Average Discharge (MGD)")
Use the dataRetrieval
package to quickly obtain sample USGS flow data.
library(dataRetrieval) siteNumber <- "08150000" ChoptankInfo <- readNWISsite(siteNumber) parameterCd <- "00060" # Raw daily data: rawDailyData <- readNWISdv(siteNumber, parameterCd, "2005-01-01", "2016-12-31") rawDailyData <- renameNWISColumns(rawDailyData) head(rawDailyData)
Format the data as needed for fdc()
dataSub <- data.frame("date" = rawDailyData$Date, "flow" = rawDailyData$Flow) head(dataSub) #generate the FDC data frame with fdc() fdcDF <- fdc(data = dataSub, filter = FALSE) head(fdcDF)
Identify flows at desired flow exceedance values (percent values between 0-1)
feDF <- flowExceedance(fdcDF, c(0.05,0.25,0.50,0.75,0.95)) feDF
Plot the flow duration curve.
plotFdc(fdcDF, feDF, breaks = c(0,10,40,60,90,100))
plotFdc
outputs a ggplot
object. So appearance can be customized using the standard ggplot
approach.
Currently this function only supports bacteria LDCs. I am still trying to determine the best approach to accomodate other parameters. Hard coding the parameters and conversions simplifies use of the function but limits the scenarios in which users can utilize the function. Conversely, requiring users input the conversions and units allows the most flexibility at the expense of simplicity. I plan on identifying some middle ground.
Using the flow data from above use ldc()
to create a data frame with the allowable load at each flow value. If we are interested in plotting observed measurements, the function will join a dataframe of observed values by date. The dataframe must have dates in the first column and values in the second column. Here the dataRetrieval
package is used to retrieve Storet data.
#download Storet water quality data from the nearby water quality station using dataRetrieval ecoli <- readWQPqw("TCEQMAIN-17425", parameterCd = "Escherichia coli", startDate = "2005-01-01", endDate = "2016-12-31") #clean this up to get the data we need library(dplyr) ecoli <- ecoli %>% filter(ResultMeasure.MeasureUnitCode != "hours") %>% #filter out the holding time values select(date = ActivityStartDate, Value = ResultMeasureValue) # rename columns #create the daataframe needed to plot the LDC ldcDF <- ldc(fdcDF, "ecoli", ecoli) head(ldcDF)
Plot it
plotLdc(ldcDF, breaks = c(0,10,40,60,90,100))
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.