knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-"
)

watertools

Project Status: WIP – Initial development is in progress, but there has not yet been a stable, usable release suitable for the public. Build Status License: MIT

watertools: An R-package containing functions used in watershed assessments, TMDLs, and water quality modeling. Specifically: Download EPA Echo DMR reports Flow and Load Duration Curves * More to be added as I get time...

Installation

You can install watertools from github with:

install.packages("devtools")
devtools::install_github("mps9506/watertools")

Examples

Accessing EPA ECHO wastewater discharge data

To access EPA ECHO wastewater discharge data you will need the NPDES permit number, date range of interest, and parameter number of interest. The echoGetEffluent function will output a tibble with various statisical reports (typically mean and max) the plant reported to the EPA during the requested period of interest.

## Load required packages

library(watertools)
library(ggplot2)
library(tibble)

## The following variables are needed by the function
wwtp <- "tx0119407" #NPDES permit number
parameter <- "50050" #Parameter code, this is flow
start <- c("2010-01-01") #Start date in yyyy-mm-dd format
end <- c("2017-01-01") #End date in yyyy-mm-dd format

df <- as.data.frame(echoGetEffluent(wwtp, parameter, start, end))

knitr::kable(head(df), format = "markdown")

Typically we are interested in the average discharge, to quickly plot this with ggplot:

df <- df[df$StatisticalBaseDesc == "DAILY AV",]

ggplot(df) + 
 geom_line(aes(x = MonitoringPeriodEndDate, y = DMRValueNmbr)) + 
 ylab("Average Discharge (MGD)") + xlab("Date")

Gathering data for multiple plants at once has not been included in the function yet. For the time being, a simple loop can be used. Note, I have not tried this with a large number of plants at one time.

# List of WWTPs
wwtp <- c("tx0119407", "tx0113859", "tx0116785")

# Parameters (Flow)
parameter <- "50050"

# Start
start <- c("2010-01-01")

# End
end <- c("2017-01-01")

# Create a data frame to populate
x <- data_frame()

# The for loop iterates through the wwtp list, and runs the function to retrieve data for each permit. The last line binds the retreived data into one dataframe
for (i in 1:length(wwtp)) {
  df1 <- as.data.frame(echoGetEffluent(wwtp[i], parameter, start, end))
  x <- rbind(x, df1)
}

knitr::kable(head(x), format = "markdown")

# filters only records that report the daily average
x <- x[x$StatisticalBaseDesc == "DAILY AV",]

# assuming everything worked so far, this plots the values
ggplot(x, aes(x = MonitoringPeriodEndDate, y = DMRValueNmbr)) + geom_line(aes(color=ID)) +
  ylab("Average Discharge (MGD)")

Develop flow and load duration curves

Use the dataRetrieval package to quickly obtain sample USGS flow data.

library(dataRetrieval)
siteNumber <- "08150000"
ChoptankInfo <- readNWISsite(siteNumber)
parameterCd <- "00060"

# Raw daily data:
rawDailyData <- readNWISdv(siteNumber, parameterCd, "2005-01-01", 
    "2016-12-31")

rawDailyData <- renameNWISColumns(rawDailyData)
head(rawDailyData)

Format the data as needed for fdc()

dataSub <- data.frame("date" = rawDailyData$Date, "flow" = rawDailyData$Flow)
head(dataSub)

#generate the FDC data frame with fdc()
fdcDF <- fdc(data = dataSub, filter = FALSE)
head(fdcDF)

Identify flows at desired flow exceedance values (percent values between 0-1)

feDF <- flowExceedance(fdcDF, c(0.05,0.25,0.50,0.75,0.95))
feDF

Plot the flow duration curve.

plotFdc(fdcDF, feDF, breaks = c(0,10,40,60,90,100))

plotFdc outputs a ggplot object. So appearance can be customized using the standard ggplot approach.

Load duration curves

Currently this function only supports bacteria LDCs. I am still trying to determine the best approach to accomodate other parameters. Hard coding the parameters and conversions simplifies use of the function but limits the scenarios in which users can utilize the function. Conversely, requiring users input the conversions and units allows the most flexibility at the expense of simplicity. I plan on identifying some middle ground.

Using the flow data from above use ldc() to create a data frame with the allowable load at each flow value. If we are interested in plotting observed measurements, the function will join a dataframe of observed values by date. The dataframe must have dates in the first column and values in the second column. Here the dataRetrieval package is used to retrieve Storet data.

#download Storet water quality data from the nearby water quality station using dataRetrieval
ecoli <- readWQPqw("TCEQMAIN-17425", 
                   parameterCd = "Escherichia coli", 
                   startDate = "2005-01-01", 
                   endDate = "2016-12-31")
#clean this up to get the data we need
library(dplyr)
ecoli <- ecoli %>%
  filter(ResultMeasure.MeasureUnitCode != "hours") %>% #filter out the holding time values
  select(date = ActivityStartDate, Value = ResultMeasureValue) # rename columns

#create the daataframe needed to plot the LDC
ldcDF <- ldc(fdcDF, "ecoli", ecoli)
head(ldcDF)

Plot it

plotLdc(ldcDF, breaks = c(0,10,40,60,90,100))


mps9506/watertools documentation built on May 20, 2019, 3:32 p.m.