watertools
: An R-package containing functions used in watershed
assessments, TMDLs, and water quality modeling. Specifically: *
Download EPA Echo DMR reports * Flow and Load Duration Curves * More
to be added as I get time…
You can install watertools from github with:
install.packages("devtools")
devtools::install_github("mps9506/watertools")
To access EPA ECHO wastewater discharge data you will need the NPDES
permit number, date range of interest, and parameter number of interest.
The echoGetEffluent
function will output a tibble
with various
statisical reports (typically mean and max) the plant reported to the
EPA during the requested period of interest.
## Load required packages
library(watertools)
library(ggplot2)
library(tibble)
## The following variables are needed by the function
wwtp <- "tx0119407" #NPDES permit number
parameter <- "50050" #Parameter code, this is flow
start <- c("2010-01-01") #Start date in yyyy-mm-dd format
end <- c("2017-01-01") #End date in yyyy-mm-dd format
df <- as.data.frame(echoGetEffluent(wwtp, parameter, start, end))
#> [1] "# Status message: Success"
#> [2] "# Status message: OK"
#> [3] "# Status message: Success: (200) OK"
#> No encoding supplied: defaulting to UTF-8.
knitr::kable(head(df), format = "markdown")
Name
Outfall
ID
RegistryID
Location
City
State
Zip
Status
LimitBeginDate
LimitEndDate
LimitValueNmbr
LimitUnitCode
LimitUnitDesc
StdUnitCode
StdUnitDesc
LimitValueStdUnit
StatisticalBaseCode
StatisticalBaseDesc
StatisticalBaseTypeCode
StatisticalBaseTypeDesc
DMREventId
MonitoringPeriodEndDate
DMRFormValueId
ValueTypeCode
ValueTypeDesc
DMRValueId
DMRValueNmbr
DMRUnitCode
DMRUnitDesc
DMRValueStdUnits
DMRQualifierCode
ValueReceivedDate
DaysLate
NODICode
NODEDesc
ExceedancePct
NPDESViolations
SKIDMORE WSC WWTP
001
TX0119407
110009771693
1000 FT N OF THE END OF BLACK RANCH RD AND APPROX
SKIDMORE
TX
78387
78387
16648
18322
0.131
03
MGD
MGD
MGD
NA
DB
DAILY AV
AVG
Average
3403423151
16678
3442281023
Q1
Quantity1
3612155723
0.0492
03
MGD
0.0492
NA
16695
NA
NA
NA
NA
NA
SKIDMORE WSC WWTP
001
TX0119407
110009771693
1000 FT N OF THE END OF BLACK RANCH RD AND APPROX
SKIDMORE
TX
78387
78387
16648
18322
NA
03
MGD
MGD
MGD
NA
DD
DAILY MX
MAX
Maximum
3403423151
16678
3442281032
Q2
Quantity2
3612155724
0.0534
03
MGD
0.0534
NA
16695
NA
NA
NA
NA
NA
SKIDMORE WSC WWTP
001
TX0119407
110009771693
1000 FT N OF THE END OF BLACK RANCH RD AND APPROX
SKIDMORE
TX
78387
78387
16648
18322
0.131
03
MGD
MGD
MGD
NA
DB
DAILY AV
AVG
Average
3403423173
16708
3442281323
Q1
Quantity1
3613394763
0.0512
03
MGD
0.0512
NA
16728
NA
NA
NA
NA
NA
SKIDMORE WSC WWTP
001
TX0119407
110009771693
1000 FT N OF THE END OF BLACK RANCH RD AND APPROX
SKIDMORE
TX
78387
78387
16648
18322
NA
03
MGD
MGD
MGD
NA
DD
DAILY MX
MAX
Maximum
3403423173
16708
3442281328
Q2
Quantity2
3613394764
0.0710
03
MGD
0.0710
NA
16728
NA
NA
NA
NA
NA
SKIDMORE WSC WWTP
001
TX0119407
110009771693
1000 FT N OF THE END OF BLACK RANCH RD AND APPROX
SKIDMORE
TX
78387
78387
16648
18322
0.131
03
MGD
MGD
MGD
NA
DB
DAILY AV
AVG
Average
3403423185
16739
3442281496
Q1
Quantity1
3614856996
0.0480
03
MGD
0.0480
NA
16759
NA
NA
NA
NA
NA
SKIDMORE WSC WWTP
001
TX0119407
110009771693
1000 FT N OF THE END OF BLACK RANCH RD AND APPROX
SKIDMORE
TX
78387
78387
16648
18322
NA
03
MGD
MGD
MGD
NA
DD
DAILY MX
MAX
Maximum
3403423185
16739
3442281524
Q2
Quantity2
3614856997
0.0613
03
MGD
0.0613
NA
16759
NA
NA
NA
NA
NA
Typically we are interested in the average discharge, to quickly plot
this with ggplot
:
df <- df[df$StatisticalBaseDesc == "DAILY AV",]
ggplot(df) +
geom_line(aes(x = MonitoringPeriodEndDate, y = DMRValueNmbr)) +
ylab("Average Discharge (MGD)") + xlab("Date")
Gathering data for multiple plants at once has not been included in the function yet. For the time being, a simple loop can be used. Note, I have not tried this with a large number of plants at one time.
# List of WWTPs
wwtp <- c("tx0119407", "tx0113859", "tx0116785")
# Parameters (Flow)
parameter <- "50050"
# Start
start <- c("2010-01-01")
# End
end <- c("2017-01-01")
# Create a data frame to populate
x <- data_frame()
# The for loop iterates through the wwtp list, and runs the function to retrieve data for each permit. The last line binds the retreived data into one dataframe
for (i in 1:length(wwtp)) {
df1 <- as.data.frame(echoGetEffluent(wwtp[i], parameter, start, end))
x <- rbind(x, df1)
}
#> [1] "# Status message: Success"
#> [2] "# Status message: OK"
#> [3] "# Status message: Success: (200) OK"
#> No encoding supplied: defaulting to UTF-8.
#> [1] "# Status message: Success"
#> [2] "# Status message: OK"
#> [3] "# Status message: Success: (200) OK"
#> No encoding supplied: defaulting to UTF-8.
#> [1] "# Status message: Success"
#> [2] "# Status message: OK"
#> [3] "# Status message: Success: (200) OK"
#> No encoding supplied: defaulting to UTF-8.
knitr::kable(head(x), format = "markdown")
Name
Outfall
ID
RegistryID
Location
City
State
Zip
Status
LimitBeginDate
LimitEndDate
LimitValueNmbr
LimitUnitCode
LimitUnitDesc
StdUnitCode
StdUnitDesc
LimitValueStdUnit
StatisticalBaseCode
StatisticalBaseDesc
StatisticalBaseTypeCode
StatisticalBaseTypeDesc
DMREventId
MonitoringPeriodEndDate
DMRFormValueId
ValueTypeCode
ValueTypeDesc
DMRValueId
DMRValueNmbr
DMRUnitCode
DMRUnitDesc
DMRValueStdUnits
DMRQualifierCode
ValueReceivedDate
DaysLate
NODICode
NODEDesc
ExceedancePct
NPDESViolations
SKIDMORE WSC WWTP
001
TX0119407
110009771693
1000 FT N OF THE END OF BLACK RANCH RD AND APPROX
SKIDMORE
TX
78387
78387
16648
18322
0.131
03
MGD
MGD
MGD
NA
DB
DAILY AV
AVG
Average
3403423151
16678
3442281023
Q1
Quantity1
3612155723
0.0492
03
MGD
0.0492
NA
16695
NA
NA
NA
NA
NA
SKIDMORE WSC WWTP
001
TX0119407
110009771693
1000 FT N OF THE END OF BLACK RANCH RD AND APPROX
SKIDMORE
TX
78387
78387
16648
18322
NA
03
MGD
MGD
MGD
NA
DD
DAILY MX
MAX
Maximum
3403423151
16678
3442281032
Q2
Quantity2
3612155724
0.0534
03
MGD
0.0534
NA
16695
NA
NA
NA
NA
NA
SKIDMORE WSC WWTP
001
TX0119407
110009771693
1000 FT N OF THE END OF BLACK RANCH RD AND APPROX
SKIDMORE
TX
78387
78387
16648
18322
0.131
03
MGD
MGD
MGD
NA
DB
DAILY AV
AVG
Average
3403423173
16708
3442281323
Q1
Quantity1
3613394763
0.0512
03
MGD
0.0512
NA
16728
NA
NA
NA
NA
NA
SKIDMORE WSC WWTP
001
TX0119407
110009771693
1000 FT N OF THE END OF BLACK RANCH RD AND APPROX
SKIDMORE
TX
78387
78387
16648
18322
NA
03
MGD
MGD
MGD
NA
DD
DAILY MX
MAX
Maximum
3403423173
16708
3442281328
Q2
Quantity2
3613394764
0.0710
03
MGD
0.0710
NA
16728
NA
NA
NA
NA
NA
SKIDMORE WSC WWTP
001
TX0119407
110009771693
1000 FT N OF THE END OF BLACK RANCH RD AND APPROX
SKIDMORE
TX
78387
78387
16648
18322
0.131
03
MGD
MGD
MGD
NA
DB
DAILY AV
AVG
Average
3403423185
16739
3442281496
Q1
Quantity1
3614856996
0.0480
03
MGD
0.0480
NA
16759
NA
NA
NA
NA
NA
SKIDMORE WSC WWTP
001
TX0119407
110009771693
1000 FT N OF THE END OF BLACK RANCH RD AND APPROX
SKIDMORE
TX
78387
78387
16648
18322
NA
03
MGD
MGD
MGD
NA
DD
DAILY MX
MAX
Maximum
3403423185
16739
3442281524
Q2
Quantity2
3614856997
0.0613
03
MGD
0.0613
NA
16759
NA
NA
NA
NA
NA
# filters only records that report the daily average
x <- x[x$StatisticalBaseDesc == "DAILY AV",]
# assuming everything worked so far, this plots the values
ggplot(x, aes(x = MonitoringPeriodEndDate, y = DMRValueNmbr)) + geom_line(aes(color=ID)) +
ylab("Average Discharge (MGD)")
Use the dataRetrieval
package to quickly obtain sample USGS flow data.
library(dataRetrieval)
siteNumber <- "08150000"
ChoptankInfo <- readNWISsite(siteNumber)
parameterCd <- "00060"
# Raw daily data:
rawDailyData <- readNWISdv(siteNumber, parameterCd, "2005-01-01",
"2016-12-31")
rawDailyData <- renameNWISColumns(rawDailyData)
head(rawDailyData)
#> agency_cd site_no Date Flow Flow_cd
#> 1 USGS 08150000 2005-01-01 200 A
#> 2 USGS 08150000 2005-01-02 196 A
#> 3 USGS 08150000 2005-01-03 197 A
#> 4 USGS 08150000 2005-01-04 195 A
#> 5 USGS 08150000 2005-01-05 191 A
#> 6 USGS 08150000 2005-01-06 190 A
Format the data as needed for fdc()
dataSub <- data.frame("date" = rawDailyData$Date, "flow" = rawDailyData$Flow)
head(dataSub)
#> date flow
#> 1 2005-01-01 200
#> 2 2005-01-02 196
#> 3 2005-01-03 197
#> 4 2005-01-04 195
#> 5 2005-01-05 191
#> 6 2005-01-06 190
#generate the FDC data frame with fdc()
fdcDF <- fdc(data = dataSub, filter = FALSE)
head(fdcDF)
#> date flow cumdist
#> 1 2007-05-25 11700 0.0002281542
#> 2 2015-05-21 7570 0.0004563085
#> 3 2007-05-26 6760 0.0006844627
#> 4 2007-06-29 5940 0.0009126169
#> 5 2007-03-27 5910 0.0011407712
#> 6 2016-09-26 4710 0.0013689254
Identify flows at desired flow exceedance values (percent values between 0-1)
feDF <- flowExceedance(fdcDF, c(0.05,0.25,0.50,0.75,0.95))
feDF
#> exceedance flow
#> 5% 95 42.9
#> 25% 75 62.2
#> 50% 50 86.9
#> 75% 25 118.0
#> 95% 5 239.0
Plot the flow duration curve.
plotFdc(fdcDF, feDF, breaks = c(0,10,40,60,90,100))
plotFdc
outputs a ggplot
object. So appearance can be customized
using the standard ggplot
approach.
Currently this function only supports bacteria LDCs. I am still trying to determine the best approach to accomodate other parameters. Hard coding the parameters and conversions simplifies use of the function but limits the scenarios in which users can utilize the function. Conversely, requiring users input the conversions and units allows the most flexibility at the expense of simplicity. I plan on identifying some middle ground.
Using the flow data from above use ldc()
to create a data frame with
the allowable load at each flow value. If we are interested in plotting
observed measurements, the function will join a dataframe of observed
values by date. The dataframe must have dates in the first column and
values in the second column. Here the dataRetrieval
package is used to
retrieve Storet data.
#download Storet water quality data from the nearby water quality station using dataRetrieval
ecoli <- readWQPqw("TCEQMAIN-17425",
parameterCd = "Escherichia coli",
startDate = "2005-01-01",
endDate = "2016-12-31")
#clean this up to get the data we need
library(dplyr)
#> Warning: package 'dplyr' was built under R version 3.4.3
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
ecoli <- ecoli %>%
filter(ResultMeasure.MeasureUnitCode != "hours") %>% #filter out the holding time values
select(date = ActivityStartDate, Value = ResultMeasureValue) # rename columns
#create the daataframe needed to plot the LDC
ldcDF <- ldc(fdcDF, "ecoli", ecoli)
#> Joining, by = "date"
head(ldcDF)
#> date flow cumdist loadcurve Value measload
#> 1 2007-05-25 11700 0.0002281542 3.606736e+13 NA NA
#> 2 2015-05-21 7570 0.0004563085 2.333589e+13 NA NA
#> 3 2007-05-26 6760 0.0006844627 2.083892e+13 NA NA
#> 4 2007-06-29 5940 0.0009126169 1.831112e+13 NA NA
#> 5 2007-03-27 5910 0.0011407712 1.821864e+13 NA NA
#> 6 2016-09-26 4710 0.0013689254 1.451942e+13 NA NA
Plot it
plotLdc(ldcDF, breaks = c(0,10,40,60,90,100))
#> Warning: Removed 4361 rows containing missing values (geom_point).
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.