This vignette demonstrates a sample analysis of the stream discharge data from station WSC.08HA011 (Cowichan River near Duncan), which is provided as a sample dataset (WSC.08HA011) with this package. The basin has an area of 826 km^2. While the data frame runs from 1900 onwards, the data in this example is only populated starting in 1965.
We first load the package, load the data and look at the first few rows starting in 1965.
library(ggplot2)
library(plyr)      # split-apply-combine paradigm
library(reshape2)  # for melting and casting
library(BCWaterDischargeAnalysis)

data(WSC.08HA011)
head(WSC.08HA011[as.numeric(format(WSC.08HA011$Date, "%Y")) == 1965, ])
Notice that the flow data frame MUST have two variables: Date and Q - the daily discharge value.
Variable  Description
Date      The date of the reading in standard R date class format.

For example, to convert a 4-digit year, numeric month, and numeric day of the month use

flow$Date <- as.Date(paste(flow$Year, '-', flow$Month, '-', flow$Day, sep=""))

To convert a character string (e.g. '1/15/2001', in month/day/year order) use

flow$Date <- as.Date(flow$chardate, "%m/%d/%Y")

The formatting codes (%Y, etc.) are explained in the help for the strptime() function.
Other variables in the data frame will be ignored.
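As a concrete illustration, a minimal flow data frame can be built directly in R. (This toy data frame is purely illustrative; the Temp column stands in for any extra variable that the analysis functions would ignore.)

```r
# Minimal flow data frame with the two required variables, Date and Q.
# The Temp column is illustrative only; extra variables are ignored.
flow <- data.frame(
  Date = as.Date(c("1965-01-01", "1965-01-02", "1965-01-03")),
  Q    = c(12.3, 11.8, 13.1),   # daily discharge values
  Temp = c(4.2, 4.0, 4.5)       # ignored by the analysis functions
)
str(flow)
```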
Missing values can be entered in two ways. First, the date with the missing value
of Q
can be dropped from the data frame. Second, the value of Q
can be set to missing (NA).
Both methods can be used together. An example of the impact of missing values occurs in the second half
of this vignette.
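The two ways of entering missing values can be sketched as follows (a toy data frame, not the sample dataset):

```r
# Toy flow data frame used only to illustrate the two encodings of missing data.
flow <- data.frame(
  Date = as.Date("1965-01-01") + 0:4,
  Q    = c(12.3, 11.8, 13.1, 10.5, 9.9)
)

# Method 1: drop the row for the date with the missing discharge.
flow.dropped <- flow[flow$Date != as.Date("1965-01-03"), ]

# Method 2: keep the row but set Q to NA.
flow.na <- flow
flow.na$Q[flow.na$Date == as.Date("1965-01-03")] <- NA
```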
We now set the variables limiting the years over which the analysis should be done, along with other variables needed for the analysis.
Station.Code <- 'WSC-08HA011'
Station.Name <- 'Cowichan River near Duncan'
Station.Area <- 826    # square km's
start.year   <- 1965   # when do you want the analysis to start at
end.year     <- 2012   # what is last year with complete data
Before analyzing the discharge values, you should do some preliminary screening to check for unusual data values, count missing values etc.
Here are some suggestions:
cat("Number of illegal date values ", sum(is.na(WSC.08HA011$Date)), "\n")
WSC.08HA011[is.na(WSC.08HA011$Date), ]  # sorry Microsoft, but Feb 29 1900 is NOT a leap year

dim(WSC.08HA011)  # size before removing illegal dates
WSC.08HA011 <- WSC.08HA011[!is.na(WSC.08HA011$Date), ]
dim(WSC.08HA011)  # size after removing illegal dates

# get a simple summary of the data
WSC.08HA011$Year <- as.numeric(format(WSC.08HA011$Date, "%Y"))  # add the year to the data frame

# Table of simple statistics by year to help check for outliers, etc.
WSC.08HA011.sum <- plyr::ddply(
  WSC.08HA011[WSC.08HA011$Year >= start.year & WSC.08HA011$Year <= end.year, ],
  "Year", plyr::summarize,
  n.days   = length(Year),
  n.Q      = sum(!is.na(Q)),
  n.miss.Q = sum(is.na(Q)),
  min.Q    = min(Q, na.rm=TRUE),
  max.Q    = max(Q, na.rm=TRUE),
  mean.Q   = mean(Q, na.rm=TRUE),
  sd.Q     = sd(Q, na.rm=TRUE))
WSC.08HA011.sum
We see that there are no missing values in any of the years between the start and end year. This
report is also generated by the compute.Q.stat.annual()
function.
Let's visualize some of the statistics over time.
# visualize the min, max, mean, and sd
plotdata <- reshape2::melt(WSC.08HA011.sum, id.var='Year',
                           measure.var=c("min.Q","max.Q","mean.Q","sd.Q"),
                           variable.name='Statistic', value.name='Value')
ggplot2::ggplot(data=plotdata, aes(x=Year, y=Value)) +
  ggtitle("Summary statistics about Q over time") +
  theme(plot.title = element_text(hjust = 0.5)) +
  geom_line() +
  facet_wrap(~Statistic, ncol=2, scales="free_y")
We now compute a plethora of annual statistics:
stat.annual <- compute.Q.stat.annual(
  Station.Code = Station.Code,
  Station.Area = Station.Area,
  flow         = WSC.08HA011,
  start.year   = start.year,
  end.year     = end.year)
This will create a list object with many elements:
names(stat.annual)
This has many, many statistics some of which are shown below:
head(stat.annual$Q.stat.annual[, 1:10])
Notice the default action (refer to the missing-data
vignette) is to propagate missing values. Despite a complete data record starting
in 1965, we made this example have no data prior to (and including) 1964-12-31.
This implies that a three-day rolling average that
includes 1964-12-31 cannot be computed which gives rise to missing values for the 3-day rolling average minimum
flow for the first few days in 1965. This also occurs for other rolling averages that include any 1964 data such as the water year computation
for the 1965 water year which started on 1964-10-01.
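The propagation of missing values through a rolling average can be illustrated with base R. This is a sketch using stats::filter on toy values, not the package's internal code: the NA entries stand in for the absent 1964 data, and any 3-day window touching an NA is itself NA.

```r
# Daily flows, with the two days before the start of record marked missing (NA).
Q <- c(NA, NA, 10, 12, 14, 13)

# 3-day rolling mean ending on each day (sides = 1 uses the current value and
# the two preceding values); any window touching an NA yields NA.
roll3 <- stats::filter(Q, rep(1/3, 3), sides = 1)
roll3   # NA NA NA NA 12 13
```

The first two averages are NA because the window extends before the series, and the next two are NA because their windows include a missing value, mirroring the behaviour described above for early-1965 rolling averages.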
These statistics can then be processed as you desire.
As well, the default arguments will also create several .csv files with the summary statistics and a .pdf file with the trends plotted over time in the directory where you run the code. The file names are given in the list object:
stat.annual$file.stat.csv
stat.annual$file.stat.trans.csv
stat.annual$file.stat.trend.pdf
We compute the long-term summary statistics:
stat.longterm <- compute.Q.stat.longterm(
  Station.Code = Station.Code,
  #Station.Area = Station.Area,  # removed in recent version
  flow       = WSC.08HA011,
  start.year = start.year,
  end.year   = end.year)
This will create a list object with many elements:
names(stat.longterm)
This has the long-term summary statistics (mean, min, max, etc.) by month, some of which are shown below:
head(stat.longterm$Q.cy.stat.longterm)
Because these statistics are computed on the calendar year basis and there were no missing values
in the period of record, there are no missing summary statistics (see the missing-data
vignette
for more details).
These statistics can then be processed as you desire.
As well, the default arguments will also create several .csv files with the summary statistics and a .pdf file with the trends plotted over time in the directory where you run the code. The file names are given in the list object:
stat.longterm$file.cy.stat.csv
stat.longterm$file.cy.stat.trans.csv
We compute the long-term percentile statistics:
percentile.longterm <- compute.Q.percentile.longterm(
  Station.Code = Station.Code,
  #Station.Area = Station.Area,
  flow       = WSC.08HA011,
  start.year = start.year,
  end.year   = end.year)
This will create a list object with many elements:
names(percentile.longterm)
This has the long-term percentile statistics by month for the calendar year, some of which are shown below:
head(percentile.longterm$Q.cy.percentile.stat)
Because these statistics are computed on the calendar year basis and there were no missing values
in the period of record, there are no missing summary statistics (see the missing-data
vignette
for more details). Similar statistics are available on the water year basis.
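As a rough illustration of what a long-term percentile table involves, base R's quantile() can produce similar by-month summaries. This is a sketch on toy data, not the package's implementation:

```r
# Toy daily flows spanning two months; compute the 10th/50th/90th percentiles
# of Q within each calendar month, mimicking a long-term percentile table.
flow <- data.frame(
  Date = seq(as.Date("1965-01-01"), as.Date("1965-02-28"), by = "day")
)
set.seed(1)
flow$Q     <- round(runif(nrow(flow), 5, 20), 1)  # fake discharge values
flow$Month <- format(flow$Date, "%b")

pct <- aggregate(Q ~ Month, data = flow,
                 FUN = function(q) quantile(q, probs = c(0.10, 0.50, 0.90)))
pct
```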
These statistics can then be processed as you desire.
As well, the default arguments will also create several .csv files with the summary statistics for the calendar and water years and a .pdf file with the trends plotted over time in the directory where you run the code. The file names are given in the list object:
percentile.longterm$file.cy.stat.csv
percentile.longterm$file.cy.stat.trans.csv
We compute the volume frequency analysis:
vfa.analysis <- compute.volume.frequency.analysis(
  Station.Code = Station.Code,
  flow       = WSC.08HA011,
  start.year = start.year,
  end.year   = end.year)
Because these statistics are computed on the calendar year basis and there were no missing values in the period of record, there was no impact of missingness on the computations.
This will create a list object with many elements:
names(vfa.analysis)
A frequency plot is created as an object in the list, and also written to the directory where you run the code:
vfa.analysis$freqplot
The quantiles from the fitted distribution (the default is a log-Pearson Type III distribution) are also computed:
vfa.analysis$fitted.quantiles.trans
These statistics can then be processed as you desire.
As well, the default arguments will also create several .csv files with the summary statistics and a .pdf file with the trends plotted over time in the directory where you run the code. The file names are given in the list object:
vfa.analysis$file.stat.csv
vfa.analysis$file.quantile.csv
vfa.analysis$file.frequency.plot