This vignette demonstrates a sample analysis of the stream discharge data from station WSC.08HA011 (Cowichan River near Duncan), which is provided as a sample dataset (WSC.08HA011) with this package. The basin has an area of 826 km². While the data frame runs from 1900 onwards, data in this example is only populated from 1965 onwards.

Loading the data and some initial plots

We first load the package, load the data and look at the first few rows starting in 1965.

library(ggplot2)
library(plyr)       # split-apply-combine paradigm
library(reshape2)   # for melting and casting

library(BCWaterDischargeAnalysis)


data(WSC.08HA011)
head( WSC.08HA011[ as.numeric(format(WSC.08HA011$Date, "%Y"))==1965,])

Notice that the flow dataframe MUST have two variables: Date and Q - the daily discharge value.


Variable descriptions:


Date: The date of the reading in standard R Date class format.

For example, to convert a 4-digit year, numeric month, and numeric day
of the month use

    flow$Date <- as.Date(paste(flow$Year,'-',flow$Month,'-',flow$Day,sep=""))
To convert a character string (e.g. '1/15/2001') use

    flow$Date <- as.Date(flow$chardate, "%m/%d/%Y")
The formatting codes (%Y, etc.) are explained in the help for the strptime() function.

Q: The average daily flow as a numeric value.

Other variables in the data frame will be ignored.

Missing values can be entered in two ways. First, the date with the missing value of Q can be dropped from the data frame. Second, the value of Q can be set to missing (NA). Both methods can be used together. An example of the impact of missing values occurs in the second half of this vignette.
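For example, both approaches can be mixed in a small (hypothetical) flow data frame:

```r
# Hypothetical flow data frame illustrating the two ways to record missing values.
# 1965-01-02 has Q set to NA; 1965-01-04 is simply absent from the data frame.
flow <- data.frame(Date = as.Date(c("1965-01-01", "1965-01-02",
                                    "1965-01-03", "1965-01-05")),
                   Q    = c(4.5, NA, 5.1, 4.8))
sum(is.na(flow$Q))   # one explicit missing value
nrow(flow)           # 4 rows; the dropped date is implicit
```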

We now set the variables that limit the years for the analysis and define the other variables needed for the analysis.

Station.Code <- 'WSC-08HA011'
Station.Name <- 'Cowichan River near Duncan'
Station.Area <- 826    # square km's

start.year   <- 1965  # first year of the analysis
end.year     <- 2012  # last year with complete data

Preliminary screening

Before analyzing the discharge values, you should do some preliminary screening to check for unusual data values, count missing values, etc.

Here are some suggestions:

cat("Number of illegal date values ", sum(is.na(WSC.08HA011$Date)), "\n")
WSC.08HA011[ is.na(WSC.08HA011$Date),]     # sorry Microsoft, but Feb 29, 1900 is NOT a leap year

dim(WSC.08HA011)  # size before removing illegal dates
WSC.08HA011 <- WSC.08HA011[ !is.na(WSC.08HA011$Date),]
dim(WSC.08HA011)  # size after removing illegal dates

# get a simple summary of the data
WSC.08HA011$Year <- as.numeric(format(WSC.08HA011$Date, "%Y")) # add the year to the data frame
# Table of simple statistics by year to help check for outliers, etc
WSC.08HA011.sum <- plyr::ddply(WSC.08HA011[ WSC.08HA011$Year >= start.year & WSC.08HA011$Year <=end.year,], "Year", plyr::summarize,
         n.days   = length(Year),
         n.Q      = sum (!is.na(Q)),
         n.miss.Q = sum ( is.na(Q)),
         min.Q    = min (Q, na.rm=TRUE),
         max.Q    = max (Q, na.rm=TRUE),
         mean.Q   = mean(Q,na.rm=TRUE),
         sd.Q     = sd  (Q,na.rm=TRUE))
WSC.08HA011.sum

We see that there are no missing values in any of the years between the start and end year. This report is also generated by the compute.Q.stat.annual() function.

Let's visualize some of the statistics over time.

# visualize the min, max, mean, and sd over time
plotdata <- reshape2::melt(WSC.08HA011.sum,
             id.var='Year',
             measure.var=c("min.Q","max.Q","mean.Q","sd.Q"),
             variable.name='Statistic',
             value.name='Value')
ggplot2::ggplot(data=plotdata, aes(x=Year, y=Value))+
  ggtitle("Summary statistics about Q over time")+
  theme(plot.title = element_text(hjust = 0.5))+
  geom_line()+
  facet_wrap(~Statistic, ncol=2, scales="free_y")

Computing the Annual Statistics (on calendar and water year)

We compute a plethora of annual statistics:

stat.annual <- compute.Q.stat.annual(
                          Station.Code  =Station.Code,
                          Station.Area  =Station.Area,
                          flow          =WSC.08HA011,
                          start.year    =start.year,
                          end.year      =end.year)

This will create a list object with many elements:

names(stat.annual)

This has many statistics, some of which are shown below:

head(stat.annual$Q.stat.annual[, 1:10])

Notice the default action (refer to the missing-data vignette) is to propagate missing values. Although the complete data record starts in 1965, this example has no data prior to (and including) 1964-12-31. Consequently, a three-day rolling average that includes 1964-12-31 cannot be computed, which gives rise to missing values for the 3-day rolling-average minimum flow for the first few days of 1965. The same occurs for other rolling averages that include any 1964 data, such as the computation for the 1965 water year, which started on 1964-10-01.
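To see why the first values of 1965 are missing, here is a minimal base-R sketch (not the package's exact code; the right-aligned window is an assumption) of a 3-day rolling mean when the day before the record starts is unknown:

```r
# Five daily flows; the first value (standing in for 1964-12-31) is unknown.
Q <- c(NA, 5, 6, 7, 8)
# Right-aligned 3-day rolling mean: element i averages days i-2, i-1, and i.
roll3 <- stats::filter(Q, rep(1/3, 3), sides = 1)
roll3
# The first three values are NA: the first two windows are incomplete, and
# the third window still includes the unknown 1964 value, so NA propagates.
```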

These statistics can then be processed as you desire.

As well, the default arguments will also create several .csv files with the summary statistics and a .pdf file with the trends plotted over time in the directory you run the code in. The file names are given in the list object:

stat.annual$file.stat.csv
stat.annual$file.stat.trans.csv
stat.annual$file.stat.trend.pdf

Computing the long term summary statistics

We compute the long term summary statistics

stat.longterm <- compute.Q.stat.longterm(
                          Station.Code =Station.Code,
                          #Station.Area =Station.Area,  # removed in recent version
                          flow         =WSC.08HA011,
                          start.year   =start.year,
                          end.year     =end.year)

This will create a list object with many elements:

names(stat.longterm)

This has the long-term summary statistics (mean, min, max, etc.) by month, some of which are shown below:

head(stat.longterm$Q.cy.stat.longterm)

Because these statistics are computed on a calendar-year basis and there were no missing values in the period of record, there are no missing summary statistics (see the missing-data vignette for more details). These statistics can then be processed as you desire.

As well, the default arguments will also create several .csv files with the summary statistics and a .pdf file with the trends plotted over time in the directory you run the code in. The file names are given in the list object:

stat.longterm$file.cy.stat.csv
stat.longterm$file.cy.stat.trans.csv

Computing the long term percentile statistics

We compute the long term percentile statistics

percentile.longterm <- compute.Q.percentile.longterm(
                          Station.Code=Station.Code,
                          #Station.Area=Station.Area,
                          flow        =WSC.08HA011,
                          start.year  =start.year,
                          end.year    =end.year)

This will create a list object with many elements:

names(percentile.longterm)

This has the long-term percentile statistics by month for the calendar year, some of which are shown below:

head(percentile.longterm$Q.cy.percentile.stat)

Because these statistics are computed on a calendar-year basis and there were no missing values in the period of record, there are no missing summary statistics (see the missing-data vignette for more details). Similar statistics are available on a water-year basis.

These statistics can then be processed as you desire.
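Conceptually, the long-term percentiles are quantiles of the daily discharge record; a minimal sketch with made-up flows (not the package's exact method):

```r
# Made-up daily flows (m^3/s) standing in for a long record.
Q <- c(10, 12, 15, 20, 50, 80, 100, 5, 8, 30)
# Long-term percentiles are quantiles over all daily values in the record.
quantile(Q, probs = c(0.05, 0.25, 0.50, 0.75, 0.95), na.rm = TRUE)
```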

As well, the default arguments will also create several .csv files with the summary statistics for the calendar and water years and a .pdf file with the trends plotted over time in the directory you run the code in. The file names are given in the list object:

percentile.longterm$file.cy.stat.csv
percentile.longterm$file.cy.stat.trans.csv

Computing a volume frequency analysis

We compute the volume frequency analysis

vfa.analysis <- compute.volume.frequency.analysis( 
                      Station.Code   =Station.Code, 
                      flow           =WSC.08HA011,
                      start.year     =start.year, 
                      end.year       =end.year)

Because these statistics are computed on a calendar-year basis and there were no missing values in the period of record, missingness had no impact on the computations.

This will create a list object with many elements:

names(vfa.analysis)

A frequency plot is created as an object in the list, and is also written to the directory:

vfa.analysis$freqplot

The quantiles from the fitted distribution (the default is a Log-Pearson Type III distribution) are also computed:

vfa.analysis$fitted.quantiles.trans

These statistics can then be processed as you desire.

As well, the default arguments will also create several .csv files with the summary statistics and a .pdf file with the trends plotted over time in the directory you run the code in. The file names are given in the list object:

vfa.analysis$file.stat.csv 
vfa.analysis$file.quantile.csv
vfa.analysis$file.frequency.plot


bcgov/BCWaterDischargeAnalysis documentation built on Dec. 21, 2020, 2:20 p.m.