The Weighted-Multiple-Linear Regression (WREG) package has been developed to replicate the functionality of the U.S. Geological Survey's standalone WREG program (1). The WREG program was deployed as a MatLab executable program and is operated exclusively through a graphical user interface. The WREG package is intended to provide an open-source resource for users wishing to apply the methods of the WREG program in a more accessible, adaptable and transparent environment using the R programming language. The package is provided for users with some familiarity with the R programming environment. For users who prefer a graphical user interface, this has been included in the package as a Shiny application or through a standalone executable.
The WREG package and program are designed to help users develop regional regression equations relating streamflow frequency statistics to basin characteristics. In addition to ordinary least-squares, WREG implements hydrologic applications of weighted and generalized least-squares (2 and 3). The original manual for the WREG program (1) provides the necessary background and documentation of the WREG program and package. This vignette guides potential users through the basic workflows required to implement ordinary, weighted and generalized least-squares regression.
In the development of the WREG package, several improvements on the WREG program were considered and implemented. An option, Legacy=TRUE
has been included to force the WREG package to return the same result as the WREG program. This is only included for testing, but may help transitioning users become comfortable with the validity of the new implementations.
Once installed, the WREG package is loaded in the standard way:
library(WREG)
suppressWarnings(library(WREG))
The package contains same sample input files formatted for the MatLab version of WREG. The input files have been condensed into list objects for easy manipulation and storage. The data can be loaded with the following command:
wregDir <- file.path(system.file("exampleDirectory", package = "WREG"), "matlabImport") importedData <- importWREG(wregPath = wregDir)
The list object importedData
contains the streamflow frequency statistics (originally contained in FlowChar.txt), the independent variables (SiteInfo.txt) and the parameters of the fitted Log-Pearson Type III distributions (LP3G.txt, LP3K.txt, and LP3s.txt).
This example data set contains several dependent variables
names(importedData$Y)
and several independent variables
names(importedData$X)
allowing users to explore additional regressions and transformations beyond the scope of this vignette.
The importedData
list also contains a matrix of the record lengths at each site and concurrent record lengths. The WREG program derives these from analyzing the USGS########.txt or AnnualTimeSeries.txt files, but those have been pre-processed here.
Ordinary leasy-squares regression is implemented through the WREG.OLS
function. Being the simplest form of regression implemented in the WREG package, ordinary least-squares regression requires the fewest inputs. Only independet and dependent variables are required to implement ordinary least-squares regression.
For this vignette we will consider a regression of the 1% peak streamflow against drainage area and precipitation. We will fit a power-law regression, taking the natural logarithms of all independent and dependent variables. The following line of code prepares the dependent variable.
Y <- log(importedData$Y$Q100)
The independent variables are defined as
X <- log(importedData$X[, c("A", "P1006")]) X0 <- rep(1,nrow(X)) X <- cbind(X0,X)
Note that it is necessary to include the vector, X0, in order to fit the leading coefficient or constant.
The WREG program provided a graphical user interface for the selection and transformation of variables. Within the R console, this task is executed at the command line before passing inputs to the WREG.OLS
function. In order for summary statistics to be computed in the correct units, a character string defining the transformation on the dependent variable must be included as the argument transY
. Here, we used the natural logarithm transformation, so
transY <- 'ln'
Once these inputs are defined, they are passed to WREG.OLS
:
Ex.OLS <- WREG.OLS(Y, X, transY)
The resulting list object, Ex.OLS
contains a wide array of outputs
names(Ex.OLS)
each of which is described in detail in the help file of WREG.OLS
. The outputs of the WREG package are designed to be more comprehensive than the outputs of the WREG program.
The entire output cann be summarized by printing the object Ex.OLS
Ex.OLS
The coefficients, and associated statistics thereof, are included in Coefs
knitr::kable(round(Ex.OLS$Coefs,digits=4))
Performance metrics are found in PerformanceMetrics
knitr::kable(round(data.frame(t(unlist(Ex.OLS$PerformanceMetrics))),digits=4))
Though not presented here, the residuals, leverage and inlufence are found in ResLevInf
.
These outputs can be used to create a wide range of performance plots for detailed analysis. Here are several plots mirroring the output of the WREG program. (It is important to remember that the units of the axes replicate the units of the dependent variable Y
.)
plot(Ex.OLS$fitted.values, Ex.OLS$residuals, main = 'Residuals versus Estimated Flow Characteristics', xlab = 'Estimated Flow Characteristic', ylab = 'Residual', xaxs = 'i', yaxs = 'i', ylim = c(-2, 2), xlim = c(7, 9)) grid(nx = 16, ny = 16)
plot(Ex.OLS$Y, Ex.OLS$ResLevInf$Leverage, main = 'Leverage Values versus Observations', xlab = 'Observation', ylab = 'Leverage Value', xaxs = 'i', yaxs = 'i', ylim = c(0, 0.5), xlim = c(6, 11)) grid(nx = 20, ny = 10) abline(Ex.OLS$LevLim, 0, col = 'red')
plot(Ex.OLS$Y, Ex.OLS$ResLevInf$Influence, main = 'Influence Value versus Observation', xlab = 'Observation', ylab = 'Influence Value', xaxs = 'i', yaxs = 'i', ylim = c(0, 1.2), xlim = c(6, 11)) grid(nx = 20, ny = 12) abline(Ex.OLS$InflLim, 0, col = 'red')
With a few additional inputs, the function WREG.WLS
can apply weighted least-squares regression as defined by (2). In order to compute weights, the function requires information about record lengths and the fitted Log-Pearson Type III distribution.
Example record lengths are included in the example data set:
RL <- importedData$recLen
The example record lengths include the concurrent record lengths, resulting in a matrix. A vector of record lengths, ignoring the concurrent records, is obtained with diag(RL)
; if a matrix is passed to WREG.WLS
when Reg='WLS'
, then the diagnoal is extracted automatically.
In addition to record lengths, the weighting designed by (2) requires information on the fitted Log-Pearson Type III distribution. The three parameters, the standard deviation, deviates and skew, are passed as a list to WREG.MLR
. The example files contain the LP3 parameters for each dependent variable, so the following code extracts those related to the 1% event. Note that the standard deviation (S) and skew (G) do not depend on the event modeled.
LP3 <- data.frame(S = importedData$LP3f$S, K = importedData$LP3k$Q100, G = importedData$LP3f$G)
With these inputs, it is then possible to compute the coefficients based on the weight least-squares algorithm:
Ex.WLS <- WREG.WLS(Y = Y, X= X, recordLengths = RL, LP3 = LP3, transY = transY)
The new coefficients are
knitr::kable(round(Ex.WLS$Coefs,digits=4))
The application of weighted least-squared regression produces a range of new performance metrics,
names(Ex.WLS$PerformanceMetrics)
Finally, the output can be summarized by printing the regression object
Ex.WLS
The final form of regression implemented in the WREG package is the hydrologic interpretation of generalized least-squares regression developed by (3). This application of generalized least-squares regression relies on an approximation of cross correlation based on inter-site distances. With this approximation, generalized least-squares regression can be applied with or without uncertainty in regional skew. The latitude and Longitude of each site, embedded in the imported data, are used to either assess the fit of estimated cross-correlation or apply generalized least-squares regression.
The WREG program provides a graphical user interface in order to determine a relationship between distance and correlation. In the WREG package, the function xcorPlot
can be used to undertake the same exploration. By manually varying the alpha
, theta
and concurrentMin
, one can assess the fitted estimates. The model can also be fit using two separate calculations of distances: the WREG program uses a nautical-mile approximation, while 'DistMeth = 2' applies the Haversine formula.
xcorPlot(object = importedData, alpha = 0.002, theta = 0.98, DistMeth = 2, concurrentMin = 25, plot = FALSE)
A change from the MatLab implementation, this figure gives the Nash-Sutcliffe efficiency of the correlation model. This is meant only as a rough guide as to improved fits. By toggling plot
to FALSE
, the NSE value will be returned; this may be useful for iterative searching.
Once alpha
and theta
have been selected, it is possible to fit a model using generalized least-squares.
Ex.GLS <- WREG.GLS(Y = Y, X = X, recordLengths = RL, LP3 = LP3, transY = transY, basinChars = importedData$BasChars, alpha = 0.002, theta = 0.98, distMeth = 2)
The new coefficients are
knitr::kable(round(Ex.GLS$Coefs,digits=4))
The same perfromance metrics calculated when appolying wegihted least-squares regression are calculated when computing generalized least-squares regression. The full output can be summarized by printing the regression object
Ex.GLS
In addition to the generalized least-squares developed by (3), the WREG package is capable of implementing the correction for regional uncertainty developed by (4). When requesting the skew-correction, it is necessary to include several additional parameters:
MSEGR
: The mean-squared-error of the regional skewTY
: The return period of the streamflow frequency statisticPeak
: A logical value indicating if the event is a peak event (TRUE
) or a low event (FALSE
)It is also necessary to add the regional skew value to the LP3 list:
LP3$GR <- importedData$LP3f$GR
Applying generalized least-squares regression with a correction for uncertainty in regional skew
Ex.GLSs <- WREG.GLS(Y = Y, X = X, recordLengths = RL, LP3 = LP3, transY = transY, basinChars = importedData$BasChars, alpha = 0.002, theta = 0.98, distMeth = 2, MSEGR = 0.302, TY = 100, peak = TRUE, regSkew = TRUE)
produces a new set of coefficients
knitr::kable(round(Ex.GLSs$Coefs,digits=4))
The full output is summarized by printing the regression object
Ex.GLSs
The WREG program is also capable of applying region-of-influence regression, calculating coefficients by any of the methods previously discussed. This same functionality is replicated in the WREG package using the WREG.RoI
function. This vignette will not explore the application of region-of-influence regression, but the associated help files may provide the guidance needed. Naturally, applying region-of-influence produces several unique outputs; again, the help files provide descriptions.
A graphical user interface was created for this package. It can be downloaded as a stand-alone executable or launch from the r-console with the command WREGgui()
.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.