1 Introduction

The Weighted-Multiple-Linear Regression (WREG) package has been developed to replicate the functionality of the U.S. Geological Survey's standalone WREG program (1). The WREG program was deployed as a MatLab executable program and is operated exclusively through a graphical user interface. The WREG package is intended to provide an open-source resource for users wishing to apply the methods of the WREG program in a more accessible, adaptable and transparent environment using the R programming language. The package is provided for users with some familiarity with the R programming environment. For users who prefer a graphical user interface, this has been included in the package as a Shiny application or through a standalone executable.

The WREG package and program are designed to help users develop regional regression equations relating streamflow frequency statistics to basin characteristics. In addition to ordinary least-squares, WREG implements hydrologic applications of weighted and generalized least-squares (2 and 3). The original manual for the WREG program (1) provides the necessary background and documentation of the WREG program and package. This vignette guides potential users through the basic workflows required to implement ordinary, weighted and generalized least-squares regression.

1.1 Compatability between the WREG program and package

In the development of the WREG package, several improvements on the WREG program were considered and implemented. An option, Legacy=TRUE has been included to force the WREG package to return the same result as the WREG program. This is only included for testing, but may help transitioning users become comfortable with the validity of the new implementations.

2 Package and example data

Once installed, the WREG package is loaded in the standard way:

library(WREG)
suppressWarnings(library(WREG))

The package contains same sample input files formatted for the MatLab version of WREG. The input files have been condensed into list objects for easy manipulation and storage. The data can be loaded with the following command:

wregDir <- file.path(system.file("exampleDirectory", package = "WREG"),
  "matlabImport")
importedData <- importWREG(wregPath = wregDir)

The list object importedData contains the streamflow frequency statistics (originally contained in FlowChar.txt), the independent variables (SiteInfo.txt) and the parameters of the fitted Log-Pearson Type III distributions (LP3G.txt, LP3K.txt, and LP3s.txt).

This example data set contains several dependent variables

names(importedData$Y)

and several independent variables

names(importedData$X)

allowing users to explore additional regressions and transformations beyond the scope of this vignette.

The importedData list also contains a matrix of the record lengths at each site and concurrent record lengths. The WREG program derives these from analyzing the USGS########.txt or AnnualTimeSeries.txt files, but those have been pre-processed here.

3 Ordinary least-squares

Ordinary leasy-squares regression is implemented through the WREG.OLS function. Being the simplest form of regression implemented in the WREG package, ordinary least-squares regression requires the fewest inputs. Only independet and dependent variables are required to implement ordinary least-squares regression.

For this vignette we will consider a regression of the 1% peak streamflow against drainage area and precipitation. We will fit a power-law regression, taking the natural logarithms of all independent and dependent variables. The following line of code prepares the dependent variable.

Y <- log(importedData$Y$Q100)

The independent variables are defined as

X <- log(importedData$X[, c("A", "P1006")])
X0 <- rep(1,nrow(X))
X <- cbind(X0,X)

Note that it is necessary to include the vector, X0, in order to fit the leading coefficient or constant.

The WREG program provided a graphical user interface for the selection and transformation of variables. Within the R console, this task is executed at the command line before passing inputs to the WREG.OLS function. In order for summary statistics to be computed in the correct units, a character string defining the transformation on the dependent variable must be included as the argument transY. Here, we used the natural logarithm transformation, so

transY <- 'ln'

Once these inputs are defined, they are passed to WREG.OLS:

Ex.OLS <- WREG.OLS(Y, X, transY)

The resulting list object, Ex.OLS contains a wide array of outputs

names(Ex.OLS)

each of which is described in detail in the help file of WREG.OLS. The outputs of the WREG package are designed to be more comprehensive than the outputs of the WREG program.

The entire output cann be summarized by printing the object Ex.OLS

Ex.OLS

The coefficients, and associated statistics thereof, are included in Coefs

knitr::kable(round(Ex.OLS$Coefs,digits=4))

Performance metrics are found in PerformanceMetrics

knitr::kable(round(data.frame(t(unlist(Ex.OLS$PerformanceMetrics))),digits=4))

Though not presented here, the residuals, leverage and inlufence are found in ResLevInf.

These outputs can be used to create a wide range of performance plots for detailed analysis. Here are several plots mirroring the output of the WREG program. (It is important to remember that the units of the axes replicate the units of the dependent variable Y.)

plot(Ex.OLS$fitted.values, Ex.OLS$residuals,
  main = 'Residuals versus Estimated Flow Characteristics',
  xlab = 'Estimated Flow Characteristic', ylab = 'Residual',
  xaxs = 'i', yaxs = 'i', ylim = c(-2, 2), xlim = c(7, 9))
grid(nx = 16, ny = 16)
plot(Ex.OLS$Y, Ex.OLS$ResLevInf$Leverage,
  main = 'Leverage Values versus Observations',
  xlab = 'Observation', ylab = 'Leverage Value',
  xaxs = 'i', yaxs = 'i', ylim = c(0, 0.5), xlim = c(6, 11))
grid(nx = 20, ny = 10)
abline(Ex.OLS$LevLim, 0, col = 'red')
plot(Ex.OLS$Y, Ex.OLS$ResLevInf$Influence,
  main = 'Influence Value versus Observation',
  xlab = 'Observation', ylab = 'Influence Value',
  xaxs = 'i', yaxs = 'i', ylim = c(0, 1.2), xlim = c(6, 11))
grid(nx = 20, ny = 12)
abline(Ex.OLS$InflLim, 0, col = 'red')

4 Weighted Least-Squares

With a few additional inputs, the function WREG.WLS can apply weighted least-squares regression as defined by (2). In order to compute weights, the function requires information about record lengths and the fitted Log-Pearson Type III distribution.

Example record lengths are included in the example data set:

RL <- importedData$recLen

The example record lengths include the concurrent record lengths, resulting in a matrix. A vector of record lengths, ignoring the concurrent records, is obtained with diag(RL); if a matrix is passed to WREG.WLS when Reg='WLS', then the diagnoal is extracted automatically.

In addition to record lengths, the weighting designed by (2) requires information on the fitted Log-Pearson Type III distribution. The three parameters, the standard deviation, deviates and skew, are passed as a list to WREG.MLR. The example files contain the LP3 parameters for each dependent variable, so the following code extracts those related to the 1% event. Note that the standard deviation (S) and skew (G) do not depend on the event modeled.

LP3 <- data.frame(S = importedData$LP3f$S, K = importedData$LP3k$Q100, G = importedData$LP3f$G)

With these inputs, it is then possible to compute the coefficients based on the weight least-squares algorithm:

Ex.WLS <- WREG.WLS(Y = Y, X= X, recordLengths = RL, LP3 = LP3, transY = transY)

The new coefficients are

knitr::kable(round(Ex.WLS$Coefs,digits=4))

The application of weighted least-squared regression produces a range of new performance metrics,

names(Ex.WLS$PerformanceMetrics)

Finally, the output can be summarized by printing the regression object

Ex.WLS

5 Generalized Least-Squares

The final form of regression implemented in the WREG package is the hydrologic interpretation of generalized least-squares regression developed by (3). This application of generalized least-squares regression relies on an approximation of cross correlation based on inter-site distances. With this approximation, generalized least-squares regression can be applied with or without uncertainty in regional skew. The latitude and Longitude of each site, embedded in the imported data, are used to either assess the fit of estimated cross-correlation or apply generalized least-squares regression.

5.1 Estimated cross-correlation

The WREG program provides a graphical user interface in order to determine a relationship between distance and correlation. In the WREG package, the function xcorPlot can be used to undertake the same exploration. By manually varying the alpha, theta and concurrentMin, one can assess the fitted estimates. The model can also be fit using two separate calculations of distances: the WREG program uses a nautical-mile approximation, while 'DistMeth = 2' applies the Haversine formula.

xcorPlot(object = importedData, alpha = 0.002, 
  theta = 0.98, DistMeth = 2, concurrentMin = 25, plot = FALSE)

A change from the MatLab implementation, this figure gives the Nash-Sutcliffe efficiency of the correlation model. This is meant only as a rough guide as to improved fits. By toggling plot to FALSE, the NSE value will be returned; this may be useful for iterative searching.

5.2 Without uncertainty in skew

Once alpha and theta have been selected, it is possible to fit a model using generalized least-squares.

Ex.GLS <- WREG.GLS(Y = Y, X = X, recordLengths = RL, LP3 = LP3,
  transY = transY, basinChars = importedData$BasChars, 
  alpha = 0.002, theta = 0.98, distMeth = 2)

The new coefficients are

knitr::kable(round(Ex.GLS$Coefs,digits=4))

The same perfromance metrics calculated when appolying wegihted least-squares regression are calculated when computing generalized least-squares regression. The full output can be summarized by printing the regression object

Ex.GLS

5.3 Uncertainty in skew

In addition to the generalized least-squares developed by (3), the WREG package is capable of implementing the correction for regional uncertainty developed by (4). When requesting the skew-correction, it is necessary to include several additional parameters:

It is also necessary to add the regional skew value to the LP3 list:

LP3$GR <- importedData$LP3f$GR

Applying generalized least-squares regression with a correction for uncertainty in regional skew

Ex.GLSs <- WREG.GLS(Y = Y, X = X, recordLengths = RL, LP3 = LP3,
  transY = transY, basinChars = importedData$BasChars, 
  alpha = 0.002, theta = 0.98, distMeth = 2,
  MSEGR = 0.302, TY = 100, peak = TRUE, regSkew = TRUE)

produces a new set of coefficients

knitr::kable(round(Ex.GLSs$Coefs,digits=4))

The full output is summarized by printing the regression object

Ex.GLSs

6 Region-of-influence regression

The WREG program is also capable of applying region-of-influence regression, calculating coefficients by any of the methods previously discussed. This same functionality is replicated in the WREG package using the WREG.RoI function. This vignette will not explore the application of region-of-influence regression, but the associated help files may provide the guidance needed. Naturally, applying region-of-influence produces several unique outputs; again, the help files provide descriptions.

7 Graphical User Interface

A graphical user interface was created for this package. It can be downloaded as a stand-alone executable or launch from the r-console with the command WREGgui().

8 References

  1. Eng, K., Chen, Y., and Kiang, J.E., 2009, User's guide to the weighted-multiple-linear-regression program (WREG version 1.0): U.S. Geological Survey Techniques and Methods, book 4, chap. A8, 21 p.
  2. Tasker, G.D., 1980, Hydrologic regression with weighted least squares: Water Resources Research, v. 16, no. 6, p. 1107-1113.
  3. Stedinger, J.R, and Tasker, G.D., 1985, Regional hydrologic analysis, 1, ordinary, weighted, and generalized least squares compared: Water Resources Research, v. 21, no. 9, p. 1421-1432.
  4. Griffis, V.W., and Stedinger, J.R., 2007, The use of GLS regression in regional hydrologic analyses: Journal of Hydrology, v. 344, p. 82-95.---


wfarmer-usgs/WREG documentation built on July 24, 2020, 1:28 a.m.