WREG.RoI: Region-of-Influence Regression (WREG)

Description

WREG.RoI implements region-of-influence regression in the WREG framework.

Usage

WREG.RoI(Y, X, Reg, transY = NA, recordLengths = NA, LP3 = NA,
  regSkew = FALSE, alpha = 0.01, theta = 0.98, BasinChars = NA,
  MSEGR = NA, TY = 2, Peak = T, ROI = c("PRoI", "GRoI", "HRoI"),
  n = NA, D = 250, DistMeth = 2, Legacy = FALSE)

Arguments

Y

The dependent variable of interest, with any transformations already applied.

X

The independent variables in the regression, with any transformations already applied. Each row represents a site and each column represents a particular independent variable. (If a leading constant is used, it should be included here as a leading column of ones.) The rows must be in the same order as the dependent variables in Y.
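As a minimal sketch of the layout described above, the following builds X with a leading column of ones and one transformed predictor. The vector `area` and its values are hypothetical and serve only as an illustration; whether a matrix or a data frame is preferred is not stated here, so a data frame is assumed to match the example at the end of this page.

```r
# Hypothetical drainage areas for five sites (illustrative values only)
area <- c(12.4, 85.0, 230.5, 47.3, 410.2)

# Apply the transformation before calling the regression, then place a
# column of ones first so that it acts as the regression constant.
X <- data.frame(Intercept = 1, logA = log10(area))

dim(X)  # one row per site, one column per term (constant + predictor)
```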

Reg

A string indicating which type of regression should be applied. The options are: “OLS” for ordinary least-squares regression, “WLS” for weighted least-squares regression, “GLS” for generalized least-squares regression with no uncertainty in regional skew, and “GLSskew” for generalized least-squares regression with uncertainty in regional skew. (In the case of “GLSskew”, the uncertainty in regional skew must be provided as the mean squared error in regional skew.)

transY

A required character string indicating whether the dependent variable was transformed by the common logarithm ('log10'), transformed by the natural logarithm ('ln') or left untransformed ('none').

recordLengths

This input is required for “WLS”, “GLS” and “GLSskew”. For “GLS” and “GLSskew”, recordLengths should be a matrix whose rows and columns are in the same order as Y. Each (r,c) element represents the length of concurrent record between sites r and c. The diagonal elements therefore represent each site's full record length. For “WLS”, only the at-site record lengths are needed, so recordLengths can be either a vector or the matrix described for “GLS” and “GLSskew”.
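One way to assemble such a matrix is sketched below, assuming a hypothetical year-by-site table of record availability (`obs`). The site names and values are made up for illustration; this is not how the package itself derives record lengths.

```r
# Hypothetical record availability: rows are years, columns are sites.
# TRUE means the site has an observation in that year.
obs <- cbind(
  siteA = c(TRUE,  TRUE,  TRUE,  TRUE,  FALSE),
  siteB = c(FALSE, TRUE,  TRUE,  TRUE,  TRUE),
  siteC = c(TRUE,  TRUE,  FALSE, FALSE, FALSE)
)

# crossprod() computes t(obs) %*% obs, so element (r, c) counts the years
# in which both site r and site c have data (the concurrent record).
# Multiplying by 1 converts the logical matrix to numeric first.
recordLengths <- crossprod(obs * 1)

# The diagonal holds each site's full record length; this vector alone
# would be enough for "WLS".
atSiteLengths <- diag(recordLengths)
```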

LP3

A data frame containing the fitted Log-Pearson Type III standard deviation (S), standard deviate (K) and skew (G) for each site. The names of this data frame are S, K and G. For “GLSskew”, the regional skew value must also be provided in a variable called GR. The order of the rows must be the same as Y.

regSkew

A logical indicating whether regional skews are provided and require an adjustment for the uncertainty in them (TRUE). The default is FALSE.

alpha

A number, required only for “GLS” and “GLSskew”. alpha is a parameter used in the estimated cross-correlation between site records. See equation 20 in the WREG v. 1.05 manual. The arbitrary, default value is 0.01. The user should fit a different value as needed.

theta

A number, required only for “GLS” and “GLSskew”. theta is a parameter used in the estimated cross-correlation between site records. See equation 20 in the WREG v. 1.05 manual. The arbitrary, default value is 0.98. The user should fit a different value as needed.
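To show how alpha and theta interact, the sketch below assumes the cross-correlation model has the form rho(d) = theta^(d / (alpha * d + 1)), where d is the distance between two sites. That form is an assumption made here for illustration and should be checked against equation 20 of the manual; the function name `crossCorr` is hypothetical.

```r
# Assumed form of the smoothed cross-correlation model (verify against
# the manual before relying on it):
#   rho(d) = theta ^ (d / (alpha * d + 1))
crossCorr <- function(d, alpha = 0.01, theta = 0.98) {
  theta^(d / (alpha * d + 1))
}

# Correlation decays with distance and levels off at theta^(1 / alpha)
d <- c(0, 10, 50, 100, 250)
rho <- crossCorr(d)
```

Under this assumed form, theta controls how quickly correlation drops at short distances, while alpha sets the floor that the correlation approaches at long distances.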

BasinChars

A data frame containing three variables: StationID is the numerical identifier (without a leading zero) of each site, Lat is the latitude of the site, in decimal degrees, and Long is the longitude of the site, in decimal degrees. The sites must be presented in the same order as Y. Required only for “GLS”.

MSEGR

A number. The mean squared error of the regional skew.

TY

A number. The return period of the event being modeled. Required only for “GLSskew”. The default value is 2. (See the Legacy details below.)

Peak

A logical. Indicates if the event being modeled is a peak flow event or a low-flow event. TRUE indicates a peak flow, while FALSE indicates a low-flow event.

ROI

A string indicating how to define the region of influence. “PRoI” signifies physiographic, independent or predictor-variable region of influence. “GRoI” calls for a geographic region of influence. “HRoI” requests a hybrid region of influence. Details on each approach are provided in the manual for WREG v. 1.0.

n

The number of sites to include in the region of influence.

D

Required for “HRoI”, the geographic limit within which to search for a physiographic region of influence. In WREG v. 1.05 (see Legacy below), this is interpreted in meters. Otherwise, it is interpreted in miles.

DistMeth

Required for “GLS” and “GLSskew”. A value of 1 indicates that the "Nautical Mile" approximation should be used to calculate inter-site distances. A value of 2 designates the Haversine approximation. See Dist.WREG. The default value is 2. (See the Legacy details below.)
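For orientation, a generic Haversine distance can be sketched as follows. This is not the code of Dist.WREG; the function name `haversineMiles`, the Earth radius of 3959 miles and the choice of miles as the output unit are all assumptions made for this illustration.

```r
# Generic Haversine great-circle distance between two points given in
# decimal degrees. The radius (in miles) is an assumed value.
haversineMiles <- function(lat1, lon1, lat2, lon2, radius = 3959) {
  toRad <- pi / 180
  dLat <- (lat2 - lat1) * toRad
  dLon <- (lon2 - lon1) * toRad
  a <- sin(dLat / 2)^2 +
    cos(lat1 * toRad) * cos(lat2 * toRad) * sin(dLon / 2)^2
  # pmin() guards against rounding pushing sqrt(a) slightly above 1
  2 * radius * asin(pmin(1, sqrt(a)))
}

# Two hypothetical sites one degree of latitude apart
oneDegree <- haversineMiles(40, -75, 41, -75)
```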

Legacy

A logical. A value of TRUE forces the WREG program to behave identically to WREG v. 1.05, bugs and all. It will force TY=2 and DistMeth=1. For ROI regressions, it will also require a specific calculation for weighting matrices in “WLS” (Omega.WLS.ROImatchMatLab), “GLS”, and “GLSskew” (see Omega.GLS.ROImatchMatLab). Legacy also forces the distance limit D to be interpreted in meters.

Details

The support for region-of-influence regression is described in the manual of WREG v. 1.0. WREG.RoI iterates through the sites of Y, defines a region of influence and implements the specified regression by calling a WREG function.
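The iteration described above can be sketched in base R as follows. This is an illustration of the general idea only, not the package's implementation: the data are synthetic, the region is chosen by Euclidean distance on standardized predictors (an assumed stand-in for a physiographic region of influence), the target site is included in its own region, and ordinary least squares is fitted with lm.fit rather than a WREG function.

```r
set.seed(1)
nSites <- 30
# Synthetic predictors: a constant plus one variable (illustrative only)
X <- cbind(Intercept = 1, logA = rnorm(nSites, mean = 2, sd = 0.5))
Y <- 1.5 + 0.8 * X[, "logA"] + rnorm(nSites, sd = 0.1)
n <- 10  # number of sites in each region of influence

# Pairwise distances in standardized predictor space (constant excluded)
Z <- scale(X[, -1, drop = FALSE])
pdist <- as.matrix(dist(Z))

fits <- numeric(nSites)
sitesUsed <- matrix(NA_integer_, nrow = nSites, ncol = n)
for (i in seq_len(nSites)) {
  # The n sites closest to site i; site i itself sits at distance zero
  roi <- order(pdist[i, ])[seq_len(n)]
  sitesUsed[i, ] <- roi
  # Ordinary least squares restricted to the region of influence
  beta <- lm.fit(X[roi, , drop = FALSE], Y[roi])$coefficients
  # Estimate at the target site from its own local model
  fits[i] <- sum(X[i, ] * beta)
}
res <- Y - fits
```

Each site thus receives its own coefficient vector, which is why the coefficient outputs described under Value are matrices with one row per site.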

The logical handle Legacy has been included to test that this program returns the same results as WREG v. 1.05. In the development of this code, some idiosyncrasies of the MatLab code for WREG v. 1.05 became apparent. Setting Legacy equal to TRUE forces the code to use the same idiosyncrasies as WREG v. 1.05. Some of these idiosyncrasies may be bugs in the code. Further analysis is needed. For information on the specific idiosyncrasies, see the notes for the Legacy input and the links to other functions in this package.

Value

As with other WREG functions, WREG.RoI returns a large list of regression parameters and metrics. This list varies depending on the Reg specified, but may contain:

fitted.values

A vector of model estimates from the regression model.

residuals

A vector of model residuals.

PerformanceMetrics

A list of four elements. These represent the approximate regression performance across all of the region-of-influence regressions. They include the mean squared error of residuals (MSE), the coefficient of determination (R2), the adjusted coefficient of determination (R2_adj) and the root mean squared error (RMSE, in percent). Details on the appropriateness and applicability of performance metrics can be found in the WREG manual.
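The four metrics can be illustrated with textbook definitions, as in the sketch below. The function `performanceMetrics`, its degrees-of-freedom convention (n minus the number of coefficients p) and the lognormal conversion used for the percent RMSE are assumptions made here; the package may compute these quantities differently, so the manual remains the reference.

```r
# Textbook-style performance metrics for a fitted regression.
# obs: observed values; fit: fitted values; p: number of coefficients.
performanceMetrics <- function(obs, fit, p, transY = "log10") {
  res <- obs - fit
  nObs <- length(obs)
  MSE <- sum(res^2) / (nObs - p)
  R2 <- 1 - sum(res^2) / sum((obs - mean(obs))^2)
  R2_adj <- 1 - (1 - R2) * (nObs - 1) / (nObs - p)
  # Percent RMSE via the usual lognormal conversion (assumed here);
  # left undefined in this sketch for untransformed data.
  RMSE <- switch(transY,
    log10 = 100 * sqrt(exp(log(10)^2 * MSE) - 1),
    ln = 100 * sqrt(exp(MSE) - 1),
    NA_real_
  )
  list(MSE = MSE, R2 = R2, R2_adj = R2_adj, RMSE = RMSE)
}

# Small made-up example with two coefficients
obs <- c(1, 2, 3, 4, 5)
fit <- c(1.1, 1.9, 3.2, 3.8, 5.0)
m <- performanceMetrics(obs, fit, p = 2)
```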

Coefficients

A list composed of four elements: (1) Values contains the regression coefficients estimated for the model built around each observation, (2) StanError contains the standard errors of each regression coefficient for the ROI regression built around each observation, (3) TStatistic contains the Student's T-statistic of each regression coefficient for the ROI regression built around each observation and (4) pValue contains the significance probability of each regression coefficient for the ROI regression built around each observation. Each element of the list is a matrix the same size as X.

ROI.Regressions

A list of elements and outputs from each individual ROI regression. These include a matrix of the sites used in each ROI regression (Sites.Used, a length(Y)-by-n matrix), a matrix of the geographic distances between the selected sites in each ROI regression (Gdist.Used, a length(Y)-by-n matrix), a matrix of the physiographic distances between the selected sites in each ROI regression (Pdist.Used, a length(Y)-by-n matrix), a matrix of the observations used in each ROI regression (Obs.Used, a length(Y)-by-n matrix), a matrix of model fits in the region of influence (Fits, a length(Y)-by-n matrix), a matrix of model residuals in the region of influence (Residuals, a length(Y)-by-n matrix), a matrix of leverages for each ROI regression (Leverage, a length(Y)-by-n matrix), a matrix of influences for each ROI regression (Influence, a length(Y)-by-n matrix), a logical matrix indicating if the leverage is significant in each ROI regression (Leverage.Significance, a length(Y)-by-n matrix), a logical matrix indicating if the influence is significant in each ROI regression (Influence.Significance, a length(Y)-by-n matrix), a vector of critical leverage values for each ROI regression (Leverage.Limits), a vector of critical influence values for each ROI regression (Influence.Limits) and a list of performance metrics for each ROI regression (PerformanceMetrics). The last element, PerformanceMetrics, is identical to the same output from other functions except that every element is multiplied by the number of observations so as to capture the individual performance of each ROI regression.

ROI.InputParameters

A list of input parameters to record the controls on the ROI regression. D indicates the limit used in “HRoI”. n indicates the size of the region of influence. ROI is a string indicating the type of region of influence. Legacy is a logical indicating if the WREG v. 1.05 idiosyncrasies were implemented.

Examples

# Import some example data
peakFQdir <- file.path(system.file("exampleDirectory", package = "WREG"),
  "pfqImport")
gisFilePath <- file.path(peakFQdir, "pfqSiteInfo.txt")
importedData <- importPeakFQ(pfqPath = peakFQdir, gisFile = gisFilePath)

# Run a simple regression
Y <- importedData$Y$AEP_0.5
X <- importedData$X[c("A")]
transY <- "none"
basinChars <- importedData$BasChars
#result <- WREG.OLS(Y, X, transY)

# Fit an OLS regression within a geographic region of influence of 10 sites
result <- WREG.RoI(Y, X, Reg = "OLS", transY = transY,
  BasinChars = basinChars, ROI = "GRoI", n = 10L)

USGS-R/WREG documentation built on May 9, 2019, 6:48 p.m.