R/WREG.RoI.R

Defines functions WREG.RoI

Documented in WREG.RoI

#'Region-of-Influence Regression (WREG)
#'
#'@description \code{WREG.ROI} implements region-of-influence regression in the
#'WREG framework.
#'
#'@param Y The dependent variable of interest, with any transformations already 
#'  applied.
#'@param X The independent variables in the regression, with any transformations
#'  already applied.  Each row represents a site and each column represents a
#'  particular independe variable.  (If a leading constant is used, it should be
#'  included here as a leading column of ones.)  The rows must be in the same
#'  order as the dependent variables in \code{Y}.
#'@param Reg A string indicating which type of regression should be applied. The
#'  options include: \dQuote{OLS} for ordinary least-squares regression, 
#'  \dQuote{WLS} for weighted least-squares regression, \dQuote{GLS} for
#'  generalized least-squares regression, with no uncertainty in regional skew
#'  and \dQuote{GLSskew} for generalized least-squares regression with
#'  uncertainty in regional skew. (In the case of \dQuote{GLSskew}, the
#'  uncertainty in regional skew must be provided as the mean squared error in
#'  regional skew.)
#'@param transY A required character string indicating if the the 
#'  dependentvariable was transformed by the common logarithm ('log10'), 
#'  transformed by the natural logarithm ('ln') or untransformed ('none').
#'@param recordLengths This input is required for \dQuote{WLS}, \dQuote{GLS} and
#'  \dQuote{GLSskew}.  For \dQuote{GLS} and \dQuote{GLSskew},
#'  \code{recordLengths} should be a matrix whose rows and columns are in the
#'  same order as \code{Y}.  Each \code{(r,c)} element represents the length of
#'  concurrent record between sites \code{r} and \code{c}.  The diagonal
#'  elements therefore represent each site's full record length.  For
#'  \dQuote{WLS}, the only the at-site record lengths are needed. In the case of
#'  \dQuote{WLS}, \code{recordLengths} can be a vector or the matrix described
#'  for \dQuote{GLS} and \dQuote{GLSskew}.
#'@param LP3 A dataframe containing the fitted Log-Pearson Type III standard 
#'  deviate, standard deviation and skew for each site.  The names of this data
#'  frame are \code{S}, \code{K} and \code{G}.  For \dQuote{GLSskew}, the
#'  regional skew value must also be provided in a variable called \code{GR}. 
#'  The order of the rows must be the same as \code{Y}.
#'@param regSkew A logical vector indicating if regional skews are provided with
#'  an adjustment required for uncertainty therein (\code{TRUE}).  The default 
#'  is \code{FALSE}.
#'@param alpha A number, required only for \dQuote{GLS} and \dQuote{GLSskew}. 
#'  \code{alpha} is a parameter used in the estimated cross-correlation between
#'  site records.  See equation 20 in the WREG v. 1.05 manual.  The arbitrary,
#'  default value is 0.01.  The user should fit a different value as needed.
#'@param theta A number, required only for \dQuote{GLS} and \dQuote{GLSskew}. 
#'  \code{theta} is a parameter used in the estimated cross-correlation between
#'  site records.  See equation 20 in the WREG v. 1.05 manual.  The arbitrary,
#'  default value is 0.98.  The user should fit a different value as needed.
#'@param BasinChars A dataframe containing three variables: \code{StationID} is
#'  the numerical identifier (without a leading zero) of each site, \code{Lat}
#'  is the latitude of the site, in decimal degrees, and \code{Long} is the
#'  longitude of the site, in decimal degrees.  The sites must be presented in
#'  the same order as \code{Y}.  Required only for \dQuote{GLS}.
#'@param MSEGR A number. The mean squared error of the regional skew.
#'@param TY A number.  The return period of the event being modeled.  Required
#'  only for \dQuote{GLSskew}.  The default value is \code{2}.  (See the
#'  \code{Legacy} details below.)
#'@param Peak A logical.  Indicates if the event being modeled is a peak flow
#'  event or a low-flow event.  \code{TRUE} indicates a peak flow, while
#'  \code{FALSE} indicates a low-flow event.
#'@param ROI A string indicating how to define the region of influence. 
#'  \dQuote{PRoI} signifies physiographic, independent or predictor-variable
#'  region of influence. \dQuote{GRoI} calls for a geographic region of
#'  influence. \dQuote{HRoI} requests a hybrid region of influence.  Details on
#'  each approach are provided in the manual for WREG v. 1.0.
#'@param n The number of sites to include in the region of influence.
#'@param D Required for \dQuote{HRoI}, the geographic limit within which to
#'  search for a physiographic region of influence.  In WREG v. 1.05 (see
#'  \code{Legacy} below), this is interpretted in meters.  Elsewise, this is
#'  interpretted as miles.
#'@param DistMeth Required for \dQuote{GLS} and \dQuote{GLSskew}.  A value of
#'  \code{1} indicates that the "Nautical Mile" approximation should be used to
#'  calculate inter-site distances.  A value of \code{2} designates the
#'  Haversine approximation.  See \code{\link{Dist.WREG}}.  The default value is
#'  \code{2}.  (See the \code{Legacy} details below.)
#'@param Legacy A logical.  A value of \code{TRUE} forces the WREG program to
#'  behave identically to WREG v. 1.05, with BUGS and all.  It will force
#'  \code{TY=2} and \code{DistMeth=1}.  For ROI regressions, it will also
#'  require a specific calculation for weighing matrices in \dQuote{WLS}
#'  (\code{\link{Omega.WLS.ROImatchMatLab}}), \dQuote{GLS}, and \dQuote{GLSskew}
#'  (see \code{\link{Omega.GLS.ROImatchMatLab}}). \code{Legacy} also forces the
#'  distance limit \code{D} to be interpretted in meters.
#'  
#'@details The support for region-of-influence regression is described in the
#'  manual of WREG v. 1.0.  \code{WREG.RoI} iterates through the sites of
#'  \code{Y}, defines a region of influence and implements the specified
#'  regression by calling a WREG function.
#'  
#'  The logical handle \code{Legacy} has been included to test that this program
#'  returns the same results as WREG v. 1.05.  In the development of this code,
#'  some idiosyncrasies of the MatLab code for WREG v. 1.05 became apparent. 
#'  Setting \code{Legacy} equal to \code{TRUE} forces the code to use the same
#'  idiosycrasies as WREG v. 1.05.  Some of these idosyncrasies may be bugs in
#'  the code.  Further analysis is needed.  For information on the specific
#'  idiosyncrasies, see the notes for the \code{Legacy} input and the links to
#'  other functions in this package.
#'  
#'@return As with other WREG functions, \code{WREG.RoI} returns a large list
#'  of regression parameters and metrics.  This list varies depending on the
#'  \code{Reg} specified, but may contain: \item{fitted.values}{A vector of
#'  model estimates from the regression model.} \item{residuals}{A vector of
#'  model residuals.} \item{PerformanceMetrics}{A list of four elements.  These
#'  represent approximate performance regression across all of the
#'  region-of-influence regressions. These include the mean squared error of
#'  residuals (\code{MSE}), the coefficient of determination (\code{R2}), the
#'  adjusted coefficient of determination (\code{R2_adj}) and the root mean
#'  squared error (\code{RMSE}, in percent).  Details on the appropriateness and
#'  applicability of performance metrics can be found in the WREG manual.} 
#'  \item{Coefficients}{A list composed of four elements: (1) \code{Values}
#'  contains the regression coefficeints estimated for the model built around
#'  each observation, (2) \code{StanError} contains the standard errors
#'  of each regression coefficient for the ROI regressions around each
#'  observations, (3) \code{TStatistic} contains the Student's T-statistic of
#'  each regression coefficient for the ROI regression built around each
#'  observation and (4) \code{pValue} contains the significance probability of
#'  each regression coefficient for the ROI regressions built around each 
#'  observation.  Each element of the list is a matrix the same size as
#'  \code{X}} \item{ROI.Regressions}{A list of elements and outputs from each
#'  individual ROI regression.  These include a matrix of the sites used in each
#'  ROI regression (\code{Sites.Used}, a \code{length(Y)}-by-\code{n} matrix), a
#'  matrix of the geographic distances between the selected sites in each ROI
#'  regression (\code{ Gdist.Used}, a \code{length(Y)}-by-\code{n} matrix), a
#'  matrix of the physiographic distances between the selected sites in each ROI
#'  regression (\code{ Pdist.Used}, a \code{length(Y)}-by-\code{n} matrix), a
#'  matrix of the observations used in each ROI regression (\code{Obs.Used}, a
#'  \code{length(Y)}-by- \code{n} matrix), a matrix of model fits in the region
#'  of influence (\code{Fits}, a \code{length(Y)}-by-\code{n} matrix), a matrix
#'  of model residuals in the region of influence (\code{Residuals}, a
#'  \code{length(Y)}-by-\code{n} matrix), a matrix of leverages for each ROI
#'  regression (\code{Leverage}, a \code{length(Y)}-by-\code{n} matrix), a
#'  matrix of influences for each ROI regression (\code{Influence}, a \code{ 
#'  length(Y)}-by-\code{n} matrix), a logical matrix indicating if the leverage
#'  is significant in each ROI regression (\code{Leverage.Significance}, a
#'  \code{length(Y)} -by-\code{n} matrix), a logical matrix indicating if the
#'  influence is significant in each ROI regression
#'  (\code{Influence.Significance}, a \code{length(Y)} -by-\code{n} matrix), a
#'  vector of critical leverage values for each ROI regression 
#'  (\code{Leverage.Limits}), a vector of critical influence values for each ROI
#'  regression (\code{Influence.Limits}) and a list of performance metrics for
#'  each ROI regression (\code{PerformanceMetrics}).  The last element, 
#'  \code{PerformanceMetrics} is identical to the same output from other 
#'  functions excpet that every element is multiplied by the number of
#'  observations so as to capture the individual performance of each ROI
#'  regression.} \item{ROI.InputParameters}{A list of input parameters to record
#'  the controls on the ROI regression.  \code{D} idicates the limit used in
#'  \dQuote{HRoI}. \code{n} indicates the size of the region of influence. 
#'  \code{ROI} is a string indicating the type of region of influence. 
#'  \code{Legacy} is a logical indicating if the WREG v. 1.05 idiosycrasies were
#'  implemented.}
#'  
#' @examples
#' # Import some example data
#' peakFQdir <- paste0(
#'   file.path(system.file("exampleDirectory", package = "WREG"),
#'     "pfqImport"))
#' gisFilePath <- file.path(peakFQdir, "pfqSiteInfo.txt")
#' importedData <- importPeakFQ(pfqPath = peakFQdir, gisFile = gisFilePath)
#' 
#' # Run a simple regression
#' Y <- importedData$Y$AEP_0.5
#' X <- importedData$X[c("A")]
#' transY <- "none"
#' basinChars <- importedData$BasChars
#' #result <- WREG.OLS(Y, X, transY)
#' 
#' result <- WREG.RoI(Y, X, Reg = "OLS", transY, BasinChars = basinChars,
#'   ROI='GRoI', n = 10L)
#'@export
WREG.RoI <- function(Y,X,Reg,transY=NA,
  recordLengths = NA,LP3 = NA,regSkew=FALSE,
  alpha=0.01,theta=0.98,BasinChars=NA,MSEGR=NA,TY=2,Peak=T,
  ROI=c('PRoI','GRoI','HRoI'),n=NA,D=250,DistMeth=2,Legacy=FALSE) {
  # William Farmer, USGS, January 23, 2015
  # Greg PEtrochenkov, USGS, November 14, 2016 :  Changed validation scheme
  
  warn("clear")
  # Some upfront error handling
  wregValidation((!missing(X)&!missing(Y))&&(length(Y)!=nrow(X)), "eq", FALSE,
                 paste0("The length of Y must be the same as ",
                        "the number of rows in X."), warnFlag = TRUE)
  
  if (!wregValidation(missing(Y), "eq", FALSE,
                      "Dependent variable (Y) must be provided", warnFlag = TRUE)) {
    
    if (!wregValidation(Y, "numeric", message = 
                        "Dependent variable (Y) must be provided as class numeric",
                        warnFlag = TRUE)) {
      
      wregValidation(sum(is.na(Y)), "eq", 0 ,
                     paste0("The depedent variable (Y) contains missing ",
                            "values.  These must be removed."),
                     warnFlag = TRUE)
      
      wregValidation(sum(is.infinite(Y)), "eq", 0 ,
                     paste0("The depedent variable (Y) contains infinite ",
                            "values.  These must be removed."),
                     warnFlag = TRUE)
    }
  }
 
  if (!wregValidation(missing(X), "eq", FALSE,
                      "Independent variables (X) must be provided.", warnFlag = TRUE)) {
    
    if (!wregValidation((length(unique(apply(X,FUN=class,MARGIN=2)))!=1)|
                        (unique(apply(X,FUN=class,MARGIN=2))!="numeric"), "eq", FALSE,
                        "Independent variables (X) must be provided as class numeric.", warnFlag = TRUE)){
      
      wregValidation(sum(is.na(as.matrix(X))), "eq", 0,
                     paste0("Some independent variables (X) contain missing ",
                            "values.  These must be removed."), warnFlag = TRUE)
      
      wregValidation(sum(is.infinite(as.matrix(X))), "eq", 0,
                     paste0("Some independent variables (X) contain infinite ",
                            "values.  These must be removed."), warnFlag = TRUE)
    }
  }
  
  wregValidation(missing(n)|!is.integer(n)|length(n)>1, "eq", FALSE,
                 paste0("n must be provided as a single integer value."), warnFlag = TRUE)
  
  # Error checking LP3
  if (is.element(Reg,c("GLS","GLSskew","WLS"))) {
    
    if (!wregValidation(missing(LP3), "eq", FALSE, "LP3 must be provided as input",
                        warnFlag = TRUE)){
      if (!regSkew){
        
        if (!wregValidation(!is.data.frame(LP3), "eq", FALSE, 
                            paste("LP3 must be provided as a data frame with elements named",
                                  "'S', 'K' and 'G' for standard deivation, deviate and skew,",
                                  "respectively."), warnFlag = TRUE)){
          
          wregValidation(sum(is.element(c("S","K","G"),names(LP3))), "eq", 3, 
                         paste("In valid elements: The names of the elements in LP3 are",
                               names(LP3),". LP3 must be provided as a data frame with elements named",
                               "'S', 'K' and 'G' for standard deivation, deviate and skew,",
                               "respectively."), warnFlag = TRUE)
          
          if(wregValidation((length(unique(apply(cbind(LP3$S,LP3$K,LP3$G),FUN=class,MARGIN=2)))!=1)|
                            (unique(apply(cbind(LP3$S,LP3$K,LP3$G),FUN=class,MARGIN=2))!="numeric"), "eq", FALSE,
                            "LP3 must be provided as a numeric array", warnFlag = TRUE)){
            
            wregValidation(sum(is.infinite(LP3$S),is.infinite(LP3$K),is.infinite(LP3$G)), "eq", 0,
                           "LP3 must be provided as a numeric array", warnFlag = TRUE)
            
            wregValidation(sum(is.na(LP3$S),is.na(LP3$K),is.na(LP3$G)), "eq", 0,
                           paste0("Some elements of LP3$S, LP3$K, and LP3$G contain missing ",
                                  "values.  These must be removed."), warnFlag = TRUE)
          }
        }
      }
        
    } else {
      
      if (!wregValidation(!is.data.frame(LP3), "eq", FALSE, 
                          paste("LP3 must be provided as a data frame with elements named",
                                "'S', 'K' and 'G' for standard deivation, deviate and skew,",
                                "respectively."), warnFlag = TRUE)){
        
        wregValidation(sum(is.element(c("S","K","G","GR"),names(LP3))), "eq", 4, 
                       paste("In valid elements: The names of the elements in LP3 are",
                             names(LP3),". LP3 must be provided as a data frame with elements named",
                             "'S', 'K', 'G' and 'GR' for standard deivation, deviate,",
                             "skew and regional skew, respectively."), warnFlag = TRUE)
        
        if(!wregValidation((length(unique(apply(cbind(LP3$S,LP3$K,LP3$G,LP3$GR),FUN=class,MARGIN=2)))!=1)|
                           (unique(apply(cbind(LP3$S,LP3$K,LP3$G,LP3$GR),FUN=class,MARGIN=2))!="numeric"), "eq", FALSE, 
                           "LP3 must be provided as a numeric array", warnFlag = TRUE)){
          
          wregValidation(sum(is.infinite(LP3$S),is.infinite(LP3$K),
                             is.infinite(LP3$G),is.infinite(LP3$GR)), "eq", 0, 
                         paste0("Some elements of LP3$S, LP3$K, LP3$G and LP3$GR contain ",
                                "infinite values.  These must be removed."), warnFlag = TRUE)
          
          wregValidation(sum(is.na(LP3$S),is.na(LP3$K),is.na(LP3$G),is.na(LP3$GR)), "eq", 0, 
                         paste0("Some elements of LP3$S, LP3$K, LP3$G and LP3$GR contain ",
                                "missing values.  These must be removed."), warnFlag = TRUE)
          
        }
      }
    }

    
    if (!wregValidation(missing(recordLengths), "eq", FALSE,
                        "Record lengths must be provided.", warnFlag = TRUE)) {
      
      if (!wregValidation(recordLengths, "numeric", 
                          message="Record lengths must be provided as class numeric.", warnFlag = TRUE)) {
        
        wregValidation(sum(is.na(c(recordLengths))), "eq", 0, 
                       paste0("Some record lengths contain missing ",
                              "values.  These must be removed."), warnFlag = TRUE)
        
        wregValidation(sum(is.infinite(c(recordLengths))), "eq", 0, 
                       paste0("Some record lengths contain infinite ",
                              "values.  These must be removed."), warnFlag = TRUE)
      }
    }
  }
  
  if (!wregValidation(missing(BasinChars), "eq", FALSE, "BasinChars must be provided as input.",
                        warnFlag = TRUE)) {
  
    if (!wregValidation(!is.data.frame(BasinChars), "eq", FALSE,
                        paste("'BasinChars' must be provided as a data frame with elements",
                              "named 'Station.ID', 'Lat' and 'Long' for standard deivation,",
                              "deviate and skew, respectively."),
                        warnFlag = TRUE)) {
      
      if (!wregValidation(sum(is.element(c("Station.ID","Lat","Long"),names(BasinChars)))!=3, "eq", FALSE,
                          paste("In valid elements: The names of the elements in",
                                      "BasinChars are",names(BasinChars),
                                      ".  'BasinChars' must be provided as a data frame with elements",
                                      "named 'Station.ID', 'Lat' and 'Long' for standard deivation,",
                                      "deviate and skew, respectively."),
                          warnFlag = TRUE)) {

        if (!wregValidation((length(unique(apply(cbind(BasinChars$Lat,BasinChars$Long),FUN=class,MARGIN=2)))!=1)|
                            (unique(apply(cbind(BasinChars$Lat,BasinChars$Long),FUN=class,MARGIN=2))!="numeric"), "eq", FALSE,
                            "latitudes and longitudes must be provided as class numeric.",
                            warnFlag = TRUE)) {
          
          wregValidation(sum(is.na(c(BasinChars$Lat,BasinChars$Long))), "eq", 0,
                          paste0("Some latitudes and longitudes contain missing ",
                                "values.  These must be removed."),
                         warnFlag = TRUE)
          
          wregValidation(sum(is.infinite(c(BasinChars$Lat,BasinChars$Long))), "eq", 0,
                         paste0("Some latitudes and longitudes contain infinite ",
                                "values.  These must be removed."),
                         warnFlag = TRUE)
        }
      }
    }
  }
  
  ## Add controls to meet legacy demands
  wregValidation(!is.logical(Legacy), "eq", FALSE,
                 paste("Legacy must be either TRUE to force matching with previous",
                       "versions or FALSE for correct computations."), warnFlag = TRUE)
  
  ##    NOTE: Legacy forces program to return the same results as WREG v 1.05.
  if (Legacy) { # If legacy is indicated, override custom inputs.
    TY <- 2 # WREG v1.05 does not read this input correctly.
    DistMeth <- 1 # WREG v1.05 uses "Nautical Mile" approximation
  }
  
  if (warn("check")) {
    stop("Invalid inputs were provided.  See warnings().", warn("get"))
  }
  
  
  ## Empty vectors for output
  Sites.Used <- matrix(NA,nrow=length(Y),ncol=n) # Empty matrix.  Sites used for region of influence.
  Gdist.Used <- matrix(NA,nrow=length(Y),ncol=n) # Empty matrix.  Geographic distance to sites used for region of influence.
  Pdist.Used <- matrix(NA,nrow=length(Y),ncol=n) # Empty matrix.  Physiographic distance to sites used for region of influence.
  Obs.Used <- matrix(NA,nrow=length(Y),ncol=n) # Empty matrix.  Observed predictands at sites used for region of influence.
  Fits <- matrix(NA,nrow=length(Y),ncol=n) # Empty matrix.  Model fits at sites used for region of influence.
  Residuals <- matrix(NA,nrow=length(Y),ncol=n) # Empty matrix.  Model residuals at sites used for region of influence.
  Leverage.v <- matrix(NA,nrow=length(Y),ncol=n) # Empty matrix.  Leverage of sites used for region of influence.
  Influence.v <- matrix(NA,nrow=length(Y),ncol=n) # Empty matrix.  Influence of sites used for region of influence.
  Leverage.Significance <- matrix(NA,nrow=length(Y),ncol=n) # Empty matrix.  Logical.  Sites used for region of influence with significant leverage.
  Influence.Significance <- matrix(NA,nrow=length(Y),ncol=n) # Empty matrix.  Logical.  Sites used for region of influence with significant influence.
  Leverage.Limit <- vector(length=length(Y)) # Empty vector.  Critical values of leverage.
  Influence.Limit <- vector(length=length(Y)) # Empty vector.  Critical values of influence.
  Coef.Values <- matrix(NA,nrow=length(Y),ncol=ncol(X)) # Empty matrix.  Coefficents estimated from region of influence.
  Coef.SE <- matrix(NA,nrow=length(Y),ncol=ncol(X)) # Empty matrix.  Standard errors of coefficients estimated from region of influence.
  Coef.T <- matrix(NA,nrow=length(Y),ncol=ncol(X)) # Empty matrix.  T-statistics of coefficients estimated from region of influence.
  Coef.p <- matrix(NA,nrow=length(Y),ncol=ncol(X)) # Empty matrix.  p-values of coefficients estimated from region of influence.
  Y_hat <- vector(length=length(Y)) # Empty vector.  Region-of-Influence model fits.
  ROI.Omegas <- array(dim=c(length(Y),n,n)) # Empty array.  Weighting matrices from region-of-influence regressions.
  ROI.PerfMets <- list(MSE=vector(length=length(Y)),R2=vector(length=length(Y)),R2_adj=vector(length=length(Y)),RMSE=vector(length=length(Y))) # Performance metrics for output (basic, for OLS)
  if (is.element(Reg,c('WLS','GLS','GLSskew'))) { # non-OLS requires additional performance metrics
    ROI.PerfMets$R2_pseudo <- vector(length=length(Y)) # Empty vector.  Pseudo coefficient of determination of the individual ROI regression
    ROI.PerfMets$AVP=vector(length=length(Y)) # Empty vector.  Average variance of prediction for the individual ROI regression
    ROI.PerfMets$Sp=vector(length=length(Y)) # Empty vector.  Standard error of prediction, in percent, for the individual ROI regression
    ROI.PerfMets$VP.PredVar=matrix(NA,nrow=length(Y),ncol=n) # Empty vector.  Individual variances of prediction for the individual ROI regression
    ROI.PerfMets$ModErrVar=vector(length=length(Y)) # Empty vector.  Model-error variance of individual ROI regression
    ROI.PerfMets$StanModErr=vector(length=length(Y)) # Empty vector.  Standardized model-error variance, in percent, of individual ROI regression
  }
  
  ## Determine standard deviations of explanatory variables
  if (length(unique(X[,1]))==1) { # If the first predictor is a constant
    X.nocon <- X[,2:ncol(X)] # remove leading constant from model
  } else {
    X.nocon <- X # there is no leading constant
  }
  SDs <- t(matrix(rep(apply(X.nocon,2,sd),times=length(Y)),ncol=length(Y))) # Standard deviations of each predictor.
  
  ## Cycle through sites.  For each site, determine region, perform regression, save output.
  for (i in 1:length(Y)) { # Loop through sites...
    x0.i <- X[i,] # Predictors at target site
    x0.i.nocon <- t(matrix(unlist(rep(X.nocon[i,],times=length(Y))),ncol=length(Y))) # Predictors at target site, less a constant
    ### Physiographic Euclidian distance
    Pdist <- sqrt(rowSums(((x0.i.nocon-X.nocon)/SDs)^2)) # Euclidian distance in independent-variable space.  Eq 47.
    Pdist[i] <- Inf # To block self identification.
    ### Geographic distance distance
    Gdist <- vector(length=length(Y)) # Empty vector for geographic distances
    for (j in 1:length(Y)) {
      if (i!=j) {
        #### Geographic distance
        Gdist[j] <- Dist.WREG(Lat1=BasinChars$Lat[i],Long1=BasinChars$Long[i],Lat2=BasinChars$Lat[j],Long2=BasinChars$Long[j],method=DistMeth) # Intersite distance, miles
      } else {
        Gdist[j] <- Inf # To block self identification.
      }
    }
    if (Legacy) {
      Gdist <- Gdist*1000/0.6214 # Original matlab code does not specify the units for the limit.  The code implies meters.  Dist.WREG returns miles.
      # BUG: Should correct this so that D is interpretted in miles.
    }
    
    ### Identify region of influence
    if (ROI=='PRoI') {
      #### Physiographic Region-of-Influence
      temp <- sort.int(Pdist,index.return=T)
      NDX <- temp$ix[1:n] # Sites to use in this regression
    } else if (ROI=='GRoI') {
      #### Geographic Region-of-Influence
      temp <- sort.int(Gdist,index.return=T)
      NDX <- temp$ix[1:n] # Sites to use in this regression
    } else if (ROI=='HRoI') {
      #### Hybrid Region-of-Influence
      if (sum(Gdist<=D)<n) { # Not enough sites for hybrid selection; default to GRoI
        temp <- sort.int(Gdist,index.return=T)
        NDX <- temp$ix[1:n] # Sites to use in this regression
      } else { # Conduct hybrid ranking
        ProximateSites <- which(Gdist<=D,arr.ind=T)
        temp <- sort.int(Pdist[ProximateSites],index.return=T)
        NDX <- ProximateSites[temp$ix[1:n]] # Sites to use in this regression
      }
    }
    
    ### Perform regressions
    Y.i <- Y[NDX] # Predictands from region of influence
    X.i <- X[NDX,] # Predictors from region of influence
    if (!is.na(sum(c(recordLengths)))&&is.matrix(recordLengths)) {# GLS
      recordLengths.i <- recordLengths[NDX,NDX] # Record lengths from region of influence
    } else if (!is.na(sum(c(recordLengths)))&&is.vector(recordLengths)) { # WLS
      recordLengths.i <- recordLengths[NDX] # Record lengths from region of influence
    } else {
      recordLengths.i <- recordLengths
    }
    BasinChars.i <- BasinChars[NDX,] # Basin characteristics (IDs, Lat, Long) from region of influence.
    LP3.i <- data.frame(LP3)[NDX,] # LP3 parameters from region of influence
    if (Legacy) { # Apply regressions to match MatLab WREG v 1.05.
      if (Reg=='WLS') { # Use subroutine to correct WLS weights to meet v1.05 idiosyncrasies
        WeightFix <- Omega.WLS.ROImatchMatLab(Y.all=Y,X.all=X,LP3.all=LP3,RecordLengths.all=recordLengths,NDX=NDX)
        Reg.i <- WREG.UW(Y = Y.i, 
          X = as.matrix(X.i , ncol = length(X.i) / length(Y)),
          customWeight = WeightFix, transY, x0 = x0.i)
      } else if (is.element(Reg,c('GLS','GLSskew'))) { # Use subroutine to correct GLS and GLSskew weights to meet v1.05 idiosyncrasies
        # "Corrected" weighting matrix to match MATLAB code.  Returns weighting matrix and var.moderror.k
        WeightFix.k <- Omega.GLS.ROImatchMatLab(alpha=alpha,theta=theta,Independent=BasinChars.i,X=X.i,Y=Y.i,RecordLengths=recordLengths.i,LP3=LP3.i,MSEGR=MSEGR,TY=TY,Peak=Peak,X.all=X,LP3.all=LP3,DistMeth=DistMeth)
        # "Corrected" to match MATLAB code.  Returns weighting matrix and var.moderror.0
        WeightFix.0 <- Omega.GLS.ROImatchMatLab(alpha=alpha,theta=theta,Independent=BasinChars.i,X=matrix(1,ncol=1,nrow=nrow(X.i)),Y=Y.i,RecordLengths=recordLengths.i,LP3=LP3.i,MSEGR=MSEGR,TY=TY,Peak=Peak,X.all=matrix(1,ncol=1,nrow=nrow(X)),LP3.all=LP3,DistMeth=DistMeth)
        WeightFix <- list(Omega=WeightFix.k$Omega,var.modelerror.0=WeightFix.0$GSQ,var.modelerror.k=WeightFix.k$GSQ)
        Reg.i <- WREG.UW(Y = Y.i, 
          X = as.matrix(X.i , ncol = length(X.i) / length(Y)),
          customWeight = WeightFix, transY, x0 = x0.i)
      } else if (Reg=='OLS') { # Apply OLS normally.
        Reg.i <- WREG.OLS(Y = Y.i, 
          X = as.matrix(X.i , ncol = length(X.i) / length(Y)), 
          transY, x0 = x0.i)
      }
    } else { # Aply WREG with "bugs" corrected.
      if (Reg=='WLS') {
        Reg.i <- WREG.WLS(Y = Y.i, 
          X = as.matrix(X.i , ncol = length(X.i) / length(Y)),
          recordLengths = recordLengths.i,
          LP3 = LP3.i, transY, x0 = x0.i)
      } else if (is.element(Reg,c('GLS','GLSskew'))) { 
        Reg.i <- WREG.GLS(Y = Y.i, 
          X = as.matrix(X.i , ncol = length(X.i) / length(Y)),
          recordLengths = recordLengths.i,
          LP3 = LP3.i, transY,
          x0=x0.i, alpha, theta, Peak, distMeth = DistMeth,
          regSkew, MSEGR, TY, legacy=Legacy)
      } else if (Reg=='OLS') {
        Reg.i <- WREG.OLS(Y = Y.i, 
          X = as.matrix(X.i , ncol = length(X.i) / length(Y)), 
          transY, x0 = x0.i)
      }
    }
    
    ### Store outputs
    Y_hat[i] <- Reg.i$Y.ROI # Region-of-Influence model fits.
    Sites.Used[i,] <- NDX # Sites used for region of influence.
    Gdist.Used[i,] <- Gdist[NDX] # Geographic distance to sites used for region of influence.
    Pdist.Used[i,] <- Pdist[NDX] # Physiographic distance to sites used for region of influence.
    Obs.Used[i,] <- Y.i # Observed predictands at sites used for region of influence.
    Fits[i,] <- Reg.i$fitted.values # Model fits at sites used for region of influence.
    Residuals[i,] <- Reg.i$residuals # Model residuals at sites used for region of influence.
    Leverage.v[i,] <- Reg.i$ResLevInf$Leverage # Leverage of sites used for region of influence.
    Influence.v[i,] <- Reg.i$ResLevInf$Influence # Influence of sites used for region of influence.
    Leverage.Significance[i,] <- Reg.i$LevInf.Sig[,1] # Logical.  Sites used for region of influence with significant leverage.
    Influence.Significance[i,] <- Reg.i$LevInf.Sig[,2] # Logical.  Sites used for region of influence with significant influence.
    Leverage.Limit[i] <- Reg.i$LevLim  # Critical values of leverage.
    Influence.Limit[i] <- Reg.i$InflLim  # Critical values of influence.
    Coef.Values[i,] <- Reg.i$Coefs$Coefficient  # Coefficents estimated from region of influence.
    Coef.SE[i,] <- Reg.i$Coefs$'Standard Error' # Standard errors of coefficients estimated from region of influence.
    Coef.T[i,] <- Reg.i$Coefs$tStatistic # T-statistics of coefficients estimated from region of influence.
    Coef.p[i,] <- Reg.i$Coefs$pValue # p-values of coefficients estimated from region of influence.
    ROI.Omegas[i,,] <- Reg.i$Weighting # Weighting matrices from region-of-influence regressions.
    ROI.PerfMets$MSE[i] <- Reg.i$PerformanceMetrics$MSE # Mean squared-error of the individual ROI regression
    ROI.PerfMets$R2[i] <- Reg.i$PerformanceMetrics$R2 # Coefficient of determination of the individual ROI regression
    ROI.PerfMets$R2_adj[i] <- Reg.i$PerformanceMetrics$R2_adj # Adjusted coefficient of determination of the individual ROI regression
    ROI.PerfMets$RMSE[i] <- Reg.i$PerformanceMetrics$RMSE # Root-mean-squared error of the individual ROI regression
    if (is.element(Reg,c('WLS','GLS','GLSskew'))) { # non-OLS requires additional performance metrics
      ROI.PerfMets$R2_pseudo[i] <- Reg.i$PerformanceMetrics$R2_pseudo # Pseudo coefficient of determination of the individual ROI regression
      ROI.PerfMets$AVP[i] <- Reg.i$PerformanceMetrics$AVP # Average variance of prediction for the individual ROI regression
      ROI.PerfMets$Sp[i] <- Reg.i$PerformanceMetrics$Sp # Standard error of prediction, in percent, for the individual ROI regression
      ROI.PerfMets$VP.PredVar[i,] <- Reg.i$PerformanceMetrics$VP.PredVar # Individual variances of prediction for the individual ROI regression
      ROI.PerfMets$ModErrVar[i] <- Reg.i$PerformanceMetrics$ModErrVar # Model-error variance of individual ROI regression
      ROI.PerfMets$StanModErr[i] <- Reg.i$PerformanceMetrics$StanModErr # Standardized model-error variance, in percent, of individual ROI regression
    }
  }
  
  e <- Y-Y_hat # ROI model residuals
  MSE <- sum(e^2)/(nrow(X)-ncol(X)) # Mean square-error of ROI estimates
  if (Legacy) MSE <- mean(e^2) # BUG: ROI code takes mean of squared residuals without accounting for parameters.
  SSR <- sum(e^2) # Residual sum of squares from ROI estimates
  SST <- sum((Y-mean(Y))^2) # Total sum of squares from observations
  R2 <- 1 - SSR/SST # Coefficient of determination of overall ROI estimates
  R2_adj <- 1 - SSR*(nrow(X)-1)/SST/(nrow(X)-ncol(X)) # Adjusted coefficient of determination of overall ROI estimates
  RMSE <-100*sqrt(exp(log(10)*log(10)*MSE)-1) # Root-mean-squared error, in percent, of overall ROI estimates
  PerfMet <- list(MSE=MSE,R2=R2,R2_adj=R2_adj,RMSE=RMSE) # Performance metrics (for output)
  
  
  Output.ROI <- list(fitted.values=Y_hat,residuals=e,
    PerformanceMetrics=PerfMet,
    Coefficients=list(Values=Coef.Values,StanError=Coef.SE,TStatistic=Coef.T,pValue=Coef.p),
    ROI.Regressions=list(Sites.Used=Sites.Used,
      Gdist.Used=Gdist.Used,Pdist.Used=Pdist.Used,
      Obs.Used=Obs.Used,Fits=Fits,Residuals=Residuals,
      Leverage=Leverage.v,Influence=Influence.v,
      Leverage.Significance=Leverage.Significance,
      Influence.Significance=Influence.Significance,
      Leverage.Limits=Leverage.Limit,Influence.Limits=Influence.Limit,
      PerformanceMetrics=ROI.PerfMets),
    ROI.InputParameters=list(D=D,n=n,ROI=ROI,Legacy=Legacy))
  return(Output.ROI)
}
wfarmer-usgs/WREG documentation built on July 24, 2020, 1:28 a.m.