georob: Robust Fitting of Spatial Linear Models

Description Usage Arguments Details Value Author(s) References See Also Examples

Description

The function georob fits a linear model with spatially correlated errors to geostatistical data that are possibly contaminated by independent outliers. The regression coefficients and the parameters of the variogram model are estimated by robust or Gaussian Restricted Maximum Likelihood (REML).

Usage

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
georob(formula, data, subset, weights, na.action, model = TRUE, 
    x = FALSE, y = FALSE, contrasts = NULL, offset, locations, 
    variogram.model = c( "RMexp", "RMbessel", "RMcauchy", 
      "RMcircular", "RMcubic", "RMdagum", "RMdampedcos", "RMdewijsian", "RMfbm",
      "RMgauss", "RMgenfbm", "RMgencauchy", "RMgengneiting", "RMgneiting", "RMlgd",
      "RMmatern", "RMpenta", "RMaskey", "RMqexp", "RMspheric", "RMstable",
      "RMwave", "RMwhittle"
    ), 
    param, 
    fit.param = c( variance = TRUE, snugget = FALSE, nugget = TRUE, scale = TRUE, 
      alpha = FALSE, beta = FALSE, delta = FALSE, 
      gamma = FALSE, kappa = FALSE, lambda = FALSE, mu = FALSE, nu = FALSE
    )[names(param)], 
    aniso = c(f1 = 1, f2 = 1, omega = 90, phi = 90, zeta = 0), 
    fit.aniso = c(f1 = FALSE, f2 = FALSE, omega = FALSE, 
        phi = FALSE, zeta = FALSE), 
    tuning.psi = 2, initial.param = c("exclude", "minimize", "no"), 

    control = georob.control(...),
    verbose = 0, ...)

Arguments

formula

a symbolic description of the regression model to be fit. See lm and formula for more details.

data

an optional data frame, a SpatialPointsDataFrame, list or environment (or another object coercible by as.data.frame to a data frame) containing the variables in the model and the coordinates where the data was recorded. If not found in data, the variables are taken from environment(formula), typically the environment from which georob is called.

subset

an optional vector specifying a subset of observations to be used in the fitting process.

weights

an optional vector of weights to be used in the fitting process, currently ignored.

na.action

a function which indicates what should happen when the data contain NAs. The default is set by the na.action argument of options, and is na.fail if that is unset. The “factory-fresh” default is na.omit. Another possible value is NULL, no action. Value na.exclude can be useful.

model, x, y

logicals. If TRUE the corresponding components of the fit (the model frame, the model matrix, the response) are returned. The model frame is augmented by the coordinates.

contrasts

an optional list. See the contrasts.arg of model.matrix.default.

offset

this can be used to specify an a priori known component to be included in the linear predictor during fitting. An offset term can be included in the formula instead or as well, and if both are specified their sum is used.

locations

a one-sided formula defining the variables that are used as coordinates of the locations were the data was recorded.

variogram.model

a character keyword defining the variogram model to be fitted. Currently, most basic variogram models provided by the package RandomFields can be fitted (see Details and Variogram).

param

a named numeric vector with initial values of the variogram parameters. The following parameter names are allowed (see Details and georobIntro for information about the parametrization of variogram models):

  • variance: variance (sill σ^2) of the auto-correlated component of the Gaussian random field B(s).

  • snugget: variance (spatial nugget σ^2_n) of the seemingly spatially uncorrelated component of B(s) (micro-scale spatial variation; default value
    snugget = 0).

  • nugget: variance (nugget τ^2) of the independent errors ε(s).

  • scale: range parameter (α) of the variogram.

  • names of additional variogram parameters such as the smoothness parameter ν of the Whittle-Matérn model as passed by the argument param to Variogram, see param.names.

fit.param

a named logical vector with the same names as used for param, defining which parameters are adjusted (TRUE) and which are kept fixed at their initial values (FALSE) when fitting the model.

aniso

a named numeric vector with initial values for fitting geometrically anisotropic variogram models. The following parameter names are allowed (see Details and georobIntro for information about the parametrization of variogram models):

  • f1: ratio f_1 of lengths of second and first semi-principal axes of an ellipsoidal surface with constant semivariance in R^3 (default f1 = 1).

  • f2: ratio f_2 of lengths of third and first semi-principal axes of the semivariance ellipsoid (default f2 = 1).

  • omega: azimuth in degrees of the first semi-principal axis of the semivariance ellipsoid (default omega = 90).

  • phi: 90 degrees minus altitude of the first semi-principal axis of the semivariance ellipsoid (default phi = 90).

  • zeta: angle in degrees between the second semi-principal axis and the direction of the line defined by the intersection between the x-y-plane and the plane orthogonal to the first semi-principal axis of the semivariance ellipsoid through the origin (default zeta = 0).

fit.aniso

a named logical vector with the same names as used for aniso, defining which parameters are adjusted (TRUE) and which are kept fixed at their initial values (FALSE) when fitting the model.

tuning.psi

positive numeric. The tuning constant c of the ψ_c-function of the robust REML algorithm.

initial.param

character, controlling whether initial values of parameters are computed for solving the estimating equations of the variogram and anisotropy parameters.

If initial.param = "minimize" (defaullt) robust initial values are computed by minimizing the sum of the squared robustified estimating equations using optim (see Details). If initial.param = "exclude" robust initial values of parameters are computed by discarding outlying observations based on the “robustness weights” of the initial fit of the regression model by lmrob and fitting the spatial linear model by Gaussian REML to the pruned data set (see Details). For initial.param = "no" no initial parameter values are computed and the estimating equations are solved with the initial values passed by param and aniso to georob.

control

a list specifying parameters that control the behaviour of georob. Use the function georob.control and see its help page for the components of control.

verbose

positive integer controlling logging of diagnostic messages to the console during model fitting. verbose = 0 largely suppresses such messages and verbose = 4 asks for most verbose output (see control arguments of nleqslv and optim and georob.control for information how to fine tuning diagnostic output generated by nleqslv and optim).

...

can be used to specify list components of control directly.

Details

georob fits a spatial linear model by robust or Gaussian REML (Kuensch et al., 2011, Kuensch et al., in preparation). georobIntro describes the employed model and briefly sketches the robust REML estimation and the robust external-drift kriging method. Here, we describe further details of georob.

Implemented variogram models

Currently, all except one (model hyperbolic) basic variogram models of the package RandomFields, valid in R^d, d > 1, can be fitted by georob (see argument variogram.models for a list of implemented models). Some of these models have in addition to variance, snugget, nugget and scale further parameters. Initial values of these parameters (param) and fitting flags (fit.param) must be passed to georob by the same names as used for the argument param of Variogram. Use the function param.names to list additional parameters of a given variogram.model.

Estimation of variance of micro-scale variation

Simultaneous estimation of the variance of the micro-scale variation (snugget, sigma^2_n), which appears as seemingly uncorrelated with a given sampling design, and of the variance (nugget, τ^2) of the independent errors requires that for some locations s_i replicated observations are available. Locations less or equal than zero.dist apart are thereby considered as being coincient (see georob.control).

Fitting intrinsic variogram models

The intrinsic variogram model fractalB is overparametrized when both the variance (plus possibly snugget) and the scale are fitted. Therefore, to estimate the parameters of this model scale must be kept fixed at an arbitray value by using fit.param["scale"] = FALSE.

Fitting geometrically anisotropic variogram models

The subsection Model of georobIntro describes how such models are parametrized and gives definitions the various elements of aniso. Some additional remarks might be helpful:

Constraining estimates of variogram parameters

Parameters of variogram models can vary only within certain bounds (see param.bounds and Variogram for allowed ranges). georob uses three mechanisms to constrain parameter estimates to permissible ranges:

  1. Parameter transformations: By default, all variance (variance, snugget, nugget), the range scale and the anisotropy parameters f1 and f2 are log-transformed before solving the estimating equations or maximizing the restricted loglikelihood and this warrants that the estimates are always positive (see georob.control for controlling parameter transformations).

  2. Checking permissible ranges: The additional parameters of the variogram models such as the smoothness parameter ν of the Whittle-Mat\'ern model are forced to stay in the permissible ranges by signalling an error to nleqslv or optim if the current trial values are invalid. These functions then graciously update the trial values of the parameters and carry their task on. However, it is clear that such a procedure likely gets stuck at a point on the boundary of the parameter space and is therefore just a workaround for avoiding runtime errors due to invalid parameter values.

  3. Exploiting the functionality of optim: If a spatial model is fitted non-robustly, then the arguments lower, upper and method of optim can be used to constrain the parameters (see optim.control how to pass them to optim). To achieve this one has to use the arguments method = "L-BFGS-B", lower = l, upper = u, where l and u are numeric vectors with the lower and upper bounds of the transformed parameters in the order as they appear in
    c( c(variance, snugget, nugget, scale, ...)[fit.param], aniso[fit.aniso]),
    where ... are additional parameters of isotropic variogram models (use
    param.names(variogram.model) to display the names and the order of the additional parameters for variogram.model).

Computing robust initial estimates of parameters for robust REML

To solve the robustified estimating equations for B and β the following initial estimates are used:

Finding the roots of the robustified estimating equations of the variogram and anisotropy parameters is more sensitive to a good choice of initial values than maximizing the Gaussian restricted loglikelihood with respect to the same parameters. Two options are implemented to get good initial values that are often sufficiently close to the roots so that nleqslv converges:

Setting initial.param = "minimize" invokes optim to minimize the sum of squared estimating equations. The required accuracy of the initial estimates are best controlled by the argument abstol of optim, e.g. by using the argument control = georob.control(optim = optim.control(optim.control = list(abstol = 1.e-6))).

Setting initial.param = "exclude" has the following effects:

  1. Initial values of the regression parameters are computed by lmrob irrespective of the choice for initial.method (see georob.control).

  2. Observations with “robustness weights” of the lmrob fit, satisfying
    ψ_c(r_i/s)/(r_i/s) <=min.rweight, are discarded (see georob.control).

  3. The model is fit to the pruned data set by Gaussian REML using optim.

  4. The resulting estimates of the variogram parameters (param, aniso) are used as initial estimates for the subsequent robust fit of the model by nleqslv.

Value

An object of class georob representing a robust (or Gaussian) REML fit of a spatial linear model. See georobObject for the components of the fit.

Author(s)

Andreas Papritz andreas.papritz@env.ethz.ch
http://www.step.ethz.ch/people/scientific-staff/andreas-papritz
with contributions by Cornelia Schwierz.

References

Kuensch, H. R., Papritz, A., Schwierz, C. and Stahel, W. A. (in preparation) Robust Geostatistics.

Kuensch, H. R., Papritz, A., Schwierz, C. and Stahel, W. A. (2011) Robust estimation of the external drift and the variogram of spatial data. Proceedings of the ISI 58th World Statistics Congress of the International Statistical Institute. http://e-collection.library.ethz.ch/eserv/eth:7080/eth-7080-01.pdf

See Also

georobIntro for a description of the model and a brief summary of the algorithms; georobObject for a description of the class georob; plot.georob for display of REML variogram estimates; georob.control for controlling the behaviour of georob; cv.georob for assessing the goodness of a fit by georob; predict.georob for computing robust kriging predictions; and finally georobModelBuilding for stepwise building models of class georob; georobMethods for further methods for the class georob.

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
## Not run: 
################
## meuse data ##
################
data( meuse )

## Gaussian REML fit
r.logzn.reml <- georob(log(zinc) ~ sqrt(dist), data = meuse, locations = ~ x + y,
    variogram.model = "exponential",
    param = c( variance = 0.15, nugget = 0.05, scale = 200 ),
    tuning.psi = 1000)
summary(r.logzn.reml, correlation = TRUE)

## robust REML fit 
r.logzn.rob <- update(r.logzn.reml, tuning.psi = 1)
    
summary(r.logzn.rob, correlation = TRUE)

plot(r.logzn.reml, lag.class.def = seq( 0, 2000, by = 100 ))
lines(r.logzn.rob, col = "red")


###################
## wolfcamp data ##
###################
data(wolfcamp, package = "geoR")
d.wolfcamp <- data.frame(x = wolfcamp[[1]][,1], y = wolfcamp[[1]][,2],
    pressure = wolfcamp[[2]])

## fitting isotropic IRF(0) model
  
r.irf0.iso <- georob(pressure ~ 1, data = d.wolfcamp, locations = ~ x + y, 
    variogram.model = "fractalB",
    param = c( variance = 10, nugget = 1500, scale = 1, alpha = 1.5 ),
    fit.param = c( variance = TRUE, nugget = TRUE, scale = FALSE, alpha = TRUE),
    tuning.psi = 1000)
  
summary(r.irf0.iso)

## fitting isotropic IRF(0) model
  
r.irf0.aniso <- georob(pressure ~ 1, data = d.wolfcamp, locations = ~ x + y, 
    variogram.model = "fractalB",
    param = c( variance = 5.9, nugget = 1450, scale = 1, alpha = 1 ),
    fit.param = c( variance = TRUE, nugget = TRUE, scale = FALSE, alpha = TRUE),
    aniso = c( f1 = 0.51, f2 = 1, omega = 148, phi = 90, zeta = 0 ),
    fit.aniso = c( f1 = TRUE, f2 = FALSE, omega = TRUE, phi = FALSE, zeta = FALSE ), 
    tuning.psi = 1000)
summary(r.irf0.aniso)

plot(r.irf0.iso, lag.class.def = seq(0, 200, by = 7.5))
plot(r.irf0.aniso, lag.class.def = seq(0, 200, by = 7.5), 
    xy.angle.def = c(0, 22.5, 67.5, 112.5, 157.5, 180.), 
    add = TRUE, col = 2:5)
    
pchisq( 2*(r.irf0.aniso[["loglik"]] - r.irf0.iso[["loglik"]]), 2, lower = FALSE )
## End(Not run)

georob documentation built on May 2, 2019, 6:53 p.m.