View source: R/georob_exported_functions.R
| georob | R Documentation |
The function georob fits a linear model with spatially correlated
errors to geostatistical data that are possibly contaminated by
independent outliers. The regression coefficients and the parameters of
the variogram model are estimated by robust or Gaussian restricted
maximum likelihood (REML) or by Gaussian maximum likelihood (ML).
georob(formula, data, subset, weights, na.action, model = TRUE,
x = FALSE, y = FALSE, contrasts = NULL, offset, locations,
variogram.model = c("RMexp", "RMaskey", "RMbessel", "RMcauchy",
"RMcircular", "RMcubic", "RMdagum", "RMdampedcos", "RMdewijsian",
"RMfbm", "RMgauss", "RMgencauchy", "RMgenfbm", "RMgengneiting",
"RMgneiting", "RMlgd", "RMmatern", "RMpenta", "RMqexp",
"RMspheric", "RMstable", "RMwave", "RMwhittle"),
param, fit.param = default.fit.param()[names(param)],
aniso = default.aniso(), fit.aniso = default.fit.aniso(),
variogram.object = NULL,
tuning.psi = 2, control = control.georob(),
verbose = 0, ...)
formula |
a symbolic description of the regression model for the
external drift to be fit (mandatory argument). See
|
data |
an optional data frame, a
|
subset |
an optional vector specifying a subset of observations to be used in the fitting process. |
weights |
an optional vector of weights to be used in the fitting process, currently ignored. |
na.action |
a function which indicates what should happen when the
data contain |
model, x, y |
logical scalars. If |
contrasts |
an optional list. See the |
offset |
this optional argument can be used to specify an a
priori known component to be included in the linear predictor during
fitting. An |
locations |
a one-sided formula defining the variables that are used as coordinates of the locations were the data was recorded (mandatory argument). |
variogram.model |
a character keyword defining the variogram model
to be fitted. Currently, most basic variogram models provided formerly
by the now archived package RandomFields can be fitted (see
Details and |
param |
a named numeric vector with initial values of the
variogram parameters (mandatory argument). The names of
|
fit.param |
a named logical vector (or a function such as
|
aniso |
a named numeric vector with initial values (or a function such as
|
fit.aniso |
a named logical vector (or a function such as
|
variogram.object |
an optional list that defines a possibly nested variogram model. Each component is itself a list with the following components:
Note that the arguments |
tuning.psi |
positive numeric. The tuning constant |
control |
a list specifying parameters that control the behaviour of
|
verbose |
positive integer controlling logging of diagnostic
messages to the console during model fitting. |
... |
further arguments passed to function (e.g. |
georob fits a spatial linear model by robust (Künsch et al., 2011, Künsch
et al., in preparation) or Gaussian RE(ML) (Harville, 1977).
georobPackage describes the employed model and briefly
sketches the robust REML estimation and the robust external drift Kriging
method. Here, we describe further details of georob.
Currently, most basic variogram models provided formerly by the now
archived package RandomFields can be fitted by georob (see
argument variogram.model and gencorr for a list of
implemented models). Some of these models have in addition to
variance, snugget, nugget and scale further
parameters. Initial values of these parameters (param) and
fitting flags (fit.param) must be passed to georob by the
same names as used for the models RM... in
gencorr. Use the function param.names to
list additional parameters of a given variogram.model.
The arguments fit.param and fit.aniso are used to control
what variogram and anisotropy parameters are estimated and which are
kept at the constant initial values. The functions
default.fit.param and default.fit.aniso set
reasonable default values for these arguments. Note, as an aside, that
the function default.aniso sets (default) values of the
anisotropy parameters for an isotropic variogram.
The intrinsic variogram model RMfbm is over-parametrized when
both the variance (plus possibly snugget) and the
scale are estimated. Therefore, to estimate the parameters of
this model, scale must be kept fixed at an arbitrary value by
using fit.param["scale"] = FALSE.
The subsection Model of georobPackage describes
how such models are parametrized and gives definitions the various
elements of aniso. Some additional remarks might be helpful:
The first semi-principal axis points into the direction with
the farthest reaching auto-correlation, which is described by the range
parameter scale (\alpha).
The ranges in the direction of the second and third
semi-principal axes are given by f_1\alpha and f_2
\alpha, with 0 < f_2 \leq f_1 \leq 1.
The default values for aniso (f_1=1, f_2=1)
define an isotropic variogram model.
Valid ranges for the angles characterizing the orientation of
the semi-variance ellipsoid are (in degrees): \omega [0, 180],
\phi [0, 180], \zeta [-90, 90].
Simultaneous estimation of the variance of the micro-scale variation
(snugget, \sigma_\mathrm{n}^2), appears seemingly
as spatially uncorrelated with a given sampling design, and of the
variance (nugget, \tau^2) of the independent errors
requires that for some locations
\boldsymbol{s}_i replicated observations are
available. Locations less or equal than zero.dist apart are
thereby considered as being coincident (see
control.georob).
Parameters of variogram models can vary only within certain bounds (see
param.bounds and gencorr
for allowed ranges). georob uses three mechanisms to constrain
parameter estimates to permissible ranges:
Parameter transformations: By default, all variance
(variance, snugget, nugget), the range
scale, the anisotropy parameters f1 and f2 and
many of the additional parameters are log-transformed before
solving the estimating equations or maximizing the restricted
log-likelihood and this warrants that the estimates are always
positive (see control.georob for detailed explanations
how to control parameter transformations).
Checking permissible ranges: The additional parameters
of the variogram models such as the smoothness parameter \nu
of the Whittle-Matérn model are forced to stay in the permissible
ranges by signalling an error to nleqslv, nlminb or
optim if the current trial values are invalid. These
functions then graciously update the trial values of the parameters
and carry their task on. However, it is clear that such a
procedure likely gets stuck at a point on the boundary of the
parameter space and is therefore just a workaround for avoiding
runtime errors due to invalid parameter values.
Exploiting the functionality of nlminb and
optim: If a spatial model is fitted non-robustly, then the
arguments lower, upper (and method of
optim) can be used to constrain the parameters
(see control.optim how to pass them to optim).
For optim one has to use the arguments method =
"L-BFGS-B", lower = l, upper = u, where
l and u are numeric vectors with the lower and upper
bounds of the transformed parameters in the order as they
appear in
c(variance, snugget, nugget, scale,
...)[fit.param], aniso[fit.aniso]),
where ... are
additional parameters of isotropic variogram models (use
param.names(variogram.model) to display the names and the
order of the additional parameters for variogram.model).
For nlminb one has to use the arguments lower =
l, upper = u, where l and u are
numeric vectors as described above.
To solve the robustified estimating equations for
\boldsymbol{B} and
\boldsymbol{\beta} the following initial
estimates are used:
\widehat{\boldsymbol{B}}=
\boldsymbol{0}, if this turns out to be
infeasible, initial values can be passed to georob by the
argument bhat of control.georob.
\widehat{\boldsymbol{\beta}} is
either estimated robustly by the function
lmrob, rq or
non-robustly by lm (see argument
initial.fixef of control.georob).
Finding the roots of the robustified estimating equations of the
variogram and anisotropy parameters is more sensitive to a good choice
of initial values than maximizing the Gaussian (restricted)
log-likelihood with respect to the same parameters. If the initial
values for param and aniso are not sufficiently close to
the roots of the system of nonlinear equations, then
nleqslv may fail to find them.
Setting initial.param = TRUE (see control.georob)
allows one to find initial values that are
often sufficiently close to the roots so that
nleqslv converges. This is achieved by:
Initial values of the regression parameters are computed by
lmrob irrespective of the choice for
initial.fixef (see control.georob).
Observations with “robustness weights” of the
lmrob fit, satisfying
\psi_c(\widehat{\varepsilon}_i/\widehat{\tau})/(\widehat{\varepsilon}_i/\widehat{\tau})
\leq \mbox{\code{min.rweight}}, are discarded (see
control.georob).
The model is fit to the pruned data set by Gaussian REML using
nlminb or optim.
The resulting estimates of the variogram parameters
(param, aniso) are used as initial estimates for the
subsequent robust fit of the model by nleqslv.
Note that for step 3 above, initial values of param and
aniso must be provided to georob.
Unlike robust REML, where robustified estimating equations are solved
for the variance parameters nugget (\tau^2),
variance (\sigma^2), and possibly snugget
(\sigma_{\mathrm{n}}^2), for Gaussian (RE)ML the
variances can be re-parametrized to
the signal variance
\sigma_B^2 = \sigma^2 + \sigma_{\mathrm{n}}^2,
the inverse relative nugget
\eta = \sigma_B^2 / \tau^2 and
the relative auto-correlated signal variance
\xi = \sigma^2/\sigma_B^2.
georob maximizes then a (restricted) profile
log-likelihood that depends only on \eta, \xi,
\alpha, ..., and \sigma_B^2 is estimated by an explicit
expression that depends on these parameters (e.g. Diggle and
Ribeiro, 2006, p. 113). This is usually more efficient than
maximizing the (restricted) log-likelihood with respect to the original
variance parameters \tau^2,
\sigma_{\mathrm{n}}^2 and \sigma^2.
georob chooses the parametrization automatically, but the user
can control it by the argument reparam of the function
control.georob.
An object of class georob representing a robust (or Gaussian) (RE)ML
fit of a spatial linear model. See
georobObject for the components of the fit.
Andreas Papritz papritz@retired.ethz.ch
with contributions by Cornelia Schwierz.
Diggle, P. J. and Ribeiro, P. J. R. (2006) Model-based Geostatistics. Springer, New York, \Sexpr[results=rd]{tools:::Rd_expr_doi("10.1007/978-0-387-48536-2")}.
Harville, D. A. (1977) Maximum likelihood approaches to variance component estimation and to related problems, Journal of the American Statistical Association, 72, 320–340, \Sexpr[results=rd]{tools:::Rd_expr_doi("10.1080/01621459.1977.10480998")}.
Künsch, H. R., Papritz, A., Schwierz, C. and Stahel, W. A. (in preparation) Robust Geostatistics.
Künsch, H. R., Papritz, A., Schwierz, C. and Stahel, W. A. (2011) Robust estimation of the external drift and the variogram of spatial data. Proceedings of the ISI 58th World Statistics Congress of the International Statistical Institute. \Sexpr[results=rd]{tools:::Rd_expr_doi("10.3929/ethz-a-009900710")}
georobPackage for a description of the model and a brief summary of the algorithms;
georobObject for a description of the class georob;
profilelogLik for computing profiles of Gaussian likelihoods;
plot.georob for display of RE(ML) variogram estimates;
control.georob for controlling the behaviour of georob;
georobModelBuilding for stepwise building models of class georob;
cv.georob for assessing the goodness of a fit by georob;
georobMethods for further methods for the class georob;
predict.georob for computing robust Kriging predictions;
lgnpp for unbiased back-transformation of Kriging prediction
of log-transformed data;
georobSimulation for simulating realizations of a Gaussian process
from model fitted by georob; and finally
sample.variogram and fit.variogram.model
for robust estimation and modelling of sample variograms.
################
## meuse data ##
################
data(meuse)
## Gaussian REML fit
r.logzn.reml <- georob(log(zinc) ~ sqrt(dist), data = meuse, locations = ~ x + y,
variogram.model = "RMexp",
param = c(variance = 0.15, nugget = 0.05, scale = 200),
tuning.psi = 1000)
summary(r.logzn.reml, correlation = TRUE)
plot(r.logzn.reml, lag.dist.def = seq(0, 2000, by = 100))
## robust REML fit
if(interactive()){
## example is run only in interactive session because cpu times exceeds 5 s
r.logzn.rob <- update(r.logzn.reml, tuning.psi = 1)
summary(r.logzn.rob, correlation = TRUE)
lines(r.logzn.rob, col = "red")
}
###################
## wolfcamp data ##
###################
data(wolfcamp)
## fitting isotropic IRF(0) model
r.irf0.iso <- georob(pressure ~ 1, data = wolfcamp, locations = ~ x + y,
variogram.model = "RMfbm",
param = c(variance = 10, nugget = 1500, scale = 1, alpha = 1.5),
fit.param = default.fit.param(scale = FALSE, alpha = TRUE),
tuning.psi = 1000)
summary(r.irf0.iso)
plot(r.irf0.iso, lag.dist.def = seq(0, 200, by = 7.5))
## fitting anisotropic IRF(0) model
if(interactive()){
## example is run only in interactive session because cpu times exceeds 5 s
r.irf0.aniso <- georob(pressure ~ 1, data = wolfcamp, locations = ~ x + y,
variogram.model = "RMfbm",
param = c(variance = 5.9, nugget = 1450, scale = 1, alpha = 1),
fit.param = default.fit.param(scale = FALSE, alpha = TRUE),
aniso = default.aniso(f1 = 0.51, omega = 148.),
fit.aniso = default.fit.aniso(f1 = TRUE, omega = TRUE),
tuning.psi = 1000)
summary(r.irf0.aniso)
plot(r.irf0.aniso, lag.dist.def = seq(0, 200, by = 7.5),
xy.angle.def = c(0, 22.5, 67.5, 112.5, 157.5, 180.),
add = TRUE, col = 2:5)
pchisq(2*(r.irf0.aniso[["loglik"]] - r.irf0.iso[["loglik"]]), 2, lower = FALSE)
}
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.