#' Guess the parameters of a spatial covariance function
#'
#' @description
#' Guess the parameters of the spatial covariance function of a random, regionalized variable. A
#' guess of such parameters is required to start the fitting functions of many geostatistical
#' packages such as __gstat__, __geoR__, and __georob__.
#'
#' @inheritParams variogramBins
#'
#' @param z Numeric vector with the values of the regionalized variable for which the values for the
#' spatial covariance parameters should be guessed.
#'
#' @param lags Numeric scalar defining the width of the variogram bins or a numeric vector with the
#' lower and upper bounds of the variogram bins. If missing, the variogram bins are computed using
#' [pedometrics::variogramBins()]. See \sQuote{Details} for more information.
#'
#' @param method Character keyword defining the method used for guessing the spatial covariance
#' parameters of the regionalized variable. Defaults to `method = "a"`. See \sQuote{Details} for
#' more information.
#'
#' @param min.npairs Positive integer defining the minimum number of point-pairs required so that a
#' variogram bin is used to guessing the spatial covariance parameters of the of the regionalized
#' variable. Defaults to `min.npairs = 30`.
#'
#' @param model Character keyword defining the spatial covariance function that will be fitted to
#' the data of the regionalized variable. Currently, most of the basic spatial covariance function
#' are accepted. See `geoR::cov.spatial()` for more information. Defaults to `model = "matern"`.
#'
#' @param nu Numerical value for the additional smoothness parameter \eqn{\nu} of the spatial
#' covariance function of the regionalized variable. See `RandomFields::RMmodel()` and argument
#' `kappa` of `geoR::cov.spatial()` for more information.
#'
#' @param plotit Should the guessed spatial covariance parameters be plotted along with the sample
#' (experimental) variogram of the regionalized variable? Defaults to `plotit = FALSE`.
#'
#' @param estimator Character keyword defining the estimator for computing the sample (experimental)
#' variogram of the regionalized variable, with options `"qn"` (default), `"mad"`, `"matheron"`, and
#' `"ch"`. See `georob::sample.variogram()` for more details.
#'
#' @return
#' A vector of numerical values, the guesses for the spatial covariance parameters of the
#' regionalized variable:
#'
#' * nugget
#' * partial sill
#' * range
#'
#' @details
#' There are five methods two guess the covariance parameters. Two of them, `"a"` and `"c"`, rely on
#' a sample variogram with exponentially growing variogram bins, while the other three, `"b"`,
#' `"d"`, and `"e"`, use equal-width variogram bins (see [pedometrics::variogramBins()]). All of
#' them are [heuristic](https://en.wikipedia.org/wiki/Heuristic).
#'
#' Method `"a"` was developed in-house and is the most elaborated of them, specially for guessing
#' the nugget variance.
#'
#' Method `"b"` was proposed by \doi{10.1016/0098-3004(95)00095-X}{Jian et al. (1996)} and
#' is implemented in \href{https://support.sas.com/documentation/cdl/en/statug/63347/HTML/default/viewer.htm#statug_variogram_a0000000593.htm}{SAS/STAT(R) 9.22}.
#'
#' Method `"c"` is implemented in the __automap__-package and was developed by
#' \doi{10.1016/j.cageo.2008.10.011}{Hiemstra et al. (2009)}.
#'
#' Method `"d"` was developed by \doi{10.1007/s11004-012-9434-1}{Desassis & Renard (2012)}.
#'
#' Method `"e"` was proposed by Larrondo et al. (2003)
#' <http://www.ccgalberta.com/ccgresources/report05/2003-122-varfit.pdf> and is implemented in the
#' VARFIT module of GSLIB <http://www.gslib.com/>.
#'
#' @references
#' Desassis, N. & Renard, D. Automatic variogram modelling by iterative least squares: univariate
#' and multivariate cases. _Mathematical Geosciences_. Springer Science + Business Media, v. 45, p.
#' 453-470, 2012.
#'
#' Hiemstra, P. H.; Pebesma, E. J.; Twenhöfel, C. J. & Heuvelink, G. B. Real-time automatic
#' interpolation of ambient gamma dose rates from the Dutch radioactivity monitoring network.
#' _Computers & Geosciences_. Elsevier BV, v. 35, p. 1711-1721, 2009.
#'
#' Jian, X.; Olea, R. A. & Yu, Y.-S. Semivariogram modelling by weighted least squares. _Computers &
#' Geosciences_. Elsevier BV, v. 22, p. 387-397, 1996.
#'
#' Larrondo, P. F.; Neufeld, C. T. & Deutsch, C. V. _VARFIT: a program for semi-automatic variogram
#' modelling_. Edmonton: Department of Civil and Environmental Engineering, University of Alberta,
#' p. 17, 2003.
#'
# @note
# Package __geoR__ is used to guess the range (scale) parameter of the following spatial covariance
# functions: "matern" (except when `nu = 0.5`), "powered.exponential", "stable", "cauchy",
# "gencauchy", "gneiting", and "gneiting.matern". The practical range is the distance at which the
# semivariance reaches its maximum, i.e. the sill.
#'
#' @section Dependencies:
# The __geoR__ package, provider of functions for the analysis of geostatistical data in R, is
# required for [pedometrics::variogramGuess()] to work. The development version of the __geoR__
# package is available on <http://www.leg.ufpr.br/geoR/> while its old versions are available on
# the CRAN archive at <https://cran.r-project.org/src/contrib/Archive/geoR/>.
#'
#' The __georob__ package, provider of functions for the robust geostatistical analysis of spatial
#' data in R, is required for [pedometrics::variogramGuess()] to work. The old versions of the
#' __georob__ package are available on the CRAN archive at
#' <https://cran.r-project.org/src/contrib/Archive/georob/>.
#'
#' @author Alessandro Samuel-Rosa \email{alessandrosamuelrosa@@gmail.com}
#'
#' @seealso [pedometrics::variogramBins()]
#'
#' @examples
# if (all(c(require(sp), require(georob), require(geoR)))) {
#' if (all(c(require(sp), require(georob)))) {
#' data(meuse, package = "sp")
#' icp <- variogramGuess(z = log(meuse$copper), coords = meuse[, 1:2])
#' }
#' @concept variogram
# FUNCTION - MAIN ##################################################################################
#' @export
#' @rdname variogramGuess
variogramGuess <-
function(z, coords, lags, cutoff = 0.5, method = "a", min.npairs = 30, model = "matern", nu = 0.5,
estimator = "qn", plotit = FALSE) {
# check if suggested packages are installed
if (!requireNamespace("georob")) stop("georob package is missing")
# if (!requireNamespace("geoR")) stop("geoR package is missing") # MOVED TO
# check function arguments
cov_models <- c(
"matern", "exponential", "gaussian", "spherical", "circular", "cubic", "wave", "linear",
"power", "powered.exponential", "stable", "cauchy", "gencauchy", "gneiting",
"gneiting.matern", "pure.nugget")
if (!model %in% cov_models) {
stop(paste("model '", model, "' is not implemented", sep = ""))
}
# Check lags and max.dist
if (missing(lags)) {
if (method %in% c("a", "c")) {
lags <- variogramBins(coords, cutoff = cutoff)
} else {
lags <- variogramBins(coords, n.lags = 15, type = "equi", cutoff = cutoff)
}
}
cutoff <- sqrt(
sum(apply(apply(coords[, 1:2], 2, range), 2, diff) ^ 2)) * cutoff
# Compute variogram
v <- georob::sample.variogram(
object = z, locations = coords, lag.dist.def = lags, max.lag = cutoff, estimator = estimator)
lags0 <- length(v$lag.dist)
# Merge lag-distance classes that have too few point-pairs
if (any(v$npairs < min.npairs)) {
message("correcting lags for minimum number of point-pairs...")
idx <- which(v$npairs < min.npairs) + 1
while (length(idx) >= 1) {
lags <- lags[-idx[1]]
v <- georob::sample.variogram(
object = z, locations = coords, lag.dist.def = lags, max.lag = cutoff,
estimator = estimator)
idx <- which(v$npairs < min.npairs) + 1
}
}
if (plotit) {
graphics::plot(v)
}
# SILL -----------------------------------------------------------------------------------------
# The initial guess for the sill should be the easiest among all
# parameters needed to fit a covariance model. Several rules are used in the
# literature:
# 1) the average of the maximum semivariance and the median semivariance in
# the sample variogram (Hiemstra et al., 2009);
# 2) the average of all the experimental points (Larrondo et al., 2003);
# 3) the average semivariance of the three last lag-distance classes
# (Jian et al., 1996)
# 4) the total variance (Desassis & Renard, 2012).
# I define the partial sill as the difference between the variance
# of the data minus the nugget variance.
sill <- switch(
method,
a = { # Samuel-Rosa
# The average of the variance of the data and gamma at the two last
# lag-distance classes.
mean(c(stats::var(z), v$gamma[c(length(v$gamma), length(v$gamma) - 1)]))
},
b = { # JianEtAl1996
mean(v$gamma[seq(length(v$gamma) - 2, length(v$gamma))])
},
c = { # HiemstraEtAl2009
mean(c(max(v$gamma), stats::median(v$gamma)))
},
d = { # DesassisEtAl2012
stats::var(z)
},
e = { # LarrondoEtAl2003
mean(v$gamma)
}
)
if (plotit) {
graphics::abline(h = sill, lty = "dashed")
}
# RANGE ----------------------------------------------------------------------------------------
# In general, the initial guess for the range (scale) parameter is made based on the lag-distance classes
# and on the dimensions of the study area. A common rule is to compute the initial range as half the
# maximum distance up to which lag-distance classes have been defined (Jian et al., 1996; Larrondo et al.,
# 2003; Desassis & Renard, 2012). Others set the initial range to a proportion of the diagonal of the study
# area, say 0.1, 0.35 or 0.5 (Hiemstra et al., 2009). This is rather arbitrary: the lag-distance classes
# usually are defined by some automatic procedure implemented in the software being used, which does not
# account for the features of the data that is being analysed.
# I propose using the estimate of the total variance and semivariance of the variable to make an initial
# guess for the range parameter. The total variance is used here because it is the initial guess of the
# total sill (see bellow).
# I start computing the absolute difference between the total variance and the semivariogram in each
# lag-distance class (except for the first). Then, I record the index of the lag-distance class where the
# semivariance is closest to the variance. The separation distance at the centre of this lag-distance class
# is recorded. This value can be taken to be approximately equivalent to the practical range. Thus, it is
# corrected using geoR::practicalRange.
# geoR IS ARCHIVED FROM TIME TO TIME... IT MUST BE REMOVED FRO HERE!!!
range <- switch(
method,
a = { # Samuel-Rosa2015
v$lag.dist[which.min(abs(v$gamma[-c(1, length(v$gamma))] - sill)) + 1]
},
b = { # JianEtAl1996
lags[length(lags)] * 0.5
},
c = { # HiemstraEtAl2009
sqrt(sum(apply(apply(coords, 2, range), 2, diff) ^ 2)) * 0.1
},
d = { # DesassisEtAl2012
lags[length(lags)] * 0.5
},
e = { # LarrondoEtAl2003
lags[length(lags)]
}
)
# Correct initial guess of the range parameter
if (model %in% c('linear', 'power')) {
range <- Inf
} else if (model == 'pure.nugget') {
range <- 0
} else if (model == 'exponential' | model == 'matern' & nu == 0.5) {
range <- range / 2.995731
} else if (model %in% c('spherical', 'circular', 'cubic')) {
range <- range
} else if (model == 'gaussian') {
range <- range / 1.730818
} else if (model == 'wave') {
range <- range / 2.991457
} else {
# "matern", "powered.exponential", "stable", "cauchy", "gencauchy", "gneiting", "gneiting.matern"
# geoR is archived from time to time...
# if (requireNamespace("geoR", quietly = TRUE)) {
# range <- range / geoR::practicalRange(cov.model = model, phi = 1, kappa = nu)
# } else {
# message(
# paste("geoR not installed... returning guess of practical range for model ", model)
# )
range <- range
# }
}
if (plotit) {
graphics::abline(v = range, lty = "dashed")
}
# NUGGET ---------------------------------------------------------------------------------------
# The initial guess for the nugget variance is commonly made using one of
# the following rules:
# 1) use the minimum semivariance in the sample variogram (Hiemstra et al.,
# 2009)
# 2) set the initial nugget value to zero (Larrondo et al., 2003)
# We can also find rules that take into account the difference in the
# semivariance between the first and second lag-distance classes weighted
# by the difference in the size of these lag-distance classes (Jian et al.,
# 1996). The resulting initial guess for the nugget variance is always lower
# than the minimum semivariance value.
nugget <- switch(
method,
a = { # Samuel-Rosa
# Is the minimum gamma in lags others than the first?
#
if (which.min(v$gamma) > 1) {
#
# If the minimum gamma is in lags other than in the first, then we
# may have one of two possibilities at hand. First, it may be that
# the best covariance model is that of a pure nugget effect. Second,
# the data at hand is not appropriate to estimate the behaviour of
# the variogram close to the origin. So, what is the best initial
# guess?
#
# The first task is to check which of the first three lags has the
# lowest gamma.
#
if (which.min(v$gamma[1:3]) == 2) {
#
# If the second lag has the lowest gamma, we have to find out if
# gamma in the first lag is higher than in the third lag.
#
if (v$gamma[1] > v$gamma[3]) {
#
# When gamma is the first lag is higher than gamma in the third
# lag, the best initial guess is the average of gamma in the
# second and third lags.
#
mean(v$gamma[c(2, 2, 3)])
} else {
#
# When gamma in the first lag is lower than in the third lag, then
# the best initial guess is the average of gamma in the first and
# second lags.
#
mean(v$gamma[c(1, 2, 2)])
}
} else {
#
# The third lag has the lowest gamma.
#
if (v$gamma[1] > v$gamma[2]) {
mean(v$gamma[c(2, 3, 3)])
} else {
mean(v$gamma[c(1, 3, 3)])
}
}
#
# The first task is to check if gamma in the first lag is larger than
# gamma in the second lag. This could easily happen if we have a
# pure nugget effect.
# Is the semivariance of the fist lag larger than that of the second
# lag? Or is the semivariance of the fist lag larger than the
# variance of z? Or is the semivariance of the fist and second lags
# larger than that of the third? This looks like a messy sample
# variogram.
# if (v$gamma[1] >= v$gamma[2] || v$gamma[1] >= var(z) ||
# which.min(v$gamma[1:3]) == 3) {
#
# Well... Let us simply use the minimum gamma value.
# min(v$gamma[1:3])
#
# }
# min(v$gamma)
} else {
# The first lag-distance class holds the minimum gamma!
#
# The task now is to estimate the shape of the semivariogram near the
# origin. The first thing to do is to test if gamma at the second lag
# is closer to the estimate of the sill, or to gamma at the first lag.
# If gamma at the second lag is closer to the estimated sill, then it
# means that the sample variogram is very steep near the origin.
# Because the distance between these two lags and the difference in
# their sizes are very small, it may be that gamma in the first
# lag-distance class is spuriously low (or not).
#
# if (diff(v$gamma[1:2]) > abs(diff(c(v$gamma[2], var(z))))) {
if (diff(v$gamma[1:2]) > abs(diff(c(v$gamma[2], sill)))) {
# Gamma in the second lag is closer to the estimated sill than to
# gamma in the first lag.
#
# The task now is to evaluate the third lag in relation to the first
# two. In the ideal situation, gamma will increase monotonically
# from the first to the third lag. If not, i.e. gamma in the third
# lag is lower than in the second lag, then our data is not
# appropriate to estimate the behaviour of the variogram near the
# origin: gamma in the first lag is too low, and gamma at the second
# lag is too high (higher than gamma at the third lag). It may be
# better to use a conservative initial guess, such as the average
# of gamma at the first and third lags.
#
if (!identical(order(v$gamma[1:3]), 1:3)) {
# mean(c(v$gamma[1], min(v$gamma[-1])))
mean(v$gamma[c(1, 3)])
} else {
# Gamma increases monotonically from the first to the third lag,
# but gamma at the first lag is too low. It may be better to use
# a conservative initial guess, such as the average of gamma at
# the first and second lags.
#
# mean(v$gamma[c(1, 2)])
mean(v$gamma[c(1, 1, 2)])
}
} else {
# Gamma in the second lag is closer to gamma in the first lag than
# to the estimated sill.
#
# This is the most wanted situation because it should enable
# estimating the behaviour of the variogram close to the origin with
# great precision. The question is whether we should use gamma in
# the fist lag as the initial guess, or a value lower than that. If
# the variogram is very steep, then it could be more appropriate to
# use a value lower than gamma in the first lag as our initial
# guess. We could use the estimated range to decide on this matter.
# a) If the estimated range is lower than the mean lag distance of
# the third lag class, then it is likely that the sample variogram
# is very steep near the origin (exponential model?). Thus, the
# initial guess for the nugget is half the difference between zero
# and gamma at the first lag.
# b) Otherwise, it is likely that the sample variogram rises slowly
# near the origin (Gaussian model?). The initial guess for the
# nugget is gamma at the first lag.
#
if (range < v$lag.dist[2]) {
if (lags0 == length(v$lag.dist)) {
diff(c(0, v$gamma[1])) * 0.5
} else {
v$gamma[1]
}
} else {
v$gamma[1]
}
}
}
},
b = { # JianEtAl1996
max(0, c(v$gamma[1] - (lags[1] / diff(lags[1:2])) * diff(v$gamma[1:2])))
},
c = { # HiemstraEtAl2009
min(v$gamma)
},
d = { # DesassisEtAl2012
1e-12
},
e = { # LarrondoEtAl2003
1e-12
}
)
if (plotit) {
graphics::abline(h = nugget, lty = "dashed")
}
# Guess the partial sill for a pure nugget effect model
if (nugget < sill) {
p_sill <- sill - nugget
} else {
# p_sill <- 1e-12
p_sill <- nugget * 1e-3
}
# Prepare output
res <- c(range = range, p_sill = p_sill, nugget = nugget)
return(res)
}
#' @export
#' @rdname variogramGuess
vgmICP <- variogramGuess
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.