knitr::opts_chunk[['set']](echo = FALSE, warning = FALSE, message = FALSE)
# Try adding \usepackage{float} with chunk option fig.pos='h' if floating figures are driving you crazy.

# TODO: publish package and replace this with normal load call
library(devtools)
load_all()

# spatial libraries
library(sp)
library(sf)
library(terra)

# load helpers and prepared datasets
source('helpers.R')
meuse_list = get_meuse()

# copy the log zinc points data
pts = meuse_list[['soils']]['log_zinc']

# save default graphical parameters
.par_default = par(no.readonly=TRUE)

# set up a common color palette (this is the default in blitzkrig)
.pal = function(n) { hcl.colors(n, 'Spectral', rev=TRUE) }

library(smoothr)
library(ggplot2)



Introduction

The second-order stationary Gaussian Process (SGP) plays a central role in modern analysis of spatial data. In geostatistics it underlies several important techniques of inference and prediction [@chiles2012geostatistics], including generalized least squares and kriging. We see it also in engineering and computer experiments with surrogate models [@gramacy2016lagp], in machine learning prediction algorithms [@rasmussen2006gaussian], and in probabilistic models for ecological systems [@koch2021signature], among many other applications.

In the SGP model, the joint probability distribution for $n$ points $z_k$, with coordinates $x_k$, $y_k$, is multivariate Gaussian, and its variance-covariance matrix $V$ is generated by a function $C$ of the separation vector between points $V_{ij} = C ( x_i - x_j, y_i - y_j )$ [@cressie2015statistics]. One of the charms of SGP theory is the simplicity in analytic expressions for things like likelihood, conditional expectation, and least squares estimates, all of which involve $V$ and its sub-matrices.
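As a concrete illustration, the short base R sketch below builds $V$ directly from this definition, using a toy isotropic exponential choice for $C$; the function and coordinates are hypothetical and chosen only for the example.

# toy covariance function of the separation vector (isotropic exponential)
C_fun = function(dx, dy) exp(-sqrt(dx^2 + dy^2))

# coordinates for n = 4 example points
x = c(0, 1, 2, 2)
y = c(0, 0, 1, 3)

# V[i,j] = C(x_i - x_j, y_i - y_j), a symmetric matrix with ones on the diagonal
V = outer(seq_along(x), seq_along(x),
          function(i, j) C_fun(x[i] - x[j], y[i] - y[j]))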

However, $V$ is a major headache for computer programmers. The number of entries in this matrix is $n^2$, so as sample sizes grow, seemingly routine matrix expressions become surprisingly difficult to evaluate (at $n=10^5$, for example, $V$ has $10^{10}$ entries, or about 80 GB in double precision). For large enough $n$, readers will find that either $V$ is too large to fit in computer memory, or the order-$n^3$ complexity of its factorization is prohibitively slow, or both [@lindgren2011explicit].

These are major obstacles to software implementations of the SGP, particularly when it comes to kriging. Kriging workflows with popular packages like \CRANpkg{gstat}, \CRANpkg{geoR}, and \CRANpkg{fields} can become so slow as to be unworkable (or even fail entirely due to out-of-memory errors) when the number of point locations of interest is large, as we will see in the [Computations] section.

This practical upper limit on sample sizes with conventional models motivates us to introduce a new package, \CRANpkg{blitzkrig}, whose implementation of the SGP is designed around Kronecker covariances on grid data for improved computational efficiency. The computational ceiling is still there, but it is extended much higher. \CRANpkg{blitzkrig} offers fast, exact evaluation of the likelihood, GLS estimates, and the kriging prediction and variance equations.

The likelihood function is particularly important here because it will enable modelers to adapt the methods in \CRANpkg{blitzkrig} as an engine for handling autocorrelation in more complex nonlinear spatial models, such as in the simulation of integro-difference equations. In fact, this package is built from code that was originally written for such a model, in a study on mountain pine beetle outbreaks [@koch2020computationally].

We illustrate the speed-up offered by \CRANpkg{blitzkrig} in the [Computations] section, with a comparison of computation times on a variety of ordinary kriging problems. This follows the [Kriging Example] section, where we present a detailed demonstration of a universal kriging workflow on the Meuse soils dataset. First, we review some related tools and introduce the modelling approach that makes \CRANpkg{blitzkrig} unique.

Background

One strategy for computation with $V$ on large-$n$ problems is parallelization. The \CRANpkg{bigGP} [@paciorek2015biggp] and the \CRANpkg{laGP} [@gramacy2016lagp] packages, for example, distribute the task of block factorization to multiple linked processes. This can speed likelihood calculations by several orders of magnitude in large-$n$ examples.

However, these packages have steep learning curves, and are intended for high-memory, many-core computers. \CRANpkg{blitzkrig} instead aims to be simple to learn and operate, with a memory footprint suitable for use on ordinary desktop computers.

Large-$n$ solutions of this type usually make use of local approximations to the desired covariance function, with the goal of introducing sparsity in $V$ or its inverse (the precision matrix). Examples include covariance tapering and fixed rank kriging, as in \CRANpkg{LatticeKrig} [@nychka2016latticekrig] and \CRANpkg{FRK} [@zammit2021frk]; Markov random field approximations [@lindgren2011explicit]; and Bayesian approximations like R-INLA [@lindgren2015bayesian], \CRANpkg{laGP} [@gramacy2016lagp], and \CRANpkg{spBayes} [@finley2007spbayes; @finley2015spbayes].

The approach in \CRANpkg{blitzkrig} is unusual in that it computes exact likelihood (and kriging predictions) but also supports long-tailed (non-compact) covariance functions, including the very common Gaussian covariance function. The dense covariance matrices $V$ that result from this choice would normally be problematic, but our package does not rely on sparsity.

Instead it uses an algebraic shortcut that emerges for gridded layouts and certain covariance functions, called Kronecker covariances [@koch2020computationally; @drton2021existence]. This shortcut also pops up in auto-regression [@martin1979subclass] and separable space-time SGPs [@genton2007separable], but despite its elegant computational properties [@van2000ubiquitous] we rarely see it used in purely spatial problems like ordinary kriging.

CRAN lists a number of alternatives for fitting exact SGP models [@bivand2022cran], but three stand out for their scope, maturity, and quality of documentation: the \CRANpkg{gstat} package for variogram-based geostatistical modeling [@pebesma2004gstat; @bivand2013applied]; the \CRANpkg{geoR} package, which supports variogram, likelihood, and Bayesian techniques [@ribeiro1999splus; @diggle2007model]; and \CRANpkg{fields}, a feature-rich interpolation package by @nychka2021fields for SGPs and spline models.

The RandomFields package by @schlather2015randomfields also deserves mention, but is no longer in active development and not currently listed on CRAN. We included the most recently archived version of this package (2022-05-04) in the comparisons of the [Computations] section, along with the most recent major release of \CRANpkg{gstat}, \CRANpkg{geoR}, and \CRANpkg{fields} (as of 2022-09-04).

# define column names and a caption for the (two) calls below that make the table for the html and latex outputs
cfun_colnames = c('code', 'name', 'alias', '$c\\left( \\Delta \\right)$')
cfun_caption = 'A list of one-dimensional correlation functions available in blitzkrig. Kronecker covariances are constructed from the product of a pair of these functions, one receiving the $x$ separation distance as its argument, and the other the $y$ distance. Normalization constants are omitted for brevity, and $K_p$ denotes the order-$p$ modified Bessel function of the second kind (where $p$ is a shape parameter).'

# define the table
cfun_df = data.frame(

  code = c('exp', 'gau', 'gxp',  'mat', 'sph'),
  name = c('Exponential', 'Gaussian', 'Gamma-Exponential', "Mat\\'ern", 'Spherical'),
  alias = c('-', 'Squared-Exponential, or Stable Kernel', 'Power-Exponential', "Whittle-Mat\\'ern", '-'),
  fun = c('$\\exp\\left( -\\Delta \\right)$',
          '$\\exp\\left( -\\Delta^2 \\right)$',
          '$\\exp\\left( -\\Delta^p \\right)$',
          '$\\Delta^p K_p\\left( \\Delta \\right)$',
          '$1 - (3/2)\\Delta + (1/2)\\Delta^3$')
)
knitr::kable(cfun_df, format='html', col.names=cfun_colnames, caption=cfun_caption, escape=FALSE)
kableExtra::kable_styling(knitr::kable(cfun_df, format='latex', col.names=cfun_colnames, booktabs=TRUE, escape=FALSE, caption=cfun_caption), font_size=9)

Model

\CRANpkg{blitzkrig} models a response data vector $z$ of point values $z_k$ (for $k=1,\dots n$) as an observation of the $n$-dimensional random vector $Z \sim \text{N} \left( X\beta, V \right)$. This splits $z$ into a deterministic trend component and a random spatial component.

The trend is represented by the expected value $X\beta$, where $X$ is a known covariate data matrix and $\beta$ an unknown vector of coefficients. The spatial component is represented by a two-dimensional (2d) covariance function $C$ which generates the covariance matrix $V$.

In \CRANpkg{blitzkrig}, the covariance function is constructed using the product of two one-dimensional components $c_x$ and $c_y$. These are functions of the $x$ and $y$ component distances, $\Delta_x$ and $\Delta_y$, and the covariance function has the form \begin{equation} C \left( \Delta_x, \Delta_y \right) = \sigma^2 c_x \left( \Delta_x \right) c_y \left( \Delta_y \right) + \epsilon \, 1_{\{ \Delta_x = \Delta_y = 0 \}}, (\#eq:covfun) \end{equation}

where component distances are scaled by range parameters $\rho_x$ and $\rho_y$, \begin{equation} \Delta_x = \frac{ \lvert x_i - x_j \rvert }{ \rho_x } \quad \text{and} \quad \Delta_y = \frac{ \lvert y_i - y_j \rvert }{ \rho_y }. (\#eq:dfun) \end{equation}

We call this a Kronecker covariance because when the observed data lie on a regular grid (and we assume they do), the matrix $V - \epsilon I$ takes the form of a Kronecker product. Any pair of correlation functions can be used, and their parameters may take on different values in the $x$ and $y$ directions. Table `r knitr::asis_output(ifelse(knitr::is_html_output(), '\\@ref(tab:cfun-table)', '\\@ref(tab:cfun-table-latex)'))` lists all functional forms for $c$ currently implemented in \CRANpkg{blitzkrig}.
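To make the construction concrete, here is a minimal plain-R sketch of equation \@ref(eq:covfun) with Gaussian components; the function names are illustrative and not part of the \CRANpkg{blitzkrig} API.

# one-dimensional Gaussian correlation function (see the table above)
c_gau = function(d) exp(-d^2)

# the covariance function above with c_x = c_y = Gaussian (names illustrative)
C_kron = function(dx, dy, rho_x=1, rho_y=1, sigma2=1, eps=0) {
  Dx = abs(dx) / rho_x
  Dy = abs(dy) / rho_y
  sigma2 * c_gau(Dx) * c_gau(Dy) + eps * (Dx == 0 & Dy == 0)
}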

Parameters $\sigma^2$ and $\epsilon$ are the partial sill and nugget variance, respectively. The nugget variance can be understood here to represent aspatial measurement error, whereas the partial sill and range parameters describe the variance of points and the spatial profile of correlations between them.

For a more complete description of these parameters and their various interpretations, see @cressie2015statistics. In particular, it is helpful to understand their role in defining a graphical diagnostic known as the semi-variogram curve, some examples of which are found towards the end of the [Kriging Example] section.
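For reference, equation \@ref(eq:covfun) implies the standard relationship $\gamma = C(0, 0) - C( \Delta_x, \Delta_y )$ between covariance and semi-variogram, so for non-zero separations \begin{equation*} \gamma \left( \Delta_x, \Delta_y \right) = \epsilon + \sigma^2 \left( 1 - c_x \left( \Delta_x \right) c_y \left( \Delta_y \right) \right). \end{equation*}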

Kriging Example

\CRANpkg{blitzkrig} is primarily a kriging package, so we will demonstrate its core features in a point interpolation problem. Those familiar with the documentation for the \CRANpkg{sp} and \CRANpkg{gstat} packages will recognize the Meuse dataset, which appears in many R code examples and in the vignette by @pebesma2022meuse.

This is a survey of soil heavy metal concentrations in the Meuse river floodplain in the Netherlands, introduced by @burrough2015principles and later reproduced as part of the \CRANpkg{sp} package by @pebesma2005sp. In this example we will interpolate zinc concentration while adjusting for a linear covariate, distance to river.

The SGP is a model for a multivariate normal random variable, so users should always begin by considering whether their data fit this assumption closely enough. In our case, log-transforming the concentration data produces a response variable that more closely resembles a sample from the expected distribution. These log-zinc values have already been loaded in the \CRANpkg{sf} points data-frame named pts. Figure \@ref(fig:meuse-png) (left) shows how the 155 observations are positioned relative to the river.

# snap log zinc data to grid of specified resolution
g = pkern_snap(pts, g=list(gres=c(y=50, x=50)))
# plot source data using sf package, restoring default graphical parameters afterwards
plot(pts, pch=16, reset=FALSE, pal=.pal, main=NA, key.pos=1)
plot(st_geometry(pts), pch=1, add=TRUE)
plot(meuse_list[['river_poly']], col='lightblue', border=NA, add=TRUE)
plot(meuse_list[['river_line']], col='black', lwd=1, add=TRUE)
par(.par_default)

# plot gridded version using the blitzkrig package
pkern_plot(g, col_grid='lightgrey', reset=FALSE, zlab='log(ppb)')
plot(meuse_list[['river_line']], col='black', lwd=1, add=TRUE)

\CRANpkg{blitzkrig} only supports points lying on a regular grid. Irregularly spaced points like the Meuse data must first be snapped to such a grid. This is a matter of selecting a resolution fine enough to produce an acceptably small positional error. In the code above, we used pkern_snap to specify a resolution of 50 x 50 metres, creating a grid of dimensions 78 x 56 (rows x columns) whose extent covers the Meuse point sample (Figure \@ref(fig:meuse-png), right). At this resolution most positional errors are in the range of 10-25 metres, which is good enough for the demonstration here.

Ordinary Kriging

The new gridded version of the data is now in the format expected by \CRANpkg{blitzkrig}'s core modelling functions. A covariates data matrix X may optionally be supplied, as we do later, to account for linear trends. First, by way of comparison, we will ignore covariates and fit a spatially constant trend.

pkern_fit attempts to automatically find maximum likelihood estimates (MLEs) for the mean and covariance parameters by numerically optimizing the full joint likelihood for all observed points in g, using stats::optim.

# fit the covariance model and mean
fit_result_ok = pkern_fit(g, quiet=TRUE)

The default covariance function implemented in pkern_fit is the isotropic Gaussian, but a variety of alternatives are available (see Table `r knitr::asis_output(ifelse(knitr::is_html_output(), '\\@ref(tab:cfun-table)', '\\@ref(tab:cfun-table-latex)'))` and the [Model] section). Sensible defaults for initial values and bounds are set automatically based on the sample variance and grid dimensions.

To interpolate observed data, pass the covariance parameters (argument pars) and a grid containing the observed data (argument g_obs) to pkern_cmean. This populates all grid points in g_obs with values from the kriging prediction equation (including at observed points). Variance is computed separately, but the calling syntax is the same apart from argument out='v'.

# compute conditional mean and variance 
z_ok = pkern_cmean(g_obs=g, pars=fit_result_ok[['pars']])
z_ok_var = pkern_cmean(g_obs=g, pars=fit_result_ok[['pars']], out='v', quiet=TRUE)
# make a masking polygon based on variances
is_var_high = z_ok_var > quantile(z_ok_var, 0.35, na.rm=TRUE)
mask_rast = pkern_export(modifyList(g, list(gval=is_var_high)))
mask_poly = st_as_sf(as.polygons(mask_rast))[1,] |> smoothr::smooth(method='ksmooth', smoothness=2)

# convert to raster via terra
mask_g = pkern_grid(terra::rasterize(as(st_geometry(mask_poly), 'SpatVector'), pkern_export(g)))
is_ok_masked = is.na(mask_g[['gval']])

# create pkern grid objects from output vectors, masking high-variance points
z_ok_plot = z_ok
z_ok_plot[is_ok_masked] = NA
g_ok = modifyList(g, list(gval=z_ok_plot))

z_ok_var_plot = z_ok_var
z_ok_var_plot[is_ok_masked] = NA
g_ok_var = modifyList(g, list(gval=z_ok_var_plot))


# plot predictor on log scale with river line 
pkern_plot(g_ok, axes=FALSE, zlab='log(ppb)', xlab='', ylab='', main='ordinary kriging predictor', reset=FALSE)
plot(meuse_list[['river_line']], col='black', lwd=1, add=TRUE)
par(.par_default)

# plot variance  
pkern_plot(g_ok_var, axes=FALSE, zlab='V(x,y)', xlab='', ylab='', main='ordinary kriging variance', pal='Inferno', reset=FALSE)
plot(meuse_list[['river_line']], col='black', lwd=1, add=TRUE)
par(.par_default)

Figure \@ref(fig:meuse-ok-pred-png) displays the output, masked to a neighbourhood of the observed data. Here we have requested predictions over the same grid that was used for fitting. Users can request any output grid they like, by snapping the observed data to it and passing it to pkern_cmean in argument g_obs. At the end of this section we will predict over a grid at much finer resolution.

The workflow up to this point has been ordinary kriging (OK), as we did not use any covariates to adjust for linear trends. We will do that now, by including distance to river (along with its square root) as example predictors, and re-fitting the model to demonstrate universal kriging (UK).

Universal Kriging

To include covariates in a model fit, simply pass a matrix X of covariate values at the observed point locations in your call to pkern_fit. For prediction, the X argument in pkern_cmean should include all of the output grid point locations.

Covariate data (rows) must be supplied in the same order as the response data in g. In the code below we construct the full matrix X by using pkern_coords to export the grid point locations to an sf points data-frame in the correct order, then use sf::st_distance to compute distances to a line geometry representing the middle of the river.

# measure distances for every point in the grid
river_dist = sf::st_distance(pkern_coords(g, out='sf'), meuse_list[['river_line']])

# include both distance and its square root
river_dist = units::drop_units(river_dist)
X = scale(cbind(river_dist, sqrt(river_dist)))

# logical index of observed response locations (used later to subset X)
is_obs = !is.na(g[['gval']])

The UK workflow can now proceed like OK, except with X passed to pkern_fit and pkern_cmean.

# fit the covariance model again with X
fit_result_uk = pkern_fit(g_obs=g, X=X[is_obs,], quiet=TRUE)

# compute conditional mean and variance (supply all distances in X this time)
z_uk = pkern_cmean(g, fit_result_uk[['pars']], X=X)
z_uk_var = pkern_cmean(g, fit_result_uk[['pars']], X=X, out='v', quiet=TRUE)

These functions account for X by calling pkern_GLS automatically in the course of calculations to de-trend the response, $z$. Given a set of MLE covariance parameters, pkern_GLS estimates the trend $X\beta$ using the generalized least squares (GLS) expression for the effects vector $\hat{\beta}$.
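For intuition, the GLS expression in question is $\hat{\beta} = \left( X_o^T V_o^{-1} X_o \right)^{-1} X_o^T V_o^{-1} z_o$, where the subscript $o$ denotes quantities at observed points. In dense-matrix form it looks like the sketch below; this is for exposition only (the names X_o, V_o, z_o are hypothetical stand-ins), as pkern_GLS exploits the Kronecker structure rather than building $V_o$.

# dense-matrix sketch of GLS (X_o, V_o, z_o are hypothetical stand-ins for
# the covariates, covariance matrix, and response at observed points)
beta_hat = solve(t(X_o) %*% solve(V_o, X_o), t(X_o) %*% solve(V_o, z_o))
trend = X %*% beta_hat  # estimated trend over all grid points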

# use GLS to estimate the spatially varying trend 
z_lm = pkern_GLS(g, fit_result_uk[['pars']], X=X, out='z')

g_lm_plot = modifyList(g, list(gval=z_lm))

z_uk_plot = z_uk
z_uk_plot[is_ok_masked] = NA
g_uk_plot = modifyList(g, list(gval=z_uk_plot))

z_uk_var_plot = z_uk_var
z_uk_var_plot[is_ok_masked] = NA
g_uk_var_plot = modifyList(g, list(gval=z_uk_var_plot))

# plot linear predictor on log scale with river line 
pkern_plot(g_lm_plot, axes=FALSE, zlab='log(ppb)', xlab='', ylab='', main='covariates', reset=FALSE)
plot(meuse_list[['river_line']], col='black', lwd=1, add=TRUE)
par(.par_default)

# plot kriging predictor on log scale with river line 
pkern_plot(g_uk_plot, axes=FALSE, zlab='log(ppb)', xlab='', ylab='', main='kriging prediction', reset=FALSE)
plot(meuse_list[['river_line']], col='black', lwd=1, add=TRUE)
par(.par_default)

# omit high variance pixels (95th percentile) to better show interior detail
zlim = c(min(z_uk_var_plot, na.rm=TRUE), quantile(z_uk_var_plot, p=0.95, na.rm=TRUE))

# plot kriging variance with river line 
pkern_plot(g_uk_var_plot, axes=FALSE, zlab='V(x,y)', xlab='', ylab='', main='kriging variance', pal='Inferno', zlim=zlim, reset=FALSE)
plot(meuse_list[['river_line']], col='black', lwd=1, add=TRUE)
par(.par_default)

We called pkern_GLS by hand in the code above to produce the image in the left pane of Figure \@ref(fig:meuse-uk-pred-png), showing the values $X\hat{\beta}$ over the entire output grid. The middle and right panes show the resulting UK predictions and variance.

Diagnostics

Assuming the covariance model is a good fit to the data, kriging theory guarantees that pkern_cmean will return the best linear unbiased predictor for grid g_obs, in the sense of minimizing kriging variance.

To find the best fitting model for their data, users are encouraged to seek out informative covariates to include in X, and to check model fitting diagnostics for problems that would impact predictions. \CRANpkg{blitzkrig} provides two visual diagnostics in the functions pkern_plot_pars and pkern_plot_semi.

# compute sample semivariogram for the OK model
vg_ok = pkern_sample_vg(g)

# recompute variogram with trend removed
g_residual = g
g_residual[['gval']][is_obs] = g_residual[['gval']][is_obs] - z_lm[is_obs]
vg_UK = pkern_sample_vg(g_residual)
# plot the sample semi-variogram with theoretical curve in blue for OK model fit
par(mar=c(5.1, 5.1, 4.1, 2.1))
pkern_plot_semi(vg_ok, fit_result_ok[['pars']], main='OK model semi-variogram')
par(.par_default)

# plot correlation heatmap for OK model fit
g_plot = modifyList(g, list(gdim=rep(min(g[['gdim']]), 2), gyx=NULL, gval=NULL))
pkern_plot_pars(fit_result_ok[['pars']], g_plot, ij=TRUE, main='')

# plot the sample semi-variogram with theoretical curve in blue
par(mar=c(5.1, 5.1, 4.1, 2.1))
pkern_plot_semi(vg_UK, fit_result_uk[['pars']], main='UK model semi-variogram')
par(.par_default)

# plot correlation heatmap
g_plot = modifyList(g, list(gdim=rep(min(g[['gdim']]), 2), gyx=NULL, gval=NULL))
pkern_plot_pars(fit_result_uk[['pars']], g_plot, ij=TRUE, main='')

The output of pkern_fit includes a list of covariance parameters. Any such parameter list can be visualized using pkern_plot_pars. This shows the footprint of covariances between any given point and its surrounding points and indicates the level of smoothing to be expected in predictions. Another type of visualization, the semi-variogram, can be plotted with pkern_plot_semi. Various estimators for the semi-variogram are implemented in pkern_sample_vg. These generate a point cloud of sample semi-variances, estimated directly from the data, that can be compared visually with the theoretical curve obtained via MLE.

Diagnostic plots from the OK and UK workflows are shown together for comparison in Figure \@ref(fig:meuse-vg-png). Accounting for distance to the river has a dramatic effect on the fitted covariance model. The UK model estimates a much lower variance than OK, and favors a much smaller effective range (or spatial scale of correlations). This leads to more spatial detail in the kriging predictions from UK.

The initial values and bounds used in fitting covariance parameters can be found in the output of pkern_fit (data-frame bds). We encourage users to understand these settings, and to be on the lookout for common problems with fitting, such as parameters converging to a bound, or not moving from their initial values. Both can indicate model misspecification, or simply a poor choice of initial values.

Anisotropic Models

Another way of improving model fit is to explore covariance functions with more flexibility than the default isotropic Gaussian. The term isotropic refers to radial symmetry in $C$ (equal distance implying equal covariance), and anisotropic refers to its absence. \CRANpkg{blitzkrig} supports models of both kinds.

The following code fits two examples to the Meuse data: the first is a Gaussian covariance with anisotropy, and the second a Matérn product covariance [@koch2020computationally].

# fit a 2 + 2 parameter Gaussian covariance with anisotropy
fit_result_uk_gau = pkern_fit(g_obs=g, X=X[is_obs,], iso=FALSE, quiet=TRUE)

# fit a 2 + 4 parameter product Matern
fit_result_uk_mat = pkern_fit(g_obs=g, X=X[is_obs,], pars='mat', iso=FALSE, quiet=TRUE)

The iso=FALSE argument allows $c_x$ and $c_y$ to take on different parameters, whereas the default iso=TRUE forces $c_x=c_y$. The pars='mat' argument is shorthand for pars=c(y='mat', x='mat'). It specifies that $c_x$ and $c_y$ should both be Matérn functions.

# fit the other kernels
fit_result_uk_gxp = pkern_fit(g, pars='gxp', X=X[is_obs,], iso=FALSE, quiet=TRUE)
fit_result_uk_sph = pkern_fit(g, pars='sph', X=X[is_obs,], iso=FALSE, quiet=TRUE)
fit_result_uk_exp = pkern_fit(g, pars='exp', X=X[is_obs,], iso=FALSE, quiet=TRUE)

# plot everything in minimal style
g_plot = modifyList(g, list(gdim=rep(min(g[['gdim']]), 2), gyx=NULL, gval=NULL))
pkern_plot_pars(fit_result_uk_gau$pars, g_plot, minimal=TRUE, main='', xlab='', ylab='', pal='Inferno')
pkern_plot_pars(fit_result_uk_mat$pars, g_plot, minimal=TRUE, main='', xlab='', ylab='', pal='Inferno')
pkern_plot_pars(fit_result_uk_gxp$pars, g_plot, minimal=TRUE, main='', xlab='', ylab='', pal='Inferno')
pkern_plot_pars(fit_result_uk_sph$pars, g_plot, minimal=TRUE, main='', xlab='', ylab='', pal='Inferno')
pkern_plot_pars(fit_result_uk_exp$pars, g_plot, minimal=TRUE, main='', xlab='', ylab='', pal='Inferno')
par(.par_default)

The anisotropic Gaussian function has one more parameter than its isotropic counterpart, since it allows the two range parameters $\rho_x$ and $\rho_y$ to vary independently. The Matérn product function adds two more shape parameters, controlling the degree of kurtosis in two directions. The fitted parameters from these two models are visualized as the first two heatmaps on the left of Figure \@ref(fig:meuse-alt-fit), together with models for three other choices of pars.

The Gaussian is the only isotropic model in \CRANpkg{blitzkrig}. The product Matérn function (fit_result_uk_mat above), for example, does not generalize the more common 2d (isotropic) Matérn function. It does, however, approach the Gaussian as $p \to \infty$ [@koch2020unifying].

Other choices of $c_x$, $c_y$ lead to a variety of other types of anisotropy, but always with a directionality oriented along one or both of the coordinate axes. Differently oriented data could be modeled by estimating the direction of anisotropy in a preliminary analysis [e.g. by the method of @koch2020computationally], then rotating the observed point locations by this angle prior to snapping. For simplicity, we skip this step here.
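For reference, such a rotation takes only a few lines; in the sketch below theta is a hypothetical anisotropy angle, and the rotated coordinates would then be passed to pkern_snap as before.

# rotate point coordinates by a (hypothetical) anisotropy angle before snapping
theta = pi/6
rot = matrix(c(cos(theta), -sin(theta), sin(theta), cos(theta)), 2, 2, byrow=TRUE)
xy_rotated = sf::st_coordinates(pts)[, c('X', 'Y')] %*% t(rot)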

Anisotropic covariance functions impart complexity, in the form of new parameters, but also flexibility. Flexibility is a double-edged sword. It can improve a model by more closely aligning it with reality, or it can worsen it by enabling over-fitting, which leads to underestimates of uncertainty and poor predictions.

Cross-validation

# extract the GLS estimates of the linear predictor (trend)
pts_lm_mat = pkern_GLS(g, fit_result_uk_mat[['pars']], X=X, out='z')

# recompute variogram with trend removed
g_residual_mat = g
g_residual_mat[['gval']][is_obs] = g_residual_mat[['gval']][is_obs] - pts_lm_mat[is_obs]
vg_UK_mat = pkern_sample_vg(g_residual_mat)

# compute conditional mean and variance
z_uk_mat = pkern_cmean(g, fit_result_uk_mat[['pars']], X=X)
z_uk_mat_var = pkern_cmean(g, fit_result_uk_mat[['pars']], X=X, out='v', quiet=TRUE)
# copy data and apply mask
z_uk_mat_plot = z_uk_mat
z_uk_mat_plot[is_ok_masked] = NA
g_uk_mat_plot = modifyList(g, list(gval=z_uk_mat_plot))

z_uk_mat_var_plot = z_uk_mat_var
z_uk_mat_var_plot[is_ok_masked] = NA
g_uk_mat_var_plot = modifyList(g, list(gval=z_uk_mat_var_plot))

# plot the sample semi-variogram with theoretical curve in blue for anisotropic model fit
par(mar=c(5.1, 5.1, 4.1, 2.1))
pkern_plot_semi(vg_UK_mat, fit_result_uk_mat[['pars']], main='Matern model semi-variogram')
par(.par_default)

# plot kriging predictor on log scale with river line 
pkern_plot(g_uk_mat_plot, axes=FALSE, zlab='log(ppb)', xlab='', ylab='', main='kriging prediction', reset=FALSE)
plot(meuse_list[['river_line']], col='black', lwd=1, add=TRUE)
par(.par_default)

# omit high variance pixels (98th percentile) to better show interior detail
zlim = c(min(z_uk_mat_var_plot, na.rm=TRUE), quantile(z_uk_mat_var_plot, p=0.98, na.rm=TRUE))

# plot kriging variance with river line 
pkern_plot(g_uk_mat_var_plot, axes=FALSE, zlab='V(x,y)', xlab='', ylab='', main='kriging variance', pal='Inferno', zlim=zlim, reset=FALSE)
plot(meuse_list[['river_line']], col='black', lwd=1, add=TRUE)
par(.par_default)

Figure \@ref(fig:meuse-alt-pred-png) shows the semi-variogram, predictions, and variance from the Matérn product model fitted in the previous section. The predictions have more spatial complexity compared to the isotropic UK model fitted earlier on, and variance has decreased slightly. Are we over-fitting? One way to check is through cross-validation (CV).

In CV, we withhold a subset of the data (a hold-out set, or fold) during model fitting, then come back to it later as a reference against which to compare predictions. If we have an over-fitting problem, the model is likely to predict well at the training locations, but poorly at the hold-out locations. The hold-out error therefore gives an indication of what to expect when predicting at unseen locations.

We performed a 5-fold cross-validation experiment, dividing the 155 observed Meuse points at random into five folds of size n=31. For each fold, we fitted all four models to the complementary subset of the data (size n=124), then computed the root mean squared prediction error on the hold-out set for log-zinc (rMSPE) and its back-transformed version, rMSPEb (see the [Back-transforming] section below).
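A single fold of this procedure reduces to a few lines. The sketch below shows one fold under assumptions about the internals of the run_cv helper called in the next chunk; it reuses the objects g, X, and is_obs from earlier sections.

# sketch of one CV fold (assumes g, X, is_obs from the kriging example)
fold = sample(which(is_obs), 31)      # hold out 31 observed points
g_train = g
g_train[['gval']][fold] = NA          # mask the hold-out responses
is_train = !is.na(g_train[['gval']])
fit_cv = pkern_fit(g_obs=g_train, X=X[is_train,], quiet=TRUE)
z_cv = pkern_cmean(g_train, fit_cv[['pars']], X=X)
rmspe = sqrt(mean((z_cv[fold] - g[['gval']][fold])^2))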

# load pre-computed results
cv_results_path = 'D:/pkern/vignettes/comparisons/data/cv_results.csv'
cv_results_all = read.csv(cv_results_path, row.names=NULL)
cv_results = cv_results_all[c('name', 'covariance', 'isotropic', 'covariates', 'parameters', 'rMSPE', 'rMSPEb')]

# this helper function call runs the CV workflow (uncomment to run again)
#cv_results = run_cv(g, X, n_fold=5, n_rep=25)
#write.csv(cv_results, cv_results_path)

# define caption text here
caption_cv = 'Estimates of the root mean squared prediction error on the Meuse dataset for log zinc (rMSPE) and its back-transformed values (rMSPEb) in a 25 X 5-fold cross-validation (CV) experiment. Results are reported for the four kriging models presented earlier, illustrating a progression of improvement as we add covariates and refine the covariance model.'
knitr::kable(cv_results, format='html', caption=caption_cv)
kableExtra::kable_styling(knitr::kable(cv_results, format='latex', booktabs=TRUE, caption=caption_cv), font_size=9)

As these statistics were sensitive to the choice of partition, we repeated the process 25 times and averaged the results. They are reported in Table `r knitr::asis_output(ifelse(knitr::is_html_output(), '\\@ref(tab:cv-table)', '\\@ref(tab:cv-table-latex)'))`. Lower scores indicate better predictive performance and a better fitting model. We found a progression of improvements as we added complexity: starting with our baseline model (fit_result_ok), then adding covariates (fit_result_uk), introducing anisotropy (fit_result_uk_gau) and, finally, introducing additional flexibility in covariance shape (fit_result_uk_mat).

Back-transforming

If predictions are required on the original scale (ppb, instead of its logarithm), users may think to simply take the exponential of the kriging predictions vector z_uk. However, this leads to underestimates of the conditional expectation of $\exp(Z)$ (by Jensen's inequality), so we recommend adding one half of the kriging variance to z_uk before exponentiating [@cressie2015statistics]. Note that this particular bias adjustment is specific to the log transformation.
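In code, the adjustment is a one-liner using the prediction and variance vectors computed earlier:

# bias-adjusted back-transformation to the original scale (ppb)
zinc_pred_ppb = exp(z_uk + z_uk_var / 2)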

Computations

This final section explores the computational bottlenecks that can make kriging slow, and the techniques that \CRANpkg{blitzkrig} uses to circumvent them. The core idea is to rewrite the matrix $V - \epsilon I$ as a Kronecker product (see the [Model] section), so that operations involving sub-matrices of $V$ can be simplified.

Two sub-matrices are particularly important in this context. The first is the $n_o \times n_o$ covariance matrix $V_o$ for the $n_o$ observed points. $V_o$ has to be generated (its entries calculated), then factorized, so that products of the form $V_o^{-1} z$ can be calculated. This happens in many routine spatial statistical operations - including the likelihood, GLS, kriging equations, and conditional simulation - and it is the reason they become computationally intensive for large $n_o$.

The second important sub-matrix is the $n_p \times n_o$ cross-covariance $V_p$, between the observed locations and the $n_p$ unseen prediction locations. $n_p$ can be very large, particularly when a gridded output is needed, making products with $V_p$ expensive both in terms of computation time and memory use. Such products appear in both the kriging prediction and variance equations.

\CRANpkg{blitzkrig} generates the entries of $V_o$ and $V_p$ directly from the Kronecker product factorization of $V - \epsilon I$. This speeds construction time for $V_o$ and $V_p$, and, more importantly, it enables the package to multiply a vector by $V_p$ without having to explicitly build $V_p$ in computer memory.
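The key identity here is the vec trick, $\left( A \otimes B \right) \text{vec}(Z) = \text{vec}\left( B Z A^T \right)$, which lets a Kronecker product multiply a vector without ever being formed. A toy demonstration follows, where A and B are stand-ins for one-dimensional correlation matrices along $y$ and $x$ (the matrices are random and purely illustrative).

# the left side builds a 12 x 12 Kronecker product; the right side never does
A = matrix(rnorm(9), 3, 3)    # stand-in for a y-direction correlation matrix
B = matrix(rnorm(16), 4, 4)   # stand-in for an x-direction correlation matrix
Z = matrix(rnorm(12), 4, 3)   # grid data, one column per y level
all.equal(c((A %x% B) %*% c(Z)), c(B %*% Z %*% t(A)))  # TRUE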

Benchmark Comparisons

To illustrate the speed of \CRANpkg{blitzkrig} we timed computations for the Meuse OK exercise presented in the [Kriging Example] section, repeating it several times over a range of output resolutions (varying $n_p$) and recording median evaluation times in five repetitions using \CRANpkg{microbenchmark}.

For comparison we also tested four popular R packages mentioned in the introduction, \CRANpkg{gstat}, \CRANpkg{geoR}, \CRANpkg{fields}, and RandomFields, all using the same isotropic Gaussian model fitted with default settings. To get a range of observed sample sizes ($n_o$) values we repeated this process with five other point datasets, one on ozone concentration, and four on forest density.

The ozone data are included as the object ChicagoO3 in the \CRANpkg{fields} package [@nychka2021fields] and appear frequently in its example code. They are Environmental Protection Agency recordings of average ozone concentration in the air at $n_o=20$ Chicago-area stations in 1987.

example_input_path = 'D:/pkern/vignettes/comparisons/data/inputs/treed_1376_1377329.rds'

# load the training data
temp_list = readRDS(example_input_path)
treed_g_train = temp_list[['training_grid']]
treed_pts = temp_list[['training_pts']]
rm(temp_list)

start_fit = Sys.time()

# extract covariate data from DEM
dem_path = 'D:/pkern/vignettes/comparisons/data/treed_dem.tif'
dem_g_src = pkern_grid(rast(dem_path))

# load the full resolution treed data and snap training points to it
treed_path = 'D:/pkern/vignettes/comparisons/data/treed.tif'
treed_g_src = pkern_grid(rast(treed_path))
treed_g_snap = pkern_snap(treed_pts, treed_g_src)
is_treed_obs = !is.na(treed_g_snap[['gval']])

# use elevation and its square root as covariates
treed_X = cbind(dem_g_src[['gval']], sqrt(dem_g_src[['gval']]))

# reset X (defined earlier for the Meuse example) so it is not reused by mistake
X = NA

# fit the model
treed_fit_result = pkern_fit(treed_g_train, X=treed_X[is_treed_obs,], iso=FALSE, pars='gau', quiet=TRUE)
pars = treed_fit_result[['pars']]

end_fit = Sys.time()
start_pred = Sys.time()

# make predictions
treed_z = pkern_cmean(g_obs=treed_g_snap, X=treed_X, pars=pars)
treed_g_pred = modifyList(treed_g_snap, list(gval=treed_z))

# report both times
end_pred = Sys.time()
print(end_fit-start_fit)
print(end_pred-start_pred)


# find a common range for colorbar
zlim = range(c(treed_g_pred$gval, treed_g_train$gval, treed_g_src$gval), na.rm=TRUE)
pal_nm = 'YlGn'
pal = function(n) { hcl.colors(n, pal_nm, rev=TRUE) }

# draw the three plot panes
pkern_plot(treed_g_src, zlim=zlim, main='treed data (original scale)', minimal=TRUE, pal=pal_nm, rev=TRUE, col_box='black')
pkern_plot(treed_g_train, zlim=zlim, main='treed data (up-scaled)', minimal=TRUE, col_grid='white', pal=pal_nm, rev=TRUE)
pkern_plot(treed_g_pred, zlim=zlim, zlab='density', main='kriging predictions', minimal=TRUE, pal=pal_nm, rev=TRUE)

The forest data come from Canada-wide estimates of forest characteristics by @beaudoin2018tracking. Estimates of areal tree cover (%) for the province of British Columbia (BC) are re-published in \CRANpkg{rasterbc} as (raster) tiles, each containing around one million data points. We created four much smaller example datasets by taking sub-samples of points from a tile in Central BC (Figure \@ref(fig:treed-png), left).

For one of the forest sub-samples (which we call treed) we picked one thousand points at random. For the others we sampled points at regular intervals, so that the sub-samples themselves formed rasters. These up-scaled datasets are named treed_88, treed_352, treed_1376, with the suffix indicating $n_o$. Figure \@ref(fig:treed-png) (middle) shows the largest.

The three raster examples are special in that the observed point sets form complete grids. They contain no NAs. The three irregularly sampled point sets (ozone, Meuse, and treed) are examples of the incomplete data case. All but around 0.1% of the points in the treed grid are NA, and in the (snapped) ozone and Meuse grids, empty, un-sampled spaces are filled with NAs (see Figure \@ref(fig:meuse-png), right).

Complete Data

This distinction between complete and incomplete is important because both $V - \epsilon I$ and $V_o - \epsilon I$ have Kronecker product factorizations in the complete case, and this makes $V_o$ much easier to deal with. The result is big performance improvements in \CRANpkg{blitzkrig} on large $n_o$ problems, particularly in evaluations of the likelihood.
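A sketch of why: the eigendecomposition that dominates factorization cost separates over the Kronecker factors, so two small decompositions replace one large one. In the toy example below, random symmetric matrices stand in for the directional correlation matrices.

# eigenvalues of a Kronecker product are products of the factor eigenvalues
A = crossprod(matrix(rnorm(9), 3))    # toy 3 x 3 symmetric factor
B = crossprod(matrix(rnorm(16), 4))   # toy 4 x 4 symmetric factor
eA = eigen(A)
eB = eigen(B)
all.equal(sort(kronecker(eA$values, eB$values)), sort(eigen(A %x% B)$values))  # TRUE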

# load the benchmarking results
inputs_csv_path = 'D:/pkern/vignettes/comparisons/data/inputs.csv'
result_csv_path = 'D:/pkern/vignettes/comparisons/data/outputs.csv'
inputs_df = read.csv(inputs_csv_path, row.names=NULL)
results_df = read.csv(result_csv_path, row.names=NULL)
results_names = unique(results_df[['name_in']]) 

# TODO: run benchmarking script again with higher max output points and save error messages (if any) to data frame
# then copy some of this code to the end of that script and save csv. The factor reordering probably has to stay here
# also include gridded and gx, gy columns

# set variance eval time 0 for packages that included variance in kriging prediction output
results_df[['var_s']][ results_df[['pkg']] %in% c('gstat', 'geoR') ] = 0 

# create variance + prediction eval time column
results_df[['pred_and_var_s']] = results_df[['pred_s']] + results_df[['var_s']]
results_df[['with_var']] = 'yes' 

# add new rows to represent prediction time only
results_only_pred = results_df[results_df[['pkg']] %in% c('fields', 'pkern', 'RandomFields'),]
results_only_pred[['pred_and_var_s']] = results_only_pred[['pred_s']]
results_only_pred[['with_var']] = 'no'
results_df = rbind(results_df, results_only_pred)

# re-order packages factor to the order we want them listed the legend
results_df[['pkg']] = factor(results_df[['pkg']], levels=c('pkern', 'fields', 'RandomFields', 'gstat', 'geoR'))

# indicate if the example dataset is a complete grid
results_df[['is_complete']] = c('no', 'yes')[ 1 + as.integer(!is.na(results_df[['gdim_x']])) ]
results_df[['is_complete']] = factor(results_df[['is_complete']], levels=c('yes', 'no'))
# plot time to fit the model against number of observed points
ggplot(subset(results_df, !(pkg == 'gstat'))) +
  aes(x=n_train, y=fit_s, color=pkg, lty=is_complete) +
  geom_point(size=1, pch=1) +
  geom_line(lwd=0.5) +
  scale_linetype_manual(values=c(no='solid', yes='11')) +
  xlab('number of observed points') +
  ylab('fitting time (seconds)') +
  labs(color='R package',
       lty='complete grid') +
  scale_x_log10(
    breaks = scales::trans_breaks('log10', function(x) 10^x),
    labels = scales::trans_format('log10', scales::math_format(10^.x))
  ) +
  scale_y_log10(
    breaks = scales::trans_breaks('log10', function(x) 10^x),
    labels = scales::trans_format('log10', scales::math_format(10^.x))
  ) +
  theme_bw() +
  theme(text=element_text(size=8),
        strip.text.x=element_text(face='bold'),
        strip.text.y=element_text(face='bold'))

The improvement is illustrated in Figure \@ref(fig:bench-fit-png), which plots evaluation time against $n_o$ for likelihood-based model fitting. We excluded \CRANpkg{gstat} from this comparison because it uses variograms (instead of MLE) to parametrize covariance. The other packages all follow a similar trajectory of rising computation times, with the exception of \CRANpkg{blitzkrig} in the complete data case (dotted lines). Speedy likelihood function evaluations by \CRANpkg{blitzkrig} led to faster fitting times for large $n_o$.

In a log-log plot like Figure \@ref(fig:bench-fit-png), a linear trend with slope $p$ reflects a complexity power law with exponent $p$ (if $t \propto n_o^p$ then $\log t$ is linear in $\log n_o$ with slope $p$). What is noteworthy here is the large-$n_o$ behavior, which shows a similar $p$ for all packages except \CRANpkg{blitzkrig} on complete examples, where it is faster than the rest by a factor that grows polynomially with $n_o$. These differences reflect the rate-limiting operation of factoring $V_o$. In \CRANpkg{blitzkrig} its algorithmic complexity is $O( n_o^3 )$ for incomplete data, and closer to $O( n_o^{3/2} )$ for complete data.

The performance advantage of \CRANpkg{blitzkrig} on complete grids is not limited to kriging-related problems. We can expect similar improvements in computation time for any application of likelihood, GLS, or simulations from SGPs. Note that the completeness requirement is strict - any number of NAs in the data grid means the grid is incomplete.

Prediction and Variance

After model fitting we timed evaluations of the kriging prediction and variance equations for ten prediction grids, ranging in size from $8 \times 11$ ($n_p=88$) to $4084 \times 5396$ ($n_p>$ 22 million). To limit the total length of the experiment, the function calls for each package were halted (timed-out) at around two minutes.

For \CRANpkg{blitzkrig} and \CRANpkg{fields}, we timed kriging prediction and variance separately. In \CRANpkg{gstat} and \CRANpkg{geoR}, both are returned from a single function call, so we only recorded the combined time. In RandomFields, variance was unavailable at the time of writing, so we recorded only prediction time.

# names of the two groups of example datasets (pts = incomplete, rast = complete)
pts_names = c('ozone', 'meuse', 'treed')
rast_names = paste0('treed_', c(88, 352, 1376))

# reorder data to reflect the panel order we want in the plot
results_plot_df = results_df[ results_df[['name_in']] %in% c(pts_names, rast_names),]
results_plot_df = results_plot_df[order( match(results_plot_df[['name_in']], c(pts_names, rast_names)) ),]

# create titles with sample size and order by input size
results_plot_df[['n_plot']] = paste0('(n_o=', results_plot_df[['n_train']], ')')
results_plot_df[['name_plot']] = apply(results_plot_df[, c('name_in', 'n_plot')], 1, paste, collapse=' ')
results_plot_df[['name_plot']] = factor(results_plot_df[['name_plot']], levels=unique(results_plot_df[['name_plot']]))

# make the plot
ggplot(results_plot_df) +
  aes(x=n_out, y=pred_and_var_s, color=pkg, lty=with_var) +
  geom_point(size=1, pch=1) +
  geom_line(lwd=0.5) +
  scale_linetype_manual(values = c(no='11', yes='solid')) +
  xlab('number of points predicted') + 
  ylab('prediction time (seconds)') +
  labs(color='R package',
       lty='with variance') +
  scale_x_log10(
   breaks = scales::trans_breaks('log10', function(x) 10^x),
   labels = scales::trans_format('log10', scales::math_format(10^.x))
  ) +
  scale_y_log10(
   breaks = scales::trans_breaks('log10', function(x) 10^x),
   labels = scales::trans_format('log10', scales::math_format(10^.x))
  ) +
  facet_wrap(vars(name_plot)) +
  theme_bw() +
  theme(text=element_text(size=8),
        strip.text.x=element_text(face='bold'),
        strip.text.y=element_text(face='bold'))

The times are plotted in Figure \@ref(fig:bench-pred-png) as a function of prediction grid size, $n_p$. Results from each data example are shown separately in two rows of three panels - the top three show results from the incomplete cases, and the bottom three from the complete cases. One thing that is obvious right away is the relative speed of prediction (dashed lines) compared to prediction with variance (solid lines). This is because the variance equation multiplies $V_p$ by a matrix, whereas the prediction equation multiplies it by a vector (increasing complexity by a factor of $n_o$). This makes RandomFields, \CRANpkg{fields}, and \CRANpkg{blitzkrig} stand out in the large-$n_p$ results for having fast prediction-only methods.

Even with incomplete data, the prediction method of \CRANpkg{blitzkrig} was the fastest by far on all but the smallest $n_p$ and $n_o$ examples. As we saw with likelihood, the relative speed-up increases with $n_o$, and by $n_o=1000$ it is nearly two orders of magnitude faster than the next fastest method for large $n_p$ problems. Variance was also faster with large $n_o$ on the incomplete examples, but to a lesser extent.

With complete data, prediction times were even faster. For example, kriging from the treed_1376 points onto a $1021 \times 1349$ grid took less than half a second in \CRANpkg{blitzkrig}, whereas the next fastest package (\CRANpkg{fields}) required 90 seconds.

\CRANpkg{blitzkrig} is naturally suited to down-scaling with complete data, and the addition of covariates and anisotropy introduces little additional computational complexity in prediction. For example, it took less than three seconds to create the $1021 \times 1349$ UK output for the treed_1376 example in Figure \@ref(fig:treed-png) (right) using the anisotropic Gaussian covariance and elevation, along with its square root, as covariates. Half of this time was spent in fitting and half in prediction.

Variance computations were also much improved in the complete case, where the combined variance and prediction times for \CRANpkg{blitzkrig} were faster than any other package by 1-2 orders of magnitude in all but the smallest $n_p$ examples.

\CRANpkg{blitzkrig} computes the variance in an unusual way (for both complete and incomplete problems). Instead of multiplying $V_p$ by the $n_o \times n_o$ matrix $V_o^{-1}$, we multiply it by the eigenvectors of $V_o$ in a (length-$n_o$) loop, combining results as we go. This is slower than matrix multiplication, but it greatly reduces computer memory demands on problems where $n_o$ and $n_p$ are both large, as we avoid ever having to hold $V_p$ or the product $V_p V_o^{-1}$ entirely in memory during kriging.
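In dense-matrix terms, that loop looks something like the sketch below, where V_o and V_p are hypothetical stand-ins for the sub-matrices defined above (blitzkrig itself forms the products $V_p q_k$ implicitly, without ever building $V_p$).

# sketch of the eigenvector loop behind the variance computation: accumulates
# diag(V_p %*% solve(V_o) %*% t(V_p)) one eigen-pair at a time
eV = eigen(V_o)
var_reduction = numeric(nrow(V_p))
for(k in seq_along(eV$values)) {
  w = drop(V_p %*% eV$vectors[, k])   # one length-n_p vector at a time
  var_reduction = var_reduction + w^2 / eV$values[k]
}
# kriging variance = prior variance at prediction points - var_reduction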

Summary

Kronecker covariances are common enough in geostatistics. We see them everywhere in the form of Gaussian covariance functions and separable space-time models. Nevertheless, we are unaware of any other CRAN R package for SGP models that factors the spatial covariance $V$ into a Kronecker product, as we do in \CRANpkg{blitzkrig}.

This factorization leads to huge computational advantages in evaluating likelihood and kriging equations (among other things), as we saw in Figures \@ref(fig:bench-fit-png) and \@ref(fig:bench-pred-png), with speed-ups of 1-3 orders of magnitude on problems with hundreds to thousands of observed points. This makes it feasible, for example, to downscale a raster onto a grid of several million points in seconds, using a statistically principled, MLE-based OK or UK model.

A fast interpolation tool like this could supplant methods like inverse distance weighting and Voronoi tiling, which, despite their bias issues, are often treated as reasonable stand-ins when kriging is computationally impractical.

\CRANpkg{blitzkrig} gets its performance edge through restrictions on covariance and point layout, but we nevertheless think it has wide applicability. Users can simply snap irregular points to a reasonably large grid, as we did in the Meuse example. And while some important covariance functions like the 2d Matérn are excluded, the alternatives provided will be flexible enough approximations in many cases [@koch2020computationally].

In specializing for Kronecker covariance, \CRANpkg{blitzkrig} raises the upper limits for practical sample sizes and resolutions in SGP-based analysis, providing a more interactive and less frustrating user experience for analysts.

References


