pkern: Fast Kriging with Product Kernels

meuse_vignette

Dean Koch 2022-06-03

Mitacs UYRW project

pkern: Fast kriging on gridded datasets

This vignette shows how to use pkern to interpolate the Meuse river dataset (part of the sp package) onto a high resolution grid:

pkern_grid defines the grid
pkern_snap snaps point data to the grid
pkern_fit fits a covariance model to the data
pkern_cmean computes the kriging predictor and variance

All plots in this document except the first are generated by pkern_plot, which uses base graphics to display rasters and matrices. Try setting ij=TRUE in any of the plot calls to see a different styling used for matrices.

pkern has no requirements outside of base packages (stats, utils, grDevices, and graphics), but terra and sf are (strongly) suggested, and we use them in this vignette to load and manage the example data.

# load extra dependencies for the vignette
library(sp)
library(sf)

## Linking to GEOS 3.9.1, GDAL 3.2.1, PROJ 7.2.1

library(terra)

## terra version 1.4.22

We will look at the “meuse” dataset (Pebesma, 2009) on heavy metal concentrations in the Meuse river floodplain in the Netherlands. These data are included with the sp package that we just loaded, and appear in this tutorial pdf for gstat. Load the data with data(meuse) and data(meuse.riv). I have a helper function get_meuse to do this (see source code for this document).

# load data
meuse = get_meuse()
str(meuse)

## List of 3
##  $ soils     :Classes 'sf' and 'data.frame': 155 obs. of  15 variables:
##   ..$ cadmium : num [1:155] 11.7 8.6 6.5 2.6 2.8 3 3.2 2.8 2.4 1.6 ...
##   ..$ copper  : num [1:155] 85 81 68 81 48 61 31 29 37 24 ...
##   ..$ lead    : num [1:155] 299 277 199 116 117 137 132 150 133 80 ...
##   ..$ zinc    : num [1:155] 1022 1141 640 257 269 ...
##   ..$ elev    : num [1:155] 7.91 6.98 7.8 7.66 7.48 ...
##   ..$ dist    : num [1:155] 0.00136 0.01222 0.10303 0.19009 0.27709 ...
##   ..$ om      : num [1:155] 13.6 14 13 8 8.7 7.8 9.2 9.5 10.6 6.3 ...
##   ..$ ffreq   : Factor w/ 3 levels "1","2","3": 1 1 1 1 1 1 1 1 1 1 ...
##   ..$ soil    : Factor w/ 3 levels "1","2","3": 1 1 1 2 2 2 2 1 1 2 ...
##   ..$ lime    : Factor w/ 2 levels "0","1": 2 2 2 1 1 1 1 1 1 1 ...
##   ..$ landuse : Factor w/ 15 levels "Aa","Ab","Ag",..: 4 4 4 11 4 11 4 2 2 15 ...
##   ..$ dist.m  : num [1:155] 50 30 150 270 380 470 240 120 240 420 ...
##   ..$ geometry:sfc_POINT of length 155; first list element:  'XY' num [1:2] 181072 333611
##   ..$ distance: num [1:155, 1] 118 125 232 359 482 ...
##   ..$ log_zinc: num [1:155] 6.93 7.04 6.46 5.55 5.59 ...
##   ..- attr(*, "sf_column")= chr "geometry"
##   ..- attr(*, "agr")= Factor w/ 3 levels "constant","aggregate",..: NA NA NA NA NA NA NA NA NA NA ...
##   .. ..- attr(*, "names")= chr [1:14] "cadmium" "copper" "lead" "zinc" ...
##  $ river_poly:sfc_POLYGON of length 1; first list element: List of 1
##   ..$ : num [1:413, 1:2] 181505 181478 181452 181420 181388 ...
##   ..- attr(*, "class")= chr [1:3] "XY" "POLYGON" "sfg"
##  $ river_line:sfc_LINESTRING of length 1; first list element:  'XY' num [1:409, 1:2] 181444 181425 181412 181398 181379 ...

# plot using sf package
plot(meuse[['river_poly']], col='lightblue', reset=FALSE)
plot(meuse[['river_line']], lwd=2, add=TRUE)
plot(meuse[['soils']]['zinc'], pch=16, add=TRUE)

n_meuse = nrow(meuse[['soils']])

We will interpolate the log-transformed values (as in the gstat vignette). Start by defining a grid and snapping the points to it

# desired resolution in units of metres
gres = c(y=5, x=5)

# snap points, copying values of dependent variable
g_meuse = pkern_snap(meuse[['soils']]['log_zinc'], g=list(gres=gres))

# plot the grid only
pkern_plot(modifyList(g_meuse, list(gval=NA)), col_grid=NA)

# plot with source points indicated over their snapped grid location
pkern_plot(g_meuse, zlab='log(zinc)', reset=FALSE)
plot(sf::st_geometry(meuse[['soils']]), add=TRUE)

Pass this data to pkern_fit to do ordinary kriging

# ordinary kriging: fit isotropic gaussian model by default
fit_result_OK = pkern_fit(g_obs=g_meuse, pars='gau', quiet=TRUE)

By default, a sample semi-variogram is plotted, with the fitted model drawn in transparent blue. Notice that a range of semi-variances can be realized for a given distance because the model is anisotropic.

The fitted model overestimates the (partial sill) variance in this case because there is an unaccounted-for trend. Ordinary kriging estimates spatially constant trends, but here we should expect it to depend on how near we are to the presumed source of the zinc deposits (the Meuse River).

The code below demonstrates creating a linear predictor - the distance to the river.

# useful variables
gdim = g_meuse[['gdim']]
is_obs = !is.na(g_meuse[['gval']])
n_obs = sum(is_obs)

# get distance values for entire grid
g_meuse_sf = pkern_coords(g_meuse, out='sf')

## processing 435240 grid points...

d2r_result = units::drop_units(st_distance(g_meuse_sf, meuse[['river_line']]))

# include both distance and its square root
meuse_predictors = scale(cbind(d2r_result, sqrt(d2r_result)))

# copy the subset of predictors at observed response locations
X = matrix(meuse_predictors[is_obs,], nrow=n_obs)

Fit the model with the linear predictor included. This is called universal kriging

# fit the model and plot semivariogram
fit_result_UK = pkern_fit(g_obs=g_meuse, pars='gau', X=X, quiet=TRUE)

# print the GLS estimates for coefficients of the linear predictor
pars_UK = fit_result_UK$pars
pkern_GLS(g_meuse, pars_UK, X=X, out='b')

## [1]  5.681606  1.238569 -1.884427

These fitted kernels can be visualized using pkern_plot_pars. This plots the footprint of correlations around any given central point on the grid.

# plot the kernel shape
pkern_plot_pars(pars_UK, g_meuse)

# print the actual contents of the parameters list
str(pars_UK)

## List of 4
##  $ y    :List of 2
##   ..$ k : chr "gau"
##   ..$ kp: Named num 233
##   .. ..- attr(*, "names")= chr "rho"
##  $ x    :List of 2
##   ..$ k : chr "gau"
##   ..$ kp: Named num 233
##   .. ..- attr(*, "names")= chr "rho"
##  $ eps  : num 0.0856
##  $ psill: num 0.0937

# GLS to estimate the (spatially varying) trend
z_gls = pkern_GLS(g_meuse, pars_UK, X=meuse_predictors, out='z')
g_meuse_gls = modifyList(g_meuse, list(gval=z_gls))
pkern_plot(g_meuse_gls, main='estimated trend component')

Once you have a set of covariance parameters, interpolation becomes very easy with pkern_cmean. This computes the kriging predictor, an interpolator with nice properties like unbiasedness and minimal variance, under suitable assumptions.

# compute spatial mean
g_meuse_detrend = modifyList(g_meuse, list(gval=g_meuse[['gval']]-z_gls))
z_spat = pkern_cmean(g_meuse_detrend, pars_UK, X=0)
g_meuse_spat = modifyList(g_meuse, list(gval=z_spat))
pkern_plot(g_meuse_spat,
           main='estimated spatial component')

# compute UK predictions, masking outside observed range
z_pred = z_gls + z_spat
g_meuse_pred = modifyList(g_meuse, list(gval=z_pred))
pkern_plot(g_meuse_pred,
           zlim=log(range(meuse[['soils']][['zinc']])),
           zlab='log(zinc)',
           main='kriging predictor')

Kriging variance is also computed with pkern_cmean. This is much slower to compute than the kriging predictor

# prediction variance
z_var = pkern_cmean(g_meuse_detrend, pars_UK, X=0, out='v', quiet=TRUE)
g_meuse_var = modifyList(g_meuse, list(gval=z_var))
pkern_plot(g_meuse_var, main='kriging variance')

The above plots are for the variable on the log-scale. The unbiasdness property is lost when transforming back to the original scale, but now that we have the kriging variance, we can make a correction:

# prediction bias adjustment from log scale
z_pred2 = exp(z_pred + z_var/2)
g_meuse_pred2 = modifyList(g_meuse, list(gval=z_pred2))
pkern_plot(g_meuse_pred2,
           zlab='zinc (ppm)',
           zlim=range(meuse[['soils']][['zinc']]),
           main='UK predictions (masked to input range)')

This vignette is intended to get users up and running with pkern by demonstrating a very simple example on a familiar dataset. In other vignettes we will look in more detail at how pkern_cmean actually works, and how users can modify the workflow to incorporate a trend model.

This chunk below is used to create the markdown document you’re reading from the R script file with same name, in this directory. It uses rmarkdown to create the md file, then substitutes local image paths for github URLs so the document will display the images properly on my repo page.

if(FALSE)
{
  # Restart session and run code chunk below to build the markdown file
  library(here)
  library(rmarkdown)

  # make the markdown document and delete the unwanted html
  path.input = here('vignettes/meuse_vignette.R')
  path.output = here('vignettes/meuse_vignette.md')
  path.garbage = here('vignettes/meuse_vignette.html')
  rmarkdown::render(path.input, clean=TRUE, output_file=path.output)
  unlink(path.garbage)

  # substitute local file paths for image files with URLs on github
  md.github = gsub('https://github.com/deankoch/pkern/blob/main', 'https://github.com/deankoch/pkern/blob/main', readLines(path.output))
  writeLines(md.github, path.output)
}

deankoch/pkern documentation built on Oct. 26, 2023, 8:54 p.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

Tweet to @rdrrHQ

GitHub issue tracker

ian@mutexlabs.com