pick.from.points: Pick Variable from Spatial Dataset
In RSAGA: SAGA Geoprocessing and Terrain Analysis

Description

These functions pick (i.e. interpolate without worrying too much about theory) values of a spatial variable from data stored in a data.frame, a point shapefile, or an ASCII or SAGA grid, using nearest neighbour or kriging interpolation. pick.from.points and [internal.]pick.from.ascii.grid are the core functions that are called by the different wrappers.

Usage

pick.from.points(
  data,
  src,
  pick,
  method = c("nearest.neighbour", "krige"),
  set.na = FALSE,
  radius = 200,
  nmin = 0,
  nmax = 100,
  sill = 1,
  range = radius,
  nugget = 0,
  model = vgm(sill - nugget, "Sph", range = range, nugget = nugget),
  log = rep(FALSE, length(pick)),
  X.name = "x",
  Y.name = "y",
  cbind = TRUE
)

pick.from.shapefile(data, shapefile, X.name = "x", Y.name = "y", ...)

pick.from.ascii.grid(
  data,
  file,
  path = NULL,
  varname = NULL,
  prefix = NULL,
  method = c("nearest.neighbour", "krige"),
  cbind = TRUE,
  parallel = FALSE,
  nsplit,
  quiet = TRUE,
  ...
)

pick.from.ascii.grids(
  data,
  file,
  path = NULL,
  varname = NULL,
  prefix = NULL,
  cbind = TRUE,
  quiet = TRUE,
  ...
)

internal.pick.from.ascii.grid(
  data,
  file,
  path = NULL,
  varname = NULL,
  prefix = NULL,
  method = c("nearest.neighbour", "krige"),
  nodata.values = c(-9999, -99999),
  at.once,
  quiet = TRUE,
  X.name = "x",
  Y.name = "y",
  nlines = Inf,
  cbind = TRUE,
  range,
  radius,
  na.strings = "NA",
  ...
)

pick.from.saga.grid(
  data,
  filename,
  path,
  varname,
  prec = 7,
  show.output.on.console = FALSE,
  env = rsaga.env(),
  ...
)

Arguments

`data`	data.frame giving the coordinates (in columns specified by `X.name`, `Y.name`) of point locations at which to interpolate the specified variables or grid values
`src`	data.frame
`pick`	variables to be picked (interpolated) from `src`; if missing, use all available variables, except those specified by `X.name` and `Y.name`
`method`	interpolation method to be used; uses a partial match to the alternatives `"nearest.neighbour"` (currently the default) and `"krige"`
`set.na`	logical: if a column with a name specified in `pick` already exists in `data`, how should it be dealt with? `set.na=FALSE` (default) only overwrites existing data if the interpolator yields a non-`NA` result; `set.na=TRUE` passes `NA` values returned by the interpolator on to the results data.frame
`radius`	numeric value specifying the radius of the local neighborhood to be used for interpolation; defaults to 200 map units (presumably meters), or, in the functions for grid files, `2.5*cellsize`.
`nmin`	numeric, for `method="krige"` only: see `gstat::krige()` function in package gstat
`nmax`	numeric, for `method="krige"` only: see `gstat::krige()` function in package gstat
`sill`	numeric, for `method="krige"` only: the overall sill parameter to be used for the variogram
`range`	numeric, for `method="krige"` only: the variogram range
`nugget`	numeric, for `method="krige"` only: the nugget effect
`model`	for `method="krige"` only: the variogram model to be used for interpolation; defaults to a spherical variogram with parameters specified by the `range`, `sill`, and `nugget` arguments; see `gstat::vgm()` in package gstat for details
`log`	logical vector, specifying for each variable in `pick` if interpolation should take place on the logarithmic scale (default: `FALSE`)
`X.name`	name of the variable containing the x coordinates
`Y.name`	name of the variable containing the y coordinates
`cbind`	logical: should the new variables be added to the input data.frame (`cbind=TRUE`, the default), or should they be returned as a separate vector or data.frame (`cbind=FALSE`)?
`shapefile`	point shapefile
`...`	arguments to be passed to `pick.from.points`, and to `internal.pick.from.ascii.grid` in the case of `pick.from.ascii.grid`
`file`	file name (relative to `path`, default file extension `.asc`) of an ASCII grid from which to pick a variable, or an open connection to such a file
`path`	optional path to `file`
`varname`	character string: a variable name for the variable interpolated from grid file `file` in `pick.from.*.grid`; if missing, the variable name will be determined from the `file` name by a call to `create.variable.name()`
`prefix`	an optional prefix to be added to the `varname`
`parallel`	logical (default: `FALSE`): enable parallel processing; requires additional packages such as (formerly) `doSNOW` or doMC. See example below and `plyr::ddply()`
`nsplit`	split the data.frame `data` into `nsplit` disjoint subsets in order to increase efficiency by using `plyr::ddply()` in package plyr. The default seems to perform well in many situations.
`quiet`	logical: if `FALSE`, provide information on the progress of grid processing on screen (only relevant if `at.once=FALSE` and `method="nearest.neighbour"`)
`nodata.values`	numeric vector specifying grid values that should be converted to `NA`; in addition to the values specified here, the nodata value given in the input grid's header will be used
`at.once`	logical: should the grid be read as a whole or line by line? `at.once=FALSE` is useful for processing large grids that do not fit into memory; the argument is currently by default `FALSE` for `method="nearest.neighbour"`, and it currently MUST be `TRUE` for all other methods (in these cases, `TRUE` is the default value); piecewise processing with `at.once=FALSE` is always faster than processing the whole grid `at.once`
`nlines`	numeric: stop after processing `nlines` lines of the input grid; useful for testing purposes
`na.strings`	passed on to `scan()`
`filename`	character: name of a SAGA grid file, default extension `.sgrd`
`prec`	numeric, specifying the number of digits to be used in converting a SAGA grid to an ASCII grid in `pick.from.saga.grid`
`show.output.on.console`	a logical (default: `FALSE`), indicates whether to capture the output of the command and show it on the R console (see `system()`, `rsaga.geoprocessor()`).
`env`	list: RSAGA geoprocessing environment created by `rsaga.env()`

Details

pick.from.points interpolates the variables defined by pick in the src data.frame to the locations provided by the data data.frame. Only nearest neighbour and ordinary kriging interpolation are currently available. This function is intended for 'data-rich' situations in which not much thought needs to be put into a geostatistical analysis of the spatial structure of a variable. In particular, this function is supposed to provide a simple, 'quick-and-dirty' interface for situations where the src data points are very densely distributed compared to the data locations.
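
The nearest neighbour idea described above can be illustrated with a few lines of base R. This is only a minimal sketch, not the package's implementation: `nn_pick` is a hypothetical helper, it assumes that locations without any source point within `radius` receive `NA`, and it takes the first of several equally close points instead of breaking ties randomly.

```r
# Hypothetical helper (not part of RSAGA): for each row of `data`, take the
# value of `var` from the closest row of `src`, or NA if no source point
# lies within `radius` map units.
nn_pick <- function(data, src, var, radius = 200, X.name = "x", Y.name = "y") {
  out <- rep(NA_real_, nrow(data))
  for (i in seq_len(nrow(data))) {
    # squared Euclidean distances from target location i to all source points
    d2 <- (src[[X.name]] - data[[X.name]][i])^2 +
          (src[[Y.name]] - data[[Y.name]][i])^2
    j <- which.min(d2)  # first minimum; ties are not broken randomly here
    if (length(j) == 1 && d2[j] <= radius^2) out[i] <- src[[var]][j]
  }
  out
}

src  <- data.frame(x = c(0, 100, 500), y = c(0, 0, 0), z = c(1.5, 2.5, 9))
data <- data.frame(x = c(10, 90, 1000), y = c(0, 0, 0))
data$z <- nn_pick(data, src, "z")
print(data)
```

The third location is 500 map units away from its closest source point, which is beyond the default radius of 200, so it is left as `NA` in this sketch.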

pick.from.shapefile is a front-end of pick.from.points for point shapefiles.

pick.from.ascii.grid retrieves data values from an ASCII raster file using either nearest neighbour or ordinary kriging interpolation. The latter may not be possible for large raster data sets because the entire grid needs to be read into an R matrix. Split-apply-combine strategies are used to improve efficiency and allow for parallelization.

The optional parallelization of pick.from.ascii.grid computation requires the use of a parallel backend package such as (formerly) doSNOW or doMC, and the parallel backend needs to be registered before calling this function with parallel=TRUE. The example section provides an example using doSNOW on Windows. I have seen 25-40% reduction in processing time by parallelization in some examples that I ran on a dual core Windows computer.
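
As a hedged sketch of how a backend might be registered today, the wrapper below uses the doParallel package in place of doSNOW. The choice of doParallel is an assumption (any foreach backend that plyr can use should serve the same purpose), `run_parallel_pick` is a hypothetical name, and the grid name "dem" is the same placeholder as in the Examples section. The function is only defined here, not called, and has not been timed.

```r
# Sketch only: doParallel is assumed as a replacement for doSNOW;
# `d` is a data.frame with columns x and y, "dem" a placeholder ASCII grid.
run_parallel_pick <- function(d, grid = "dem", cores = 2) {
  cl <- parallel::makeCluster(cores)
  on.exit(parallel::stopCluster(cl), add = TRUE)  # always release the workers
  doParallel::registerDoParallel(cl)              # register the foreach backend
  RSAGA::pick.from.ascii.grid(d, grid, parallel = TRUE)
}
```

A call would then look like `d2 <- run_parallel_pick(d)`; the cluster is shut down when the function exits, even if the pick fails.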

pick.from.ascii.grids performs multiple pick.from.ascii.grid calls. File path and prefix arguments may be specific to each file (i.e. each may be a character vector), but all interpolation settings will be the same for each file, limiting the flexibility a bit compared to individual pick.from.ascii.grid calls by the user. pick.from.ascii.grids currently processes the files sequentially (i.e. parallelization is limited to the pick.from.ascii.grid calls within this function).
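
A possible call pattern is sketched below. The grid names "dem" and "slope", the directory "grids", and the wrapper name `pick_terrain` are hypothetical; `method` is assumed to be handed on to pick.from.ascii.grid through the `...` argument, and the same setting then applies to both files.

```r
# Sketch only: picks values from two placeholder grids, grids/dem.asc and
# grids/slope.asc, with one shared interpolation setting.
pick_terrain <- function(d) {
  RSAGA::pick.from.ascii.grids(
    d,
    file   = c("dem", "slope"),     # one entry per grid file
    path   = "grids",               # could also be one path per file
    method = "nearest.neighbour"    # passed on via '...'
  )
}
```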

pick.from.saga.grid is the equivalent to pick.from.ascii.grid for SAGA grid files. It simply converts the SAGA grid file to a (temporary) ASCII raster file and applies pick.from.ascii.grid.

internal.pick.from.ascii.grid is an internal 'workhorse' function that by itself would be very inefficient for large data sets. This function is called by pick.from.ascii.grid, which uses a split-apply-combine strategy implemented in the plyr package.
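
The split-apply-combine pattern itself can be sketched in base R. This is an illustration under stated assumptions rather than the package's code: the package uses plyr, whereas the sketch uses `split()` and `unsplit()`; `split_apply_combine` and `worker` are hypothetical names, `worker` stands in for the per-subset pick, and how the package actually forms its subsets is not specified here.

```r
# Hypothetical helper: process `data` in `nsplit` disjoint subsets and
# reassemble the results in the original row order.
split_apply_combine <- function(data, worker, nsplit = 4) {
  grp   <- rep_len(seq_len(nsplit), nrow(data))  # subset label for each row
  parts <- lapply(split(data, grp), worker)      # apply worker to each subset
  unsplit(parts, grp)                            # combine, restoring row order
}

# stand-in for the real work: derive a new column from each subset
worker <- function(part) {
  part$z <- part$x * 2
  part
}

d   <- data.frame(x = 1:10, y = 10:1)
res <- split_apply_combine(d, worker, nsplit = 3)
print(res)
```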

Value

If cbind=TRUE, columns with the new, interpolated variables are added to the input data.frame data.

If cbind=FALSE, a data.frame only containing the new variables is returned (possibly coerced to a vector if only one variable is processed).
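
The interplay of `cbind` and `set.na` for a single variable can be mimicked in base R as follows. `combine_picked` is a hypothetical helper written from the argument descriptions above, not the package's code.

```r
# Hypothetical helper mimicking the documented set.na / cbind behaviour
# for one picked variable `name` with interpolated values `picked`.
combine_picked <- function(data, picked, name, set.na = FALSE, cbind = TRUE) {
  if (!cbind) return(picked)          # cbind=FALSE: return only the new values
  if (!set.na && name %in% names(data)) {
    keep <- is.na(picked)             # interpolator gave no result here ...
    picked[keep] <- data[[name]][keep]  # ... so keep the existing value
  }
  data[[name]] <- picked
  data
}

d   <- data.frame(x = 1:3, y = 1:3, elev = c(10, 20, 30))
new <- c(11, NA, 33)

r1 <- combine_picked(d, new, "elev")$elev                 # existing 20 is kept
r2 <- combine_picked(d, new, "elev", set.na = TRUE)$elev  # NA is passed on
r3 <- combine_picked(d, new, "elev", cbind = FALSE)       # bare vector
```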

Note

method="krige" requires the gstat package.

pick.from.shapefile requires the shapefiles package.

The nearest neighbour interpolation currently breaks ties randomly if pick.from.points is used, and deterministically (rounding towards greater grid indices, i.e. toward south and east) in the grid functions.
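
To make the deterministic rule concrete, the sketch below maps a coordinate pair to grid indices and resolves half-way cases toward the greater index. It is an assumption about how such rounding could be written, not the package's code: `grid_index` is a hypothetical helper, the grid is described by the centre of its lower-left cell, and row 1 is taken to be the northernmost row, as in ASCII grids. `floor(v + 0.5)` is used because R's `round()` does not consistently round halves upward.

```r
# Hypothetical helper: (row, col) of the grid cell nearest to (x, y);
# a location exactly between two cells goes to the greater index.
grid_index <- function(x, y, xllcenter, yllcenter, cellsize, nrows) {
  ytop <- yllcenter + (nrows - 1) * cellsize            # centre of the top row
  col  <- floor((x - xllcenter) / cellsize + 1 + 0.5)   # half-way -> east
  row  <- floor((ytop - y) / cellsize + 1 + 0.5)        # half-way -> south
  c(row = row, col = col)
}

# (10, 20) lies on the corner shared by rows 1-2 and columns 1-2 of a
# 3-row grid with 10-unit cells whose lower-left cell centre is (5, 5)
idx <- grid_index(x = 10, y = 20, xllcenter = 5, yllcenter = 5,
                  cellsize = 10, nrows = 3)
print(idx)
```

In this example the tie is resolved to row 2 and column 2, i.e. the cell to the south-east.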

Author(s)

Alexander Brenning

References

Brenning, A. (2008): Statistical geocomputing combining R and SAGA: The example of landslide susceptibility analysis with generalized additive models. In: J. Boehner, T. Blaschke, L. Montanarella (eds.), SAGA - Seconds Out (= Hamburger Beitraege zur Physischen Geographie und Landschaftsoekologie, 19), 23-32.

Examples

## Not run: 
# assume that 'dem' is an ASCII grid and d a data.frame with variables x and y
pick.from.ascii.grid(d, "dem")
# parallel processing on Windows using the doSNOW package:
# ---outdated - doSNOW has been superseded---
## require(doSNOW)
## registerDoSNOW(cl <- makeCluster(2, type = "SOCK")) # DualCore processor
## pick.from.ascii.grid(d, "dem", parallel = TRUE)
# produces two (ignorable) warning messages when using doSNOW
# typically 25-40% faster than the above on my DualCore notebook
## stopCluster(cl)

## End(Not run)

## Not run: 
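# use the meuse data for some tests:
library(gstat)  # required for method = "krige"
# the meuse data sets ship with package sp
data(meuse, package = "sp")
data(meuse.grid, package = "sp")
# nearest neighbour pick of three variables at the meuse.grid locations
meuse.nn <- pick.from.points(data = meuse.grid, src = meuse,
    pick = c("cadmium", "copper", "elev"), method = "nearest.neighbour")
# kriging with a search radius (and default variogram range) of 100 map units
meuse.kr <- pick.from.points(data = meuse.grid, src = meuse,
    pick = c("cadmium", "copper", "elev"), method = "krige", radius = 100)
# it does make a difference:
plot(meuse.kr$cadmium, meuse.nn$cadmium)
plot(meuse.kr$copper, meuse.nn$copper)
plot(meuse.kr$elev, meuse.nn$elev)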

## End(Not run)

RSAGA documentation built on April 3, 2025, 6:48 p.m.