sk_sample_pt: Sub-grid point sampler for grid data
In snapKrig: Fast Kriging and Geostatistics on Grids with Kronecker Covariance

Description

Sample n locations from the non-NA points in the input grid g, optionally using them as centers to place n sub-grids of the specified size and resolution.

Usage

sk_sample_pt(
  g,
  n = 100,
  lag_max = 0,
  up = 0,
  over = FALSE,
  sk_out = TRUE,
  seed = NULL
)

Arguments

`g`	an sk grid object or any other object accepted by `sk`
`n`	integer > 0, the maximum number of center points to sample
`lag_max`	integer, Moore neighborhood radius (i.e. the maximum queen's distance)
`up`	integer, the up-scaling factor for sampling sub-grids of `g`
`over`	logical, if TRUE, sub-grids are allowed to overlap even when overlap could be avoided
`sk_out`	logical, if TRUE (the default) the function returns an sk grid
`seed`	integer seed, passed to `base::set.seed`

Details

When sk_out=TRUE (the default), the function returns an sk grid containing the sampled points. If sub-grids are requested (via lag_max), a multi-layer grid is returned, with one layer per sub-grid. When sk_out=FALSE, the function returns the vector index of the sampled grid points, or, if sub-grids are requested, a list of index vectors.

By default the function simply draws a sample of n locations (uniformly at random) from the non-NA points in the input grid g.
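
As a rough illustration of this default behaviour, the base R sketch below draws up to n vector indices uniformly at random from the non-NA entries of a matrix. It does not call snapKrig: the helper name `sample_non_na` and the toy matrix `z` are hypothetical, and the package's own implementation may differ.

```r
# hypothetical helper (not part of snapKrig): sample up to n vector
# indices, without replacement, from the non-NA entries of z
sample_non_na = function(z, n = 100, seed = NULL)
{
  if( !is.null(seed) ) set.seed(seed)
  idx_obs = which(!is.na(z))
  n = min(n, length(idx_obs))

  # sample positions within idx_obs, which avoids the sample(x)
  # pitfall when idx_obs has length 1
  idx_obs[ sample.int(length(idx_obs), n) ]
}

# toy 10 x 10 grid with NAs in the first column
z = matrix(rnorm(100), 10, 10)
z[, 1] = NA
idx = sample_non_na(z, n = 20, seed = 42)
```

Every element of `idx` points to a non-NA entry of `z`, which is analogous to the index vector returned with sk_out=FALSE.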

When lag_max > 0, the function instead returns the Moore neighbourhood of radius lag_max around each of the sample points (including the center point). These sub-grids are returned as distinct layers (or list entries, if sk_out=FALSE). Their resolution can be coarsened (up-scaled) by increasing up from its default 0. up must either be 0 or else a positive integer that evenly divides lag_max (see sk_rescale).
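
To make the neighbourhood idea concrete, here is a minimal base R sketch that lists the vector indices of a Moore neighbourhood around a single center cell. It assumes column-major indexing, as in a base R matrix. The helper `moore_idx` and its `step` argument are hypothetical: `step` only mimics coarsening by keeping every step-th grid line, and its exact relationship to `up` in the package is not asserted here.

```r
# hypothetical helper (not part of snapKrig): vector indices of the
# Moore neighbourhood of radius lag_max around the cell in row i, column j
moore_idx = function(i, j, gdim, lag_max, step = 1L)
{
  # offsets from the center, keeping every step-th grid line
  off = seq(-lag_max, lag_max, by = step)
  ij = expand.grid(i = i + off, j = j + off)

  # drop cells that fall outside the grid
  ok = ij$i >= 1 & ij$i <= gdim[1] & ij$j >= 1 & ij$j <= gdim[2]
  ij = ij[ok, ]

  # convert (row, column) pairs to column-major vector indices
  ij$i + (ij$j - 1L) * gdim[1]
}

gdim = c(100, 100)
idx_full = moore_idx(50, 50, gdim, lag_max = 6)
idx_coarse = moore_idx(50, 50, gdim, lag_max = 6, step = 2)
```

With radius 6 the full neighbourhood is a 13 x 13 block of cells, and the coarsened version keeps a 7 x 7 subset of it that still contains the center.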

For a given up, the grid g can be partitioned into (up+1)^2 distinct non-overlapping sub-grids. When over=FALSE (the default), the function apportions its n point samples as evenly as possible among these disjoint subsets. This ensures that if n is less than or equal to (up+1)^2, and there are no NAs, there can be no repetition (overlap) of points in the returned sub-grids.
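
One way to picture this partition is to label each cell by its row and column position modulo up+1. The base R sketch below does this for a small grid and then splits n samples as evenly as possible among the resulting subsets. This is an illustration of the idea under that assumption, not necessarily how the package computes it, and the names `part` and `n_per` are hypothetical.

```r
up = 1
gdim = c(6, 6)

# row and column number of every cell in the grid
i = row(matrix(0, gdim[1], gdim[2]))
j = col(matrix(0, gdim[1], gdim[2]))

# assumed labelling: cells sharing the same row and column offsets
# modulo (up+1) belong to the same subset, giving (up+1)^2 subsets
part = 1 + ((i - 1) %% (up + 1)) + (up + 1) * ((j - 1) %% (up + 1))

# apportion n samples as evenly as possible among the subsets
n = 10
n_per = tabulate(rep_len(seq_len((up + 1)^2), n), nbins = (up + 1)^2)
```

Each cell receives exactly one label, so the subsets are disjoint, and the entries of `n_per` differ by at most one.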

Note that with the default sk_out=TRUE, lag_max > 0 is only supported for complete grids g. With missing data it is hard (and sometimes impossible) to ensure that the Moore neighborhoods have identical NA structure, which is a requirement for multi-layer sk grids.

Note also that multi-layer sk grids are not fully supported yet. If you pass a multi-layer grid to g, the function returns results for the first layer only.

Value

If lag_max == 0 (the default), the function returns a single-layer sk grid when sk_out=TRUE, or else the sample indices in g as a length-n integer vector. If lag_max > 0, the function returns a multi-layer sk grid when sk_out=TRUE, or else a list of n vectors indexing the sampled points in each sub-grid of g.

Examples

# define an empty grid
g_empty = sk(gdim = c(100, 100))

# get an ordinary random sample with default settings
g_sample = sk_sample_pt(g_empty)
plot(g_sample)

# same call with index return mode
idx_sample = sk_sample_pt(g_empty, sk_out=FALSE)
str(idx_sample)

# reduce or increase number of center points from default 100
g_sample = sk_sample_pt(g_empty, n=10)
plot(g_sample)

# add some data to g and repeat
pars = sk_pars(g_empty)
pars$eps = 1e-6
g = sk_sim(g_empty, pars)
plot(g)
g_sample = sk_sample_pt(g)
plot(g_sample)

# sample 3 subgrids from Moore neighbourhoods of radius 6 (index output mode)
n = 3
idx_sample = sk_sample_pt(g, n=n, lag_max=6L, sk_out=FALSE, seed=42)

# plot each list element a different color
group_sample = rep(0L, length(g))
for(i in seq(n)) group_sample[ idx_sample[[i]] ] = i
sk_plot(group_sample, dim(g), breaks=c('not sampled', seq(n)), zlab='sub-grid')

# plot all the sub-grid data
g_plot = g_empty
g_plot[unlist(idx_sample)] = g[unlist(idx_sample)]
plot(g_plot)

# default sk_out=TRUE returns them as multi-layer grid object
g_sample = sk_sample_pt(g, n=n, lag_max=6L, seed=42)
plot(g_sample, layer=1, zlim=range(g_plot, na.rm=TRUE))
plot(g_sample, layer=2, zlim=range(g_plot, na.rm=TRUE))
plot(g_sample, layer=3, zlim=range(g_plot, na.rm=TRUE))



# when up > 0 the function attempts to avoid overlap whenever possible
up = 1
n = (up+1)^2 # to get disjoint results n must be less than or equal to (up+1)^2
lag_max = 10 * (up+1) # vary to get larger/smaller subsets. max allowable: min(gdim)/2
idx_sample = sk_sample_pt(g, n=n, up=up, lag_max=lag_max, sk_out=FALSE)
idx_overlap = rowSums( sapply(idx_sample, function(i) seq_along(g) %in% i) )

# plot each list element a different color
group_sample = rep(0L, length(g))
for(i in seq(n)) group_sample[ idx_sample[[i]] ] = i
sk_plot(group_sample, dim(g), breaks=c('not sampled', seq(n)), zlab='sub-grid')

# no overlap
sk_plot(as.integer(idx_overlap), dim(g), zlab='times sampled')

# compare with over=TRUE (usually results in overlap - try running a few times)
idx_sample_compare = sk_sample_pt(g, n=n, up=up, lag_max=lag_max, over=TRUE, sk_out=FALSE)
idx_overlap_compare = rowSums( sapply(idx_sample_compare, function(i) seq_along(g) %in% i) )
sk_plot(as.integer(idx_overlap_compare), dim(g), zlab='times sampled')

# incomplete input data example
g_sample = sk_sample_pt(g, n=10)
sk_plot(g_sample)

# draw a sample of center points and indicate sub-grids in color
idx_sample = sk_sample_pt(g_sample, n=10, lag_max=6, up=1, over=FALSE, sk_out=FALSE)
g_sample_grid = g_empty
g_sample_grid[] = rep('not sampled', length(g_empty))
g_sample_grid[unlist(idx_sample)] = 'sub-grid sample'
plot(g_sample_grid)

snapKrig documentation built on May 31, 2023, 6:34 p.m.