scps: Spatially Correlated Poisson Sampling

Description

Selects spatially balanced samples with prescribed inclusion probabilities from a finite population using Spatially Correlated Poisson Sampling (SCPS).

Usage

scps(prob, x, rand = NULL, type = "kdtree2", bucketSize = 50, eps = 1e-12)

lcps(prob, x, type = "kdtree2", bucketSize = 50, eps = 1e-12)

Arguments

prob

A vector of length N with inclusion probabilities, or an integer n > 1. If an integer n is given, the sample is drawn with equal inclusion probabilities n / N.

x

An N by p matrix of (standardized) auxiliary variables. Distances between units are measured as squared Euclidean distances in the x space.

rand

A vector of length N with random numbers (scps only). If supplied, each unit's inclusion decision is made with its corresponding random number, which makes it possible to coordinate samples.

type

The method used to find nearest neighbours. Must be one of "kdtree0", "kdtree1", "kdtree2", or "notree".

bucketSize

The maximum size of the terminal nodes in the k-d-trees.

eps

A small value used to determine when an updated probability is close enough to 0.0 or 1.0.
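The argument conventions above can be illustrated in plain R. This is a sketch, not code from BalancedSampling: expand_prob(), snap_prob() and sq_dist() are hypothetical helpers, N is assumed to be nrow(x), and standardizing with base R's scale() (centring each column and dividing by its standard deviation) is one common choice rather than something the package prescribes. How the package applies eps internally is also an assumption here.

```r
# Hypothetical helpers (not part of BalancedSampling) illustrating the
# argument conventions described above.

# An integer prob = n > 1 stands for equal inclusion probabilities n / N
expand_prob <- function(prob, N) {
  if (length(prob) == 1L && prob > 1) rep(prob / N, N) else prob
}

# Treat probabilities within eps of 0 or 1 as exactly 0 or 1 (assumed behaviour)
snap_prob <- function(p, eps = 1e-12) {
  p[p < eps] <- 0
  p[p > 1 - eps] <- 1
  p
}

# Squared Euclidean distances from unit i to all units (rows of x)
sq_dist <- function(x, i) colSums((t(x) - x[i, ])^2)

set.seed(1)
x_raw <- cbind(runif(5, 0, 1), runif(5, 0, 1000))  # very different scales
x <- scale(x_raw)                                  # centre and divide by SD
prob <- expand_prob(2, nrow(x))                    # 0.4 for each of 5 units
d <- sq_dist(x, 1)
which.min(replace(d, 1, Inf))                      # nearest neighbour of unit 1
```

Without standardization, the second column (values up to 1000) would dominate the squared distances and the first variable would have almost no influence on which units count as neighbours.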

Details

If the probabilities in prob sum to an integer n, a fixed-size sample of size n is produced. The implementation uses the maximal weight strategy, as specified in Grafström (2012).
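As a rough illustration of the maximal weight strategy, the plain-R sketch below decides one randomly chosen unit j at a time and updates its undecided neighbours, nearest first, as p[i] <- p[i] - w[i] * (I_j - p[j]), giving each neighbour the largest weight that keeps p[i] in [0, 1] whatever the outcome I_j, until a total weight of 1 has been used. It is a minimal sketch under these assumptions, not the package's implementation: scps_sketch() is hypothetical, uses no k-d-tree, and does not split weight evenly among units at equal distance.

```r
# Hypothetical plain-R sketch of uncoordinated SCPS with the maximal weight
# strategy. Not the BalancedSampling implementation (no k-d-tree, no tie handling).
scps_sketch <- function(prob, x, eps = 1e-12) {
  x <- as.matrix(x)
  N <- length(prob)
  # Values within eps of 0 or 1 are set to exactly 0 or 1
  snap <- function(v) ifelse(v < eps, 0, ifelse(v > 1 - eps, 1, v))
  p <- snap(prob)
  undecided <- seq_len(N)
  selected <- logical(N)
  while (length(undecided) > 0L) {
    # Choose the deciding unit uniformly at random among undecided units
    k <- sample.int(length(undecided), 1L)
    j <- undecided[k]
    undecided <- undecided[-k]
    pj <- p[j]
    Ij <- as.numeric(runif(1) < pj)
    selected[j] <- Ij == 1
    # Nothing to update if no units remain or the decision was certain
    if (length(undecided) == 0L || pj == 0 || pj == 1) next
    # Squared Euclidean distances from unit j to the undecided units
    d <- colSums((t(x[undecided, , drop = FALSE]) - x[j, ])^2)
    left <- 1  # weight still to distribute
    for (i in undecided[order(d)]) {
      # Largest weight keeping p[i] in [0, 1] for either outcome of Ij
      w <- min(left, p[i] / (1 - pj), (1 - p[i]) / pj)
      p[i] <- snap(p[i] - w * (Ij - pj))
      left <- left - w
      if (left <= eps) break
    }
  }
  which(selected)
}
```

Because the weights are fixed before I_j is drawn, each update leaves the expected value of every p[i] unchanged, so the inclusion probabilities are respected; and because the weights sum to 1, the total of decided and undecided probabilities stays at n, which gives the fixed sample size.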

Coordinated SCPS

If rand is supplied, coordinated SCPS is performed. The coordinated algorithm differs from the uncoordinated one in how the deciding unit is chosen: uncoordinated SCPS picks the next unit to decide at random, whereas coordinated SCPS traverses the units in the order in which they are supplied. This has a small impact on the efficiency of coordinated SCPS.
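A coordinated variant can be sketched by making two changes to the SCPS update: units are decided in index order 1, ..., N (our assumption about what "the order in which they are supplied" means), and unit j is selected when rand[j] < p[j], so the result is fully determined by rand. scps_coord_sketch() is hypothetical; it uses the maximal weight update but ignores k-d-trees and ties in distance.

```r
# Hypothetical sketch of coordinated SCPS: units are decided in index order
# and unit j is selected when rand[j] < p[j]. No other randomness is used.
scps_coord_sketch <- function(prob, x, rand, eps = 1e-12) {
  x <- as.matrix(x)
  N <- length(prob)
  snap <- function(v) ifelse(v < eps, 0, ifelse(v > 1 - eps, 1, v))
  p <- snap(prob)
  selected <- logical(N)
  for (j in seq_len(N)) {
    pj <- p[j]
    Ij <- as.numeric(rand[j] < pj)
    selected[j] <- Ij == 1
    if (j == N || pj == 0 || pj == 1) next
    rest <- (j + 1L):N  # units not yet decided
    d <- colSums((t(x[rest, , drop = FALSE]) - x[j, ])^2)
    left <- 1           # weight still to distribute
    for (i in rest[order(d)]) {
      # Maximal weight keeping p[i] in [0, 1] for either outcome
      w <- min(left, p[i] / (1 - pj), (1 - p[i]) / pj)
      p[i] <- snap(p[i] - w * (Ij - pj))
      left <- left - w
      if (left <= eps) break
    }
  }
  which(selected)
}

set.seed(1)
N <- 200L
x <- matrix(runif(2 * N), ncol = 2)
u <- runif(N)  # permanent random numbers, reused across calls
s1 <- scps_coord_sketch(rep(0.1, N), x, u)
s2 <- scps_coord_sketch(rep(0.1, N), x, u)
```

With a fixed order and fixed random numbers, repeating the call reproduces the sample exactly; reusing the same rand with different probabilities is what lets samples share units.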

Locally Correlated Poisson Sampling (LCPS)

LCPS differs from SCPS in the same way that LPM1 differs from LPM2: instead of choosing the deciding unit at random, each step of the algorithm chooses the unit with the smallest updating distance as the deciding unit.
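The LCPS selection rule can be sketched in the same style. ASSUMPTION: a unit's updating distance is taken here to be the distance to the farthest neighbour that would receive weight if its full weight of 1 were spread with the maximal weight strategy; see Prentius (2023) for the definition used by lcps(). lcps_sketch() is hypothetical, recomputes every candidate's weights in each step (quadratic cost per step), and compares squared distances, which give the same ordering.

```r
# Hypothetical sketch of LCPS. ASSUMPTION: the updating distance of unit j is
# the (squared) distance to the farthest unit receiving weight from j.
lcps_sketch <- function(prob, x, eps = 1e-12) {
  x <- as.matrix(x)
  N <- length(prob)
  snap <- function(v) ifelse(v < eps, 0, ifelse(v > 1 - eps, 1, v))
  p <- snap(prob)
  undecided <- seq_len(N)
  selected <- logical(N)

  # Maximal weights unit j would give to 'others' (nearest first), plus the
  # squared distance to the farthest unit receiving positive weight
  weights_for <- function(j, others) {
    d <- colSums((t(x[others, , drop = FALSE]) - x[j, ])^2)
    w <- numeric(length(others))
    left <- 1
    for (k in order(d)) {
      if (left <= eps) break
      w[k] <- min(left, p[others[k]] / (1 - p[j]), (1 - p[others[k]]) / p[j])
      left <- left - w[k]
    }
    list(w = w, dist = max(c(0, d[w > 0])))
  }

  while (length(undecided) > 0L) {
    # Units whose probability is already 0 or 1 are decided without an update
    fixed <- undecided[p[undecided] == 0 | p[undecided] == 1]
    if (length(fixed) > 0L) {
      selected[fixed] <- p[fixed] == 1
      undecided <- setdiff(undecided, fixed)
      next
    }
    if (length(undecided) == 1L) {  # nobody left to receive weight
      selected[undecided] <- runif(1) < p[undecided]
      break
    }
    # The deciding unit is the one with the smallest updating distance
    cand <- lapply(undecided, function(j) weights_for(j, setdiff(undecided, j)))
    k <- which.min(vapply(cand, function(cj) cj$dist, numeric(1)))
    j <- undecided[k]
    others <- setdiff(undecided, j)
    Ij <- as.numeric(runif(1) < p[j])
    selected[j] <- Ij == 1
    p[others] <- snap(p[others] - cand[[k]]$w * (Ij - p[j]))
    undecided <- others
  }
  which(selected)
}
```

The update itself is the same as in SCPS; only the choice of the deciding unit changes, which mirrors how LPM1 restricts LPM2's random choice to locally closest units.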

Value

A vector of the indices of the selected units, with values in 1, 2, ..., N.
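Because the result is a vector of indices rather than a 0/1 indicator, it can be used directly to subset x or prob, as in the Examples. If an indicator vector is needed, it can be built as below; s is a made-up index vector standing in for a returned sample.

```r
# 's' is a made-up index vector standing in for the value returned by scps()
N <- 10L
s <- c(2L, 5L, 9L)
ind <- integer(N)  # 0/1 selection indicator, one entry per unit
ind[s] <- 1L
print(ind)
which(ind == 1L)   # back to indices: 2 5 9
```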

Functions

  • lcps(): Locally Correlated Poisson Sampling (LCPS).

k-d-trees

The "kdtree" types create k-d-trees whose terminal nodes contain at most bucketSize units.

  • "kdtree0" creates a k-d-tree using a median split on alternating variables.

  • "kdtree1" creates a k-d-tree using a median split on the largest range.

  • "kdtree2" creates a k-d-tree using a sliding-midpoint split.

  • "notree" does a naive search for the nearest neighbour.
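The split rules listed above can be sketched as follows. split_node() and build_kd() are hypothetical helpers, not the package's k-d-tree code: median splits are done by rank (half of the points to each side), "kdtree1" measures the range of the points in the node, and "kdtree2" halves the longest side of the node's cell, sliding the cut to the nearest point when one side would otherwise be empty (Maneewongvatana & Mount, 1999). Nodes with at most bucketSize units become terminal nodes. The "notree" option corresponds to skipping the tree and comparing a unit with every other unit.

```r
# Hypothetical sketch of the three split rules and of bucketing; this is not
# the package's k-d-tree implementation.
split_node <- function(x, idx, lo, hi, depth, rule) {
  pts <- x[idx, , drop = FALSE]
  n <- length(idx)
  d <- switch(rule,
    kdtree0 = depth %% ncol(x) + 1L,                                 # alternate variables
    kdtree1 = which.max(apply(pts, 2, function(v) diff(range(v)))),  # largest range
    kdtree2 = which.max(hi - lo))                                    # longest cell side
  ord <- idx[order(pts[, d])]
  v <- x[ord, d]
  if (rule == "kdtree2") {
    # Sliding midpoint: cut the cell in half; if one side would be empty,
    # slide the cut to the nearest point and give that point to the empty side
    cut <- (lo[d] + hi[d]) / 2
    nl <- sum(v < cut)
    if (nl == 0L) {
      nl <- 1L; cut <- v[1]
    } else if (nl == n) {
      nl <- n - 1L; cut <- v[n]
    }
  } else {
    nl <- n %/% 2L  # median split by rank
    cut <- v[nl]
  }
  list(d = d, cut = cut, left = ord[seq_len(nl)], right = ord[-seq_len(nl)])
}

# Returns a list of terminal nodes (index vectors of at most bucketSize units)
build_kd <- function(x, idx, lo, hi, depth, rule, bucketSize) {
  if (length(idx) <= bucketSize) return(list(idx))
  s <- split_node(x, idx, lo, hi, depth, rule)
  hi_left <- hi; hi_left[s$d] <- s$cut
  lo_right <- lo; lo_right[s$d] <- s$cut
  c(build_kd(x, s$left, lo, hi_left, depth + 1L, rule, bucketSize),
    build_kd(x, s$right, lo_right, hi, depth + 1L, rule, bucketSize))
}

set.seed(1)
x <- matrix(runif(400), ncol = 2)
leaves <- build_kd(x, seq_len(nrow(x)), apply(x, 2, min), apply(x, 2, max),
                   0L, "kdtree2", 10L)
```

Each split sends at least one point to each side, so the recursion always terminates; a nearest neighbour search would then visit only the buckets whose cells can contain a closer point than the best found so far.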

References

Friedman, J. H., Bentley, J. L., & Finkel, R. A. (1977). An algorithm for finding best matches in logarithmic expected time. ACM Transactions on Mathematical Software (TOMS), 3(3), 209-226.

Maneewongvatana, S., & Mount, D. M. (1999, December). It’s okay to be skinny, if your friends are fat. In Center for Geometric Computing 4th Annual Workshop on Computational Geometry (Vol. 2, pp. 1-8).

Grafström, A. (2012). Spatially correlated Poisson sampling. Journal of Statistical Planning and Inference, 142(1), 139-147.

Prentius, W. (2023). Locally correlated Poisson sampling. Environmetrics, e2832.

See Also

Other sampling: cube(), hlpm2(), lcube(), lpm()

Examples

## Not run: 
library(BalancedSampling)

# Draw a spatially balanced sample of n = 100 from N = 1000 units and plot it
set.seed(12345)
N = 1000
n = 100
prob = rep(n/N, N)
x = matrix(runif(N * 2), ncol = 2)
s = scps(prob, x)
plot(x[, 1], x[, 2])
points(x[s, 1], x[, 2][s], pch = 19)

# Check by simulation that the inclusion frequencies approach prob
set.seed(12345)
prob = c(0.2, 0.25, 0.35, 0.4, 0.5, 0.5, 0.55, 0.65, 0.7, 0.9)  # sums to 5
N = length(prob)
x = matrix(runif(N * 2), ncol = 2)
ep = rep(0L, N)
r = 10000L
for (i in seq_len(r)) {
  s = scps(prob, x)
  ep[s] = ep[s] + 1L
}
print(ep / r)

# Draw samples with SCPS and LCPS from the same population
set.seed(12345)
N = 1000
n = 100
prob = rep(n/N, N)
x = matrix(runif(N * 2), ncol = 2)
scps(prob, x)
lcps(prob, x)

## End(Not run)


BalancedSampling documentation built on May 29, 2024, 10:25 a.m.