GeoNeighIndex: Spatial, spatio-temporal, or bivariate nearest-neighbour...

GeoNeighIndex R Documentation

Spatial, spatio-temporal, or bivariate nearest-neighbour indices

Description

The function returns nearest-neighbour pair indices for spatial, spatio-temporal, or bivariate data. Optionally, a stochastic thinning mechanism can be applied to retain only a subset of the candidate nearest-neighbour pairs.

Usage

GeoNeighIndex(coordx, coordy=NULL, coordz=NULL, coordt=NULL,
              coordx_dyn=NULL, distance="Eucl", neighb=4,
              maxdist=NULL, maxtime=1, radius=1,
              bivariate=FALSE, p_neighb=1,
              thin_method="bernoulli")

Arguments

coordx

A numeric (d × 2)-matrix or (d × 3)-matrix of spatial coordinates, with one row per location. Coordinates on a sphere of fixed radius (see the argument radius) must be given in longitude/latitude format, expressed in decimal degrees.

coordy

A numeric vector giving one dimension of spatial coordinates; optional argument, default is NULL.

coordz

A numeric vector giving one dimension of spatial coordinates; optional argument, default is NULL.

coordt

A numeric vector giving the temporal coordinates. Optional argument, default is NULL; if NULL, a purely spatial random field is expected.

coordx_dyn

A list of numeric coordinate matrices containing spatial coordinates that may vary over time. For spatio-temporal data, the list length must equal the number of time points. For bivariate data with different spatial supports, the list must have length two, with one coordinate matrix for each variable. Optional argument, default is NULL.

distance

String; the name of the spatial distance. Default is "Eucl" (Euclidean distance). See the Details section of GeoFit.

neighb

Numeric; a positive integer indicating the nearest-neighbour order. In the bivariate case, it may also be a vector of length three, corresponding to within-variable 1, cross-variable, and within-variable 2 neighbourhood sizes.

maxdist

A numeric value denoting the maximum spatial distance; see Details. In the bivariate case, it may also be a vector of length three, corresponding to within-variable 1, cross-variable, and within-variable 2 distance thresholds.

maxtime

A numeric value denoting the maximum temporal distance; see Details.

radius

Numeric; a value indicating the radius of the sphere when using great-circle distances. Default value is 1.

bivariate

Logical; if FALSE (default), the data are interpreted as univariate spatial or spatio-temporal realisations. If TRUE, the data are interpreted as a realisation from a bivariate field.

p_neighb

Numeric; a value in (0,1] controlling stochastic thinning. Its interpretation depends on thin_method. If thin_method="bernoulli", p_neighb controls the expected retained fraction of candidate pairs through calibrated independent Bernoulli inclusion probabilities. If thin_method="TargetBalanced", p_neighb is interpreted as a nominal target fraction of candidate pairs for the hard-core greedy matching; the final number of retained pairs is capped by the endpoint-disjoint matching constraint and may be smaller than the target.

thin_method

String; the stochastic thinning scheme. Available options are "bernoulli" (default) and "TargetBalanced". With "bernoulli", the function uses independent Bernoulli thinning, possibly with pair-specific probabilities depending on spatial or temporal lags. With "TargetBalanced", the function uses hard-core greedy matching and retains only endpoint-disjoint pairs.

Details

The function first builds a candidate set of directed nearest-neighbour pairs. For purely spatial data, the candidate set contains spatial nearest-neighbour pairs. For spatio-temporal data, the function includes within-time spatial pairs, pure temporal same-site pairs, and cross-time spatio-temporal pairs up to maxtime. For bivariate data, the function includes within-variable and cross-variable pairs.
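For the purely spatial case, the candidate set described above can be sketched in base R. This is only an illustration under stated assumptions: knn_pairs is a hypothetical helper, not a GeoModels function, it uses Euclidean distance on a small dense distance matrix, and the package's own neighbour search may differ.

```r
# Hypothetical helper (not part of GeoModels): directed k-nearest-neighbour
# pairs from a coordinate matrix, using Euclidean distance.
knn_pairs <- function(coords, k) {
  n <- nrow(coords)
  D <- as.matrix(dist(coords))   # n x n Euclidean distance matrix
  diag(D) <- Inf                 # a point is never its own neighbour
  # Row i of nb holds the indices of the k points closest to point i
  nb <- do.call(rbind, lapply(seq_len(n), function(i) order(D[i, ])[seq_len(k)]))
  rowidx <- rep(seq_len(n), times = k)   # target index of each pair
  colidx <- as.vector(nb)                # neighbour index of each pair
  data.frame(rowidx = rowidx, colidx = colidx,
             lags = D[cbind(rowidx, colidx)])  # spatial distance of each pair
}

set.seed(1)
xy <- cbind(runif(50), runif(50))
cand <- knn_pairs(xy, k = 4)
nrow(cand)   # 50 * 4 = 200 directed candidate pairs
```

The pairs are directed: (i, j) can appear without (j, i), because j being among the k nearest points of i does not imply the converse.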

If thin_method="bernoulli" and p_neighb<1, candidate pairs are retained independently with calibrated Bernoulli probabilities. These probabilities may depend on pair features, such as spatial or temporal lag, but they are calibrated so that the expected number of retained pairs is approximately p_neighb times the number of candidate pairs.
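One way to picture the calibration is the sketch below. It is an assumption-laden illustration, not the GeoModels implementation: calibrated_probs is a hypothetical helper, the exponential weights that favour short lags are an arbitrary choice, and the scaling constant is found numerically so that the probabilities sum to p times the number of candidate pairs.

```r
# Hypothetical illustration (not the GeoModels implementation): Bernoulli
# thinning with lag-dependent probabilities calibrated to a target fraction.
calibrated_probs <- function(lags, p) {
  if (p >= 1) return(rep(1, length(lags)))   # no thinning
  w <- exp(-lags / mean(lags))     # assumed weights: short lags are favoured
  target <- p * length(lags)       # desired expected number of retained pairs
  # Find the constant a such that sum(pmin(1, a * w)) equals the target
  f <- function(a) sum(pmin(1, a * w)) - target
  a_star <- uniroot(f, lower = 0, upper = 1 / min(w), tol = 1e-10)$root
  pmin(1, a_star * w)
}

set.seed(2)
lags <- runif(1000, 0, 0.2)
pr <- calibrated_probs(lags, p = 0.2)
keep <- runif(length(lags)) < pr   # independent Bernoulli draws
sum(pr)      # expected retained count, approximately 0.2 * 1000
mean(keep)   # realised retained fraction, random but close to 0.2
```

Only the expected count is controlled; the realised number of retained pairs varies from run to run.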

If thin_method="TargetBalanced", the function applies a hard-core greedy matching procedure. A random permutation of the candidate pairs is scanned, and a pair is retained only if neither endpoint has already been used by a previously retained pair. Therefore no observation index is used in more than one retained pair. In this case p_neighb is not a marginal inclusion probability. It defines the nominal target round(p_neighb * d), where d is the number of candidate pairs, but the final number of retained pairs is bounded above by floor(n/2), where n is the number of observation indices, and may be smaller still when too few endpoint-disjoint pairs are available.
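The greedy scan described above can be sketched in a few lines of base R. The helper greedy_disjoint is hypothetical and is not the package's code; in particular, stopping the scan once the capped target is reached is an assumption about how the nominal target is used.

```r
# Hypothetical sketch (not the GeoModels implementation) of greedy
# endpoint-disjoint pair selection.
greedy_disjoint <- function(rowidx, colidx, p, n) {
  d <- length(rowidx)
  target <- min(round(p * d), floor(n / 2))  # nominal target, capped at floor(n/2)
  used <- logical(n)                         # TRUE once an index is in a kept pair
  keep <- integer(0)
  for (j in sample.int(d)) {                 # scan pairs in random order
    if (length(keep) >= target) break        # assumed stopping rule
    i1 <- rowidx[j]; i2 <- colidx[j]
    if (!used[i1] && !used[i2]) {            # both endpoints still free
      used[c(i1, i2)] <- TRUE
      keep <- c(keep, j)
    }
  }
  keep                                       # positions of the retained pairs
}

set.seed(3)
n <- 10
# Toy candidate set: each index paired with its successor on a ring
rowidx <- 1:n
colidx <- c(2:n, 1)
kept <- greedy_disjoint(rowidx, colidx, p = 1, n = n)
length(kept)                                   # at most floor(10/2) = 5
anyDuplicated(c(rowidx[kept], colidx[kept]))   # 0: no index is reused
```

Even with p = 1 the toy example keeps at most 5 of the 10 candidate pairs, and the greedy scan may stop short of 5 if the pairs it picks early block the remaining ones.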

If thin_method="bernoulli" and p_neighb=1, no thinning is applied. If thin_method="TargetBalanced" and p_neighb=1, the function attempts to retain as many pairs as the hard-core matching constraint allows; this is not equivalent to no thinning.

Value

Returns a list. Depending on the type of data and on the thinning method, it contains some of the following components:

colidx

Vector of neighbour indices.

rowidx

Vector of target indices.

lags

Vector of spatial distances.

lagt

Vector of temporal distances, returned for spatio-temporal data.

first

Variable indicator for the first component of a bivariate pair, returned for bivariate data.

second

Variable indicator for the second component of a bivariate pair, returned for bivariate data.

maxdist

Maximum spatial distance used to construct the candidate pairs, when available.

neighb

Nearest-neighbour order used to construct the candidate pairs, when available.

n_candidates

Number of candidate pairs before thinning.

n_retained

Number of pairs retained after thinning or matching.

target_retained

Target number of retained pairs. For Bernoulli thinning this is the expected retained count. For hard-core matching this is the capped target count.

target_retained_raw

Uncapped target number of retained pairs, returned for hard-core matching.

TargetBalanceding_cap

Maximum number of endpoint-disjoint pairs, returned for hard-core matching.

effective_fraction

Observed retained fraction, n_retained/n_candidates.

expected_retained

Expected retained count under calibrated Bernoulli thinning.

thin_method

Thinning method used.

p_neighb_interpretation

Text description of how p_neighb was interpreted.

Author(s)

Moreno Bevilacqua, moreno.bevilacqua89@gmail.com, https://sites.google.com/view/moreno-bevilacqua/home, Victor Morales Onate, victor.morales@uv.cl, https://sites.google.com/site/moralesonatevictor/, Christian Caamano-Carrillo, chcaaman@ubiobio.cl, https://www.researchgate.net/profile/Christian-Caamano

Examples

library(GeoModels)

## set the seed before any random draw, so that the coordinates
## and the simulated data are both reproducible
set.seed(951)
NN <- 400
coords <- cbind(runif(NN), runif(NN))

corrmodel <- "Matern"
scale <- 0.5/3
param <- list(mean=0, sill=1, nugget=0, scale=scale, smooth=0.5)

data <- GeoSim(coordx=coords, corrmodel=corrmodel,
               model="Gaussian", param=param)$data

sel <- GeoNeighIndex(coordx=coords, neighb=5)

data1 <- data[sel$colidx]
data2 <- data[sel$rowidx]

## plotting pairs that are neighbours of order 5
plot(data1, data2, xlab="", ylab="",
     main="h-scatterplot, neighb=5")

## Bernoulli thinning: p_neighb controls the expected retained fraction
sel_ber <- GeoNeighIndex(coordx=coords, neighb=5,
                         p_neighb=0.2,
                         thin_method="bernoulli")

data1 <- data[sel_ber$colidx]
data2 <- data[sel_ber$rowidx]

## plotting a random fraction of the pairs that are neighbours of order 5
plot(data1, data2, xlab="", ylab="",
     main="h-scatterplot, neighb=5, p_neighb=0.2")

GeoModels documentation built on June 15, 2026, 9:07 a.m.