geoThin: Thin geographic points (mostly) deterministically

geoThin    R Documentation

Thin geographic points (mostly) deterministically

Description

This function thins geographic points so that no retained point has a neighbor closer than a user-specified distance. The results are almost deterministic (see Details).

Usage

geoThin(x, minDist, longLat = NULL, distFunct = NULL, verbose = FALSE, ...)

Arguments

x

Data frame, matrix, SpatialPoints, or SpatialPointsDataFrame object.

minDist

Numeric. Minimum distance required between retained points. Points with a neighbor closer than this distance are subject to removal. If distFunct is distGeo then this should be in the same units as that function's a argument (meters by default; see geosphere::distGeo and the related "dist" functions).

longLat

Two-element character vector or two-element integer vector. If x is a data frame then this should be a character vector specifying the names of the fields in x, or a two-element vector of integers giving the column numbers, that correspond to longitude and latitude (in that order). For example, c('long', 'lat') or c(1, 2). If x is a matrix then this is a two-element vector indicating the column numbers in x that represent longitude and latitude (for example, c(1, 2)). If x is a SpatialPoints or a SpatialPointsDataFrame object then this argument is ignored.

distFunct

Either a function or NULL. If NULL then distGeo is used to calculate distances. Other distance functions can also be used (see distHaversine and references therein). Alternatively, a custom function can be used so long as its first argument is a two-column numeric matrix with one row holding the x- and y-coordinates of a single point and its second argument is a two-column numeric matrix with one or more rows of other points.

verbose

Logical. If TRUE then display progress.

...

Arguments to pass to distFunct.
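
As an illustration of the contract described for distFunct, the following is a minimal sketch of a custom distance function written in base R. The name haversineDist and its r argument are hypothetical and not part of enmSdm or geosphere; the function assumes a spherical Earth and coordinates in decimal degrees, and returns distances in the units of r.

```r
# Hypothetical custom distance function (not part of enmSdm or geosphere).
# p1: two-column matrix with ONE row (longitude, latitude of a single point)
# p2: two-column matrix with one or more rows of other points
# r:  assumed sphere radius; distances are returned in the same units
haversineDist <- function(p1, p2, r = 6371000) {
  toRad <- pi / 180
  lon1 <- p1[1, 1] * toRad
  lat1 <- p1[1, 2] * toRad
  lon2 <- p2[, 1] * toRad
  lat2 <- p2[, 2] * toRad
  # haversine formula; pmin() guards against rounding slightly above 1
  h <- sin((lat2 - lat1) / 2)^2 +
    cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2)^2
  2 * r * asin(pmin(1, sqrt(h)))
}

# one focal point and three other points, all at latitude 38
focal <- matrix(c(-90.1, 38), ncol = 2)
others <- matrix(c(-90.15, -90.2, -89, 38, 38, 38), ncol = 2)
d <- haversineDist(focal, others)
d
```

Assuming geoThin calls distFunct with the two matrices as its first two arguments, such a function could presumably be supplied as distFunct = haversineDist, with r passed through the ... argument. This has not been checked against the package source.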

Details

The procedure for removing points is as follows:

  • Find the points with the largest number of neighbors (< minDist away). If just one such point exists, remove it, but if there is more than one then...

  • Of these find the points with the closest neighbor within minDist. If just one such point exists, remove it, but if there is more than one then...

  • Of these find the point that is closest to the centroid of all non-removed points. If just one such point exists, remove it, but if there is more than one then...

  • Of these find the point that has the smallest median distance to all points (even if > minDist). If just one such point exists, remove it, but if there is more than one then...

  • Of these randomly select a point and remove it.

  • Repeat.

Thus the results are deterministic up to the last tie-breaking step.
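
The removal procedure above can be sketched in base R as follows. This is a simplified illustration, not the package's source code: the name thinSketch is hypothetical, it assumes planar coordinates so that Euclidean distance from dist() is meaningful (geoThin itself uses distFunct), and it interprets "all points" in the fourth step as all currently retained points.

```r
# Sketch only: hypothetical re-implementation of the tie-breaking cascade.
# xy:      two-column numeric matrix of planar coordinates (assumption)
# minDist: minimum allowed distance between retained points
thinSketch <- function(xy, minDist) {
  keep <- seq_len(nrow(xy))
  repeat {
    if (length(keep) < 2) break
    pts <- xy[keep, , drop = FALSE]
    d <- as.matrix(dist(pts))
    diag(d) <- NA
    numNeigh <- rowSums(d < minDist, na.rm = TRUE)
    if (all(numNeigh == 0)) break  # nothing left to thin

    # step 1: points with the largest number of neighbors < minDist away
    cand <- which(numNeigh == max(numNeigh))

    # step 2: of these, the points with the closest neighbor
    if (length(cand) > 1) {
      nearest <- apply(d[cand, , drop = FALSE], 1, min, na.rm = TRUE)
      cand <- cand[nearest == min(nearest)]
    }

    # step 3: of these, the point closest to the centroid of retained points
    if (length(cand) > 1) {
      cent <- colMeans(pts)
      toCent <- sqrt(rowSums(sweep(pts[cand, , drop = FALSE], 2, cent)^2))
      cand <- cand[toCent == min(toCent)]
    }

    # step 4: of these, the point with the smallest median distance to others
    if (length(cand) > 1) {
      med <- apply(d[cand, , drop = FALSE], 1, median, na.rm = TRUE)
      cand <- cand[med == min(med)]
    }

    # step 5: break any remaining tie at random
    if (length(cand) > 1) cand <- cand[sample.int(length(cand), 1)]

    keep <- keep[-cand]  # remove the chosen point, then repeat
  }
  xy[keep, , drop = FALSE]
}

# four points on a line; the point at x = 1 has two neighbors within 1.5
xy <- cbind(x = c(0, 1, 2, 10), y = c(0, 0, 0, 0))
thinned <- thinSketch(xy, minDist = 1.5)
thinned
```

In this small example the first step alone decides the outcome: the point at x = 1 is the only one with two neighbors, so it is removed and the remaining three points are all at least 1.5 units apart. Randomness only enters when a tie survives all four deterministic steps, for instance with exactly duplicated points.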

Value

Object of the same class as x.

See Also

geoThinApprox

Examples

# example using data frame
x <- data.frame(long=c(-90.1, -90.1, -90.15, -90.17, -90.2, -89),
   lat=c(38, 38, 38, 38, 38, 38), point=letters[1:6])
x
geoThin(x, minDist=500, longLat=1:2, verbose=TRUE)
geoThin(x, minDist=5000, longLat=c(1, 2), verbose=TRUE)

# example of potential randomness
set.seed(111)
geoThin(x, minDist=1000, longLat=c(1, 2))
geoThin(x, minDist=1000, longLat=c(1, 2))
geoThin(x, minDist=1000, longLat=c(1, 2))

# example using SpatialPointsDataFrame
data(lemurs)
fulvus <- lemurs[lemurs$species == 'Eulemur fulvus', c('longitude', 'latitude')]
fulvus <- sp::SpatialPointsDataFrame(
   fulvus,
   data=fulvus,
   proj4string=getCRS('wgs84', TRUE)
)

data(mad0)
sp::plot(mad0, main='Madagascar')
points(fulvus, col='red')
thinned <- geoThin(fulvus, 50000)
points(thinned, pch=16)
legend('topright', legend=c('retained', 'discarded'),
   col=c('black', 'red'), pch=c(16, 1))

# test to see that the function works when no points need to be removed
thinned <- geoThin(fulvus, 200, verbose=TRUE)
sp::plot(mad0, main='Madagascar')
points(fulvus, col='red')
points(thinned, pch=16)
legend('topright', legend=c('retained', 'discarded'),
   col=c('black', 'red'), pch=c(16, 1))

adamlilith/enmSdm documentation built on Jan. 6, 2023, 11 a.m.