nppen: Non-Parametric Probabilistic Ecological Niche


Compute the probability of presence of a taxon based on the environment.

nppen(X, Y, fast = TRUE, cores = 1)

`X`	Matrix or data.frame containing the environmental data at locations of presence (possibly rasterized using `rasterize`).
`Y`	Matrix or data.frame containing the environmental data at the points where the probability of presence needs to be predicted (i.e., usually on a grid).
`fast`	When TRUE, a single total covariance matrix is used for each element of Y, instead of one matrix per permutation (i.e. per row of X). This is much faster and changes the predicted probabilities only marginally (<2%), provided X is large and the points of Y lie within the range covered by X (which they should for the model to make sense anyway).
`cores`	Number of computing cores to use. The parallelisation is done over the rows of Y because Y is often much larger than X and, when `fast=TRUE`, the size of X is not much of a problem.
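
The idea behind using a single covariance matrix can be illustrated with a simplified sketch. It is not the package's implementation: it assumes that the probability for a target point is the proportion of reference observations lying at least as far from the centroid of X, in Mahalanobis distance, as that point, and it computes one covariance matrix from X alone and reuses it for every point. The function name `nppen_sketch` is hypothetical.

```r
# Simplified illustration only; nppen_sketch is a hypothetical helper,
# not a function of the nppen package.
nppen_sketch <- function(X, Y) {
  X <- as.matrix(X)
  Y <- as.matrix(Y)
  centre <- colMeans(X)   # centroid of the reference observations
  S <- cov(X)             # ASSUMPTION: one covariance matrix, from X only
  # squared Mahalanobis distances to the centroid
  d_ref <- mahalanobis(X, centre, S)
  d_new <- mahalanobis(Y, centre, S)
  # proportion of reference observations at least as far as each target point
  vapply(d_new, function(d) mean(d_ref >= d), numeric(1))
}

set.seed(1)
X <- data.frame(temp = rnorm(200, 15, 3), sal = rnorm(200, 37, 1))
Y <- data.frame(temp = c(15, 18.7), sal = c(37, 38.7))
p <- nppen_sketch(X, Y)
p
```

Under these assumptions the point near the centre of the cloud gets a higher value than the one on the border, which is the qualitative behaviour expected of `nppen()`; the actual values returned by the package may differ.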

A vector of probabilities of length nrow(Y).
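
Since the parallelisation is done over Y, the result can be thought of as being assembled from row chunks of Y and concatenated in the original order. The sketch below shows that pattern with `parallel::mclapply`; it is an assumption about how such a split could be written, not the package's code. `predict_in_chunks`, `n_chunks` and `toy_score` are hypothetical names, and `toy_score` is a stand-in that does not compute a probability.

```r
library(parallel)

# Hypothetical helper: apply `score` to row chunks of Y and concatenate.
# `score` must return one value per row of the chunk it receives.
predict_in_chunks <- function(Y, score, cores = 1, n_chunks = cores) {
  n_chunks <- max(1, min(n_chunks, nrow(Y)))
  # contiguous blocks of row indices, in order
  idx <- splitIndices(nrow(Y), n_chunks)
  # NB: mclapply relies on forking, so cores > 1 is not available on Windows
  res <- mclapply(idx, function(i) score(Y[i, , drop = FALSE]),
                  mc.cores = cores)
  unlist(res, use.names = FALSE)
}

# toy scoring function, for illustration only
toy_score <- function(chunk) rowSums(chunk)

Y <- data.frame(temp = seq(10, 19, length.out = 10), sal = rep(37, 10))
out <- predict_in_chunks(Y, toy_score, cores = 1, n_chunks = 3)
length(out) == nrow(Y)
```

Whatever the number of chunks, the output has one element per row of Y, in the same order as the rows.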

Beaugrand, G., Lenoir, S., Ibañez, F., and Manté, C. (2011) _A new model to assess the probability of occurrence of a species, based on presence-only data_. Marine Ecology Progress Series, *424*, 175-190. http://www.int-res.com/abstracts/meps/v424/p175-190/

# define environmental data
set.seed(1)
X <- data.frame(temp=rnorm(200, 15, 3), sal=rnorm(200, 37, 1))
# define target points: one well inside the niche, one on the border
Y <- data.frame(temp=c(15, 18.7), sal=c(37, 38.7))
# represent both in environmental space
plot(X)
points(Y, col="red")
# compute probability of presence
nppen(X, Y, fast=FALSE)
nppen(X, Y, fast=TRUE)
# higher probability for the first point, as expected

# rasterize X to make it smaller (hence faster to compute) and avoid spatial
# bias in the underlying data (e.g. preferential sampling in a given area)
X_binned <- rasterize(X)
nrow(X)
nrow(X_binned)
plot(sal ~ temp, X_binned)
points(Y, col="red")
# compute probability of presence
nppen(subset(X_binned, select=-n), Y, fast=FALSE) # NB: remove column `n`
# simply binning loses some information regarding where the centre of the
# niche is and the niche appears more spread out (hence the higher
# probability for the second point here).

# only keep common observations (= observed more than once)
X_binned_common <- X_binned[X_binned$n > 1,]
X_binned_common <- subset(X_binned_common, select=-n)
points(X_binned_common, pch=16)
nppen(X_binned_common, Y, fast=FALSE)
# reducing to common observations helps better represent the initial niche

## Not run: 
# Parallel performance test
set.seed(1)
X <- data.frame(temp=rnorm(1000, 15, 3), sal=rnorm(1000, 37, 1))
# define target points: many points spread over the niche
Y <- data.frame(temp=rnorm(10000, 15, 2), sal=rnorm(10000, 37, 0.7))
system.time(y <- nppen(X, Y, cores=1))
system.time(y <- nppen(X, Y, cores=4))

## End(Not run)