ngnn: NGNN, Nonlinear Gradient Nearest Neighbors

Predict community composition based on individualistic but possibly coordinated species responses

ngnn(spe, idi, ido, nm, nmulti = 5, pa = FALSE, pr = FALSE,
  method = "bray", thresh = 0.9, neighb = 5, maxits = 999, k = 1,
  ...)

gnn(obj, k, ...)

## S3 method for class 'ngnn'
summary(obj, ...)

ngnn_plot_nco(obj, type = "points", ocol = 2, cexn = NULL, ...)

ngnn_get_spp(obj, ...)

ngnn_plot_spp(obj, pick = NULL, zlim, nm, ...)

`spe`	species dataframe, rows = sample units and columns = species
`idi`	in-sample predictor dataframe, rows must match 'spe'
`ido`	out-of-sample predictor dataframe, where rows = new sample units
`nm`	string vector specifying predictors to include (max 2)
`nmulti`	number of random starts in nonparametric regression
`pa`	logical, convert to presence/absence?
`pr`	logical, use 'beals' for probs of joint occurrence?
`method`	distance measure for all ordinations
`thresh`	numeric threshold for stepacross dissimilarities
`neighb`	number of adjacent distances considered in NCOpredict
`maxits`	number of NCOpredict iterations
`k`	the maximum number of nearest neighbors to find in NCO gradient space
`...`	additional arguments passed to function
`obj`	object of class 'ngnn' from call to `ngnn`
`type`	either 'points' or 'text' for plotting
`ocol`	color value or vector for out-of-sample points
`cexn`	expansion factor for points and text
`pick`	variable to query
`zlim`	vector of length 2, giving vertical limits for plots

When given a set of sample units where species abundances and corresponding predictor values are both known, how does one infer which species should appear in 'new' sample units where only the predictors are known? NGNN (nonlinear gradient nearest neighbors) approaches the problem of species imputation in the following way:

Regress species individualistically on predictors ->
Feed fitted values to NMS ordination ->
Find nearest neighbors in ordination space, and assign species.

A more detailed description:
First, define an in-sample set of sample units where species abundances and corresponding predictor values are both known, as well as an out-of-sample set where only predictor values are known. Second, use npmr to fit a nonparametric multiplicative regression (NPMR; McCune 2006) for each species, giving fitted values for both in-sample and out-of-sample sample units. Third, use nco to feed the NPMR fitted values to NMS ordination (Kruskal 1964); this is nonparametric constrained ordination (NCO; McCune and Root 2012; McCune and Root 2017). A follow-up step with nco_predict calculates predicted NCO scores for the out-of-sample set, even though its species compositions are not known. Finally, use gnn to identify the in-sample Euclidean nearest neighbor of each out-of-sample point in the NCO ordination space, and assign the (possibly averaged) species composition of that neighbor to the point in question. This retains realistic communities of co-occurring species, since they have already been observed in at least one other sample unit. The entire process is wrapped in the function ngnn.

Function ngnn finds the k nearest neighbors in the original ordination space; higher values of k probably work better when there are many original points and when points are more evenly distributed in ordination space.
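
The nearest-neighbor step can be illustrated with a minimal base-R sketch. The helper `impute_knn` and the toy scores below are hypothetical, written only for illustration; they are not the package's gnn implementation, and averaging over neighbors when k > 1 is an assumption based on the "(possibly averaged)" wording above.

```r
# Hypothetical helper (not part of ngnn): impute species composition for
# out-of-sample points from their k nearest in-sample neighbors.
impute_knn <- function(scr_i, scr_o, spe, k = 1) {
  scr_i <- as.matrix(scr_i)   # in-sample scores (rows = sample units)
  scr_o <- as.matrix(scr_o)   # out-of-sample scores
  spe   <- as.matrix(spe)     # in-sample species abundances
  stopifnot(nrow(scr_i) == nrow(spe), ncol(scr_i) == ncol(scr_o),
            k >= 1, k <= nrow(scr_i))
  out <- do.call(rbind, lapply(seq_len(nrow(scr_o)), function(j) {
    # Euclidean distance from out-of-sample point j to each in-sample point
    d  <- sqrt(rowSums(sweep(scr_i, 2, scr_o[j, ])^2))
    nn <- order(d)[seq_len(k)]            # indices of the k nearest neighbors
    colMeans(spe[nn, , drop = FALSE])     # assumed: average their composition
  }))
  rownames(out) <- rownames(scr_o)
  out
}

# Toy data (made-up values): three in-sample units, two new units
scr_i <- rbind(a = c(0, 0), b = c(1, 0), c = c(0, 2))
spe   <- rbind(a = c(sp1 = 10, sp2 = 0),
               b = c(sp1 = 0,  sp2 = 4),
               c = c(sp1 = 2,  sp2 = 2))
scr_o <- rbind(x = c(0.1, 0.1), y = c(0.9, 0.2))

imp_k1 <- impute_knn(scr_i, scr_o, spe, k = 1)
imp_k2 <- impute_knn(scr_i, scr_o, spe, k = 2)
```

In this toy case, k = 1 copies one observed community to each new sample unit, whereas k = 2 gives both new units the average of sample units a and b, a composition that was never observed directly.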

List of class 'ngnn' with elements:

spe = original species matrix
id_i = in-sample predictors used in NPMR
nm = which predictors were used
nm_len = their length
iYhat = in-sample fitted values from NPMR
oYhat = out-of-sample fitted values from NPMR
np_stat = fit, tolerances and results of significance tests
np_mods = list of NPMR regression models for every species
np_bw = list of NPMR bandwidths for every species
scr_i = environmentally constrained site scores from NCO
NCO_model = the NCO ordination model
R2_internal = squared correlation of Dhat and Dz from NCO
R2_enviro = squared correlation of D and Dz from NCO
R2_partial = squared correlation of Dz and each predictor from NCO
Axis_tau = rank correlation of each axis and predictor from NCO
nmsp = a list of 5 items describing predicted NCO scores (see nco_predict)
flagax1, flagax2 = flags indicating which nco_predict values were off axes
nn = identifies the gradient nearest neighbor for predicted vals
spp_imputed = inferred out-of-sample species compositions

Kruskal, J. B. 1964. Multidimensional scaling by optimizing goodness of fit to a nonmetric hypothesis. Psychometrika 29:1-27.

McCune, B. 2006. Non-parametric habitat models with automatic interactions. Journal of Vegetation Science 17(6):819-830.

McCune, B., and H. T. Root. 2012. Nonparametric constrained ordination. 97th ESA Annual Meeting. Ecological Society of America, Portland, OR.

McCune, B., and H. T. Root. 2017. Nonparametric constrained ordination to describe community and species relationships to environment. Unpublished ms.

Ohmann, J. L., and M. J. Gregory. 2002. Predictive mapping of forest composition and structure with direct gradient analysis and nearest-neighbor imputation in coastal Oregon, U.S.A. Canadian Journal of Forest Research 32:725-741.

npmr for NPMR, nco for NCO, nco_predict for predictive NCO, and gnn for the core function of NGNN.

# set up
set.seed(978)
library(vegan)           # provides the varespec and varechem datasets
data(varespec, varechem)
spe <- varespec ; id  <- varechem
i   <- sample(1:nrow(spe), size=floor(0.75*nrow(spe))) # sample
spe <- spe[i, ]          # in-sample species
idi <- id[i, ]           # in-sample predictors
ido <- id[-i, ]          # out-of-sample predictors
nm  <- c('Al', 'K')      # select 1 or 2 gradients of interest

# basic usage
res <- ngnn(spe, idi, ido, nm, nmulti=5, method='bray',
            thresh=0.90, neighb=5, maxits=999, k=1)
summary(res)
str(res, 1)

# plot the species response curves
ngnn_plot_spp(res, pick=1:9, nm=nm)

# plot the NCO gradient space
ngnn_plot_nco(res)

# predicted (imputed) species composition for out-of-sample sites
ngnn_get_spp(res)

# how close was the predicted species composition to the 'true' values?
spe_append <- rbind(spe, res$spp_imputed)   # append to existing
heatmap(t(as.matrix(spe_append)), Rowv=NA, Colv=NA)

# check composition of 'hold-out' data
heatmap(t(as.matrix(varespec[-i,])), Rowv=NA, Colv=NA)
# ... vs new species from NGNN
heatmap(t(as.matrix(res$spp_imputed)), Rowv=NA, Colv=NA)

# Prediction error: Root Mean Square Error
rmse <- function(y, ypred, ...){
     # coerce to matrices so mean() receives a numeric array, not a data frame
     sqrt(mean((as.matrix(y) - as.matrix(ypred))^2, ...))
}
rmse(varespec[-i,], res$spp_imputed)


## can do entire process manually, avoiding the wrapper function:
# NPMR
res_npmr <- npmr(spe, idi, ido, nm, nmulti=5)
# NCO (NMS)
res_nco  <- nco(res_npmr, method='bray', thresh=0.90)
# NCOpredict (NMSpredict)
res_nmsp <- nco_predict(res_nco, method='bray', neighb=5,
                        maxits=999)
# GNN
res_gnn  <- gnn(obj=res_nmsp, k=1)
summary(res_gnn)
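
The RMSE check in the examples can also be tried without fitting any model. The sketch below uses made-up observed and predicted abundances (the objects `obs` and `pred` are hypothetical) and adds a per-species RMSE, which is an illustrative extension rather than something the package reports.

```r
# Toy observed and predicted abundances (made-up numbers, for illustration)
obs  <- data.frame(sp1 = c(1, 0, 3), sp2 = c(2, 2, 0))
pred <- data.frame(sp1 = c(1, 1, 3), sp2 = c(0, 2, 0))

# Overall RMSE; inputs are coerced to matrices so data frames are accepted
rmse <- function(y, ypred, ...) {
  sqrt(mean((as.matrix(y) - as.matrix(ypred))^2, ...))
}
rmse_all <- rmse(obs, pred)

# Per-species RMSE: one value per column, to show which species
# contribute most to the overall error
rmse_spp <- sqrt(colMeans((as.matrix(obs) - as.matrix(pred))^2))
```

Comparing `rmse_spp` across species can help indicate whether a large overall error is driven by a few poorly predicted species.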