Functions for reconstructing (predicting) environmental values from biological assemblages using the Modern Analogue Technique (MAT), also known as k-nearest neighbours (k-NN).

```
MAT(y, x, dist.method="sq.chord", k=5, lean=TRUE)
## S3 method for class 'MAT'
predict(object, newdata=NULL, k=object$k, sse=FALSE,
nboot=100, match.data=TRUE, verbose=TRUE, lean=TRUE,
...)
## S3 method for class 'MAT'
performance(object, ...)
## S3 method for class 'MAT'
crossval(object, k=object$k, cv.method="lgo",
verbose=TRUE, ngroups=10, nboot=100, h.cutoff=0, h.dist=NULL, ...)
## S3 method for class 'MAT'
print(x, ...)
## S3 method for class 'MAT'
summary(object, full=FALSE, ...)
## S3 method for class 'MAT'
plot(x, resid=FALSE, xval=FALSE, k=5, wMean=FALSE, xlab="",
ylab="", ylim=NULL, xlim=NULL, add.ref=TRUE,
add.smooth=FALSE, ...)
## S3 method for class 'MAT'
residuals(object, cv=FALSE, ...)
## S3 method for class 'MAT'
fitted(object, ...)
## S3 method for class 'MAT'
screeplot(x, ...)
paldist(y, dist.method="sq.chord")
paldist2(y1, y2, dist.method="sq.chord")
```

`y` , `y1` , `y2` |
data frame containing biological data. |

`newdata` |
data frame containing biological data to predict from. |

`x` |
a vector of environmental values to be modelled, matched to y. |

`dist.method` |
dissimilarity coefficient. See details for options. |

`match.data` |
logical indicating whether the function should match the two species datasets by their column names. You should only set this to |

`k` |
number of analogues to use. |

`lean` |
logical to remove items from the output. |

`object` |
an object of class |

`resid` |
logical to plot residuals instead of fitted values. |

`xval` |
logical to plot cross-validation estimates. |

`wMean` |
logical to plot weighted-mean estimates. |

`xlab` , `ylab` , `xlim` , `ylim` |
additional graphical arguments to |

`add.ref` |
add 1:1 line on plot. |

`add.smooth` |
add loess smooth to plot. |

`cv.method` |
cross-validation method, either "lgo", "bootstrap" or "h-block". |

`verbose` |
logical to show feedback during cross-validation. |

`nboot` |
number of bootstrap samples. |

`ngroups` |
number of groups in leave-group-out cross-validation, or a vector containing leave-out group membership. |

`h.cutoff` |
cutoff for h-block cross-validation. Only training samples greater than |

`h.dist` |
distance matrix for use in h-block cross-validation. Usually a matrix of geographical distances between samples. |

`sse` |
logical indicating that sample-specific errors should be calculated. |

`full` |
logical to indicate a full or abbreviated summary. |

`cv` |
logical to indicate model or cross-validation residuals. |

`...` |
additional arguments. |

`MAT` performs an environmental reconstruction using the modern analogue technique. Function `MAT` takes a training dataset of biological data (species abundances) `y` and a single associated environmental variable `x`, and generates a model of closest analogues, or matches, for the modern data using one of a number of dissimilarity coefficients. Options for the latter are: "euclidean", "sq.euclidean", "chord", "sq.chord", "chord.t", "sq.chord.t", "chi.squared", "sq.chi.squared", "bray". "chord.t" is the true chord distance; "chord" refers to the variant of chord distance used in palaeoecology (e.g. Overpeck et al. 1985), which is actually Hellinger's distance (Legendre & Gallagher 2001). There are various helper functions to plot and extract information from the results of a `MAT` transfer function.

The function `predict` takes a `MAT` object and uses it to predict environmental values for a new set of species data, or returns the fitted (predicted) values from the original modern dataset if `newdata` is `NULL`. Variables are matched between the training data and `newdata` by column name (if `match.data` is `TRUE`). Use `compare.datasets` to assess conformity of two species datasets and identify possible no-analogue samples.
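The distinction between the two chord variants can be sketched in base R. This is an illustration only: the helper names `sq_chord` and `chord_true` are hypothetical, the formulas are the usual textbook definitions, and the inputs are assumed to be proportions. The scaling that `dist.method` applies internally may differ.

```r
# Squared chord distance in the palaeoecological sense (assumed definition):
# sum of squared differences between square-rooted proportions.
sq_chord <- function(p, q) sum((sqrt(p) - sqrt(q))^2)

# True chord distance (assumed definition): Euclidean distance between the
# two abundance vectors after each is scaled to unit length.
chord_true <- function(p, q) {
  p <- p / sqrt(sum(p^2))
  q <- q / sqrt(sum(q^2))
  sqrt(sum((p - q)^2))
}

# Toy proportions for three taxa (each vector sums to 1)
a  <- c(0.5, 0.5, 0.0)
b  <- c(0.5, 0.5, 0.0)   # identical to a
c3 <- c(0.0, 0.0, 1.0)   # shares no taxa with a
d  <- c(0.8, 0.1, 0.1)   # partial overlap with a

sq_chord(a, b)           # identical samples give zero
sq_chord(a, c3)          # samples with no taxa in common
sqrt(sq_chord(a, d))     # unsquared palaeoecological "chord" distance
chord_true(a, d)         # true chord distance for the same pair
```

The last two lines show that the two variants generally give different values for the same pair of samples, so a dissimilarity threshold chosen for one cannot be reused for the other.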

`MAT` has methods `fitted` and `residuals` that return the fitted values (estimates) and residuals for the training set, `performance`, which returns summary performance statistics (see below), and `print` and `summary` to summarise the output. `MAT` also has a `plot` method that produces scatter plots of predicted vs observed measurements for the training set.

Function `screeplot` displays the RMSE of prediction for the training set as a function of the number of analogues (k) and is useful for estimating the optimal value of k for use in prediction.
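The idea behind choosing k from an RMSE curve can be sketched with simulated data in base R. Everything here is hypothetical (the toy variable `x`, the noisy `proxy` standing in for species data, and the rule that a sample cannot be its own analogue); it is not the package's implementation.

```r
set.seed(1)
n <- 40
x <- runif(n, 4, 8)                     # toy environmental variable (e.g. pH)
proxy <- x + rnorm(n, sd = 0.3)         # noisy stand-in for assemblage data
d <- as.matrix(dist(proxy))             # toy dissimilarity matrix
diag(d) <- Inf                          # assumption: exclude self-matches

k_max <- 10
rmse <- sapply(seq_len(k_max), function(k) {
  est <- sapply(seq_len(n), function(i) {
    nn <- order(d[i, ])[seq_len(k)]     # indices of the k closest analogues
    mean(x[nn])                         # unweighted mean of their values
  })
  sqrt(mean((est - x)^2))               # RMSE over all training samples
})

best_k <- which.min(rmse)               # k with the lowest RMSE
data.frame(k = seq_len(k_max), RMSE = round(rmse, 3))
```

In practice the curve is often flat near its minimum, so a slightly larger k than `best_k` may be a reasonable choice if it gives more stable estimates.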

`paldist` and `paldist2` are helper functions, though they may be called directly. `paldist` takes a single data frame or matrix and returns a distance matrix of the row-wise dissimilarities. `paldist2` takes two data frames or matrices and returns a matrix of all row-wise dissimilarities between the two datasets.
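A cross-dissimilarity matrix between two datasets only makes sense once their taxa are aligned. The base R sketch below pads both toy datasets to the union of their column names with zeros and then computes squared chord distances between every pair of rows. The helpers `pad` and `cross_sq_chord` are hypothetical, and the zero-padding convention is an assumption rather than a description of how `match.data` or `paldist2` behave.

```r
# Toy training and fossil data (proportions); taxa only partly overlap
train  <- data.frame(sp1 = c(0.6, 0.2), sp2 = c(0.4, 0.3), sp3 = c(0.0, 0.5),
                     row.names = c("t1", "t2"))
fossil <- data.frame(sp2 = 0.5, sp4 = 0.5, row.names = "f1")

# Assumed convention: use the union of taxa, filling absent taxa with zero
taxa <- union(names(train), names(fossil))
pad <- function(df, taxa) {
  out <- matrix(0, nrow(df), length(taxa),
                dimnames = list(rownames(df), taxa))
  out[, names(df)] <- as.matrix(df)
  out
}
y1 <- pad(train, taxa)
y2 <- pad(fossil, taxa)

# Squared chord distance between every row of y1 and every row of y2
cross_sq_chord <- function(y1, y2) {
  out <- matrix(NA_real_, nrow(y1), nrow(y2),
                dimnames = list(rownames(y1), rownames(y2)))
  for (i in seq_len(nrow(y1))) {
    for (j in seq_len(nrow(y2))) {
      out[i, j] <- sum((sqrt(y1[i, ]) - sqrt(y2[j, ]))^2)
    }
  }
  out
}

D <- cross_sq_chord(y1, y2)   # rows: training samples, columns: fossil samples
D
```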

Function `MAT` returns an object of class `MAT` which contains the following items:

`call` |
original function call to |

`fitted.values` |
fitted (predicted) values for the training set, as the mean and weighted mean (weighted by dissimilarity) of the k closest analogues. |

`diagnostics` |
standard deviation of the k analogues and dissimilarity of the closest analogue. |

`dist.n` |
dissimilarities of the k closest analogues. |

`x.n` |
environmental values of the k closest analogues. |

`match.name` |
column names of the k closest analogues. |

`x` |
environmental variable used in the model. |

`dist.method` |
dissimilarity coefficient. |

`k` |
number of closest analogues to use. |

`y` |
original species data. |

`cv.summary` |
summary of the cross-validation (not yet implemented). |

`dist` |
dissimilarity matrix (returned if |
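How the mean, weighted mean and diagnostics relate to the k closest analogues can be sketched for a single sample in base R. The numbers are invented, and the inverse-dissimilarity weights are an assumption about what "weighted by dissimilarity" means here, not a statement of the package's exact formula.

```r
x_train <- c(5.1, 5.4, 6.0, 6.8, 7.2)       # environmental values of training samples
d_new   <- c(0.40, 0.10, 0.20, 0.90, 1.30)  # dissimilarity of one sample to each of them
k <- 3

nn     <- order(d_new)[seq_len(k)]  # indices of the k closest analogues
x_n    <- x_train[nn]               # their environmental values
dist_n <- d_new[nn]                 # their dissimilarities

est_mean  <- mean(x_n)              # unweighted estimate
w         <- 1 / dist_n             # assumed weights: closer analogues count more
est_wmean <- sum(w * x_n) / sum(w)  # weighted estimate

sd_n  <- sd(x_n)                    # spread of the analogue values
min_d <- min(dist_n)                # dissimilarity of the closest analogue

c(mean = est_mean, wmean = est_wmean, sd = sd_n, min.dist = min_d)
```

A large `min_d` suggests that even the best match is a poor analogue, and a large `sd_n` suggests that the analogues disagree about the environmental value.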

If function `predict` is called with `newdata=NULL` it returns a matrix of fitted values from the original training set analysis. If `newdata` is not `NULL` it returns a list with the following named elements:

`fit` |
predictions for |

`diagnostics` |
standard deviations of the k closest analogues and distance of closest analogue. |

`dist.n` |
dissimilarities of the k closest analogues. |

`x.n` |
environmental values of the k closest analogues. |

`match.name` |
column names of the k closest analogues. |

`dist` |
dissimilarity matrix (returned if |

If sample-specific errors were requested, the list will also include:

`fit.boot` |
mean of the bootstrap estimates of newdata. |

`v1` |
standard error of the bootstrap estimates for each new sample. |

`v2` |
root mean squared error for the training set samples, across all bootstrap samples. |

`SEP` |
standard error of prediction, calculated as the square root of v1^2 + v2^2. |
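The way the two error components combine into `SEP` can be sketched in base R with simulated bootstrap predictions. The matrix `boot_pred` and the value chosen for `v2` are invented for illustration; only the final formula is taken from the description above.

```r
set.seed(42)
nboot <- 100

# Simulated bootstrap predictions: 3 new samples (rows) x nboot cycles (columns)
boot_pred <- matrix(rnorm(3 * nboot, mean = rep(c(5.5, 6.0, 6.5), nboot), sd = 0.2),
                    nrow = 3)

fit_boot <- rowMeans(boot_pred)      # mean of the bootstrap estimates
v1 <- apply(boot_pred, 1, sd)        # per-sample spread of the bootstrap estimates
v2 <- 0.3                            # assumed training-set RMSE across bootstrap cycles

SEP <- sqrt(v1^2 + v2^2)             # standard error of prediction
round(cbind(fit.boot = fit_boot, v1 = v1, v2 = v2, SEP = SEP), 3)
```

Because the components add in quadrature, `SEP` can never be smaller than `v2`, so the training-set error sets a floor on the uncertainty of any individual reconstruction.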

Functions `paldist` and `paldist2` return dissimilarity matrices. `performance` returns a matrix of performance statistics for the MAT model, with columns for RMSE, R2, mean and max bias for each number of analogues up to k. See `performance` for a description of the output.

Steve Juggins

Legendre, P. & Gallagher, E. (2001) Ecologically meaningful transformations for ordination of species. *Oecologia*, **129**, 271-280.

Overpeck, J.T., Webb, T., III, & Prentice, I.C. (1985) Quantitative interpretation of fossil pollen spectra: dissimilarity coefficients and the method of modern analogs. *Quaternary Research*, **23**, 87-108.

`WAPLS`, `WA`, `performance`, and `compare.datasets` for diagnostics.

```
# pH reconstruction of the RLGH, Scotland, using SWAP training set
# shows recent acidification history
library(rioja)
data(SWAP)
data(RLGH)
fit <- MAT(SWAP$spec, SWAP$pH, k=20) # generate results for k 1-20
# examine performance
performance(fit)
print(fit)
# How many analogues?
screeplot(fit)
# do the reconstruction
pred.mat <- predict(fit, RLGH$spec, k=10)
# plot the reconstruction
plot(RLGH$depths$Age, pred.mat$fit[, 1], type="b", ylab="pH", xlab="Age")
# compare to a weighted average model
fit.wa <- WA(SWAP$spec, SWAP$pH)
pred.wa <- predict(fit.wa, RLGH$spec)
points(RLGH$depths$Age, pred.wa$fit[, 1], col="red", type="b")
legend("topleft", c("MAT", "WA"), lty=1, col=c("black", "red"))
```

