WAPLS | R Documentation |

Functions for reconstructing (predicting) environmental values from biological assemblages using weighted averaging partial least squares (WAPLS) regression and calibration.

```
WAPLS(y, x, npls=5, iswapls=TRUE, standx=FALSE, lean=FALSE,
check.data=TRUE, ...)
WAPLS.fit(y, x, npls=5, iswapls=TRUE, standx=FALSE, lean=FALSE)
## S3 method for class 'WAPLS'
predict(object, newdata=NULL, sse=FALSE, nboot=100,
match.data=TRUE, verbose=TRUE, ...)
## S3 method for class 'WAPLS'
crossval(object, cv.method="loo", verbose=TRUE, ngroups=10,
nboot=100, h.cutoff=0, h.dist=NULL, ...)
## S3 method for class 'WAPLS'
performance(object, ...)
## S3 method for class 'WAPLS'
rand.t.test(object, n.perm=999, ...)
## S3 method for class 'WAPLS'
screeplot(x, rand.test=TRUE, ...)
## S3 method for class 'WAPLS'
print(x, ...)
## S3 method for class 'WAPLS'
summary(object, full=FALSE, ...)
## S3 method for class 'WAPLS'
plot(x, resid=FALSE, xval=FALSE, npls=1,
xlab="", ylab="", ylim=NULL, xlim=NULL, add.ref=TRUE,
add.smooth=FALSE, ...)
## S3 method for class 'WAPLS'
residuals(object, cv=FALSE, ...)
## S3 method for class 'WAPLS'
coef(object, ...)
## S3 method for class 'WAPLS'
fitted(object, ...)
```

`y` |
a data frame or matrix of biological abundance data. |

`x` , `object` |
a vector of environmental values to be modelled or an object of class |

`newdata` |
new biological data to be predicted. |

`iswapls` |
logical to perform WAPLS or PLS. Defaults to TRUE = WAPLS. |

`standx` |
logical to standardise x-data in PLS, defaults to FALSE. |

`npls` |
number of PLS components to extract. |

`check.data` |
logical to perform simple checks on the input data. |

`match.data` |
logical indicating that the function will match two species datasets by their column names. You should only set this to |

`lean` |
logical to exclude some output from the resulting models (used when cross-validating to speed calculations). |

`full` |
logical to show head and tail of output in summaries. |

`resid` |
logical to plot residuals instead of fitted values. |

`xval` |
logical to plot cross-validation estimates. |

`xlab` , `ylab` , `xlim` , `ylim` |
additional graphical arguments to |

`add.ref` |
add 1:1 line on plot. |

`add.smooth` |
add loess smooth to plot. |

`cv.method` |
cross-validation method, either "loo", "lgo", "bootstrap" or "h-block". |

`verbose` |
logical to show feedback during cross-validation. |

`nboot` |
number of bootstrap samples. |

`ngroups` |
number of groups in leave-group-out cross-validation, or a vector containing leave-out group membership. |

`h.cutoff` |
cutoff for h-block cross-validation. Only training samples greater than |

`h.dist` |
distance matrix for use in h-block cross-validation. Usually a matrix of geographical distances between samples. |

`sse` |
logical indicating that sample specific errors should be calculated. |

`rand.test` |
logical to perform a randomisation t-test to test significance of cross validated components. |

`n.perm` |
number of permutations for randomisation t-test. |

`cv` |
logical to indicate model or cross-validation residuals. |

`...` |
additional arguments. |
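As a hedged illustration of the `h.dist` and `h.cutoff` arguments, the sketch below builds a distance matrix from site coordinates with base R's `dist`. The coordinates, the 50 km cutoff, and the reading of the cutoff (training samples farther than `h.cutoff` from the test sample are retained) are assumptions made for illustration and are not taken from the package source.

```r
# Hypothetical projected coordinates (km) for five training sites.
coords <- data.frame(
  easting  = c(0, 10, 30, 100, 130),
  northing = c(0,  0, 30,   0,  40),
  row.names = paste0("site", 1:5)
)

# Symmetric matrix of Euclidean distances between sites, in the
# form expected for h.dist (one row and column per training sample).
h.dist <- as.matrix(dist(coords))

# Assumed interpretation: when site1 is the test sample, only sites
# farther away than the cutoff stay in the training set.
h.cutoff <- 50
retained <- names(which(h.dist["site1", ] > h.cutoff))
retained
```

With a fitted model `fit` whose training samples are in the same order as the rows of `h.dist`, the corresponding call would be `crossval(fit, cv.method="h-block", h.dist=h.dist, h.cutoff=h.cutoff)`.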

Function `WAPLS` performs partial least squares (PLS) or weighted averaging partial least squares (WAPLS) regression. WAPLS was first described in ter Braak and Juggins (1993) and ter Braak et al. (1993) and has since become popular in palaeolimnology for reconstructing (predicting) environmental values from sub-fossil biological assemblages, given a training dataset of modern species and environmental data. Prediction errors and model complexity (number of components) can be estimated by cross-validation using `crossval`, which implements leave-one-out, leave-group-out, bootstrap, and h-block cross-validation. With leave-group-out one may also supply a vector of group memberships for more carefully designed cross-validation experiments.
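A group membership vector for leave-group-out cross-validation might be built as in the sketch below. The lake labels and sample counts are hypothetical, and the use of integer codes is a cautious assumption, since the accepted vector type is not stated here.

```r
# Hypothetical design: 20 training samples collected from 4 lakes.
# Holding out whole lakes keeps samples from one lake out of the
# training and test sets at the same time.
lake <- rep(c("A", "B", "C", "D"), times = c(6, 5, 5, 4))

# Integer group membership, one entry per training sample.
groups <- as.integer(factor(lake))
table(groups)

# Assumed usage, following the crossval signature; `fit` would be a
# WAPLS model fitted to the same 20 samples in the same order:
# fit.cv <- crossval(fit, cv.method = "lgo", ngroups = groups)
```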

Function `predict` predicts values of the environmental variable for `newdata` or returns the fitted (predicted) values from the original modern dataset if `newdata` is `NULL`. Variables are matched between training and newdata by column name (if `match.data` is `TRUE`). Use `compare.datasets` to assess conformity of two species datasets and identify possible no-analogue samples.
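What matching by column name means in practice can be sketched in base R. This is an illustration of the idea only, not the package's internal code: the taxa, the abundances, and the choice to zero-fill taxa missing from the new data are assumptions.

```r
# Hypothetical training and fossil abundance tables with partly
# overlapping taxa.
train <- matrix(c(10, 0, 5,
                   2, 8, 0),
                nrow = 2, byrow = TRUE,
                dimnames = list(c("t1", "t2"), c("spA", "spB", "spC")))
fossil <- matrix(c(4, 6, 1,
                   0, 3, 7),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("f1", "f2"), c("spB", "spC", "spD")))

# Taxa present in both datasets.
common <- intersect(colnames(train), colnames(fossil))

# Align the fossil data to the training columns: shared taxa are
# copied, taxa missing from the fossil data are set to zero, and
# taxa absent from the training set (spD) are dropped.
aligned <- matrix(0, nrow = nrow(fossil), ncol = ncol(train),
                  dimnames = list(rownames(fossil), colnames(train)))
aligned[, common] <- fossil[, common]
aligned
```

A large share of abundance in dropped taxa such as `spD` is one sign that a fossil sample may lack a good modern analogue.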

`WAPLS` has methods `fitted` and `residuals` that return the fitted values (estimates) and residuals for the training set, `performance`, which returns summary performance statistics (see below), `coef`, which returns the species coefficients, and `print` and `summary` to summarise the output. `WAPLS` also has a `plot` method that produces scatter plots of predicted vs observed measurements for the training set.

Function `rand.t.test` performs a randomisation t-test to test the significance of the cross-validated components after van der Voet (1994).
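The idea behind such a test can be sketched as a sign-flipping randomisation on paired squared prediction errors. This is a simplified illustration in the spirit of van der Voet (1994), not the package's implementation; the simulated residuals and the one-sided p-value are assumptions.

```r
set.seed(1)

# Hypothetical cross-validation residuals for the same 30 samples
# under a 1-component and a 2-component model.
e1 <- rnorm(30, sd = 1.2)
e2 <- e1 * 0.8 + rnorm(30, sd = 0.2)

# Paired differences in squared error; positive values favour the
# 2-component model.
d <- e1^2 - e2^2
obs <- mean(d)

# Under the null hypothesis of equal accuracy the sign of each
# difference is arbitrary, so flip signs at random and recompute
# the mean difference.
n.perm <- 999
perm <- replicate(n.perm,
                  mean(d * sample(c(-1, 1), length(d), replace = TRUE)))

# One-sided p-value: the proportion of random arrangements that do
# at least as well as the observed one.
p.value <- (sum(perm >= obs) + 1) / (n.perm + 1)
p.value
```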

Function `screeplot` displays the RMSE of prediction for the training set as a function of the number of components and is useful for estimating the optimal number for use in prediction. By default `screeplot` will also carry out a randomisation t-test and add a line to the scree plot indicating the percentage change in RMSE with each component, annotated with the p-value from the randomisation test.
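The percentage change in RMSE between successive components can be computed by hand as in the sketch below. The RMSE values are hypothetical, and how the package treats the first component is not stated here, so it is left as `NA`.

```r
# Hypothetical cross-validated RMSE for components 1 to 4.
rmsep <- c(2.00, 1.60, 1.50, 1.55)

# Percentage change relative to the previous component; negative
# values mean the extra component reduced the error. Component 1
# has no predecessor in this sketch, so it is NA.
pct.change <- c(NA, 100 * diff(rmsep) / head(rmsep, -1))
round(pct.change, 2)
```

In this made-up series the error falls with components 2 and 3 and rises again with component 4, the kind of pattern that argues against keeping the last component.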

Function `WAPLS` returns an object of class `WAPLS` with the following named elements:

`coefficients` |
species coefficients (the updated "optima"). |

`meanY` |
weighted mean of the environmental variable. |

`iswapls` |
logical indicating whether analysis was WAPLS (TRUE) or PLS (FALSE). |

`T` |
sample scores. |

`P` |
variable (species) scores. |

`npls` |
number of PLS components extracted. |

`fitted.values` |
fitted values for the training set. |

`call` |
original function call. |

`x` |
environmental variable used in the model. |

`standx` , `meanT` , `sdx` |
additional information returned for a PLS model. |

Function `crossval` also returns an object of class `WAPLS` and adds the following named elements:

`predicted` |
predicted values of each training set sample under cross-validation. |

`residuals.cv` |
prediction residuals. |

If function `predict` is called with `newdata=NULL` it returns the fitted values of the original model, otherwise it returns a list with the following named elements:

`fit` |
predicted values for |

If sample specific errors were requested the list will also include:

`fit.boot` |
mean of the bootstrap estimates of newdata. |

`v1` |
standard error of the bootstrap estimates for each new sample. |

`v2` |
root mean squared error for the training set samples, across all bootstrap samples. |

`SEP` |
standard error of prediction, calculated as the square root of v1^2 + v2^2. |
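The SEP formula can be checked by hand in a few lines. The values of `v1` and `v2` below are hypothetical and stand in for the elements returned when `sse=TRUE`.

```r
# Hypothetical bootstrap error components for three new samples.
v1 <- c(0.30, 0.60, 0.45)  # standard error of the bootstrap estimates
v2 <- 0.40                 # training-set RMSE across bootstrap samples

# Standard error of prediction: the two components combined in
# quadrature, giving one value per new sample.
SEP <- sqrt(v1^2 + v2^2)
round(SEP, 3)
```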

Function `performance` returns a matrix of performance statistics for the WAPLS model. See `performance` for a description of the summary.

Function `rand.t.test` returns a matrix of performance statistics together with columns indicating the p-value and percentage change in RMSE with each higher component (see van der Voet (1994) for details).

Steve Juggins

ter Braak, C.J.F. & Juggins, S. (1993) Weighted averaging partial least squares regression (WA-PLS): an improved method for reconstructing environmental variables from species assemblages. *Hydrobiologia*, **269/270**, 485-502.

ter Braak, C.J.F., Juggins, S., Birks, H.J.B., & Voet, H., van der (1993). Weighted averaging partial least squares regression (WA-PLS): definition and comparison with other methods for species-environment calibration. In *Multivariate Environmental Statistics* (eds G.P. Patil & C.R. Rao), pp. 525-560. Elsevier Science Publishers.

van der Voet, H. (1994) Comparing the predictive accuracy of models using a simple randomization test. *Chemometrics and Intelligent Laboratory Systems*, **25**, 313-323.

`WA`, `MAT`, `performance`, and `compare.datasets` for diagnostics.

```
# WAPLS and the IK dataset are provided by the rioja package
library(rioja)

# Load the example data: modern species, environment and core samples
data(IK)
spec <- IK$spec
SumSST <- IK$env$SumSST
core <- IK$core

# Fit the model and print it
fit <- WAPLS(spec, SumSST)
fit

# Cross-validate the model
fit.cv <- crossval(fit, cv.method="loo")

# How many components to use?
rand.t.test(fit.cv)
screeplot(fit.cv)

# Predict the core
pred <- predict(fit, core, npls=2)

# Plot predictions for component 2 - depths are in rownames
depth <- as.numeric(rownames(core))
plot(depth, pred$fit[, 2], type="b", ylab="Predicted SumSST", las=1)

# Predictions with sample specific errors
## Not run:
pred <- predict(fit, core, npls=2, sse=TRUE, nboot=1000)
pred
## End(Not run)
```
