IKFA | R Documentation |

Functions for reconstructing (predicting) environmental values from biological assemblages using Imbrie & Kipp Factor Analysis (IKFA), as used in palaeoceanography.

```
IKFA(y, x, nFact = 5, IsPoly = FALSE, IsRot = TRUE,
ccoef = 1:nFact, check.data=TRUE, lean=FALSE, ...)
IKFA.fit(y, x, nFact = 5, IsPoly = FALSE, IsRot = TRUE,
ccoef = 1:nFact, lean=FALSE)
## S3 method for class 'IKFA'
predict(object, newdata=NULL, sse=FALSE, nboot=100,
match.data=TRUE, verbose=TRUE, ...)
communality(object, y)
## S3 method for class 'IKFA'
crossval(object, cv.method="loo", verbose=TRUE, ngroups=10,
nboot=100, h.cutoff=0, h.dist=NULL, ...)
## S3 method for class 'IKFA'
performance(object, ...)
## S3 method for class 'IKFA'
rand.t.test(object, n.perm=999, ...)
## S3 method for class 'IKFA'
screeplot(x, rand.test=TRUE, ...)
## S3 method for class 'IKFA'
print(x, ...)
## S3 method for class 'IKFA'
summary(object, full=FALSE, ...)
## S3 method for class 'IKFA'
plot(x, resid=FALSE, xval=FALSE, nFact=max(x$ccoef),
xlab="", ylab="", ylim=NULL, xlim=NULL, add.ref=TRUE,
add.smooth=FALSE, ...)
## S3 method for class 'IKFA'
residuals(object, cv=FALSE, ...)
## S3 method for class 'IKFA'
coef(object, ...)
## S3 method for class 'IKFA'
fitted(object, ...)
```

`y` |
a data frame or matrix of biological abundance data. |

`x` , `object` |
a vector of environmental values to be modelled or an object of class |

`newdata` |
new biological data to be predicted. |

`nFact` |
number of factors to extract. |

`IsRot` |
logical to rotate factors. |

`ccoef` |
vector of factor numbers to include in the predictions. |

`IsPoly` |
logical to include quadratic terms of the factors as predictors in the regression. |

`check.data` |
logical to perform simple checks on the input data. |

`match.data` |
logical indicating whether the function will match two species datasets by their column names. You should only set this to |

`lean` |
logical to exclude some output from the resulting models (used when cross-validating to speed calculations). |

`full` |
logical to show head and tail of output in summaries. |

`resid` |
logical to plot residuals instead of fitted values. |

`xval` |
logical to plot cross-validation estimates. |

`xlab` , `ylab` , `xlim` , `ylim` |
additional graphical arguments to |

`add.ref` |
add 1:1 line on plot. |

`add.smooth` |
add loess smooth to plot. |

`cv.method` |
cross-validation method, either "loo", "lgo", "bootstrap" or "h-block". |

`verbose` |
logical to show feedback during cross-validation. |

`nboot` |
number of bootstrap samples. |

`ngroups` |
number of groups in leave-group-out cross-validation, or a vector containing leave-out group membership. |

`h.cutoff` |
cutoff for h-block cross-validation. Only training samples greater than |

`h.dist` |
distance matrix for use in h-block cross-validation. Usually a matrix of geographical distances between samples. |

`sse` |
logical indicating that sample specific errors should be calculated. |

`rand.test` |
logical to perform a randomisation t-test to test significance of cross validated factors. |

`n.perm` |
number of permutations for randomisation t-test. |

`cv` |
logical to indicate model or cross-validation residuals. |

`...` |
additional arguments. |

Function `IKFA` performs Imbrie and Kipp Factor Analysis, a form of Principal Components Regression (Imbrie & Kipp 1971).
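To make the idea of regressing an environmental variable on a small number of factors concrete, here is a minimal base-R sketch. It is not the package's implementation: it assumes samples are scaled to unit length before decomposition, it omits factor rotation (`IsRot`) and polynomial terms (`IsPoly`), and the toy data and all variable names are hypothetical.

```r
set.seed(1)

# Hypothetical toy data: 30 samples x 8 taxa, plus one environmental variable
y <- matrix(runif(30 * 8), nrow = 30,
            dimnames = list(NULL, paste0("sp", 1:8)))
x <- 10 + 5 * y[, 1] - 3 * y[, 2] + rnorm(30, sd = 0.2)

# Assumption: scale each sample (row) to unit length before decomposition
y_norm <- y / sqrt(rowSums(y^2))

# Extract the first nFact factors by singular value decomposition
nFact <- 3
sv <- svd(y_norm)
scores <- sv$u[, 1:nFact, drop = FALSE] %*% diag(sv$d[1:nFact], nFact)

# Regress the environmental variable on the retained factors
mod <- lm(x ~ scores)
rmse <- sqrt(mean(residuals(mod)^2))
```

In this sketch `scores` plays the role of the factor values used as predictors, and `rmse` is the apparent (not cross-validated) error of the regression.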

Function `predict` predicts values of the environmental variable for `newdata`, or returns the fitted (predicted) values from the original modern dataset if `newdata` is `NULL`. Variables are matched between training and newdata by column name (if `match.data` is `TRUE`). Use `compare.datasets` to assess conformity of two species datasets and identify possible no-analogue samples.
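Matching by column name can be pictured with a few lines of base R. This is only an illustration under stated assumptions, not the package's code: it assumes taxa missing from the new data are filled with zero and taxa absent from the training set are dropped. The objects `train`, `new` and `matched` are hypothetical.

```r
# Hypothetical training and new species matrices with partly different taxa
train <- matrix(1:6, nrow = 2,
                dimnames = list(NULL, c("sp1", "sp2", "sp3")))
new <- matrix(c(5, 7), nrow = 1,
              dimnames = list(NULL, c("sp3", "sp4")))

# Start from zeros in the training-set column layout
matched <- matrix(0, nrow = nrow(new), ncol = ncol(train),
                  dimnames = list(rownames(new), colnames(train)))

# Copy across only the taxa present in both datasets
common <- intersect(colnames(train), colnames(new))
matched[, common] <- new[, common, drop = FALSE]
```

Here `sp4` is discarded because the model has no coefficient for it, and `sp1` and `sp2` are set to zero.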

`IKFA` has methods `fitted` and `residuals` that return the fitted values (estimates) and residuals for the training set, `performance`, which returns summary performance statistics (see below), `coef`, which returns the species coefficients, and `print` and `summary` to summarise the output. `IKFA` also has a `plot` method that produces scatter plots of predicted vs observed measurements for the training set.

Function `rand.t.test` performs a randomisation t-test to test the significance of the cross-validated components after van der Voet (1994).
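The general shape of a randomisation test on paired prediction errors can be sketched in base R. This is a simplified illustration, not the package's routine: it assumes the test statistic is the mean difference in squared prediction errors between two models and that the null distribution is built by randomly flipping the sign of each difference. The error vectors `err_a` and `err_b` are hypothetical.

```r
set.seed(1)

# Hypothetical cross-validation errors from a simpler (a) and a larger (b) model
err_a <- rnorm(50, sd = 1.2)
err_b <- rnorm(50, sd = 1.0)

# Paired differences in squared error and the observed mean difference
d <- err_a^2 - err_b^2
obs <- mean(d)

# Null distribution: randomly flip the sign of each paired difference
n.perm <- 999
perm <- replicate(n.perm,
                  mean(d * sample(c(-1, 1), length(d), replace = TRUE)))

# One-sided p-value: how often a random relabelling does at least as well
p_value <- (sum(perm >= obs) + 1) / (n.perm + 1)
```

A small `p_value` in this sketch would suggest that the larger model reduces prediction error by more than chance relabelling would.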

Function `screeplot` displays the RMSE of prediction for the training set as a function of the number of factors and is useful for estimating the optimal number for use in prediction. By default `screeplot` will also carry out a randomisation t-test and add a line to the scree plot indicating the percentage change in RMSE with each component, annotated with the p-value from the randomisation test.
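The percentage change in RMSE between successive numbers of factors is simple arithmetic, sketched below in base R. The RMSE values are hypothetical, and the sign convention (positive means a reduction) is an assumption made for this illustration only.

```r
# Hypothetical cross-validated RMSE for models with 1 to 5 factors
rmsep <- c(2.10, 1.60, 1.52, 1.50, 1.51)

# Percentage reduction relative to the model with one fewer factor
# (undefined for the first factor, so NA)
pct_change <- c(NA, -diff(rmsep) / head(rmsep, -1) * 100)

round(pct_change, 1)
```

With these made-up numbers the second factor gives a large reduction, later factors add little, and the fifth makes the error slightly worse.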

Function `IKFA` returns an object of class `IKFA` with the following named elements:

`coefficients` |
species coefficients (the updated "optima"). |

`fitted.values` |
fitted values for the training set. |

`call` |
original function call. |

`x` |
environmental variable used in the model. |

`standx` , `meanT sdx` |
additional information returned for a PLSif model. |

Function `crossval` also returns an object of class `IKFA` and adds the following named elements:

`predicted` |
predicted values of each training set sample under cross-validation. |

`residuals.cv` |
prediction residuals. |
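How cross-validated predictions arise can be illustrated with a leave-group-out loop in base R. This sketch uses a plain linear model as a stand-in for IKFA and is not the package's code. The data, the group assignment `grp` and the names `predicted` and `rmsep` are hypothetical.

```r
set.seed(42)

# Hypothetical training data: one predictor z and one response x
n <- 40
dat <- data.frame(z = runif(n))
dat$x <- 2 + 3 * dat$z + rnorm(n, sd = 0.3)

# Randomly assign each sample to one of ngroups leave-out groups
ngroups <- 10
grp <- sample(rep(seq_len(ngroups), length.out = n))

# Predict each group from a model fitted without it
predicted <- numeric(n)
for (g in seq_len(ngroups)) {
  held_out <- grp == g
  mod <- lm(x ~ z, data = dat[!held_out, ])
  predicted[held_out] <- predict(mod, newdata = dat[held_out, ])
}

# Root mean squared error of prediction under cross-validation
rmsep <- sqrt(mean((predicted - dat$x)^2))
```

Each sample's entry in `predicted` comes from a model that never saw that sample, which is what distinguishes these values from the fitted values.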

If function `predict` is called with `newdata=NULL` it returns the fitted values of the original model, otherwise it returns a list with the following named elements:

`fit` |
predicted values for |

If sample specific errors were requested the list will also include:

`fit.boot` |
mean of the bootstrap estimates of newdata. |

`v1` |
standard error of the bootstrap estimates for each new sample. |

`v2` |
root mean squared error for the training set samples, across all bootstrap samples. |

`SEP` |
standard error of prediction, calculated as the square root of v1^2 + v2^2. |
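The way the two error components combine into the standard error of prediction can be checked with a short base-R calculation. The numbers are hypothetical and `v2` is treated here as a single value for simplicity.

```r
# Hypothetical bootstrap standard errors for three new samples
v1 <- c(0.4, 0.6, 0.5)

# Hypothetical training-set RMSE across bootstrap samples
v2 <- 1.2

# Standard error of prediction: square root of v1^2 + v2^2
SEP <- sqrt(v1^2 + v2^2)
```

Because the components are added in quadrature, `SEP` is never smaller than `v2`, and samples with larger `v1` receive larger prediction errors.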

Function `performance` returns a matrix of performance statistics for the IKFA model. See `performance` for a description of the summary.

Function `rand.t.test` returns a matrix of performance statistics together with columns indicating the p-value and percentage change in RMSE with each higher component (see van der Voet (1994) for details).

Steve Juggins

Imbrie, J. & Kipp, N.G. (1971). A new micropaleontological method for quantitative paleoclimatology: application to a Late Pleistocene Caribbean core. In *The Late Cenozoic Glacial Ages* (ed K.K. Turekian), pp. 77-181. Yale University Press, New Haven.

van der Voet, H. (1994). Comparing the predictive accuracy of models using a simple randomization test. *Chemometrics and Intelligent Laboratory Systems*, **25**, 313-323.

`WA`, `MAT`, `performance`, and `compare.datasets` for diagnostics.

```
data(IK)
spec <- IK$spec
SumSST <- IK$env$SumSST
core <- IK$core
fit <- IKFA(spec, SumSST)
fit
# cross-validate model
fit.cv <- crossval(fit, cv.method="lgo")
# How many components to use?
screeplot(fit.cv)
# predict the core
pred <- predict(fit, core)
#plot predictions - depths are in rownames
depth <- as.numeric(rownames(core))
plot(depth, pred$fit[, 2], type="b")
# fit using only factors 1, 2, 4, & 5
# and using polynomial terms
# as Imbrie & Kipp (1971)
fit2 <- IKFA(spec, SumSST, ccoef=c(1, 2, 4, 5), IsPoly=TRUE)
fit2.cv <- crossval(fit2, cv.method="lgo")
screeplot(fit2.cv)
## Not run:
# predictions with sample specific errors
# takes approximately 1 minute to run
pred <- predict(fit, core, sse=TRUE, nboot=1000)
pred
## End(Not run)
```
