ipls: Variable selection with interval PLS

ipls R Documentation

Variable selection with interval PLS

Description

Applies the iPLS algorithm to find the variable intervals that are most important for prediction.

Usage

ipls(
  x,
  y,
  glob.ncomp = 10,
  center = TRUE,
  scale = FALSE,
  cv = list("ven", 10),
  exclcols = NULL,
  exclrows = NULL,
  int.ncomp = glob.ncomp,
  int.num = NULL,
  int.width = NULL,
  int.limits = NULL,
  int.niter = NULL,
  ncomp.selcrit = "min",
  method = "forward",
  x.test = NULL,
  y.test = NULL,
  silent = FALSE,
  full = FALSE,
  cv.scope = "local"
)

Arguments

x

a matrix with predictor values.

y

a vector with response values.

glob.ncomp

maximum number of components for a global PLS model.

center

logical, whether to center the data values.

scale

logical, whether to standardize the data values.

cv

cross-validation settings (see details).

exclcols

columns of x to be excluded from calculations (numbers, names or vector with logical values).

exclrows

rows to be excluded from calculations (numbers, names or vector with logical values).

int.ncomp

maximum number of components for interval PLS models.

int.num

number of intervals.

int.width

width of intervals.

int.limits

a two-column matrix with manually specified interval limits (first and last variable number of each interval).

int.niter

maximum number of iterations (if NULL, the smaller of two values is used: the number of intervals and 30).

ncomp.selcrit

criterion for selecting optimal number of components ('min' for minimum of RMSECV).

method

iPLS method ('forward' or 'backward').

x.test

matrix with predictors for the test set (NULL by default; if specified, it is used instead of cross-validation).

y.test

matrix with responses for test set.

silent

logical, whether to suppress information about the selection process (the information is shown when FALSE).

full

logical, if TRUE the procedure will continue even if no improvement is observed.

cv.scope

scope for center/scale operations inside the CV loop: 'global' uses the globally computed mean and std, 'local' recomputes them for each local calibration set.

Details

The algorithm splits the predictors into several intervals and tries to find the combination of intervals that gives the best prediction performance. There are two selection methods: "forward", where intervals are successively included, and "backward", where intervals are successively excluded from the model. In the first step the algorithm finds the best (forward) or the worst (backward) individual interval. It then tests the remaining intervals to find the one that gives the best model in combination with those already selected/excluded. The procedure continues until no improvement is observed or the maximum number of iterations is reached.
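
The forward variant of this procedure can be sketched in base R. This is only an illustration of the selection loop, not the package's implementation: ordinary least squares (lm.fit) stands in for PLS, a single hold-out split stands in for cross-validation, and the data, the `limits` matrix and the helper `rmse_for` are all hypothetical.

```r
# Toy forward interval selection (assumption: OLS replaces PLS,
# one hold-out split replaces cross-validation).
set.seed(1)
n <- 60
p <- 12
X <- matrix(rnorm(n * p), n, p)
# only variables 5 and 6 (second interval) carry information about y
y <- X[, 5] + 0.5 * X[, 6] + rnorm(n, sd = 0.1)

# four intervals of three variables each; columns are first/last variable
limits <- cbind(seq(1, p, by = 3), seq(3, p, by = 3))
cal <- 1:40    # calibration rows
val <- 41:60   # validation rows

# hold-out RMSE of a model built on the variables of the given intervals
rmse_for <- function(ints) {
  cols <- unlist(lapply(ints, function(i) limits[i, 1]:limits[i, 2]))
  fit <- lm.fit(cbind(1, X[cal, cols, drop = FALSE]), y[cal])
  pred <- cbind(1, X[val, cols, drop = FALSE]) %*% fit$coefficients
  sqrt(mean((y[val] - pred)^2))
}

selected <- integer(0)
best <- Inf
repeat {
  candidates <- setdiff(seq_len(nrow(limits)), selected)
  if (length(candidates) == 0) break
  # try adding each remaining interval to the ones already selected
  errs <- sapply(candidates, function(i) rmse_for(c(selected, i)))
  if (min(errs) >= best) break   # stop when no improvement is observed
  best <- min(errs)
  selected <- c(selected, candidates[which.min(errs)])
}
selected
```

The backward variant would start from all intervals and remove, at each step, the interval whose exclusion gives the lowest error.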

There are several ways to specify the intervals. Either the number of intervals (int.num) or the width of the intervals (int.width) can be provided. Alternatively, the limits (first and last variable number) of each interval can be specified manually with int.limits.
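
As an illustration of the manual form, the following base R sketch builds a two-column matrix of limits that could be passed as int.limits. The number of predictors (150) and the width (20) are assumed values chosen for the example; how ipls itself treats a remainder when int.width does not divide the number of variables is not stated here.

```r
nvar <- 150       # hypothetical number of predictors
int.width <- 20   # hypothetical interval width

# first and last variable number of each interval;
# the last interval is truncated at nvar
first <- seq(1, nvar, by = int.width)
last <- pmin(first + int.width - 1, nvar)
int.limits <- cbind(first, last)
int.limits
```

Each row describes one interval, so intervals of unequal width can be defined by editing the rows directly.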

Cross-validation settings, cv, can be a number or a list. If cv is a number, it will be used as the number of segments for random cross-validation (if cv = 1, full cross-validation will be performed). If it is a list, the following syntax can be used: cv = list('rand', nseg, nrep) for random repeated cross-validation with nseg segments and nrep repetitions or cv = list('ven', nseg) for systematic splits to nseg segments ('venetian blinds').
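
The forms of cv described above can be written as follows, together with a minimal sketch of a venetian blinds split. The variable names are hypothetical, and the split is a simplified illustration: it assigns objects to segments in their given order and does not claim to reproduce how the package orders objects before splitting.

```r
# forms of the cv argument described above
cv_random   <- 10                    # random CV with 10 segments
cv_full     <- 1                     # full cross-validation
cv_rand_rep <- list("rand", 4, 4)    # random, 4 segments, 4 repetitions
cv_venetian <- list("ven", 8)        # venetian blinds, 8 segments

# venetian blinds sketch: object i goes to segment ((i - 1) mod nseg) + 1
nobj <- 10
nseg <- 4
segment <- (seq_len(nobj) - 1) %% nseg + 1
segments <- split(seq_len(nobj), segment)
segments
```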

Value

object of 'ipls' class with several fields, including:

var.selected

a vector with indices of selected variables

int.selected

a vector with indices of selected intervals

int.num

total number of intervals

int.width

width of the intervals

int.limits

a matrix with limits for each interval

int.stat

a data frame with statistics for the selection algorithm

glob.stat

a data frame with statistics for the first step (individual intervals)

gm

global PLS model with all variables included

om

optimized PLS model with selected variables

References

[1] Lars Noergaard et al. Interval partial least-squares regression (iPLS): a comparative chemometric study with an example from near-infrared spectroscopy. Appl. Spectrosc. 2000; 54: 413-419

Examples

library(mdatools)

## forward selection for simdata

data(simdata)
Xc = simdata$spectra.c
yc = simdata$conc.c[, 3, drop = FALSE]

# run iPLS and show results
im = ipls(Xc, yc, int.ncomp = 5, int.num = 10, cv = 4, method = "forward")
summary(im)
plot(im)

# show how RMSECV develops during the algorithm execution
plotRMSE(im)

# plot predictions before and after selection
par(mfrow = c(1, 2))
plotPredictions(im$gm)
plotPredictions(im$om)

# show selected intervals on spectral plot
ind = im$var.selected
mspectrum = apply(Xc, 2, mean)
plot(simdata$wavelength, mspectrum, type = 'l', col = 'lightblue')
points(simdata$wavelength[ind], mspectrum[ind], pch = 16, col = 'blue')


svkucheryavski/mdatools documentation built on Aug. 25, 2023, 12:27 p.m.