autopls: autopls


Description

Partial least squares regression with backward selection of predictors

Usage

autopls (formula, data, testset = NULL, tselect = "none", prep = "none", 
  val = "LOO", scaling = TRUE, stingy = TRUE, verbose = TRUE, 
  backselect = "auto", jt.thresh = 0.1, vip.thresh = 0.2, jump = NA, 
  lower = NA, method = "oscorespls")

Arguments

formula

model formula

data

optional data frame with the data to fit the model

testset

optional vector defining a test set (row indices)

tselect

string specifying the role of the test set in model selection ("none", "passive" or "active", see details)

prep

character. Optional preprocessing (only one choice is implemented: "bn", see details)

val

character. Validation used ("CV" or "LOO", see details)

scaling

logical. If TRUE, predictors are scaled by dividing each variable by its standard deviation. This is repeated in all validation steps

stingy

logical. If TRUE, the number of latent vectors is kept low during backward selection

verbose

logical. If TRUE, details about the backward selection processes are reported

backselect

one or more character strings defining the methods used in backwards selection (see details). "no" means no backselection. Defaults to "auto"

jt.thresh

threshold used in predictor selections that are based on jackknife testing (methods based on A1, see details)

vip.thresh

threshold used in predictor selections that are based on VIP (methods based on A2, see details). VIP is scaled to a maximum of 1.

jump

numeric. If a number is given, backward selection starts with a forced reduction of predictors to the given number (see A0 in details). This reduction is based on significance in jackknifing. The argument can be useful in the case of large predictor matrices.

lower

numeric. Backward selection proceeds as long as R2 in validation reaches the given value (experimental, backward selection continues further if models improve in other respects such as decreasing numbers of latent vectors).

method

character string indicating what plsr method to use. autopls works with the orthogonal scores algorithm ("oscorespls") and with the kernel algorithm ("kernelpls").

Details

The autopls function is a wrapper for plsr in package pls, written by Bjørn-Helge Mevik, Ron Wehrens and Kristian Hovde Liland. For now, the wrapper can be cited as Schmidtlein et al. (2012). autopls works only with a single target variable.

If val = "CV", 10-fold cross-validation is performed. If val = "LOO", leave-one-out cross-validation is performed. Test set validation always takes place if a test set has been defined. tselect specifies how the test set is used in model selection. "none": use it only for external validation; "passive": use the error in external validation for model selection, but not for determining the number of latent vectors; "active": use the error in external validation both for model selection and for determining the number of latent vectors.

With stingy = TRUE, the errors used in the selection are measured at a number of latent vectors that depends on the number of observations (at most 1/10 of the number of observations). Otherwise, the number of latent vectors is chosen where the errors approach a first minimum. To avoid minor local minima, the error values are smoothed first.
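As a minimal sketch of the options above: the first lines compute the cap on latent vectors that stingy = TRUE implies for 40 observations, using the expression ceiling (nrow (pred) / 10) quoted under rmse.history. The second part calls autopls with a test set in "passive" mode and 10-fold cross-validation. The choice of 10 test rows and the seed are arbitrary assumptions for illustration, and the call is guarded so that it is skipped when the autopls package is not installed.

```r
# Cap on the number of latent vectors under stingy = TRUE,
# following ceiling(nrow(pred) / 10) for 40 observations
n_obs <- 40
lv_cap <- ceiling(n_obs / 10)
lv_cap

# Guarded call: only runs if the autopls package is available
if (requireNamespace("autopls", quietly = TRUE)) {
  library(autopls)
  data(murnau.X)
  data(murnau.Y)

  # Arbitrary test set of 10 row indices (assumption for illustration)
  set.seed(42)
  test_rows <- sample(NROW(murnau.X), 10)

  # The test set error informs model selection ("passive"),
  # but not the choice of the number of latent vectors
  model_cv <- autopls(murnau.Y ~ murnau.X,
                      testset = test_rows,
                      tselect = "passive",
                      val = "CV",
                      verbose = FALSE)
}
```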

Large data matrices: Consider the argument jump (forced reduction of predictors in the first iteration). Large model objects can be shrunk using the function slim, but some functionality (such as plotting or changing the number of latent vectors) is lost. Shrunk models can still be used for predictions.
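The following sketch shows how jump and slim might be combined. The value jump = 15 is an arbitrary choice for the example data, and slim is called with the model as its only argument, which is an assumption about its signature. The code is guarded so that it is skipped when the autopls package is not installed.

```r
if (requireNamespace("autopls", quietly = TRUE)) {
  library(autopls)
  data(murnau.X)
  data(murnau.Y)

  # Force a reduction to 15 predictors in the first iteration
  # (15 is an arbitrary value chosen for illustration)
  model_jump <- autopls(murnau.Y ~ murnau.X, jump = 15, verbose = FALSE)

  # Shrink the model object; assumed call with a single argument
  model_small <- slim(model_jump)

  # Compare the memory footprint of both objects
  print(object.size(model_jump), units = "Kb")
  print(object.size(model_small), units = "Kb")
}
```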

Preprocessing options: The only implemented option is currently "bn", which is a brightness normalization according to Feilhauer et al. (2010).
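To make the idea of brightness normalization concrete, here is a base R sketch on a small hypothetical reflectance matrix. It assumes that the normalization divides each spectrum (row) by its Euclidean norm; this is an illustration of the concept and not the code used by autopls for prep = "bn".

```r
# Hypothetical reflectance matrix: 3 spectra (rows) x 4 bands (columns)
X <- matrix(c(0.10, 0.20, 0.30, 0.40,
              0.20, 0.40, 0.60, 0.80,
              0.05, 0.10, 0.10, 0.20),
            nrow = 3, byrow = TRUE)

# Brightness of each spectrum, taken here as its Euclidean norm (assumption)
brightness <- sqrt(rowSums(X^2))

# Divide every row by its own brightness; the vector is recycled
# down the columns, so element i applies to row i
X_bn <- X / brightness

# After normalization every spectrum has unit length
sqrt(rowSums(X_bn^2))
```

Rows 1 and 2 differ only by a constant factor, so they become identical after normalization: the overall brightness is removed and only the shape of the spectrum remains.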

Several methods for predictor selection are available. In default mode (backselect = "auto"), the selection follows an optimization procedure using methods A1 and A3. Apart from A0, however, any user-defined combination can be selected using the backselect argument. Note that VIP-based methods (A2, A3, B3 to B6) are meant to be used with the oscorespls method, and that methods B1 to B6 and C1 only make sense with sequences of spectral bands or similar sequences of autocorrelated predictors. The methods are coded as follows:

A) Filtering based on thresholds

(A0 and A1) Based on significance, A0 with user-defined threshold (see argument jump); (A2) based on VIP; (A3) based on combined significance and VIP; (A4) removal of the 10 % of predictors with the lowest significance; (A5) removal of the 25 % of predictors with the lowest significance.

B) Filtering followed by reduction of autocorrelation

(B1) Filtering based on significance, thinning starting with local maxima in weighted regression coefficients; (B2) filtering based on significance, thinning starting with local maxima in significance; (B3) filtering based on significance, thinning starting with local maxima in VIP; (B4) filtering based on VIP, thinning starting with local maxima in weighted regression coefficients; (B5) filtering based on VIP, thinning starting with local maxima in significance; (B6) filtering based on VIP, thinning starting with local maxima in VIP.

C) Just reduction of autocorrelation

(C1): reduction starting with local maxima in regression coefficients.
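The threshold filters in group A can be illustrated with a base R sketch on hypothetical per-predictor statistics. The p-values and VIP values below are invented, the comparison directions (keep p <= jt.thresh, keep scaled VIP >= vip.thresh) are assumptions, and rounding the 10 % up is also an assumption, although it is consistent with the A4 steps in the example output (23 to 20, 20 to 18, 11 to 9 predictors). This is not the code autopls runs internally.

```r
# Hypothetical jackknife p-values and VIP values for 10 predictors
p_values <- c(0.01, 0.40, 0.03, 0.08, 0.65, 0.02, 0.15, 0.09, 0.30, 0.05)
vip      <- c(1.80, 0.20, 1.20, 0.90, 0.10, 1.50, 0.60, 0.80, 0.30, 1.00)

jt_thresh  <- 0.1   # default of jt.thresh
vip_thresh <- 0.2   # default of vip.thresh

# A1-like filter: keep predictors that are significant in jackknifing
keep_a1 <- p_values <= jt_thresh

# A2-like filter: scale VIP to a maximum of 1, then apply the threshold
vip_scaled <- vip / max(vip)
keep_a2 <- vip_scaled >= vip_thresh

# A4-like filter: drop the 10 % of predictors with the lowest
# significance, i.e. the highest p-values (rounded up, an assumption)
n_drop <- ceiling(length(p_values) / 10)
dropped <- order(p_values, decreasing = TRUE)[seq_len(n_drop)]
keep_a4 <- !(seq_along(p_values) %in% dropped)

sum(keep_a1)
sum(keep_a2)
sum(keep_a4)
```

In autopls itself, a user-defined combination is requested through the backselect argument, for example backselect = c("A1", "A4").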

Value

An object of class autopls is returned. This equals a pls object and some added objects:

predictors

logical. Vector of predictors that have been or have not been used in the current model

metapls

outcomes of the backward selection process

iterations

models selected during the backward selection process

The $metapls item consists of the following:

current.iter

iteration of the backward selection procedure the current model is based upon

autopls.iter

iteration of the backward selection procedure originally selected by autopls

current.lv

number of latent vectors the current model is based upon

autopls.lv

number of latent vectors originally selected by autopls

lv.history

sequence of the numbers of latent vectors selected during iterations in backward selection

rmse.history

sequence of root mean squared errors obtained during iterations in backward selection. Errors are reported for calibration and validation. The validation errors are also reported for the number of latent vectors corresponding to ceiling (nrow (pred) / 10).

r2.history

sequence of R2 values obtained during iterations in backward selection

X

original predictors

Y

original target variable

X.testset

test set: predictors

Y.testset

test set: target variable

preprocessing

method used for preprocessing

scaling

TRUE if scaling was requested

val

LOO or CV

call

the function call
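A short sketch of how the returned object might be inspected, using only the item names listed above. The internal layout of rmse.history is not described here, so str is used to look at it rather than assuming a structure. The code is guarded so that it is skipped when the autopls package is not installed.

```r
if (requireNamespace("autopls", quietly = TRUE)) {
  library(autopls)
  data(murnau.X)
  data(murnau.Y)

  model <- autopls(murnau.Y ~ murnau.X, verbose = FALSE)

  # Number of predictors used in the current model
  sum(model$predictors)

  # Outcomes of the backward selection process
  meta <- model$metapls
  meta$current.iter   # iteration the current model is based upon
  meta$current.lv     # number of latent vectors of the current model
  meta$lv.history     # latent vectors selected in each iteration

  # Inspect the error history without assuming its layout
  str(meta$rmse.history)
}
```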

Author(s)

Sebastian Schmidtlein with contributions from Carsten Oldenburg and Hannes Feilhauer. The code for computing VIP is borrowed from Bjørn-Helge Mevik.

References

Feilhauer, H., Asner, G.P., Martin, R.E., Schmidtlein, S. (2010): Brightness-normalized Partial Least Squares regression for hyperspectral data. Journal of Quantitative Spectroscopy and Radiative Transfer 111: 1947–1957.

Schmidtlein, S., Feilhauer, H., Bruelheide, H. (2012): Mapping plant strategy types using remote sensing. Journal of Vegetation Science 23: 395–405. Open Access.

See Also

pls, set.iter, set.lv, predict.autopls, plot.autopls

Examples

  ## load predictor and response data to the current environment
  data (murnau.X)
  data (murnau.Y)
  
  ## call autopls with the standard options
  model <- autopls (murnau.Y ~ murnau.X)
  
  ## S3 plot method
  ## Not run: plot (model)
  ## Not run: plot (model, type = "rc")
  
  ## Loading and score plots
  ## Not run: plot (model$loadings, main = "Loadings")
  ## Not run: plot (model$loadings [,c(1,3)], main = "Loadings")
  ## Not run: plot (model$scores, main = "Scores")
  

Example output

Loading required package: pls

Attaching package: 'pls'

The following object is masked from 'package:stats':

    loadings

autopls 1.3
1   Pred: 26  LV: 3   R2v: 0.74   RMSEv: 4.727  
2   Pred: 23  LV: 3   R2v: 0.742  RMSEv: 4.705  Criterion: A1
3   Pred: 20  LV: 3   R2v: 0.749  RMSEv: 4.645  Criterion: A4
4   Pred: 18  LV: 3   R2v: 0.752  RMSEv: 4.611  Criterion: A4
5   Pred: 16  LV: 3   R2v: 0.752  RMSEv: 4.61   Criterion: A4
6   Pred: 13  LV: 3   R2v: 0.76   RMSEv: 4.537  Criterion: A1
7   Pred: 11  LV: 3   R2v: 0.768  RMSEv: 4.466  Criterion: A4
8   Pred: 9   LV: 3   R2v: 0.775  RMSEv: 4.397  Criterion: A4

Predictors: 9   Observations: 40   Latent vectors: 3   Run: 8 
RMSE(CAL): 4.09   RMSE(LOO): 4.4   
R2(CAL): 0.805    R2(LOO): 0.775   

autopls documentation built on May 2, 2019, 9:39 a.m.
