Description

Partial least squares regression with backward selection of predictors.

Arguments
formula: model formula

data: optional data frame with the data to fit the model

testset: optional vector defining a test set (row indices)

tselect: character. Role of the test set in model selection ("none", "passive" or "active", see details)

prep: character. Optional preprocessing (only one choice implemented: "bn", see details)

val: character. Validation used ("LOO" or "CV", see details)

scaling: logical. If TRUE, predictors are scaled

stingy: logical. If TRUE, the number of latent vectors used when measuring errors in model selection is restricted depending on the number of observations (see details)

verbose: logical. If TRUE, progress is reported during the backward selection

backselect: one or more character strings defining the methods used in backward selection (see details)

jt.thresh: threshold used in predictor selections that are based on jackknife testing (methods based on A1, see details)

vip.thresh: threshold used in predictor selections that are based on VIP (methods based on A2, see details). VIP is scaled to a maximum of 1

jump: numeric. If a number is given, backward selection starts with a forced reduction of predictors to the given number (see A0 in details). The reduction is based on significance in jackknifing. This can be useful for large predictor matrices

lower: numeric. Backward selection proceeds as long as R2 in validation reaches the given value (experimental; backward selection continues further if models improve in other respects, such as a decreasing number of latent vectors)

method: character string indicating which plsr method to use
Details

The autopls function is a wrapper for plsr in package pls, written by Bjørn-Helge Mevik, Ron Wehrens and Kristian Hovde Liland. For now, the wrapper can be cited as Schmidtlein et al. (2012). autopls works only for single target variables.

If val = "CV", 10-fold cross-validation is performed. If val = "LOO", leave-one-out cross-validation is performed. Test set validation always takes place if a test set has been defined.
tselect specifies how the test set is used in model selection. "none": just use it for external validation; "passive": use the error in external validation for model selection but not for the determination of the number of latent vectors; "active": use the error in external validation both for model selection and for the determination of the number of latent vectors. With stingy = TRUE, the errors used in the selection are measured at a number of latent vectors that depends on the number of observations (1/10 at maximum). Otherwise, the number of latent vectors is chosen where the errors approach a first minimum. In order to avoid minor local minima, the error values are smoothed first.
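As an illustrative sketch of these arguments (the choice of test set rows and argument values are assumptions, not recommendations), a call with a passively used test set could look like this:

```r
## Sketch (row choice is an arbitrary assumption): hold out every fourth
## observation as a test set that steers model selection passively
data (murnau.X)
data (murnau.Y)
test.rows <- seq (4, nrow (murnau.X), by = 4)
model <- autopls (murnau.Y ~ murnau.X, testset = test.rows,
                  tselect = "passive", val = "LOO")
```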
Large data matrices: examine the argument jump (forced reduction of predictors in the first iteration). Large model objects can be shrunk using the function slim, but some functionality (such as plotting or changing the number of latent vectors) is lost. Shrunken models can still be used for predictions.
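The two can be combined for large problems; in this hypothetical sketch, X.big, Y.big and the target of 50 predictors are assumptions for illustration only:

```r
## Hypothetical sketch: force an initial reduction to 50 predictors,
## then shrink the fitted model object for storage
model <- autopls (Y.big ~ X.big, jump = 50)
small <- slim (model)   # smaller object, still usable for predictions
```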
Preprocessing options: the only option currently implemented is "bn", a brightness normalization according to Feilhauer et al. (2010).
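A minimal sketch of what "bn" computes, under the assumption that brightness normalization divides each spectrum (row) by its Euclidean norm so that only spectral shape remains:

```r
## Sketch of brightness normalization (assumption: division by the
## Euclidean norm of each spectrum, cf. Feilhauer et al. 2010)
bn <- function (X) X / sqrt (rowSums (X ^ 2))
```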
Several methods for predictor selection are available. In default mode (backselect = "auto") the selection follows an optimization procedure using methods A1 and A3. However, apart from A0, any user-defined combination can be selected using the backselect argument. Note that VIP-based methods (A2, A3, B3 to B6) are meant to be used with the oscorespls method, and that methods B1 to B6 and C1 only make sense with sequences of spectral bands or similar sequences of autocorrelated predictors. The methods are coded as follows:

A) Filtering based on thresholds: (A0, A1) based on significance, A0 with a user-defined threshold (see argument jump); (A2) based on VIP; (A3) based on combined significance and VIP; (A4) removal of the 10% of predictors with the lowest significance; (A5) removal of the 25% of predictors with the lowest significance.

B) Filtering followed by reduction of autocorrelation: (B1) filtering based on significance, thinning starting with local maxima in weighted regression coefficients; (B2) filtering based on significance, thinning starting with local maxima in significance; (B3) filtering based on significance, thinning starting with local maxima in VIP; (B4) filtering based on VIP, thinning starting with local maxima in weighted regression coefficients; (B5) filtering based on VIP, thinning starting with local maxima in significance; (B6) filtering based on VIP, thinning starting with local maxima in VIP.

C) Reduction of autocorrelation only: (C1) reduction starting with local maxima in regression coefficients.
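As an illustration of the threshold logic only (not of the autopls internals), an A2-type filter applied to given VIP values could be sketched as:

```r
## Illustrative sketch of an A2-type filter: scale VIP to a maximum of 1
## (as stated for vip.thresh) and keep predictors at or above the threshold
vip.filter <- function (vip, vip.thresh)
{
  vip <- vip / max (vip)   # scale to a maximum of 1
  vip >= vip.thresh        # logical vector of retained predictors
}
```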
Value

An object of class autopls is returned. This equals a pls object with some added items:

predictors: logical vector indicating which predictors have or have not been used in the current model

metapls: outcomes of the backward selection process

iterations: models selected during the backward selection process

The $metapls item consists of the following:

current.iter: iteration of the backward selection procedure the current model is based upon

autopls.iter: iteration of the backward selection procedure originally selected by autopls

current.lv: number of latent vectors the current model is based upon

autopls.lv: number of latent vectors originally selected by autopls

lv.history: sequence of the numbers of latent vectors selected during the iterations of backward selection

rmse.history: sequence of root mean squared errors obtained during the iterations of backward selection; errors are reported for calibration and validation, and the validation errors are also reported for the corresponding numbers of latent vectors

r2.history: sequence of r2 values obtained during the iterations of backward selection

X: original predictors

Y: original target variable

X.testset: test set: predictors

Y.testset: test set: target variable

preprocessing: method used for preprocessing

scaling: scaling used (logical)

val: validation method used

call: the function call
Author(s)

Sebastian Schmidtlein, with contributions from Carsten Oldenburg and Hannes Feilhauer. The code for computing VIP is borrowed from Bjørn-Helge Mevik.
References

Feilhauer, H., Asner, G.P., Martin, R.E., Schmidtlein, S. (2010): Brightness-normalized partial least squares regression for hyperspectral data. Journal of Quantitative Spectroscopy and Radiative Transfer 111: 1947–1957.

Schmidtlein, S., Feilhauer, H., Bruelheide, H. (2012): Mapping plant strategy types using remote sensing. Journal of Vegetation Science 23: 395–405. Open Access.
See Also

pls, set.iter, set.lv, predict.autopls, plot.autopls
Examples

## load predictor and response data to the current environment
data (murnau.X)
data (murnau.Y)
## call autopls with the standard options
model <- autopls (murnau.Y ~ murnau.X)
## S3 plot method
## Not run: plot (model)
## Not run: plot (model, type = "rc")
## Loading and score plots
## Not run: plot (model$loadings, main = "Loadings")
## Not run: plot (model$loadings [,c(1,3)], main = "Loadings")
## Not run: plot (model$scores, main = "Scores")
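Building on the example above, the selected iteration or number of latent vectors can be changed afterwards using the functions listed under See Also (a sketch; the chosen values are arbitrary):

```r
## Not run: model2 <- set.iter (model, 5)   # switch to iteration 5
## Not run: model3 <- set.lv (model2, 2)    # use 2 latent vectors
## Not run: small <- slim (model3)          # shrink for storage
```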
## Output of the example run:
autopls 1.3
1 Pred: 26 LV: 3 R2v: 0.74 RMSEv: 4.727
2 Pred: 23 LV: 3 R2v: 0.742 RMSEv: 4.705 Criterion: A1
3 Pred: 20 LV: 3 R2v: 0.749 RMSEv: 4.645 Criterion: A4
4 Pred: 18 LV: 3 R2v: 0.752 RMSEv: 4.611 Criterion: A4
5 Pred: 16 LV: 3 R2v: 0.752 RMSEv: 4.61 Criterion: A4
6 Pred: 13 LV: 3 R2v: 0.76 RMSEv: 4.537 Criterion: A1
7 Pred: 11 LV: 3 R2v: 0.768 RMSEv: 4.466 Criterion: A4
8 Pred: 9 LV: 3 R2v: 0.775 RMSEv: 4.397 Criterion: A4
Predictors: 9 Observations: 40 Latent vectors: 3 Run: 8
RMSE(CAL): 4.09 RMSE(LOO): 4.4
R2(CAL): 0.805 R2(LOO): 0.775