evaluatorPLS | R Documentation |
Creates the object that controls the evaluation step in the genetic algorithm
evaluatorPLS(
numReplications = 30L,
innerSegments = 7L,
outerSegments = 1L,
testSetSize = NULL,
numThreads = NULL,
maxNComp = NULL,
method = c("simpls"),
sdfact = 1
)
numReplications | The number of replications used to evaluate a variable subset (must be between 1 and 2^16) |
innerSegments | The number of CV segments used in one replication (must be between 2 and 2^16) |
outerSegments | The number of outer CV segments used in one replication (between 0 and 2^16). If this is greater than 1, the repeated double cross-validation (rdCV) strategy is used instead of simple repeated cross-validation (srCV) (see details) |
testSetSize | The relative size of the test set used for simple repeated CV (between 0 and 1). If outerSegments > 1, this parameter is ignored and a warning is issued. |
numThreads | The maximum number of threads the algorithm is allowed to spawn (a value less than 1 or NULL means no threads) |
maxNComp | The maximum number of components the PLS models should consider (if not specified, the number of components is not constrained) |
method | The PLS method used to fit the PLS model (currently only SIMPLS is implemented) |
sdfact | The factor by which the standard deviation of the MSEP values is scaled when selecting the optimal number of components. For the "one standard error rule", use sdfact = 1 (the default). |
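To illustrate how sdfact can enter the choice of the number of components, the following base-R sketch applies an "sdfact standard deviations" rule to made-up cross-validated MSEP estimates: it finds the component count with the lowest mean MSEP and then picks the smallest count whose mean MSEP lies within sdfact times the standard deviation at that minimum. The helper selectNComp and the MSEP numbers are hypothetical; gaselect's internal selection rule may differ in its details.

```r
# Hypothetical helper (not part of gaselect): choose the number of PLS
# components from cross-validated MSEP estimates, one entry per component count.
selectNComp <- function(msepMean, msepSd, sdfact = 1) {
  best <- which.min(msepMean)                             # lowest mean MSEP
  threshold <- msepMean[best] + sdfact * msepSd[best]     # tolerated MSEP
  # smallest number of components whose mean MSEP does not exceed the threshold
  min(which(msepMean <= threshold))
}

# Made-up MSEP means and standard deviations for 1 to 6 components
msepMean <- c(4.10, 2.30, 1.70, 1.56, 1.50, 1.55)
msepSd   <- c(0.30, 0.20, 0.12, 0.11, 0.10, 0.12)

selectNComp(msepMean, msepSd, sdfact = 1)  # 4 components ("one standard error rule")
selectNComp(msepMean, msepSd, sdfact = 0)  # 5 components (plain minimum MSEP)
```

With sdfact = 0 the rule reduces to picking the model with the minimum mean MSEP; larger values favour models with fewer components.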
With this method the genetic algorithm uses PLS regression models to assess the prediction power of variable subsets. By default, simple repeated cross-validation (srCV) is used. The optimal number of PLS components is estimated by cross-validation with innerSegments segments on a training set. The prediction power is then evaluated by fitting a PLS regression model with this optimal number of components to the training set and predicting the values of a test set. The relative size of the test set is testSetSize or, if testSetSize is not specified, 1 / innerSegments.
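The srCV partitioning described above can be sketched in base R. The function srcvSplit is a hypothetical illustration, not part of gaselect: it draws a random test set of relative size testSetSize (or 1 / innerSegments if testSetSize is NULL) and assigns the remaining training observations at random to innerSegments CV segments. Fitting the PLS models on these segments is omitted.

```r
# Hypothetical sketch (not gaselect's internal code): partition n observations
# for one srCV replication.
srcvSplit <- function(n, innerSegments = 7L, testSetSize = NULL) {
  # relative test-set size: testSetSize, or 1 / innerSegments if unspecified
  relSize <- if (is.null(testSetSize)) 1 / innerSegments else testSetSize
  nTest <- round(n * relSize)
  perm <- sample.int(n)                  # random order of the observations
  isTest <- seq_len(n) <= nTest
  test <- perm[isTest]                   # held-out test set
  train <- perm[!isTest]                 # training set for the inner CV
  # random assignment of each training observation to an inner CV segment
  innerSegment <- rep_len(seq_len(innerSegments), length(train))[sample.int(length(train))]
  list(test = sort(test), train = train, innerSegment = innerSegment)
}

set.seed(1)
parts <- srcvSplit(200, innerSegments = 7L, testSetSize = 0.4)
length(parts$test)   # 80 observations in the test set
table(parts$innerSegment)
```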
If outerSegments is greater than 1, repeated double cross-validation (rdCV) is used instead. There, the data set is first split into outerSegments segments; one segment is used as the prediction (test) set and the remaining segments form the training set, on which the optimal number of components is estimated by the inner cross-validation. This is repeated for each outer segment. The whole procedure is repeated numReplications times to get a more reliable estimate of the prediction power.
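A similar hypothetical base-R sketch for rdCV (again not taken from gaselect) builds the outer segments and, within each training set, the inner CV segments that would be used to choose the number of components:

```r
# Hypothetical sketch (not gaselect's internal code): outer and inner
# segments for one rdCV replication.
rdcvFolds <- function(n, outerSegments = 3L, innerSegments = 5L) {
  # random outer segment label for every observation
  outer <- rep_len(seq_len(outerSegments), n)[sample.int(n)]
  lapply(seq_len(outerSegments), function(k) {
    test <- which(outer == k)    # this outer segment is the prediction (test) set
    train <- which(outer != k)   # the remaining segments form the training set
    # random inner CV segment labels within the training set
    inner <- rep_len(seq_len(innerSegments), length(train))[sample.int(length(train))]
    list(test = test, train = train, innerSegment = inner)
  })
}

set.seed(1)
folds <- rdcvFolds(200, outerSegments = 3L, innerSegments = 5L)
sapply(folds, function(f) length(f$test))  # sizes of the three outer segments
```

Each observation lands in exactly one outer test set per replication; repeating the procedure numReplications times with new random splits yields the more stable estimate described above.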
Returns an S4 object of type GenAlgPLSEvaluator to be used as argument to a call of genAlg.
Other GenAlg Evaluators: evaluatorFit(), evaluatorLM(), evaluatorUserFunction()
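library(gaselect)

# Control parameters for the genetic algorithm
ctrl <- genAlgControl(populationSize = 100, numGenerations = 15, minVariables = 5,
    maxVariables = 12, verbosity = 1)

# PLS evaluator using simple repeated CV (40% of the observations in the test set)
evaluatorSRCV <- evaluatorPLS(numReplications = 2, innerSegments = 7, testSetSize = 0.4,
    numThreads = 1)

# PLS evaluator using repeated double CV with 3 outer segments
evaluatorRDCV <- evaluatorPLS(numReplications = 2, innerSegments = 5, outerSegments = 3,
    numThreads = 1)

# Generate demo data: 200 observations of 50 variables, y depends on 8 of them
set.seed(12345)
X <- matrix(rnorm(10000, sd = 1:5), ncol = 50, byrow = TRUE)
y <- drop(-1.2 + rowSums(X[, seq(1, 43, length = 8)]) + rnorm(nrow(X), 1.5))

# Run the genetic algorithm with both evaluation strategies
resultSRCV <- genAlg(y, X, control = ctrl, evaluator = evaluatorSRCV, seed = 123)
resultRDCV <- genAlg(y, X, control = ctrl, evaluator = evaluatorRDCV, seed = 123)

# Show the first five variable subsets of each result
subsets(resultSRCV, 1:5)
subsets(resultRDCV, 1:5)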