GenAlgPLSEvaluator-constructor: PLS Evaluator

evaluatorPLS R Documentation

PLS Evaluator

Description

Creates the object that controls the evaluation step of the genetic algorithm.

Usage

evaluatorPLS(
  numReplications = 30L,
  innerSegments = 7L,
  outerSegments = 1L,
  testSetSize = NULL,
  numThreads = NULL,
  maxNComp = NULL,
  method = c("simpls"),
  sdfact = 1
)

Arguments

numReplications

The number of replications used to evaluate a variable subset (must be between 1 and 2^16)

innerSegments

The number of CV segments used in one replication (must be between 2 and 2^16)

outerSegments

The number of outer CV segments used in one replication (between 0 and 2^16). If this is greater than 1, the repeated double cross-validation strategy (rdCV) is used instead of simple repeated cross-validation (srCV); see Details.

testSetSize

The relative size of the test set used for simple repeated CV (between 0 and 1). If outerSegments > 1, this parameter is ignored and a warning is issued.

numThreads

The maximum number of threads the algorithm is allowed to spawn (a value less than 1 or NULL means no threads)

maxNComp

The maximum number of components the PLS models should consider (if not specified, the number of components is not constrained)

method

The PLS method used to fit the PLS model (currently only SIMPLS is implemented)

sdfact

The factor by which the standard deviation of the MSEP values is scaled when selecting the optimal number of components. For the "one standard error rule", sdfact is 1.
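The effect of sdfact can be illustrated with a small base R sketch. It assumes the rule picks the smallest number of components whose mean MSEP lies within sdfact standard errors of the minimum. The helper chooseNComp, the MSEP values, and this exact form of the rule are assumptions made for illustration, not the package's internal code.

```r
# Hypothetical helper, not part of gaselect: choose the number of components
# from a matrix of MSEP values (rows = CV segments, columns = 1..maxNComp).
chooseNComp <- function(msep, sdfact = 1) {
  meanMsep <- colMeans(msep)
  # Standard error of the mean MSEP across the CV segments (assumed definition)
  seMsep <- apply(msep, 2, sd) / sqrt(nrow(msep))
  best <- which.min(meanMsep)
  threshold <- meanMsep[best] + sdfact * seMsep[best]
  # Smallest model whose mean MSEP does not exceed the threshold
  unname(which(meanMsep <= threshold)[1])
}

# Made-up MSEP values for 4 CV segments and up to 5 components
msep <- rbind(
  c(4.0, 2.1, 1.9, 1.8, 1.9),
  c(4.4, 2.3, 2.0, 2.1, 2.2),
  c(3.8, 1.9, 1.8, 1.7, 1.8),
  c(4.2, 2.2, 2.1, 2.0, 2.0)
)

nCompOneSE <- chooseNComp(msep, sdfact = 1)  # tolerant rule, favors fewer components
nCompMin <- chooseNComp(msep, sdfact = 0)    # plain minimum of the mean MSEP
```

Under these assumptions, a larger sdfact widens the tolerance band around the minimum and therefore tends to select models with fewer components.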

Details

With this method, the genetic algorithm uses PLS regression models to assess the predictive power of variable subsets. By default, simple repeated cross-validation (srCV) is used. The optimal number of PLS components is estimated using cross-validation (with innerSegments segments) on a training set. The predictive power is then evaluated by fitting a PLS regression model with this optimal number of components to the training set and predicting the values of a test set (a fraction testSetSize of the observations, or 1 / innerSegments if testSetSize is not specified).
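The partitioning for a single srCV replication can be sketched in base R. The helper srcvSplit and its random assignment scheme are assumptions made for illustration; the package's internal partitioning may differ.

```r
# Hypothetical helper, not part of gaselect: index sets for one srCV replication.
srcvSplit <- function(n, innerSegments = 7L, testSetSize = NULL) {
  # Fall back to 1 / innerSegments when no test set size is given
  if (is.null(testSetSize)) testSetSize <- 1 / innerSegments
  nTest <- max(1L, as.integer(round(n * testSetSize)))
  shuffled <- sample.int(n)
  test <- shuffled[seq_len(nTest)]
  train <- shuffled[-seq_len(nTest)]
  # Assign the training observations to the inner CV segments
  segment <- rep_len(seq_len(innerSegments), length(train))
  list(test = sort(test), inner = split(train, segment))
}

set.seed(1)
srcv <- srcvSplit(n = 200, innerSegments = 7L, testSetSize = 0.4)
length(srcv$test)    # size of the held-out test set
lengths(srcv$inner)  # sizes of the inner CV segments
```

In this sketch, the inner segments would be used to choose the number of components, and the held-out test indices would only be used for the final prediction.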

If outerSegments is greater than 1, repeated double cross-validation (rdCV) is used instead. There, the data set is first split into outerSegments segments; one segment is held out as the prediction set and the remaining segments serve as the training set, on which the number of components is chosen by inner cross-validation and the model is fitted. This is repeated for each outer segment.
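The outer loop of one rdCV replication can be sketched in base R as follows. The helper rdcvOuterSplits is hypothetical, and the sketch assumes that the segments not held out are the ones used for inner cross-validation and model fitting.

```r
# Hypothetical helper, not part of gaselect: outer splits for one rdCV replication.
rdcvOuterSplits <- function(n, outerSegments = 3L) {
  # Randomly assign every observation to exactly one outer segment
  segment <- sample(rep_len(seq_len(outerSegments), n))
  lapply(seq_len(outerSegments), function(k) {
    list(
      prediction = which(segment == k),  # held out for this outer segment
      training = which(segment != k)     # used for inner CV and model fitting
    )
  })
}

set.seed(1)
outerSplits <- rdcvOuterSplits(n = 200, outerSegments = 3L)
vapply(outerSplits, function(s) length(s$prediction), integer(1))
```

Each observation lands in the prediction set of exactly one outer segment, so every observation is predicted once per replication.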

The whole procedure is repeated numReplications times to get a more reliable estimate of the prediction power.

Value

Returns an S4 object of class GenAlgPLSEvaluator to be used as the evaluator argument in a call to genAlg.

See Also

Other GenAlg Evaluators: evaluatorFit(), evaluatorLM(), evaluatorUserFunction()

Examples

ctrl <- genAlgControl(populationSize = 100, numGenerations = 15, minVariables = 5,
    maxVariables = 12, verbosity = 1)

evaluatorSRCV <- evaluatorPLS(numReplications = 2, innerSegments = 7, testSetSize = 0.4,
    numThreads = 1)

evaluatorRDCV <- evaluatorPLS(numReplications = 2, innerSegments = 5, outerSegments = 3,
    numThreads = 1)

# Generate demo-data
set.seed(12345)
X <- matrix(rnorm(10000, sd = 1:5), ncol = 50, byrow = TRUE)
y <- drop(-1.2 + rowSums(X[, seq(1, 43, length.out = 8)]) + rnorm(nrow(X), 1.5))

resultSRCV <- genAlg(y, X, control = ctrl, evaluator = evaluatorSRCV, seed = 123)
resultRDCV <- genAlg(y, X, control = ctrl, evaluator = evaluatorRDCV, seed = 123)

subsets(resultSRCV, 1:5)
subsets(resultRDCV, 1:5)

gaselect documentation built on Feb. 16, 2023, 6:14 p.m.