spFSR.default: Default Function of SP-FSR for Feature Selection and Ranking

View source: R/spFSR.default.R

spFSR.default    R Documentation

Default Function of SP-FSR for Feature Selection and Ranking

Description

This is the default function of spFeatureSelection. See spFeatureSelection for an example.

Usage

spFSR.default(
  task,
  wrapper = NULL,
  scoring = NULL,
  perturb.amount = 0.05,
  gain.min = 0.01,
  gain.max = 2,
  change.min = 0,
  change.max = 0.2,
  bb.bottom.threshold = 10^(-8),
  mon.gain.A = 100,
  mon.gain.a = 0.75,
  mon.gain.alpha = 0.6,
  hot.start.num.ft.factor = 15,
  hot.start.max.auto.num.ft = 150,
  use.hot.start = TRUE,
  hot.start.range = 0.2,
  rf.n.estimators = 50,
  gain.type = "bb",
  num.features.selected = 0L,
  iters.max = 100L,
  stall.limit = 35L,
  n.samples.max = 5000,
  ft.weighting = FALSE,
  encoding.type = "encode",
  is.debug = FALSE,
  stall.tolerance = 10^(-8),
  random.state = 1,
  rounding = 3,
  run.parallel = TRUE,
  n.jobs = NULL,
  show.info = TRUE,
  print.freq = 10L,
  num.cv.folds = 5L,
  num.cv.reps.eval = 3L,
  num.cv.reps.grad = 1L,
  num.grad.avg = 4L,
  perf.eval.method = "cv"
)
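A minimal sketch of a call, following the Description's pointer to spFeatureSelection, which dispatches to this default function. It assumes the spFSR, mlr3 and rpart packages are installed. The iris task, rpart learner and accuracy measure are illustrative choices, and the small iters.max value only keeps the run short; it is not a recommended setting.

```r
# Assumes spFSR, mlr3 and rpart are installed.
library(mlr3)
library(spFSR)

task    <- tsk("iris")            # built-in mlr3 classification task
wrapper <- lrn("classif.rpart")   # decision-tree learner used as the wrapper
measure <- msr("classif.acc")     # classification accuracy

set.seed(1)
sp.mod <- spFeatureSelection(
  task = task,
  wrapper = wrapper,
  scoring = measure,
  num.features.selected = 3L,     # keep the 3 best features
  iters.max = 10L,                # short run for illustration only
  num.cv.reps.eval = 3L,
  run.parallel = FALSE            # run sequentially, no worker processes
)

sp.mod$features     # names of the best performing features
sp.mod$best.value   # best measure value encountered
```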

Arguments

task

A task (tsk) object created using the mlr3 package. It must be either a classification task (TaskClassif) or a regression task (TaskRegr).

wrapper

A Learner (lrn) object created using the mlr3 package, or a GraphLearner object created using the mlr3pipelines package. Multiple learners are not supported. If NULL, a random forest learner is used by default.

scoring

A performance measure (msr) from the mlr3 package that is supported by the task. If NULL, accuracy is used for classification and R-squared for regression.

perturb.amount

Perturbation amount for feature importances during gradient approximation. It must be a value between 0.01 and 0.1. Default value is 0.05.
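To illustrate the role of perturb.amount, the sketch below shows a generic simultaneous perturbation (SPSA) gradient estimate: every weight is shifted by plus or minus perturb.amount along a random sign vector, the objective is evaluated at both points, and num.grad.avg such estimates are averaged. The helper spsa_gradient and the toy objectives are hypothetical. In spFSR the objective would be a cross-validated measure of the wrapper on the features implied by the perturbed weights, which this sketch omits.

```r
# Hypothetical helper: SPSA gradient estimate of objective f at weights w.
# perturb.amount is the perturbation size; num.grad.avg estimates are averaged.
spsa_gradient <- function(f, w, perturb.amount = 0.05, num.grad.avg = 4L) {
  p <- length(w)
  grads <- replicate(num.grad.avg, {
    delta <- sample(c(-1, 1), p, replace = TRUE)   # random +1/-1 directions
    y.plus  <- f(w + perturb.amount * delta)
    y.minus <- f(w - perturb.amount * delta)
    (y.plus - y.minus) / (2 * perturb.amount * delta)
  })
  # replicate() gives a p x num.grad.avg matrix when p > 1, a vector when p == 1
  if (is.matrix(grads)) rowMeans(grads) else mean(grads)
}

set.seed(42)
spsa_gradient(function(w) w^2, 0.5)   # one weight: central difference of w^2 at 0.5
```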

gain.min

The minimum gain value. It must be greater than or equal to 0.001. Default value is 0.01.

gain.max

The maximum gain value. It must be greater than or equal to gain.min. Default value is 2.

change.min

The minimum change value. It must be non-negative. Default value is 0.0.

change.max

The maximum change value. It must be greater than change.min. Default is 0.2.

bb.bottom.threshold

The threshold for the denominator of the Barzilai-Borwein gain sequence. It must be positive. Default is 1/10^8.
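The sketch below shows one standard form of the Barzilai-Borwein step size, gain = (s's) / (s'y), where s and y are the changes in the weight vector and in the gradient estimate between two iterations. When the denominator falls below bb.bottom.threshold, the sketch falls back to gain.min; otherwise it clips the gain to [gain.min, gain.max]. The helper bb_gain and the clipping are assumptions, not the package's confirmed code, and the sketch uses a minimization sign convention; for a maximized score the sign of the denominator would be flipped.

```r
# Hypothetical helper: Barzilai-Borwein gain from successive weights and gradients.
bb_gain <- function(w.curr, w.prev, g.curr, g.prev,
                    bb.bottom.threshold = 1e-8,
                    gain.min = 0.01, gain.max = 2) {
  s <- w.curr - w.prev                 # change in weights
  y <- g.curr - g.prev                 # change in gradient estimate
  bottom <- sum(s * y)                 # denominator s'y
  if (bottom < bb.bottom.threshold) {
    return(gain.min)                   # avoid division by ~0 or a negative gain
  }
  gain <- sum(s * s) / bottom          # s's / s'y
  min(max(gain, gain.min), gain.max)   # keep the gain within [gain.min, gain.max]
}

bb_gain(c(1, 0), c(0, 0), c(2, 0), c(0, 0))   # 1 / 2
```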

mon.gain.A

Parameter A of the monotone gain sequence. It must be a positive integer. Default is 100.

mon.gain.a

Parameter a of the monotone gain sequence. It must be positive. Default is 0.75.

mon.gain.alpha

Parameter alpha of the monotone gain sequence. It must lie in the open interval (0, 1). Default is 0.6.
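The three mon.gain.* arguments parametrize a decaying gain. A common SPSA form is a_k = a / (k + A)^alpha, and the sketch below assumes that form, with a = mon.gain.a, A = mon.gain.A, alpha = mon.gain.alpha and iteration index k; the package's exact indexing is not confirmed here. The helper mon_gain is hypothetical.

```r
# Hypothetical helper: monotone SPSA gain a_k = a / (k + A)^alpha (vectorized over k).
mon_gain <- function(k, mon.gain.a = 0.75, mon.gain.A = 100, mon.gain.alpha = 0.6) {
  mon.gain.a / (k + mon.gain.A)^mon.gain.alpha
}

mon_gain(0:5)   # strictly decreasing gains for the first iterations
```

A larger A damps the early gains, while alpha closer to 1 makes the gain decay faster over the run.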

hot.start.num.ft.factor

The factor used to determine how many features to select for hot start. It must be an integer greater than 1. Default is 15.

hot.start.max.auto.num.ft

The maximum initial number of features for automatic hot start. It must be an integer greater than 1. Default is 150.

use.hot.start

Logical argument. Should hot start be used? Default is TRUE.

hot.start.range

Numeric. The initial range of imputation values carried over from hot start. It must lie in the open interval (0, 1). Default is 0.2.

rf.n.estimators

Integer. The number of trees to use in the random forest for hot start. Default is 50.

gain.type

The gain sequence to use. Accepted values are 'bb' for the Barzilai-Borwein gain sequence or 'mon' for a monotone gain sequence. Default is 'bb'.

num.features.selected

Number of features selected. It must be a nonnegative integer and must not exceed the total number of features in the task. A value of 0 results in automatic feature selection. Default value is 0L.
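When num.features.selected is a positive k, the selection step can be pictured as ranking features by their current weights and keeping the k largest. This is a hypothetical sketch of that idea; the helper select_top_k and the named weight vector are illustrative, and it does not cover the automatic case num.features.selected = 0.

```r
# Hypothetical helper: rank features by weight and keep the names of the k largest.
select_top_k <- function(weights, k) {
  stopifnot(k >= 1, k <= length(weights))
  ranked <- order(weights, decreasing = TRUE)   # feature indices, best first
  names(weights)[ranked[seq_len(k)]]
}

w <- c(a = 0.2, b = 0.9, c = 0.5, d = 0.1)
select_top_k(w, 2)
```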

iters.max

Maximum number of iterations to execute. The minimum value is 2L. Default value is 100L.

stall.limit

Number of iterations to stall, that is, to continue without an improvement of at least stall.tolerance in the measure value, before execution stops. The minimum value is 2L. Default value is 35L.
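The stall rule can be sketched as a counter that resets on any improvement larger than stall.tolerance and stops the run once it reaches stall.limit. The helper stall_stop_iter is hypothetical and assumes a measure where higher values are better, such as accuracy.

```r
# Hypothetical helper: iteration at which a run would stop, given per-iteration
# measure values (higher is better).
stall_stop_iter <- function(values, stall.limit = 35L, stall.tolerance = 1e-8) {
  best <- -Inf
  stall <- 0L
  for (i in seq_along(values)) {
    if (values[i] > best + stall.tolerance) {
      best <- values[i]      # improvement: record it and reset the counter
      stall <- 0L
    } else {
      stall <- stall + 1L    # no sufficient improvement this iteration
      if (stall >= stall.limit) return(i)
    }
  }
  length(values)             # never stalled: ran through all iterations
}

stall_stop_iter(c(0.5, 0.6, 0.6, 0.6, 0.6, 0.7), stall.limit = 2L)
```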

n.samples.max

The maximum number of samples to draw when sampling. It must be a non-negative integer. Default is 5000.

ft.weighting

Logical argument. Perform simultaneous feature weighting and selection? Default is FALSE.

encoding.type

Encoding method for factor features when feature weighting is used. Default is 'encode'.

is.debug

Logical argument. Print additional debug messages? Default value is FALSE.

stall.tolerance

Value of stall tolerance. It must be strictly positive. Default value is 1/10^8.

random.state

The random state (seed) to use. Default is 1.

rounding

The number of digits to round results. It must be a positive integer. Default value is 3.

run.parallel

Logical argument. Perform cross-validations in parallel? Default value is TRUE.

n.jobs

Number of cores to use for a parallel run. It must be less than or equal to the total number of cores on the host machine. If NULL when run.parallel is TRUE, it is set to one less than the total number of cores.
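The n.jobs rule can be sketched in base R with parallel::detectCores(). The helper resolve_n_jobs is hypothetical; returning 1 when run.parallel is FALSE, and never going below one core on a single-core machine, are assumptions added for the sketch.

```r
# Hypothetical helper: resolve the number of worker cores following the n.jobs rule.
resolve_n_jobs <- function(n.jobs = NULL, run.parallel = TRUE) {
  if (!run.parallel) return(1L)          # assumption: sequential run uses one core
  total <- parallel::detectCores()
  if (is.na(total)) total <- 1L          # detectCores() can return NA
  if (is.null(n.jobs)) {
    n.jobs <- max(1L, total - 1L)        # one less than the total, but at least 1
  }
  stopifnot(n.jobs >= 1, n.jobs <= total)
  as.integer(n.jobs)
}

resolve_n_jobs()
```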

show.info

Logical argument. If TRUE, iteration information is displayed every print.freq iterations. Default value is TRUE.

print.freq

Iteration information printing frequency. It must be a positive integer. Default value is 10L.

num.cv.folds

The number of cross-validation folds when 'cv' is selected as perf.eval.method. The minimum value is 3L. Default value is 5L.

num.cv.reps.eval

The number of cross-validation repetitions for feature subset evaluation. It must be a positive integer. Default value is 3L.

num.cv.reps.grad

The number of cross-validation repetitions for gradient averaging. It must be a positive integer. Default value is 1L.

num.grad.avg

Number of gradients to average for gradient approximation. It must be a positive integer. Default value is 4L.

perf.eval.method

Performance evaluation method. It must be either 'cv' for cross-validation or 'resub' for resubstitution. Default is 'cv'.

Value

spFSR.default returns an object of class "spFSR", which consists of the following components:

task.spfs

An mlr3 task (tsk) object restricted to the best performing features.

wrapper

An mlr3 Learner (lrn) object or an mlr3pipelines GraphLearner object, as specified by the user.

scoring

An mlr3 package msr as specified by the user.

best.model

An mlr3 package model object trained by the wrapper using task.spfs.

iter.results

A data.frame object containing detailed information on each iteration.

features

Names of the best performing features.

num.features

The number of best performing features.

importance

A vector of importance ranks of the best performing features.

total.iters

The total number of iterations executed.

best.iter

The iteration where the best performing feature subset was encountered.

best.value

The best measure value encountered during execution.

best.std

The standard deviation corresponding to the best measure value encountered.

run.time

Total run time in minutes.

results

A data.frame containing logical indicators of the selected features, the feature names and the measure.

call

The matched call.

References

David V. Akman et al. (2022) k-best feature selection and ranking via stochastic approximation, Expert Systems with Applications, Vol. 213. doi:10.1016/j.eswa.2022.118864

G.F.A. Yeo and V. Aksakalli (2021) A stochastic approximation approach to simultaneous feature weighting and selection for nearest neighbour learners, Expert Systems with Applications, Vol. 185. doi:10.1016/j.eswa.2021.115671

See Also

spFeatureSelection.


spFSR documentation built on March 31, 2023, 9:05 p.m.
