forward_selection_by_rank: Forward selection by rank

View source: R/forward_selection_by_rank_class.R

Description

A model is trained and a performance metric is computed for an increasing number of features. The features included at each step are determined by their rank, which is computed from another variable, e.g. a VIP score. An "optimal" subset of features is suggested by minimising the input performance metric.
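The procedure described above can be sketched in base R. This is an illustration only, not the structToolbox implementation: the toy data, the `loo_error` helper, the nearest-centroid classifier and the convention that smaller rank values come first are all assumptions made for the example.

```r
# Illustrative sketch only (base R); not the structToolbox implementation.
set.seed(1)

# Toy data: 40 samples, 10 features; only features 1-3 carry class signal.
n <- 40
p <- 10
y <- rep(c(0, 1), each = n / 2)
X <- matrix(rnorm(n * p), nrow = n, ncol = p)
X[y == 1, 1:3] <- X[y == 1, 1:3] + 2

# Hypothetical ranking variable; assumed here that smaller values rank first.
variable_rank <- 1:p

# Hypothetical metric to minimise: leave-one-out error rate of a
# nearest-centroid classifier.
loo_error <- function(X, y) {
  wrong <- 0
  for (i in seq_len(nrow(X))) {
    train_X <- X[-i, , drop = FALSE]
    train_y <- y[-i]
    c0 <- colMeans(train_X[train_y == 0, , drop = FALSE])
    c1 <- colMeans(train_X[train_y == 1, , drop = FALSE])
    d0 <- sum((X[i, ] - c0)^2)
    d1 <- sum((X[i, ] - c1)^2)
    pred <- if (d1 < d0) 1 else 0
    wrong <- wrong + (pred != y[i])
  }
  wrong / nrow(X)
}

# Order features by rank, then evaluate models of increasing size.
ordered_features <- order(variable_rank)
sizes <- seq(from = 2, to = 8, by = 1)
metric <- vapply(sizes, function(k) {
  loo_error(X[, ordered_features[seq_len(k)], drop = FALSE], y)
}, numeric(1))

# Suggest the subset size with the smallest metric value.
best_size <- sizes[which.min(metric)]
chosen <- ordered_features[seq_len(best_size)]
```

Here `chosen` holds the column indices of the suggested subset. Ties are resolved in favour of the smallest model because `which.min` returns the first minimum; whether the package resolves ties the same way is not stated on this page.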

Usage

forward_selection_by_rank(
  min_no_vars = 1,
  max_no_vars = 100,
  step_size = 1,
  factor_name,
  variable_rank,
  ...
)

Arguments

min_no_vars

(numeric) The minimum number of variables to include in the model. The default is 1.

max_no_vars

(numeric) The maximum number of variables to include in the model. The default is 100.

step_size

(numeric) The incremental change in number of features in the model. The default is 1.

factor_name

(character) The name of a sample-meta column to use.

variable_rank

(numeric, integer) The values used to rank the features.

...

Additional slots and values passed to struct_class.
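As a rough illustration of how the numeric arguments might interact, the sketch below expands `min_no_vars`, `max_no_vars` and `step_size` into a set of candidate model sizes and picks the top-ranked features for each size. It assumes a regular sequence and that smaller `variable_rank` values rank first; neither is confirmed by this page, and the feature names are made up.

```r
# Assumption: candidate model sizes form a regular sequence.
min_no_vars <- 2
max_no_vars <- 11
step_size <- 3
sizes <- seq(from = min_no_vars, to = max_no_vars, by = step_size)
# sizes is 2, 5, 8, 11

# Hypothetical ranking values for 12 made-up features (one value per feature).
variable_rank <- c(7, 1, 12, 4, 9, 2, 11, 5, 3, 10, 6, 8)
names(variable_rank) <- paste0("feature_", seq_along(variable_rank))

# Assumption: smaller values rank first.
ranked_names <- names(sort(variable_rank))

# Features that would enter the model at each candidate size.
subsets <- lapply(sizes, function(k) ranked_names[seq_len(k)])
names(subsets) <- paste0("n_", sizes)
```

Under these assumptions `subsets$n_2` contains `feature_2` and `feature_6`, the two features with the smallest rank values. Note that `variable_rank` needs one value per feature, which is why the example on this page passes `1:2063`.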

Value

A forward_selection_by_rank object.

Examples

# some data
D = MTBLS79_DatasetExperiment(filtered=TRUE)

# normalise, impute and scale then remove QCs
P = pqn_norm(qc_label='QC',factor_name='class') +
    knn_impute(neighbours=5) +
    glog_transform(qc_label='QC',factor_name='class') +
    filter_smeta(mode='exclude',levels='QC',factor_name='class')
P = model_apply(P,D)
D = predicted(P)

# forward selection using a PLSDA model
M = forward_selection_by_rank(factor_name='class',
                             min_no_vars=2,
                             max_no_vars=11,
                             variable_rank=1:2063) *
    (mean_centre() + PLSDA(number_components=1,
                           factor_name='class'))
M = run(M,D,balanced_accuracy())

structToolbox documentation built on Nov. 8, 2020, 6:54 p.m.