variable.selection: Selecting the appropriate sets of variables in...

Description

variable.selection is the core function of the package. It identifies appropriate sets of predictors of a binary outcome. Variable selection methods are categorized in two ways. By their dependence on the classification method, they are either Filter or Wrapper approaches. By what they produce, filters are either Ranker or Subset Selector methods: Rankers rank the variables by quality, while Subset Selectors return a subset of good-quality variables. Wrappers are all Subset Selectors. Depending on the method selected, you may want to specify additional parameters. All parameters have default values, however, so researchers who are not interested in or familiar with the methodology can apply many variable selection methods to their own dataset with a single command, without dealing with the complexity of each method (see details).

Ranker methods: "correlation", "information", "relief", "OneR", "RF"

Subset Selector methods: "CFS", "consistency", "wrapper"
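The difference between the two kinds of output can be sketched in base R. This is an illustration under stated assumptions, not the package's implementation: the toy data, the absolute-correlation score, and the 0.5 threshold are all hypothetical choices made for the example.

```r
# Illustrative sketch only; not how variable.selection works internally.
set.seed(1)
n <- 200
y <- rbinom(n, 1, 0.5)                  # binary outcome
toy <- data.frame(
  y  = y,
  x1 = y + rnorm(n, sd = 0.5),          # strongly related to y
  x2 = y + rnorm(n, sd = 2),            # weakly related to y
  x3 = rnorm(n)                         # unrelated noise
)

# Ranker: score each predictor on its own, then order by score.
scores  <- sapply(toy[c("x1", "x2", "x3")], function(x) abs(cor(x, toy$y)))
ranking <- names(sort(scores, decreasing = TRUE))

# Subset selector: return a set of variables rather than an ordering.
# The threshold of 0.5 is a hypothetical rule chosen for this toy example.
subset_selected <- names(scores)[scores > 0.5]

print(ranking)
print(subset_selected)
```

A ranker leaves the choice of how many variables to keep to the user, whereas a subset selector makes that choice itself.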

Usage

variable.selection(...)

## Default S3 method:
variable.selection(input, target, data,
  methods = c("correlation", "information", "consistency", "relief", "OneR",
  "RF", "CFS", "wrapper"), control = control.selection(), trace = FALSE,
  ...)

## S3 method for class 'formula'
variable.selection(formul, data, methods = c("correlation",
  "information", "consistency", "relief", "OneR", "RF", "CFS", "wrapper"),
  control = control.selection(), trace = FALSE, ...)

Arguments

...

further arguments.

input

a character vector with the names of the input variables to select from.

target

a character string with the name of the outcome variable that distinguishes non-diseased from diseased individuals. Applies only to the method "variable.selection.default".

data

a data frame containing all needed variables.

methods

a character vector with the names of the methods to apply (see details):

"correlation" (Ranker): based on a correlation measure of the relation between each individual variable and the outcome.
"information" (Ranker): based on entropy, a measure of uncertainty or unpredictability.
"relief" (Ranker): based on how well the values of each variable distinguish between neighboring subjects; uses a nearest-neighbor procedure.
"OneR" (Ranker; "One Rule"): based on the performance of the one-level decision trees obtained for each individual variable in the data; variables are ranked by that performance.
"RF" (Ranker): based on the idea that if a variable is not important for predicting a particular outcome, randomly permuting its values among the instances will not change the performance of the prediction model; the average performance of the random forest trees is used.
"CFS" (Subset Selector): based on the idea that a good variable subset contains variables that are uncorrelated with each other while being highly correlated with the outcome.
"consistency" (Subset Selector): based on the idea that a dataset containing only the selected variables must be consistent, i.e. two subjects with the same predictor values must have the same outcome.
"wrapper" (Subset Selector): based on the performance measure of the diagnostic/prognostic classification.

control

output of the control.selection function.

trace

a logical value. If TRUE, information on progress is shown. The default is FALSE.

formul

a formula (the method "variable.selection.formula" is called). It must be an object of class "formula". The right side of ~ must contain the name of the variable that distinguishes diseased from non-diseased individuals, and the left side of ~ must contain the names of the diagnostic/prognostic test variables.
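Two of the criteria named under methods, entropy ("information") and consistency, are simple enough to sketch in base R. The helper functions and the four-row toy data below are hypothetical and written only for illustration; the package's own computations may differ.

```r
# Illustrative sketch only; helper names and data are hypothetical.

# Shannon entropy (in bits) of a discrete vector.
entropy <- function(x) {
  p <- table(x) / length(x)
  p <- p[p > 0]
  -sum(p * log2(p))
}

# Information gain of predictor x about outcome y: H(y) - H(y | x).
info_gain <- function(x, y) {
  cond <- sum(sapply(split(y, x),
                     function(g) length(g) / length(y) * entropy(g)))
  entropy(y) - cond
}

# Consistency: subjects with identical values on `vars` share one outcome.
is_consistent <- function(data, vars, target) {
  key <- do.call(paste, c(data[vars], sep = "\r"))
  all(tapply(data[[target]], key, function(v) length(unique(v)) == 1))
}

d <- data.frame(
  y = c(0, 0, 1, 1),
  a = c("u", "u", "v", "v"),   # determines y exactly
  b = c("u", "v", "u", "v")    # carries no information about y
)

info_gain(d$a, d$y)            # 1 bit: a removes all uncertainty about y
info_gain(d$b, d$y)            # 0 bits: b leaves y as uncertain as before
is_consistent(d, "a", "y")     # TRUE
is_consistent(d, "b", "y")     # FALSE
```

An entropy-based ranker would place a above b here, and a consistency-based subset selector could accept the subset containing only a but not the one containing only b.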

Value

Returns an object of class "optimal.cutpoints" with the following components:

Author(s)

Farideh Bagherzadeh-Khiabani.

Examples

library(VariableSelection)
data(tlgs)

# Variable selection with the default settings
# (all methods, each with its default parameters).
# The first column of tlgs is the outcome; the remaining columns are the inputs.
object <- variable.selection(input = names(tlgs)[-1],
                             target = names(tlgs)[1], data = tlgs)
print(object)

faridehbagherzadeh/VariableSelection documentation built on May 16, 2019, 10:10 a.m.