variable.selection is the core function of the package. It identifies appropriate sets of predictors of a binary outcome. Variable selection methods are categorized in two ways. By their dependence on the classification method, they are either Filter or Wrapper approaches. By what they produce, filters are either Ranker or Subset Selector methods: Rankers rank the variables by quality, while Subset Selectors return a subset of good-quality variables. All Wrappers are Subset Selectors. Depending on the chosen method, you may want to specify additional parameters. All parameters have default values, however, so researchers who are not interested in or familiar with the methodology can apply many variable selection methods to their own dataset with a single command, without dealing with the complexity of the methods (see details).
Ranker methods: "correlation", "information", "relief", "OneR", "RF"
Subset Selector methods: "CFS", "consistency", "wrapper"
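The difference between a Ranker and a Subset Selector can be illustrated in base R. The sketch below is not the package's implementation: the toy data and the helper cramers_v are hypothetical, and Cramér's V is used only because it is one of the measures listed for the "correlation" method. It computes one weight per variable, orders the variables by weight (what a Ranker produces), and then keeps the top two (one simple way to turn a ranking into a subset).

```r
# Toy data: a binary outcome and three categorical predictors (all hypothetical).
toy <- data.frame(
  outcome = factor(c(1, 1, 1, 1, 0, 0, 0, 0)),
  x1      = factor(c("a", "a", "a", "a", "b", "b", "b", "b")),  # perfectly associated
  x2      = factor(c("a", "a", "a", "b", "a", "b", "b", "b")),  # partially associated
  x3      = factor(c("a", "b", "a", "b", "a", "b", "a", "b"))   # unrelated
)

# Cramer's V from the uncorrected chi-squared statistic of a two-way table.
cramers_v <- function(x, y) {
  tab  <- table(x, y)
  chi2 <- suppressWarnings(chisq.test(tab, correct = FALSE)$statistic)
  unname(sqrt(chi2 / (sum(tab) * (min(dim(tab)) - 1))))
}

predictors <- c("x1", "x2", "x3")
weights <- sapply(predictors, function(v) cramers_v(toy[[v]], toy$outcome))

# A ranker yields the variables ordered by weight ...
ranking <- names(sort(weights, decreasing = TRUE))
# ... and a subset can be derived from the ranking, here by keeping the top two.
top_two <- head(ranking, 2)
```

A Subset Selector, by contrast, evaluates candidate subsets as a whole, so it returns a subset directly and no per-variable weights.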
variable.selection(...)
## Default S3 method:
variable.selection(input, target, data,
    methods = c("correlation", "information", "consistency", "relief", "OneR",
    "RF", "CFS", "wrapper"), control = control.selection(), trace = FALSE,
    ...)
## S3 method for class 'formula'
variable.selection(formul, data, methods = c("correlation",
    "information", "consistency", "relief", "OneR", "RF", "CFS", "wrapper"),
    control = control.selection(), trace = FALSE, ...)
... |
further arguments. |
input |
a character vector with the names of the input variables to select from. |
target |
a character string with the name of the outcome that distinguishes nondiseased from diseased individuals. Only applies for the method "variable.selection.default". |
data |
a data frame containing all needed variables. |
methods |
a character vector with the names of the variable selection methods to apply. Available methods (see details):
"correlation" (Ranker): based on a correlation measure of the relation of each individual variable with the outcome.
"information" (Ranker): based on entropy, a measure of uncertainty or unpredictability.
"relief" (Ranker): based on how well the values of each variable distinguish between neighboring subjects; it uses a nearest-neighbor procedure.
"OneR" (Ranker; "One Rule"): based on the performance of the one-level decision tree obtained for each individual variable in the data.
"RF" (Ranker): based on the idea that if a variable is not important for predicting a particular outcome, randomly permuting its values among the instances will not change the performance of the prediction model; the average performance of the random forest trees is used.
"CFS" (Subset Selector): based on the idea that a good variable subset contains variables that are uncorrelated with each other while being highly correlated with the outcome.
"consistency" (Subset Selector): based on the idea that a dataset containing only the selected variables must be consistent, i.e. two subjects with the same predictor values must have the same outcome.
"wrapper" (Subset Selector): based on the performance measure of the diagnostic/prognostic classification. |
control |
output of the control.selection function. |
trace |
a logical value. If TRUE, information on progress is shown. The default is FALSE. |
formul |
a formula (method "variable.selection.formula" is called). It must be an object of class "formula". Right side of ~ must contain the name of the variable that distinguishes diseased from non-diseased individuals, and left side of ~ must contain the name of the diagnostic/prognostic test variables. |
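The entropy idea behind the "information" method and the consistency idea behind the "consistency" method, as described for the methods argument, can be illustrated in base R. This is a minimal sketch under stated assumptions, not the package's implementation: the toy data and the helper names entropy, info_gain and is_consistent are hypothetical, and the package may compute its measures differently.

```r
# Toy data: a binary outcome and three categorical predictors (all hypothetical).
toy <- data.frame(
  outcome = factor(c(1, 1, 1, 1, 0, 0, 0, 0)),
  x1      = factor(c("a", "a", "a", "a", "b", "b", "b", "b")),
  x2      = factor(c("a", "a", "a", "b", "a", "b", "b", "b")),
  x3      = factor(c("a", "b", "a", "b", "a", "b", "a", "b"))
)

# Shannon entropy (in bits) of a discrete variable; empty levels are ignored.
entropy <- function(x) {
  counts <- table(x)
  p <- counts[counts > 0] / sum(counts)
  -sum(p * log2(p))
}

# Information gain: reduction in outcome entropy after conditioning on x.
info_gain <- function(x, y) {
  groups <- split(y, x, drop = TRUE)
  conditional <- sum(sapply(groups, function(g) length(g) / length(y) * entropy(g)))
  entropy(y) - conditional
}

# Consistency: TRUE if no two subjects share the same values on `vars`
# while having different outcomes.
is_consistent <- function(data, vars, target) {
  key <- do.call(paste, c(data[vars], sep = "|"))
  all(tapply(as.character(data[[target]]), key,
             function(v) length(unique(v)) == 1))
}

# One information-gain weight per predictor (higher means more informative).
ig <- sapply(c("x1", "x2", "x3"), function(v) info_gain(toy[[v]], toy$outcome))

# x1 alone separates the outcome classes; x3 alone does not.
consistent_x1 <- is_consistent(toy, "x1", "outcome")
consistent_x3 <- is_consistent(toy, "x3", "outcome")
```

In this toy data x1 determines the outcome, so it receives the largest information gain and is consistent on its own, while x3 carries no information about the outcome.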
Returns an object of class "optimal.cutpoints" with the following components:
"Selection" a list in which each item corresponds to a selected variable subset. The names of this list commonly consist of three components separated by ".": the first shows the variable selection method, the second the measure used by that method, and the third the search method.
The first component may be:
"correlation", "information", "relief", "OneR", "RF", "CFS", "consistency", showing the filter method of variable selection
"logistic", showing the model used in wrapper variable selection
The second component may be:
"Chi2", "CramerV" for the "correlation" method
"IG", "GR", "SU" for the "information" method
"acc" for the "OneR" method
"acc", "imp" for the "RF" method
"IG", "GR", "SU", "Chi2", "CramerV" for the "CFS" method
"aic" for the logistic "wrapper" method
The third component may be:
"BR", "FR" for ranker methods
"SF", "SB", "BF", "HC" for subset selector methods
Each item contains either "subset" alone, if the method is a subset selector, or both "subset" and "weights", if the method is a ranker.
"frequency" a numeric vector containing the number of times each variable was selected by the variable selection methods.
"percentage" a numeric vector containing the percentage of times each variable was selected by the variable selection methods.
"ranker" a character vector containing the names of the implemented Ranker variable selection methods.
"subsetSelector" a character vector containing the names of the implemented Subset Selector variable selection methods.
"methods" a character vector containing the names of the implemented variable selection methods.
"input" a character vector with the names of the input variables to select from.
"target" a character string with the name of the outcome that distinguishes nondiseased from diseased individuals.
"call" the matched call.
Farideh Bagherzadeh-Khiabani.
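The naming scheme of the "Selection" component and the meaning of "frequency" and "percentage" can be sketched in base R without the package. Everything below is an assumption made for illustration: the list selection is a hand-built stand-in rather than real output, the variable names are hypothetical, and the percentage is taken relative to the number of selected subsets, which may differ from what the package computes.

```r
# Hand-built stand-in for the "Selection" component (not real package output).
# Names follow the method.measure.search pattern described under Value.
selection <- list(
  information.IG.BR = list(subset  = c("age", "sbp"),
                           weights = c(age = 0.4, sbp = 0.3, bmi = 0.1)),
  CFS.SU.BF         = list(subset = c("age", "bmi")),
  logistic.aic.SF   = list(subset = "age")
)
input <- c("age", "sbp", "bmi")

# Split each name on the literal "." into method, measure and search method.
parts <- do.call(rbind, strsplit(names(selection), ".", fixed = TRUE))
colnames(parts) <- c("method", "measure", "search")

# Count how often each input variable appears across the selected subsets.
selected   <- unlist(lapply(selection, `[[`, "subset"), use.names = FALSE)
frequency  <- sapply(input, function(v) sum(selected == v))
percentage <- 100 * frequency / length(selection)
```

Because the names only commonly follow the three-part pattern, real output should be checked before binding the split names into a matrix as done here.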