View source: R/preprocessing_feature_selection.R
preprocessing_feature_selection | R Documentation
Selects the most relevant features of a dataset with one of four methods:
`VI`
Variable importance computed from a random forest; slow, with the weakest results.
`MCFS`
Monte Carlo Feature Selection; fast, with reasonable results.
`MI`
The varrank method, based on mutual information scores; usually fast, but it can run for a very long time if 'max_features' is set too high. Poor results.
`BORUTA`
The Boruta algorithm; slow, with the best results.
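The 'VI' method ranks features by an importance score, and the `nperm` argument suggests a permutation-based importance. The idea can be sketched in base R as below. This is a minimal sketch under stated assumptions: a linear model (`lm()`) stands in for the random forest, and `permutation_importance()` is a hypothetical helper, not a function of this package. Each feature is shuffled `nperm` times, and its importance is the average increase in mean squared error.

```r
# Sketch of permutation variable importance, averaged over `nperm` shuffles.
# ASSUMPTION: lm() stands in for the random forest used by the 'VI' method;
# permutation_importance() is a hypothetical helper, not a package function.
permutation_importance <- function(data, y, nperm = 1) {
  features <- setdiff(names(data), y)
  fit <- lm(reformulate(features, response = y), data = data)
  mse <- function(d) mean((d[[y]] - predict(fit, newdata = d))^2)
  base_mse <- mse(data)
  sapply(features, function(f) {
    # Shuffle one feature at a time and record how much the error grows.
    increases <- replicate(nperm, {
      shuffled <- data
      shuffled[[f]] <- sample(shuffled[[f]])
      mse(shuffled) - base_mse
    })
    mean(increases)
  })
}

set.seed(1)
n <- 200
df <- data.frame(signal = rnorm(n), noise = rnorm(n))
df$target <- 3 * df$signal + rnorm(n, sd = 0.1)
imp <- permutation_importance(df, "target", nperm = 5)
imp  # 'signal' shows a much larger error increase than 'noise'
```

Larger `nperm` values average out the randomness of individual shuffles at the cost of more model evaluations.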
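The 'MI' method ranks features by mutual information scores computed with varrank. The basic quantity behind such scores can be sketched in base R as the plug-in (empirical) mutual information between two discrete vectors. `mutual_information()` is a hypothetical helper; it does not reproduce varrank's 'estevez' or 'peng' criteria, which add redundancy terms on top of this score.

```r
# Plug-in (empirical) mutual information between two discrete vectors, in nats.
# mutual_information() is a hypothetical helper, not part of varrank.
mutual_information <- function(x, y) {
  joint <- table(x, y) / length(x)   # empirical joint distribution
  px <- rowSums(joint)               # marginal distribution of x
  py <- colSums(joint)               # marginal distribution of y
  expected <- outer(px, py)          # joint distribution under independence
  nz <- joint > 0                    # cells with zero mass contribute 0
  sum(joint[nz] * log(joint[nz] / expected[nz]))
}

x <- c("a", "a", "b", "b")
mutual_information(x, x)                       # log(2): x fully determines itself
mutual_information(x, c("u", "v", "u", "v"))   # 0: the two vectors are independent
```

Continuous features have to be discretized before this estimator applies.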
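Boruta starts by extending the data with "shadow" features, shuffled copies of every original column. It then trains a random forest on the extended data and keeps the features whose importance repeatedly beats the best shadow feature. Only that first step is sketched below; `make_shadow_features()` is a hypothetical helper, not a function of this package or of the Boruta package.

```r
# First step of the Boruta algorithm: add a shuffled "shadow" copy of every
# feature. make_shadow_features() is a hypothetical helper.
make_shadow_features <- function(x) {
  # Permuting keeps each column's distribution but breaks any link to the target.
  shadows <- as.data.frame(lapply(x, function(col) col[sample.int(length(col))]))
  names(shadows) <- paste0("shadow_", names(x))
  cbind(x, shadows)
}

set.seed(1)
x <- data.frame(a = 1:5, b = c(2.5, 3.1, 0.4, 7.7, 1.0))
extended <- make_shadow_features(x)
names(extended)  # "a" "b" "shadow_a" "shadow_b"
```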
preprocessing_feature_selection(
data,
y,
feature_selection_method = "BORUTA",
max_features = "default",
nperm = 1,
cutoffPermutations = 20,
threadsNumber = NULL,
method = "estevez",
verbose = FALSE
)
`data`
A data source in one of the major R formats, such as data.table, data.frame, or matrix.
`y`
A string giving the name of the target column.
`feature_selection_method`
A string indicating the feature selection method. It must be one of 'VI', 'MCFS', 'MI', or 'BORUTA' (default).
`max_features`
A positive integer giving the desired number of selected features. The default value, 'default', resolves to min(10, ncol(data) - 1) for 'VI' and 'MI', and to NULL for 'MCFS', meaning that the method itself chooses the number of relevant features. Only 'MCFS' accepts NULL. 'BORUTA' ignores this parameter.
`nperm`
An integer giving the number of permutations performed; relevant only for the 'VI' method. Defaults to 1.
`cutoffPermutations`
A non-negative integer that sets the number of permutation runs. At least 20 runs are needed for a statistically significant result. The minimum value is 3, except that 0 turns the permutation test off. Relevant only for the 'MCFS' method. Defaults to 20.
`threadsNumber`
A positive integer giving the number of threads used in the computation. More threads require more CPU cores and use somewhat more memory, so the value should not exceed the number of available CPU cores. The default, NULL, uses the maximum number of cores minus 1. Relevant only for the 'MCFS' method.
`method`
A string indicating which algorithm the 'MI' method uses. The options are the default 'estevez', which works well for smaller datasets but can raise errors on bigger ones, and the simpler 'peng'. See ?varrank for details.
`verbose`
A logical value. If TRUE, all information about the preprocessing process is reported; if FALSE, none is.
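The defaults described for these arguments interact, so it can help to see them resolved in one place. The sketch below encodes only the rules stated in the argument descriptions. `resolve_fs_args()` is a hypothetical helper, not part of the package, and the floor of one thread when `parallel::detectCores()` reports a single core (or `NA`) is an added assumption.

```r
# Hypothetical helper: resolves the documented defaults of
# preprocessing_feature_selection() before any selection runs.
resolve_fs_args <- function(data,
                            feature_selection_method = "BORUTA",
                            max_features = "default",
                            cutoffPermutations = 20,
                            threadsNumber = NULL) {
  method <- match.arg(feature_selection_method,
                      c("VI", "MCFS", "MI", "BORUTA"))
  if (identical(max_features, "default")) {
    max_features <- switch(method,
                           VI = , MI = min(10, ncol(data) - 1),
                           MCFS = NULL,    # the method picks the count itself
                           BORUTA = NULL)  # ignored by Boruta
  }
  if (cutoffPermutations != 0 && cutoffPermutations < 3) {
    stop("cutoffPermutations must be 0 (test off) or at least 3")
  }
  if (cutoffPermutations > 0 && cutoffPermutations < 20) {
    warning("fewer than 20 permutations may not be statistically significant")
  }
  if (is.null(threadsNumber)) {
    cores <- parallel::detectCores()
    # ASSUMPTION: never fewer than 1 thread, even if detectCores() is NA or 1.
    threadsNumber <- if (is.na(cores)) 1L else max(1L, cores - 1L)
  }
  list(method = method,
       max_features = max_features,
       cutoffPermutations = cutoffPermutations,
       threadsNumber = as.integer(threadsNumber))
}

df <- data.frame(a = 1:3, b = 4:6, c = 7:9, target = c(0, 1, 0))
resolve_fs_args(df, "VI")$max_features  # 3, i.e. min(10, 4 - 1)
```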
A list containing two objects:
`data`
A dataset with selected columns,
`idx`
The indexes of removed columns.
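The shape of the returned list can be illustrated with a small base R sketch. `build_selection_result()` is a hypothetical helper; it assumes `data` is a data.frame and that the target column stays in the returned `data`, which the description above does not state explicitly.

```r
# Hypothetical helper illustrating the documented return shape.
# ASSUMPTIONS: `data` is a data.frame and the target column is kept.
build_selection_result <- function(data, y, selected) {
  keep <- names(data) %in% c(selected, y)
  list(data = data[, keep, drop = FALSE],  # selected features (plus target)
       idx  = which(!keep))                # indexes of removed columns
}

df <- data.frame(a = 1, b = 2, c = 3, target = 0)
res <- build_selection_result(df, "target", selected = c("a", "c"))
names(res$data)  # "a" "c" "target"
res$idx          # 2
```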