Provides the step_fcbf function, which allows fast correlation-based filter (FCBF) feature selection to be added as a recipe step in 'tidymodels'. FCBF is a feature selection method that aims to retain a set of features that are highly correlated with the outcome but weakly correlated with one another. It uses symmetrical uncertainty (SU) as its measure of correlation.
The underlying calculations are conducted by the Bioconductor package FCBF, and the algorithm is described in Yu, L. and Liu, H.; Feature Selection for High-Dimensional Data: A Fast Correlation Based Filter Solution, Proc. 20th Intl. Conf. Mach. Learn. (ICML-2003), Washington DC, 2003.
step_fcbf can handle both nominal and numeric features. However, the underlying FCBF algorithm can only handle nominal features, so numeric features first need to be discretized. The step_fcbf function internally converts numeric features to binary nominal features, using a median split by default (other quantiles between 0 and 1 can be provided with the 'cutpoint' argument). Discretization is only used within the feature selection algorithm. Once features have been selected, their original numeric versions are retained for further processing and modeling.
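The median split described above can be sketched in base R. This is only an illustration of the idea, not stepFCBF's internal code: the helper name discretize_at_quantile and the "low"/"high" labels are hypothetical, and sending values equal to the threshold to the "low" group is an assumption.

```r
# Illustrative sketch only; not the implementation used by step_fcbf.
# Splits a numeric vector into two groups at a chosen quantile
# (0.5 = median split).
discretize_at_quantile <- function(x, cutpoint = 0.5) {
  threshold <- stats::quantile(x, probs = cutpoint, na.rm = TRUE, names = FALSE)
  # Assumption: values equal to the threshold fall in the "low" group.
  factor(ifelse(x > threshold, "high", "low"), levels = c("low", "high"))
}

binary_petal_length <- discretize_at_quantile(iris$Petal.Length)
table(binary_petal_length)
```

The binary factor would only be used to score the feature. The numeric column itself is what would be carried forward if the feature is selected.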
FCBF requires a minimum threshold for symmetrical uncertainty (SU), between 0 and 1, supplied through the 'min_su' argument. Smaller thresholds result in more features being retained. Appropriate thresholds are data-dependent, so it is recommended to explore different values using a subset of the training set, for example with 'FCBF::su_plot()'.
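To give a feel for the scale of SU values, here is a minimal base R sketch of symmetrical uncertainty using its standard definition, SU(X, Y) = 2 * IG(X; Y) / (H(X) + H(Y)). The function names entropy and symmetrical_uncertainty are hypothetical helpers written for this illustration. They are not part of stepFCBF or the FCBF package, which may compute the quantity differently in detail.

```r
# Illustrative sketch only; assumes nominal inputs without missing values.

# Shannon entropy (in bits) of a nominal vector.
entropy <- function(x) {
  p <- table(x) / length(x)
  p <- p[p > 0]
  -sum(p * log2(p))
}

# Symmetrical uncertainty: 0 means no association, 1 means each
# variable fully determines the other.
symmetrical_uncertainty <- function(x, y) {
  h_x <- entropy(x)
  h_y <- entropy(y)
  h_xy <- entropy(interaction(x, y, drop = TRUE))  # joint entropy
  info_gain <- h_x + h_y - h_xy                    # mutual information
  if (h_x + h_y == 0) return(0)                    # both inputs constant
  2 * info_gain / (h_x + h_y)
}

# SU between a median-split numeric feature and the outcome
petal_is_long <- iris$Petal.Length > median(iris$Petal.Length)
symmetrical_uncertainty(petal_is_long, iris$Species)
```

A feature whose SU with the outcome falls below the chosen threshold would not be retained, which is why lower thresholds keep more features.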
You can install the development version of stepFCBF from GitHub with:
# install.packages("devtools")
devtools::install_github("rowanjh/stepFCBF")
stepFCBF depends on the Bioconductor package "FCBF". Run the code below or see the FCBF package website for more detailed installation instructions:
BiocManager::install("FCBF")
Basic example of including step_fcbf in a recipe:
library(recipes)
library(stepFCBF)
data("iris")

my_recipe <- recipe(Species ~ ., data = iris) %>%
  step_fcbf(all_predictors(), min_su = 0.001)

prepped_recipe <- my_recipe %>%
  prep(iris)

# Original features
prepped_recipe$var_info

# Selected features
prepped_recipe$term_info