mlr_filters_kruskal_test | R Documentation
Kruskal-Wallis rank sum test filter calling stats::kruskal.test().
The filter value is -log10(p), where p is the p-value. This transformation is necessary to ensure numerical stability for very small p-values.
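The transformation can be illustrated with base R alone. The following is a minimal sketch, not the filter's internal code: it assumes a single numeric feature from the built-in iris data and computes the score by hand; the variable names are illustrative only.

```r
# Kruskal-Wallis test of one numeric feature against the class label
# (illustration with base R; not the internal code of the filter)
p <- stats::kruskal.test(iris$Sepal.Length, iris$Species)$p.value

# Score on the -log10 scale: smaller p-values give larger scores
score <- -log10(p)

# Back-transform the score to the p-value
p_back <- 10^(-score)

print(c(p = p, score = score, p_back = p_back))
```

On this scale, features with stronger evidence of a difference between classes receive larger scores, so sorting in decreasing order puts them first.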
mlr3filters::Filter -> FilterKruskalTest
new()
Create a FilterKruskalTest object.
FilterKruskalTest$new()
clone()
The objects of this class are cloneable with this method.
FilterKruskalTest$clone(deep = FALSE)
deep
Whether to make a deep clone.
This filter, in its default settings, can handle missing values in the features. However, the resulting filter scores may be misleading, or at least difficult to compare, if some features have a large proportion of missing values.
If a feature does not have at least one non-missing observation per label, the resulting score will be NA. Missing scores appear at the end of the vector of scores, in a random, non-deterministic order.
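To see in advance which features would fall under this condition, the check can be sketched in base R. The helper has_obs_per_label() below is hypothetical and not part of mlr3filters; it only mirrors the condition described above, using the built-in iris data.

```r
# Hypothetical helper (not part of mlr3filters): TRUE if every label has
# at least one non-missing observation of the feature `x`
has_obs_per_label <- function(x, label) {
  all(tapply(!is.na(x), label, any))
}

x <- iris$Sepal.Length

# Same feature, with every value of one class set to missing
x_missing <- x
x_missing[iris$Species == "setosa"] <- NA

has_obs_per_label(x, iris$Species)          # TRUE
has_obs_per_label(x_missing, iris$Species)  # FALSE

# Proportion of missing values, useful when comparing scores across features
mean(is.na(x_missing))
```

Features for which such a check returns FALSE are the ones expected to receive an NA score.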
For a benchmark of filter methods:
Bommert A, Sun X, Bischl B, Rahnenführer J, Lang M (2020). “Benchmark for filter methods for feature selection in high-dimensional classification data.” Computational Statistics & Data Analysis, 143, 106839. doi: 10.1016/j.csda.2019.106839.
PipeOpFilter for filter-based feature selection.
Dictionary of Filters: mlr_filters
Other Filter: Filter, mlr_filters_anova, mlr_filters_auc, mlr_filters_carscore, mlr_filters_carsurvscore, mlr_filters_cmim, mlr_filters_correlation, mlr_filters_disr, mlr_filters_find_correlation, mlr_filters_importance, mlr_filters_information_gain, mlr_filters_jmim, mlr_filters_jmi, mlr_filters_mim, mlr_filters_mrmr, mlr_filters_njmim, mlr_filters_performance, mlr_filters_permutation, mlr_filters_relief, mlr_filters_selected_features, mlr_filters_variance, mlr_filters
library("mlr3filters")  # provides flt()

task = mlr3::tsk("iris")
filter = flt("kruskal_test")
filter$calculate(task)
as.data.table(filter)

# transform to p-value
10^(-filter$scores)

if (mlr3misc::require_namespaces(c("mlr3pipelines", "rpart"), quietly = TRUE)) {
  library("mlr3pipelines")
  task = mlr3::tsk("spam")

  # Note: `filter.frac` is selected randomly and should be tuned.
  graph = po("filter", filter = flt("kruskal_test"), filter.frac = 0.5) %>>%
    po("learner", mlr3::lrn("classif.rpart"))

  graph$train(task)
}