stat_filter | R Documentation |
Univariate statistic filter for dataframes of predictors with mixed numeric and categorical datatypes. Different statistical tests are used depending on the data type of response vector and predictors:
bin_stat_filter()
t-test for continuous data, chi-squared test for categorical data
class_stat_filter()
one-way ANOVA for continuous data, chi-squared test for categorical data
cor_stat_filter()
correlation (or linear regression) for continuous data and binary data, one-way ANOVA for categorical data
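For orientation, the tests named above can be run on a single predictor with base R. This is only an illustrative sketch using the built-in `mtcars` and `iris` datasets. It is not the package's internal code, and the variable names are chosen for the example.

```r
# Illustration only: base R tests of the kind named above,
# applied to one predictor at a time (not the package's internal code).
data(mtcars)
data(iris)

# Binary response, continuous predictor: two-sample t-test
y_bin <- factor(mtcars$am)
p_t <- t.test(mtcars$mpg ~ y_bin)$p.value

# Binary response, categorical predictor: chi-squared test
# (warnings suppressed because expected counts are small in this dataset)
p_chisq <- suppressWarnings(
  chisq.test(table(factor(mtcars$cyl), y_bin))$p.value
)

# Multiclass grouping, continuous variable: one-way ANOVA
p_aov <- summary(aov(Sepal.Length ~ Species, data = iris))[[1]][["Pr(>F)"]][1]

# Continuous response, continuous predictor: Pearson correlation test
p_cor <- cor.test(mtcars$mpg, mtcars$wt, method = "pearson")$p.value

# Collect the p-values; a filter would rank predictors by values like these
c(t_test = p_t, chi_squared = p_chisq, anova = p_aov, correlation = p_cor)
```

A univariate filter repeats such a test for every column of x and keeps the predictors whose p-values fall below p_cutoff.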
stat_filter(y, x, ...)
bin_stat_filter(
y,
x,
force_vars = NULL,
nfilter = NULL,
p_cutoff = 0.05,
rsq_cutoff = NULL,
type = c("index", "names", "full", "list"),
...
)
class_stat_filter(
y,
x,
force_vars = NULL,
nfilter = NULL,
p_cutoff = 0.05,
rsq_cutoff = NULL,
type = c("index", "names", "full", "list"),
...
)
cor_stat_filter(
y,
x,
cor_method = c("pearson", "spearman", "lm"),
force_vars = NULL,
nfilter = NULL,
p_cutoff = 0.05,
rsq_cutoff = NULL,
rsq_method = "pearson",
type = c("index", "names", "full", "list"),
...
)
y | Response vector |
x | Matrix or dataframe of predictors |
... | optional arguments, e.g. |
force_vars | Vector of column names within |
nfilter | Number of predictors to return. If |
p_cutoff | p value cut-off |
rsq_cutoff | r^2 cutoff for removing predictors due to collinearity. Default |
type | Type of vector returned. Default "index" returns indices, "names" returns predictor names, "full" returns a dataframe of statistics, "list" returns a list of 2 matrices of statistics, one for continuous predictors, one for categorical predictors. |
cor_method | For |
rsq_method | character string indicating which correlation coefficient is to be computed. One of "pearson" (default), "kendall", or "spearman". See |
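To illustrate what an r^2 cutoff for collinearity can mean in practice, here is a minimal sketch of one plausible scheme: a greedy pass that keeps a predictor only if its squared correlation with every predictor already kept is at or below the cutoff. The function `drop_collinear()` is hypothetical and written for this example. How the package itself applies rsq_cutoff is an assumption here, not something this page confirms.

```r
# Hypothetical helper, not the package's code: greedy removal of
# collinear predictors based on pairwise squared correlation.
drop_collinear <- function(x, rsq_cutoff = 0.9, rsq_method = "pearson") {
  # Pairwise r^2 between all numeric columns
  rsq <- cor(x, method = rsq_method)^2
  keep <- character(0)
  for (v in colnames(x)) {
    # Keep v only if it is not too strongly correlated with anything kept so far
    if (length(keep) == 0 || all(rsq[v, keep] <= rsq_cutoff)) {
      keep <- c(keep, v)
    }
  }
  keep
}

x <- iris[, 1:4]
drop_collinear(x, rsq_cutoff = 0.9)
```

In this sketch the column order decides which member of a correlated pair survives, so in a real filter the columns would normally be ordered by test p-value first.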
stat_filter() is a wrapper which calls bin_stat_filter(), class_stat_filter() or cor_stat_filter() depending on whether y is binary, multiclass or continuous respectively. Ordered factors are converted to numeric (integer) levels and analysed as if continuous.
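The dispatch described above can be sketched roughly as follows. The helper `choose_filter()` is hypothetical and only reports which filter a given response would be routed to. The exact rules the package uses to classify y (for example, how a two-level ordered factor is handled) are assumptions here.

```r
# Hypothetical helper: reports which filter a response vector would be
# routed to, following the description of stat_filter() above.
choose_filter <- function(y) {
  # Ordered factors are treated as continuous via their integer level codes
  if (is.ordered(y)) y <- as.integer(y)
  n_classes <- length(unique(na.omit(y)))
  if (n_classes == 2) {
    "bin_stat_filter"
  } else if (is.factor(y) || is.character(y)) {
    "class_stat_filter"
  } else {
    "cor_stat_filter"
  }
}

choose_filter(factor(c("a", "b", "a", "b")))                  # two classes
choose_filter(iris$Species)                                   # three classes
choose_filter(mtcars$mpg)                                     # continuous
choose_filter(factor(c("lo", "mid", "hi"), ordered = TRUE))   # ordered factor
```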
Integer vector of indices of filtered parameters (type = "index") or character vector of names (type = "names") of filtered parameters, in order of test p-value. If type is "full", the full output is returned as a dataframe of statistical results. If type is "list", the output is returned as a list of 2 matrices of statistical results, one for continuous predictors and one for categorical predictors.
library(mlbench)
data(BostonHousing2)
dat <- BostonHousing2
y <- dat$cmedv ## continuous outcome
x <- subset(dat, select = -c(cmedv, medv, town))
stat_filter(y, x, type = "full")
stat_filter(y, x, nfilter = 5, type = "names")
stat_filter(y, x)
data(iris)
y <- iris$Species ## 3 class outcome
x <- subset(iris, select = -Species)
stat_filter(y, x, type = "full")