fcbf: Fast Correlation Based Filter function.


View source: R/fcbf.R

Description

This function selects variables from a feature table of discrete/categorical variables, given a target class. It is based on the algorithm described in Yu, L. and Liu, H., Feature Selection for High-Dimensional Data: A Fast Correlation-Based Filter Solution, Proc. 20th Intl. Conf. Mach. Learn. (ICML-2003), Washington DC, 2003.
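As a rough illustration of the two-stage procedure in Yu and Liu (2003), the sketch below first keeps the features whose symmetrical uncertainty (SU) with the class is at least thresh and ranks them. It then repeatedly keeps the top-ranked feature and discards every lower-ranked feature whose SU with it is at least that feature's SU with the class. This is a minimal sketch, not the FCBF package's implementation: the helper names (entropy, joint_entropy, symmetrical_uncertainty, fcbf_sketch) are hypothetical, and it assumes discrete features with samples in rows, one feature per column, and no missing values.

```r
# Minimal FCBF sketch after Yu & Liu (2003); NOT the FCBF package's code.
# Assumes discrete features, samples in rows, one feature per column, no NAs.

# Shannon entropy (in bits) of a discrete vector
entropy <- function(v) {
  p <- table(v) / length(v)
  p <- p[p > 0]
  -sum(p * log2(p))
}

# Joint entropy (in bits) of two discrete vectors of equal length
joint_entropy <- function(a, b) {
  p <- table(a, b) / length(a)
  p <- p[p > 0]
  -sum(p * log2(p))
}

# Symmetrical uncertainty: 2 * IG(a | b) / (H(a) + H(b)), in [0, 1]
symmetrical_uncertainty <- function(a, b) {
  h_a <- entropy(a)
  h_b <- entropy(b)
  if (h_a + h_b == 0) return(0)  # both constant: treat as uninformative
  info_gain <- h_a + h_b - joint_entropy(a, b)
  2 * info_gain / (h_a + h_b)
}

fcbf_sketch <- function(features, class, thresh = 0.25) {
  # Step 1: SU of each feature with the class, filtered by thresh and ranked
  su_class <- vapply(features, symmetrical_uncertainty, numeric(1), b = class)
  relevant <- names(su_class)[su_class >= thresh]
  relevant <- relevant[order(su_class[relevant], decreasing = TRUE)]

  # Step 2: keep the strongest remaining feature, drop the ones it makes redundant
  selected <- character(0)
  while (length(relevant) > 0) {
    fp <- relevant[1]
    selected <- c(selected, fp)
    rest <- relevant[-1]
    # fq is redundant if its SU with fp is at least its SU with the class
    keep <- vapply(rest, function(fq) {
      symmetrical_uncertainty(features[[fp]], features[[fq]]) < su_class[[fq]]
    }, logical(1))
    relevant <- rest[keep]
  }
  su_class[selected]  # named SU values of the selected features
}

# Toy usage: f1 predicts the class, f2 duplicates f1, f3 is independent of it
cls <- c("a", "a", "b", "b", "a", "b", "a", "b")
df <- data.frame(
  f1 = c(1, 1, 2, 2, 1, 2, 1, 2),
  f2 = c(1, 1, 2, 2, 1, 2, 1, 2),
  f3 = c(0, 1, 0, 1, 0, 0, 1, 1)
)
print(fcbf_sketch(df, cls, thresh = 0.25))
```

In the toy data, f3 is dropped in the first step (SU with the class is 0) and f2 is dropped in the second step as redundant with f1, so only f1 survives.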

Usage

fcbf(
  x,
  y,
  thresh = 0.25,
  n_genes = NULL,
  verbose = FALSE,
  samples_in_rows = FALSE,
  balance_classes = FALSE
)

Arguments

x

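A table of discrete features, with one observation in each cell. By default, variables (e.g. genes) are expected in rows and samples in columns; set samples_in_rows = TRUE if x has samples in rows and variables in columns.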

y

A target vector or factor containing the class of each observation. Note: the observations must be in the same order as in x.

thresh

A threshold for the minimum correlation (measured as symmetrical uncertainty) between each variable and the class. Defaults to 0.25. Note: this value can drastically change the number of selected features.

n_genes

The number of genes to select in the first part of the algorithm. Defaults to NULL, in which case the thresh parameter is used instead. Caution: when set, it overrides the thresh parameter altogether.

verbose

If TRUE, prints progress messages while the function runs. Defaults to FALSE.

samples_in_rows

Set to TRUE when x has samples in rows and variables/genes in columns. Defaults to FALSE (variables in rows, samples in columns).

balance_classes

If TRUE, balances the number of instances per class in the target vector y by sampling, from each of the other classes, as many instances as there are in the minority class. The number of samplings is controlled by resampling_number. Defaults to FALSE.
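The documented behaviour of thresh, n_genes and balance_classes can be illustrated with two small helpers. Both are hypothetical (select_relevant and balance_by_downsampling are not FCBF functions): the first shows how a fixed n_genes replaces the SU threshold in the first step, and the second performs a single round of downsampling to the minority-class size. Repeated sampling via resampling_number is not modelled.

```r
# Hypothetical helpers illustrating the documented semantics of thresh,
# n_genes and balance_classes; they are NOT functions of the FCBF package.

# First step: keep features by SU threshold, or the top n_genes if given
select_relevant <- function(su_class, thresh = 0.25, n_genes = NULL) {
  # su_class: named numeric vector of SU(feature, class) values
  ranked <- sort(su_class, decreasing = TRUE)
  if (!is.null(n_genes)) {
    # n_genes overrides thresh entirely
    return(head(ranked, n_genes))
  }
  ranked[ranked >= thresh]
}

# Class balancing: downsample every class to the size of the minority class
# (one round of sampling only; resampling_number is not modelled here)
balance_by_downsampling <- function(y) {
  y <- as.factor(y)
  n_min <- min(table(y))
  idx <- unlist(lapply(levels(y), function(lvl) {
    members <- which(y == lvl)
    # sample.int avoids sample()'s special case for length-1 input
    members[sample.int(length(members), n_min)]
  }))
  sort(idx)  # indices of the retained observations, in original order
}

su <- c(g1 = 0.9, g2 = 0.1, g3 = 0.4, g4 = 0.3)
print(select_relevant(su, thresh = 0.35))
print(select_relevant(su, thresh = 0.35, n_genes = 3))

set.seed(1)
y <- c(rep("a", 6), rep("b", 2), rep("c", 4))
kept <- balance_by_downsampling(y)
print(table(y[kept]))
```

With n_genes = 3, g4 is kept even though its SU (0.3) is below thresh, which is the override the n_genes entry warns about. In the balancing example, every class ends up with two observations, and both instances of the minority class "b" are always retained.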

Details

Note: for gene expression data, run discretize_exprs first, since fcbf expects discrete variables.
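fcbf needs discrete input because symmetrical uncertainty is estimated from the frequencies of each value. This page does not describe how discretize_exprs bins expression values, so the sketch below uses a different and deliberately simple stand-in: a hypothetical binarize_by_median helper that labels each gene "high" or "low" relative to its own median across samples. It keeps genes in rows and samples in columns, matching the layout used in the example below.

```r
# Hypothetical stand-in for discretization; NOT what discretize_exprs does.
# exprs: numeric matrix with genes in rows and samples in columns.
binarize_by_median <- function(exprs) {
  # apply() over rows returns samples x genes, so transpose back
  t(apply(exprs, 1, function(g) ifelse(g > median(g), "high", "low")))
}

m <- matrix(c(1, 2, 3, 4,
              10, 20, 30, 40),
            nrow = 2, byrow = TRUE,
            dimnames = list(c("g1", "g2"), paste0("s", 1:4)))
discrete <- as.data.frame(binarize_by_median(m))
print(discrete)
```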

Value

Returns a data frame with the index of each selected feature (first column) and its symmetrical uncertainty with the class (second column). Variable names are given as row names.
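For reference, symmetrical uncertainty as defined by Yu and Liu (2003) normalizes the information gain by the entropies of both variables. Its values therefore lie in [0, 1], where 1 means that either variable fully predicts the other and 0 means that they are independent:

```latex
\begin{aligned}
H(X) &= -\sum_{i} P(x_i)\,\log_2 P(x_i) \\
H(X \mid Y) &= -\sum_{j} P(y_j) \sum_{i} P(x_i \mid y_j)\,\log_2 P(x_i \mid y_j) \\
IG(X \mid Y) &= H(X) - H(X \mid Y) \\
SU(X, Y) &= 2\left[\frac{IG(X \mid Y)}{H(X) + H(Y)}\right]
\end{aligned}
```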

Examples

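 library(FCBF)
 # Example single-cell dataset (a SummarizedExperiment) shipped with FCBF
 data(scDengue)
 exprs <- SummarizedExperiment::assay(scDengue, 'logcounts')
 # fcbf expects discrete variables, so discretize the expression matrix first
 discrete_expression <- as.data.frame(discretize_exprs(exprs))
 head(discrete_expression[, 1:4])
 infection <- SummarizedExperiment::colData(scDengue)
 target <- infection$infection
 # Select features using a symmetrical uncertainty threshold
 fcbf(discrete_expression, target, thresh = 0.05, verbose = TRUE)
 # Alternatively, fix the number of genes kept in the first step (overrides thresh)
 fcbf(discrete_expression, target, n_genes = 100)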

FCBF documentation built on Nov. 8, 2020, 8:30 p.m.