fcbf: Fast Correlation Based Filter function.

View source: R/fcbf.R

Description

This function selects variables from a feature table of discrete/categorical variables and a target class. It is based on the algorithm described in Yu, L. and Liu, H., "Feature Selection for High-Dimensional Data: A Fast Correlation-Based Filter Solution", Proc. 20th Intl. Conf. Mach. Learn. (ICML-2003), Washington DC, 2003.
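The two stages of the algorithm in the cited paper (rank features by their symmetrical uncertainty with the class, then discard features that are redundant with a better-ranked one) can be sketched in base R. This is a minimal illustration under stated assumptions, not the package's implementation: `entropy_bits`, `symmetrical_uncertainty`, and `fcbf_sketch` are hypothetical names, and the toy table has one variable per column.

```r
# Shannon entropy (in bits) from a table of counts.
entropy_bits <- function(counts) {
  p <- counts[counts > 0] / sum(counts)
  -sum(p * log2(p))
}

# Symmetrical uncertainty: 2 * information gain / (H(X) + H(Y)), in [0, 1].
symmetrical_uncertainty <- function(x, y) {
  hx <- entropy_bits(table(x))
  hy <- entropy_bits(table(y))
  if (hx + hy == 0) return(0)          # both variables are constant
  hxy <- entropy_bits(table(x, y))     # joint entropy H(X, Y)
  2 * (hx + hy - hxy) / (hx + hy)
}

# Hypothetical sketch: `features` holds one discrete variable per column.
fcbf_sketch <- function(features, target, minimum_su = 0.25) {
  su_class <- vapply(features, symmetrical_uncertainty, numeric(1), y = target)
  # Stage 1: keep features at or above the threshold, best first.
  candidates <- names(sort(su_class[su_class >= minimum_su], decreasing = TRUE))
  selected <- character(0)
  # Stage 2: accept the best remaining feature, then drop every candidate that
  # is at least as correlated with it as with the class.
  while (length(candidates) > 0) {
    best <- candidates[1]
    selected <- c(selected, best)
    rest <- candidates[-1]
    redundant <- vapply(rest, function(f) {
      symmetrical_uncertainty(features[[best]], features[[f]]) >= su_class[[f]]
    }, logical(1))
    candidates <- rest[!redundant]
  }
  data.frame(SU = su_class[selected], row.names = selected)
}

# Toy data: f1 and f2 are identical and predict the class; f3 is uninformative.
toy_target <- c("a", "a", "a", "a", "b", "b", "b", "b")
toy_features <- data.frame(
  f1 = c(0, 0, 0, 0, 1, 1, 1, 1),
  f2 = c(0, 0, 0, 0, 1, 1, 1, 1),
  f3 = c(0, 1, 0, 1, 0, 1, 0, 1)
)
toy_result <- fcbf_sketch(toy_features, toy_target)
print(toy_result)
```

In this toy case only one of the two duplicated features should survive, because the second is fully redundant with the first, and f3 never passes the threshold.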

Usage

fcbf(
  feature_table,
  target_vector,
  minimum_su = 0.25,
  n_genes_selected_in_first_step = NULL,
  verbose = FALSE,
  samples_in_rows = FALSE,
  balance_classes = FALSE
)

Arguments

feature_table

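A table of discrete features, with one observation in each cell. By default (samples_in_rows = FALSE), variables are in rows and samples in columns, as in the example below; set samples_in_rows = TRUE if samples are in rows and variables in columns.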

target_vector

A target vector: a factor containing the classes of the observations. Note: the observations must be in the same order as in feature_table.

minimum_su

A threshold for the minimum correlation (as determined by symmetrical uncertainty) between each variable and the class. Defaults to 0.25.

Note: this might drastically change the number of selected features.
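For reference, symmetrical uncertainty as defined in the cited Yu and Liu paper normalizes information gain by the entropies of both variables, so it ranges from 0 (independent) to 1 (each variable fully determines the other). The symbols X (a feature) and Y (the class or another feature) are notation introduced here for illustration:

```latex
\begin{aligned}
H(X) &= -\sum_{i} P(x_i)\,\log_2 P(x_i) \\
H(X \mid Y) &= -\sum_{j} P(y_j) \sum_{i} P(x_i \mid y_j)\,\log_2 P(x_i \mid y_j) \\
SU(X, Y) &= 2\,\frac{H(X) - H(X \mid Y)}{H(X) + H(Y)}
\end{aligned}
```

A feature is kept in the first step only if its SU with the class reaches minimum_su, which is why lowering the threshold can admit many more candidates.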

n_genes_selected_in_first_step

Sets the number of genes to be selected in the first part of the algorithm. The final number of selected genes is related to this parameter, but depends on the correlation structure of the data. It overrides the minimum_su parameter. If left unchanged, it defaults to NULL and the minimum_su parameter is used.

verbose

Adds verbosity. Defaults to FALSE.

samples_in_rows

A flag for the case in which samples are in rows and variables/genes in columns. Defaults to FALSE.

balance_classes

Balances the number of instances per class in target_vector by sampling, from each of the other classes, as many instances as there are in the minority class. The number of samplings is controlled by resampling_number. Defaults to FALSE.
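One reading of this balancing step is plain downsampling: every class is reduced to the size of the smallest one. The base R sketch below illustrates that idea only; it is an assumption about the behaviour, not the package's code, and the class labels and variable names are made up for the example.

```r
set.seed(1)  # make the random draw reproducible

# Toy target with an imbalanced class distribution (3 vs 7).
toy_classes <- factor(c(rep("minor", 3), rep("major", 7)))

# Size of the smallest class.
n_minor <- min(table(toy_classes))

# For each class, draw n_minor positions without replacement.
keep <- unlist(lapply(
  split(seq_along(toy_classes), toy_classes),
  function(idx) idx[sample.int(length(idx), n_minor)]
))

balanced_classes <- toy_classes[keep]
print(table(balanced_classes))
```

The same `keep` positions would also be used to subset the samples of the feature table, so that features and classes stay aligned.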

Details

Note: for gene expression data, you will need to run discretize_exprs first.

Value

Returns a data frame with the index of each selected feature (first column) and its symmetrical uncertainty with respect to the class (second column). Variable names are given in the row names.
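A typical next step is to subset the original table to the selected features. The sketch below uses a mock data frame in place of a real fcbf() result; the column names `index` and `SU`, the gene names, and the values are assumptions made for illustration, and the toy table has variables in rows.

```r
# Mock stand-in for an fcbf() result (column names are assumed, values invented).
mock_result <- data.frame(
  index = c(3L, 1L),
  SU = c(0.42, 0.31),
  row.names = c("geneC", "geneA")
)

# Toy discretized table: variables in rows, samples in columns.
toy_table <- data.frame(
  s1 = c("low", "high", "low"),
  s2 = c("high", "high", "low"),
  row.names = c("geneA", "geneB", "geneC")
)

# Keep only the rows picked by the filter, in ranking order.
selected_table <- toy_table[mock_result$index, , drop = FALSE]
print(selected_table)
```

Subsetting by row name (`toy_table[rownames(mock_result), ]`) gives the same rows here and is less fragile if the table is reordered.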

Examples

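library(FCBF)

# Load the example dataset and extract the 'logcounts' assay
data(scDengue)
exprs <- SummarizedExperiment::assay(scDengue, 'logcounts')
# FCBF needs discrete features, so discretize the expression values first
discrete_expression <- as.data.frame(discretize_exprs(exprs))
head(discrete_expression[, 1:4])
# The target classes come from the sample annotation
infection <- SummarizedExperiment::colData(scDengue)
target <- infection$infection
# Select by SU threshold, then by number of first-step candidates
fcbf(discrete_expression, target, minimum_su = 0.05, verbose = TRUE)
fcbf(discrete_expression, target, n_genes_selected_in_first_step = 100)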

lubianat/FCBF documentation built on March 3, 2021, 12:35 a.m.