featureSelection: Which features are most important for your gene list?


View source: R/featselect.R

Description

featureSelection identifies the features that best discriminate between your list of disease genes and a set of control genes. It uses bootstrapping to form balanced sets of disease and non-disease genes, then selects the best features with a random forest algorithm, implemented through the caret::rfe function.
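The balanced sampling described above can be sketched in base R. This is an illustration only, not the package's own code: the gene symbols are made up, the helper draw_balanced_set is hypothetical, and sampling without replacement is an assumption, since the page does not say how the draws are made.

```r
# Hypothetical inputs: made-up gene symbols, not real panel data.
set.seed(12345)
disease_genes <- paste0("DIS", 1:20)
control_pool  <- paste0("CTL", 1:200)
trnProp <- 0.9
repeats <- 10

# Hypothetical helper: keep a proportion of the disease genes and draw
# the same number of controls (assumed: sampling without replacement).
draw_balanced_set <- function(disease, controls, prop) {
  n <- floor(length(disease) * prop)
  list(
    disease  = sample(disease, n),
    controls = sample(controls, n)
  )
}

# One balanced set per iteration.
sets <- lapply(seq_len(repeats), function(i) {
  draw_balanced_set(disease_genes, control_pool, trnProp)
})

# Number of disease and control genes in each iteration.
sapply(sets, function(s) c(disease = length(s$disease), controls = length(s$controls)))
```

In featureSelection itself, each such balanced set would then be passed to caret::rfe; that step is omitted here because it depends on the package's feature matrix.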

Usage

featureSelection(genes = NULL, seed = 12345, sizes = c(5, 10, 20),
  k = 5, controls = "allghosh", trnProp = 0.9, repeats = 10,
  gacontrols = -1)

Arguments

genes

chr vector. Gene symbols - can be returned from getGenesFromPanelApp.

seed

num scalar. Random seed for reproducibility.

sizes

int vector. Feature subset sizes to evaluate during recursive feature elimination with caret::rfe.

k

int scalar. Number of folds into which the training set is split for k-fold cross-validation.

trnProp

num scalar. Value between 0 and 1: the proportion of disease genes to keep when bootstrapping.

repeats

int scalar. Number of bootstrap iterations. For each iteration, featureSelection runs rfe on a random proportion (trnProp) of the disease genes and a random, size-matched set of controls.

gacontrols

Value

list of length repeats. Each element contains an rfe class object fitted for a set of randomly sampled disease and control genes.
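One possible way to summarise the returned list is to count how often each feature is selected across the repeats. The sketch below assumes each element exposes the optVariables component that caret's rfe objects carry; the mock list and feature names are invented for illustration, and count_selected is a hypothetical helper, not part of the package.

```r
# Mock stand-ins for rfe objects: only the optVariables component is mimicked.
fits <- list(
  list(optVariables = c("featA", "featB", "featC")),
  list(optVariables = c("featA", "featC")),
  list(optVariables = c("featA", "featD"))
)

# Hypothetical helper: tally how many repeats selected each feature,
# most frequently selected first.
count_selected <- function(fits) {
  selected <- unlist(lapply(fits, function(f) f$optVariables))
  sort(table(selected), decreasing = TRUE)
}

selection_counts <- count_selected(fits)
print(selection_counts)
```

With a real result, fits would be the list returned by featureSelection; features selected in most repeats are the more stable candidates.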

See Also

For more details on rfe: http://topepo.github.io/caret/recursive-feature-elimination.html

Examples

genes <- getGenesFromPanelApp(disorder="Neurology and neurodevelopmental disorders",
  panel="Parkinson Disease and Complex Parkinsonism", color = "green")
featureSelection(genes, controls = "allgenome")

juanbot/G2PML documentation built on Aug. 1, 2020, 5:07 a.m.