featureSelection: Which features are most important for your gene list?
In juanbot/G2PML:


`featureSelection` identifies the features that best discriminate between your list of disease genes and a set of control genes. It uses bootstrapping to form balanced sets of disease and non-disease genes, then selects the best features by recursive feature elimination with a random forest, as implemented in `caret::rfe`.
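The balanced resampling described above can be sketched in base R. This is only an illustration of the idea, not the package's internal code: the gene pools `disease` and `controls` are hypothetical placeholders, and drawing without replacement is an assumption, since the source does not say how the sampling is done.

```r
# Hypothetical gene pools (placeholders, not real G2PML data).
set.seed(12345)
disease  <- paste0("D", 1:20)
controls <- paste0("C", 1:200)

trnProp <- 0.9   # proportion of disease genes kept per iteration
repeats <- 10    # number of bootstrap iterations

# For each iteration, keep a random trnProp share of the disease genes
# and draw the same number of control genes, so the classes are balanced.
# Assumption: sampling is without replacement.
balanced_sets <- lapply(seq_len(repeats), function(i) {
  n <- round(trnProp * length(disease))
  list(
    disease = sample(disease, n),
    control = sample(controls, n)
  )
})

# Both classes have the same size in every iteration.
sapply(balanced_sets, function(s) length(s$disease) == length(s$control))
```

Each element of `balanced_sets` would then be the input to one feature-selection run, which is why the function returns one fitted object per repeat.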


featureSelection(genes = NULL, seed = 12345, sizes = c(5, 10, 20),
  k = 5, controls = "allghosh", trnProp = 0.9, repeats = 10,
  gacontrols = -1)

`genes`	chr vector. Gene symbols - can be returned from `getGenesFromPanelApp`.
`seed`	num scalar. Random seed for reproducibility.
`sizes`	int vector. Sizes to be used in the recursive feature elimination `caret::rfe`.
`k`	int scalar. Number of folds into which the training set is split for k-fold cross-validation.
`trnProp`	num scalar. Proportion (between 0 and 1) of disease genes to keep in each bootstrap iteration.
`repeats`	int scalar. Number of bootstrap iterations. In each iteration, `featureSelection` runs `rfe` on a random proportion (`trnProp`) of the disease genes and a random, size-matched set of controls.
`gacontrols`

A list of length `repeats`. Each element is an object of class `rfe`, fitted to one randomly sampled set of disease and control genes.
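Because the result holds one fit per repeat, a natural follow-up is to count how often each feature is selected across repeats. The sketch below uses a hypothetical list `selected` with made-up feature names. With a real result (called `fits` here, also a hypothetical name), the per-repeat selections could probably be obtained with `lapply(fits, caret::predictors)`.

```r
# Hypothetical per-repeat selections; with a real result these might come
# from lapply(fits, caret::predictors). Feature names are made up.
selected <- list(
  c("featA", "featB", "featC"),
  c("featA", "featC"),
  c("featA", "featD")
)

# Count how many repeats selected each feature, most frequent first.
counts <- sort(table(unlist(selected)), decreasing = TRUE)

# Convert to the fraction of repeats in which each feature was selected.
freq <- setNames(as.numeric(counts) / length(selected), names(counts))
freq
```

Features with a selection frequency close to 1 are chosen consistently regardless of which genes were sampled, which makes them the more robust candidates.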

For more details on rfe: http://topepo.github.io/caret/recursive-feature-elimination.html


genes <- getGenesFromPanelApp(disorder="Neurology and neurodevelopmental disorders",
  panel="Parkinson Disease and Complex Parkinsonism", color = "green")
featureSelection(genes, controls = "allgenome")