fs.ensembl.stability: Ensemble Classification & Feature Selection
In cdeterman/OmicsMarkeR: Classification and Feature Selection for 'Omics' Datasets


Applies ensembles of models to high-dimensional data to both classify samples and determine the features that are important for classification. The function bootstraps the data a user-specified number of times (`k`) so that the stability of the selected features can be measured. This provides an important metric for biomarker investigations, namely whether the same important variables are identified when the models are refit on 'different' data.
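The resampling idea can be illustrated without the package. The sketch below is an assumption-laden stand-in, not OmicsMarkeR's implementation: a Welch t-statistic replaces the fitted models, each iteration keeps a stratified random subsample containing a proportion `p` of every group (whether the package samples with or without replacement is not stated here), and stability is the average pairwise Jaccard index of the top-`f` feature sets, Jaccard being the documented default metric.

```r
# Stand-in sketch (not OmicsMarkeR's code): how stable is a top-f feature
# list across k random subsamples that each keep a proportion p of the data?
set.seed(1)

n      <- 60
n_feat <- 50
X <- matrix(rnorm(n * n_feat), nrow = n,
            dimnames = list(NULL, paste0("V", seq_len(n_feat))))
Y <- factor(rep(c("A", "B"), each = n / 2))
X[Y == "B", 1:5] <- X[Y == "B", 1:5] + 1.5   # five truly informative features

k <- 10                      # number of resampling iterations
p <- 0.9                     # proportion of each group kept per iteration
f <- ceiling(ncol(X) / 10)   # same expression as the documented default (5 here)

# Rank features by absolute Welch t-statistic and return the names of the top f
top_features <- function(X, Y, f) {
  score <- apply(X, 2, function(x) abs(t.test(x ~ Y)$statistic))
  names(sort(score, decreasing = TRUE))[seq_len(f)]
}

selected <- lapply(seq_len(k), function(i) {
  # stratified subsample: draw floor(p * group size) indices from each group
  idx <- unlist(lapply(split(seq_along(Y), Y),
                       function(g) sample(g, floor(p * length(g)))))
  top_features(X[idx, , drop = FALSE], Y[idx], f)
})

jaccard <- function(a, b) length(intersect(a, b)) / length(union(a, b))

# Average Jaccard index over all k * (k - 1) / 2 pairs of iterations
pairs <- combn(k, 2)
stability <- mean(apply(pairs, 2, function(ij) {
  jaccard(selected[[ij[1]]], selected[[ij[2]]])
}))
stability
```

A value near 1 means the same features are picked regardless of which samples were left out; values near 0 mean the selection is driven by the particular subsample.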

fs.ensembl.stability(X, Y, method, k = 10, p = 0.9,
  f = ceiling(ncol(X)/10), bags = 40, aggregation.metric = "CLA",
  stability.metric = "jaccard", optimize = TRUE,
  optimize.resample = FALSE, tuning.grid = NULL,
  k.folds = if (optimize) 10 else NULL,
  repeats = if (k.folds == "LOO") NULL else if (optimize) 3 else NULL,
  resolution = if (optimize) 3 else NULL, metric = "Accuracy",
  model.features = FALSE, allowParallel = FALSE, verbose = "none", ...)
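Several defaults in the usage line depend on other arguments and are evaluated lazily. The hypothetical helper `resolve_defaults()` below is not part of OmicsMarkeR; it only copies those default expressions so their resolved values can be inspected.

```r
# Hypothetical stand-in: copies the documented default expressions verbatim.
# R evaluates defaults lazily, so `repeats` sees whatever `k.folds` resolved to.
resolve_defaults <- function(X, optimize = TRUE,
                             f = ceiling(ncol(X)/10),
                             k.folds = if (optimize) 10 else NULL,
                             repeats = if (k.folds == "LOO") NULL else if (optimize) 3 else NULL,
                             resolution = if (optimize) 3 else NULL) {
  # building the list forces every default to be evaluated
  list(f = f, k.folds = k.folds, repeats = repeats, resolution = resolution)
}

X <- matrix(0, nrow = 20, ncol = 45)
d1 <- resolve_defaults(X)                   # f = 5, k.folds = 10, repeats = 3, resolution = 3
d2 <- resolve_defaults(X, k.folds = "LOO")  # repeats becomes NULL for leave-one-out
str(d1)
str(d2)
```

In this stand-in, passing `optimize = FALSE` without an explicit `k.folds` makes the `repeats` default fail, because `NULL == "LOO"` is `logical(0)` and `if` rejects a zero-length condition; supplying `repeats` explicitly means that default is never evaluated.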

`X`	A numeric matrix of feature values, with samples in rows and features in columns
`Y`	A factor giving the group membership of each sample (row of `X`)
`method`	A vector listing models to be fit. Available options are `"plsda"` (Partial Least Squares Discriminant Analysis), `"rf"` (Random Forest), `"gbm"` (Gradient Boosting Machine), `"svm"` (Support Vector Machines), `"glmnet"` (Elastic-net Generalized Linear Model), and `"pam"` (Prediction Analysis of Microarrays)
`k`	Number of bootstrap iterations
`p`	Proportion of the data used for training in each iteration. Default `p = 0.9` (90%)
`f`	Number of features to select. Default is the top 10% of features, `f = ceiling(ncol(X)/10)`. If rank correlation is desired, set `f = NULL`
`bags`	Number of iterations for ensemble bagging. Default `bags = 40`
`aggregation.metric`	String indicating the metric used to aggregate the features selected during bagging. Available options are `"CLA"` (Complete Linear), `"EM"` (Ensemble Mean), `"ES"` (Ensemble Stability), and `"EE"` (Ensemble Exponential)
`stability.metric`	String indicating the type of stability metric. Available options are `"jaccard"` (Jaccard Index/Tanimoto Distance), `"sorensen"` (Dice-Sorensen's Index), `"ochiai"` (Ochiai's Index), `"pof"` (Percent of Overlapping Features), `"kuncheva"` (Kuncheva's Stability Measure), `"spearman"` (Spearman Rank Correlation), and `"canberra"` (Canberra Distance)
`optimize`	Logical argument determining whether each model should be optimized. Default `optimize = TRUE`
`optimize.resample`	Logical argument determining whether each resample should be re-optimized. Default `optimize.resample = FALSE`: only one optimization run is performed, and subsequent models use the initially determined parameters
`tuning.grid`	Optional list of grids containing the parameters to optimize for each algorithm. Default `tuning.grid = NULL` lets the function create a grid determined by `resolution`
`k.folds`	Number of folds generated during cross-validation. May optionally be set to `"LOO"` for leave-one-out cross-validation. Default `k.folds = 10`
`repeats`	Number of times cross-validation is repeated. Default `repeats = 3`
`resolution`	Optional resolution of the model optimization grid. Default `resolution = 3`
`metric`	Criterion for model optimization. Available options are `"Accuracy"` (Prediction Accuracy), `"Kappa"` (Kappa Statistic), and `"AUC-ROC"` (Area Under the Curve - Receiver Operating Characteristic)
`model.features`	Logical argument indicating whether the number of features selected should be determined by the individual model runs. Default `model.features = FALSE`
`allowParallel`	Logical argument dictating whether parallel processing is allowed via the foreach package. Default `allowParallel = FALSE`
`verbose`	Character argument specifying how much output progress to print. Options are 'none', 'minimal' or 'full'.
`...`	Extra arguments that the user would like to apply to the models
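The set-based `stability.metric` options compare two selected feature lists. Below is a minimal sketch using the standard textbook formulas for four of them, plus Spearman correlation for the full-ranking (`f = NULL`) case; the function names are our own, and OmicsMarkeR's internal implementations may differ in details such as how unequal subset sizes are handled.

```r
# Standard set-based stability indices for two feature subsets a and b
jaccard  <- function(a, b) length(intersect(a, b)) / length(union(a, b))
sorensen <- function(a, b) 2 * length(intersect(a, b)) / (length(a) + length(b))
ochiai   <- function(a, b) length(intersect(a, b)) / sqrt(length(a) * length(b))

# Kuncheva's index corrects the overlap for chance. It assumes both subsets
# have the same size k, drawn from n features in total (0 < k < n).
kuncheva <- function(a, b, n) {
  k <- length(a)
  stopifnot(length(b) == k, k > 0, k < n)
  r <- length(intersect(a, b))
  (r * n - k^2) / (k * (n - k))
}

a <- c("g1", "g2", "g3", "g4")
b <- c("g1", "g2", "g3", "g5")
c(jaccard  = jaccard(a, b),           # 3 / 5 = 0.6
  sorensen = sorensen(a, b),          # 6 / 8 = 0.75
  ochiai   = ochiai(a, b),            # 3 / 4 = 0.75
  kuncheva = kuncheva(a, b, n = 10))  # 14 / 24, about 0.583

# Rank-based comparison: Spearman correlation of two full rankings
rank1 <- c(g1 = 1, g2 = 2, g3 = 3, g4 = 4, g5 = 5)
rank2 <- c(g1 = 2, g2 = 1, g3 = 3, g4 = 5, g5 = 4)
cor(rank1, rank2, method = "spearman")  # 0.8
```

Jaccard, Sorensen and Ochiai only count overlap, so they rise as `f` grows toward the total number of features; Kuncheva's index subtracts the overlap expected by chance, which makes values comparable across different subset sizes.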

`Methods`	Vector of models fit to data
`performance`	Performance metrics of each model and bootstrap iteration
`RPT`	Robustness-Performance Trade-Off as defined by Saeys et al. (2008)
`features`	A list describing the features selected by each algorithm's feature selection criterion, with the following elements:

metric: Stability metric applied
features: Matrix of selected features
stability: Matrix of pairwise comparisons and average stability

`stability.models`	Function perturbation metric, i.e. how similar the features selected by the different models are.
`all.tunes`	If `optimize.resample = TRUE`, a list of the optimized parameters for each bagging and bootstrap iteration.
`final.best.tunes`	If `optimize.resample = TRUE`, a list of the optimized parameters for each bootstrap of the bagged models refit to the aggregated selected features.
`specs`	List with the following elements:

total.samples: Number of samples in original dataset
number.features: Number of features in original dataset
number.groups: Number of groups
group.levels: The specific levels of the groups
number.observations.group: Number of observations in each group
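The `RPT` element combines stability and predictive performance into one score. This is a minimal sketch of the robustness-performance trade-off as we understand it from Saeys et al. (2008): a weighted harmonic mean of a robustness (stability) score and a performance score, both on [0, 1]. The function name `rpt` and the `beta` weight are our own; how OmicsMarkeR obtains its stability and performance inputs is not shown.

```r
# Robustness-performance trade-off: weighted harmonic mean of two scores.
# beta = 1 gives both scores equal weight (the plain harmonic mean).
rpt <- function(robustness, performance, beta = 1) {
  ((beta^2 + 1) * robustness * performance) / (beta^2 * robustness + performance)
}

rpt(robustness = 0.8, performance = 0.9)             # 1.44 / 1.7, about 0.847
rpt(robustness = 0.8, performance = 0.9, beta = 2)   # changes the relative weighting
```

Like an F-measure, the harmonic mean is pulled toward the smaller of the two scores, so a very accurate but unstable feature selection cannot achieve a high RPT.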

Charles Determan Jr

Saeys Y., Abeel T., et al. (2008) Machine Learning and Knowledge Discovery in Databases. 313-325. http://link.springer.com/chapter/10.1007/978-3-540-87481-2_21

## Not run: 
fits <- fs.ensembl.stability(vars, groups, method = c("plsda", "rf"),
                             f = 10, k = 3, k.folds = 10, verbose = 'none')
## End(Not run)
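The example assumes that `vars` (a numeric matrix) and `groups` (a factor) already exist. A minimal sketch that simulates such inputs with base R follows; the sample size, feature count and effect size are arbitrary assumptions, not the package's own example data.

```r
# Hypothetical simulated inputs: 100 samples, 50 features, two groups
set.seed(42)
n_samples  <- 100
n_features <- 50
groups <- factor(rep(c("control", "case"), each = n_samples / 2))
vars <- matrix(rnorm(n_samples * n_features), nrow = n_samples,
               dimnames = list(NULL, paste0("feature", seq_len(n_features))))

# Shift the first 10 features in the "case" group so they are discriminative
vars[groups == "case", 1:10] <- vars[groups == "case", 1:10] + 1

# With OmicsMarkeR installed, the documented call could then be run as:
# fits <- fs.ensembl.stability(vars, groups, method = c("plsda", "rf"),
#                              f = 10, k = 3, k.folds = 10, verbose = "none")
```

Ten shifted features match `f = 10`, so a stable method would be expected to pick mostly `feature1` to `feature10` across bootstrap iterations.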