View source: R/fs.ensembl.stability.R
Applies ensembles of models to high-dimensional data to both classify samples and determine which features are important for classification. The function bootstraps the data a user-specified number of times so that stability metrics can be computed for the selected features. This provides an important metric for biomarker investigations, namely whether the important variables can still be identified when the models are refit on 'different' data.
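To make the idea of feature stability concrete, the following base R sketch computes the average pairwise Jaccard index across the feature sets selected in several resamples. It is only an illustration: the helper names `jaccard` and `pairwise_stability` and the toy feature sets are assumptions made for this example, not part of the package, and the package's internal computation may differ.

```r
# Hypothetical helpers for illustration only; not the package's API.

# Jaccard index between two sets of selected feature names
jaccard <- function(a, b) {
  length(intersect(a, b)) / length(union(a, b))
}

# Average Jaccard index over all pairs of feature sets
pairwise_stability <- function(feature_sets) {
  pairs <- combn(length(feature_sets), 2)  # one column per pair of resamples
  scores <- apply(pairs, 2, function(ij) {
    jaccard(feature_sets[[ij[1]]], feature_sets[[ij[2]]])
  })
  mean(scores)
}

# Toy feature sets standing in for the features selected in three resamples
selected <- list(
  c("f1", "f2", "f3", "f4"),
  c("f1", "f2", "f3", "f5"),
  c("f1", "f2", "f6", "f7")
)

stability <- pairwise_stability(selected)
print(stability)
```

A value near 1 would mean that nearly the same features are selected in every resample, while a value near 0 would mean the selections barely overlap.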
fs.ensembl.stability(X, Y, method, k = 10, p = 0.9,
    f = ceiling(ncol(X)/10), bags = 40, aggregation.metric = "CLA",
    stability.metric = "jaccard", optimize = TRUE,
    optimize.resample = FALSE, tuning.grid = NULL,
    k.folds = if (optimize) 10 else NULL,
    repeats = if (k.folds == "LOO") NULL else if (optimize) 3 else NULL,
    resolution = if (optimize) 3 else NULL, metric = "Accuracy",
    model.features = FALSE, allowParallel = FALSE, verbose = "none", ...)

X 
A matrix containing numeric values of each feature 
Y 
A factor vector containing group membership of samples 
method 
A vector listing models to be fit.
Available options are 
k 
Number of bootstrapped iterations 
p 
Proportion of the data to be used for 'training' 
f 
Number of features desired. Default is the top 10% of features, i.e. ceiling(ncol(X)/10)
bags 
Number of iterations for ensemble bagging. Default is 40
aggregation.metric 
String indicating which aggregation metric to use
for features selected during bagging. Available options are 
stability.metric 
String indicating the type of stability metric.
Available options are 
optimize 
Logical argument determining if each model should
be optimized. Default is TRUE 
optimize.resample 
Logical argument determining if each resample
should be re-optimized. Default is FALSE 
tuning.grid 
Optional list of grids containing parameters to optimize
for each algorithm. Default is NULL 
k.folds 
Number of folds generated during cross-validation. May
optionally be set to "LOO" for leave-one-out cross-validation 
repeats 
Number of times cross-validation is repeated.
Default is 3 when optimize = TRUE 
resolution 
Optional. Resolution of the model optimization grid.
Default is 3 when optimize = TRUE 
metric 
Criteria for model optimization. Available options are

model.features 
Logical argument indicating whether the number of features
selected should be determined by the individual model runs. Default is FALSE
allowParallel 
Logical argument dictating if parallel processing is
allowed via the foreach package. Default is FALSE 
verbose 
Character argument specifying how much output progress to print. Options are 'none', 'minimal' or 'full'. 
... 
Extra arguments that the user would like to apply to the models 
Methods 
Vector of models fit to data 
performance 
Performance metrics of each model and bootstrap iteration 
RPT 
Robustness-Performance Trade-Off as defined in Saeys et al. (2008) 
features 
List concerning features determined via each algorithm's feature selection criteria. 
metric: Stability metric applied
features: Matrix of selected features
stability: Matrix of pairwise comparisons and average stability
stability.models 
Function perturbation metric, i.e. how similar the features selected by each model are. 
all.tunes 
If 
final.best.tunes 
If 
specs 
List with the following elements: 
total.samples: Number of samples in original dataset
number.features: Number of features in original dataset
number.groups: Number of groups
group.levels: The specific levels of the groups
number.observations.group: Number of observations in each group
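The RPT value listed above combines stability and classification performance into a single score. A minimal sketch follows, assuming the F-measure-style harmonic mean described by Saeys et al. (2008), with a weight beta that balances the two terms. The function name `rpt_sketch` and the input values are hypothetical, and the package's own computation may differ.

```r
# Hypothetical illustration of a robustness-performance trade-off score.
# Assumes the weighted harmonic mean form:
#   RPT = ((beta^2 + 1) * stability * performance) /
#         (beta^2 * stability + performance)
rpt_sketch <- function(stability, performance, beta = 1) {
  ((beta^2 + 1) * stability * performance) /
    (beta^2 * stability + performance)
}

# Illustrative values only: moderate stability, high accuracy
example_rpt <- rpt_sketch(stability = 0.5, performance = 0.9)
print(example_rpt)
```

With beta = 1 both terms are weighted equally, so a model with high accuracy but unstable feature selection is penalized as much as a stable model with poor accuracy.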
Charles Determan Jr
Saeys Y., Abeel T., et al. (2008) Machine Learning and Knowledge Discovery in Databases. 313-325. http://link.springer.com/chapter/10.1007/978-3-540-87481-2_21
## Not run:
fits <- fs.ensembl.stability(vars,
                             groups,
                             method = c("plsda", "rf"),
                             f = 10,
                             k = 3,
                             k.folds = 10,
                             verbose = 'none')
## End(Not run)
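The example above assumes that `vars` and `groups` already exist. One possible way to build inputs matching the documented types of X (numeric matrix) and Y (factor) is sketched below using simulated data in base R. The sample size, feature count, group labels and shift applied to the first five features are all illustrative assumptions.

```r
# Simulated inputs for illustration only; sizes and labels are arbitrary.
set.seed(1)
n_samples <- 40
n_features <- 50

# Numeric matrix: one row per sample, one column per feature
vars <- matrix(
  rnorm(n_samples * n_features),
  nrow = n_samples,
  dimnames = list(NULL, paste0("feature", seq_len(n_features)))
)

# Factor of group membership, one entry per sample
groups <- factor(rep(c("A", "B"), each = n_samples / 2))

# Shift the first five features in group B so they carry some signal
vars[groups == "B", 1:5] <- vars[groups == "B", 1:5] + 2

# Sanity checks against the documented argument types
stopifnot(is.matrix(vars), is.numeric(vars))
stopifnot(is.factor(groups), length(groups) == nrow(vars))
```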
