View source: R/fs.ensembl.stability.R
Applies ensembles of models to high-dimensional data to both classify samples and identify the features that are important for classification. The function repeats the fit over a user-specified number of bootstrap iterations so that the stability of the selected features can be measured. Stability is an important metric for biomarker investigations: it indicates whether the same important variables are identified when the models are refit on 'different' data.
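The stability idea described above can be illustrated with a minimal base R sketch. It assumes the Jaccard index (the default `stability.metric`) averaged over all pairs of bootstrap runs. The helpers `jaccard` and `mean_pairwise_jaccard` and the toy feature sets are hypothetical and are not part of the package.

```r
# Hypothetical helper: Jaccard index between two sets of feature names.
jaccard <- function(a, b) {
  length(intersect(a, b)) / length(union(a, b))
}

# Hypothetical helper: mean pairwise Jaccard index over a list of feature sets,
# one set per bootstrap run.
mean_pairwise_jaccard <- function(feature_sets) {
  pairs <- combn(length(feature_sets), 2)
  scores <- apply(pairs, 2, function(idx) {
    jaccard(feature_sets[[idx[1]]], feature_sets[[idx[2]]])
  })
  mean(scores)
}

# Toy data: features selected in three bootstrap runs (made up for illustration).
selected <- list(
  run1 = c("f1", "f2", "f3"),
  run2 = c("f1", "f2", "f4"),
  run3 = c("f1", "f2", "f3")
)

stability <- mean_pairwise_jaccard(selected)
```

A value close to 1 means the runs largely agree on which features matter; a value close to 0 means the selection changes from one resample to the next.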
fs.ensembl.stability(X, Y, method, k = 10, p = 0.9,
  f = ceiling(ncol(X)/10), bags = 40, aggregation.metric = "CLA",
  stability.metric = "jaccard", optimize = TRUE,
  optimize.resample = FALSE, tuning.grid = NULL,
  k.folds = if (optimize) 10 else NULL,
  repeats = if (k.folds == "LOO") NULL else if (optimize) 3 else NULL,
  resolution = if (optimize) 3 else NULL, metric = "Accuracy",
  model.features = FALSE, allowParallel = FALSE, verbose = "none", ...)
X |
A matrix containing numeric values of each feature |
Y |
A factor vector containing group membership of samples |
method |
A vector listing models to be fit.
Available options are |
k |
Number of bootstrapped iterations |
p |
Proportion of the data to be used for training (e.g. 0.9) |
f |
Number of features desired. Default is the top 10% of features, i.e. ceiling(ncol(X)/10) |
bags |
Number of iterations for ensemble bagging. Default 40 |
aggregation.metric |
String indicating which aggregation metric to use
for features selected during bagging. Available options are |
stability.metric |
String indicating the type of stability metric.
Available options are |
optimize |
Logical argument determining if each model should be optimized. Default TRUE |
optimize.resample |
Logical argument determining if each resample should be re-optimized. Default FALSE |
tuning.grid |
Optional list of grids containing parameters to optimize for each algorithm. Default NULL |
k.folds |
Number of folds generated during cross-validation. May optionally be set to "LOO" for leave-one-out cross-validation. Default 10 when optimize = TRUE |
repeats |
Number of times cross-validation is repeated. Default 3 when optimize = TRUE and k.folds is not "LOO" |
resolution |
Optional - Resolution of model optimization grid. Default 3 when optimize = TRUE |
metric |
Criteria for model optimization. Available options are
|
model.features |
Logical argument indicating whether the number of features selected should be determined by the individual model runs. Default FALSE |
allowParallel |
Logical argument dictating if parallel processing is allowed via the foreach package. Default FALSE |
verbose |
Character argument specifying how much output progress to print. Options are 'none', 'minimal' or 'full'. |
... |
Extra arguments that the user would like to apply to the models |
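To make the roles of `k`, `p` and `f` more concrete, here is a small base R sketch. It assumes that each of the `k` iterations trains on a fraction `p` of the samples drawn within each group; this is only one plausible reading of the arguments above, not the package's actual resampling code. The helper `draw_training_sets` is hypothetical.

```r
set.seed(1)

# Hypothetical helper: draw k training index sets, each holding a fraction p
# of the samples in every group (stratified, without replacement).
draw_training_sets <- function(Y, k = 10, p = 0.9) {
  lapply(seq_len(k), function(i) {
    idx <- unlist(lapply(split(seq_along(Y), Y), function(rows) {
      rows[sample.int(length(rows), size = floor(p * length(rows)))]
    }))
    sort(unname(idx))
  })
}

# Toy inputs: 40 samples in two groups, 25 features.
groups <- factor(rep(c("A", "B"), each = 20))
X <- matrix(rnorm(40 * 25), nrow = 40, ncol = 25)

train_sets <- draw_training_sets(groups, k = 3, p = 0.9)

# Default number of features, as given in the usage signature.
f_default <- ceiling(ncol(X) / 10)
```

With these toy inputs each training set holds 18 samples per group, and the default `f` evaluates to 3.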
Methods |
Vector of models fit to data |
performance |
Performance metrics of each model and bootstrap iteration |
RPT |
Robustness-Performance Trade-Off as defined in Saeys 2008 |
features |
List of features determined via each algorithm's feature selection criteria. |
metric: Stability metric applied
features: Matrix of selected features
stability: Matrix of pairwise comparisons and average stability
stability.models |
Function perturbation metric - i.e. how similar are the features selected by each model. |
all.tunes |
If |
final.best.tunes |
If |
specs |
List with the following elements: |
total.samples: Number of samples in original dataset
number.features: Number of features in original dataset
number.groups: Number of groups
group.levels: The specific levels of the groups
number.observations.group: Number of observations in each group
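The RPT value listed above combines stability and classification performance into one score. The sketch below assumes the F-measure-style form given in Saeys et al. (2008), where `beta` weights robustness against performance and `beta = 1` gives the harmonic mean. The function name `rpt` and its arguments are hypothetical; how the package computes its own `RPT` element is not shown in this page.

```r
# Hypothetical helper: robustness-performance trade-off, assuming
# RPT_beta = ((beta^2 + 1) * robustness * performance) /
#            (beta^2 * robustness + performance)
rpt <- function(stability, performance, beta = 1) {
  ((beta^2 + 1) * stability * performance) /
    (beta^2 * stability + performance)
}

# Toy values: moderate stability, perfect accuracy.
rpt_value <- rpt(stability = 0.5, performance = 1)
```

Because the score is a harmonic mean when `beta = 1`, a model with high accuracy but unstable feature selection is penalised rather than averaged away.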
Charles Determan Jr
Saeys Y., Abeel T., et al. (2008) Machine Learning and Knowledge Discovery in Databases. 313-325. http://link.springer.com/chapter/10.1007/978-3-540-87481-2_21
## Not run:
fits <- fs.ensembl.stability(vars,
                             groups,
                             method = c("plsda", "rf"),
                             f = 10,
                             k = 3,
                             k.folds = 10,
                             verbose = 'none')
## End(Not run)
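The example refers to `vars` and `groups` without defining them. One way to create inputs of the documented shape (a numeric matrix and a factor of group membership) is sketched below in base R. The dimensions, the group labels and the shifted features are arbitrary assumptions made for illustration.

```r
set.seed(123)

n_samples <- 40
n_features <- 50

# Numeric matrix: one row per sample, one column per feature.
vars <- matrix(
  rnorm(n_samples * n_features),
  nrow = n_samples,
  ncol = n_features,
  dimnames = list(NULL, paste0("feature", seq_len(n_features)))
)

# Factor of group membership, one entry per row of vars.
groups <- factor(rep(c("control", "treatment"), each = n_samples / 2))

# Shift the first five features in the treatment group so that there is
# something for the models to find.
is_treated <- groups == "treatment"
vars[is_treated, 1:5] <- vars[is_treated, 1:5] + 2
```

These objects can then be passed as `X` and `Y` in the call shown in the example.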