View source: R/fs.stability.v2.R
Applies models to high-dimensional data to both classify samples and identify the features most important for classification. The function refits each model on a user-specified number of bootstrap resamples and computes stability metrics for the selected features. This addresses a key question in biomarker investigations: whether the same important variables are identified when the models are refit on 'different' data.
fs.stability(X, Y, method, k = 10, p = 0.9, f = NULL,
  stability.metric = "jaccard", optimize = TRUE,
  optimize.resample = FALSE, tuning.grid = NULL, k.folds = if (optimize)
  10 else NULL, repeats = if (k.folds == "LOO") NULL else if (optimize) 3 else
  NULL, resolution = if (is.null(tuning.grid) && optimize) 3 else NULL,
  metric = "Accuracy", model.features = FALSE, allowParallel = FALSE,
  verbose = "none", ...)
X |
A scaled matrix or dataframe containing numeric values of each feature |
Y |
A factor vector containing group membership of samples |
method |
A vector listing models to be fit.
Available options are |
k |
Number of bootstrapped iterations |
p |
Proportion of the data to be used for training (e.g. 0.9 for 90%) |
f |
Number of features desired.
If rank correlation is desired, set |
stability.metric |
String indicating the type of stability metric.
Available options are |
optimize |
Logical argument determining if each model should
be optimized. Default TRUE |
optimize.resample |
Logical argument determining if each resample
should be re-optimized. Default FALSE |
tuning.grid |
Optional list of grids containing parameters to optimize
for each algorithm. Default NULL |
k.folds |
Number of folds generated during cross-validation.
May optionally be set to |
repeats |
Number of times cross-validation is repeated.
Default 3 when optimize = TRUE and k.folds is not "LOO", otherwise NULL |
resolution |
Resolution of model optimization grid.
Default 3 when tuning.grid is NULL and optimize = TRUE, otherwise NULL |
metric |
Criterion for model optimization. Available options
are |
model.features |
Logical argument determining whether the number of
features selected should be determined by the individual model runs.
Default FALSE |
allowParallel |
Logical argument dictating if parallel processing
is allowed via the foreach package. Default FALSE |
verbose |
Character argument specifying how much output progress to print. Options are 'none', 'minimal' or 'full'. |
... |
Extra arguments that the user would like to apply to the models |
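The roles of k, p and f can be illustrated with a small base R sketch. It is not the package's implementation: the toy data, the helpers draw_training_idx and top_features, and the choice of stratified sampling without replacement are all assumptions made for illustration, and a simple mean-difference score stands in for a real model's feature ranking.

```r
set.seed(1)

# Toy data (hypothetical): 40 samples, 20 features, two balanced groups
X <- matrix(rnorm(40 * 20), nrow = 40,
            dimnames = list(NULL, paste0("V", 1:20)))
Y <- factor(rep(c("A", "B"), each = 20))
X[Y == "B", 1:3] <- X[Y == "B", 1:3] + 2   # make V1..V3 informative

k <- 5    # number of resampling iterations
p <- 0.9  # proportion of samples used for training in each iteration
f <- 3    # number of top-ranked features to keep

# Hypothetical helper: stratified training indices, drawn without replacement
# (an assumption; the sampling scheme is not described in this page)
draw_training_idx <- function(Y, p) {
  unlist(lapply(split(seq_along(Y), Y), function(idx) {
    idx[sample.int(length(idx), size = floor(p * length(idx)))]
  }), use.names = FALSE)
}

# Hypothetical stand-in for a model's feature ranking:
# absolute difference in group means, largest first
top_features <- function(X, Y, f) {
  score <- abs(colMeans(X[Y == levels(Y)[1], , drop = FALSE]) -
               colMeans(X[Y == levels(Y)[2], , drop = FALSE]))
  names(sort(score, decreasing = TRUE))[seq_len(f)]
}

# One set of f selected features per resampling iteration
selected <- lapply(seq_len(k), function(i) {
  idx <- draw_training_idx(Y, p)
  top_features(X[idx, , drop = FALSE], Y[idx], f)
})
```

Each element of selected holds the f feature names chosen in one iteration; comparing these sets across iterations is what a stability metric summarises.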
Methods |
Vector of models fit to data |
performance |
Performance metrics of each model and bootstrap iteration |
RPT |
Robustness-Performance Trade-Off as defined in Saeys 2008 |
features |
List concerning features determined via each algorithm's feature selection criteria. |
metric: Stability metric applied
features: Matrix of selected features
stability: Matrix of pairwise comparisons and average stability
stability.models |
Function perturbation metric - i.e. how similar are the features selected by each model. |
original.best.tunes |
If |
final.best.tunes |
If |
specs |
List with the following elements: |
total.samples: Number of samples in original dataset
number.features: Number of features in original dataset
number.groups: Number of groups
group.levels: The specific levels of the groups
number.observations.group: Number of observations in each group
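As a rough illustration of the stability and RPT values described above, the following base R sketch computes pairwise Jaccard indices between feature sets and combines the average with a performance score. The feature sets, the accuracy value and the helper names jaccard and rpt are hypothetical. The RPT formula is written as the weighted harmonic mean of robustness and performance as recalled from Saeys et al. (2008), so it should be checked against the reference rather than taken as the package's exact computation.

```r
# Jaccard index of two feature sets: size of intersection / size of union
jaccard <- function(a, b) {
  length(intersect(a, b)) / length(union(a, b))
}

# Hypothetical feature sets selected in three resampling iterations
selected <- list(
  c("V1", "V2", "V3"),
  c("V1", "V2", "V4"),
  c("V1", "V2", "V3")
)

# Matrix of pairwise comparisons between iterations
k <- length(selected)
pairwise <- matrix(NA_real_, nrow = k, ncol = k)
for (i in seq_len(k)) {
  for (j in seq_len(k)) {
    pairwise[i, j] <- jaccard(selected[[i]], selected[[j]])
  }
}

# Average stability over the distinct pairs (upper triangle only)
stability <- mean(pairwise[upper.tri(pairwise)])

# Robustness-Performance Trade-off: weighted harmonic mean of
# stability (robustness) and performance; beta = 1 weights them equally
rpt <- function(stability, performance, beta = 1) {
  ((beta^2 + 1) * stability * performance) /
    (beta^2 * stability + performance)
}

accuracy <- 0.9  # hypothetical classification accuracy
rpt_value <- rpt(stability, accuracy)
```

With beta = 1 a model only scores well when both its selected features are stable and its classification performance is high; larger beta shifts the weight toward performance.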
Charles Determan Jr
Saeys Y., Abeel T., et al. (2008) Machine Learning and Knowledge Discovery in Databases. 313-325. http://link.springer.com/chapter/10.1007/978-3-540-87481-2_21
dat.discr <- create.discr.matrix(
  create.corr.matrix(
    create.random.matrix(nvar = 50,
                         nsamp = 100,
                         st.dev = 1,
                         perturb = 0.2)),
  D = 10
)
vars <- dat.discr$discr.mat
groups <- dat.discr$classes
fits <- fs.stability(vars,
                     groups,
                     method = c("plsda", "rf"),
                     f = 10,
                     k = 3,
                     k.folds = 10,
                     verbose = 'none')