splendid | R Documentation |
Supervised learning classification algorithms are run on bootstrap replicates, and an ensemble classifier is built and evaluated across these replicates.
splendid(
data,
class,
algorithms = NULL,
n = 1,
seed_boot = NULL,
seed_samp = NULL,
seed_alg = NULL,
convert = FALSE,
rfe = FALSE,
ova = FALSE,
standardize = FALSE,
sampling = c("none", "up", "down", "smote"),
stratify = FALSE,
plus = TRUE,
threshold = 0,
trees = 100,
tune = FALSE,
vi = FALSE,
top = 3,
seed_rank = 1,
sequential = FALSE
)
data |
data frame with rows as samples, columns as features |
class |
true/reference class vector used for supervised learning |
algorithms |
character vector of algorithms to use for supervised
learning. See Algorithms section for possible options. By default,
this argument is |
n |
number of bootstrap replicates to generate |
seed_boot |
random seed used for reproducibility in bootstrapping training sets for model generation |
seed_samp |
random seed used for reproducibility in subsampling training sets for model generation |
seed_alg |
random seed used for reproducibility when running algorithms with an intrinsic random element (random forests) |
convert |
logical; if |
rfe |
logical; if |
ova |
logical; if |
standardize |
logical; if |
sampling |
the default is "none", in which case no subsampling is performed. Other options are "up" (up-sampling the minority class), "down" (down-sampling the majority class), and "smote" (generating synthetic points for the minority class and down-sampling the majority class). Subsampling is applied only to the training set. |
stratify |
logical; if |
plus |
logical; if |
threshold |
a number between 0 and 1; a sample whose maximum class probability falls below this value is left unclassified. |
trees |
number of trees to use in "rf" or boosting iterations (trees) in "adaboost" |
tune |
logical; if |
vi |
logical; if |
top |
the number of highest-performing algorithms to retain for ensemble |
seed_rank |
random seed used for reproducibility in rank aggregation of ensemble algorithms |
sequential |
logical; if |
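The effect of the threshold argument described above can be illustrated with a minimal base R sketch. The probability matrix and the helper apply_threshold() are hypothetical and are not part of the package; they only show the rule that a sample is left unclassified when its maximum class probability is below the threshold.

```r
# Hypothetical class-probability matrix: rows are samples, columns are classes
probs <- matrix(
  c(0.70, 0.20, 0.10,
    0.40, 0.35, 0.25,
    0.10, 0.15, 0.75),
  nrow = 3, byrow = TRUE,
  dimnames = list(NULL, c("A", "B", "C"))
)

# Assumed helper (not part of splendid): label a sample "unclassified"
# when its highest class probability is below the threshold
apply_threshold <- function(probs, threshold = 0) {
  max_prob <- apply(probs, 1, max)
  pred <- colnames(probs)[max.col(probs, ties.method = "first")]
  pred[max_prob < threshold] <- "unclassified"
  pred
}

apply_threshold(probs, threshold = 0)    # every sample keeps its top class
apply_threshold(probs, threshold = 0.5)  # second sample becomes "unclassified"
```

With the default threshold of 0, no sample can fall below the cutoff, so every sample receives a class.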
Training sets are bootstrap replicates of the original data, sampled with replacement. Test sets comprise all remaining samples left out of each training set, also called out-of-bag (OOB) samples. This framework uses the 0.632 bootstrap rule for large n.
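The training/test split described above can be sketched in base R. This is an illustration under assumed values (the seed and sample size are arbitrary), not the package's internal code. It also shows where the 0.632 figure comes from: for large n, a bootstrap replicate contains roughly 1 - exp(-1), or about 63.2%, of the distinct original samples.

```r
set.seed(1)  # arbitrary seed, used only to make this sketch reproducible
n_samples <- 1000
idx <- seq_len(n_samples)

# Training set: row indices sampled with replacement
train_idx <- sample(idx, size = n_samples, replace = TRUE)

# Test set: out-of-bag rows that were never drawn into the training set
test_idx <- setdiff(idx, train_idx)

# Fraction of distinct rows in the training set (close to 0.632 for large n)
length(unique(train_idx)) / n_samples

# Fraction of rows left out-of-bag (close to 0.368 for large n)
length(test_idx) / n_samples
```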
An ensemble classifier is constructed using rank aggregation across multiple evaluation measures such as precision, recall, F1-score, and Matthews Correlation Coefficient (MCC).
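The idea of ranking algorithms across several evaluation measures can be sketched as follows. The scores are made up, and ordering by mean rank is a deliberately simple stand-in for illustration; the package may use a more elaborate rank aggregation procedure.

```r
# Hypothetical evaluation scores: rows are algorithms, columns are measures
# (higher is better for every measure shown)
scores <- matrix(
  c(0.80, 0.78, 0.79, 0.70,
    0.85, 0.80, 0.82, 0.75,
    0.75, 0.82, 0.78, 0.68),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("lda", "rf", "xgboost"),
                  c("precision", "recall", "f1", "mcc"))
)

# Rank algorithms within each measure (rank 1 = best)
ranks <- apply(scores, 2, function(x) rank(-x, ties.method = "average"))

# Simplified aggregation (an assumption): order algorithms by mean rank
mean_rank <- rowMeans(ranks)
names(sort(mean_rank))  # algorithms from best to worst overall
```

In this toy example "rf" ranks first on three of the four measures, so it comes out on top of the aggregated ordering.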
A nested list with six elements:
models: A list with an element for each algorithm, each of which is a list with length n. Shows the model object for each algorithm and bootstrap replicate on the training set.
preds: A list with an element for each algorithm, each of which is a list with length n. Shows the predicted classes for each algorithm and bootstrap replicate on the test set.
evals: For each bootstrap sample, various evaluation measures are calculated for the predicted classes from each algorithm. Evaluation measures include macro-averaged precision/recall/F1-score, micro-averaged precision, and micro-averaged MCC. The return value of evals is a tibble that shows summary statistics (e.g. mean, median) of the evaluation measures across bootstrap samples, for each classification algorithm.
bests: The best-performing algorithm for each bootstrap replicate of the data, chosen by rank aggregation.
ensemble_algs: Tallies the frequencies in bests, returning the top algorithms chosen.
ensemble: A list of model fits for each of the algorithms in ensemble_algs, fit on the full data.
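The relationship between bests and ensemble_algs described above can be sketched in base R. The bests vector and the value of top here are hypothetical, and the tie-handling shown is an assumption rather than the package's documented behaviour.

```r
# Hypothetical best algorithm for each of six bootstrap replicates
bests <- c("rf", "xgboost", "rf", "lda", "xgboost", "rf")
top <- 2  # hypothetical number of algorithms to retain

# Tally how often each algorithm was best, most frequent first
freq <- sort(table(bests), decreasing = TRUE)

# Keep the names of the `top` most frequent algorithms
ensemble_algs <- names(freq)[seq_len(min(top, length(freq)))]
ensemble_algs
```

Here "rf" is best in three replicates and "xgboost" in two, so those two are retained.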
The classification algorithms currently supported are:
Prediction Analysis for Microarrays ("pam")
Support Vector Machines ("svm")
Random Forests ("rf")
Linear Discriminant Analysis ("lda")
Shrinkage Linear Discriminant Analysis ("slda")
Shrinkage Diagonal Discriminant Analysis ("sdda")
Multinomial Logistic Regression using:
  Generalized Linear Model with no penalization ("mlr_glm")
  GLM with LASSO penalty ("mlr_lasso")
  GLM with ridge penalty ("mlr_ridge")
  GLM with elastic net penalty ("mlr_enet")
  Neural Networks ("mlr_nnet")
Neural Networks ("nnet")
Naive Bayes ("nbayes")
Adaptive Boosting ("adaboost")
AdaBoost.M1 ("adaboost_m1")
Extreme Gradient Boosting ("xgboost")
K-Nearest Neighbours ("knn")
Derek Chiu
## Not run:
data(hgsc)
class <- attr(hgsc, "class.true")
sl_result <- splendid(hgsc, class, n = 2, algorithms = c("lda", "xgboost"))
## End(Not run)