splendid: Ensemble framework for Supervised Learning classification...


View source: R/splendid.R

Description

Supervised learning classification algorithms are performed on bootstrap replicates and an ensemble classifier is built and evaluated across these variants.

Usage

splendid(data, class, algorithms = NULL, n = 1, seed_boot = NULL,
  seed_alg = NULL, convert = FALSE, rfe = FALSE, ova = FALSE,
  standardize = FALSE, plus = TRUE, threshold = 0, trees = 100,
  tune = FALSE, top = 3, seed_rank = 1, sequential = FALSE)

Arguments

data

data frame with rows as samples, columns as features

class

true/reference class vector used for supervised learning

algorithms

character vector of algorithms to use for supervised learning. See Algorithms section for possible options. By default, this argument is NULL, in which case all algorithms are used.

n

number of bootstrap replicates to generate

seed_boot

random seed used for reproducibility in bootstrapping training sets for model generation

seed_alg

random seed used for reproducibility when running algorithms with an intrinsic random element (random forests)

convert

logical; if TRUE, converts all categorical variables in data to dummy variables. Certain algorithms (e.g. "lda") require numeric inputs, so this conversion is needed for them.

rfe

logical; if TRUE, run Recursive Feature Elimination as a feature selection method for "lda", "rf", and "svm" algorithms.

ova

logical; if TRUE, a One-Vs-All classification approach is performed for every algorithm in algorithms. The relevant results are prefixed with the string ova_.

standardize

logical; if TRUE, the training sets are standardized so that each feature has mean zero and unit variance. The test sets are standardized using the vectors of centers and standard deviations from the corresponding training sets.

plus

logical; if TRUE (default), the .632+ estimator is calculated. Otherwise, the .632 estimator is calculated.

threshold

a number between 0 and 1; a sample whose maximum class probability falls below this value is left unclassified. The convert, standardize, and threshold steps are illustrated in the sketch following this argument list.

trees

number of trees to grow for "rf", or the number of boosting iterations for "adaboost"

tune

logical; if TRUE, algorithms with hyperparameters are tuned

top

the number of highest-performing algorithms to retain for ensemble

seed_rank

random seed used for reproducibility in rank aggregation of ensemble algorithms

sequential

logical; if TRUE, a sequential model is fit on the algorithms that had the best performance with one-vs-all classification.
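
A rough base-R illustration of what the convert, standardize, and threshold arguments correspond to. This is a sketch for intuition only, using a small synthetic data frame; it is not splendid's internal code.

## Illustration only: rough equivalents of convert, standardize, and threshold
## (synthetic data; not splendid's internal code)
set.seed(1)
df <- data.frame(x1 = rnorm(10), x2 = rnorm(10),
                 grp = factor(sample(c("a", "b"), 10, replace = TRUE)))

## convert = TRUE: recode categorical features as dummy variables
df_num <- as.data.frame(model.matrix(~ ., data = df)[, -1])

## standardize = TRUE: scale training features, then apply the same
## centers and standard deviations to the test set
train_idx <- 1:7
train <- scale(df_num[train_idx, ])
test  <- scale(df_num[-train_idx, ],
               center = attr(train, "scaled:center"),
               scale  = attr(train, "scaled:scale"))

## threshold: leave a sample unclassified when its highest predicted
## class probability falls below the cutoff
prob <- c(a = 0.45, b = 0.35, c = 0.20)  # hypothetical class probabilities
threshold <- 0.5
pred <- if (max(prob) < threshold) NA else names(which.max(prob))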

Details

Training sets are bootstrap replicates of the original data, sampled with replacement. Test sets consist of all samples left out of each training set, also called Out-Of-Bag (OOB) samples. This framework uses the 0.632 bootstrap rule for large n.
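
A minimal sketch of one bootstrap replicate and its out-of-bag test set, assuming the hgsc data from the Examples is loaded (illustrative only; splendid handles this resampling internally). The error combination shown in the comment is the standard .632 formulation; the .632+ variant additionally corrects for overfitting.

## Illustration of one bootstrap replicate and its out-of-bag test set
data(hgsc)                                   # example data, as in Examples
set.seed(1)
n_samples <- nrow(hgsc)
boot_idx  <- sample(n_samples, replace = TRUE)
oob_idx   <- setdiff(seq_len(n_samples), boot_idx)

train <- hgsc[boot_idx, ]   # bootstrap training set (sampled with replacement)
test  <- hgsc[oob_idx, ]    # out-of-bag samples form the test set

## Standard .632 combination of apparent (training) and out-of-bag error:
## err_632 <- 0.368 * err_train + 0.632 * err_oob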

An ensemble classifier is constructed using Rank Aggregation across multiple evaluation measures such as precision, recall, F1-score, and the Matthews Correlation Coefficient (MCC).
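
As a rough illustration of the evaluation side, the sketch below computes these measures from binary confusion-matrix counts and combines hypothetical per-algorithm scores with a simple mean-rank (Borda-style) aggregation. It is a simplified stand-in for intuition, not splendid's rank aggregation routine, and the scores are made up.

## Per-class measures from confusion-matrix counts (tp, fp, fn, tn)
tp <- 40; fp <- 10; fn <- 5; tn <- 45
precision <- tp / (tp + fp)
recall    <- tp / (tp + fn)
f1        <- 2 * precision * recall / (precision + recall)
mcc <- (tp * tn - fp * fn) /
  sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))

## Hypothetical per-algorithm scores on each measure
scores <- rbind(
  lda     = c(precision = 0.82, recall = 0.78, f1 = 0.80, mcc = 0.61),
  rf      = c(precision = 0.85, recall = 0.81, f1 = 0.83, mcc = 0.66),
  xgboost = c(precision = 0.84, recall = 0.83, f1 = 0.83, mcc = 0.67)
)

## Rank algorithms within each measure, average the ranks, and keep the best
ranks    <- apply(-scores, 2, rank)
ordering <- sort(rowMeans(ranks))
head(names(ordering), 3)   # analogous to keeping the `top` algorithms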

Value

A nested list with five elements

Algorithms

The classification algorithms currently supported include those referenced elsewhere on this page, such as "lda", "rf", "svm", "adaboost", and "xgboost".

Author(s)

Derek Chiu

Examples

## Not run: 
data(hgsc)
class <- attr(hgsc, "class.true")
sl_result <- splendid(hgsc, class, n = 2, algorithms = c("lda", "xgboost"))

## End(Not run)
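
A slightly fuller call exercising more of the documented arguments might look like the following; the argument values are illustrative, not recommendations.

## Not run: 
## Illustrative call using more of the documented arguments
sl_ova <- splendid(hgsc, class, n = 5, algorithms = c("lda", "rf", "svm"),
                   standardize = TRUE, ova = TRUE, threshold = 0.5,
                   top = 2, seed_boot = 1, seed_alg = 1)

## End(Not run)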
