D2MCS | R Documentation
The class is responsible for managing the whole process. Specifically, it builds the M.L. models (optimizing their hyperparameters), selects the best M.L. model for each cluster and executes the classification stage.
new()
The function is used to initialize all parameters needed to build a Multiple Classifier System.
D2MCS$new(
  dir.path,
  num.cores = NULL,
  socket.type = "PSOCK",
  outfile = NULL,
  serialize = FALSE
)
dir.path
A character defining the location where the trained models should be saved.
num.cores
An optional numeric value specifying the number of CPU cores used for training the models (only if parallelization is allowed). If not defined, the number of available cores minus 2 will be used.
socket.type
A character value defining the type of socket used to communicate with the workers. The default type, "PSOCK", calls makePSOCKcluster. Type "FORK" calls makeForkCluster. For more information see makeCluster.
outfile
Where to direct the stdout and stderr connection output from the workers. "" indicates no redirection (which may only be useful for workers on the local machine). Defaults to '/dev/null'.
serialize
A logical value. If TRUE, serialization will use XDR: where large amounts of data are to be transferred and all the nodes are little-endian, communication may be substantially faster if this is set to FALSE.
train()
The function is responsible for performing the M.L. model training stage.
D2MCS$train(
  train.set,
  train.function,
  num.clusters = NULL,
  model.recipe = DefaultModelFit$new(),
  ex.classifiers = c(),
  ig.classifiers = c(),
  metrics = NULL,
  saveAllModels = FALSE
)
train.set
A Trainset object used as training input for the M.L. models.
train.function
A TrainFunction defining the training configuration options.
num.clusters
A numeric value used to define the number of clusters from the Trainset that should be utilized during the training stage. If not defined, all clusters will be taken into account for training.
model.recipe
An unprepared recipe object inherited from the GenericModelFit class.
ex.classifiers
A character vector containing the names of the M.L. models used in the training stage. See getModelInfo and https://topepo.github.io/caret/available-models.html for more information about all the available models.
ig.classifiers
A character vector containing the names of the M.L. models that should be ignored when performing the training stage. See getModelInfo and https://topepo.github.io/caret/available-models.html for more information about all the available models.
metrics
A character vector containing the metrics used to perform the M.L. model hyperparameter optimization during the training stage. See SummaryFunction, UseProbability and NoProbability for more information.
saveAllModels
A logical parameter. If TRUE, all trained models are saved; if FALSE, only the M.L. model achieving the best performance on each cluster is saved.
Returns a TrainOutput object containing all the information computed during the training stage.
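As a hedged sketch, assuming an initialized D2MCS object named d2mcs and a previously built Trainset named train.set (for example, one derived from a clustering distribution as in the example at the end of this page), a typical train() call looks like:

```r
## Illustrative sketch only: 'd2mcs' and 'train.set' are assumed to
## exist from earlier steps; a 10-fold CV configuration is used.
trFunction <- TwoClass$new(method = "cv", number = 10,
                           savePredictions = "final", classProbs = TRUE,
                           allowParallel = FALSE, verboseIter = FALSE,
                           seed = 1234)

trained.models <- d2mcs$train(train.set = train.set,
                              train.function = trFunction,
                              ex.classifiers = c("lda", "ranger"),
                              metrics = c("MCC", "PPV"))
```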
classify()
The function is responsible for executing the classification stage.
D2MCS$classify(train.output, subset, voting.types, positive.class = NULL)
train.output
The TrainOutput object computed in the train stage.
subset
A Subset containing the data to be classified.
voting.types
A list containing SingleVoting or CombinedVoting objects.
positive.class
An optional character parameter used to define the positive class value.
Returns a ClassificationOutput object with all the values computed during the classification stage.
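A minimal classify() sketch, assuming 'trained.models' is a TrainOutput from a previous train() call and 'test.subset' is a Subset holding the data to label:

```r
## Illustrative sketch only; 'd2mcs', 'trained.models' and
## 'test.subset' are assumed to exist from earlier stages.
predictions <- d2mcs$classify(train.output = trained.models,
                              subset = test.subset,
                              voting.types = c(
                                SingleVoting$new(
                                  voting.schemes = c(ClassMajorityVoting$new()),
                                  metrics = c("MCC"))),
                              positive.class = "1")
```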
getAvailableModels()
The function obtains all the available M.L. models.
D2MCS$getAvailableModels()
A data.frame containing information about the available M.L. models.
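For instance, a quick way to inspect which models can be trained (a sketch, assuming the D2MCS package is installed):

```r
## Sketch: list the available models and show the first few rows.
library(D2MCS)
d2mcs <- D2MCS$new(dir.path = tempdir(), num.cores = 1)
available <- d2mcs$getAvailableModels()
head(available)
```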
clone()
The objects of this class are cloneable with this method.
D2MCS$clone(deep = FALSE)
deep
Whether to make a deep clone.
See also: Dataset, Subset, Trainset.
# Specify the random number generation
set.seed(1234)

## Create Dataset Handler object.
loader <- DatasetLoader$new()

## Load 'hcc-data-complete-balanced.csv' dataset file.
data <- loader$load(filepath = system.file(file.path("examples",
                                                     "hcc-data-complete-balanced.csv"),
                                           package = "D2MCS"),
                    header = TRUE, normalize.names = TRUE)

## Get column names
data$getColumnNames()

## Split data into 4 partitions keeping balance ratio of 'Class' column.
data$createPartitions(num.folds = 4, class.balance = "Class")

## Create a subset comprising the first 2 partitions for clustering purposes.
cluster.subset <- data$createSubset(num.folds = c(1, 2), class.index = "Class",
                                    positive.class = "1")

## Create a subset comprising second and third partitions for training purposes.
train.subset <- data$createSubset(num.folds = c(2, 3), class.index = "Class",
                                  positive.class = "1")

## Create a subset comprising the last partition for testing purposes.
test.subset <- data$createSubset(num.folds = 4, class.index = "Class",
                                 positive.class = "1")

## Distribute the features into clusters using MCC heuristic.
distribution <- SimpleStrategy$new(subset = cluster.subset,
                                   heuristic = MCCHeuristic$new())
distribution$execute()

## Get the best achieved distribution
distribution$getBestClusterDistribution()

## Create a train set from the computed clustering distribution
train.set <- distribution$createTrain(subset = train.subset)

## Not run: 
## Initialization of D2MCS configuration parameters.
## - Defining training operation.
##   + 10-fold cross-validation
##   + Use only 1 CPU core.
##   + Seed was set to ensure straightforward reproducibility of experiments.
trFunction <- TwoClass$new(method = "cv", number = 10,
                           savePredictions = "final", classProbs = TRUE,
                           allowParallel = TRUE, verboseIter = FALSE,
                           seed = 1234)

## - Specify the models to be trained
ex.classifiers <- c("ranger", "lda", "lda2")

## Initialize D2MCS
d2mcs <- D2MCS$new(dir.path = tempdir(), num.cores = 1)

## Execute training stage using 'MCC' and 'PPV' measures to optimize model hyperparameters.
trained.models <- d2mcs$train(train.set = train.set,
                              train.function = trFunction,
                              ex.classifiers = ex.classifiers,
                              metrics = c("MCC", "PPV"))

## Execute classification stage using two different voting schemes
predictions <- d2mcs$classify(train.output = trained.models,
                              subset = test.subset,
                              voting.types = c(
                                SingleVoting$new(voting.schemes = c(ClassMajorityVoting$new(),
                                                                    ClassWeightedVoting$new()),
                                                 metrics = c("MCC", "PPV"))))

## Compute the performance of each voting scheme using PPV and MCC measures.
predictions$getPerformances(test.subset, measures = list(MCC$new(), PPV$new()))

## Execute classification stage using multiple voting schemes (simple and combined)
predictions <- d2mcs$classify(train.output = trained.models,
                              subset = test.subset,
                              voting.types = c(
                                SingleVoting$new(voting.schemes = c(ClassMajorityVoting$new(),
                                                                    ClassWeightedVoting$new()),
                                                 metrics = c("MCC", "PPV")),
                                CombinedVoting$new(voting.schemes = ClassMajorityVoting$new(),
                                                   combined.metrics = MinimizeFP$new(),
                                                   methodology = ProbBasedMethodology$new(),
                                                   metrics = c("MCC", "PPV"))))

## Compute the performance of each voting scheme using PPV and MCC measures.
predictions$getPerformances(test.subset, measures = list(MCC$new(), PPV$new()))

## End(Not run)