```r
library("mlr")
library("BBmisc")
library("ParamHelpers")

# show grouped code output instead of single lines
knitr::opts_chunk$set(collapse = TRUE)
```
The following classes provide a unified interface to all popular machine learning methods in R: (cost-sensitive) classification, regression, survival analysis, and clustering. Many are already integrated in mlr; others are not, but the package is specifically designed to make extensions simple.
The section on integrated learners shows the already implemented machine learning methods and their properties. If your favorite method is missing, either open an issue or take a look at how to integrate a learning method yourself. This basic introduction demonstrates how to use already implemented learners.
A learner in mlr is generated by calling `makeLearner()`. In the constructor you need to specify which learning method you want to use. Moreover, you can:

* Set hyperparameters.
* Control the output for later prediction, e.g., for classification whether you want a factor of predicted class labels or probabilities.
* Set an ID to name the object (some methods will later use this ID to name results or annotate plots).
```r
# Classification random forest, set it up for predicting probabilities
classif.lrn = makeLearner("classif.randomForest", predict.type = "prob",
  fix.factors.prediction = TRUE)

# Regression gradient boosting machine, specify hyperparameters via a list
regr.lrn = makeLearner("regr.gbm", par.vals = list(n.trees = 500, interaction.depth = 3))

# Cox proportional hazards model with custom name
surv.lrn = makeLearner("surv.coxph", id = "cph")

# K-means with 5 clusters
cluster.lrn = makeLearner("cluster.kmeans", centers = 5)

# Multilabel Random Ferns classification algorithm
multilabel.lrn = makeLearner("multilabel.rFerns")
```
The first argument specifies which algorithm to use. The naming convention is `classif.<R_method_name>` for classification methods, `regr.<R_method_name>` for regression methods, `surv.<R_method_name>` for survival analysis, `cluster.<R_method_name>` for clustering methods, and `multilabel.<R_method_name>` for multilabel classification.
Hyperparameter values can be specified either via the `...` argument or as a `list` via `par.vals`. The first option is preferred, as `par.vals` is mainly used to declare hyperparameters that mlr sets differently from the defaults of the underlying model. If you change a default hyperparameter in mlr so that it differs from the actual default of the underlying method, make sure to also add an entry in the `"note"` slot of the learner describing the reason for the change. Common reasons are turning off automatic parallelization or changing logical arguments of the learner to enable more conservative memory management.
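To illustrate the two options, here is a minimal sketch: the same gradient boosting machine from above, with its hyperparameters set once via the `...` argument and once via `par.vals` (both calls are equivalent; `regr.gbm` and its `n.trees`/`interaction.depth` parameters are taken from the example above).

```r
# Equivalent ways of setting hyperparameters at construction time:
# via the ... argument (preferred) ...
lrn1 = makeLearner("regr.gbm", n.trees = 500, interaction.depth = 3)
# ... or via a list passed to par.vals
lrn2 = makeLearner("regr.gbm", par.vals = list(n.trees = 500, interaction.depth = 3))
```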
Occasionally, `factor` features may cause problems when fewer levels are present in the test data set than in the training data. Setting `fix.factors.prediction = TRUE` avoids this by adding the missing factor levels to the test data set.
Let's have a look at two of the learners created above.

```r
classif.lrn
surv.lrn
```
All generated learners are objects of class `Learner` (see `makeLearner()`). This class contains the properties of the method, e.g., which types of features it can handle, what kind of output is possible during prediction, and whether multi-class problems, observation weights or missing values are supported.
As you might have noticed, there is currently no special learner class for cost-sensitive classification. For ordinary misclassification costs you can use standard classification methods. For example-dependent costs there are several ways to generate cost-sensitive learners from ordinary regression and classification learners. This is explained in greater detail in the section about cost-sensitive classification.
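As a brief sketch of the example-dependent case (a full walk-through, including constructing the matching cost-sensitive `Task`, is given in the cost-sensitive classification section), an ordinary classification learner can be wrapped into a cost-sensitive one, here using mlr's weighted-pairs wrapper:

```r
# Sketch: derive a cost-sensitive learner from an ordinary classification learner.
# makeCostSensWeightedPairsWrapper() is one of several wrappers mlr offers for
# example-dependent costs; the wrapped learner is then trained on a CostSensTask.
lrn = makeLearner("classif.rpart")
costsens.lrn = makeCostSensWeightedPairsWrapper(lrn)
```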
The `Learner` object is a `list`, and the following elements contain information regarding the hyperparameters and the type of prediction.
```r
# Get the configured hyperparameter settings that deviate from the defaults
cluster.lrn$par.vals

# Get the set of hyperparameters
classif.lrn$par.set

# Get the type of prediction
regr.lrn$predict.type
```
Slot `$par.set` is an object of class `ParamSet` (see `ParamHelpers::makeParamSet()`). It contains, among other things, the type of each hyperparameter (e.g., numeric, logical), potential default values and the range of allowed values.
Moreover, mlr provides the function `getHyperPars()` (or its alternative `getLearnerParVals()`) to access the current hyperparameter settings of a `Learner`, and `getParamSet()` to get a description of all possible settings. These are particularly useful for wrapped `Learner`s, for example when a learner is fused with a feature selection strategy and both the learner and the feature selection method have hyperparameters. For details see the section on wrapped learners.
```r
# Get current hyperparameter settings
getHyperPars(cluster.lrn)

# Get a description of all possible hyperparameter settings
getParamSet(classif.lrn)
```
We can also use `getParamSet()` or its alias `getLearnerParamSet()` to get a quick overview of the available hyperparameters and defaults of a learning method without explicitly constructing it (by calling `makeLearner()`).
```r
getParamSet("classif.randomForest")
```
Functions for accessing a `Learner`'s meta information are available in mlr. We can use `getLearnerId()`, `getLearnerShortName()` and `getLearnerType()` to get the `Learner`'s ID, short name and type, respectively. Moreover, to show the packages required by the `Learner`, one can call `getLearnerPackages()`.
```r
# Get the object's ID
getLearnerId(surv.lrn)

# Get the short name
getLearnerShortName(classif.lrn)

# Get the type of the learner
getLearnerType(multilabel.lrn)

# Get required packages
getLearnerPackages(cluster.lrn)
```
There are also functions that enable you to change certain aspects of a `Learner` without needing to create a new one from scratch. Here are some examples.
```r
# Change the ID
surv.lrn = setLearnerId(surv.lrn, "CoxModel")
surv.lrn

# Change the prediction type, predict a factor with class labels instead of probabilities
classif.lrn = setPredictType(classif.lrn, "response")

# Change hyperparameter values
cluster.lrn = setHyperPars(cluster.lrn, centers = 4)

# Go back to default hyperparameter values
regr.lrn = removeHyperPars(regr.lrn, c("n.trees", "interaction.depth"))
```
A list of all learners integrated in mlr and their respective properties is shown in the Appendix. If you would like a list of available learners, possibly restricted to those with certain properties or those suitable for a certain learning `Task()`, use the function `listLearners()`.
```r
# List everything in mlr
lrns = listLearners()
head(lrns[c("class", "package")])

# List classifiers that can output probabilities
lrns = listLearners("classif", properties = "prob")
head(lrns[c("class", "package")])

# List classifiers that can be applied to iris (i.e., multiclass) and output probabilities
lrns = listLearners(iris.task, properties = "prob")
head(lrns[c("class", "package")])

# The calls above return data.frames describing the learners; with create = TRUE
# you get constructed learner objects instead
head(listLearners("cluster", create = TRUE), 2)
```