mlearning-package    R Documentation

Machine Learning Algorithms with Unified Interface and Confusion Matrices

Description

This package provides wrappers around several existing machine learning algorithms in R, under a unified user interface. Confusion matrices can also be calculated and viewed as tables or plots. Key features are:

  • Unified, formula-based interface for all algorithms, similar to stats::lm().

  • Optimized code when the simplified formula y ~ . is used, meaning that all variables in data are used. One of them (y here) is either the class to be predicted (classification problem, a factor variable) or the dependent variable of the model (regression problem, a numeric variable).

  • A consistent way of dealing with missing data, both in the training set and in predictions. The underlying algorithms handle missing data differently: some accept them, others do not.

  • Unified way of dealing with factor levels that have no cases in the training set. The training succeeds, but the classifier is, of course, unable to classify items in the missing class.

  • The predict() methods have similar arguments. They return the class, membership to the classes, both, or something else (probabilities, raw predictions, ...) depending on the algorithm or the problem (classification or regression).

  • The cvpredict() method is available for all algorithms. It performs cross-validation, or even leave-one-out validation (when cv.k equals the number of cases), transparently for the end user.

  • The confusion() method creates a confusion matrix, and the resulting object can be printed, summarized, or plotted. Various metrics are easily derived from the confusion matrix. It also allows the prior probabilities of the classes to be adjusted in a classification problem, which gives more representative estimates of the metrics when the priors are set to values close to the real proportions of the classes in the data.

See mlearning() for further explanations and an example analysis. See mlLda() for examples of the different forms of the formula that can be used. See plot.confusion() for the different ways to explore the confusion matrix.
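
The features listed above can be sketched as a short session. This is a minimal illustration, not the package's reference example: it assumes the iris dataset from the datasets package, an arbitrary random seed, and hypothetical object names (iris_lda, pred_class, pred_member, pred_cv, iris_conf).

```r
library(mlearning)

# Example data (an assumption for this sketch): iris from the datasets package
data("iris", package = "datasets")

# Train a linear discriminant analysis classifier with the simplified formula
iris_lda <- ml_lda(data = iris, Species ~ .)

# Predictions on the training data: classes, or membership to the classes
pred_class <- predict(iris_lda, type = "class")
pred_member <- predict(iris_lda, type = "membership")

# 10-fold cross-validated predictions (seed chosen arbitrarily)
set.seed(42)
pred_cv <- cvpredict(iris_lda, cv.k = 10)

# Confusion matrix: cross-validated predictions against the actual classes
iris_conf <- confusion(pred_cv, response(iris_lda))
print(iris_conf)
summary(iris_conf)
```

Setting cv.k to nrow(iris) in the cvpredict() call would turn the 10-fold cross-validation into a leave-one-out validation, as described in the feature list.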

Important functions

  • ml_lda(), ml_qda(), ml_naive_bayes(), ml_knn(), ml_lvq(), ml_nnet(), ml_rpart(), ml_rforest() and ml_svm() to train classifiers or regressors with the different algorithms that are supported in the package,

  • predict() and cvpredict() for predictions, including using cross-validation,

  • confusion() to calculate the confusion matrix (with various methods to analyze it and to calculate derived metrics like recall, precision, F-score, ...),

  • prior() to adjust prior probabilities,

  • response() and train() to extract response and training variables from an mlearning object.
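
As a rough sketch of how the last three functions fit together, the session below extracts the response and training variables from a classifier and then reweights a confusion matrix. The iris data, the object names, and the prior values c(10, 30, 60) are illustrative assumptions, not values recommended by the package.

```r
library(mlearning)

# Example data (an assumption for this sketch): iris from the datasets package
data("iris", package = "datasets")
iris_lda <- ml_lda(data = iris, Species ~ .)

# Extract the response (the class to predict) and the training variables
resp <- response(iris_lda)
vars <- train(iris_lda)

# Confusion matrix of the predictions on the training data
iris_conf <- confusion(predict(iris_lda), resp)

# Inspect the current priors, then adjust them.
# The values are hypothetical class frequencies, given in the order
# of the factor levels (setosa, versicolor, virginica).
prior(iris_conf)
prior(iris_conf) <- c(10, 30, 60)

# Metrics are now computed on the reweighted confusion matrix
summary(iris_conf, type = "Fscore")
```

Predictions on the training data are used here only to keep the sketch short; cross-validated predictions from cvpredict() give a less optimistic confusion matrix.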


mlearning documentation built on Aug. 31, 2023, 1:09 a.m.