The CORElearn package is an R port of the CORElearn data mining system. This document briefly describes the C++ part, which can also serve as a standalone data mining system on Linux or Windows: its organization, main classes, and data structures.
The C++ part is called from R functions collected in file
The C++ functions that are called from R and provide the interface to R are collected in
Rconvert.cpp. The front end for the standalone version is in file
Many parts of the code come in two variants, one for classification and one for regression.
The regression variant usually has
Reg somewhere in its name.
The main classes are
marray, mmatrix are templates for storing vectors and matrices
dataStore contains data storage and data manipulation methods, of which the most important are
mmatrix<int> DiscData, DiscPredictData contain the values of discrete attributes and the class, for training and (optionally) for prediction.
In classification, column 0 always stores class values.
mmatrix<double> ContData, ContPredictData contain the values of numeric attributes and the prediction values, for training and (optionally) for prediction.
In regression, column 0 always stores target values.
marray<attribute> AttrDesc with information about attributes' types, number of values, min, max, column index in DiscData or ContData, ...
estimation, estimationReg evaluate attributes for different purposes: decision/regression tree splitting, binarization,
discretization, constructive induction, feature selection, etc. For efficiency, these classes store their own copies of the data in
mmatrix<int> DiscValues containing discrete attributes and class values,
mmatrix<double> ContValues containing numeric attributes and prediction values.
Options stores and handles all the parameters of the system.
featureTree, regressionTree build all the models, predict with them, and create output.
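
The column-0 convention described above for DiscData and ContData can be illustrated with a minimal sketch. Everything named here (SimpleMatrix, AttrInfo, ClassificationData, classCounts) is hypothetical and introduced only for illustration; the real mmatrix, attribute and dataStore interfaces are not shown in this document and may differ. The sketch also assumes that class values are coded from 0 upward and that numeric attributes start at column 0 of the numeric matrix in classification; neither assumption is confirmed by the text.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for mmatrix<T>: a dense row-major matrix.
// The real template's interface is not shown here and may differ.
template <typename T>
class SimpleMatrix {
public:
    SimpleMatrix(std::size_t rows, std::size_t cols, T init = T())
        : rows_(rows), cols_(cols), data_(rows * cols, init) {}

    T& operator()(std::size_t r, std::size_t c) {
        assert(r < rows_ && c < cols_);
        return data_[r * cols_ + c];
    }
    const T& operator()(std::size_t r, std::size_t c) const {
        assert(r < rows_ && c < cols_);
        return data_[r * cols_ + c];
    }
    std::size_t rows() const { return rows_; }
    std::size_t cols() const { return cols_; }

private:
    std::size_t rows_, cols_;
    std::vector<T> data_;
};

// Hypothetical, reduced attribute descriptor (compare AttrDesc):
// it records which matrix holds the attribute and in which column.
struct AttrInfo {
    bool continuous;     // true: value lives in the numeric matrix
    std::size_t column;  // column index within that matrix
};

// Hypothetical classification data set following the column-0 convention.
struct ClassificationData {
    SimpleMatrix<int> disc;     // column 0 = class, columns 1.. = discrete attributes
    SimpleMatrix<double> cont;  // numeric attributes (assumed to start at column 0)

    ClassificationData(std::size_t n, std::size_t nDisc, std::size_t nCont)
        : disc(n, nDisc + 1), cont(n, nCont) {}

    // The class of an example is always read from column 0.
    int classOf(std::size_t row) const { return disc(row, 0); }

    // Look up an attribute value through its descriptor.
    double valueOf(std::size_t row, const AttrInfo& a) const {
        return a.continuous ? cont(row, a.column)
                            : static_cast<double>(disc(row, a.column));
    }
};

// Count the examples in each class by scanning column 0.
// Assumes class codes lie in 0 .. noClasses-1.
std::vector<int> classCounts(const ClassificationData& d, int noClasses) {
    std::vector<int> counts(noClasses, 0);
    for (std::size_t i = 0; i < d.disc.rows(); ++i)
        ++counts[d.classOf(i)];
    return counts;
}
```

In this sketch a caller would construct ClassificationData with the number of examples and attributes, fill disc and cont, and then address every attribute through its AttrInfo descriptor rather than by a raw column number. Keeping the class in a fixed column lets code such as classCounts work without consulting the descriptors at all.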