Internal structures of the CORElearn C++ part
The package CORElearn is an R port of the CORElearn data mining system. This document briefly describes the organization, main classes, and data structures of the C++ part, which can also serve as a standalone data mining system on Linux or Windows.
The C++ part is called from R functions collected in file
The C++ functions that are called from R and provide the interface to R are collected in Rconvert.cpp. The front end for the standalone version is in file
Many parts of the code come in two variants, one for classification and one for regression. The regression variant usually has Reg somewhere in its name.
The main classes are:
marray, mmatrix are templates for storing vectors and matrices.
dataStore contains the data storage and data manipulation methods; its most important members are
mmatrix<int> DiscData, DiscPredictData contain the values of discrete attributes and the class, for training and (optionally) prediction. In classification, column 0 always stores the class values.
mmatrix<double> ContData, ContPredictData contain the values of numeric attributes and the target, for training and (optionally) prediction. In regression, column 0 always stores the target values.
marray<attribute> AttrDesc holds information about the attributes: their type, number of values, minimum, maximum, column index in DiscData or ContData, ...
estimation, estimationReg evaluate attributes for different purposes: decision/regression tree splitting, binarization, discretization, constructive induction, feature selection, etc. For efficiency, these classes store their own data in
mmatrix<int> DiscValues, containing discrete attribute and class values,
mmatrix<double> ContValues, containing numeric attribute and target values.
Options stores and handles all the parameters of the system.
featureTree, regressionTree build all the models, make predictions with them, and create the output.
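The classification data layout described above can be illustrated with a small, self-contained C++ sketch. It is not CORElearn's code: simpleMatrix, attrType, attributeSketch, dataStoreSketch, estimationSketch, discreteValue and classCount are hypothetical names, and the real marray/mmatrix, dataStore and estimation classes have their own interfaces. Only the member names DiscData, ContData, AttrDesc, DiscValues and ContValues come from the description. The sketch assumes that rows are cases and columns are attributes, with the class in column 0 of DiscData; the regression layout (target in column 0 of ContData) is not shown.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an mmatrix-like template: a dense, row-major
// matrix addressed as (row, column). Assumption: rows are cases,
// columns are attributes.
template <typename T>
class simpleMatrix {
public:
    simpleMatrix(std::size_t rows = 0, std::size_t cols = 0)
        : rows_(rows), cols_(cols), data_(rows * cols, T()) {}

    T& operator()(std::size_t r, std::size_t c) {
        assert(r < rows_ && c < cols_);  // bounds check in debug builds
        return data_[r * cols_ + c];
    }
    const T& operator()(std::size_t r, std::size_t c) const {
        assert(r < rows_ && c < cols_);
        return data_[r * cols_ + c];
    }
    std::size_t rows() const { return rows_; }
    std::size_t cols() const { return cols_; }

private:
    std::size_t rows_, cols_;
    std::vector<T> data_;
};

enum class attrType { discrete, numeric };

// Hypothetical attribute descriptor, loosely modelled on AttrDesc entries:
// type, number of values (discrete only) and column in DiscData/ContData.
struct attributeSketch {
    attrType type;
    int nValues;
    std::size_t column;
};

// Hypothetical classification store: DiscData column 0 holds the class,
// further DiscData columns hold discrete attributes, ContData holds
// numeric attributes.
struct dataStoreSketch {
    simpleMatrix<int> DiscData;
    simpleMatrix<double> ContData;
    std::vector<attributeSketch> AttrDesc;  // one entry per attribute
};

// Read a discrete attribute value through its descriptor's column index.
int discreteValue(const dataStoreSketch& ds, std::size_t caseIdx, std::size_t attrIdx) {
    const attributeSketch& a = ds.AttrDesc[attrIdx];
    assert(a.type == attrType::discrete);
    return ds.DiscData(caseIdx, a.column);
}

// Hypothetical evaluator that copies the selected cases into its own
// DiscValues/ContValues before working on them.
struct estimationSketch {
    simpleMatrix<int> DiscValues;
    simpleMatrix<double> ContValues;

    estimationSketch(const dataStoreSketch& ds, const std::vector<std::size_t>& cases)
        : DiscValues(cases.size(), ds.DiscData.cols()),
          ContValues(cases.size(), ds.ContData.cols()) {
        for (std::size_t i = 0; i < cases.size(); ++i) {
            for (std::size_t j = 0; j < ds.DiscData.cols(); ++j)
                DiscValues(i, j) = ds.DiscData(cases[i], j);
            for (std::size_t j = 0; j < ds.ContData.cols(); ++j)
                ContValues(i, j) = ds.ContData(cases[i], j);
        }
    }

    // Number of copied cases whose class (DiscValues column 0) equals cls.
    int classCount(int cls) const {
        int n = 0;
        for (std::size_t i = 0; i < DiscValues.rows(); ++i)
            if (DiscValues(i, 0) == cls) ++n;
        return n;
    }
};
```

Copying the cases a node works on into compact matrices lets repeated attribute evaluation scan contiguous memory instead of indexing back into the full store. This is one plausible reading of the efficiency remark about estimation, not a description of what CORElearn actually does.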