Compboost | R Documentation
Fit a component-wise boosting model (Buehlmann (2003)).
This class wraps the S4 class Compboost_internal, the internal model representation exposed by Rcpp. The two convenience wrappers boostLinear() and boostSplines() also create objects of this class.
To visualize the internals, see plotBaselearnerTraces(), plotBaselearner(), plotFeatureImportance(), plotPEUni(), plotTensor(), and plotRisk(). To visualize the contribution for one new observation, see plotIndividualContribution().
data
(data.frame())
The data used for training the model. Note: If oob_fraction is set, the input data is split into data and data_oob. Hence, data contains a subset of the input data to train the model.
data_oob
(data.frame())
An out-of-bag data set used for risk logging or early stopping. data_oob is split from the input data (see the data field).
oob_fraction
(numeric(1))
The fraction of nrow(input data) defining the number of observations in data_oob.
response
(ResponseRegr | ResponseBinaryClassif)
An S4 response object. See ?ResponseRegr or ?ResponseBinaryClassif for help. This object holds the current prediction, pseudo residuals, and functions to transform scores. Note: This response corresponds to the data field and holds the predictions for that data.frame.
response_oob
(ResponseRegr | ResponseBinaryClassif)
An S4 response object. See ?ResponseRegr or ?ResponseBinaryClassif for help. Same as response but for data_oob.
target
(character(1))
Name of the target variable in data.
id
(character(1))
Name of the data object defined in $new(data, ...).
optimizer
(OptimizerCoordinateDescent | OptimizerCoordinateDescentLineSearch | OptimizerAGBM | OptimizerCosineAnnealing)
An initialized S4 optimizer object (requires a call to Optimizer*$new(...)). See the respective help page for further information.
loss
(LossQuadratic | LossBinomial | LossHuber | LossAbsolute | LossQuantile)
An initialized S4 loss object (requires a call to Loss*$new(...)). See the respective help page for further information.
learning_rate
(numeric(1))
The learning rate of the model. Note: Some optimizers dynamically vary the learning rate.
model
(Compboost_internal)
The internal Compboost object exported from Rcpp. See ?Compboost_internal for details.
bl_factory_list
(BlearnerFactoryList)
A container with all base learners. See ?BlearnerFactoryList for details.
positive
(character(1))
The positive class in the case of binary classification.
stop_all
(logical(1))
Indicator whether all stoppers must return TRUE to early stop the algorithm. Comparable to all() if stop_all = TRUE and any() if stop_all = FALSE.
early_stop
(logical(1))
Indicator whether early stopping is used or not.
offset
(numeric())
Offset of the estimated model.
baselearner_list
(list())
Named list with names $getBaselearnerNames(). Each element contains:
"feature" (character(1)): The name of the feature from data.
"factory" (Baselearner*): The raw base learner as a factory object. See ?Baselearner* for details.
boost_intercept
(logical(1))
Logical value indicating whether an intercept base learner was added with $addIntercept() or not.
logs
(data.frame)
Basic information about each iteration, such as the risk and the selected base learner. If data_oob is set, further information about the validation/oob risk is also logged. The same applies to time logging etc. Note: The field logs is set and updated internally after each call to $getLoggerData(); hence, it caches the logged data set instead of recalculating it each time, as $getLoggerData() does.
idx_oob
(integer())
An index vector used to split data into data = data[idx_train, ] and data_oob = data[idx_oob, ]. Note: oob_fraction is ignored if this argument is set.
idx_train
(integer())
An index vector used to split data into data = data[idx_train, ] and data_oob = data[idx_oob, ]. Note: oob_fraction is ignored if this argument is set.
new()
Creates a new instance of this R6 class.
Compboost$new(data = NULL, target = NULL, optimizer = NULL, loss = NULL, learning_rate = 0.05, positive = NULL, oob_fraction = NULL, early_stop = FALSE, idx_oob = NULL, stop_args = list(eps_for_break = 0, patience = 10L), file = NULL)
data
(data.frame)
The data set to build the object. Note: This data set is completely used for training if is.null(idx_oob). Otherwise, the data set is split into data = data[idx_train, ] and data_oob = data[idx_oob, ].
target
(character(1))
Character indicating the name of the target variable.
optimizer
(OptimizerCoordinateDescent | OptimizerCoordinateDescentLineSearch | OptimizerAGBM | OptimizerCosineAnnealing)
An initialized S4 optimizer object (requires a call to Optimizer*$new(...)). See the respective help page for further information.
loss
(LossQuadratic | LossBinomial | LossHuber | LossAbsolute | LossQuantile)
An initialized S4 loss object (requires a call to Loss*$new(...)). See the respective help page for further information.
learning_rate
(numeric(1))
Learning rate of the model (default is 0.05).
positive
(character(1))
The name of the positive class (in the case of binary classification).
oob_fraction
(numeric(1))
The fraction of nrow(input data) defining the number of observations in data_oob. This argument is ignored if idx_oob is set.
early_stop
(logical(1))
Indicator whether early stopping should be used or not.
idx_oob
(integer())
An index vector used to split data into data = data[idx_train, ] and data_oob = data[idx_oob, ]. Note: oob_fraction is ignored if this argument is set.
stop_args
(list(integer(1), integer(1)))
A list containing the two elements patience and eps_for_break, which are used for early stopping.
file
(character(1))
File from which a model should be loaded. If NULL, data and target must be defined.
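As an illustration of the constructor arguments, a minimal sketch using mtcars; the early-stopping semantics described in the comment are an assumption based on the stop_args description:
library(compboost)
# Hold out 30 % of the rows for out-of-bag risk logging and stop the fitting
# early if the oob risk does not improve by more than eps_for_break for
# patience consecutive iterations (assumed early-stopping semantics):
cboost = Compboost$new(
  data          = mtcars,
  target        = "mpg",
  loss          = LossQuadratic$new(),
  learning_rate = 0.05,
  oob_fraction  = 0.3,
  early_stop    = TRUE,
  stop_args     = list(eps_for_break = 0, patience = 5L)
)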
addLogger()
Add a logger to the model.
Compboost$addLogger(logger, use_as_stopper = FALSE, logger_id, ...)
logger
(LoggerIteration | LoggerTime | LoggerInbagRisk | LoggerOobRisk)
The uninitialized logger.
use_as_stopper
(logical(1))
Indicator defining whether the logger is also used as a stopper, i.e., considered for early stopping.
logger_id
(character(1))
The id of the logger. This allows defining two loggers of the same type (e.g. risk logging) but with different arguments.
...
Additional arguments passed to logger$new(logger_id, use_as_stopper, ...).
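A sketch of registering an additional logger; the max_time and time_unit arguments are assumptions about the LoggerTime constructor (see ?LoggerTime) and are simply forwarded via ...:
library(compboost)
cboost = Compboost$new(mtcars, "mpg", loss = LossQuadratic$new())
cboost$addBaselearner("hp", "spline", BaselearnerPSpline)
# Track the elapsed time per iteration; not used as a stopper:
cboost$addLogger(LoggerTime, use_as_stopper = FALSE, logger_id = "time",
  max_time = 0, time_unit = "microseconds")
cboost$train(200, trace = 0)
head(cboost$getLoggerData())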
getCurrentIteration()
Get the number of the current iteration.
Compboost$getCurrentIteration()
integer(1) value.
addIntercept()
This function adds a base learner that adjusts the intercept (if selected). Adding an intercept base learner may be necessary, e.g., when adding linear effects without an intercept.
Compboost$addIntercept(id = "intercept", data_source = InMemoryData)
id
(character(1))
The id of the base learner (default is "intercept").
data_source
(InMemoryData)
Uninitialized data object used to store the meta data. Note: At the moment, only in-memory storage is supported, see ?InMemoryData for details.
addBaselearner()
Add a base learner of one feature to the model that is considered in each iteration. $addBaselearner() only allows including univariate features. See $addTensor() for bivariate effect modelling and $addComponents() for an effect decomposition.
Compboost$addBaselearner(feature, id, bl_factory, data_source = InMemoryData, ...)
feature
(character(1))
Name of the feature, must be a column in data.
id
(character(1))
The name of the base learner.
bl_factory
(BaselearnerPolynomial | BaselearnerPSpline | BaselearnerCategoricalBinary | BaselearnerCategoricalRidge)
Uninitialized base learner class. See the respective help page for details.
data_source
(InMemoryData)
Uninitialized data object used to store the meta data. Note: At the moment, only in-memory storage is supported, see ?InMemoryData for details.
...
Further arguments passed to the $new(...) constructor of bl_factory.
rmBaselearner()
Remove a base learner from the model.
Compboost$rmBaselearner(blname)
blname
(character(1))
Name of the base learner that should be removed. Must be an element of $getBaselearnerNames().
addTensor()
Add a row-wise tensor product of two features. Note: The base learners are pre-defined by the type of the feature. Numerical features use a BaselearnerPSpline while categorical features are included using a BaselearnerCategoricalRidge base learner. Including an arbitrary tensor product requires the S4 API, using BaselearnerTensor on two base learners of any type.
Compboost$addTensor(feature1, feature2, df = NULL, df1 = NULL, df2 = NULL, isotrop = FALSE, ...)
feature1
(character(1))
Name of the first feature. Must be an element of names(data).
feature2
(character(1))
Name of the second feature. Must be an element of names(data).
df
(numeric(1))
The degrees of freedom used for both base learners (this parameter overwrites df1 and df2).
df1
(numeric(1))
The degrees of freedom used for the first base learner.
df2
(numeric(1))
The degrees of freedom used for the second base learner.
isotrop
(logical(1))
Indicator of how the two penalties are combined. If isotrop == TRUE, the total degrees of freedom are uniformly distributed over the dimensions, while isotrop == FALSE allows defining how strongly each of the two dimensions is penalized.
...
Additional arguments passed to the $new() constructor of the BaselearnerPSpline class.
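A sketch of a bivariate effect of two numerical features; both marginals are P-splines as described above:
library(compboost)
cboost = Compboost$new(mtcars, "mpg", loss = LossQuadratic$new())
# Row-wise tensor product of hp and wt; df overwrites df1 and df2:
cboost$addTensor("hp", "wt", df = 5)
cboost$train(200, trace = 0)
cboost$getBaselearnerNames()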
addComponents()
Add an effect with individual components. A linear term is added as well as a non-linear term without the linear effect. This ensures that the linear component is selected prior to the non-linear effect. The non-linear effect is only included if a deviation from a linear effect is required.
Note: Internally, a BaselearnerPolynomial of degree one and a BaselearnerCentered are added. Centering a base learner makes the design matrix dense and hence fills memory very fast. Binning may be an option to reduce the memory consumption.
Compboost$addComponents(feature, ...)
feature
(character(1))
Name of the feature, must be a column in data.
...
Additional arguments passed to the $new() constructor of the BaselearnerPSpline class.
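A sketch of an effect decomposition; the naming of the resulting base learners is determined internally:
library(compboost)
cboost = Compboost$new(mtcars, "mpg", loss = LossQuadratic$new())
# Linear base learner for hp plus a centered spline capturing only the
# deviation from linearity:
cboost$addComponents("hp")
cboost$train(200, trace = 0)
table(cboost$getSelectedBaselearner())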
train()
Start fitting a model.
Compboost$train(iteration = 100, trace = -1)
iteration
(integer(1))
The maximal number of iterations. The algorithm can stop earlier if early stopping is active.
trace
(integer(1))
The number of iterations after which the status of the fitting is printed to the screen. The default trace = -1 internally uses trace = round(iteration / 40). To fit the model silently, use trace = 0.
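A sketch of the fitting call; that a second call to $train() with a larger iteration count continues the fitting is an assumption (and only possible as long as the data are kept in the model, see $saveToJson()):
library(compboost)
cboost = Compboost$new(mtcars, "mpg", loss = LossQuadratic$new())
cboost$addBaselearner("hp", "spline", BaselearnerPSpline)
# Fit 200 iterations silently:
cboost$train(200, trace = 0)
cboost$getCurrentIteration()
# Assumed: continues from iteration 200 and prints a status line every 100 iterations:
cboost$train(500, trace = 100)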
prepareData()
Internally, each base learner is built on an InMemoryData object. Some methods (e.g. adding a LoggerOobRisk) require passing the data as a list(InMemoryData | CategoricalDataRaw) with data objects as elements. This function converts a given data.frame into that format.
Compboost$prepareData(newdata)
newdata
(data.frame)
New data set of the same structure as data.
list(InMemoryData | CategoricalDataRaw) with data containers as elements. Numeric features are wrapped by InMemoryData while categorical features are included with CategoricalDataRaw.
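A sketch of the conversion; the exact structure of the result depends on the registered base learners:
library(compboost)
cboost = Compboost$new(mtcars, "mpg", loss = LossQuadratic$new())
cboost$addBaselearner("hp", "spline", BaselearnerPSpline)
# One data container per feature (here assumed to be an InMemoryData object for hp):
dcontainer = cboost$prepareData(mtcars[1:5, ])
str(dcontainer, max.level = 1)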
prepareResponse()
Same as for $prepareData() but for the response. Internally, vectorToResponse() is used to generate a ResponseRegr or ResponseBinaryClassif object.
Compboost$prepareResponse(response)
response
(vector())
A vector of type numeric or categorical that is transformed into a response object.
ResponseRegr | ResponseBinaryClassif object.
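A sketch that wraps a plain target vector into a response object:
library(compboost)
cboost = Compboost$new(mtcars, "mpg", loss = LossQuadratic$new())
# mpg is numeric, hence a ResponseRegr object is expected:
rsp = cboost$prepareResponse(mtcars$mpg)
class(rsp)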
predict()
Calculate predictions.
Compboost$predict(newdata = NULL, as_response = FALSE)
newdata
(data.frame)
New data set of the same structure as data.
as_response
(logical(1))
In the case of binary classification, as_response = TRUE returns the predictions as response, i.e. classes.
Vector of predictions.
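A sketch of the prediction call; that newdata = NULL falls back to the training data is an assumption based on the default value:
library(compboost)
cboost = Compboost$new(mtcars, "mpg", loss = LossQuadratic$new())
cboost$addBaselearner("hp", "spline", BaselearnerPSpline)
cboost$train(200, trace = 0)
# Assumed: predictions for the training data (newdata = NULL) ...
pred_train = cboost$predict()
# ... and predictions for new observations:
pred_new = cboost$predict(newdata = mtcars[1:5, ])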
predictIndividual()
While $predict() returns the sum of all base learner predictions, this function returns a list with the predictions of each base learner.
Compboost$predictIndividual(newdata)
newdata
(data.frame)
New data set of the same structure as data.
Named list() with the included base learner names as names and the base learner predictions as elements.
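A sketch of the per-base-learner decomposition; the name hp_spline follows the feature-plus-id naming used in the examples below:
library(compboost)
cboost = Compboost$new(mtcars, "mpg", loss = LossQuadratic$new())
cboost$addBaselearner("hp", "spline", BaselearnerPSpline)
cboost$addBaselearner("wt", "spline", BaselearnerPSpline)
cboost$train(200, trace = 0)
# Named list with one prediction vector per base learner:
contrib = cboost$predictIndividual(mtcars[1:3, ])
names(contrib)
contrib$hp_spline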
transformData()
Get design matrices of all (or a subset of) base learners for a new data.frame.
Compboost$transformData(newdata, blnames = NULL)
newdata
(data.frame)
New data set of the same structure as data.
blnames
(character())
Names of the base learners for which the design matrices are returned. If is.null(blnames), compboost tries to guess all base learners that were constructed based on the feature names of newdata.
list(matrix | Matrix::Matrix) with matrices as elements.
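A sketch of extracting the design matrix of a single base learner:
library(compboost)
cboost = Compboost$new(mtcars, "mpg", loss = LossQuadratic$new())
cboost$addBaselearner("hp", "spline", BaselearnerPSpline)
cboost$train(200, trace = 0)
# Design matrix of the spline basis evaluated on new data:
dmats = cboost$transformData(mtcars[1:5, ], blnames = "hp_spline")
dim(dmats[["hp_spline"]])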
getInbagRisk()
Return the training risk of each iteration.
Compboost$getInbagRisk()
numeric() vector of risk values, or NULL if $train() was not called previously.
getSelectedBaselearner()
Get a vector with the name of the selected base learner of each iteration.
Compboost$getSelectedBaselearner()
character() vector of base learner names.
print()
Printer of the object.
Compboost$print()
Invisibly returns the object.
getCoef()
Get the estimated coefficients.
Compboost$getCoef()
list(pars, offset) with estimated coefficients/parameters and intercept/offset.
getEstimatedCoef()
DEPRECATED: use $getCoef() instead. Get the estimated coefficients.
Compboost$getEstimatedCoef()
list(pars, offset) with estimated coefficients/parameters and intercept/offset.
getBaselearnerNames()
Get the names of the registered base learners.
Compboost$getBaselearnerNames()
character() vector of base learner names.
getLoggerData()
Get the logged information.
Compboost$getLoggerData()
data.frame of logging information.
calculateFeatureImportance()
Calculate feature importance based on the training risk. Note that early stopping should be used to get adequate importance measures.
Compboost$calculateFeatureImportance(num_feats = NULL, aggregate_bl_by_feat = FALSE)
num_feats
(integer(1))
The number of considered features; the num_feats most important feature names and the respective values are returned. If num_feats = NULL, all features are considered.
aggregate_bl_by_feat
(logical(1))
Indicator whether the importance is aggregated on the feature level. For example, adding components includes two different base learners for the same feature. If aggregate_bl_by_feat == TRUE, the importance of these two base learners is aggregated instead of considering them individually.
Named numeric() vector of length num_feats (if at least num_feats were selected) with importance values as elements.
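A sketch of the importance calculation (early stopping, which the note above recommends, is omitted here for brevity):
library(compboost)
cboost = Compboost$new(mtcars, "mpg", loss = LossQuadratic$new(), oob_fraction = 0.3)
cboost$addBaselearner("hp", "spline", BaselearnerPSpline)
cboost$addComponents("wt")
cboost$train(500, trace = 0)
# The two most important features; base learners of the same feature
# (here the linear and centered part of wt) are aggregated:
cboost$calculateFeatureImportance(num_feats = 2, aggregate_bl_by_feat = TRUE)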
saveToJson()
Save a Compboost object to a JSON file. Because of the underlying C++ objects, it is not possible to use R's native load and save methods.
Compboost$saveToJson(file, rm_data = FALSE)
file
(character(1))
Name/path of the file.
rm_data
(logical(1))
Remove all data from the model. This applies to the training data and response, as well as the test data and response used for the test risk logging. Note: If the data are removed, no continuation of the training is possible after reloading. Also, everything related to predictions based on the training data throws an error.
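A sketch of saving and reloading a model; restoring via the file argument of $new() is documented above:
library(compboost)
cboost = Compboost$new(mtcars, "mpg", loss = LossQuadratic$new())
cboost$addBaselearner("hp", "spline", BaselearnerPSpline)
cboost$train(200, trace = 0)
# Serialize to JSON, keeping the data so the training could be continued later:
fjson = tempfile(fileext = ".json")
cboost$saveToJson(fjson)
# Restore the model from the JSON file:
cboost2 = Compboost$new(file = fjson)
cboost2$getCurrentIteration()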
clone()
The objects of this class are cloneable with this method.
Compboost$clone(deep = FALSE)
deep
Whether to make a deep clone.
Buehlmann, Peter, Yu, Bin (2003). "Boosting with the L2 loss: regression and classification." Journal of the American Statistical Association, 98(462), 324–339. doi:10.1198/016214503000125.
cboost = Compboost$new(mtcars, "mpg", loss = LossQuadratic$new(), oob_fraction = 0.3)
cboost$addBaselearner("hp", "spline", BaselearnerPSpline, degree = 3,
n_knots = 10, df = 3, differences = 2)
cboost$addBaselearner("wt", "spline", BaselearnerPSpline)
cboost$train(1000, 0)
table(cboost$getSelectedBaselearner())
head(cboost$logs)
names(cboost$baselearner_list)
# Access information about a base learner in the list:
cboost$baselearner_list$hp_spline$factory$getDF()
cboost$baselearner_list$hp_spline$factory$getPenalty()
plotBaselearner(cboost, "hp_spline")