knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
  # fig.path = "Readme_files/"
)
library(compboost)
We use the titanic dataset for binary classification on Survived. First of all, we store the training data in a data frame and remove all rows that contain NAs:
# Store the training data and remove rows with NAs:
df_train = na.omit(titanic::titanic_train)
str(df_train)
In the next step we transform the response to a factor with more intuitive levels:
df_train$Survived = factor(df_train$Survived, labels = c("no", "yes"))
Due to the R6 API, it is necessary to create a new class object, which receives the data, the target as a character string, and the loss. Note that it is important to pass an initialized loss object:
cboost = Compboost$new(data = df_train, target = "Survived",
  loss = LossBinomial$new(), oob_fraction = 0.3)
Using an initialized loss object gives the opportunity to use a loss initialized with a custom offset.
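As a minimal sketch of this (assuming the loss constructor accepts a custom offset as its argument, which is not shown above), a binomial loss with a fixed offset could be passed like this:

# Hypothetical sketch: binomial loss initialized with a custom offset of 0.5
# (the offset argument and its value are assumptions for illustration):
custom_loss = LossBinomial$new(0.5)
cboost_offset = Compboost$new(data = df_train, target = "Survived",
  loss = custom_loss, oob_fraction = 0.3)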
Adding new base-learners is done by passing a character string that indicates the feature. As the second argument it is important to name an identifier for the factory, since we can define multiple base-learners on the same source. For instance, we can define a spline and a linear base-learner on the same feature:
# Spline base-learner of age:
cboost$addBaselearner("Age", "spline", BaselearnerPSpline)

# Linear base-learner of age (degree = 1 with intercept is default):
cboost$addBaselearner("Age", "linear", BaselearnerPolynomial)
Additional arguments can be specified after naming the base-learner:
# Spline base-learner of fare:
cboost$addBaselearner("Fare", "spline", BaselearnerPSpline,
  degree = 2, n_knots = 14, penalty = 10, differences = 2)
For references on the available base-learners, see the functionality section on the project page.
When adding categorical features, we use a dummy-coded representation with a ridge penalty:
cboost$addBaselearner("Sex", "categorical", BaselearnerCategoricalRidge)
Finally, we can check what factories are registered:
cboost$getBaselearnerNames()
The time logger logs the elapsed time. The time unit can be one of microseconds, seconds, or minutes. The logger stops the training if max_time is reached, but we do not use it as a stopper here:
cboost$addLogger(logger = LoggerTime, use_as_stopper = FALSE, logger_id = "time",
  max_time = 0, time_unit = "microseconds")
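The same logger could also act as an early stopper. A hedged sketch, reusing the arguments from the call above with illustrative values (kept as a comment so it does not change the training below):

# Sketch: time logger used as a stopper, halting training after roughly 30 seconds:
# cboost$addLogger(logger = LoggerTime, use_as_stopper = TRUE, logger_id = "time_stop",
#   max_time = 30, time_unit = "seconds")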
cboost$train(2000, trace = 250)
cboost
Objects of the Compboost class have member functions such as getCoef(), getInbagRisk(), or predict() to access the results:
str(cboost$getCoef())
str(cboost$getInbagRisk())
str(cboost$predict())
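As a small usage sketch (assuming predict() also accepts new data, which is not shown above), predictions for individual observations could be obtained like this:

# Sketch: predict scores for the first five training rows
# (passing new data to predict() is an assumption here):
cboost$predict(df_train[1:5, ])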
To obtain a vector of the selected base-learners, use getSelectedBaselearner():
table(cboost$getSelectedBaselearner())
We can also access predictions directly from the response objects cboost$response and cboost$response_oob. Note that $response_oob was created automatically when defining an oob_fraction within the constructor:
oob_label = cboost$response_oob$getResponse()
oob_pred = cboost$response_oob$getPredictionResponse()
table(true_label = oob_label, predicted = oob_pred)
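From these two vectors we can also compute a simple out-of-bag accuracy (plain R, no additional compboost functionality required):

# Share of correctly classified out-of-bag observations:
mean(oob_label == oob_pred)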
To continue the training or to set the whole model to another iteration, simply re-call train():
cboost$train(3000)

str(cboost$getCoef())
str(cboost$getInbagRisk())
table(cboost$getSelectedBaselearner())
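Since train() sets the model to the given iteration, re-calling it with a smaller number should move the model back again; for example (the iteration value is illustrative):

# Set the model back to iteration 1500 and inspect the selected base-learners:
cboost$train(1500)
table(cboost$getSelectedBaselearner())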