This is the main super learner class. This class contains everything related to the super learner machine learning model.
An object of class R6ClassGenerator of length 24.
initialize(SL.library.definition = c("ML.Local.lm", "ML.H2O.glm"), summaryMeasureGenerator, verbose = FALSE)
Starts a new OnlineSuperLearner. The provided SL.library.definition contains the machine learning models to use.
@param SL.library.definition list a list of machine learning algorithms. This can be either a vector with the name of each estimator or a list according to the LibraryFactory. Look in the LibraryFactory class for the specification of this list.
@param summaryMeasureGenerator SummaryMeasureGenerator an object of the type SummaryMeasureGenerator. This generator is used to get new observations with the correct aggregated columns.
@param should_fit_osl boolean (default = TRUE) should an instance of the OSL fit the online version of the osl?
@param should_fit_dosl boolean (default = TRUE) should an instance of the OSL fit the discrete online version of the osl?
@param pre_processor PreProcessor (default = NULL) an instance of the PreProcessor which is used to normalize the input and output values for the OSL.
@param test_set_size integer (default = 1) the size of the test set to use.
@param verbose (default = FALSE) the verbosity (how much logging). Note that this might be propagated to other classes.
set_verbosity(verbosity)
Method that can change the verbosity of a superlearner instance
@param verbosity the new verbosity (how much logging) to use.
fit(data, initial_data_size = 5, max_iterations = 20)
Fits an instance of the OnlineSuperLearner class. This is the main method used for training the online SuperLearner, and calling it triggers the training process. It fits the provided SL.library.definition estimators as well as the OnlineSuperLearner and the DiscreteOnlineSuperLearner.
@param data Data.Base the data to train the instance on. Note that this can be an instance of any Data.Base subclass, as long as it implements the required functions.
@param initial_data_size integer (default = 5) when training an online algorithm, one first needs to specify a small number of rows of data that can be used for calculating the initial versions of the estimators (these are essentially trained as batch algorithms). How many blocks are used for training this initial set can be specified here.
@param max_iterations integer (default = 20) the maximum number of iterations the algorithm may use for training. If this is more than the number of blocks in the data, the OSL will stop gracefully. This is useful when there is a stream of data which would provide data indefinitely.
@param mini_batch_size integer (default = 20) the size of the mini batch to use for each update. Note that this needs to be larger than the test_set_size specified on initialization. Part of this collection of blocks / mini batch will be used as a validation set while training.
@return data.table a data.table with the risks of each estimator.
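The relation between mini_batch_size and test_set_size can be illustrated with a minimal sketch. The helper `split_mini_batch` below is hypothetical and not part of the package, and holding out the last blocks of the batch is an assumption made only for illustration; the package may select the validation blocks differently.

```r
# Hypothetical helper (not part of the package): split one mini batch of
# blocks into a training part and a validation part.
split_mini_batch <- function(block_ids, test_set_size = 1) {
  # The mini batch must be larger than the test set, otherwise nothing
  # would be left to train on.
  stopifnot(length(block_ids) > test_set_size)
  n <- length(block_ids)
  # Assumption: the last `test_set_size` blocks are held out for validation.
  validation_idx <- seq(n - test_set_size + 1, n)
  list(
    train = block_ids[-validation_idx],
    validation = block_ids[validation_idx]
  )
}

batch <- split_mini_batch(block_ids = 1:20, test_set_size = 1)
length(batch$train)   # number of blocks used for updating the estimators
batch$validation      # blocks held out for validation
```

This also shows why a mini_batch_size equal to test_set_size cannot work: no blocks would remain for the update itself.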
predict(data, relevantVariables = NULL, all_estimators = TRUE, discrete = TRUE, continuous = TRUE, sample = FALSE, plot = FALSE)
Method to perform a prediction on the estimators. It can run in different configurations. It can be configured to predict the outcome using all estimators (the all_estimators flag), using the discrete super learner (the discrete flag), or using the continuous online super learner (the continuous flag). At least one of these three flags must be true.
Note that the predict function in this case yields the predicted probability of an outcome. That is, it does NOT predict an actual outcome, just the probability.
@param data Data.Base the data to use for the prediction. Note that this can be an instance of any Data.Base subclass, as long as it implements the required functions.
@param relevantVariables list (default = NULL) the relevant variables used for doing the predictions (these should be the same as the ones used for fitting). If NULL, we will use the list provided on initialization.
@param all_estimators boolean (default = TRUE) whether or not to include the output of all candidate estimators in the output
@param discrete boolean (default = TRUE) whether or not to include the output of the discrete super learner in the output
@param continuous boolean (default = TRUE) whether or not to include the output of the continuous super learner in the output
@param sample boolean (default = FALSE) is the goal to sample from the underlying densities, or should we predict a probability of an outcome?
@param plot (default = FALSE) if set to true, the algorithm will plot the outcomes to a file for further inspection. This is useful when inspecting the performance of the estimators.
@return list a list with two entries: normalized and denormalized. The normalized outcomes are the values scaled between 0 and 1 (using the PreProcessor); the denormalized outcomes are the values transformed back to their original range.
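The difference between the normalized and denormalized entries can be sketched as follows. This assumes a plain min-max scaling with known bounds; the functions `normalize` and `denormalize` are hypothetical stand-ins and not the PreProcessor API, whose actual implementation may differ.

```r
# Hypothetical min-max scaling helpers (not the PreProcessor API).
normalize <- function(x, lower, upper) (x - lower) / (upper - lower)
denormalize <- function(z, lower, upper) z * (upper - lower) + lower

y <- c(10, 15, 20)
bounds <- range(y)                           # assumed bounds: observed min and max
z <- normalize(y, bounds[1], bounds[2])      # values scaled to [0, 1]
y_back <- denormalize(z, bounds[1], bounds[2])  # back on the original scale
```

Under this assumption, applying `denormalize` to the normalized values recovers the original values.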
sample_iteratively(data, tau = 10, intervention = NULL)
Method to sample iteratively from the densities. It works by providing an initial observation (data), from which the next measurement is estimated iteratively. This is done until tau steps in the future. Furthermore, this sampling step can be augmented with an intervention. That is, we could set a given time step (or all) to a certain value. The intervention provided should be a list containing a when and a what entry. The when entry should show when the intervention is performed; the what entry shows what should be done.
@param data the initial data to start the sampling from. At most 1 row of data.
@param tau integer (default = 10) the timestep at which you want to evaluate the output
@param intervention list/intervention (default = NULL) the intervention, e.g.: list(when = c(1,2), what = c(1,0), variable = 'A'). See the InterventionParser for more details.
@param return_type string (default = 'observations') the OnlineSuperlearner.SampleIteratively can return data in different configurations. It can return all data, only a subset, or denormalized outcomes. Check the OnlineSuperlearner.SampleIteratively class for more details.
@param start_from_variable RelevantVariable (default = NULL) if we don't start with sampling from the first argument in the sequence of variables, specify which one to start from.
@param start_from_time integer (default = 1) generally the sampling procedure starts from $t = 1$, but sometimes one might want to sample from a different point in time. This can be specified here.
@param check boolean (default = FALSE) should we perform a check on the provided arguments?
@return list/dataframe of sampled values.
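How an intervention list interacts with the iterative sampling can be sketched with a toy loop. Everything here except the shape of the intervention list is hypothetical: `sample_next` is a stand-in for drawing from the fitted densities, and `apply_intervention` is an illustrative helper, not the InterventionParser.

```r
# Hypothetical stand-in for one sampling step; the real procedure samples
# from the fitted conditional densities instead.
sample_next <- function(previous) {
  list(W = previous$W, A = rbinom(1, 1, 0.5), Y = previous$Y + previous$A)
}

# Illustrative helper: overwrite the intervened variable when t matches an
# entry in `when`, using the value at the same position in `what`.
apply_intervention <- function(observation, t, intervention) {
  if (is.null(intervention)) return(observation)
  hit <- which(intervention$when == t)
  if (length(hit) == 1) {
    observation[[intervention$variable]] <- intervention$what[hit]
  }
  observation
}

intervention <- list(when = c(1, 2), what = c(1, 0), variable = "A")
tau <- 3
current <- list(W = 0, A = 0, Y = 0)   # the single initial observation
trajectory <- vector("list", tau)
set.seed(1)
for (t in seq_len(tau)) {
  current <- apply_intervention(sample_next(current), t, intervention)
  trajectory[[t]] <- current
}
sapply(trajectory, function(obs) obs$A)  # A is fixed at t = 1 and t = 2
```

In this sketch A is forced to 1 at t = 1 and to 0 at t = 2, while at t = 3 it is sampled freely.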
train_library(data_current)
Function to train the initial set of models
@param data_current data.table the dataset to train the estimators on.
update_library(max_iterations, mini_batch_size)
Updates the initial / trained estimators with the available data. This data does not need to be provided as it is already part of the Data.Base provided on initialization / fitting.
@param max_iterations integer the maximal number of iterations of online learning to do.
@param mini_batch_size integer (default = 20) the size of the mini batch to use for each update.
fit_dosl()
Finds the best estimator among the current set, for each of the densities (W, A, and Y).
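The selection step of a discrete super learner can be sketched as picking, per variable, the estimator with the lowest risk. The risk values below are made up for illustration, and the matrix layout is an assumption; the package stores its risks in its own structure.

```r
# Hypothetical cross-validated risks: one row per estimator, one column per
# relevant variable. The numbers are invented for illustration.
cv_risk <- rbind(
  "ML.Local.lm" = c(W = 0.30, A = 0.25, Y = 0.40),
  "ML.H2O.glm"  = c(W = 0.28, A = 0.27, Y = 0.35)
)

# For each variable (column), keep the name of the estimator with the
# smallest risk.
best <- apply(cv_risk, 2, function(risks) names(which.min(risks)))
best
```

The result is a named character vector with one selected estimator per variable.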
get_cv_risk()
Method to retrieve the current cv_risk (note that this is not an active method, so it can easily be stubbed).
@return an overview of the CV risk of the different estimators
set_relevant_variables(relevant_variables)
Method to set the relevant_variables in the osl class. Generally not needed (apart from initialization).
retrieve_list_of_relevant_variables(relevant_variables)
Retrieves a list of relevant variables according to a specification. This
function allows for a more flexible way of retrieving relevant variables
from the OSL model.
@param relevant_variables the relevant_variables for which we want to receive the list of variables, in a form that our model accepts. This can be specified as follows:
- List of RelevantVariable objects to predict
- Single RelevantVariable object to predict
- List of strings with the names of the outputs (list('X','Y'))
- Single string with the name of the output ('Y')
@return a list of RelevantVariable objects to use in the prediction function.
is_fitted
Active method. Returns whether the OSL has been fitted or not
@return boolean true if it has been fitted, false if not
is_online
Active method to determine whether the actual algorithm is fitted in an online way. That is to say, whether all of the estimators are in fact online.
@return boolean true if all algorithms are online, false if not
fits_osl
Active method to know whether the current OSL fits an online super learner (that is, the weighted combination). This setting comes from the initialization step of OSL.
@return boolean true if it fits an osl (false if not)
fits_dosl
Active method to know whether the current OSL fits a discrete online super learner. This setting comes from the initialization step of OSL.
@return boolean true if it fits a discrete osl (false if not)
info
Active method to print some general info related to the current OSL
get_estimators
Active method to retrieve a list of estimators. These can be the fitted versions (if the OSL is fitted), or the plain unfitted versions. Check the is_fitted active method to find out which.
@return list a list object containing all estimators.
get_osl_weights
Active method to retrieve a vector of weights that the OSL has found for its continuous online super learner fit.
@return vector a vector containing the estimates of the OSL weights
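How such a weight vector is typically used can be shown with a small sketch of a weighted combination. The predictions and weights below are invented, and the assumption that the weights are non-negative and sum to one is for illustration only.

```r
# Hypothetical predictions: one row per observation, one column per estimator.
predictions <- cbind(
  "ML.Local.lm" = c(0.2, 0.6),
  "ML.H2O.glm"  = c(0.4, 0.8)
)

# Hypothetical weights, assumed to sum to one.
weights <- c("ML.Local.lm" = 0.25, "ML.H2O.glm" = 0.75)

# The combined prediction is the weighted sum across estimators.
osl_prediction <- as.numeric(predictions %*% weights)
osl_prediction
```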
get_dosl
Active method to retrieve the actual DOSL fit. This could be NULL if no DOSL has been fit yet.
@return list a list containing the best estimator for each of the relevant variables.
get_cv_risk
Active method to retrieve the cross-validated risk of each of the estimators
@return list a list containing the risk estimates for each of the estimators.
get_relevant_variables
Active method. Returns all RelevantVariables in the OSL object.
@return list a list containing all RelevantVariables
get_valididy
Active method that throws an error if the current state of the OSL is not valid (i.e., that it has invalid parameters in it).
get_osl_sampler
Active method. Returns the OSL sampler (which is an instance of the OnlineSuperLearner.SampleIteratively object).