OnlineSuperLearner: OnlineSuperLearner


Description

This is the main super learner class. This class contains everything related to the super learner machine learning model.

Usage

Format

An object of class R6ClassGenerator of length 24.

Methods

initialize(SL.library.definition = c("ML.Local.lm", "ML.H2O.glm"), summaryMeasureGenerator, verbose = FALSE)

Starts a new OnlineSuperLearner. The provided SL.library.definition contains the machine learning models to use.

@param SL.library.definition list a list of machine learning algorithms. This can be either a vector with the name of each estimator or a list that follows the LibraryFactory format. See the LibraryFactory class for the specification of this list.

@param summaryMeasureGenerator SummaryMeasureGenerator an object of the type SummaryMeasureGenerator. This generator is used to get new observations with the correct aggregated columns.

@param should_fit_osl boolean (default = TRUE) whether this instance should fit the online super learner (the weighted combination of the estimators).

@param should_fit_dosl boolean (default = TRUE) whether this instance should fit the discrete online super learner.

@param pre_processor PreProcessor (default = NULL) an instance of the PreProcessor which is used to normalize the input and output values for the OSL.

@param test_set_size integer (default = 1) the size of the test set to use.

@param verbose (default = FALSE) the verbosity (how much logging). Note that this might be propagated to other classes.

set_verbosity(verbosity)

Changes the verbosity of a superlearner instance.

@param verbosity the new verbosity (how much logging) to use.

fit(data, initial_data_size = 5, max_iterations = 20, mini_batch_size = 20)

Fits an instance of the OnlineSuperLearner class. This is the main method used for training the online SuperLearner (and this will actually trigger the training process). This will fit the provided SL.library.definition estimators as well as the OnlineSuperLearner and the DiscreteOnlineSuperLearner.

@param data Data.Base the data to train the instance on. Note that this can be any instance of a Data.Base subclass, as long as it implements the required functions.

@param initial_data_size integer (default = 5) when training an online algorithm, one first needs a small amount of data for calculating the initial versions of the estimators (these are essentially trained as batch algorithms). This parameter specifies how many blocks are used for training this initial set.

@param max_iterations integer (default = 20) the maximum number of iterations used to train the algorithm. If this is more than the number of blocks in the data, the OSL will stop gracefully. This is useful when there is a stream of data which would provide data indefinitely.

@param mini_batch_size integer (default = 20) the size of the mini batch to use for each update. Note that this needs to be larger than the specified test_set_size on initialization. Part of this collection of blocks / mini batch will be used as a validation set while training.

@return data.table a data.table with the risks of each estimator.
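How initial_data_size, max_iterations, mini_batch_size, and test_set_size interact can be illustrated with a small base-R sketch. This is not the package's implementation: the helper sketch_schedule is hypothetical, and holding out the last test_set_size blocks of each mini batch is an assumption made for illustration.

```r
# Hypothetical helper (not part of OnlineSuperLearner): lists which block
# indices would be used in each phase of an online fit.
sketch_schedule <- function(n_blocks, initial_data_size = 5,
                            max_iterations = 20, mini_batch_size = 20,
                            test_set_size = 1) {
  stopifnot(mini_batch_size > test_set_size, n_blocks >= initial_data_size)
  # Phase 1: the first blocks are used to fit the estimators in batch mode.
  schedule <- list(initial = seq_len(initial_data_size))
  next_block <- initial_data_size + 1
  iteration <- 0
  # Phase 2: online updates, until max_iterations or the data runs out.
  while (iteration < max_iterations &&
         next_block + mini_batch_size - 1 <= n_blocks) {
    batch <- seq(next_block, length.out = mini_batch_size)
    n_train <- mini_batch_size - test_set_size
    iteration <- iteration + 1
    schedule[[paste0("update_", iteration)]] <- list(
      train = head(batch, n_train),       # used to update the estimators
      test  = tail(batch, test_set_size)  # assumed held out for validation
    )
    next_block <- next_block + mini_batch_size
  }
  schedule
}

schedule <- sketch_schedule(n_blocks = 50)
length(schedule) - 1  # number of online updates that fit in 50 blocks
```

With 50 blocks and the default settings, only two full mini batches fit after the initial five blocks, so the loop ends well before max_iterations is reached. This mirrors the graceful stop described for max_iterations.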

predict(data, relevantVariables = NULL, all_estimators = TRUE, discrete = TRUE, continuous = TRUE, sample = FALSE, plot = FALSE)

Method to perform a prediction on the estimators. It can run in different configurations. It can be configured to predict the outcome using all estimators (the all_estimators flag), using the discrete superlearner (the discrete flag), or using the continuous online superlearner (the continuous flag). At least one of these three flags must be true.

Note that the predict function in this case yields the predicted probability of an outcome. That is, it does NOT predict an actual outcome, just the probability.

@param data Data.Base the data to perform the prediction on. Note that this can be any instance of a Data.Base subclass, as long as it implements the required functions.

@param relevantVariables list (default = NULL) the relevant variables used for doing the predictions (these should be the same as the ones used for fitting). If NULL, we will use the list provided on initialization.

@param all_estimators boolean (default = TRUE) whether or not to include the output of all candidate estimators in the output.

@param discrete boolean (default = TRUE) whether or not to include the output of the discrete super learner in the output.

@param continuous boolean (default = TRUE) whether or not to include the output of the continuous super learner in the output

@param sample boolean (default = FALSE) is the goal to sample from the underlying densities, or should we predict a probability of an outcome?

@param plot (default = FALSE) if set to true, the algorithm will plot the outcomes to a file for further inspection. This is useful when inspecting the performance of the estimators.

@return list a list with two entries: normalized and denormalized. The normalized outcomes are the values scaled to the range 0 to 1 (using the PreProcessor); the denormalized outcomes are the values transformed back to their original range.
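The relation between the two entries can be sketched in base R. This assumes plain min-max scaling with known bounds; the helpers normalize and denormalize and the bounds are hypothetical, and the PreProcessor class may work differently.

```r
# Hypothetical min-max helpers; the PreProcessor class may work differently.
normalize <- function(x, lower, upper) (x - lower) / (upper - lower)
denormalize <- function(z, lower, upper) z * (upper - lower) + lower

y <- c(10, 15, 20)                      # outcomes on their original scale
z <- normalize(y, lower = 10, upper = 20)
z                                       # values in the range 0 to 1

# Mimics the shape of the documented return value: a list with two entries.
result <- list(
  normalized   = z,
  denormalized = denormalize(z, lower = 10, upper = 20)
)
result$denormalized                     # back on the original scale
```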

sample_iteratively(data, tau = 10, intervention = NULL)

Method to sample iteratively from the densities. It works by providing an initial observation (data), from which the next measurement is estimated iteratively. This is done until tau steps into the future. Furthermore, this sampling step can be augmented with an intervention. That is, we could set a given time step (or all) to a certain value. The intervention provided should be a list containing a when and a what entry. The when entry specifies when the intervention is performed; the what entry specifies what should be done.

@param data the initial data to start the sampling from. At most 1 row of data.

@param tau integer (default = 10) the timestep at which you want to evaluate the output

@param intervention list/intervention (default = NULL) the intervention, e.g.: list(when = c(1,2), what = c(1,0), variable = 'A'). See the InterventionParser for more details.

@param return_type string (default = 'observations') the OnlineSuperlearner.SampleIteratively can return data in different configurations. It can return all data, only a subset, or denormalized outcomes. Check the OnlineSuperlearner.SampleIteratively class for more details.

@param start_from_variable RelevantVariable (default = NULL) if we don't start with sampling from the first argument in the sequence of variables, specify which one to start from.

@param start_from_time integer (default = 1) generally the sampling procedure starts from $t = 1$, but sometimes one might want to sample from a different point in time. This can be specified here.

@param check boolean (default = FALSE) should we perform a check on the provided arguments?

@return list/dataframe of sampled values.
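The idea of iterative sampling with an intervention can be sketched in base R. The function sketch_sample_iteratively and its toy densities are assumptions for illustration only; they do not come from a fitted OnlineSuperLearner and do not reproduce the OnlineSuperLearner.SampleIteratively class. Only the intervention list follows the example given for the intervention parameter.

```r
# Hypothetical stand-in for iterative sampling; the toy densities below are
# assumptions and do not come from a fitted OnlineSuperLearner.
sketch_sample_iteratively <- function(data, tau = 10, intervention = NULL) {
  stopifnot(nrow(data) == 1)            # exactly one row of initial data
  current <- data
  samples <- vector("list", tau)
  for (t in seq_len(tau)) {
    # Sample the treatment A from a toy Bernoulli(0.5) density.
    a <- rbinom(1, size = 1, prob = 0.5)
    # Apply the intervention: at the times listed in `when`, overwrite the
    # sampled value of `variable` with the matching entry of `what`.
    if (!is.null(intervention) && intervention$variable == "A" &&
        t %in% intervention$when) {
      a <- intervention$what[match(t, intervention$when)]
    }
    # Sample the outcome Y given the previous Y and the current A.
    y <- 0.5 * current$Y + a + rnorm(1, sd = 0.1)
    current <- data.frame(A = a, Y = y)
    samples[[t]] <- current
  }
  do.call(rbind, samples)
}

set.seed(1)
initial <- data.frame(A = 0, Y = 1)
intervention <- list(when = c(1, 2), what = c(1, 0), variable = "A")
sampled <- sketch_sample_iteratively(initial, tau = 5,
                                     intervention = intervention)
sampled$A[1:2]  # forced to 1 and 0 by the intervention
```

Each step feeds the previously sampled row back in as the new starting point, which is what makes the procedure iterative.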

train_library(data_current)

Trains the initial set of models.

@param data_current data.table the dataset to train the estimators on.

update_library(max_iterations, mini_batch_size)

Updates the initial / trained estimators with the available data. This data does not need to be provided as it is already part of the Data.Base provided on initialization / fitting.

@param max_iterations integer the maximal number of iterations of online learning to do.

@param mini_batch_size integer (default = 20) the size of the mini batch to use for each update.

fit_dosl()

Finds the best estimator among the current set, for each of the densities (W, A, Y).
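A minimal sketch of this selection step in base R, assuming that "best" means the lowest cross-validated risk. The risk values are made up, and the layout of the risk table is an assumption rather than the package's internal format.

```r
# Toy cross-validated risks (rows: estimators, columns: densities W, A, Y).
# The numbers are made up for illustration.
cv_risk <- rbind(
  "ML.Local.lm" = c(W = 0.30, A = 0.25, Y = 0.40),
  "ML.H2O.glm"  = c(W = 0.28, A = 0.31, Y = 0.35)
)

# For each density, select the estimator with the lowest risk.
dosl <- apply(cv_risk, 2, function(risks) names(which.min(risks)))
dosl
```

In this toy table the selection differs per density: ML.H2O.glm is picked for W and Y, and ML.Local.lm for A.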

get_cv_risk()

Method to retrieve the current cv_risk (note that this is not an active method, so it can easily be stubbed).

@return an overview of the CV risk of the different estimators

set_relevant_variables(relevant_variables)

Method to set the relevant_variables in the osl class. Generally not needed (apart from initialization).

retrieve_list_of_relevant_variables(relevant_variables)

Retrieves a list of relevant variables according to a specification. This function allows for a more flexible way of retrieving relevant variables from the OSL model.

@param relevant_variables the relevant_variables for which we want to receive the list of variables, in a form that our model accepts. This can be specified as follows:

- List of RelevantVariable objects to predict
- Single RelevantVariable object to predict
- List of strings with the names of the outputs (list('X','Y'))
- Single string with the name of the output ('Y')

@return a list of RelevantVariable objects to use in the prediction function.
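The two string-based forms can be brought into one shape with a few lines of base R. The helper as_variable_names is hypothetical and only covers strings; looking up the matching RelevantVariable objects is left out of this sketch.

```r
# Hypothetical helper: normalises the string-based forms to a list of names.
# Handling of RelevantVariable objects is omitted from this sketch.
as_variable_names <- function(relevant_variables) {
  if (is.character(relevant_variables)) {
    return(as.list(relevant_variables))  # single string, e.g. 'Y'
  }
  stopifnot(is.list(relevant_variables),
            all(vapply(relevant_variables, is.character, logical(1))))
  relevant_variables                     # already a list of strings
}

as_variable_names('Y')
as_variable_names(list('X', 'Y'))
```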

is_fitted

Active method. Returns whether the OSL has been fitted or not.

@return boolean true if it has been fitted, false if not

is_online

Active method to determine whether the actual algorithm is fitted in an online way. That is to say, that all of the estimators are in fact online.

@return boolean true if all algorithms are online, false if not

fits_osl

Active method to know whether the current OSL fits an online super learner (that is, the weighted combination). This setting comes from the initialization step of OSL.

@return boolean true if it fits an osl (false if not)

fits_dosl

Active method to know whether the current OSL fits a discrete online super learner. This setting comes from the initialization step of OSL.

@return boolean true if it fits a discrete osl (false if not)

info

Active method to print some general info related to the current OSL

get_estimators

Active method to retrieve a list of estimators. These can be the fitted versions (if the osl is fitted), or the plain unfitted versions. Check the is_online version for that.

@return list a list object containing all estimators.

get_osl_weights

Active method to retrieve a vector of weights that the OSL has found for its continuous online super learner fit.

@return vector a vector containing the estimates of the OSL weights
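How such a weight vector would be used can be sketched in base R. The predictions and weights below are made up, and treating the weights as non-negative and summing to one is an assumption; the documentation only states that the OSL is a weighted combination of the estimators.

```r
# Toy predictions from two estimators for three observations (made up).
predictions <- cbind(
  "ML.Local.lm" = c(0.20, 0.50, 0.90),
  "ML.H2O.glm"  = c(0.30, 0.40, 0.70)
)
# Hypothetical weights; assumed non-negative and summing to one.
osl_weights <- c("ML.Local.lm" = 0.25, "ML.H2O.glm" = 0.75)

# The combined prediction is the weighted sum of the estimator predictions.
osl_prediction <- as.vector(predictions %*% osl_weights)
osl_prediction
```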

get_dosl

Active method to retrieve the actual DOSL fit. This could be NULL if no DOSL has been fit yet.

@return list a list containing the best estimator for each of the relevant variables.

get_cv_risk

Active method to retrieve the cross-validated risk of each of the estimators.

@return list a list containing the risk estimates for each of the estimators.

get_relevant_variables

Active method. Returns all RelevantVariables in the OSL object.

@return list a list containing all RelevantVariables

get_valididy

Active method that throws an error if the current state of the OSL is not valid (i.e., that it has invalid parameters in it).

get_osl_sampler

Active method. Returns the OSL sampler (which is an instance of the OnlineSuperLearner.SampleIteratively class).


frbl/OnlineSuperLearner documentation built on Feb. 9, 2020, 9:28 p.m.