DensityEstimation: DensityEstimation
In frbl/OnlineSuperLearner: Online SuperLearner package

This class performs the actual density estimation for each of the relevant variables provided to it.

DensityEstimation

An object of class R6ClassGenerator of length 24.

initialize(nbins = 30, bin_estimator = NULL, online = FALSE, name = 'default', verbose = FALSE)

Creates a new density estimator. Several hyperparameters can be configured. First, nbins sets the number of bins the estimator uses (30 by default). Second, bin_estimator sets the estimator used to fit the conditional densities; when it is NULL (the default), speedglm is used. Finally, online selects whether the algorithm is treated as an online or a batch algorithm. If online is FALSE, a record is kept of all data used to train the estimator, which is currently very inefficient.

@param nbins (default = 30) integer the number of bins to use for the estimator.

@param bin_estimator (default = NULL) ML.Base the actual estimator used for fitting the conditional densities. When NULL, speedglm is used.

@param online (default = FALSE) boolean whether the bin estimator supports updating and, if so, whether it should be treated as an online algorithm.

@param name (default = 'default') the name to use for the density estimator.

@param verbose (default = FALSE) the verbosity (log level) to use while running.
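The nbins parameter controls how a continuous relevant variable is discretized before the bin estimator is fit. A minimal base-R sketch of that idea follows, assuming equal-width bins over the observed range (the package may place its bin edges differently). `histogram_density` is a hypothetical helper, not part of OnlineSuperLearner.

```r
# Hypothetical sketch: discretize a continuous variable into `nbins`
# equal-width bins and build a piecewise-constant (histogram) density.
histogram_density <- function(y, nbins = 30) {
  breaks <- seq(min(y), max(y), length.out = nbins + 1)
  # all.inside = TRUE keeps every observation in bins 1..nbins,
  # including the maximum, which sits exactly on the last edge
  bin <- findInterval(y, breaks, rightmost.closed = TRUE, all.inside = TRUE)
  counts <- tabulate(bin, nbins = nbins)
  probs <- counts / length(y)
  # Dividing each bin probability by its width gives a density value
  list(breaks = breaks, probs = probs, density = probs / diff(breaks))
}

set.seed(1)
y <- rnorm(1000)
fit <- histogram_density(y, nbins = 30)
```

The resulting piecewise-constant density integrates to one. In the actual class the per-bin probabilities are conditional on the other relevant variables and are learned by the bin estimator rather than simply counted.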

predict(data, sample = FALSE, subset = NULL, plot = FALSE, check = FALSE)

Method to perform the prediction for all (or a subset of) the relevant variables / conditional densities. If sample is TRUE, the function samples new values from the underlying conditional distributions; if it is FALSE, the function returns predicted probabilities.

@param data the data from which to predict the outcome. What this needs to contain depends on the goal of the prediction. If the goal is to sample new values for a relevant variable, that variable's column does not need to be set in the data table (it can be NA). However, to obtain the probability of an observed value of a relevant variable given the other variables, $P(Y \mid X_1, X_2)$, that variable's column cannot be empty.

@param sample (default = FALSE) boolean whether to sample a value from the conditional density (TRUE) or return its probability (FALSE).

@param subset (default = NULL) stringarray the relevant variables to predict. When NULL, predictions are made for all variables; otherwise only for the given subset.

@param plot (default = FALSE) boolean plot the predicted outcomes to a file in tmp. This can be used for debugging (i.e., it shows the sampled distribution over the actual distribution).

@return list a list containing the predictions, where each entry corresponds to one of the relevant variables for which a conditional distribution was fit.
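The difference between sample = TRUE and sample = FALSE can be illustrated on a fixed piecewise-constant density. In this sketch, the helper `predict_binned` and its arguments are hypothetical: the probability mode returns the density at each supplied value, and the sampling mode first draws a bin according to its probability and then draws uniformly within that bin.

```r
# Hypothetical sketch: a piecewise-constant density given by bin edges
# `breaks` and per-bin probabilities `probs`.
predict_binned <- function(breaks, probs, y = NULL, sample = FALSE, n = length(y)) {
  nbins <- length(probs)
  widths <- diff(breaks)
  if (sample) {
    # Pick a bin with probability probs[k], then draw uniformly inside it
    bin <- sample.int(nbins, size = n, replace = TRUE, prob = probs)
    return(runif(n, min = breaks[bin], max = breaks[bin + 1]))
  }
  # Probability mode: look up the bin of each observed value
  bin <- findInterval(y, breaks, rightmost.closed = TRUE)
  dens <- numeric(length(y))
  inside <- bin >= 1 & bin <= nbins  # values outside the support get 0
  dens[inside] <- probs[bin[inside]] / widths[bin[inside]]
  dens
}

breaks <- c(0, 1, 2, 4)
probs <- c(0.2, 0.5, 0.3)
predict_binned(breaks, probs, y = c(0.5, 1.5, 3, 5))
# returns c(0.20, 0.50, 0.15, 0.00)
set.seed(42)
draws <- predict_binned(breaks, probs, sample = TRUE, n = 5)
```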

predict_probability(datO, X, Y, plot = FALSE, check = FALSE)

Internal method used by the predict function. This function predicts $P(Y \mid X)$, where the X and Y arguments are instances of the RelevantVariable class. The data for these relevant variables must be included in the datO argument.

@param datO the data from which to predict the probability. Since we want the probability of Y given a set of X, $P(Y \mid X_1, X_2)$, the column of the relevant variable Y in datO cannot be empty.

@param X the RelevantVariable instance(s) to condition on, i.e., the X in $P(Y \mid X)$.

@param Y the RelevantVariable instance whose conditional probability is predicted.

@param plot (default = FALSE) boolean plot the predicted outcomes to a file in tmp. This can be used for debugging (i.e., it shows the sampled distribution over the actual distribution).

@return list a list containing the predictions, where each entry corresponds to one of the relevant variables for which a conditional distribution was fit.
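One common way to obtain $P(Y \mid X)$ from a binary classifier such as speedglm is to fit, for each bin, the conditional hazard that Y falls in that bin given that it did not fall in an earlier one. Assuming DensityEstimation follows this discrete-hazard construction (as the condensier package does; the text above does not state it explicitly), the sketch below shows how per-bin hazards for one observation combine into a density value. The helper `hazards_to_density` is hypothetical, and the hazards are supplied directly instead of coming from fitted models.

```r
# Hypothetical sketch: convert per-bin conditional hazards
#   h[k] = P(Y in bin k | Y not in bins 1..k-1, X = x)
# into the conditional density of Y at the values y, for one fixed x.
hazards_to_density <- function(h, breaks, y) {
  nbins <- length(h)
  stopifnot(length(breaks) == nbins + 1)
  # Probability of not having landed in any earlier bin
  survival <- c(1, cumprod(1 - h))[seq_len(nbins)]
  bin_prob <- h * survival  # P(Y in bin k | X = x)
  k <- findInterval(y, breaks, rightmost.closed = TRUE)
  inside <- k >= 1 & k <= nbins
  dens <- numeric(length(y))
  dens[inside] <- bin_prob[k[inside]] / diff(breaks)[k[inside]]
  dens
}

h <- c(0.2, 0.5, 1)  # last hazard is 1: Y must fall in some bin
breaks <- c(0, 1, 2, 4)
hazards_to_density(h, breaks, y = c(0.5, 1.5, 3))
# returns c(0.2, 0.4, 0.2)
```

Setting the last hazard to 1 makes the bin probabilities (here 0.2, 0.4 and 0.4) sum to one.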

getConditionalDensities(outcome = NULL)

Function to get the fitted conditional densities. By default (outcome = NULL) it returns the full list of conditional densities. One can also request a subset of relevant variables. Note that if a single outcome is requested (e.g., outcome = 'Y'), a single conditional density is returned (i.e., it is not wrapped in a list). If a vector of outcomes is provided, a list is returned.

@param outcome (default = NULL) the subset of outcomes for which a conditional density needs to be returned. When NULL it will return all outcomes.

@return either all conditional densities in a list, a subset (in a list), or a single density.
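The return-shape rule of getConditionalDensities() can be sketched on a plain named list. The function `get_conditional_densities` below is a hypothetical stand-in that uses strings in place of fitted density objects; it only mirrors the behaviour described above.

```r
# Hypothetical sketch of the return-shape rule: NULL returns everything,
# a single name returns the bare element, several names return a list.
get_conditional_densities <- function(densities, outcome = NULL) {
  if (is.null(outcome)) return(densities)
  if (length(outcome) == 1) return(densities[[outcome]])  # unwrapped
  densities[outcome]  # still a list
}

densities <- list(Y = "density for Y", A = "density for A", W = "density for W")
get_conditional_densities(densities, "Y")          # a single element, not a list
get_conditional_densities(densities, c("Y", "A"))  # a list of two
```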

is_online()

Active method. Returns whether the estimator was initialized as an online estimator.

@return boolean true if the estimator is fitted as an online estimator.
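In R6, an active binding is a function that runs whenever the field is read, so it is accessed like a field rather than called. Assuming the accessors in this section are R6 active bindings, as the label suggests, they would be read as, e.g., `estimator$is_online` rather than `estimator$is_online()`. The base-R sketch below mimics that behaviour with `makeActiveBinding` on a plain environment; `make_estimator` is hypothetical.

```r
# Hypothetical stand-in for an R6 object: an environment whose `is_online`
# entry is an active binding, i.e. a function evaluated on every read.
make_estimator <- function(online = FALSE) {
  self <- new.env()
  private_online <- online  # captured by the closure below
  makeActiveBinding("is_online", function() private_online, self)
  self
}

est <- make_estimator(online = TRUE)
est$is_online  # TRUE, read like a field without parentheses
```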

get_bin_estimator()

Active method. Returns the algorithm that is used for fitting the bins (i.e., the machine learning algorithm).

@return ML.Base the actual algorithm used to fit the density.

get_nbins()

Active method. Returns the number of bins used to discretize the continuous distribution.

@return integer the number of bins.

get_name()

Active method. Returns the name set for the current estimator.

@return string the name of the estimator.

get_raw_conditional_densities()

Active method. Returns the raw conditional densities. Preferably, do not use this method; use the getConditionalDensities() method instead.

@return list the conditional densities

get_estimator_type()

Active method. Returns a list with two elements: fitfunname, the name of the function used to fit the density, and lmclass, the lmclass for each of the algorithms.

@return list with the fitfunname and the lmclass

frbl/OnlineSuperLearner documentation built on Feb. 9, 2020, 9:28 p.m.