Description
This function attempts to perform automated machine learning: it trains machine learning models and selects features. It is optimized for maximum speed; as a trade-off, the user has a lot of preparatory work to perform before calling this function.
Usage

LauraeML(data, label, folds, seed = 0, models = NULL, parallelized = NULL,
         optimize = TRUE, no_train = FALSE, logging = NULL, maximize = TRUE,
         features = 0.5, hyperparams = NULL, n_tries = 50, n_iters = 50,
         early_stop = 5, elites = 0.1, feature_smoothing = 1,
         converge_cont = 0.1, converge_disc = 0.1)
Arguments

data
Type: data.table (mandatory). The data features.
label
Type: vector (numeric). The labels. For classes, use consecutive integer class numbers.
folds
Type: list of numerics. A list whose elements each contain the observation rows of one fold; it is passed to your modeling functions.
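A minimal sketch of building such a folds list, using base R only; the two-fold split mirrors the indices used in the Examples section below:

```r
# Each list element holds the observation rows of one fold.
# Manual two-fold split (same indices as the Examples section):
folds <- list(Fold1 = 1:1460, Fold2 = 1461:2919)

# Random 5-fold alternative built with base R only:
set.seed(0)
n <- 2919  # number of observations
folds_random <- split(sample(n), rep(1:5, length.out = n))
```

Any partitioning scheme works, as long as each element lists the rows belonging to one fold.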
seed
Type: numeric. The seed for random number generation. Defaults to 0.
models
Type: list of functions. A list of functions, each training one model. Defaults to NULL.
parallelized
Type: parallel socket cluster (makeCluster or similar). When specified, the data is split into a list before being fed to the modeling functions (one list per fold, containing first the training data and second the testing data), at the expense of drastically increased memory usage. Defaults to NULL.
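A hedged sketch of creating the socket cluster expected here, using the base `parallel` package (the description above names makeCluster); the worker count of 2 is an arbitrary choice:

```r
library(parallel)

# Two workers, e.g. one per fold; this object would be passed as `parallelized`.
cl <- makeCluster(2)
print(inherits(cl, "cluster"))  # TRUE

# Always release the workers when done:
stopCluster(cl)
```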
optimize
Type: boolean. Whether to perform optimization, or to take everything as is (no optimization of any parameter). Defaults to TRUE.
no_train
Type: boolean. Whether to skip model training; used together with optimize. Defaults to FALSE.
logging
Type: character. The log file output; logging is written through this variable. Defaults to NULL (no logging).
maximize
Type: boolean. Whether to maximize (TRUE) or minimize (FALSE) the metric returned by the modeling functions. Defaults to TRUE.
features
Type: numeric. The approximate percentage of features that should be selected. This is only an approximate target; it can be an issue when you underestimate the number of features you really need. Defaults to 0.5.
hyperparams
Type: list of list of vector of numerics. The hyperparameter intervals to optimize, one entry per modeling function. Each entry must contain 4 vectors holding, per hyperparameter, the mean (first), the standard deviation (second), the minimum (third) and the maximum (fourth) allowed value. This is also used to fetch the hyperparameters passed to the modeling functions when optimization is disabled. Defaults to NULL.
n_tries
Type: numeric. The number of tries allowed per iteration of optimization of each model. To get the total number of models trained, multiply it by n_iters and by the number of models. Defaults to 50.
n_iters
Type: numeric. The number of iterations allowed for the optimization of each model. To get the total number of models trained, multiply it by n_tries and by the number of models. Defaults to 50.
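The training budget implied by these two arguments can be worked out directly; a small sketch using the default values and the two-model setup of the Examples section:

```r
# Upper bound on trained models: n_tries models per iteration, n_iters
# iterations, summed over models (early_stop may end a model's
# optimization sooner, so this is a ceiling, not an exact count).
n_tries  <- 50
n_iters  <- 50
n_models <- 2   # e.g. the lgb + xgb pair from the Examples section
total_models <- n_tries * n_iters * n_models
total_models  # 5000
```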
early_stop
Type: numeric. The number of optimization iterations allowed without any improvement of the metric returned by the modeling functions before stopping early. Defaults to 5.
elites
Type: numeric. The percentage of best results kept in each iteration of optimization to use as a baseline. The higher the value, the slower the convergence (but the stabler the iteration updates). Must be between 0 and 1. Defaults to 0.1.
feature_smoothing
Type: numeric. The smoothing factor applied to feature selection so that strong features are not picked too fast. Must be between 0 and 1. Defaults to 1.
converge_cont
Type: numeric. The minimum allowed standard deviation of continuous variables. If all hyperparameters' standard deviations fall below converge_cont, the optimization of that model is considered converged and stops. Defaults to 0.1.
converge_disc
Type: numeric. The convergence threshold on the single-class probability of discrete variables. Once each feature's probability of being selected (either 0 or 1) is within converge_disc of certainty, feature selection is considered converged and stops. Defaults to 0.1.
Details

This is a mega function: it wraps feature selection, hyperparameter optimization, and model training in one call.
Value

The score of the models along with their hyperparameters.
Examples

## Not run:
# Not tabulated well to keep under 100 characters per line
mega_model <- LauraeML(data = data,
label = targets,
folds = list(1:1460, 1461:2919),
seed = 0,
models = list(lgb = LauraeML_lgbreg,
xgb = LauraeML_gblinear),
parallelized = FALSE,
optimize = TRUE,
no_train = FALSE,
logging = NULL,
maximize = FALSE, # FALSE on RMSE, fast example of doing the worst
features = 0.50,
hyperparams = list(lgb = list(Mean = c(5, 5, 1, 0.7, 0.7, 0.5, 0.5),
Sd = c(3, 3, 1, 0.2, 0.2, 0.5, 0.5),
Min = c(1, 1, 0, 0.1, 0.1, 0, 0),
Max = c(15, 50, 50, 1, 1, 50, 50)),
xgb = list(Mean = c(1, 1, 1),
Sd = c(1, 1, 1),
Min = c(0, 0, 0),
Max = c(2, 2, 2))),
n_tries = 10, # Set this big, preferably 10 * number of features
n_iters = 1, # Set this big to like 50
early_stop = 2,
elites = 0.4,
feature_smoothing = 1,
converge_cont = 0.5,
converge_disc = 0.25)
## End(Not run)