LauraeML_gblinear_par: Laurae's Machine Learning (xgboost gblinear helper parallel...

Description Usage Arguments Value Examples

Description

This function demonstrates how to use xgboost gblinear in LauraeML with premade folds, with training parallelized over folds (assuming the parallel cluster mcl exists in the global environment). It tunes alpha, lambda, and lambda_bias as hyperparameters. It also accepts feature selection, and performs full logging (every part of the source is commented), writing to an external file so the hyperparameters and feature count can be tracked.
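As a sketch of the expected inputs (the ordering of alpha, lambda, and lambda_bias inside x is an assumption inferred from the description, not confirmed by this page):

# Hypothetical input layout, assuming x holds
# (alpha, lambda, lambda_bias) in that order:
x <- c(0.1, 1.0, 0.0)   # alpha, lambda, lambda_bias
y <- rep(1, 20)         # binary mask: use all 20 features
y[c(3, 7)] <- 0         # drop features 3 and 7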

Usage

LauraeML_gblinear_par(x, y, mobile, parallelized, maximize, logging, data,
  label, folds)

Arguments

x

Type: vector (numeric). The hyperparameters to use.

y

Type: vector (numeric). The features to use, as binary format (0 for not using, 1 for using).

mobile

Type: environment. The environment passed from LauraeML.

parallelized

Type: parallel socket cluster (makeCluster or similar). The parallelized parameter passed from LauraeML (whether or not to parallelize training over folds).

maximize

Type: boolean. The maximize parameter passed from LauraeML (whether or not to maximize the metric).

logging

Type: character. The logging parameter passed from LauraeML (where to store the log file).

data

Type: data.table (mandatory). The data features. Comes from LauraeML.

label

Type: vector (numeric). The labels. Comes from LauraeML.

folds

Type: list of numeric vectors. The folds, as a list. Comes from LauraeML.

Value

The score of the cross-validated xgboost gblinear model, for the provided hyperparameters and features to use.

Examples

## Not run: 
# To run before using LauraeML
library(doParallel)
library(foreach)
mcl <- makeCluster(4)
invisible(clusterEvalQ(mcl, library("xgboost")))
invisible(clusterEvalQ(mcl, library("data.table")))
invisible(clusterEvalQ(mcl, library("Laurae")))

# In case you are doing manual training, try this.
# We suppose our data is in the variable "data" and labels in "label".

folds <- Laurae::kfold(label, k = 5)
temp_data <- list()
temp_label <- list()

for (i in seq_along(folds)) {

  temp_data[[i]] <- list()
  temp_data[[i]][[1]] <- Laurae::DTsubsample(data,
                                             kept = folds[[i]],
                                             remove = TRUE,
                                             low_mem = FALSE,
                                             collect = 0,
                                             silent = TRUE)
  temp_data[[i]][[2]] <- Laurae::DTsubsample(data,
                                             kept = folds[[i]],
                                             remove = FALSE,
                                             low_mem = FALSE,
                                             collect = 0,
                                             silent = TRUE)
  temp_label[[i]] <- list()
  temp_label[[i]][[1]] <- label[-folds[[i]]]
  temp_label[[i]][[2]] <- label[folds[[i]]]

}

clusterExport(mcl, c("temp_data", "temp_label"), envir = environment())
registerDoParallel(cl = mcl)

# This will not run correctly because the function is not meant to be called directly like this
LauraeML_gblinear_par(x = c(1, 1, 1),
                      y = rep(1, ncol(data)),
                      mobile = NA,
                      parallelized = mcl,
                      maximize = TRUE,
                      logging = NULL,
                      data = temp_data,
                      label = temp_label,
                      folds = folds)

# Stops the cluster
registerDoSEQ()
stopCluster(mcl)
#closeAllConnections() # Emergency fallback if your cluster does not respond

## End(Not run)

Laurae2/Laurae documentation built on May 8, 2019, 7:59 p.m.