LauraeML_gblinear_par: Laurae's Machine Learning (xgboost gblinear helper parallel...

Description Usage Arguments Value Examples

Description

This function demonstrates how to use xgboost gblinear in LauraeML with premade folds, with training parallelized over folds (assuming the parallel cluster mcl exists in the global environment). It tunes alpha, lambda, and lambda_bias as hyperparameters. It also accepts feature selection, and performs full logging (every part of the source is commented), writing to an external file so the hyperparameters and feature count can be tracked.
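As a sketch of the expected inputs (the ordering of alpha, lambda, and lambda_bias inside x is an assumption inferred from the description, not confirmed by this page):

# Hypothetical input layout, assuming x holds
# (alpha, lambda, lambda_bias) in that order:
x <- c(0.1, 1.0, 0.0)   # alpha, lambda, lambda_bias
y <- rep(1, 20)         # binary mask: use all 20 features
y[c(3, 7)] <- 0         # drop features 3 and 7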

Usage

LauraeML_gblinear_par(x, y, mobile, parallelized, maximize, logging, data,
  label, folds)

Arguments

x

Type: vector (numeric). The hyperparameters to use.

y

Type: vector (numeric). The features to use, as binary format (0 for not using, 1 for using).

mobile

Type: environment. The environment passed from LauraeML.

parallelized

Type: parallel socket cluster (makeCluster or similar). The parallelized parameter passed from LauraeML (whether or not to parallelize training over folds).

maximize

Type: boolean. The maximize parameter passed from LauraeML (whether or not to maximize the metric).

logging

Type: character. The logging parameter passed from LauraeML (where to store the log file).

data

Type: data.table (mandatory). The data features. Comes from LauraeML.

label

Type: vector (numeric). The labels. Comes from LauraeML.

folds

Type: list of numeric vectors. The folds, as a list. Comes from LauraeML.

Value

The score of the cross-validated xgboost gblinear model, for the provided hyperparameters and features to use.

Examples

## Not run: 
# To run before using LauraeML
library(doParallel)
library(foreach)
mcl <- makeCluster(4)
invisible(clusterEvalQ(mcl, library("xgboost")))
invisible(clusterEvalQ(mcl, library("data.table")))
invisible(clusterEvalQ(mcl, library("Laurae")))

# In case you are doing manual training, try this.
# We suppose our data is in the variable "data" and labels in "label".

folds <- Laurae::kfold(label, k = 5)
temp_data <- list()
temp_label <- list()

for (i in seq_along(folds)) {

  temp_data[[i]] <- list()
  temp_data[[i]][[1]] <- Laurae::DTsubsample(data,
                                             kept = folds[[i]],
                                             remove = TRUE,
                                             low_mem = FALSE,
                                             collect = 0,
                                             silent = TRUE)
  temp_data[[i]][[2]] <- Laurae::DTsubsample(data,
                                             kept = folds[[i]],
                                             remove = FALSE,
                                             low_mem = FALSE,
                                             collect = 0,
                                             silent = TRUE)
  temp_label[[i]] <- list()
  temp_label[[i]][[1]] <- label[-folds[[i]]]
  temp_label[[i]][[2]] <- label[folds[[i]]]

}

clusterExport(mcl, c("temp_data", "temp_label"), envir = environment())
registerDoParallel(cl = mcl)

# This will not run correctly because the function is not meant to be called directly like this
LauraeML_gblinear_par(x = c(1, 1, 1),
                      y = rep(1, ncol(data)),
                      mobile = NA,
                      parallelized = mcl,
                      maximize = TRUE,
                      logging = NULL,
                      data = temp_data,
                      label = temp_label,
                      folds = folds)

# Stops the cluster
registerDoSEQ()
stopCluster(mcl)
#closeAllConnections() # Emergency fallback if your cluster does not respond

## End(Not run)

Laurae2/Laurae documentation built on May 8, 2019, 7:59 p.m.