Laurae.xgb.dmat: xgboost DMatrix generation


View source: R/Laurae.xgb.dmat.R

Description

Generates a (list of) xgb.DMatrix. Supported for clusters. Requires the Matrix and xgboost packages.

Usage

Laurae.xgb.dmat(data, label, missing = NA, save_names = NULL,
  save_keep = TRUE, clean_mem = FALSE, progress_bar = TRUE, ...)

Arguments

data

Type: matrix, dgCMatrix, data.frame, data.table, or filename, or a list of any of them. The data to convert to xgb.DMatrix. When a list is provided, an xgb.DMatrix is generated for each element. Required RAM is about 2x the RAM used by the input data, and 3x for data.frame and data.table because they are converted to a matrix internally before the binary matrix is generated.

label

Type: numeric, or a list of numeric. The label of associated rows in data. Use NULL for passing no labels.

missing

Type: numeric. The value used to represent missing values in data. Defaults to NA (and missing values for dgCMatrix).

save_names

Type: character or NULL, or a list of characters. If names are provided, the generated xgb.DMatrix are written to disk. When a list is provided for data (along with a list of labels) but not for save_names, the files are stored sequentially by name. Defaults to NULL.

save_keep

Type: logical, or a list of logicals. When names are provided, save_keep selects which xgb.DMatrix are returned to the user. Useful when generating a list of xgb.DMatrix but keeping only some of them. When FALSE, NULL is returned instead of the xgb.DMatrix. Defaults to TRUE.

clean_mem

Type: logical. Whether to force garbage collection at the end of each matrix construction in order to reclaim RAM. Defaults to FALSE.

progress_bar

Type: logical. Whether to print a progress bar in case of list inputs. Defaults to TRUE.

...

More arguments to pass to xgboost::xgb.DMatrix.
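The memory multipliers quoted for the data argument can be turned into a rough back-of-the-envelope estimate. The sketch below assumes the 2x / 3x factors apply to the in-memory size of the input as reported by object.size. The helper estimate_peak_ram is hypothetical and is not part of LauraeDS; it handles single in-memory objects only, not filenames or lists.

```r
# Hypothetical helper (not part of LauraeDS): applies the 2x / 3x multipliers
# quoted in the `data` argument description to the in-memory size of the input.
# Only meaningful for in-memory objects, not for filenames or lists.
estimate_peak_ram <- function(data) {
  input_bytes <- as.numeric(object.size(data))
  # data.frame (and data.table, which inherits from data.frame) go through an
  # intermediate matrix conversion, hence the larger factor.
  multiplier <- if (is.data.frame(data)) 3 else 2
  input_bytes * multiplier
}

random_mat <- matrix(runif(10000, 0, 1), nrow = 1000)
random_df <- data.frame(random_mat)

estimate_peak_ram(random_mat)  # 2x the size of random_mat, in bytes
estimate_peak_ram(random_df)   # 3x the size of random_df, in bytes
```

If the estimate is close to the available RAM, setting clean_mem = TRUE or processing the inputs one at a time may be worth considering.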

Value

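The xgb.DMatrix, or a list of xgb.DMatrix when a list is provided for data. Elements for which save_keep is FALSE are returned as NULL.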

Examples

library(Matrix)
library(xgboost)
library(LauraeDS)  # provides Laurae.xgb.dmat

set.seed(0)

# Generate xgb.DMatrix from matrix
random_mat <- matrix(runif(10000, 0, 1), nrow = 1000)
random_labels <- runif(1000, 0, 1)
xgb_from_mat <- Laurae.xgb.dmat(data = random_mat, label = random_labels, missing = NA)

# Generate xgb.DMatrix from data.frame, using its own set of labels
random_df <- data.frame(random_mat)
random_labels_2 <- runif(1000, 0, 1)
xgb_from_df <- Laurae.xgb.dmat(data = random_df, label = random_labels_2, missing = NA)

# Generate xgb.DMatrix from respective elements of a list with progress bar
# while keeping memory usage as low as theoretically possible
random_list <- list(random_mat, random_df)
random_labels_3 <- list(random_labels, random_labels_2)
xgb_from_list <- Laurae.xgb.dmat(data = random_list,
                                 label = random_labels_3,
                                 missing = NA,
                                 progress_bar = TRUE,
                                 clean_mem = TRUE)

# Generate xgb.DMatrix from respective elements of a list and keep only first
# while keeping memory usage as low as theoretically possible
xgb_from_list <- Laurae.xgb.dmat(data = random_list,
                                 label = random_labels_3,
                                 missing = NA,
                                 save_keep = c(TRUE, FALSE),
                                 clean_mem = TRUE)
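
The examples above do not cover sparse input or save_names. The following is a minimal sketch of both, assuming LauraeDS is installed and that Laurae.xgb.dmat behaves as described in the Arguments section. The file name is hypothetical and chosen only for illustration; rsparsematrix comes from the Matrix package.

```r
library(Matrix)
library(xgboost)
library(LauraeDS)

set.seed(0)

# Build a sparse dgCMatrix: 1000 rows, 10 columns, roughly 10% non-zero entries
sparse_mat <- rsparsematrix(nrow = 1000, ncol = 10, density = 0.1)
sparse_labels <- runif(1000, 0, 1)

# Hypothetical output path, placed in the session's temporary directory
dmat_file <- file.path(tempdir(), "sparse_train.buffer")

# Generate the xgb.DMatrix and, because a name is given, also store it on disk
xgb_from_sparse <- Laurae.xgb.dmat(data = sparse_mat,
                                   label = sparse_labels,
                                   save_names = dmat_file)

# Check that the file was written
file.exists(dmat_file)
```

With the default save_keep = TRUE, the xgb.DMatrix is both written to disk and returned; setting save_keep = FALSE should return NULL while still writing the file, which can help when only the stored copy is needed.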

Laurae2/LauraeDS documentation built on May 29, 2019, 2:25 p.m.