Laurae.lgb.dmat: LightGBM Dataset generation


View source: R/Laurae.lgb.dmat.R

Description

Generates an lgb.Dataset, or a list of lgb.Dataset. Unsupported for clusters. Requires the Matrix and lightgbm packages.

Usage

Laurae.lgb.dmat(data, label = NULL, missing = NA, save_names = NULL,
  save_keep = TRUE, clean_mem = FALSE, progress_bar = TRUE, ...)

Arguments

data

Type: matrix, dgCMatrix, data.frame, data.table, or filename, or a list of any of these. The data to convert to lgb.Dataset. When a list is provided, an lgb.Dataset is generated for each element. RAM usage required is 2x the RAM used by the input data, and 3x for data.frame and data.table inputs, because these are first converted to a matrix before the binary matrix is generated.
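As a rough planning aid, the multipliers above can be turned into a back-of-the-envelope estimate before calling Laurae.lgb.dmat. The helper name estimate_peak_ram is hypothetical and not part of LauraeDS; it only applies the stated 2x and 3x factors to object.size() of an in-memory input, so the result should be read as an approximation. It is not meaningful for filename inputs.

```r
# Hypothetical helper (not part of LauraeDS): applies the documented
# 2x (matrix, dgCMatrix) and 3x (data.frame, data.table) multipliers.
estimate_peak_ram <- function(x) {
  input_bytes <- as.numeric(object.size(x))
  # data.table objects also inherit from data.frame, so one check covers both
  multiplier <- if (is.data.frame(x)) 3 else 2
  input_bytes * multiplier
}

set.seed(0)
random_mat <- matrix(runif(10000, 0, 1), nrow = 1000)
random_df <- data.frame(random_mat)

# Estimated RAM requirement in MiB for each input type
estimate_peak_ram(random_mat) / 1024^2
estimate_peak_ram(random_df) / 1024^2
```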

label

Type: numeric, or a list of numeric. The labels for the corresponding rows of data. Use NULL to pass no labels.

missing

Type: numeric. The value used to represent missing values in data. Defaults to NA (and missing values for dgCMatrix).

save_names

Type: character or NULL, or a list of characters. If names are provided, the generated lgb.Dataset objects are also written to disk under those names. When a list is provided (along with a list of data and labels), it stores files sequentially by name if a list is provided for data but not for save_names. Defaults to NULL.

save_keep

Type: logical, or a list of logicals. When names are provided, save_keep selects which lgb.Dataset objects are returned to the user. Useful when generating a list of lgb.Dataset but keeping only some of them. When FALSE, NULL is returned instead of the lgb.Dataset. Defaults to TRUE.
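A minimal sketch of how save_names and save_keep might be combined, assuming the LauraeDS, Matrix and lightgbm packages are installed. The file paths are hypothetical placeholders under tempdir(), and the call uses only the arguments documented above. The guard skips the call when the packages are unavailable.

```r
set.seed(0)
train_mat <- matrix(runif(10000, 0, 1), nrow = 1000)
valid_mat <- matrix(runif(5000, 0, 1), nrow = 500)
train_labels <- runif(1000, 0, 1)
valid_labels <- runif(500, 0, 1)

# Hypothetical output paths, one per dataset
save_paths <- list(file.path(tempdir(), "train.bin"),
                   file.path(tempdir(), "valid.bin"))

if (requireNamespace("LauraeDS", quietly = TRUE) &&
    requireNamespace("lightgbm", quietly = TRUE) &&
    requireNamespace("Matrix", quietly = TRUE)) {
  library(Matrix)
  library(lightgbm)
  library(LauraeDS)

  # Store both datasets on disk, but ask for only the first one back
  lgb_kept <- Laurae.lgb.dmat(data = list(train_mat, valid_mat),
                              label = list(train_labels, valid_labels),
                              save_names = save_paths,
                              save_keep = list(TRUE, FALSE),
                              progress_bar = FALSE)

  # Inspect what was returned and which files were written
  str(lgb_kept, max.level = 1)
  print(file.exists(unlist(save_paths)))
}
```

Per the description of save_keep, the entry flagged FALSE is expected to come back as NULL while its file is still written to disk.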

clean_mem

Type: logical. Whether to force garbage collection at the end of each matrix construction in order to reclaim RAM. Defaults to FALSE.
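The pattern behind clean_mem can be illustrated in base R. The sketch below is not the package's code: convert_all is a hypothetical stand-in that uses as.matrix() in place of the lgb.Dataset construction, and it only shows where a forced gc() call would sit when looping over list inputs.

```r
# Hypothetical stand-in (not part of LauraeDS) showing the clean_mem pattern
convert_all <- function(inputs, clean_mem = FALSE) {
  outputs <- vector("list", length(inputs))
  for (i in seq_along(inputs)) {
    # as.matrix() stands in for the actual dataset construction
    outputs[[i]] <- as.matrix(inputs[[i]])
    if (clean_mem) {
      # Force a garbage collection after each construction
      invisible(gc(verbose = FALSE))
    }
  }
  outputs
}

set.seed(0)
random_mat <- matrix(runif(10000, 0, 1), nrow = 1000)
random_df <- data.frame(random_mat)
converted <- convert_all(list(random_mat, random_df), clean_mem = TRUE)
```

Forcing collection after every element costs some run time, so it is mainly worth enabling when the inputs are large relative to available RAM.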

progress_bar

Type: logical. Whether to print a progress bar in case of list inputs. Defaults to TRUE.

...

More arguments to pass to lightgbm::lgb.Dataset.

Value

The lgb.Dataset, or a list of lgb.Dataset when a list is provided for data.

Examples

library(Matrix)
library(lightgbm)

set.seed(0)

# Generate lgb.Dataset from matrix
random_mat <- matrix(runif(10000, 0, 1), nrow = 1000)
random_labels <- runif(1000, 0, 1)
lgb_from_mat <- Laurae.lgb.dmat(data = random_mat, label = random_labels, missing = NA)

# Generate lgb.Dataset from data.frame
random_df <- data.frame(random_mat)
random_labels_2 <- runif(1000, 0, 1)
lgb_from_df <- Laurae.lgb.dmat(data = random_df, label = random_labels_2, missing = NA)

# Generate lgb.Dataset from respective elements of a list with progress bar
# while keeping memory usage as low as theoretically possible
random_list <- list(random_mat, random_df)
random_labels_3 <- list(random_labels, random_labels_2)
lgb_from_list <- Laurae.lgb.dmat(data = random_list,
                                 label = random_labels_3,
                                 missing = NA,
                                 progress_bar = TRUE,
                                 clean_mem = TRUE)

# Generate lgb.Dataset from respective elements of a list and keep only first
# while keeping memory usage as low as theoretically possible
lgb_from_list <- Laurae.lgb.dmat(data = random_list,
                                 label = random_labels_3,
                                 missing = NA,
                                 save_keep = c(TRUE, FALSE),
                                 clean_mem = TRUE)

Laurae2/LauraeDS documentation built on May 29, 2019, 2:25 p.m.