This function allows you to prepare the cross-validation of a LightGBM model.
It is recommended to have your x_train and x_val sets as data.table (or data.frame), and to use the development version of data.table. To install the data.table development version, run in your R console: install.packages("data.table", type = "source", repos = "http://Rdatatable.github.io/data.table").
SVMLight conversion requires Laurae's sparsity package, which can be installed using devtools::install_github("Laurae2/sparsity"). The SVMLight format extension used is .svm.
Does not handle weights or groups.
lgbm.cv.prep(y_train, x_train, x_test = NA, SVMLight = is(x_train,
  "dgCMatrix"), data_has_label = FALSE, NA_value = "nan",
  workingdir = getwd(), train_all = FALSE, test_all = FALSE,
  cv_all = TRUE, train_name = paste0("lgbm_train", ifelse(SVMLight, ".svm",
  ".csv")), val_name = paste0("lgbm_val", ifelse(SVMLight, ".svm", ".csv")),
  test_name = paste0("lgbm_test", ifelse(SVMLight, ".svm", ".csv")),
  verbose = TRUE, folds = 5, folds_weight = NA, stratified = TRUE,
  fold_seed = 0, fold_cleaning = 50)
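As a small illustration of how the default file names in the signature above resolve, the following sketch evaluates the same paste0/ifelse expressions outside the function. The helper name default_names is hypothetical and is not part of the package.

```r
# Hypothetical helper (not part of the package): mirrors the default
# expressions for train_name, val_name and test_name in the usage above.
default_names <- function(SVMLight) {
  ext <- ifelse(SVMLight, ".svm", ".csv")
  c(train = paste0("lgbm_train", ext),
    val   = paste0("lgbm_val", ext),
    test  = paste0("lgbm_test", ext))
}

default_names(SVMLight = FALSE)  # names used for CSV export
default_names(SVMLight = TRUE)   # names used for SVMLight export
```

Passing explicit train_name, val_name or test_name values overrides these defaults.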
y_train | Type: vector. The training labels.
x_train | Type: data.table or dgCMatrix (with
x_test | Type: data.table or dgCMatrix (with
SVMLight | Type: boolean. Whether the input is a dgCMatrix to be output to SVMLight format. Setting this to
data_has_label | Type: boolean. Whether the data has labels or not. Do not modify this. Defaults to FALSE.
NA_value | Type: numeric or character. What value replaces NAs. Use
workingdir | Type: character. The working directory used for LightGBM. Defaults to getwd().
train_all | Type: boolean. Whether the full train data should be exported to the requested format for usage with
test_all | Type: boolean. Whether the full test data should be exported to the requested format for usage with
cv_all | Type: boolean. Whether the full cross-validation data should be exported to the requested format for usage with
train_name | Type: character. The name of the default training data file for the model. Defaults to paste0("lgbm_train", ifelse(SVMLight, ".svm", ".csv")).
val_name | Type: character. The name of the default validation data file for the model. Defaults to paste0("lgbm_val", ifelse(SVMLight, ".svm", ".csv")).
test_name | Type: character. The name of the testing data file for the model. Defaults to paste0("lgbm_test", ifelse(SVMLight, ".svm", ".csv")).
verbose | Type: boolean. Whether
folds | Type: integer, vector of two integers, vector of integers, or list. If an integer is supplied, performs a
folds_weight | Type: vector of numerics. The weights assigned to each fold. If no weight is supplied (
stratified | Type: boolean. Whether the folds should be stratified (keep the same label proportions) or not. Defaults to TRUE.
fold_seed | Type: integer or vector of integers. The seed for the random number generator. If a vector of integers is provided, its length should be at least longer than
fold_cleaning | Type: integer. When using cross-validation, data must be subsampled. This parameter controls the trade-off between RAM usage and speed. The lower this value, the more aggressively the method keeps memory usage as low as possible. Defaults to 50.
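To make the stratified and fold_seed arguments more concrete, here is a generic base R sketch of stratified fold assignment. It is not the package's implementation, which is not shown in this page; the function name stratified_folds and the toy labels are made up for illustration, and returning validation indices per fold is an assumption.

```r
# Illustrative only: assign each observation to one of k folds so that
# every fold keeps roughly the same label proportions.
stratified_folds <- function(y, k = 5, seed = 0) {
  set.seed(seed)
  fold_id <- integer(length(y))
  for (label in unique(y)) {
    idx <- which(y == label)
    # Recycle fold numbers 1..k over this label's rows, then shuffle them
    fold_id[idx] <- sample(rep_len(seq_len(k), length(idx)))
  }
  # One element per fold, each holding the row indices held out in that fold
  split(seq_along(y), factor(fold_id, levels = seq_len(k)))
}

# Toy labels: 80 zeros and 20 ones
y <- rep(c(0, 1), times = c(80, 20))
folds <- stratified_folds(y, k = 5, seed = 0)
sapply(folds, function(i) mean(y[i]))  # share of label 1 in each fold
```

Fixing the seed makes the assignment reproducible, which is the role fold_seed plays in the argument list above.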
A list containing the folds and folds_weight elements if cv_all = TRUE. All files are output and ready to use for lgbm.cv with files_exist = TRUE. If using train_all, the output is ready to be used with lgbm.train and files_exist = TRUE. Returns "Success" if cv_all = FALSE and the code does not error mid-way.
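The call below is a minimal, untested sketch of how the function might be invoked, using only argument names taken from the usage section. It assumes that lgbm.cv.prep is provided by a package installed under the name Laurae (an assumption; this page does not name the package), and it writes to a temporary directory. The toy data is invented for illustration.

```r
# Toy data, invented for illustration only
set.seed(0)
x_train <- data.frame(a = rnorm(100), b = runif(100))
y_train <- rep(c(0, 1), times = c(80, 20))

# Assumption: lgbm.cv.prep is provided by a package named "Laurae".
# The call is skipped when that package is not installed.
if (requireNamespace("Laurae", quietly = TRUE)) {
  prepared <- Laurae::lgbm.cv.prep(
    y_train = y_train,
    x_train = x_train,
    workingdir = tempdir(),  # export the files to a temporary directory
    folds = 5,
    stratified = TRUE,
    fold_seed = 0
  )
  # With cv_all = TRUE (the default), the value described above is a list
  # holding the folds and folds_weight elements.
  str(prepared)
}
```

The exported files can then be passed to lgbm.cv with files_exist = TRUE, as described above.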