xg_load_data
Loads a data file and returns a list containing all the elements needed
to prepare the data for xgboost modelling. The model relies principally
on two categories of input: numeric (num) and categorical (cat).
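As a rough illustration of the num/cat distinction, the sketch below classifies the columns of a small data frame from their R class. It uses base R only. The helper guess_class and the toy data are hypothetical and are not part of the package, whose "auto" detection may work differently.

```r
# Hypothetical helper (not from the package): map a column to "num" or "cat".
guess_class <- function(x) {
  if (is.numeric(x)) "num" else "cat"
}

# Toy data invented for illustration.
toy <- data.frame(
  Age  = c(22, 38, NA),
  Fare = c(7.25, 71.28, 8.05),
  Sex  = c("male", "female", "female"),
  stringsAsFactors = FALSE
)

# One class label per column, named after the column.
classes <- vapply(toy, guess_class, character(1))
print(classes)
```

Under this assumption, anything that is not numeric (character, factor, logical) is treated as categorical.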
xg_load_data(file, inputs = "auto", output, inputs.class = "auto",
             output.class = "auto", train.size = 1, seed = 1,
             na.handle = "inf", max.levels = 50)
file
Character. The link to the file containing the data.
The data are imported with the
inputs
Character vector. The names of the columns used as inputs of the model. Only those columns are used for the model. Setting this to "auto" uses every column of the table as an input except the one given as output.
output
Character. A single string specifying the name of the output column for the model training.
inputs.class
Character vector. A vector specifying the classes of the input columns. If set to "auto", the classes are determined from the output of the fread function. Otherwise, it must be a vector whose length is exactly the number of inputs and whose values can only be num (for numerical inputs) or cat (for categorical inputs).
output.class
Character. Class of the output. If set to "auto", the class is determined from the output of the fread function. Otherwise, it must be equal to num or cat, for a numerical or categorical output respectively.
train.size
Numeric. Fraction of the data assigned to the training set for the future model. Ranges from 0 (no training set, which produces an error) to 1 (no test set).
seed
Numeric. Seed for reproducibility of the results.
na.handle
Character. How NA values in numeric inputs are handled. Five possibilities have been implemented:
max.levels
Numeric. Maximum number of levels allowed for a categorical input. This parameter ensures that the model does not end up with too many input columns once the data are transformed into a one-hot encoded matrix.
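To make the interplay of train.size and seed concrete, here is a minimal base R sketch of a seeded train/test split. The helper split_rows is hypothetical and only illustrates the idea; the package's actual splitting logic is not shown on this page and may differ.

```r
# Hypothetical sketch of a reproducible train/test split (base R only).
# Returns a logical vector: TRUE for rows assigned to the training set.
split_rows <- function(n, train.size = 1, seed = 1) {
  # Mirror the documented behaviour: 0 is an error, 1 means no test set.
  stopifnot(train.size > 0, train.size <= 1)
  set.seed(seed)                        # same seed, same split
  n_train <- round(n * train.size)      # number of training rows
  train_idx <- sample.int(n, n_train)   # draw rows without replacement
  seq_len(n) %in% train_idx
}

flag <- split_rows(10, train.size = 0.8, seed = 1)
sum(flag)   # number of rows flagged as training data
```

Calling split_rows again with the same n, train.size and seed returns the same flags, which is the property the seed argument is meant to provide.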
A list with the following elements:
train: training set for the model, with a matrix for the input values and a vector for the target variables.
test: test set for the model, in the same format as the training set.
formula: the formula used for constructing the model matrix and that is applied when running the model.
template: an empty data.table that has saved all the input values and that is used to appropriately format data when using the prediction function.
data: A data.table with the cleaned data and an additional logical column, train, that indicates which data are used in the training data set.
na.handle: the na.handle value, passed along so that the same NA handling can be reapplied at prediction time.
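The roles of formula and template can be sketched with base R's model.matrix. This is an illustrative assumption about how a formula and an empty template might be used for one-hot encoding, not the package's implementation; the toy data and variable names are invented, and a data.frame stands in for the data.table.

```r
# Toy training data invented for illustration.
toy <- data.frame(
  Age = c(22, 38, 26),
  Sex = factor(c("male", "female", "female"))
)

# Formula without an intercept, so the factor gets one column per level.
f <- ~ Age + Sex - 1
m <- model.matrix(f, data = toy)
colnames(m)

# An empty "template" keeps the column types and factor levels.
template <- toy[0, ]

# New data containing a single level is formatted with the template's
# levels, so its model matrix has the same columns as the training one.
new <- data.frame(
  Age = 30,
  Sex = factor("male", levels = levels(template$Sex))
)
m_new <- model.matrix(f, data = new)
colnames(m_new)
```

Keeping the levels in a template avoids a mismatch between the training matrix and a prediction matrix built from data where some categories are absent.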
d <- xg_load_data(system.file("extdata", "titanic.csv", package = "ezXg"),
                  inputs = c("Pclass", "Sex", "Age", "SibSp",
                             "Parch", "Fare", "Embarked"),
                  output = "Survived")