View source: R/ml-prepare-dataset.R
ml_prepare_dataset | R Documentation |
Creates the 'label' and 'features' columns
ml_prepare_dataset(
x,
formula = NULL,
label = NULL,
features = NULL,
label_col = "label",
features_col = "features",
keep_original = TRUE,
...
)
x |
A |
formula |
Used when |
label |
The name of the label column. |
features |
The name(s) of the feature columns as a character vector. |
label_col |
Label column name, as a length-one character vector. |
features_col |
Features column name, as a length-one character vector. |
keep_original |
Boolean flag that indicates whether the output will contain the original columns from |
... |
Added for backwards compatibility. Not in use today. |
At this time, 'Spark ML Connect' does not include a Vector Assembler transformer. The main thing this function does is create a 'Pyspark' array column. Pipelines require a 'label' column and a 'features' column. Even though it is a single column in the dataset, the 'features' column contains all of the predictors inside an array. This function also creates a new 'label' column that copies the outcome variable. This makes it a lot easier to remove the 'label' and 'outcome' columns.
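The behavior described above can be illustrated without a Spark connection. The sketch below is a local analogue written in base R for a plain data.frame. The helper name prepare_local is hypothetical and is not part of the package. It only mirrors the documented behavior (a 'label' column copied from the outcome, and a single 'features' column holding all predictors per row) and makes no claim about how ml_prepare_dataset is implemented.

```r
# Hypothetical local analogue of ml_prepare_dataset() for a plain data.frame.
# This is NOT the package implementation; it only mirrors the documented behavior.
prepare_local <- function(x, formula = NULL, label = NULL, features = NULL,
                          label_col = "label", features_col = "features",
                          keep_original = TRUE) {
  if (!is.null(formula)) {
    # Assumption: the outcome is the left-hand side of the formula and the
    # predictors are the right-hand side terms
    label <- all.vars(formula[[2]])
    features <- attr(terms(formula, data = x), "term.labels")
  }
  # Start from the original columns, or from an empty frame with the same rows
  out <- if (keep_original) x else x[, 0, drop = FALSE]
  # Copy the outcome variable into the label column
  out[[label_col]] <- x[[label]]
  # Pack the predictors of each row into one list element (an "array" column)
  out[[features_col]] <- lapply(
    seq_len(nrow(x)),
    function(i) as.numeric(unlist(x[i, features, drop = FALSE]))
  )
  out
}

# Only the 'label' and 'features' columns are returned
prepared <- prepare_local(mtcars, mpg ~ wt + cyl, keep_original = FALSE)
names(prepared)
prepared$features[[1]]

# Original columns are kept, with 'label' and 'features' appended
full <- prepare_local(mtcars, label = "mpg", features = c("wt", "cyl"))
ncol(full)
```

On a real connection, x would be a tbl_pyspark and the call would go to ml_prepare_dataset itself with the arguments listed in the usage above. The list column here stands in for the 'Pyspark' array column.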
A tbl_pyspark with either the original columns from x plus the 'label' and 'features' columns, or the 'label' and 'features' columns only.