spark_pipeline_stage: Create a Pipeline Stage Object
In rstudio/sparklyr: R Interface to Apache Spark

spark_pipeline_stage

R Documentation

Create a Pipeline Stage Object

Description

Helper function to create pipeline stage objects with common parameter setters.

Usage

spark_pipeline_stage(
  sc,
  class,
  uid,
  features_col = NULL,
  label_col = NULL,
  prediction_col = NULL,
  probability_col = NULL,
  raw_prediction_col = NULL,
  k = NULL,
  max_iter = NULL,
  seed = NULL,
  input_col = NULL,
  input_cols = NULL,
  output_col = NULL,
  output_cols = NULL
)

Arguments

`sc`	A 'spark_connection' object.
`class`	Class name for the pipeline stage.
`uid`	A character string used to uniquely identify the ML estimator.
`features_col`	Features column name, as a length-one character vector. The column should be single vector column of numeric values. Usually this column is output by `ft_r_formula`.
`label_col`	Label column name. The column should be a numeric column. Usually this column is output by `ft_r_formula`.
`prediction_col`	Prediction column name.
`probability_col`	Column name for predicted class conditional probabilities.
`raw_prediction_col`	Raw prediction (a.k.a. confidence) column name.
`k`	The number of clusters to create
`max_iter`	The maximum number of iterations to use.
`seed`	A random seed. Set this value if you need your results to be reproducible across repeated calls.
`input_col`	The name of the input column.
`input_cols`	Names of output columns.
`output_col`	The name of the output column.