spark_plot_log_manLasso: A SparklyR Logistic Regression Feature Selection Method


Description

Fits a series of logistic regression models, dropping one feature at a time, and compares the AUC (area under the ROC curve) of each model.

Usage

spark_plot_log_manLasso(sparklyr_table, predictor, num_folds = 3,
  parallelism = 1)

Arguments

sparklyr_table

The Spark table to pass to the function. It can be a dplyr Spark table (tbl).

predictor

The name of the target column to predict.

num_folds

(default = 3) The number of cross-validation folds to use for each logistic regression model.

parallelism

(default = 1) The number of models to fit simultaneously.

Details

Important package requirement:
ggplot2 must be installed.

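Example: select a Spark table and produce the graph (assumes an existing sparklyr connection named sc)
library(dplyr)  # provides tbl() and sql()
spark_table <- tbl(sc, sql("select * from sample_data.iris limit 100"))
outputs <- spark_plot_log_manLasso(spark_table, predictor = 'Species')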
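
The drop-one-feature comparison described above can be illustrated locally without Spark. The following is a minimal sketch in base R, not the package's implementation: the synthetic data, the helper names auc and cv_auc, and the use of glm in place of Spark's logistic regression are all assumptions made for illustration.

```r
# Illustrative sketch only: base R on a local data frame, not the package's Spark code.
set.seed(42)
n <- 300
df <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
# Hypothetical binary target: driven mostly by x1, weakly by x2, not at all by x3
df$y <- rbinom(n, 1, plogis(2 * df$x1 + 0.5 * df$x2))

# Rank-based (Mann-Whitney) AUC for 0/1 labels and numeric scores
auc <- function(labels, scores) {
  r <- rank(scores)
  n_pos <- sum(labels == 1)
  n_neg <- sum(labels == 0)
  (sum(r[labels == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Assign each row to one of num_folds folds once, so every model sees the same splits
num_folds <- 3
folds <- sample(rep(seq_len(num_folds), length.out = nrow(df)))

# Mean held-out AUC across folds for a given feature set
cv_auc <- function(data, target, features, folds) {
  form <- reformulate(features, response = target)
  mean(sapply(sort(unique(folds)), function(k) {
    fit <- glm(form, data = data[folds != k, ], family = binomial())
    held_out <- data[folds == k, ]
    auc(held_out[[target]], predict(fit, newdata = held_out, type = "response"))
  }))
}

features <- c("x1", "x2", "x3")
baseline <- cv_auc(df, "y", features, folds)

# Drop each feature in turn and record the AUC of the reduced model
dropped <- sapply(features, function(f) cv_auc(df, "y", setdiff(features, f), folds))

# Negative values mean the model got worse without that feature
auc_change <- dropped - baseline
print(round(auc_change, 3))
```

In this sketch, a feature whose removal barely changes the AUC is a candidate for exclusion, while a large drop marks a feature the model depends on. How spark_plot_log_manLasso itself scores and plots the comparison is not specified here.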


GabeChurch/sparkedatools documentation built on June 25, 2019, 12:23 p.m.