spark_plot_overlay_pct: A variable response percentage plot for variables in a...

Description

Bucketing is not supported yet, so for now you will need to reduce each variable to a range of around 200 values to get effective plots from this method.
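One way to get a wide numeric variable down to that size is to bucket it yourself before plotting. The sketch below is an illustration only: `bucket_numeric` and its `max_levels` argument are hypothetical names, not part of sparkedatools, and it runs on a local vector rather than a Spark table. On a sparklyr table the same floor-based arithmetic could probably be written inside a dplyr `mutate()`, though that is an assumption and not something this page documents.

```r
# Hypothetical helper (not part of sparkedatools): collapse a numeric vector
# into at most `max_levels` equal-width buckets, each labelled by its lower bound.
bucket_numeric <- function(x, max_levels = 200) {
  n_distinct <- length(unique(x[!is.na(x)]))
  if (n_distinct <= max_levels) {
    return(x)  # already within the limit, leave unchanged
  }
  rng <- range(x, na.rm = TRUE)
  width <- (rng[2] - rng[1]) / max_levels
  # Zero-based bucket index for each value; the maximum is folded into the last bucket.
  idx <- pmin(floor((x - rng[1]) / width), max_levels - 1)
  rng[1] + idx * width
}

# Made-up data for illustration: 10,000 values spread over a wide range.
set.seed(1)
values <- runif(10000, min = 0, max = 5000)
bucketed <- bucket_numeric(values)
length(unique(bucketed))  # no more than 200 distinct values remain
```

Equal-width buckets are the simplest choice; for heavily skewed variables, quantile-based breaks may give more readable plots.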

Usage

spark_plot_overlay_pct(sparklyr_table, response_var,
  max_numeric_ticks = 40)

Arguments

sparklyr_table

The sparklyr table to pass to the function.

response_var

The name of the response variable, given as a string, to overlay on the histograms.

max_numeric_ticks

Defaults to 40. Values above 40 are fine, but you should increase the output width using knitr.

Details

You must have sparklyr, ggplot2, and purrr installed.
You must also have the sparkeda jar installed and referenced in the same way as for spark_hist.
You can change the plot output size with knitr chunk options, for example {r fig.height=8, fig.width=20}.
Example: select a Spark table and generate the plot (assumes an open sparklyr connection sc, with dplyr loaded for tbl() and sql()):
adult_df <- tbl(sc, sql("select * from sample_data.adult_dataset"))
spark_plot_overlay_pct(adult_df, "income")


GabeChurch/sparkedatools documentation built on June 25, 2019, 12:23 p.m.