A histogram for plotting the response over the variables in a table. Bucketing is not supported yet, so for now you will need to reduce the range of each variable to around 200 values to get effective plots with this method.
spark_hist_overlay(sparklyr_table, response_var, max_numeric_ticks = 40)
sparklyr_table | The sparklyr table to pass to the function.
response_var | The response variable, given as a string, to overlay the histograms with.
max_numeric_ticks | Defaults to 40. Values above 40 are fine, but you should increase the output width using knitr.
You must have sparklyr and ggplot2 installed.
You must also have the sparkeda jar installed and referenced in the same way as for spark_hist.
You can change the plot output size with the knitr chunk options, for example {r fig.height=8, fig.width=20}.
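As an alternative to setting the figure size on each chunk, the same options can be set once for every following chunk. This is a minimal sketch using knitr's `opts_chunk$set()`; the values are the ones quoted above, and placing the call in a setup chunk near the top of the document is an assumption about how your R Markdown file is organised.

```r
library(knitr)

# Apply the figure size to every chunk that follows this call,
# instead of repeating fig.height / fig.width in each chunk header.
opts_chunk$set(fig.height = 8, fig.width = 20)
```

Options given in an individual chunk header still take precedence over these defaults.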
Example selection of a Spark table and plot generation:
# sc is an existing sparklyr connection; tbl() and sql() come from dplyr
adult_df = tbl(sc, sql("select * from sample_data.adult_dataset"))
spark_hist_overlay(adult_df, "income")
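Because bucketing is not supported yet, one way to reduce a numeric variable's range before plotting is to round its values down to a fixed width. The sketch below only illustrates the arithmetic on a local vector. `bucket_values` is a hypothetical helper that is not part of this package, and the `age` column and the width of 5 in the commented Spark usage are assumptions about your data.

```r
# Round each value down to the nearest multiple of `width`.
# `bucket_values` is a hypothetical helper, not part of the package.
bucket_values <- function(x, width) floor(x / width) * width

# Local demonstration of the arithmetic on made-up values.
ages <- c(17, 23, 39, 90)
bucket_values(ages, 5)

# On a Spark table the same arithmetic would be written inline inside
# dplyr::mutate(), because a local R function is not translated to
# Spark SQL (`age` is an assumed numeric column):
# adult_binned <- dplyr::mutate(adult_df, age = floor(age / 5) * 5)
# spark_hist_overlay(adult_binned, "income")
```

A wider bucket gives fewer distinct values per variable, at the cost of a coarser histogram.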