spark_plot_pairs: A pairs plot generation tool for sparklyr tables


Description

This is a lightweight wrapper around GGally for sparklyr tables. By default it randomly samples 80,000 records from the table (see sample_size).

Usage

spark_plot_pairs(sparklyr_table, label_col, sample_size = 80000,
  my_title = "Pairs Plot", progress = FALSE)

Arguments

sparklyr_table

is the sparklyr table to pass to the function

label_col

is the column you want to label in the plot.

sample_size

is the number of records to sample. The default is 80000. You can increase it, but larger samples may lead to poor performance.

my_title

is the title of your pairs plot. The default is "Pairs Plot".

progress

is FALSE by default. If you set it to TRUE, status bars are shown so you can follow the speed of the plot generation.

Details

You must have GGally, sparklyr, dplyr, and ggplot2 installed.
It is suggested to store the output and to adjust the chunk settings (as follows) on the stored output, so that the plot does not have to be re-run. You can change the plot output size with knitr chunk settings such as r fig.height=12, fig.width=12
Example selection of a Spark table and plot generation:
# sc is an existing Spark connection, e.g. one created with sparklyr::spark_connect()
iris_df <- copy_to(sc, iris, name = "iris_df")
spark_plot_pairs(sparklyr_table = iris_df, label_col = "Species")
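The suggestion above about storing the output can be sketched as follows. This is a minimal illustration, not the package author's own example. It assumes a local Spark installation reachable with master = "local", that the package is installed under the name sparkedatools, and that spark_plot_pairs returns a plot object that can be stored and printed later (as GGally::ggpairs does). The values chosen for sample_size and my_title are arbitrary.

```r
library(sparklyr)
library(dplyr)
library(sparkedatools)  # assumption: package name taken from the repository name

# Assumption: a local Spark installation is available.
sc <- spark_connect(master = "local")

# Copy the built-in iris data set into Spark.
iris_df <- copy_to(sc, iris, name = "iris_df", overwrite = TRUE)

# Generate the pairs plot once and store it, so that later chunks can
# print it without re-running the sampling and plotting.
# Assumption: the function returns a plot object rather than only drawing it.
iris_pairs <- spark_plot_pairs(
  sparklyr_table = iris_df,
  label_col      = "Species",
  sample_size    = 1000,              # illustrative value; the default is 80000
  my_title       = "Iris Pairs Plot",
  progress       = TRUE               # show status bars
)

# In an R Markdown document, print the stored object in a chunk whose header
# sets the output size, e.g. {r, fig.height=12, fig.width=12}.
iris_pairs

spark_disconnect(sc)
```

Keeping the generation and the printing in separate chunks means that changing fig.height or fig.width only re-renders the stored object.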


GabeChurch/sparkedatools documentation built on June 25, 2019, 12:23 p.m.