spark_eda_dash: An EDA Dashboard Builder for sparklyr


Description

spark_eda_dash builds an exploratory data analysis (EDA) dashboard from a sparklyr table. It is advised to drop time, array, and other columns with nested datatypes before running.
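As a minimal sketch (assuming an existing sparklyr connection sc and a hypothetical table with a timestamp column), unsupported columns can be dropped with dplyr before calling the function:

library(sparklyr)
library(dplyr)

clean_tbl <- tbl(sc, "db.some_table") %>%   # hypothetical table name
  select(-event_time)                       # drop time/array/nested-type columns first

spark_eda_dash(clean_tbl)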

Usage

spark_eda_dash(sparklyr_table, hist_num_buckets = 10L,
  hist_include_null = FALSE, hist_decimal_places = 2L,
  desc_decimal_places = 2L)

Arguments

sparklyr_table

The Spark table to analyze. You can pass a dplyr Spark table reference (tbl).

hist_num_buckets

(default = 10L) sets the number of buckets for the Spark histograms computed on each numeric column.

hist_include_null

(default = FALSE) if TRUE, a column with the null count for each field is included in the histograms.

hist_decimal_places

(default = 2L) controls the number of decimal places to which histogram bucket values are rounded (if any).

desc_decimal_places

(default = 2L) controls the number of decimal places in the descriptive statistics output.
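A sketch of a call that sets every argument explicitly (the table name is hypothetical; sc is an existing sparklyr connection):

spark_table <- tbl(sc, "db.stock_samples_20m")
spark_eda_dash(spark_table,
  hist_num_buckets = 20L,
  hist_include_null = TRUE,
  hist_decimal_places = 2L,
  desc_decimal_places = 3L)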

Details

Important package requirements:
Download the required jar from www.gabechurch.com/sparkEDA (automatic integration of the jar is planned for a future release).
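One way to make the jar available is to register it with the Spark session before connecting; the sketch below assumes the jar has been downloaded to a local path (the path is hypothetical):

library(sparklyr)

config <- spark_config()
config$sparklyr.jars.default <- "/path/to/sparkEDA.jar"   # hypothetical local path
sc <- spark_connect(master = "local", config = config)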

Example selection of a Spark table and graph:
# sc is an existing sparklyr connection
spark_table <- tbl(sc, sql("select * from db.stock_samples_20m limit 100"))
spark_hist(spark_table, 20L)

