spark_plot_randforest: A SparklyR Random Forest Classification Model Plotting...
In GabeChurch/sparkedatools:

This function generates plots of the individual decision trees that make up a Spark random forest classification model fitted with sparklyr.

spark_plot_randforest(sparklyr_table, ml_rf_model, show_stats = TRUE,
  plot_treeIDs = "all", y_lim = c(3, 5), x_lim = c(-15, 15),
  hdfs_temp_path = "/tmp/RandomForestClassificationModels")

`sparklyr_table`	The Spark table to pass to the function, given as a dplyr Spark table (`tbl`). This can be the training or test set you want to use to generate predictions.
`ml_rf_model`	The fitted model returned by `ml_random_forest`.
`show_stats`	(default = TRUE) If TRUE, includes the metrics in each
`plot_treeIDs`	(default = "all") Plots every tree in the model. To plot only specific trees, pass their tree IDs as a list; for example, `plot_treeIDs = list(1, 4, 5)` plots trees 1, 4, and 5.
`hdfs_temp_path`	(default = "/tmp/RandomForestClassificationModels") The function must temporarily write the Spark RandomForestClassificationModel to this path to read certain model specifications it needs. Change it to another location if you do not have permission to write to /tmp in HDFS or on the local filesystem.
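As a sketch of the non-default arguments described above, the following selects specific tree IDs and a per-user temporary path. It assumes `spark_table` and a fitted `rf_model` already exist in the session; the `/user/<name>/tmp` location is a hypothetical HDFS directory, so substitute any path you can write to. The call is guarded so the block runs harmlessly when no Spark session is present.

```r
# Plot only trees 1, 4 and 5 instead of the default "all"
tree_ids <- list(1, 4, 5)

# Hypothetical per-user location for hdfs_temp_path; replace with a directory you can write to
temp_path <- file.path("/user", Sys.getenv("USER", unset = "analyst"), "tmp",
                       "RandomForestClassificationModels")

# Only call the plotting function when a connection, table and model are available
if (exists("sc") && exists("spark_table") && exists("rf_model")) {
  outputs <- spark_plot_randforest(spark_table, rf_model,
                                   plot_treeIDs = tree_ids,
                                   hdfs_temp_path = temp_path)
}
```

Using a directory under your own HDFS home avoids the permission problem on the shared /tmp directory.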

Important package requirements:
You MUST have the sparklyr, igraph, and purrr packages installed to use this function.
You MUST have an active Spark connection (created with `spark_connect()`) named `sc`.
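The requirements above can be checked before calling the function. This is a minimal sketch: `check_randforest_prereqs()` is a hypothetical helper, not part of sparkedatools, and it assumes the connection object `sc` lives in the global environment.

```r
# Hypothetical helper (not part of sparkedatools): verifies the documented prerequisites
check_randforest_prereqs <- function(env = globalenv()) {
  required <- c("sparklyr", "igraph", "purrr")
  # requireNamespace() returns FALSE (without erroring) when a package is unavailable
  installed <- vapply(required, requireNamespace, logical(1), quietly = TRUE)
  if (!all(installed)) {
    stop("Missing packages: ", paste(required[!installed], collapse = ", "))
  }
  # The function expects a connection object named exactly `sc`
  if (!exists("sc", envir = env, inherits = FALSE)) {
    stop("No Spark connection named 'sc' found")
  }
  if (!sparklyr::connection_is_open(get("sc", envir = env))) {
    stop("The Spark connection 'sc' is not open")
  }
  invisible(TRUE)
}
```

Calling `check_randforest_prereqs()` at the top of a script fails early with a readable message instead of partway through model plotting.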

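Example: select a Spark table, fit a random forest classifier, and plot its trees
spark_table = tbl(sc, sql("select * from db.stock_samples_20m limit 100"))
rf_model = ml_random_forest(spark_table, label ~ ., type = "classification")  # "label" is a hypothetical response column
outputs = spark_plot_randforest(spark_table, rf_model, plot_treeIDs = list(1, 4, 5))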