Generates plots of the underlying decision trees used in a Spark random forest classification model.
spark_plot_randforest(sparklyr_table, ml_rf_model, show_stats = TRUE,
  plot_treeIDs = "all", y_lim = c(3, 5), x_lim = c(-15, 15),
  hdfs_temp_path = "/tmp/RandomForestClassificationModels")
sparklyr_table | The Spark table to pass to the function, given as a dplyr Spark table (tbl). This can be the test or training set you want to use to generate predictions.
ml_rf_model | The ml_random_forest model output to pass to this function.
show_stats | (default = TRUE) This will include the metrics in each
plot_treeIDs | (default = "all") To plot specific trees, pass a list of tree IDs, e.g. plot_treeIDs = list(1, 4, 5), where 1, 4, and 5 are the IDs of the trees you want to plot.
hdfs_temp_path | (default = "/tmp/RandomForestClassificationModels") Change this path to another location if you do not have permission to write to the HDFS or local /tmp directory. The function must temporarily write the Spark RandomForestClassificationModel to HDFS to access certain model specifications it needs.
Important package requirements:
You MUST have the sparklyr, igraph, and purrr packages installed to use this function.
You MUST have an active Spark connection (spark_context) named "sc".
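The requirements above can be checked before calling the function. The following is a minimal sketch in base R; the helper name missing_packages is hypothetical and is not part of this package, and the check for "sc" only confirms that an object with that name exists, not that the connection is still open.

```r
# Packages named in the requirements above.
required_pkgs <- c("sparklyr", "igraph", "purrr")

# Hypothetical helper: return the subset of `pkgs` that is not installed.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

missing <- missing_packages(required_pkgs)
if (length(missing) > 0) {
  message("Install before plotting: ", paste(missing, collapse = ", "))
}

# Only checks that an object called "sc" exists; it does not verify
# that it is a live Spark connection.
if (!exists("sc")) {
  message("No object named 'sc' found; create one with sparklyr::spark_connect().")
}
```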
Example: selecting a Spark table and plotting the trees
spark_table = tbl(sc, sql("select * from db.stock_samples_20m limit 100"))
outputs = spark_plot_randforest(spark_table, ml_rf_model)
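For readers without access to the table used above, a fuller end-to-end sketch might look like the following. It rests on several assumptions: a local Spark installation reachable through spark_connect(master = "local"), the built-in iris data in place of db.stock_samples_20m, a model fitted with sparklyr's ml_random_forest, and spark_plot_randforest already loaded in the session. Tree IDs 1 and 2 are assumed to exist in the fitted forest, and the structure of the returned value is not documented here, so it is only stored.

```r
library(sparklyr)
library(dplyr)

# The function requires the connection object to be named "sc".
sc <- spark_connect(master = "local")

# copy_to() replaces dots in column names with underscores
# (Petal.Length becomes Petal_Length).
iris_tbl <- copy_to(sc, iris, "iris", overwrite = TRUE)

# Fit a random forest classifier (default number of trees).
ml_rf_model <- ml_random_forest(
  iris_tbl,
  Species ~ Petal_Length + Petal_Width,
  type = "classification"
)

# Plot two trees only; IDs 1 and 2 are assumed to be valid tree IDs.
# hdfs_temp_path must point to a location you can write to.
outputs <- spark_plot_randforest(
  iris_tbl, ml_rf_model,
  show_stats = TRUE,
  plot_treeIDs = list(1, 2),
  hdfs_temp_path = "/tmp/RandomForestClassificationModels"
)

spark_disconnect(sc)
```

Passing a short list to plot_treeIDs keeps the number of plots manageable, since the default "all" draws every tree in the forest.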