spark_plot_randforest: A SparklyR Random Forest Classification Model Plotting...


Description

Generates plots of the decision trees underlying a Spark random forest classification model fitted with sparklyr.

Usage

spark_plot_randforest(sparklyr_table, ml_rf_model, show_stats = TRUE,
  plot_treeIDs = "all", y_lim = c(3, 5), x_lim = c(-15, 15),
  hdfs_temp_path = "/tmp/RandomForestClassificationModels")

Arguments

sparklyr_table

The Spark table to pass to the function, supplied as a dplyr Spark table (tbl). This can be the training or test set you want to use to generate predictions.

ml_rf_model

The fitted model returned by ml_random_forest().

show_stats

(default=TRUE) This will include the metrics in each

plot_treeIDs

(default = "all") Plots every tree in the model. To plot only specific trees, pass their tree IDs, e.g. plot_treeIDs = list(1, 4, 5).
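The plot_treeIDs argument accepts either "all" or a list of tree IDs. As a rough sketch of how such a value can be normalized, the hypothetical helper resolve_tree_ids() below (not part of sparkedatools) turns it into an integer vector and rejects unknown IDs. It assumes tree IDs run from 1 to num_trees; the package's actual ID convention is not stated on this page.

```r
# Hypothetical helper (not part of sparkedatools): convert a plot_treeIDs
# value into a vector of tree IDs, ASSUMING IDs run from 1 to num_trees.
resolve_tree_ids <- function(plot_treeIDs, num_trees) {
  all_ids <- seq_len(num_trees)
  # "all" selects every tree in the model
  if (identical(plot_treeIDs, "all")) {
    return(all_ids)
  }
  # A list such as list(1, 4, 5) is flattened to an integer vector
  ids <- as.integer(unlist(plot_treeIDs))
  bad <- setdiff(ids, all_ids)
  if (length(bad) > 0) {
    stop("Unknown tree IDs: ", paste(bad, collapse = ", "))
  }
  ids
}

resolve_tree_ids("all", num_trees = 5)          # 1 2 3 4 5
resolve_tree_ids(list(1, 4, 5), num_trees = 5)  # 1 4 5
```

Failing fast on unknown IDs avoids silently producing fewer plots than requested.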

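y_lim

(default = c(3, 5)) Limits of the y-axis of the tree plots.

x_lim

(default = c(-15, 15)) Limits of the x-axis of the tree plots.

hdfs_temp_path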

(default = "/tmp/RandomForestClassificationModels") The function must temporarily write the Spark RandomForestClassificationModel to this HDFS path to read model specifications it needs. Change the path to another location if you do not have permission to write to /tmp on HDFS or on the local file system.
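The hdfs_temp_path argument implies a write-temporarily-then-read pattern. The local-filesystem sketch below illustrates that pattern with a hypothetical with_temp_model_dir() helper; it is an assumption, not this function's code. The real function writes to HDFS (sparklyr can save a model with ml_save(), but whether spark_plot_randforest uses it is not stated here), and whether it deletes the directory afterwards is not documented on this page.

```r
# Hypothetical sketch (not part of sparkedatools): create a temporary
# directory, let fn() write and read artifacts there, and always remove
# the directory afterwards, even if fn() raises an error.
with_temp_model_dir <- function(temp_path, fn) {
  dir.create(temp_path, recursive = TRUE, showWarnings = FALSE)
  on.exit(unlink(temp_path, recursive = TRUE), add = TRUE)
  fn(temp_path)
}

# Local stand-in for "save the model, then read its specs back"
tmp <- file.path(tempdir(), "RandomForestClassificationModels")
specs <- with_temp_model_dir(tmp, function(path) {
  spec_file <- file.path(path, "specs.txt")
  writeLines(c("numTrees=3", "maxDepth=5"), spec_file)
  readLines(spec_file)
})
specs            # "numTrees=3" "maxDepth=5"
dir.exists(tmp)  # FALSE: the temporary directory was removed
```

Using on.exit() ties cleanup to the helper's exit, so a failure while reading the specs still leaves no stale directory behind.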

Details

Important package requirements:
You MUST have the sparklyr, igraph, and purrr packages installed to use this function.
You MUST have an active Spark connection named "sc" (for example, one created with spark_connect()).
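The requirements above can be checked before calling the function. The sketch below uses a hypothetical check_plot_requirements() helper (not part of the package) that lists any missing packages and reports whether an object named sc exists with class spark_connection, the class sparklyr assigns to its connections.

```r
# Hypothetical helper (not part of sparkedatools): report which documented
# requirements are missing before calling spark_plot_randforest().
check_plot_requirements <- function(pkgs = c("sparklyr", "igraph", "purrr"),
                                    conn_name = "sc",
                                    envir = globalenv()) {
  # requireNamespace() returns FALSE (without erroring) if a package is absent
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  missing_pkgs <- pkgs[!installed]
  # The connection must exist AND be a sparklyr connection object
  has_conn <- exists(conn_name, envir = envir, inherits = FALSE) &&
    inherits(get(conn_name, envir = envir), "spark_connection")
  list(missing_packages = missing_pkgs, has_connection = has_conn)
}

status <- check_plot_requirements()
if (length(status$missing_packages) > 0 || !status$has_connection) {
  message("Missing requirements; see the list returned above.")
}
```

Returning a list instead of stopping lets the caller decide whether to install packages, connect, or abort.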

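Example: select a Spark table, fit a random forest classifier, and plot its trees
library(sparklyr); library(dplyr)
spark_table <- tbl(sc, sql("select * from db.stock_samples_20m limit 100"))
rf_model <- ml_random_forest(spark_table, label ~ ., type = "classification")  # "label" is a hypothetical response column
outputs <- spark_plot_randforest(spark_table, rf_model, plot_treeIDs = list(1, 4, 5))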


GabeChurch/sparkedatools documentation built on June 25, 2019, 12:23 p.m.