This is a proof of concept extension package for sparklyr that demonstrates creating an R front-end for a Spark package (in this case Sparking Water from H2O).
This package implements only the most basic functionality (creating an H2OContext, showing the H2O Flow interface, and converting a Spark DataFrame to an H2O Frame). Note that the package won't be developed further since it's just a demonstration.
First we connect to Spark. The call to library(sparklingwater)
will make the H2O functions available on the R search path and will also ensure that the dependencies required by the Sparkling Water package are included when we connect to Spark.
library(sparklyr) library(sparklingwater) sc <- spark_connect(master = "local")
The call to library(sparklingwater)
automatically registered the Sparkling Water extension, which in turn specified that the Sparkling Water Spark package should be made available for Spark connections. Let's inspect the H2OContext for our Spark connection:
h2o_context(sc)
We can also view the H2O Flow web UI:
h2o_flow(sc)
Let's copy the mtcars dataset to to Spark so we can access it from Sparkling Water:
library(dplyr) mtcars_tbl <- copy_to(sc, mtcars, overwrite = TRUE) mtcars_tbl
The use case we'd like to enable is calling the H2O algorithms and feature transformers directly on Spark DataFrames that we've manipulated with dplyr. This is indeed supported by the Sparkling Water package. Here though we'll just convert the Spark DataFrame into an H2O Frame to prove that it's possible:
mtcars_hf <- h2o_frame(mtcars_tbl) mtcars_hf
Now we disconnect from Spark, this will result in the H2OContext being stopped as well since it's owned by the spark shell process used by our Spark connection:
spark_disconnect(sc)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.