rSHARK provides an R library for creating Apache Spark dataframes with SmartSHARK data. The data can then be analyzed with SparkR.
Two prerequisites are required: a working Apache Spark installation with SparkR, and the sparkSHARK JAR file.
Once SparkR is running, rSHARK can be installed directly from this repository using devtools:
if (!require("devtools")) install.packages("devtools")
library(devtools)
install_github("smartshark/rSHARK")
You first need to load SparkR and rSHARK:
library(SparkR)
library(rSHARK)
Then, you must create a Spark session:
sparkSession <- sparkR.session(master=SPARK_MASTER,
sparkConfig=list(spark.driver.extraClassPath="SPARKSHARK_JAR",
spark.driver.extraLibraryPath="SPARKSHARK_JAR",
spark.driver.extraJavaOptions=JAVA_OPTIONS),
sparkJars=SPARKSHARK_JAR)
You have to replace SPARK_MASTER, SPARKSHARK_JAR, and JAVA_OPTIONS with the correct values:
- SPARK_MASTER: the address of the Spark master (see here)
- SPARKSHARK_JAR: the location of the sparkSHARK.jar file.
- JAVA_OPTIONS: arguments passed to the JVM used for the Spark execution. These arguments should be used to set up the database connection, as described here.
Using the Spark session, you can initialize the database utilities for accessing the data:
mongoDBUtils <- rShark.createMongoDBUtils(sparkSession)
You can then use the rShark.loadData()
and rShark.loadDataLogical()
commands for accessing the data.
Please find some helpful code snippets below. Complete rSHARK Jobs can be found here.
# Example: local Spark master with a MongoDB running on localhost
SPARK_MASTER <- "local[*]"
SPARKSHARK_JAR <- "/users/jsmith/jars/sparkSHARK.jar"
JAVA_OPTIONS <- paste("-Dspark.executorEnv.dbutils.type=mongo",
"-Dspark.executorEnv.mongo.uri=localhost",
"-Dspark.executorEnv.mongo.port=27017",
"-Dspark.executorEnv.mongo.dbname=smartshark")
sparkSession <- sparkR.session(master=SPARK_MASTER,
sparkConfig=list(spark.driver.extraClassPath=SPARKSHARK_JAR,
spark.driver.extraLibraryPath=SPARKSHARK_JAR,
spark.driver.extraJavaOptions=JAVA_OPTIONS),
sparkJars=SPARKSHARK_JAR)
mongoDBUtils <- rShark.createMongoDBUtils(sparkSession)
# Example: standalone Spark cluster with an authenticated MongoDB
SPARK_MASTER <- "spark://YOURHOST:YOURPORT"
SPARKSHARK_JAR <- "/users/jsmith/jars/sparkSHARK.jar"
JAVA_OPTIONS <- paste("-Dspark.executorEnv.dbutils.type=mongo",
"-Dspark.executorEnv.mongo.uri=http://somehost/",
"-Dspark.executorEnv.mongo.port=27017",
"-Dspark.executorEnv.mongo.dbname=smartshark",
"-Dspark.executorEnv.mongo.useauth=true",
"-Dspark.executorEnv.mongo.username=USER",
"-Dspark.executorEnv.mongo.authdb=admin",
"-Dspark.executorEnv.mongo.password=PASSWORD")
sparkSession <- sparkR.session(master=SPARK_MASTER,
sparkConfig=list(spark.driver.extraClassPath=SPARKSHARK_JAR,
spark.driver.extraLibraryPath=SPARKSHARK_JAR,
spark.driver.extraJavaOptions=JAVA_OPTIONS),
sparkJars=SPARKSHARK_JAR)
mongoDBUtils <- rShark.createMongoDBUtils(sparkSession)
# Load all data from the commit collection
rShark.loadData(mongoDBUtils, "commit")
# Load the document ID and the product metrics available for Java classes
# from the code_entity_state collection
rShark.loadDataLogical(mongoDBUtils,
"code_entity_state",
list(c("AbstractionLevel"), c("ProductMetric", "JavaClass")))
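Assuming rShark.loadData() returns an ordinary SparkR DataFrame (as the snippets above suggest), the result can be analyzed with standard SparkR operations. Below is a minimal sketch; the column name "message" is an assumption about the schema of the commit collection, not something the source confirms:

```r
# Load the commit collection into a SparkR DataFrame
commits <- rShark.loadData(mongoDBUtils, "commit")

printSchema(commits)              # inspect which columns were loaded
count(commits)                    # number of documents in the collection
head(select(commits, "message"))  # first rows of an assumed "message" column
```

The same pattern applies to data frames returned by rShark.loadDataLogical(); any SparkR verb (select, filter, groupBy, etc.) can be chained on the result.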