For those already using sparklyr, simply run:
devtools::install_github("kevinykuo/sparklygraphs")
Otherwise, first install sparklyr from CRAN using:
install.packages("sparklyr")
The examples use the highschool dataset from the ggraph package.
We will calculate PageRank over the highschool dataset as follows:
library(sparklygraphs)
library(sparklyr)
library(dplyr)

# connect to spark using sparklyr
sc <- spark_connect(master = "local", version = "2.1.0")

# copy highschool dataset to spark
highschool_tbl <- copy_to(sc, ggraph::highschool, "highschool")

# create a table with unique vertices using dplyr
vertices_tbl <- sdf_bind_rows(
  highschool_tbl %>% distinct(from) %>% transmute(id = from),
  highschool_tbl %>% distinct(to) %>% transmute(id = to)
)

# create a table with <source, destination> edges
edges_tbl <- highschool_tbl %>% transmute(src = from, dst = to)

gf_graphframe(vertices_tbl, edges_tbl) %>%
  gf_pagerank(reset_prob = 0.15, max_iter = 10L, source_id = "1")
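To give a feel for what the reset_prob and max_iter arguments control, here is a minimal base R sketch of a textbook PageRank power iteration on a tiny edge list. It is an illustration only and not how gf_pagerank or GraphFrames computes ranks: the function pagerank_sketch and the toy data are hypothetical, the sketch ignores source_id (no personalization), and it assumes every vertex has at least one outgoing edge.

```r
# Toy edge list (hypothetical data, not the highschool dataset)
edges <- data.frame(
  src = c("a", "a", "b", "c"),
  dst = c("b", "c", "c", "a"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: simplified PageRank by power iteration.
# Assumes every vertex has at least one outgoing edge.
pagerank_sketch <- function(edges, reset_prob = 0.15, max_iter = 10L) {
  ids <- sort(unique(c(edges$src, edges$dst)))
  n <- length(ids)

  # Out-degree of every vertex, used to split its rank among its neighbours
  out_deg <- table(factor(edges$src, levels = ids))

  # Start from a uniform distribution over vertices
  rank <- setNames(rep(1 / n, n), ids)

  for (i in seq_len(max_iter)) {
    # Each edge carries rank[src] / out_degree[src] to its destination
    contrib <- rank[edges$src] / as.numeric(out_deg[edges$src])
    incoming <- tapply(contrib, factor(edges$dst, levels = ids), sum)
    incoming[is.na(incoming)] <- 0

    # With probability reset_prob the walk restarts at a random vertex
    rank <- setNames(
      reset_prob / n + (1 - reset_prob) * as.numeric(incoming),
      ids
    )
  }
  rank
}

ranks <- pagerank_sketch(edges, reset_prob = 0.15, max_iter = 10L)
print(ranks)
```

In this sketch a larger reset_prob pulls all ranks toward the uniform value, while max_iter bounds how many times rank is propagated along the edges.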
Apart from calculating PageRank using gf_pagerank, sparklygraphs provides other graph functions.
For instance, one can calculate the degrees of vertices using gf_degrees as follows:
gf_graphframe(vertices_tbl, edges_tbl) %>% gf_degrees()
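As a rough local illustration of what a vertex degree means, the base R sketch below counts, for each vertex of a small hypothetical edge list, how many edge endpoints it appears in (as source or as destination). The id and degree column names are an assumption modelled on GraphFrames' degrees output; this is not the code gf_degrees runs on Spark.

```r
# Toy edge list (hypothetical data, not the highschool dataset)
edges <- data.frame(
  src = c("a", "a", "b", "c"),
  dst = c("b", "c", "c", "a"),
  stringsAsFactors = FALSE
)

# A vertex's degree counts every edge endpoint it appears in,
# whether as source or as destination
endpoint_counts <- table(c(edges$src, edges$dst))

degrees <- data.frame(
  id = names(endpoint_counts),
  degree = as.integer(endpoint_counts),
  stringsAsFactors = FALSE
)

print(degrees)
```

Since every edge contributes two endpoints, the degrees always sum to twice the number of edges, which is a handy sanity check on real data.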
To visualize large graphs built with sparklygraphs, one can take a sample of the edges with sample_n, collect it into R, and then plot it using ggraph with igraph as follows:
library(ggraph)
library(igraph)

graph <- highschool_tbl %>%
  sample_n(20) %>%
  collect() %>%
  graph_from_data_frame()

ggraph(graph, layout = 'kk') +
  geom_edge_link(aes(colour = factor(year))) +
  geom_node_point() +
  ggtitle('An example')
Finally, we disconnect from Spark:
spark_disconnect(sc)