knitr::opts_chunk$set(eval = TRUE)
knitr::opts_chunk$set(warning = FALSE)
knitr::opts_chunk$set(fig.path = "tools/readme/", dev = "png")


Installation

If you already use sparklyr, simply run:

install.packages("graphframes")
# or, for the development version,
# devtools::install_github("rstudio/graphframes")

Otherwise, first install sparklyr from CRAN using:

install.packages("sparklyr")

The examples make use of the highschool dataset from the ggraph package.

Getting Started

We will calculate PageRank over the built-in "friends" dataset as follows.

library(graphframes)
library(sparklyr)
library(dplyr)

# connect to spark using sparklyr
sc <- spark_connect(master = "local", version = "2.3.0")

# obtain the example graph
g <- gf_friends(sc)

# compute PageRank
results <- gf_pagerank(g, tol = 0.01, reset_probability = 0.15)
results
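To make the `tol` and `reset_probability` arguments more concrete, here is a minimal base-R sketch of PageRank by power iteration. It is not the GraphFrames implementation: the function `pagerank_sketch`, its `max_iter` argument, and the toy edge list are hypothetical names introduced for illustration, and the sketch normalizes ranks to sum to 1, which may differ from the scaling `gf_pagerank` returns.

```r
# Hypothetical toy edge list (not the gf_friends data): src -> dst
edges <- data.frame(
  src = c("a", "b", "c", "c", "d"),
  dst = c("b", "c", "a", "d", "a"),
  stringsAsFactors = FALSE
)

pagerank_sketch <- function(edges, reset_probability = 0.15, tol = 0.01,
                            max_iter = 100) {
  ids <- sort(unique(c(edges$src, edges$dst)))
  n <- length(ids)

  # number of outgoing edges per vertex, named by vertex id
  out_degree <- setNames(tabulate(match(edges$src, ids), nbins = n), ids)

  # start from a uniform distribution over vertices
  rank <- setNames(rep(1 / n, n), ids)

  for (i in seq_len(max_iter)) {
    # each vertex splits its rank evenly across its outgoing edges
    contrib <- rank[edges$src] / out_degree[edges$src]

    # sum the contributions arriving at each destination vertex
    incoming <- setNames(rep(0, n), ids)
    sums <- tapply(contrib, edges$dst, sum)
    incoming[names(sums)] <- as.numeric(sums)

    # rank held by vertices without outgoing edges is spread evenly
    dangling <- sum(rank[out_degree == 0])

    # with probability reset_probability, jump to a random vertex
    new_rank <- reset_probability / n +
      (1 - reset_probability) * (incoming + dangling / n)

    # stop once no rank changed by more than tol in this iteration
    converged <- max(abs(new_rank - rank)) < tol
    rank <- new_rank
    if (converged) break
  }
  rank
}

ranks <- pagerank_sketch(edges)
ranks
```

In this sketch, a smaller `tol` means more iterations before stopping, and a larger `reset_probability` pulls all ranks closer to the uniform value.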

We can then visualize the results by collecting them into R:

library(tidygraph)
library(ggraph)

vertices <- results %>%
  gf_vertices() %>%
  collect()

edges <- results %>%
  gf_edges() %>%
  collect()

edges %>%
  as_tbl_graph() %>%
  activate(nodes) %>%
  left_join(vertices, by = c(name = "id")) %>%
  ggraph(layout = "nicely") +
  geom_node_label(aes(label = name.y, color = pagerank)) +
  geom_edge_link(
    aes(
      alpha = weight,
      start_cap = label_rect(node1.name.y),
      end_cap = label_rect(node2.name.y)
    ),
    arrow = arrow(length = unit(4, "mm"))
  ) +
  theme_graph(fg_text_colour = 'white')

Further Reading

Apart from calculating PageRank using gf_pagerank, many other functions are available.

For instance, one can calculate the degrees of vertices using gf_degrees as follows:

gf_friends(sc) %>% gf_degrees()
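As a rough local illustration of what a degree count means, the following base-R sketch counts, for each vertex, the edges that touch it in either direction. The toy edge list is hypothetical (it is not the gf_friends data), and the `id` and `degree` column names are chosen here to mirror the shape of a degrees table rather than taken from this README.

```r
# Hypothetical toy edge list (not the gf_friends data): src -> dst
edges <- data.frame(
  src = c("a", "b", "c", "c", "d"),
  dst = c("b", "c", "a", "d", "a"),
  stringsAsFactors = FALSE
)

# degree = number of edges touching a vertex, as source or destination
counts <- table(c(edges$src, edges$dst))

degrees <- data.frame(
  id = names(counts),
  degree = as.integer(counts),
  stringsAsFactors = FALSE
)
degrees
```

Since every edge contributes to the degree of two endpoints, the degrees in this sketch always sum to twice the number of edges.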

Finally, we disconnect from Spark:

spark_disconnect(sc)


rstudio/graphframes documentation built on May 17, 2019, 8:44 p.m.