In RoheLab/aPPR: Approximate Personalized PageRank

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%",
  error = TRUE
)

aPPR

aPPR helps you calculate approximate personalized pageranks from large graphs, including those that can only be queried via an API. aPPR additionally performs degree correction and regularization, allowing you to recover blocks from stochastic blockmodels.

To learn more about aPPR you can:

Glance through slides from the JSM2021 talk
Read the accompanying paper

Installation

You can install the development version from GitHub with:

# install.packages("devtools")
devtools::install_github("RoheLab/aPPR")

Find the personalized pagerank of a node in an `igraph` graph

library(aPPR)
library(igraph)

set.seed(27)

erdos_renyi_graph <- sample_gnp(n = 100, p = 0.5)

erdos_tracker <- appr(
  erdos_renyi_graph,   # the graph to work with
  seeds = "5",         # name of seed node (character)
  epsilon = 0.0005     # desired approximation quality (see ?appr)
)

erdos_tracker

You can access the Personalized PageRanks themselves via the stats field of Tracker objects.

erdos_tracker$stats

Sometimes you may wish to limit computation time by limiting the number of nodes to visit, which you can do as follows:

limited_visits_tracker <- appr(
  erdos_renyi_graph,   
  seeds = "5",         
  epsilon = 1e-10,     
  max_visits = 20      # max unique nodes to visit during approximation
)

limited_visits_tracker

Find the personalized pagerank of a Twitter user using `rtweet`

ftrevorc_ppr <- appr(
  rtweet_graph(),
  "ftrevorc",
  epsilon = 1e-4,
  max_visits = 5
)

ftrevorc_ppr

Logging

aPPR uses logger for displaying information to the user. By default, aPPR is quite verbose. You can control verbosity by loading logger and setting the logging threshold.

library(logger)

# hide basically all messages (not recommended)
log_threshold(FATAL, namespace = "aPPR")

appr(
  erdos_renyi_graph,   # the graph to work with
  seeds = "5",         # name of seed node (character)
  epsilon = 0.0005     # desired approximation quality (see ?appr)
)

If you submit a bug report, please please please include a log file using the TRACE threshold. You can set up this kind of detailed logging via the following:

set.seed(528491)  # be sure to set seed for bug reports

log_appender(
  appender_file(
    "/path/to/logfile.log"  ## TODO: choose a path to log to
  ),
  namespace = "aPPR"
)

log_threshold(TRACE, namespace = "aPPR")

tracker <- appr(
  rtweet_graph(),
  seed = c("hadleywickham", "gvanrossum"),
  epsilon = 1e-6
)

Ethical considerations

People have a right to choose how public and discoverable their information is. aPPR will often lead you to accounts that interesting, but also small and out of sight. Do not change the public profile or attention towards these the people running these accounts, or any other accounts, without their permission.

References

Chen, Fan, Yini Zhang, and Karl Rohe. “Targeted Sampling from Massive Block Model Graphs with Personalized PageRank.” Journal of the Royal Statistical Society: Series B (Statistical Methodology) 82, no. 1 (February 2020): 99–126. https://doi.org/10.1111/rssb.12349. arxiv
Andersen, Reid, Fan Chung, and Kevin Lang. “Local Graph Partitioning Using PageRank Vectors.” In 2006 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS’06), 475–86. Berkeley, CA, USA: IEEE, 2006. https://doi.org/10.1109/FOCS.2006.44.

RoheLab/aPPR documentation built on July 1, 2024, 1:57 p.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

Tweet to @rdrrHQ

GitHub issue tracker

ian@mutexlabs.com