pagerank: Estimate PageRank


View source: R/06_PageRank.R

Description

Estimate PageRank (centrality scores) of nodes from an edge list or adjacency matrix. If data is a bipartite graph, estimates PageRank based on a one-mode projection of the input. If the data is an edge list, returns ranks ordered by the unique values in the supplied edge list (first by unique senders, then by unique receivers).

Usage

pagerank(
  data,
  is_bipartite = TRUE,
  project_mode = c("rows", "columns"),
  sender_name = NULL,
  receiver_name = NULL,
  weight_name = NULL,
  rm_weights = FALSE,
  duplicates = c("add", "remove"),
  return_data_frame = TRUE,
  alpha = 0.85,
  max_iter = 200,
  tol = 1e-04,
  verbose = FALSE
)

Arguments

data

Data to use for estimating PageRank. Can contain unipartite or bipartite graph data, either formatted as an edge list (class data.frame, data.table, or tibble (tbl_df)) or as an adjacency matrix (class matrix or dgCMatrix).

is_bipartite

Indicate whether input data is bipartite (rather than unipartite/one-mode). Defaults to TRUE.

project_mode

Mode for which to return PageRank estimates. Parameter ignored if is_bipartite = FALSE. Defaults to "rows" (the first column of an edge list).

sender_name

Name of sender column. Parameter ignored if data is an adjacency matrix. Defaults to first column of edge list.

receiver_name

Name of receiver column. Parameter ignored if data is an adjacency matrix. Defaults to the second column of edge list.

weight_name

Name of the edge weight column. Parameter ignored if data is an adjacency matrix. If left as NULL, every edge is given a weight of 1.

rm_weights

Removes edge weights from graph object before estimating PageRank. Defaults to FALSE.

duplicates

How to treat duplicate edges, if any are present in data. Parameter ignored if data is an adjacency matrix. If "add" is selected, duplicated edges are collapsed into a single edge whose weight is the sum of the duplicates' weights. If "remove" is selected, duplicated edges are dropped and only the first instance of each is used. Defaults to "add".
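The two duplicate-handling options can be illustrated in base R. This is only a sketch of the behavior described above, not birankr's internal code; the toy data frame and the names edges, edges_add, and edges_remove are hypothetical.

```r
# Toy edge list in which the edge a -> x appears twice.
edges <- data.frame(
  sender   = c("a", "a", "b"),
  receiver = c("x", "x", "y"),
  weight   = c(1, 2, 5),
  stringsAsFactors = FALSE
)

# "add"-style handling: collapse duplicates by summing their weights.
edges_add <- aggregate(weight ~ sender + receiver, data = edges, FUN = sum)

# "remove"-style handling: keep only the first instance of each edge.
edges_remove <- edges[!duplicated(edges[, c("sender", "receiver")]), ]
```

Under "add" the a -> x edge ends up with the combined weight; under "remove" it keeps the weight of its first occurrence.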

return_data_frame

Return results as a data frame with node names in the first column and ranks in the second column. If set to FALSE, the function just returns a named vector of ranks. Defaults to TRUE.

alpha

Damping factor. Defaults to 0.85.

max_iter

Maximum number of iterations to run. If the estimates have not converged by then, the model is treated as having failed to converge. Defaults to 200.

tol

Convergence tolerance: iteration stops once the change in the estimates falls below this value. Defaults to 1.0e-4.

verbose

Show the progress of this function. Defaults to FALSE.
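To show how alpha, max_iter, and tol typically interact, here is a minimal power-iteration sketch in base R. It is a generic textbook formulation, not birankr's implementation: the helper pagerank_sketch is hypothetical, and the uniform handling of nodes without out-links and the L1 convergence check are assumptions.

```r
# Illustrative power-iteration PageRank on a small dense adjacency matrix.
# pagerank_sketch is a hypothetical helper, not part of birankr.
pagerank_sketch <- function(adj, alpha = 0.85, max_iter = 200, tol = 1e-4) {
  n <- nrow(adj)
  out_deg <- rowSums(adj)
  # Row-normalize; rows with no out-links spread their mass uniformly
  # (an assumption made for this sketch).
  trans <- adj / ifelse(out_deg == 0, 1, out_deg)
  trans[out_deg == 0, ] <- 1 / n
  ranks <- rep(1 / n, n)
  for (i in seq_len(max_iter)) {
    # Follow a link with probability alpha, otherwise jump to a random node.
    new_ranks <- alpha * as.vector(ranks %*% trans) + (1 - alpha) / n
    # Stop once the total absolute change drops below tol.
    converged <- sum(abs(new_ranks - ranks)) < tol
    ranks <- new_ranks
    if (converged) break
  }
  ranks
}

# Node 1 links to nodes 2 and 3, node 2 links to node 1, node 3 links to node 2.
adj <- matrix(c(0, 1, 1,
                1, 0, 0,
                0, 1, 0), nrow = 3, byrow = TRUE)
ranks <- pagerank_sketch(adj)
```

In this formulation a larger alpha gives more weight to the link structure, while a smaller tol or larger max_iter trades run time for precision.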

Details

The default values of the optional arguments are likely well suited to most users. However, it is critical to set the is_bipartite argument to FALSE when working with one-mode data. In addition, when estimating PageRank in unipartite edge lists that contain nodes with an outdegree or indegree of 0, it is recommended that users append self-ties to the edge list to ensure that the returned PageRank estimates are ordered intuitively.
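One way to find the nodes this recommendation applies to is sketched below in base R. The toy edge list and the variable names are hypothetical. The sketch adds self-ties only for nodes with an outdegree or indegree of 0, which is a narrower variant than adding a self-tie for every node as the Examples section does; either reading of the recommendation seems plausible.

```r
# Toy one-mode edge list: node 3 never sends, node 1 never receives.
edges <- data.frame(
  sender   = c(1, 1, 2),
  receiver = c(2, 3, 3)
)

all_nodes <- unique(c(edges$sender, edges$receiver))

# Nodes with outdegree 0 (never a sender) or indegree 0 (never a receiver).
no_out <- setdiff(all_nodes, edges$sender)
no_in  <- setdiff(all_nodes, edges$receiver)
needs_loop <- union(no_out, no_in)

# Append one self-tie per affected node.
edges_with_loops <- rbind(
  edges,
  data.frame(sender = needs_loop, receiver = needs_loop)
)
```

The resulting edges_with_loops can then be passed as data with is_bipartite = FALSE.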

Value

A data frame containing each node name and node rank. If return_data_frame is set to FALSE or the input data is an adjacency matrix, returns a vector of node ranks instead. Does not return node ranks for isolates.

References

Lawrence Page, Sergey Brin, Rajeev Motwani, and Terry Winograd. "The PageRank citation ranking: Bringing order to the web." Technical report, Stanford InfoLab, 1999.

Examples

library(birankr)

# Prepare one-mode data (random draws, so results vary between runs)
df_one_mode <- data.frame(
  sender = sample(x = 1:10000, size = 10000, replace = TRUE),
  receiver = sample(x = 1:10000, size = 10000, replace = TRUE)
)

# Add self-loops for all nodes
unique_ids <- unique(c(df_one_mode$sender, df_one_mode$receiver))
df_one_mode <- rbind(
  df_one_mode,
  data.frame(sender = unique_ids, receiver = unique_ids)
)

# Estimate PageRank in one-mode data
pr_one_mode <- pagerank(data = df_one_mode, is_bipartite = FALSE)

# Estimate PageRank in two-mode data
df_two_mode <- data.frame(
  patient_id = sample(x = 1:10000, size = 10000, replace = TRUE),
  provider_id = sample(x = 1:5000, size = 10000, replace = TRUE)
)
pr_two_mode <- pagerank(data = df_two_mode)

BrianAronson/birankr documentation built on July 13, 2020, 1:19 p.m.