PAFit-package: Generative Mechanism Estimation in Temporal Complex Networks


Description

A package for estimating preferential attachment and node fitness generative mechanisms in temporal complex networks.

Details

Package: PAFit
Type: Package
Version: 0.9.9.0
Authors: Thong Pham, Paul Sheridan, Hidetoshi Shimodaira
Maintainer: Thong Pham thongpham@thongpham.net
Date: 2017-04-16
License: GPL-3

The PAFit package provides a framework for simulating and estimating the growth mechanisms of temporal complex networks. In particular, it implements functions to simulate various temporal network models, gather essential network statistics from raw input data, and use these summary statistics to estimate the attachment function A_k and the node fitnesses η_i. The computationally heavy parts of the package are implemented in C++ through the Rcpp package. Furthermore, users with a multi-core machine can obtain a speed-up through the OpenMP parallelization implemented in the code. Apart from the main functions, the package also includes a real-world dataset of a collaboration network between scientists in the field of complex networks (coauthor.net). The main functionalities of the package are as follows.

Firstly, most well-known temporal network models based on the preferential attachment (PA) and node fitness mechanisms can be easily simulated using the package. PAFit implements generate_BA for the Barabási-Albert (BA) model, generate_ER for the growing Erdős–Rényi (ER) model, generate_BB for the Bianconi-Barabási (BB) model, and generate_fit_only for the Caldarelli model. These functions have many customizable options; for example, the number of new edges at each time-step is a tunable stochastic variable. They are wrappers around the more powerful generate_net function, which simulates networks with more flexible attachment function and node fitness settings. In every case, the output of these functions is a list with two fields: graph and fitness. The graph field contains the temporal network as a three-column matrix in which each row is of the form (id of source node, id of destination node, time_stamp). The fitness field contains the true node fitnesses.
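
To make the three-column format concrete, here is a minimal base-R sketch that grows a small directed network by linear preferential attachment and stores it as (source, destination, time_stamp) rows. It is a toy under stated assumptions, not the package's generate_BA implementation: the function name toy_pa_network, its arguments, the chain-shaped seed graph, and the use of time_stamp 0 for seed edges are all hypothetical choices made for illustration.

```r
# Toy illustration only: NOT the package's generate_BA implementation.
# Each new node sends one edge to an existing node chosen with probability
# proportional to A_k = max(k, 1), where k is the target's current in-degree.
toy_pa_network <- function(n_nodes = 50, n_seed = 2, seed = 1) {
  stopifnot(n_seed >= 2, n_nodes > n_seed)
  set.seed(seed)
  in_degree <- integer(n_nodes)

  # Assumed seed graph at time_stamp 0: a chain 1 -> 2 -> ... -> n_seed
  graph <- cbind(seq_len(n_seed - 1), seq_len(n_seed - 1) + 1, 0)
  in_degree[2:n_seed] <- 1L

  for (new_node in (n_seed + 1):n_nodes) {
    existing <- seq_len(new_node - 1)
    weights  <- pmax(in_degree[existing], 1)            # A_k = max(k, 1)
    target   <- sample.int(length(existing), 1, prob = weights)
    in_degree[target] <- in_degree[target] + 1L
    # One new node (and one new edge) per time-step
    graph <- rbind(graph, c(new_node, target, new_node - n_seed))
  }
  colnames(graph) <- c("source", "destination", "time_stamp")
  graph
}

toy_graph <- toy_pa_network()
head(toy_graph)
```

A matrix of this shape matches the layout described above for the graph field, so it shows the kind of object that the package's other functions expect as a raw network.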

Secondly, the function get_statistics efficiently collects all temporal network summary statistics. The input network is assumed to be stored as a three-column matrix where each row is of the form (id of source node, id of destination node, time_stamp), which is the same format as the output of the simulation functions in the package. Note that get_statistics automatically handles both directed and undirected networks. It returns a list containing many statistics that can be used to characterize the network growth process. Notable fields are m_tk, which contains the number of new edges that connect to a degree-k node at time-step t, and node_degree, which contains the degree sequence, i.e., the degree of each node at each time-step.
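
As a rough illustration of what statistics such as m_tk and node_degree record, the following base-R sketch tabulates them by hand for a five-edge toy network. It is not the package's get_statistics implementation, and the real fields may be indexed or binned differently. The names toy_m_tk and toy_degree are hypothetical, and the sketch assumes that "degree" means in-degree and that edges with time_stamp 0 form the seed graph.

```r
# Toy illustration only: NOT the package's get_statistics implementation.
# Hand-made edge list in the (source, destination, time_stamp) format.
edges <- rbind(c(1, 2, 0),   # assumed seed edge at time_stamp 0
               c(3, 2, 1),
               c(4, 2, 2),
               c(5, 3, 2),
               c(6, 2, 3))
colnames(edges) <- c("source", "destination", "time_stamp")

n_nodes  <- max(edges[, c("source", "destination")])
new_time <- sort(unique(edges[edges[, "time_stamp"] > 0, "time_stamp"]))
max_k    <- nrow(edges)  # an in-degree can never exceed the number of edges

# In-degree of every node once the seed edges are in place
is_seed   <- edges[, "time_stamp"] == 0
in_degree <- tabulate(edges[is_seed, "destination"], nbins = n_nodes)

# toy_m_tk[t, k]: new edges at time-step t that land on a node whose
# in-degree was k just before that time-step
toy_m_tk <- matrix(0L, nrow = length(new_time), ncol = max_k + 1,
                   dimnames = list(paste0("t=", new_time),
                                   paste0("k=", 0:max_k)))
# toy_degree[t, i]: in-degree of node i after time-step t
toy_degree <- matrix(0L, nrow = length(new_time), ncol = n_nodes,
                     dimnames = list(paste0("t=", new_time),
                                     paste0("node", seq_len(n_nodes))))

for (i in seq_along(new_time)) {
  dest <- edges[edges[, "time_stamp"] == new_time[i], "destination"]
  toy_m_tk[i, ]   <- tabulate(in_degree[dest] + 1, nbins = max_k + 1)
  in_degree       <- in_degree + tabulate(dest, nbins = n_nodes)
  toy_degree[i, ] <- in_degree
}

toy_m_tk
toy_degree
```

Counting degrees before applying the edges of the current time-step is the design choice that matters here: the statistic describes the state of the network that the new edges actually "saw" when they attached.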

The most important functionality of the package is estimating the attachment function and node fitnesses of a temporal network. Several methods are implemented, covering three use cases: estimation of the attachment function in isolation, estimation of the node fitnesses in isolation, and joint estimation of the attachment function and node fitnesses.

  • The functions for estimating the attachment function in isolation are: Jeong for Jeong's method (Ref. 1), Newman for Newman's method (Ref. 2), and only_A_estimate for the PAFit method (Ref. 3).

  • For estimation of node fitnesses in isolation, only_F_estimate implements a variant of the PAFit method (Ref. 4).

  • For the joint estimation of the attachment function and node fitnesses, we implement the full version of the PAFit method in joint_estimate (Ref. 4).

In all cases, the input of these functions is the output object of the function get_statistics. The output object of these functions contains the estimation results as well as some additional information pertaining to the estimation process. The estimated attachment function and/or node fitnesses can be plotted by using the plot command directly on this output object. This will visualize not only the estimated results but also the remaining uncertainties when possible.
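
To give some intuition for estimating the attachment function in isolation, the base-R sketch below applies a simple histogram-style estimate in the spirit of Jeong's method (Ref. 1): fix a reference network, then compare how many new edges the degree-k nodes receive during a later window with how many degree-k nodes there were. This is a simplified illustration, not the package's Jeong function. The simulation, the variable names, the window boundaries, and the threshold of 20 nodes per degree are all assumptions of the sketch.

```r
# Illustrative sketch of a histogram-style estimate; NOT the package's Jeong().
set.seed(42)
n_nodes <- 3000
t0      <- 2000          # nodes 1..t0 form the reference network (assumption)

# Simulate: each new node sends one edge to an existing node chosen with
# probability proportional to A_k = max(k, 1), k being the in-degree.
in_degree <- integer(n_nodes)
target    <- integer(n_nodes)   # target[i]: node that new node i attaches to
for (i in 2:n_nodes) {
  target[i] <- sample.int(i - 1, 1, prob = pmax(in_degree[seq_len(i - 1)], 1))
  in_degree[target[i]] <- in_degree[target[i]] + 1L
  if (i == t0) degree_at_t0 <- in_degree[seq_len(t0)]
}

# Edges created after t0 that land on nodes of the reference network
window_targets <- target[(t0 + 1):n_nodes]
window_targets <- window_targets[window_targets <= t0]

max_k   <- max(degree_at_t0)
k       <- 0:max_k
edges_k <- tabulate(degree_at_t0[window_targets] + 1, nbins = max_k + 1)
nodes_k <- tabulate(degree_at_t0 + 1, nbins = max_k + 1)

# Edges received per available node, rescaled so that the estimate at k = 1 is 1
a_hat <- edges_k / nodes_k
a_hat <- a_hat / a_hat[k == 1]

keep <- nodes_k >= 20    # ignore degrees with too few nodes to be reliable
data.frame(k = k[keep], a_hat = a_hat[keep], truth = pmax(k[keep], 1))
```

The estimate is noisy at large k because few nodes have high degree, which is one reason the plots produced by the package also show the remaining uncertainty when possible.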

Author(s)

Thong Pham thongpham@thongpham.net, Paul Sheridan, and Hidetoshi Shimodaira.

References

1. Jeong, H., Néda, Z. & Barabási, A. (2003). Measuring Preferential Attachment in Evolving Networks. Europhysics Letters 61(4):567-572. doi:10.1209/epl/i2003-00166-9 (http://iopscience.iop.org/article/10.1209/epl/i2003-00166-9/fulltext/).

2. Newman, M. (2001). Clustering and Preferential Attachment in Growing Networks. Physical Review E 64(2):025102. doi:10.1103/PhysRevE.64.025102 (https://journals.aps.org/pre/abstract/10.1103/PhysRevE.64.025102).

3. Pham, T., Sheridan, P. & Shimodaira, H. (2015). PAFit: A Statistical Method for Measuring Preferential Attachment in Temporal Complex Networks. PLoS ONE 10(9):e0137796. doi:10.1371/journal.pone.0137796 (http://dx.doi.org/10.1371/journal.pone.0137796).

4. Pham, T., Sheridan, P. & Shimodaira, H. (2016). Joint Estimation of Preferential Attachment and Node Fitness in Growing Complex Networks. Scientific Reports 6, Article number: 32558. doi:10.1038/srep32558 (http://www.nature.com/articles/srep32558).

See Also

See the accompanying vignette for a tutorial.

See also the GitHub page.

Examples

## Not run: 
  ### Jointly estimate the attachment function and node fitnesses
  library("PAFit")
  # a Bianconi-Barabasi network
  # size of initial network = 100
  # number of new nodes at each time-step = 100
  # Ak = k; inverse variance of distribution of fitness: s = 5
  net        <- generate_BB(N        = 1000 , m             = 50 , 
                            num_seed = 100  , multiple_node = 100,
                            s        = 5)
  net_stats  <- get_statistics(net$graph)
  
  # joint estimation of attachment function Ak and node fitness
  result     <- joint_estimate(net$graph, net_stats)
  
  summary(result)
  
  # plot the estimated attachment function
  plot(result, net_stats)
  
  # overlay the true attachment function
  true_A     <- pmax(result$estimate_result$center_k, 1)
  lines(result$estimate_result$center_k, true_A, col = "red") # true line
  legend("topleft" , legend = "True function" , col = "red" , lty = 1 , bty = "n")
  
  # plot distribution of estimated node fitnesses
  plot(result, net_stats, plot = "f")
  
  # plot the estimated node fitnesses and true node fitnesses
  plot(result, net_stats, true = net$fitness, plot = "true_f")

## End(Not run)

