A package for estimating preferential attachment and node fitness generative mechanisms in temporal complex networks.

| Field | Value |
|---|---|
| Package | PAFit |
| Type | Package |
| Version | 0.9.9.0 |
| Authors | Thong Pham, Paul Sheridan, Hidetoshi Shimodaira |
| Maintainer | Thong Pham <thongpham@thongpham.net> |
| Date | 2017-04-16 |
| License | GPL-3 |

The PAFit package provides a framework for studying the growth mechanisms of temporal complex networks. In particular, it implements functions to simulate various temporal network models, gather essential network statistics from raw input data, and use these summary statistics to estimate the attachment function *A_k* and the node fitnesses *η_i*. The computationally heavy parts of the package are implemented in `C++` through the Rcpp package. Furthermore, users with a multi-core machine get a speed-up without extra setup through the OpenMP parallelization implemented in the code. Apart from the main functions, the package also includes a real-world dataset: a collaboration network between scientists in the field of complex networks (`coauthor.net`). The main package functionalities are as follows.

Firstly, most well-known temporal network models based on the preferential attachment (PA) and node fitness mechanisms can be easily simulated using the package. PAFit implements `generate_BA` for the Barabási-Albert (BA) model, `generate_ER` for the growing Erdős–Rényi (ER) model, `generate_BB` for the Bianconi-Barabási (BB) model, and `generate_fit_only` for the Caldarelli model. These functions have many customizable options; for example, the number of new edges at each time-step is a tunable random variable. They are wrappers of the more powerful `generate_net` function, which simulates networks with more flexible attachment function and node fitness settings. In every case, the output of these functions is a list with two fields: `graph` and `fitness`. The `graph` field contains the temporal network as a three-column matrix in which each row is of the form `(id of source node, id of destination node, time_stamp)`. The `fitness` field contains the true node fitnesses.
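To make the output shape concrete, the following minimal sketch builds a toy object by hand with the two fields described above. It does not call the package: the node ids, time-stamps, and fitness values are invented for illustration, and the names `toy_graph`, `toy_fitness`, and `toy_net` are hypothetical. A real object would come from one of the `generate_*` functions.

```r
# Hand-built toy object mimicking the described output shape (not produced by PAFit).
# Each row of `graph` is (id of source node, id of destination node, time_stamp).
toy_graph <- matrix(c(2, 1, 0,
                      3, 1, 1,
                      4, 1, 2,
                      4, 2, 2,
                      5, 3, 3),
                    ncol = 3, byrow = TRUE)

# One illustrative fitness value per node; the values are made up.
toy_fitness <- c(1.2, 0.8, 1.0, 0.9, 1.1)

toy_net <- list(graph = toy_graph, fitness = toy_fitness)

# Select the edges that arrive at time-step 2.
edges_t2 <- toy_net$graph[toy_net$graph[, 3] == 2, , drop = FALSE]
edges_t2
```

Keeping `drop = FALSE` in the subset preserves the three-column matrix shape even when only one edge matches.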

Secondly, the function `get_statistics` efficiently collects all temporal network summary statistics. The input network is assumed to be stored as a three-column matrix where each row is of the form `(id of source node, id of destination node, time_stamp)`, which is the same format that the simulation functions in the package produce. Note that `get_statistics` automatically handles both directed and undirected networks. It returns a list containing many statistics that can be used to characterize the network growth process. Notable fields are `m_tk`, containing the number of new edges that connect to a degree-*k* node at time-step *t*, and `node_degree`, containing the degree sequence, i.e., the degree of each node at each time-step.
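The definitions of these two statistics can be illustrated by computing them by hand on a toy network. The sketch below is plain base R and is not the package's implementation. It assumes a directed network in which "degree" means in-degree, treats every distinct time-stamp (including the first) as a time-step, and records degrees just before the edges of a step arrive. The names `m_tk_toy` and `degree_before` are hypothetical, and the layout of the real `m_tk` and `node_degree` fields may differ.

```r
# Toy directed network; each row is (source id, destination id, time_stamp).
toy_graph <- matrix(c(2, 1, 0,
                      3, 1, 1,
                      4, 1, 2,
                      4, 2, 2,
                      5, 3, 3),
                    ncol = 3, byrow = TRUE)

n_nodes <- max(toy_graph[, 1:2])
times   <- sort(unique(toy_graph[, 3]))
max_k   <- nrow(toy_graph)  # no node can exceed this in-degree

# degree_before[t, i]: in-degree of node i just before the edges of step t arrive.
degree_before <- matrix(0, nrow = length(times), ncol = n_nodes,
                        dimnames = list(paste0("t", times),
                                        paste0("node", seq_len(n_nodes))))

# m_tk_toy[t, k]: number of new edges at step t that attach to a degree-k node.
m_tk_toy <- matrix(0, nrow = length(times), ncol = max_k + 1,
                   dimnames = list(paste0("t", times), paste0("k", 0:max_k)))

in_degree <- rep(0, n_nodes)
for (r in seq_along(times)) {
  degree_before[r, ] <- in_degree
  dest <- toy_graph[toy_graph[, 3] == times[r], 2]
  for (d in dest) {
    k <- in_degree[d]
    m_tk_toy[r, k + 1] <- m_tk_toy[r, k + 1] + 1
  }
  # Update degrees only after all edges of this step have been counted.
  in_degree <- in_degree + tabulate(dest, nbins = n_nodes)
}

m_tk_toy
degree_before
```

Every edge is counted exactly once, so the entries of `m_tk_toy` sum to the number of rows of the input matrix.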

The most important functionality of the package is estimating the attachment function and node fitnesses of a temporal network. This is implemented through various methods. There are three usages: estimation of the attachment function in isolation, estimation of the node fitnesses in isolation, and the joint estimation of the attachment function and node fitnesses.
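For orientation, the two estimated quantities can be tied together in a single expression. The formula below is a sketch of the joint model as summarized from Ref. 4, not a statement of the package's internals: at each time-step, the probability that a new edge attaches to node *v_i*, whose current degree is *k_i(t)*, is taken to be proportional to the product of the attachment function and that node's fitness. The symbols *v_i* and *k_i(t)* and the explicit normalizing sum are notation assumed here.

```latex
\Pr\bigl(\text{a new edge at time-step } t \text{ attaches to } v_i\bigr)
  \;=\; \frac{A_{k_i(t)}\,\eta_i}{\sum_{j} A_{k_j(t)}\,\eta_j}
```

Under this reading, estimating the attachment function in isolation corresponds to holding all *η_i* equal, and estimating node fitnesses in isolation corresponds to fixing *A_k* in advance.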

The functions for estimating the attachment function in isolation are `Jeong` for Jeong's method (Ref. 1), `Newman` for Newman's method (Ref. 2), and `only_A_estimate` for the PAFit method (Ref. 3).

For estimation of node fitnesses in isolation, `only_F_estimate` implements a variant of the PAFit method (Ref. 4).

For the joint estimation of the attachment function and node fitnesses, we implement the full version of the PAFit method in `joint_estimate` (Ref. 4).

In all cases, these functions work from the output object of `get_statistics`; in the example at the end of this page, `joint_estimate` is also passed the network matrix itself. The output object of these functions contains the estimation results as well as additional information about the estimation process. The estimated attachment function and/or node fitnesses can be plotted by calling `plot` directly on this output object. The plot shows not only the estimates but also, where possible, the remaining uncertainty.

Thong Pham <thongpham@thongpham.net>, Paul Sheridan, and Hidetoshi Shimodaira.

1. Jeong, H., Néda, Z. & Barabási, A. (2003). Measuring Preferential Attachment in Evolving Networks. Europhysics Letters 61(4):567-572. doi:10.1209/epl/i2003-00166-9 (http://iopscience.iop.org/article/10.1209/epl/i2003-00166-9/fulltext/).

2. Newman, M. (2001). Clustering and Preferential Attachment in Growing Networks. Physical Review E 64(2):025102. doi:10.1103/PhysRevE.64.025102 (https://journals.aps.org/pre/abstract/10.1103/PhysRevE.64.025102).

3. Pham, T., Sheridan, P. & Shimodaira, H. (2015). PAFit: A Statistical Method for Measuring Preferential Attachment in Temporal Complex Networks. PLoS ONE 10(9):e0137796. doi:10.1371/journal.pone.0137796 (http://dx.doi.org/10.1371/journal.pone.0137796).

4. Pham, T., Sheridan, P. & Shimodaira, H. (2016). Joint Estimation of Preferential Attachment and Node Fitness in Growing Complex Networks. Scientific Reports 6:32558. doi:10.1038/srep32558 (https://www.nature.com/articles/srep32558).

See the accompanying vignette for a tutorial.

See also the GitHub page.

```r
## Not run:
### Jointly estimate the attachment function and node fitnesses
library("PAFit")

# a Bianconi-Barabasi network
# size of initial network = 100
# number of new nodes at each time-step = 100
# Ak = k; inverse variance of distribution of fitness: s = 5
net <- generate_BB(N = 1000, m = 50,
                   num_seed = 100, multiple_node = 100,
                   s = 5)
net_stats <- get_statistics(net$graph)

# joint estimation of attachment function Ak and node fitness
result <- joint_estimate(net$graph, net_stats)
summary(result)

# plot the estimated attachment function
plot(result, net_stats)

# true function
true_A <- pmax(result$estimate_result$center_k, 1)
lines(result$estimate_result$center_k, true_A, col = "red") # true line
legend("topleft", legend = "True function", col = "red", lty = 1, bty = "n")

# plot distribution of estimated node fitnesses
plot(result, net_stats, plot = "f")

# plot the estimated node fitnesses and true node fitnesses
plot(result, net_stats, true = net$fitness, plot = "true_f")
## End(Not run)
```
