A package for estimating preferential attachment and node fitness generative mechanisms in temporal complex networks.
|Authors:||Thong Pham, Paul Sheridan, Hidetoshi Shimodaira|
|Maintainer:||Thong Pham firstname.lastname@example.org|
The PAFit package provides a framework for studying the growth mechanisms of temporal complex networks. In particular, it implements functions to simulate various temporal network models, gather essential network statistics from raw input data, and use these summary statistics to estimate the attachment function A_k and the node fitnesses η_i. The computationally heavy parts of the package are implemented in C++ through the Rcpp package. In addition, the code is parallelized with OpenMP, so users with a multi-core machine get a speed-up with no additional effort. Apart from the main functions, the package also includes a real-world collaboration network between scientists in the field of complex networks (coauthor.net). The main package functionalities are as follows.
Firstly, most well-known temporal network models based on the preferential attachment (PA) and node fitness mechanisms can be easily simulated using the package. PAFit implements generate_BA for the Barabási-Albert (BA) model, generate_ER for the growing Erdős–Rényi (ER) model, generate_BB for the Bianconi-Barabási (BB) model, and generate_fit_only for the Caldarelli model. These functions have many customizable options; for example, the number of new edges at each time-step is a tunable stochastic variable. They are wrappers of the more powerful generate_net function, which simulates networks with more flexible attachment function and node fitness settings. In every case, the output of these functions is a list with two fields: graph and fitness. The graph field contains the temporal network as a three-column matrix in which each row has the form (id of source node, id of destination node, time_stamp). The fitness field contains the true node fitnesses.
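To make the three-column format concrete, here is a minimal base-R sketch that grows a toy network by preferential attachment and stores it in that layout. It is not the package's generator: the helper name simulate_toy_pa, the one-edge seed network, the choice of one new node and one new edge per time-step, and the weights A_k = max(k, 1) on in-degree are all assumptions made for illustration.

```r
# Hypothetical helper for illustration only; not part of PAFit.
simulate_toy_pa <- function(n_steps = 50, seed = 1) {
  set.seed(seed)
  # Assumed seed network: one edge from node 2 to node 1 at time 0.
  edges <- matrix(c(2, 1, 0), ncol = 3)
  in_degree <- c(1, 0)  # in-degrees of nodes 1 and 2
  for (t in seq_len(n_steps)) {
    new_node <- length(in_degree) + 1
    # Assumed attachment weights A_k = max(k, 1), so that
    # degree-0 nodes can still be selected.
    weights <- pmax(in_degree, 1)
    target <- sample.int(length(in_degree), size = 1, prob = weights)
    # One row per edge: (source, destination, time_stamp).
    edges <- rbind(edges, c(new_node, target, t))
    in_degree[target] <- in_degree[target] + 1
    in_degree <- c(in_degree, 0)  # the new node starts with in-degree 0
  }
  colnames(edges) <- c("source", "destination", "time_stamp")
  edges
}

toy_graph <- simulate_toy_pa(n_steps = 50)
head(toy_graph)
```

Each row records a single edge, so a time-step with several new edges would simply contribute several rows sharing the same time_stamp.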
Secondly, the function get_statistics efficiently collects all temporal network summary statistics. The input network is assumed to be stored as a three-column matrix in which each row has the form (id of source node, id of destination node, time_stamp), the same format that the simulation functions in the package produce. get_statistics automatically handles both directed and undirected networks. It returns a list containing many statistics that characterize the network growth process. Notable fields are m_tk, the number of new edges that connect to a degree-k node at time-step t, and node_degree, the degree sequence, i.e., the degree of each node at each time-step.
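As a rough illustration of what a statistic such as m_tk counts, the following base-R sketch tabulates, for a tiny hand-made directed network, how many new edges at each time-step land on nodes of each degree. It is not the package's implementation: the helper name count_toy_stats is hypothetical, "degree" is taken to mean in-degree, and the edges at the earliest time stamp are treated as the initial network and not counted.

```r
# Toy directed edge list: (source, destination, time_stamp).
toy_graph <- rbind(
  c(2, 1, 0),
  c(3, 1, 1),
  c(4, 1, 2),
  c(5, 3, 2),
  c(6, 1, 3)
)

# Hypothetical helper for illustration only; not part of PAFit.
count_toy_stats <- function(graph) {
  times <- sort(unique(graph[, 3]))
  n_nodes <- max(graph[, 1:2])
  in_degree <- integer(n_nodes)
  records <- list()
  for (t in times) {
    targets <- graph[graph[, 3] == t, 2]
    if (t > times[1]) {
      # Record the degree each target had just before time t.
      records[[length(records) + 1]] <-
        data.frame(t = t, k = in_degree[targets])
    }
    # Update degrees only after recording; tabulate() copes with
    # a node receiving several edges in the same time-step.
    in_degree <- in_degree + tabulate(targets, nbins = n_nodes)
  }
  rec <- do.call(rbind, records)
  list(m_tk = table(t = rec$t, k = rec$k),
       final_in_degree = in_degree)
}

stats <- count_toy_stats(toy_graph)
stats$m_tk
stats$final_in_degree
```

Recording the degrees before applying the update matters: edges arriving in the same time-step are all attributed to the degrees observed at the start of that step.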
The most important functionality of the package is estimating the attachment function and node fitnesses of a temporal network. This is implemented through various methods. There are three usages: estimation of the attachment function in isolation, estimation of the node fitnesses in isolation, and the joint estimation of the attachment function and node fitnesses.
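For orientation, the two quantities being estimated can be related in a single expression. The notation below is a sketch and not quoted from this help page: k_i(t) is assumed to denote the degree of node v_i at time-step t, and the statement of the special cases follows the usual definitions of the models named earlier.

```latex
\[
  \Pr\bigl(\text{node } v_i \text{ receives a new edge at time-step } t\bigr)
  \;\propto\; A_{k_i(t)} \times \eta_i
\]
% Assumed special cases:
%   \eta_i = 1 for all i, A_k increasing in k : preferential attachment only (e.g. the BA model)
%   A_k = 1 for all k, \eta_i varying         : node fitness only (e.g. the Caldarelli model)
%   both A_k and \eta_i varying               : joint model (e.g. the BB model)
```

Under this reading, estimating the attachment function in isolation amounts to fixing every η_i to the same value, and estimating the fitnesses in isolation amounts to fixing the form of A_k.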
The functions for estimating the attachment function in isolation are Jeong for Jeong's method (Ref. 1), Newman for Newman's method (Ref. 2), and only_A_estimate for the PAFit method (Ref. 3).
For estimation of node fitnesses in isolation, only_F_estimate implements a variant of the PAFit method (Ref. 4).
For the joint estimation of the attachment function and node fitnesses, the full version of the PAFit method is implemented in joint_estimate (Ref. 4).
In all cases, the input of these functions is the output object of the function get_statistics. The output object of these functions contains the estimation results as well as some additional information pertaining to the estimation process. The estimated attachment function and/or node fitnesses can be plotted by using the plot command directly on this output object. This visualizes not only the estimated results but also the remaining uncertainties when possible.
Thong Pham email@example.com, Paul Sheridan, and Hidetoshi Shimodaira.
1. Jeong, H., Néda, Z. & Barabási, A. (2003). Measuring Preferential Attachment in Evolving Networks. Europhysics Letters 61(4):567-572. doi:10.1209/epl/i2003-00166-9 (http://iopscience.iop.org/article/10.1209/epl/i2003-00166-9/fulltext/).
2. Newman, M. (2001). Clustering and Preferential Attachment in Growing Networks. Physical Review E 64(2):025102. doi:10.1103/PhysRevE.64.025102 (https://journals.aps.org/pre/abstract/10.1103/PhysRevE.64.025102).
3. Pham, T., Sheridan, P. & Shimodaira, H. (2015). PAFit: A Statistical Method for Measuring Preferential Attachment in Temporal Complex Networks. PLoS ONE 10(9):e0137796. doi:10.1371/journal.pone.0137796 (http://dx.doi.org/10.1371/journal.pone.0137796).
4. Pham, T., Sheridan, P. & Shimodaira, H. (2016). Joint Estimation of Preferential Attachment and Node Fitness in Growing Complex Networks. Scientific Reports 6, Article number: 32558. doi:10.1038/srep32558 (www.nature.com/articles/srep32558).
See the accompanying vignette for a tutorial.
See also the GitHub page.
## Not run:
### Jointly estimate the attachment function and node fitnesses
library("PAFit")

# a Bianconi-Barabasi network
# size of initial network = 100
# number of new nodes at each time-step = 100
# Ak = k; inverse variance of distribution of fitness: s = 5
net <- generate_BB(N = 1000, m = 50, num_seed = 100, multiple_node = 100, s = 5)
net_stats <- get_statistics(net$graph)

# Joint estimation of attachment function Ak and node fitness
result <- joint_estimate(net$graph, net_stats)
summary(result)

# plot the estimated attachment function
plot(result, net_stats)

# true function
true_A <- pmax(result$estimate_result$center_k, 1)
lines(result$estimate_result$center_k, true_A, col = "red") # true line
legend("topleft", legend = "True function", col = "red", lty = 1, bty = "n")

# plot distribution of estimated node fitnesses
plot(result, net_stats, plot = "f")

# plot the estimated node fitnesses and true node fitnesses
plot(result, net_stats, true = net$fitness, plot = "true_f")
## End(Not run)