PAFit: Joint inference of preferential attachment and node fitness...


Description

From a PAFit_data object, which contains summary statistics of the dataset, PAFit estimates the attachment function A_k and node fitness η_i by penalized log-likelihood maximization. It also quantifies the uncertainty in the estimates by approximating the confidence intervals of A_k and η_i.

Estimating either the attachment function or node fitness in isolation is also supported. Estimation of the PA function alone, with η_i = 1, is specified by setting only_PA = TRUE. Estimation of node fitness alone, with either A_k = k or A_k = 1 (see mode_f), is specified by setting only_f = TRUE.

Usage

PAFit (net_stat, 
       only_PA        = FALSE       , only_f         = FALSE       , 
       mode_f         = "Linear_PA" ,
       true_A         = NULL        , true_f         = NULL        , 
       
       mode_reg_A     = 0           , weight_PA_mode = 1           ,
       s              = 10          , lambda         = 1           , 
       auto_lambda    = TRUE        , r              = 0.01        , 
       
       alpha_start    = 1           , start_mode_A   = "Log_linear", 
       start_mode_f   = "Constant"  ,
       
       auto_stop      = TRUE        , stop_cond      = 10^-7       , 
       iteration      = 200         , max_iter       = 2e+05       , 
       debug          = FALSE       , q              = 1           ,   
       step_size      = 0.5         ,
      
       normalized_f   = FALSE       , interpolate    = FALSE)

Arguments

The parameters can be divided into five groups based on what they specify.

The first group specifies basic instructions for the algorithm:

net_stat

An object of class "PAFit_data" containing the summary statistics computed from the data by the function GetStatistics.

only_PA

Logical. TRUE means that the attachment function A_k is estimated in isolation (fixing η_i = 1). Default is FALSE.

only_f

Logical. TRUE means that node fitnesses are estimated in isolation, with the PA function specified by mode_f. Default is FALSE.

mode_f

String. Possible values: "Linear_PA", "Constant_PA" or "Log_linear". In the first two cases, the PA function is fixed: if mode_f == "Linear_PA", then A_k = k for k ≥ 1 and A_0 = 1; if mode_f == "Constant_PA", then A_k = 1 for all k. In the final case, mode_f == "Log_linear", we set A_k = k^α for k ≥ 1 and A_0 = 1, and α is estimated as well. Default value is "Linear_PA".
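The three choices of mode_f can be summarized by the sketch below. pa_values() is a hypothetical helper written only to illustrate the definitions above; it is not part of the PAFit package.

```r
# Hypothetical helper (not in PAFit): the PA function implied by each mode_f value.
pa_values <- function(k, mode_f = c("Linear_PA", "Constant_PA", "Log_linear"),
                      alpha = 1) {
  mode_f <- match.arg(mode_f)
  A <- switch(mode_f,
    Linear_PA   = as.numeric(k),      # A_k = k
    Constant_PA = rep(1, length(k)),  # A_k = 1 for all k
    Log_linear  = as.numeric(k)^alpha # A_k = k^alpha
  )
  A[k == 0] <- 1                      # A_0 = 1 in every mode
  A
}

pa_values(0:4, "Linear_PA")               # 1 1 2 3 4
pa_values(0:4, "Log_linear", alpha = 0.5)
```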

true_A

Numeric vector. User-supplied values of the PA function. If true_A is supplied, only node fitnesses are estimated.

true_f

Numeric vector. User-supplied values of node fitnesses. If true_f is supplied, only the PA function is estimated.

The second group specifies the objective function, i.e., the regularization terms for the PA function and node fitness and how they are weighted:

mode_reg_A

Integer. Possible values: 0, 1 or 2. Indicates which regularization term is used for the PA function. To use the regularization function of the PLoS ONE and Scientific Reports papers (References 2 and 3), set this to 0. Default value is 0.

weight_PA_mode

Binary (0 or 1). Indicates how the regularization terms for A_k are weighted. If weight_PA_mode == 0, the regularization term for A_k is weighted by the total number of edges connected to degree-k nodes. If weight_PA_mode == 1, the regularization terms have uniform weights. Default value is 1.
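As a rough illustration of how such a penalty behaves, the sketch below computes a weighted sum of squared second-order differences of log A_k, scaled by lambda; in a penalized log-likelihood this quantity would be subtracted from the log-likelihood. This form is an assumption based on our reading of the PLoS ONE paper and may differ in detail from what PAFit implements for mode_reg_A = 0; pa_penalty() is hypothetical. Uniform weights (w = 1) correspond to weight_PA_mode == 1, while per-degree edge counts passed as w would correspond to weight_PA_mode == 0.

```r
# Hypothetical sketch: lambda * sum_k w_k * (log A_{k+1} - 2 log A_k + log A_{k-1})^2.
# ASSUMED form of the mode_reg_A = 0 penalty; not taken from the PAFit source.
pa_penalty <- function(A, lambda, w = rep(1, length(A))) {
  logA <- log(A)
  n <- length(logA)
  if (n < 3) return(0)       # a second difference needs three consecutive values
  i <- 2:(n - 1)
  d2 <- logA[i + 1] - 2 * logA[i] + logA[i - 1]
  lambda * sum(w[i] * d2^2)  # w = 1 everywhere: uniform weights
}

pa_penalty(rep(5, 4), lambda = 1)        # 0: constant A_k is not penalized
pa_penalty(exp(c(0, 1, 3)), lambda = 2)  # 2 * (3 - 2 * 1 + 0)^2 = 2
```

The penalty vanishes whenever log A changes by the same amount between consecutive entries, so it discourages jagged estimates without forcing a particular level.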

s

Positive numeric. The regularization parameter s for node fitness. Default value is 10.
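To see why a larger s means stronger regularization, assume (as we understand the Scientific Reports paper) that each η_i has a Gamma prior with shape and rate both equal to s. That prior has mean 1 and variance 1/s, so increasing s pulls fitnesses toward 1. fitness_log_prior() is a hypothetical helper for this assumed prior, not a PAFit function.

```r
# ASSUMPTION: eta_i ~ Gamma(shape = s, rate = s), which has mean 1 and variance 1/s.
# Hypothetical helper: log prior density summed over all node fitnesses.
fitness_log_prior <- function(eta, s = 10) {
  sum(dgamma(eta, shape = s, rate = s, log = TRUE))
}

# Fitnesses near 1 score higher than spread-out ones under s = 10
fitness_log_prior(c(1, 1), s = 10) > fitness_log_prior(c(0.5, 1.5), s = 10)  # TRUE
```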

lambda

Non-negative numeric. The absolute strength of the regularization for the PA function. Ignored when auto_lambda == TRUE. Default value is 1. lambda == 0 means no regularization for the PA function.

auto_lambda

Logical. If auto_lambda == TRUE, lambda is determined automatically from the data using r. Default is TRUE.

r

Non-negative numeric. The regularization parameter r for the PA function indicates the relative strength of the regularization term. From r, the value of lambda is automatically determined if auto_lambda == TRUE. Default value is 0.01.

The third group specifies the initial values of the PA function and node fitnesses:

alpha_start

Non-negative numeric. The starting value for α when the model A_k = k^α is used. Default value is 1.

start_mode_A

String. Takes one of two values: "Log_linear" (the initial PA function set to k^alpha_start) or "Random" (the initial function is randomly sampled from a uniform distribution). Default value is "Log_linear".

start_mode_f

String. Takes one of two values: "Constant" (the initial node fitnesses are all set to 1) or "Random" (the initial node fitnesses are randomly sampled from a gamma distribution). Default value is "Constant".
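The initialization options can be sketched as follows. init_values() is hypothetical, and the parameters of the uniform and gamma distributions (the unit interval; shape = rate = s) are assumptions, since only the distribution families are documented above.

```r
# Hypothetical sketch of the start_mode_A / start_mode_f options.
# Distribution parameters are ASSUMPTIONS; only the families are documented.
init_values <- function(k, n_nodes, start_mode_A = "Log_linear",
                        start_mode_f = "Constant", alpha_start = 1, s = 10) {
  A <- if (start_mode_A == "Log_linear") {
    ifelse(k == 0, 1, k^alpha_start)      # k^alpha_start, keeping A_0 = 1 (assumed)
  } else {
    runif(length(k))                      # "Random": uniform on (0, 1) (assumed range)
  }
  f <- if (start_mode_f == "Constant") {
    rep(1, n_nodes)                       # every fitness starts at 1
  } else {
    rgamma(n_nodes, shape = s, rate = s)  # "Random": gamma draws (assumed parameters)
  }
  list(A = A, f = f)
}

init_values(0:3, n_nodes = 5)$A  # 1 1 2 3
```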

The fourth group concerns the iterative process:

auto_stop

Logical. Indicates whether the algorithm stops automatically. Default is TRUE.

stop_cond

Numeric. If auto_stop == TRUE, the iterative algorithm stops when abs(h(ii) - h(ii + 1)) / (abs(h(ii)) + 1) < stop_cond, where h(ii) is the value of the objective function at iteration ii. We recommend choosing stop_cond no larger than 10^(-(d + 2)), where d is the number of digits of h. Since abs(h) < 10^d, the increase in h (the log posterior) when the algorithm stops is then below about 0.01, so the posterior probability increases by less than about 1%. Default is 10^-7.
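The stopping rule and the recommended bound on stop_cond translate directly into code. Both helpers are hypothetical; recommended_stop_cond() counts the digits in the integer part of abs(h), which is our reading of "number of digits of h".

```r
# Relative-change stopping rule described for stop_cond (hypothetical helper).
converged <- function(h_prev, h_next, stop_cond = 1e-7) {
  abs(h_prev - h_next) / (abs(h_prev) + 1) < stop_cond
}

# Recommended upper bound 10^-(d + 2), with d = digits in the integer part of |h|.
recommended_stop_cond <- function(h) {
  d <- nchar(format(floor(abs(h)), scientific = FALSE))
  10^(-(d + 2))
}

converged(-12345.6, -12345.6005)  # TRUE
recommended_stop_cond(-12345.6)   # 1e-07
```

For an objective value in the tens of thousands, the recommended bound coincides with the default stop_cond of 10^-7.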

iteration

Integer. The number of iterations. Ignored if auto_stop == TRUE. Default value is 200.

max_iter

Integer. The maximum number of iterations. Regardless of other settings, the algorithm will stop once the number of iterations reaches this threshold. Default value is 2e+05.

debug

Logical. If debug == TRUE, the value of the objective function h is printed at each step. Default is FALSE.

q

Integer. The number of previous steps used in the quasi-Newton speedup. Ignored if q <= 1. Default is 1.

step_size

Numeric. A number in (0, 1] giving the step size of the quasi-Newton speedup. Ignored (no quasi-Newton speedup) if q <= 1. Default is 0.5.

The final group gives some additional instructions:

normalized_f

Logical. Indicates whether we should normalize the estimated value of node fitness after estimation. Default value is FALSE.

interpolate

Logical. Indicates whether we should perform interpolation for the missing gaps in the estimated A_k. The interpolation, if performed, is a linear regression on log-scale. Default value is FALSE.
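One plausible reading of "linear regression on log-scale" is to fit log A_k = α log k + C to the available estimates and predict the missing entries, as in the hypothetical helper below. PAFit's actual interpolation may differ.

```r
# Hypothetical sketch: fill NA entries of A by a log-log linear regression.
interpolate_A <- function(k, A) {
  dat  <- data.frame(logk = log(k), logA = log(A))
  ok   <- is.finite(dat$logk) & is.finite(dat$logA)  # usable points (k > 0, A > 0, not NA)
  fit  <- lm(logA ~ logk, data = dat[ok, ])          # log A_k = alpha * log k + C
  miss <- is.na(A) & is.finite(dat$logk)             # gaps that can be predicted
  A[miss] <- exp(predict(fit, newdata = dat[miss, , drop = FALSE]))
  A
}

interpolate_A(1:6, c(1, sqrt(2), sqrt(3), NA, sqrt(5), sqrt(6)))  # fourth entry becomes 2
```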

Value

An object of class "PAFit_result", which is a list. Its most important fields can be divided into five groups.

The first group gives the estimated preferential attachment function:

k

The observed degree vector

A

The estimated attachment function corresponding to k

center_k

The logarithmic center of the bins

theta

Preferential attachment value corresponding to center_k (before mapping back to A_k)

weight_of_A

The number of A_k values in each bin

loglinear_fit

Result of fitting the log-linear model log A_k = α log k + C to the estimated A_k

alpha

The estimated attachment exponent of the log-linear model A_k = k^α

ci

The confidence interval of the attachment exponent α. It is a two-sigma interval. When mode_f != "Log_linear", the interval is instead computed from the log-linear fit (regressing log A_k on log k) with the confint function, so it is the usual 95% confidence interval.
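The log-linear fit and its 95% interval can be reproduced with base R's lm() and confint(). The A_k values below are made up for illustration and are not PAFit output.

```r
# Illustrative log-linear fit; the A_k values are made up, not PAFit output.
k <- c(1, 2, 4, 8, 16, 32)
A <- k^0.8 * exp(c(0.05, -0.03, 0.02, -0.04, 0.01, -0.01))
fit <- lm(log(A) ~ log(k))             # log A_k = alpha * log k + C
alpha_hat <- unname(coef(fit)[2])      # estimated attachment exponent
ci <- confint(fit, level = 0.95)[2, ]  # 95% confidence interval for alpha
alpha_hat
ci
```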

alpha_series

The series of α over iterations if mode_f == "Log_linear"

The second group gives the confidence intervals of the estimated PA function:

var_A

Variances of the estimated A

var_logA

Variances of log(A)

upper_A

The upper value of the two-sigma confidence interval of A

lower_A

The lower value of the two-sigma confidence interval of A

upper_bin

The upper value of the two-sigma confidence interval of theta

lower_bin

The lower value of the two-sigma confidence interval of theta
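Given var_logA, one common way to build a two-sigma interval that stays positive is to work on the log scale, as sketched below. Whether PAFit constructs upper_A and lower_A exactly this way is an assumption; two_sigma_interval() is hypothetical.

```r
# ASSUMPTION: two-sigma interval built on the log scale from var_logA.
two_sigma_interval <- function(A, var_logA) {
  sd_log <- sqrt(var_logA)
  list(lower = exp(log(A) - 2 * sd_log),  # two standard deviations below, log scale
       upper = exp(log(A) + 2 * sd_log))  # two standard deviations above, log scale
}

two_sigma_interval(A = 2, var_logA = 0.01)  # lower = 2 * exp(-0.2), upper = 2 * exp(0.2)
```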

The third group gives the estimated node fitnesses:

f

The estimated node fitnesses η

The fourth group gives the confidence intervals of the estimated node fitnesses:

var_f

Variances of the estimated node fitnesses

upper_f

The upper value of the two-sigma confidence interval of node fitness η

lower_f

The lower value of the two-sigma confidence interval of node fitness η

The final group gives additional information on the iterative process:

objective_value

Values of the objective function h (posterior probability in log-scale) recorded at each iteration

Author(s)

Thong Pham thongpham@thongpham.net

References

1. Pham, T., Sheridan, P. & Shimodaira, H. (2016). Nonparametric Estimation of the Preferential Attachment Function in Complex Networks: Evidence of Deviations from Log Linearity, Proceedings of ECCS 2014, 141-153 (Springer International Publishing) (http://dx.doi.org/10.1007/978-3-319-29228-1_13).

2. Pham, T., Sheridan, P. & Shimodaira, H. (2015). PAFit: A Statistical Method for Measuring Preferential Attachment in Temporal Complex Networks. PLoS ONE 10(9): e0137796. doi:10.1371/journal.pone.0137796 (http://dx.doi.org/10.1371/journal.pone.0137796).

3. Pham, T., Sheridan, P. & Shimodaira, H. (2016). Joint Estimation of Preferential Attachment and Node Fitness in Growing Complex Networks. Scientific Reports 6, Article number: 32558. doi:10.1038/srep32558 (www.nature.com/articles/srep32558).

Examples

library("PAFit")
net        <- GenerateNet(N = 50, m = 10, mode = 1, alpha = 0.5, shape = 100, rate = 100)
net_stats  <- GetStatistics(net$graph)
result     <- PAFit(net_stats, r = 0.01, s = 100)
summary(result)

