Estimating the attachment function in isolation by the PAFit method


This function estimates the attachment function A_k by the PAFit method. The method has a hyper-parameter r that controls the regularization of A_k. The function first performs cross-validation to select the optimal r, then uses that r to estimate the attachment function with the full data.


only_A_estimate(net_object                             ,
                net_stat   = get_statistics(net_object),
                p          = 0.75                      ,
                stop_cond  = 10^-8                     ,
                mode_reg_A = 0                         ,
                MLE        = FALSE                     ,
                ...)



net_object: An object of class PAFit_net that contains the network.


net_stat: An object of class PAFit_data which contains the summarized statistics needed in estimation. This object is created by the function get_statistics. The default value is get_statistics(net_object).


p: Numeric. The ratio of the number of new edges in the learning data to that in the full data. The data is divided into two parts, learning data and testing data, based on p. The learning data is used to estimate the attachment function and the testing data is then used in cross-validation. Default value is 0.75.
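One way to picture the split is sketched below in base R. It assumes new edges are counted per time-step and that the learning data ends at the first time-step where the cumulative share of new edges reaches p. The edge counts and the variable names are made up for illustration, and this is not necessarily how PAFit performs the split internally.

```r
# Toy number of new edges arriving at each time-step (illustrative only)
new_edges <- c(10, 20, 30, 20, 20)
p         <- 0.75

# Cumulative share of new edges seen up to each time-step
cum_ratio <- cumsum(new_edges) / sum(new_edges)

# Assumed rule: the learning data ends at the first time-step whose
# cumulative share of new edges is at least p
split_at       <- which(cum_ratio >= p)[1]
learning_steps <- seq_len(split_at)
testing_steps  <- setdiff(seq_along(new_edges), learning_steps)

# Share of new edges that actually ends up in the learning data
learning_ratio <- sum(new_edges[learning_steps]) / sum(new_edges)
```

Because edges arrive in whole time-steps, the realized share (learning_ratio) can differ slightly from the requested p.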


stop_cond: Numeric. The iterative algorithm stops when abs(h(ii) - h(ii + 1)) / (abs(h(ii)) + 1) < stop_cond, where h(ii) is the value of the objective function at iteration ii. We recommend choosing stop_cond at most equal to 10^(-(number of digits of h) - 2), so that when the algorithm stops, the increase in posterior probability is less than 1% of the current posterior probability. The default, 10^-8, is good enough for most applications.
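The stopping rule can be checked by hand. The sketch below is base R only; has_converged is a hypothetical helper, not a PAFit function, and the objective values are made up rather than taken from a real run.

```r
# Hypothetical helper: TRUE when the relative change in the objective
# between two consecutive iterations is below stop_cond.
has_converged <- function(h_prev, h_next, stop_cond = 10^-8) {
  abs(h_prev - h_next) / (abs(h_prev) + 1) < stop_cond
}

# Toy objective values over iterations (illustrative, not PAFit output)
h <- c(-1500.0, -1320.5, -1310.2, -1310.1999999)

# Apply the rule to each pair of consecutive iterations
converged <- vapply(seq_len(length(h) - 1),
                    function(ii) has_converged(h[ii], h[ii + 1]),
                    logical(1))
print(converged)
```

The "+ 1" in the denominator keeps the ratio well defined when the objective value is close to zero.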


mode_reg_A: Binary. Indicates which regularization term is used for A_k:

  • 0: This is the regularization term used in Ref. 1 and 2. Please refer to Eq. (4) in the tutorial for the definition of the term. It approximately enforces the power-law form A_k = k^\alpha. This is the default value.

  • 1: Unlike the default, this regularization term exactly enforces the functional form A_k = k^\alpha. Please refer to Eq. (6) in the tutorial for the definition of the term. Its main drawback is that it is significantly slower to converge, while its gain over the default is marginal in most cases.
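Both terms are built around the power-law form A_k = k^\alpha, which is a straight line with slope \alpha on log-log axes. The base R sketch below illustrates that relationship on synthetic data; the noise level and the true exponent are assumptions chosen for the illustration. It uses an ordinary least-squares fit and is not necessarily the procedure PAFit uses to produce its own alpha and ci.

```r
set.seed(1)

k     <- 1:100
alpha <- 0.8                                         # assumed true exponent
A     <- k^alpha * exp(rnorm(length(k), sd = 0.05))  # noisy power law

# log A_k = alpha * log k, so the slope of this regression estimates alpha
fit       <- lm(log(A) ~ log(k))
alpha_hat <- unname(coef(fit)[2])

# 95% confidence interval for the slope
ci <- confint(fit)[2, ]

print(alpha_hat)
print(ci)
```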


MLE: Logical. If TRUE, cross-validation is skipped and the PA function is estimated with r = 0, i.e., by maximum likelihood estimation. Default is FALSE. One might want to set this option to TRUE when one believes that there are sufficient data to get a reasonable MLE result, or when one wants to compare the default, regularized result with the MLE result.
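A minimal sketch of such a comparison follows. It assumes the PAFit package is installed and reuses the simulation settings from the first example on this page. The variable names result_reg and result_mle are illustrative.

```r
library(PAFit)

# Simulated network with linear preferential attachment
net       <- generate_net(N = 1000, m = 50, mode = 1, alpha = 1, s = 0)
net_stats <- get_statistics(net, only_PA = TRUE)

# Default: r is selected by cross-validation
result_reg <- only_A_estimate(net, net_stats)

# MLE: no cross-validation, r = 0
result_mle <- only_A_estimate(net, net_stats, MLE = TRUE)

# Compare the estimated attachment exponents of the two fits
print(result_reg$estimate_result$alpha)
print(result_mle$estimate_result$alpha)
```

With MLE = TRUE the cv_data and cv_result fields of the returned object are NULL, since no cross-validation is run.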


...: Other arguments to pass to the underlying algorithm.


Outputs a Full_PAFit_result object, which is a list containing the following fields:

  • cv_data: a CV_Data object which contains the cross-validation data. Normally the user does not need to pay attention to this data. NULL if MLE = TRUE.

  • cv_result: a CV_Result object which contains the cross-validation result. Normally the user does not need to pay attention to this data. NULL if MLE = TRUE.

  • estimate_result: a PAFit_result object which contains the estimated PA function and its confidence interval. It also includes the estimated attachment exponent \alpha (assuming the model A_k = k^\alpha) in the field alpha, and the confidence interval of \alpha (in the field ci) when possible. In particular, the important fields are:

    • ratio: this is the selected value for the hyper-parameter r.

    • k and A: a degree vector and the estimated PA function.

    • var_A: the estimated variance of A.

    • var_logA: the estimated variance of log A.

    • upper_A: the upper value of the interval of two standard deviations around A.

    • lower_A: the lower value of the interval of two standard deviations around A.

    • center_k and theta: when we perform binning, these are the centers of the bins and the estimated PA values for those bins. theta is similar to A but with duplicated values removed.

    • var_bin: the variance of theta. Same as var_A but with duplicated values removed.

    • upper_bin: the upper value of the interval of two standard deviations around theta. Same as upper_A but with duplicated values removed.

    • lower_bin: the lower value of the interval of two standard deviations around theta. Same as lower_A but with duplicated values removed.

    • g: the number of bins used.

    • alpha and ci: alpha is the estimated attachment exponent \alpha (assuming A_k = k^\alpha), while ci is its confidence interval.

    • loglinear_fit: this is the fitting result when we estimate \alpha.

    • objective_value: values of the objective function over iterations in the final run with the full data.

    • diverge_zero: a logical value indicating whether the algorithm diverged in the final run with the full data.
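The fields listed above can be read directly from the returned list. The sketch below assumes the PAFit package is installed and reuses the simulation settings from the first example on this page. The variable name est is illustrative.

```r
library(PAFit)

net       <- generate_net(N = 1000, m = 50, mode = 1, alpha = 1, s = 0)
net_stats <- get_statistics(net, only_PA = TRUE)
result    <- only_A_estimate(net, net_stats)

est <- result$estimate_result

print(est$ratio)           # selected value of the hyper-parameter r
print(est$alpha)           # estimated attachment exponent
print(est$ci)              # its confidence interval, when available
print(est$g)               # number of bins used
print(head(est$center_k))  # first few bin centers
print(head(est$theta))     # estimated PA values for those bins
print(est$diverge_zero)    # whether the final run diverged
```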


## Not run: 
  #### Example 1: Linear preferential attachment  #########
  # a network from BA model
  net        <- generate_net(N = 1000 , m = 50 , mode = 1, alpha = 1, s = 0)
  net_stats  <- get_statistics(net, only_PA = TRUE)
  result     <- only_A_estimate(net, net_stats)
  # plot the estimated attachment function
  plot(result, net_stats)
  # true function
  true_A     <- result$estimate_result$center_k
  lines(result$estimate_result$center_k, true_A, col = "red") # true line
  legend("topleft" , legend = "True function" , col = "red" , lty = 1 , bty = "n")
  #### Example 2: a non-log-linear preferential attachment  #########
  # A_k = alpha* log (max(k,1))^beta + 1, with alpha = 2, and beta = 2
  net        <- generate_net(N = 1000 , m = 50 , mode = 3, alpha = 2, beta = 2, s = 0)
  net_stats  <- get_statistics(net,only_PA = TRUE)
  result     <- only_A_estimate(net, net_stats)
  # plot the estimated attachment function
  plot(result, net_stats)
  # true function
  true_A     <- 2 * log(pmax(result$estimate_result$center_k,1))^2 + 1 # true function
  lines(result$estimate_result$center_k, true_A, col = "red") # true line
  legend("topleft" , legend = "True function" , col = "red" , lty = 1 , bty = "n")
  #### Example 3: another non-log-linear preferential attachment kernel ############
  # A_k = min(max(k,1),sat_at)^alpha, with alpha = 1, and sat_at = 200
  # s = 0 is passed below, as in the other examples, so node fitness plays no role here
  net        <- generate_net(N = 1000 , m = 50 , mode = 2, alpha = 1, sat_at = 200, s = 0)
  net_stats  <- get_statistics(net, only_PA = TRUE)
  result     <- only_A_estimate(net, net_stats)
  # true function, computed first because max_A below uses it
  true_A     <- pmin(pmax(result$estimate_result$center_k,1),200)^1
  # plot the estimated attachment function
  plot(result , net_stats, max_A = max(true_A,result$estimate_result$theta))
  lines(result$estimate_result$center_k, true_A, col = "red") # true line
  legend("topleft" , legend = "True function" , col = "red" , lty = 1 , bty = "n")
## End(Not run)

