View source: R/joint_estimate.R
This function jointly estimates the attachment function A_k and the node fitnesses η_i. It first performs cross-validation to select the optimal hyper-parameters r and s, then estimates A_k and η_i from the full data using that optimal pair (Ref. 2).
joint_estimate(net_object,
               net_stat   = get_statistics(net_object),
               p          = 0.75,
               stop_cond  = 10^-8,
               mode_reg_A = 0,
               ...)
net_object |
an object of class |
net_stat |
An object of class |
p |
Numeric. This is the ratio of the number of new edges in the learning data to that of the full data. The data is then divided into two parts: learning data and testing data based on |
stop_cond |
Numeric. The iterative algorithm stops when abs(h(ii) - h(ii + 1)) / (abs(h(ii)) + 1) < stop_cond, where h(ii) is the value of the objective function at iteration ii. We recommend to choose |
mode_reg_A |
Binary. Indicates which regularization term is used for A_k:
|
... |
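The stopping rule described for stop_cond can be illustrated with a minimal base R sketch. The helper has_converged and the toy objective sequence h are hypothetical and exist only for illustration; this is not the package's internal code.

```r
# Relative-change stopping rule, as described for the stop_cond argument.
# has_converged() is a hypothetical helper, not part of PAFit.
has_converged <- function(h_current, h_next, stop_cond = 10^-8) {
  abs(h_current - h_next) / (abs(h_current) + 1) < stop_cond
}

# Toy objective values that approach -100 geometrically (illustrative only).
h <- -100 + 0.1^(1:12)

# Apply the rule to every pair of consecutive iterations (ii, ii + 1)
# and find the first iteration at which the algorithm would stop.
converged  <- has_converged(h[-length(h)], h[-1])
first_stop <- which(converged)[1]
print(first_stop)
```

The "+ 1" in the denominator keeps the criterion well defined when the objective value is close to zero, so the rule behaves like an absolute tolerance for small objective values and like a relative tolerance for large ones.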
Outputs a Full_PAFit_result object, which is a list containing the following fields:
cv_data: a CV_Data object which contains the cross-validation data. This is the testing data.
cv_result: a CV_Result object which contains the cross-validation result. Normally the user does not need to pay attention to this data.
estimate_result: a PAFit_result object which contains the estimated attachment function A_k, the estimated fitnesses η_i and their confidence intervals. In particular, the important fields are:
ratio: the selected value for the hyper-parameter r.
shape: the selected value for the hyper-parameter s.
k and A: a degree vector and the estimated PA function.
var_A: the estimated variance of A.
var_logA: the estimated variance of log A.
upper_A: the upper value of the interval of two standard deviations around A.
lower_A: the lower value of the interval of two standard deviations around A.
center_k and theta: when we perform binning, these are the centers of the bins and the estimated PA values for those bins. theta is similar to A but with duplicated values removed.
var_bin: the variance of theta. Same as var_A but with duplicated values removed.
upper_bin: the upper value of the interval of two standard deviations around theta. Same as upper_A but with duplicated values removed.
lower_bin: the lower value of the interval of two standard deviations around theta. Same as lower_A but with duplicated values removed.
g: the number of bins used.
alpha and ci: alpha is the estimated attachment exponent α (assuming A_k = k^α), while ci is its confidence interval.
loglinear_fit: the fitting result obtained when estimating α.
f: the estimated node fitnesses.
var_f: the estimated variance of η_i.
upper_f: the estimated upper value of the interval of two standard deviations around η_i.
lower_f: the estimated lower value of the interval of two standard deviations around η_i.
objective_value: values of the objective function over iterations in the final run with the full data.
diverge_zero: a logical value indicating whether the algorithm diverged in the final run with the full data.
contribution: a list containing an estimate of the contributions of preferential attachment and fitness mechanisms in the growth process of the network. The calculation adapts a quantification method proposed in Section 3 of Ref. 4, which is for preferential attachment and transitivity, to preferential attachment and fitness.
PA_contribution: an array containing the contributions of preferential attachment at each time-step.
fit_contribution: an array containing the contributions of the fitness mechanism at each time-step.
mean_PA_contrib: the average contribution of preferential attachment through the whole growth process.
mean_fit_contrib: the average contribution of the fitness mechanism through the whole growth process.
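To make the meaning of the alpha field concrete, the sketch below fits the log-linear form A_k = k^α to synthetic data using ordinary least squares in base R. This is only an illustration of the idea behind a log-linear fit: the data are made up, and the package's own fitting procedure (for example, any weighting it applies) may differ.

```r
# Synthetic attachment values that follow A_k = k^alpha exactly.
# These numbers are illustrative and are not output from PAFit.
true_alpha <- 0.8
k <- 1:50
A <- k^true_alpha

# Under A_k = k^alpha, log(A_k) is linear in log(k) with slope alpha,
# so an ordinary least-squares fit on the log-log scale recovers alpha.
fit <- lm(log(A) ~ log(k))
alpha_hat <- unname(coef(fit)[2])
print(alpha_hat)
```

With real estimates, the fitted slope would be compared against the reported ci to judge how well a pure power law describes the estimated attachment function.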
Thong Pham thongphamthe@gmail.com
1. Pham, T., Sheridan, P. & Shimodaira, H. (2015). PAFit: A Statistical Method for Measuring Preferential Attachment in Temporal Complex Networks. PLoS ONE 10(9): e0137796. (doi: 10.1371/journal.pone.0137796).
2. Pham, T., Sheridan, P. & Shimodaira, H. (2016). Joint Estimation of Preferential Attachment and Node Fitness in Growing Complex Networks. Scientific Reports 6, Article number: 32558. (doi: 10.1038/srep32558).
3. Pham, T., Sheridan, P. & Shimodaira, H. (2020). PAFit: An R Package for the Non-Parametric Estimation of Preferential Attachment and Node Fitness in Temporal Complex Networks. Journal of Statistical Software 92 (3). (doi: 10.18637/jss.v092.i03).
4. Inoue, M., Pham, T. & Shimodaira, H. (2020). Joint Estimation of Non-parametric Transitivity and Preferential Attachment Functions in Scientific Co-authorship Networks. Journal of Informetrics 14(3). (doi: 10.1016/j.joi.2020.101042).
See get_statistics for how to create summarized statistics needed in this function.
See Jeong, Newman and only_A_estimate for functions to estimate the attachment function in isolation.
See only_F_estimate for a function to estimate node fitnesses in isolation.
## Not run:
library("PAFit")
#### Example 1: a linear preferential attachment kernel, i.e., A_k = k ############
set.seed(1)
# size of initial network = 100
# number of new nodes at each time-step = 100
# A_k = k; inverse variance of the distribution of node fitnesses = 5
net <- generate_BB(N = 1000 , m = 50 ,
num_seed = 100 , multiple_node = 100,
s = 5)
net_stats <- get_statistics(net)
# Joint estimation of attachment function Ak and node fitness
result <- joint_estimate(net, net_stats)
summary(result)
# plot the estimated attachment function
true_A <- pmax(result$estimate_result$center_k,1) # true function
plot(result , net_stats, max_A = max(true_A,result$estimate_result$theta))
lines(result$estimate_result$center_k, true_A, col = "red") # true line
legend("topleft" , legend = "True function" , col = "red" , lty = 1 , bty = "n")
# plot the estimated node fitnesses and true node fitnesses
plot(result, net_stats, true = net$fitness, plot = "true_f")
#############################################################################
#### Example 2: a non-log-linear preferential attachment kernel ############
set.seed(1)
# size of initial network = 100
# number of new nodes at each time-step = 100
# A_k = alpha* log (max(k,1))^beta + 1, with alpha = 2, and beta = 2
# inverse variance of the distribution of node fitnesses = 10
net <- generate_net(N = 1000 , m = 50 ,
num_seed = 100 , multiple_node = 100,
s = 10 , mode = 3, alpha = 2, beta = 2)
net_stats <- get_statistics(net)
# Joint estimation of attachment function Ak and node fitness
result <- joint_estimate(net, net_stats)
summary(result)
# plot the estimated attachment function
true_A <- 2 * log(pmax(result$estimate_result$center_k,1))^2 + 1 # true function
plot(result , net_stats, max_A = max(true_A,result$estimate_result$theta))
lines(result$estimate_result$center_k, true_A, col = "red") # true line
legend("topleft" , legend = "True function" , col = "red" , lty = 1 , bty = "n")
# plot the estimated node fitnesses and true node fitnesses
plot(result, net_stats, true = net$fitness, plot = "true_f")
#############################################################################
#### Example 3: another non-log-linear preferential attachment kernel ############
set.seed(1)
# size of initial network = 100
# number of new nodes at each time-step = 100
# A_k = min(max(k,1),sat_at)^alpha, with alpha = 1, and sat_at = 100
# inverse variance of the distribution of node fitnesses = 10
net <- generate_net(N = 1000 , m = 50 ,
num_seed = 100 , multiple_node = 100,
s = 10 , mode = 2, alpha = 1, sat_at = 100)
net_stats <- get_statistics(net)
# Joint estimation of attachment function Ak and node fitness
result <- joint_estimate(net, net_stats)
summary(result)
# plot the estimated attachment function
true_A <- pmin(pmax(result$estimate_result$center_k,1),100)^1 # true function
plot(result , net_stats, max_A = max(true_A,result$estimate_result$theta))
lines(result$estimate_result$center_k, true_A, col = "red") # true line
legend("topleft" , legend = "True function" , col = "red" , lty = 1 , bty = "n")
# plot the estimated node fitnesses and true node fitnesses
plot(result, net_stats, true = net$fitness, plot = "true_f")
## End(Not run)