get_statistics: Getting summarized statistics from input data

get_statistics    R Documentation

Getting summarized statistics from input data

Description

The function summarizes input data into sufficient statistics for estimating the attachment function and node fitness, together with additional information about the data, such as the total number of nodes, the number of time-steps, the maximum degree, and the final degree of each node. It also provides mechanisms for dealing with very large datasets automatically: binning the degrees, setting a degree threshold, or grouping time-steps.

Usage

get_statistics(net_object, only_PA  = FALSE , 
               only_true_deg_matrix = FALSE ,
               binning              = TRUE  , g              = 50    , 
               deg_threshold        = 0     , 
               compress_mode        = 0     , compress_ratio = 0.5   , 
               custom_time          = NULL)

Arguments

The parameters can be divided into four groups. The first group specifies input data and how the data will be summarized:

net_object

An object of class PAFit_net. Use as.PAFit_net to convert from an edgelist matrix, from_igraph to convert from an igraph object, from_networkDynamic to convert from a networkDynamic object, or graph_from_file to read a network from a file.

only_PA

Logical. Indicates whether only the statistics for estimating A_k are summarized. If TRUE, the statistics for estimating \eta_i are NOT collected. This saves memory at the cost of being unable to estimate node fitness. Default value is FALSE.

only_true_deg_matrix

Logical. If TRUE, only the true degree matrix (without binning) is returned, and no other statistics are computed. The result cannot be used in the PAFit functions to estimate the attachment function or node fitness. This option is intended for cases where only a degree matrix summarizing the growth process of a very large network is needed, for example for plotting. Default value is FALSE.
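The first group of arguments can be illustrated with a minimal sketch. It assumes that a PAFit_net object stores its edgelist in a field named graph (a three-column matrix of source id, destination id, and time stamp); that field name is an assumption not stated on this page. The simulated network only stands in for real data.

```r
library("PAFit")

# Simulate a small network, then take its edgelist as if it were raw input.
sim   <- generate_BA(N = 100, m = 1)
edges <- sim$graph   # assumption: columns are (from id, to id, time stamp)

# Convert the edgelist matrix to a PAFit_net object.
net <- as.PAFit_net(edges)

# Collect only the statistics needed for the attachment function A_k;
# node fitness cannot be estimated from this result.
stats_pa <- get_statistics(net, only_PA = TRUE)
```

Setting only_PA = TRUE is mainly worthwhile when memory is tight and node fitness is not of interest.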

The second group of parameters specifies how to bin the degrees:

binning

Logical. Indicates whether the degrees should be binned. Default value is TRUE.

g

Positive integer. Number of bins. Should be at least 3. Default value is 50.
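As a rough sketch of the binning arguments, the snippet below summarizes the same simulated network with and without binning and prints the bin boundaries. The choice g = 5 is arbitrary and only for illustration; begin_deg and end_deg are the fields described in the Value section.

```r
library("PAFit")

net <- generate_BA(N = 100, m = 1)

# Summarize with a small number of bins, and again with no binning.
stats_binned   <- get_statistics(net, binning = TRUE, g = 5)
stats_unbinned <- get_statistics(net, binning = FALSE)

# Starting and ending degree of each bin in the binned summary.
print(stats_binned$begin_deg)
print(stats_binned$end_deg)
```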

The third group contains a single parameter specifying how to reduce the number of node fitnesses:

deg_threshold

Integer. We only estimate the fitnesses of nodes whose number of new edges acquired is at least deg_threshold. The fitnesses of all other nodes are fixed at 1. Default value is 0.
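The effect of the threshold can be checked with a short sketch that compares how many nodes remain candidates for fitness estimation. The threshold value 3 is an arbitrary choice for illustration; f_position is the field described in the Value section.

```r
library("PAFit")

net <- generate_BA(N = 100, m = 1)

# No threshold: every node is a candidate for fitness estimation.
stats_all  <- get_statistics(net, deg_threshold = 0)
# Threshold of 3: only nodes that acquired at least 3 new edges are kept.
stats_some <- get_statistics(net, deg_threshold = 3)

# Number of nodes whose fitness would be estimated in each case.
print(length(stats_all$f_position))
print(length(stats_some$f_position))
```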

The last group of parameters specifies how to group the time stamps:

compress_mode

Integer. Indicates whether the timeline should be compressed. The possible values of compress_mode are:

0: No compression

1: Compressed by using a subset of time-steps. The time stamps in this subset are equally spaced. The size of this subset is compress_ratio times the size of the set of all time stamps.

2: Compressed by starting from the first time-step at which compress_ratio * 100 percent of the total number of edges (in the final state of the network) have already been added to the network.

3: This mode offers the most flexibility, but requires the user to supply the time stamps in custom_time. Only the time stamps in custom_time will be used. This mode can be used, for example, when investigating how the attachment function or node fitness changes across different time intervals.

Default value is 0, i.e. no compression.

compress_ratio

Numeric. Indicates how much the timeline should be compressed when compress_mode is 1 or 2. Default value is 0.5.

custom_time

Vector. Custom time stamps. This vector must be a subset of the vector that contains all time stamps. Only effective if compress_mode == 3. In that case, only these time stamps are used.
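The compression modes described above can be sketched as follows. For mode 3 the example assumes that the time stamps are stored in the third column of the graph field of the PAFit_net object; that field name is an assumption not stated on this page. Restricting the analysis to the first half of the timeline is only one possible choice of custom_time.

```r
library("PAFit")

net <- generate_BA(N = 100, m = 1)

# Mode 1: keep an equally spaced subset holding half of the time stamps.
stats_half <- get_statistics(net, compress_mode = 1, compress_ratio = 0.5)

# Mode 2: start once half of the final number of edges has been added.
stats_late <- get_statistics(net, compress_mode = 2, compress_ratio = 0.5)

# Mode 3: use only user-supplied time stamps (here, the first half).
all_times  <- sort(unique(net$graph[, 3]))   # assumption: column 3 is time
first_half <- all_times[seq_len(ceiling(length(all_times) / 2))]
stats_custom <- get_statistics(net, compress_mode = 3,
                               custom_time = first_half)

# Number of time stamps actually used in each summary.
print(stats_half$t_compressed)
print(stats_custom$t_compressed)
```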

Value

An object of class PAFit_data, which is a list. Some important fields are:

offset_tk

A matrix where the (t,k+1) element is the number of nodes with degree k at time t, counting among all the nodes whose number of new edges acquired is less than deg_thresh

n_tk

A matrix where the (t,k+1) element is the number of nodes with degree k at time t

m_tk

A matrix where the (t,k+1) element is the number of new edges that connect to a degree-k node at time t

sum_m_k

A vector where the (k+1)-th element is the total number of edges that linked to a degree k node, counting over all time steps

node_degree

A matrix recording the degree of all nodes (that satisfy the deg_threshold condition) at each time step

m_t

A vector where the t-th element is the number of new edges at time t

z_j

A vector where the j-th element is the total number of edges that linked to node j

N

Numeric. The number of nodes in the network

T

Numeric. The number of time steps

deg_max

Numeric. The maximum degree in the final network

node_id

A vector containing the ids of all nodes

final_deg

A vector containing the final degree of all nodes (including those that do not satisfy the deg_threshold condition)

deg_thresh

Integer. The specified degree threshold.

f_position

Numeric vector. The index in the node_id vector of the nodes we want to estimate (i.e. nodes whose number of new edges acquired is at least deg_thresh)

start_deg

Integer. The specified degree at which we start binning.

begin_deg

Numeric vector containing the beginning degree of each bin

end_deg

Numeric vector containing the ending degree of each bin

interval_length

Numeric vector containing the length of each bin.

binning

Logical. Indicates whether binning was applied or not.

g

Integer. Number of bins

time_compress_mode

Integer. The mode of time compression.

t_compressed

Integer. The number of time stamps actually used

compressed_unique_time

The time stamps that are actually used

compress_ratio

Numeric. The compression ratio specified by the user.

custom_time

Vector. The time stamps specified by user.
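Because the returned object is a list, the fields above can be read with the usual $ operator. The following is a small sketch on a simulated network; the exact values depend on the random network that is generated.

```r
library("PAFit")

net       <- generate_BA(N = 100, m = 1)
net_stats <- get_statistics(net)

# Scalar summaries of the data.
print(net_stats$N)        # number of nodes
print(net_stats$T)        # number of time steps
print(net_stats$deg_max)  # maximum degree in the final network

# n_tk is a matrix indexed by time step (rows) and degree (columns).
print(dim(net_stats$n_tk))

# Final degree of every node.
print(head(net_stats$final_deg))
```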

Author(s)

Thong Pham thongphamthe@gmail.com

See Also

For creating the needed input for this function (a PAFit_net object), see as.PAFit_net, from_igraph, from_networkDynamic, and graph_from_file.

For the next step, see Newman, Jeong or only_A_estimate for estimating the attachment function in isolation, only_F_estimate for estimating node fitnesses in isolation, and joint_estimate for joint estimation of the attachment function and node fitnesses.
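The hand-off to the next step might look like the sketch below, which passes the network and its summarized statistics to only_A_estimate. It assumes the first two arguments of only_A_estimate are the PAFit_net object and the PAFit_data object, in that order; all other arguments are left at their defaults.

```r
library("PAFit")

net       <- generate_BA(N = 100, m = 1)
net_stats <- get_statistics(net)

# Estimate the attachment function in isolation from the summarized data.
result <- only_A_estimate(net, net_stats)
summary(result)
```

The same net_stats object can be reused across estimation functions, so the data only need to be summarized once.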

Examples

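library("PAFit")
# Simulate a Barabasi-Albert network with 100 nodes, one new edge per node
net        <- generate_BA(N = 100 , m = 1)
# Summarize the network into the statistics needed for estimation
net_stats  <- get_statistics(net)
summary(net_stats)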

PAFit documentation built on June 22, 2024, 11:06 a.m.