Statistics for goodness-of-fit assessment of network models
Description
Statistics for goodness-of-fit assessment of network models.
Usage
b1deg(mat)
b1star(mat)
b2deg(mat)
b2star(mat)
comemb(vec)
deg(mat)
dsp(mat)
edgebetweenness.modularity(mat)
edgebetweenness.pr(sim, obs)
edgebetweenness.roc(sim, obs)
esp(mat)
fastgreedy.modularity(mat)
fastgreedy.pr(sim, obs)
fastgreedy.roc(sim, obs)
geodesic(mat)
ideg(mat)
istar(mat)
kcycle(mat)
kstar(mat)
maxmod.modularity(mat)
maxmod.pr(sim, obs)
maxmod.roc(sim, obs)
nsp(mat)
odeg(mat)
ostar(mat)
pr(sim, obs)
roc(sim, obs)
rocpr(sim, obs)
rocprgof(sim, obs, pr.impute = "poly4")
spinglass.modularity(mat)
spinglass.pr(sim, obs)
spinglass.roc(sim, obs)
triad.directed(mat)
triad.undirected(mat)
walktrap.modularity(mat)
walktrap.pr(sim, obs)
walktrap.roc(sim, obs)

Arguments
vec 
A vector of community memberships from which a community co-membership matrix is created. 
mat 
A sparse network matrix as created by the 
sim 
A list of simulated networks. Each element in the list should be a sparse matrix as created by the 
obs 
A list of observed (= target) networks. Each element in the list should be a sparse matrix as created by the 
pr.impute 
In some cases, the first precision value of the precision-recall curve is undefined. The 
Details
These functions can be plugged into the statistics argument of the gof methods in order to compare observed with simulated networks (see the gof-methods help page). There are three types of statistics:
(1) Univariate statistics, which aggregate a network into a single quantity, for example modularity measures or density. The distribution of the statistic can be displayed using histograms, density plots, and median bars. Univariate statistics take a sparse matrix (mat) as an argument and return a single numeric value that summarizes the network matrix.
(2) Multivariate statistics, which aggregate a network into a vector of quantities, for example the distribution of geodesic distances, edgewise shared partners, or indegree. These statistics typically have multiple values, e.g., esp(1), esp(2), esp(3), etc. The results can be displayed using multiple boxplots for simulated networks and a black curve for the observed network(s). Multivariate statistics take a sparse matrix (mat) as an argument and return a vector of numeric values that summarize the network matrix.
(3) Tie prediction statistics, which predict the dyad states of the observed network(s) by the dyad states in the simulated networks, for example receiver operating characteristic (ROC) or precision-recall (PR) curves of simulated networks based on the model, or ROC or PR predictions of community co-membership matrices of the simulated vs. the observed network(s). Tie prediction statistics take a list of simulated sparse network matrices and another list of observed sparse network matrices (possibly containing only a single sparse matrix) as arguments and return a rocpr, roc, or pr object (as created by the respective functions rocpr, rocprgof, roc, and pr).
Users can create their own statistics for use with the gof methods. To do so, one needs to write a function that accepts and returns the respective objects described in the enumeration above. It is advisable to look at the definitions of some of the existing functions before adding custom functions. It is also possible to add an attribute called label to the return object, which describes what is being returned by the function. This label will be used as a descriptive label in the plot and for verbose output during computations. The examples section contains an example of a custom user statistic.
To aid the development of custom statistics, several helper functions are available: The roc, pr, and rocpr functions accept lists of simulated and observed sparse network matrices and compute ROC and precision-recall curves as well as the area under the curve that can be used as network statistics. These functions are used internally for a number of functions related to community structure, where the community structure in the simulated networks is compared to the community structure in the observed network(s) by means of tie prediction. The rocprgof function provides the same functionality as the rocpr function, but it has an additional argument for controlling imputation of the first PR value. Another helper function is comemb, which accepts a vector of community memberships and converts it to a co-membership matrix. This function is also used internally by statistics like walktrap.roc and others.
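The idea behind a co-membership matrix can be sketched in base R as follows. This is not the package's comemb implementation: comemb_sketch is a hypothetical name, and setting the diagonal to zero is an assumption made here for illustration.

```r
# Convert a membership vector into a binary co-membership matrix:
# cell (i, j) is 1 if nodes i and j belong to the same community.
comemb_sketch <- function(vec) {
  m <- outer(vec, vec, "==") * 1  # logical comparison -> 0/1 matrix
  diag(m) <- 0                    # assumption: ignore self-comembership
  m
}

memb <- c(1, 1, 2, 2, 2)          # hypothetical memberships of five nodes
comemb_sketch(memb)
```

In this toy example, nodes 1 and 2 share a community, as do nodes 3, 4, and 5, so the resulting matrix has two blocks of ones off the diagonal.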
Network statistics
The following built-in functions can be handed over to the statistics argument. See the usage section for their respective arguments.
(1) Univariate statistics:
walktrap.modularity(mat)

Modularity distribution as computed by the Walktrap algorithm.
fastgreedy.modularity(mat)

Modularity distribution as computed by the fast and greedy algorithm. Only sensible with undirected networks.
maxmod.modularity(mat)

Optimal modularity distribution.
edgebetweenness.modularity(mat)

Modularity distribution as computed by the Girvan-Newman edge betweenness community detection method.
spinglass.modularity(mat)

Modularity distribution as computed by the Spinglass algorithm.
(2) Multivariate statistics:
dsp(mat)

Dyadwise shared partner distribution.
esp(mat)

Edgewise shared partner distribution.
nsp(mat)

Non-edgewise shared partner distribution.
deg(mat)

Degree distribution (for undirected networks).
ideg(mat)

Indegree distribution (for directed networks).
odeg(mat)

Outdegree distribution (for directed networks).
b1deg(mat)

Degree distribution (for the first mode in a two-mode network).
b2deg(mat)

Degree distribution (for the second mode in a two-mode network).
kstar(mat)

k-star distribution (for undirected networks).
istar(mat)

In-star distribution (for directed networks).
ostar(mat)

Out-star distribution (for directed networks).
b1star(mat)

k-star distribution (for the first mode in a two-mode network).
b2star(mat)

k-star distribution (for the second mode in a two-mode network).
kcycle(mat)

k-cycle distribution (for undirected networks).
geodesic(mat)

Geodesic distance (or shortest path) distribution.
triad.directed(mat)

Triad census (directed networks).
triad.undirected(mat)

Triad census (undirected networks).
(3) Tie prediction statistics:
walktrap.roc(sim, obs)

Receiver-operating characteristics of predicting the community structure in the observed network(s) by the community structure in the simulated networks, as computed by the Walktrap algorithm.
walktrap.pr(sim, obs)

Precision-recall curve for predicting the community structure in the observed network(s) by the community structure in the simulated networks, as computed by the Walktrap algorithm.
fastgreedy.roc(sim, obs)

Receiver-operating characteristics of predicting the community structure in the observed network(s) by the community structure in the simulated networks, as computed by the fast and greedy algorithm. Only sensible with undirected networks.
fastgreedy.pr(sim, obs)

Precision-recall curve for predicting the community structure in the observed network(s) by the community structure in the simulated networks, as computed by the fast and greedy algorithm. Only sensible with undirected networks.
maxmod.roc(sim, obs)

Receiver-operating characteristics of predicting the community structure in the observed network(s) by the community structure in the simulated networks, as computed by the modularity maximization algorithm.
maxmod.pr(sim, obs)

Precision-recall curve for predicting the community structure in the observed network(s) by the community structure in the simulated networks, as computed by the modularity maximization algorithm.
edgebetweenness.roc(sim, obs)

Receiver-operating characteristics of predicting the community structure in the observed network(s) by the community structure in the simulated networks, as computed by the Girvan-Newman edge betweenness community detection method.
edgebetweenness.pr(sim, obs)

Precision-recall curve for predicting the community structure in the observed network(s) by the community structure in the simulated networks, as computed by the Girvan-Newman edge betweenness community detection method.
spinglass.roc(sim, obs)

Receiver-operating characteristics of predicting the community structure in the observed network(s) by the community structure in the simulated networks, as computed by the Spinglass algorithm.
spinglass.pr(sim, obs)

Precision-recall curve for predicting the community structure in the observed network(s) by the community structure in the simulated networks, as computed by the Spinglass algorithm.
roc(sim, obs)

Receiver-operating characteristics. Prediction of the dyad states of the observed network(s) by the dyad states of the simulated networks.
pr(sim, obs)

Precision-recall curve. Prediction of the dyad states of the observed network(s) by the dyad states of the simulated networks.
rocpr(sim, obs)

Both receiver-operating characteristics and precision-recall curve. Prediction of the dyad states of the observed network(s) by the dyad states of the simulated networks.
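To make the idea of tie prediction concrete, the following base-R sketch scores each dyad by the share of simulated networks in which it is tied and computes the area under the ROC curve against the observed dyad states. It is a rough illustration, not the package's roc, pr, or rocpr implementation: auc_sketch and the toy matrices are hypothetical, only the first observed network is used, and the rank-based (Mann-Whitney) formula is one of several ways to obtain the area under the curve.

```r
# Hypothetical toy data: one observed and two simulated undirected networks.
obs_net <- matrix(c(0, 1, 1, 0,
                    1, 0, 1, 0,
                    1, 1, 0, 1,
                    0, 0, 1, 0), nrow = 4, byrow = TRUE)
sim_a <- obs_net                   # reproduces the observed network exactly
sim_b <- obs_net
sim_b[1, 4] <- sim_b[4, 1] <- 1    # one spurious tie
sim_b[1, 2] <- sim_b[2, 1] <- 0    # one missing tie

# Area under the ROC curve for predicting observed dyad states from the
# average of the simulated dyad states.
# Assumes the observed network contains both ties and non-ties.
auc_sketch <- function(sim, obs) {
  score <- Reduce(`+`, lapply(sim, as.matrix)) / length(sim)
  truth <- as.matrix(obs[[1]])     # assumption: first observed network only
  off <- row(truth) != col(truth)  # drop the diagonal (no self-ties)
  s <- score[off]
  y <- truth[off]
  n1 <- sum(y == 1)                # number of observed ties
  n0 <- sum(y == 0)                # number of observed non-ties
  (sum(rank(s)[y == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

auc_sketch(list(sim_a, sim_b), list(obs_net))
```

A value of 1 would mean that every observed tie receives a higher score than every observed non-tie, while 0.5 corresponds to random prediction.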
Author(s)
Philip Leifeld (http://www.philipleifeld.com)
See Also
btergm-package, gof, gof-methods
Examples
# To see how these statistics are used, look at the examples section of
# ?"gof-methods". The following example illustrates how custom
# statistics can be created. Suppose one is interested in the density
# of a network. Then a univariate statistic can be created as follows.
dens <- function(mat) {             # univariate: one argument
  mat <- as.matrix(mat)             # sparse matrix -> normal matrix
  d <- sna::gden(mat)               # compute the actual statistic
  attributes(d)$label <- "Density"  # add a descriptive label
  return(d)                         # return the statistic
}
# Now the statistic can be used in the statistics argument of one of
# the gof methods.
# For illustrative purposes, let us consider an existing statistic, the
# indegree distribution, a multivariate statistic. It also accepts a
# single argument. Note that the sparse matrix is converted to a
# normal matrix object when it is used. First, statnet's summary
# method is used to compute the statistic. Names are attached to the
# resulting vector for the different indegree values. Then the vector
# is returned.
ideg <- function(mat) {
  d <- summary(mat ~ idegree(0:(nrow(mat) - 1)))
  names(d) <- 0:(length(d) - 1)
  attributes(d)$label <- "Indegree"
  return(d)
}
# See the gofstatistics.R file in the package for more complex examples.
