gof-statistics: Statistics for goodness-of-fit assessment of network models


Description

Statistics for assessing the goodness of fit (GOF) of network models with the gof methods.

Usage

dsp(mat, ...)

esp(mat, ...)

nsp(mat, ...)

deg(mat, ...)

b1deg(mat, ...)

b2deg(mat, ...)

odeg(mat, ...)

ideg(mat, ...)

kstar(mat, ...)

b1star(mat, ...)

b2star(mat, ...)

ostar(mat, ...)

istar(mat, ...)

kcycle(mat, ...)

geodesic(mat, ...)

triad.directed(mat, ...)

triad.undirected(mat, ...)

comemb(vec)

walktrap.modularity(mat, ...)

walktrap.roc(sim, obs, ...)

walktrap.pr(sim, obs, ...)

fastgreedy.modularity(mat, ...)

fastgreedy.roc(sim, obs, ...)

fastgreedy.pr(sim, obs, ...)

louvain.modularity(mat, ...)

louvain.roc(sim, obs, ...)

louvain.pr(sim, obs, ...)

maxmod.modularity(mat, ...)

maxmod.roc(sim, obs, ...)

maxmod.pr(sim, obs, ...)

edgebetweenness.modularity(mat, ...)

edgebetweenness.roc(sim, obs, ...)

edgebetweenness.pr(sim, obs, ...)

spinglass.modularity(mat, ...)

spinglass.roc(sim, obs, ...)

spinglass.pr(sim, obs, ...)

rocpr(sim, obs, roc = TRUE, pr = TRUE, joint = TRUE, pr.impute = "poly4", ...)

Arguments

mat

A sparse network matrix as created by the Matrix function in the Matrix package.

...

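Additional arguments. The ... argument must be present in all GOF statistics, including user-defined ones, so that arguments intended for other statistics do not cause an error.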

vec

A vector of community memberships, one entry per node, from which a community co-membership matrix is created.

sim

A list of simulated networks. Each element in the list should be a sparse matrix as created by the Matrix function in the Matrix package.

obs

A list of observed (= target) networks. Each element in the list should be a sparse matrix as created by the Matrix function in the Matrix package.

roc

Compute receiver-operating characteristics (ROC)?

pr

Compute precision-recall curve (PR)?

joint

Merge all time steps into a single prediction task and compute the predictive fit jointly, instead of computing GOF for each time step separately?

pr.impute

In some cases, the first precision value of the precision-recall curve is undefined. Precision is the share of predicted ties that are observed ties, so it is 0/0 when no dyad is predicted to be a tie at the strictest threshold. The pr.impute argument imputes this missing value so that the AUC-PR value is not severely biased. Possible values are "no" for no imputation, "one" for using a value of 1.0, "second" for using the next (= adjacent) precision value, "poly1" for fitting a straight line through the remaining curve to predict the first value, "poly2" for fitting a second-order polynomial, and so on up to "poly9". The default is "poly4". Warning: this is a pragmatic solution. Please check whether the imputation makes sense, for example by plotting the resulting object with the pr.poly argument, which draws the predicted curve on top of the actual PR curve.
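The polynomial options can be pictured with the R sketch below. It is not btergm's internal code: the helper name impute_first_precision is hypothetical, and it assumes the PR curve is available as two numeric vectors, recall and precision, whose first precision entry is NA. It fits a polynomial of the chosen degree through the remaining points and predicts the missing value at the first recall value; clamping the prediction to [0, 1] is a choice made only for this sketch.

```r
# Hypothetical helper, not part of btergm: impute an undefined first precision
# value from a polynomial fit through the rest of the PR curve.
impute_first_precision <- function(recall, precision, degree = 4) {
  if (!is.na(precision[1])) {
    return(precision)                      # first value is defined: nothing to do
  }
  rest <- data.frame(r = recall[-1], p = precision[-1])
  # polynomial of the requested degree in recall (degree 1 = straight line)
  fit <- lm(p ~ poly(r, degree = degree, raw = TRUE), data = rest)
  pred <- predict(fit, newdata = data.frame(r = recall[1]))
  precision[1] <- min(max(pred, 0), 1)     # sketch choice: keep it in [0, 1]
  precision
}

recall <- c(0, 0.25, 0.5, 0.75, 1)
precision <- c(NA, 0.9, 0.8, 0.7, 0.6)
impute_first_precision(recall, precision, degree = 1)
```

A fourth-degree fit, as in the default "poly4", needs at least five remaining points; with fewer points, lm() returns a rank-deficient fit and the prediction is unreliable.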

Details

These functions can be plugged into the statistics argument of the gof methods in order to compare observed with simulated networks (see the gof-methods help page). There are three types of statistics:

  1. Univariate statistics, which aggregate a network into a single quantity, for example a modularity measure or density. The distribution of the statistic across simulated networks can be displayed using histograms, density plots, and median bars. Univariate statistics take a sparse matrix (mat) as an argument and return a single numeric value that summarizes the network matrix.

  2. Multivariate statistics, which aggregate a network into a vector of quantities, for example the distribution of geodesic distances, edge-wise shared partners, or indegrees. These statistics typically have multiple values, e.g., esp(1), esp(2), esp(3), and so on. The results can be displayed using multiple boxplots for the simulated networks and a black curve for the observed network(s). Multivariate statistics take a sparse matrix (mat) as an argument and return a vector of numeric values that summarizes the network matrix.

  3. Tie prediction statistics, which predict the dyad states in the observed network(s) by the dyad states in the simulated networks. Examples are receiver operating characteristics (ROC) or precision-recall (PR) curves of the simulated networks based on the model, or ROC or PR predictions of the community co-membership matrices of the simulated vs. the observed network(s). Tie prediction statistics take a list of simulated sparse network matrices and another list of observed sparse network matrices (possibly containing only a single sparse matrix) as arguments and return a rocpr, roc, or pr object (as created by the rocpr function).
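As an illustration of the multivariate type, the following base-R sketch computes shared partner distributions for an undirected, binary network without self-loops. The function name shared_partner_dist is hypothetical, and btergm's own esp, dsp, and nsp statistics may be implemented differently; the sketch only shows that multiplying the adjacency matrix by itself counts the shared partners of every dyad, and that the result is a named vector with a label attribute.

```r
# Hypothetical multivariate statistic (not btergm's esp/dsp/nsp): shared
# partner distribution of an undirected, binary network without self-loops.
shared_partner_dist <- function(mat, type = c("esp", "dsp", "nsp"), ...) {
  type <- match.arg(type)
  a <- as.matrix(mat)                      # also accepts Matrix objects
  n <- nrow(a)
  sp <- a %*% a                            # sp[i, j] = partners shared by i and j
  upper <- upper.tri(a)                    # count each unordered dyad once
  keep <- switch(type,
    esp = upper & a == 1,                  # dyads that are edges
    dsp = upper,                           # all dyads
    nsp = upper & a == 0                   # dyads that are non-edges
  )
  # counts of dyads with 0, 1, ..., n - 2 shared partners (tabulate starts at 1)
  d <- tabulate(sp[keep] + 1, nbins = n - 1)
  names(d) <- 0:(n - 2)
  attributes(d)$label <- paste("Shared partners:", type)
  d
}

# Triangle 1-2-3 plus a pendant node 4 attached to node 3
a <- matrix(0, 4, 4)
a[cbind(c(1, 1, 2, 3), c(2, 3, 3, 4))] <- 1
a <- a + t(a)                              # symmetrize
shared_partner_dist(a, "esp")
```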

Users can create their own statistics for use with the gof methods. To do so, write a function that accepts and returns the objects described for the respective type in the enumeration above. It is advisable to study the definitions of some existing statistics before writing custom ones. The return object can also carry an attribute called label, which describes what the function returns; this label is used as a descriptive label in the plot and in verbose output during computations. The Examples section contains a custom user statistic. Note that all statistics must contain the ... argument so that custom arguments intended for other statistics do not cause an error.
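A minimal custom univariate statistic following these rules might look like the R sketch below. The reciprocity measure and the function names recip and no_dots are hypothetical examples, not part of btergm. The sketch also shows why the ... argument is required: a function without it fails with an "unused argument" error as soon as an argument intended for another statistic is passed to it.

```r
# Hypothetical univariate statistic (not part of btergm): reciprocity of a
# directed, binary network, i.e., the share of ties i -> j for which j -> i
# also exists.
recip <- function(mat, ...) {             # '...' absorbs arguments meant for other statistics
  a <- as.matrix(mat)                     # sparse Matrix -> base matrix
  diag(a) <- 0                            # ignore self-loops
  ties <- sum(a)
  r <- if (ties == 0) NA_real_ else sum(a * t(a)) / ties
  attributes(r)$label <- "Reciprocity"    # descriptive label for plots and output
  r
}

a <- matrix(c(0, 1, 0,
              1, 0, 1,
              0, 0, 0), nrow = 3, byrow = TRUE)
recip(a, some.other.argument = 1)         # extra argument is silently ignored

# Without '...', the same call fails with an "unused argument" error.
no_dots <- function(mat) sum(as.matrix(mat))
res <- tryCatch(no_dots(a, some.other.argument = 1),
                error = function(e) conditionMessage(e))
res
```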

To aid the development of custom statistics, the helper function comemb is available: it accepts a vector of community memberships and converts it to a co-membership matrix. It is also used internally by statistics such as walktrap.roc.
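The idea behind a co-membership matrix can be sketched in base R as follows. comemb_sketch is a hypothetical stand-in, not the source code of comemb. In this sketch the diagonal is 1 because every node shares a community with itself; btergm's comemb may treat the diagonal differently. In a custom statistic, the membership vector would typically come from a community detection routine, for example igraph's membership() applied to the result of cluster_walktrap().

```r
# Hypothetical stand-in for comemb(), not its source code: entry [i, j] is 1
# if nodes i and j are in the same community and 0 otherwise.
comemb_sketch <- function(vec) {
  outer(vec, vec, FUN = "==") * 1         # logical -> numeric 0/1
}

memb <- c(1, 1, 2, 2, 2)                  # e.g., two communities among five nodes
comemb_sketch(memb)
```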

Functions

  • dsp(): Multivariate GOF statistic: dyad-wise shared partner distribution

  • esp(): Multivariate GOF statistic: edge-wise shared partner distribution

  • nsp(): Multivariate GOF statistic: non-edge-wise shared partner distribution

  • deg(): Multivariate GOF statistic: degree distribution

  • b1deg(): Multivariate GOF statistic: degree distribution for the first mode

  • b2deg(): Multivariate GOF statistic: degree distribution for the second mode

  • odeg(): Multivariate GOF statistic: outdegree distribution

  • ideg(): Multivariate GOF statistic: indegree distribution

  • kstar(): Multivariate GOF statistic: k-star distribution

  • b1star(): Multivariate GOF statistic: k-star distribution for the first mode

  • b2star(): Multivariate GOF statistic: k-star distribution for the second mode

  • ostar(): Multivariate GOF statistic: outgoing k-star distribution

  • istar(): Multivariate GOF statistic: incoming k-star distribution

  • kcycle(): Multivariate GOF statistic: k-cycle distribution

  • geodesic(): Multivariate GOF statistic: geodesic distance distribution

  • triad.directed(): Multivariate GOF statistic: triad census in directed networks

  • triad.undirected(): Multivariate GOF statistic: triad census in undirected networks

  • comemb(): Helper function: create community co-membership matrix

  • walktrap.modularity(): Univariate GOF statistic: Walktrap modularity distribution

  • walktrap.roc(): Tie prediction GOF statistic: ROC of Walktrap community detection. Receiver-operating characteristics of predicting the community structure in the observed network(s) by the community structure in the simulated networks, as computed by the Walktrap algorithm.

  • walktrap.pr(): Tie prediction GOF statistic: PR of Walktrap community detection. Precision-recall curve for predicting the community structure in the observed network(s) by the community structure in the simulated networks, as computed by the Walktrap algorithm.

  • fastgreedy.modularity(): Univariate GOF statistic: fast and greedy modularity distribution

  • fastgreedy.roc(): Tie prediction GOF statistic: ROC of fast and greedy community detection. Receiver-operating characteristics of predicting the community structure in the observed network(s) by the community structure in the simulated networks, as computed by the fast and greedy algorithm. Only sensible with undirected networks.

  • fastgreedy.pr(): Tie prediction GOF statistic: PR of fast and greedy community detection. Precision-recall curve for predicting the community structure in the observed network(s) by the community structure in the simulated networks, as computed by the fast and greedy algorithm. Only sensible with undirected networks.

  • louvain.modularity(): Univariate GOF statistic: Louvain clustering modularity distribution

  • louvain.roc(): Tie prediction GOF statistic: ROC of Louvain community detection. Receiver-operating characteristics of predicting the community structure in the observed network(s) by the community structure in the simulated networks, as computed by the Louvain algorithm.

  • louvain.pr(): Tie prediction GOF statistic: PR of Louvain community detection. Precision-recall curve for predicting the community structure in the observed network(s) by the community structure in the simulated networks, as computed by the Louvain algorithm.

  • maxmod.modularity(): Univariate GOF statistic: maximal modularity distribution

  • maxmod.roc(): Tie prediction GOF statistic: ROC of maximal modularity community detection. Receiver-operating characteristics of predicting the community structure in the observed network(s) by the community structure in the simulated networks, as computed by the modularity maximization algorithm.

  • maxmod.pr(): Tie prediction GOF statistic: PR of maximal modularity community detection. Precision-recall curve for predicting the community structure in the observed network(s) by the community structure in the simulated networks, as computed by the modularity maximization algorithm.

  • edgebetweenness.modularity(): Univariate GOF statistic: edge betweenness modularity distribution

  • edgebetweenness.roc(): Tie prediction GOF statistic: ROC of edge betweenness community detection. Receiver-operating characteristics of predicting the community structure in the observed network(s) by the community structure in the simulated networks, as computed by the Girvan-Newman edge betweenness community detection method.

  • edgebetweenness.pr(): Tie prediction GOF statistic: PR of edge betweenness community detection. Precision-recall curve for predicting the community structure in the observed network(s) by the community structure in the simulated networks, as computed by the Girvan-Newman edge betweenness community detection method.

  • spinglass.modularity(): Univariate GOF statistic: spinglass modularity distribution

  • spinglass.roc(): Tie prediction GOF statistic: ROC of spinglass community detection. Receiver-operating characteristics of predicting the community structure in the observed network(s) by the community structure in the simulated networks, as computed by the Spinglass algorithm.

  • spinglass.pr(): Tie prediction GOF statistic: PR of spinglass community detection. Precision-recall curve for predicting the community structure in the observed network(s) by the community structure in the simulated networks, as computed by the Spinglass algorithm.

  • rocpr(): Tie prediction GOF statistic: ROC and PR curves. Receiver-operating characteristics (ROC) and precision-recall curve (PR). Prediction of the dyad states of the observed network(s) by the dyad states of the simulated networks.
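The basic idea of predicting dyad states can be sketched in base R. The sketch assumes that the predicted score of each dyad is the share of simulated networks in which the dyad is a tie, compares these scores with a single observed network while ignoring the diagonal, and computes the area under the ROC curve with the Mann-Whitney rank formula. The function name auc_roc_sketch is hypothetical; rocpr itself additionally handles several observed networks, the joint argument, and PR curves, and may differ in details.

```r
# Hypothetical sketch (not btergm's rocpr): AUC-ROC for predicting the dyads
# of one observed network by tie frequencies in simulated networks.
auc_roc_sketch <- function(sim, obs) {
  a_obs <- as.matrix(obs)
  # share of simulated networks with a tie in each dyad = predicted score
  pred <- Reduce(`+`, lapply(sim, as.matrix)) / length(sim)
  off <- row(a_obs) != col(a_obs)         # ignore the diagonal (self-loops)
  score <- pred[off]
  label <- a_obs[off] == 1
  n_pos <- sum(label)                     # needs at least one tie ...
  n_neg <- sum(!label)                    # ... and at least one non-tie
  # Mann-Whitney form: probability that a random tie outscores a random
  # non-tie; average ranks make tied scores count one half
  r <- rank(score)
  (sum(r[label]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

obs <- matrix(c(0, 1, 0,
                0, 0, 1,
                0, 0, 0), nrow = 3, byrow = TRUE)
auc_roc_sketch(list(obs, obs), obs)           # returns 1: perfect prediction
auc_roc_sketch(list(matrix(0, 3, 3)), obs)    # returns 0.5: all scores equal
```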

References

Leifeld, Philip, Skyler J. Cranmer and Bruce A. Desmarais (2018): Temporal Exponential Random Graph Models with btergm: Estimation and Bootstrap Confidence Intervals. Journal of Statistical Software 83(6): 1–36. doi:10.18637/jss.v083.i06.

Examples

# To see how these statistics are used, look at the examples section of 
# ?"gof-methods". The following example illustrates how custom 
# statistics can be created. Suppose one is interested in the density 
# of a network. Then a univariate statistic can be created as follows.

dens <- function(mat, ...) {        # univariate: one argument
  mat <- as.matrix(mat)             # sparse matrix -> normal matrix
  d <- sna::gden(mat)               # compute the actual statistic
  attributes(d)$label <- "Density"  # add a descriptive label
  return(d)                         # return the statistic
}

# Note that the '...' argument must be present in all statistics. 
# Now the statistic can be used in the statistics argument of one of 
# the gof methods.

# For illustrative purposes, let us consider an existing statistic, the 
# indegree distribution, which is a multivariate statistic. Like dens, it 
# takes a single network matrix (plus '...'). The sparse matrix is 
# converted to a normal matrix object before use. Then the formula-based 
# summary method from the ergm package (part of statnet) computes the 
# statistic, names for the different indegree values are attached to the 
# resulting vector, and the vector is returned.

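ideg <- function(mat, ...) {
  mat <- as.matrix(mat)                           # sparse matrix -> normal matrix
  d <- summary(mat ~ idegree(0:(nrow(mat) - 1)))  # ergm: nodes with indegree 0, ..., n - 1
  names(d) <- 0:(length(d) - 1)                   # name entries by indegree value
  attributes(d)$label <- "Indegree"               # descriptive label for plots
  return(d)
}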

# See the gofstatistics.R file in the package for more complex examples.


btergm documentation built on May 29, 2024, 12:09 p.m.