Dynamic Network Clustering

Description

Performs dynamic network clustering using either variational Bayes (VB) or a Gibbs sampler. Setting M=0 performs variational Bayes with no clustering. Returns posterior parameters (if method="VB") or approximate posterior samples (if method="Gibbs"), as well as the MAP estimates, which may be extracted through dncObj$pm.

Usage

dnc(Y,M,p=3,method="VB",init=NULL,hyperparms=NULL,Missing=NULL,
    controls=list(MaxIt=500,epsilon=1e-5,MaxItStg2=100,
                  epsilonStg2=1e-15,nDraws=10000,burnin=1000))

Arguments

Y

Dynamic network data. This should be in the form of an n x n x T array of 1's and 0's. Each slice corresponds to a single time point.

M

Number of communities (may be zero).

p

Dimension of the latent space.

method

Method of estimation, either "VB" for variational Bayes, or "Gibbs" for a Gibbs sampler.

init

(Use of this argument is not recommended) Initial values of the parameters. A named list containing EOm, mu, Sig, Bi0g, Bitbar, Bithk, Er, Er2, ai2, bi2, nu, a3, b3, Es, Es2, and Gam.

hyperparms

Hyperparameters. A named list with cc, a0Star, b0Star, a2Star, b2Star, b3Star, GamStar.

Missing

A matrix whose rows correspond to missing dyads. Missing should have three columns: row, column, and time (i.e., the indices for the NA's in Y). May be left as NULL if the missing dyads in Y are NA's.

controls

A list of values to control the algorithm.

MaxIt

The total number of iterations for the VB algorithm. Ignored if method="Gibbs" unless M=0.

epsilon

Relative tolerance criterion for evaluating convergence.

MaxItStg2

The total number of iterations for the second stage initialization of the VB algorithm/Gibbs sampler. Ignored if M=0.

epsilonStg2

Relative tolerance criterion for evaluating convergence of the second stage initialization of the VB algorithm/Gibbs sampler. Ignored if M=0.

nDraws

Total number of post-burn-in samples to be drawn via the Gibbs sampler. Ignored if method="VB".

burnin

The number of burn-in samples. Ignored if method="VB".
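
The expected shapes of Y and Missing can be illustrated with base R alone. The following is a minimal sketch, assuming the network is initially stored as a list of adjacency matrices; the names adjList and Missing_idx are hypothetical and not part of the package.

```r
# Hypothetical toy data: three time points for a network of four actors.
set.seed(1)
n <- 4; TT <- 3
adjList <- lapply(seq_len(TT), function(t) {
  A <- matrix(rbinom(n * n, 1, 0.3), n, n)
  diag(A) <- 0            # no self-loops in this toy example
  A
})

# Stack the T adjacency matrices into an n x n x T array.
Y <- array(unlist(adjList), dim = c(n, n, TT))

# Mark two dyads as unobserved.
Y[1, 2, 1] <- NA
Y[3, 4, 2] <- NA

# Three-column index matrix (row, column, time) of the missing dyads,
# matching the layout described for the Missing argument.
Missing_idx <- which(is.na(Y), arr.ind = TRUE)
colnames(Missing_idx) <- c("row", "column", "time")
```

Because the missing dyads here are already coded as NA in Y, Missing could also be left as NULL; the index matrix is built only to show its layout.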

Details

This function performs community detection according to the model

logit(P(Y_{ijt} = 1)) = α + s_j X_{it}'X_{jt},

π(X_{it}|Z_{it}=m) = N(r_i u_m, τ_i^{-1} I_p).

While the latent positions X_{it} live in a p-dimensional Euclidean space, it is more natural to think of them as lying on a (hyper-)sphere, with the magnitudes of the X_{it}'s as attached attributes that reflect the actors' individual tendencies to send and receive edges.
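
To make the likelihood concrete, the sketch below computes edge probabilities for a single time point from the model equation and then splits each latent position into a direction and a magnitude. All parameter values are hypothetical, and excluding self-loops is an assumption of this illustration rather than something stated here.

```r
# Hypothetical parameter values for a toy network at one time point.
set.seed(2)
n <- 5; p <- 3
alpha <- -1                          # intercept
s     <- runif(n, 0.5, 1.5)          # actor-specific scaling s_j
X     <- matrix(rnorm(n * p), n, p)  # row i holds X_{it}

# Linear predictor: eta[i, j] = alpha + s_j * X_i' X_j.
eta  <- alpha + sweep(X %*% t(X), 2, s, `*`)
prob <- plogis(eta)                  # inverse logit
diag(prob) <- NA                     # assumption: self-loops are excluded

# Direction (a point on the unit sphere) and magnitude of each position.
magnitude <- sqrt(rowSums(X^2))
direction <- X / magnitude
```

Actors whose directions are close have a large inner product and hence a higher edge probability, while the magnitude scales that probability up or down for a given actor.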

If M=0, then the prior on X_{it} is given by

π(X_{i1}) = N(0,σ^2 I_p)

π(X_{it}|X_{i(t-1)}) = N(X_{i(t-1)},τ_i^{-1} I_p)
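
The M=0 prior is a Gaussian random walk for each actor. As a rough illustration, it can be simulated as follows; the values of sigma2 and tau are hypothetical, and the p x T x n storage layout is chosen only to mirror the mu output described under Value.

```r
set.seed(3)
n <- 4; p <- 2; TT <- 5
sigma2 <- 1                     # initial variance sigma^2 (hypothetical value)
tau    <- rgamma(n, 5, 1)       # actor-specific precisions tau_i (hypothetical)

# Latent positions stored as a p x T x n array.
X <- array(NA_real_, dim = c(p, TT, n))
for (i in seq_len(n)) {
  # X_{i1} ~ N(0, sigma^2 I_p)
  X[, 1, i] <- rnorm(p, 0, sqrt(sigma2))
  for (t in 2:TT) {
    # X_{it} | X_{i(t-1)} ~ N(X_{i(t-1)}, tau_i^{-1} I_p)
    X[, t, i] <- rnorm(p, X[, t - 1, i], sqrt(1 / tau[i]))
  }
}
```

A larger tau_i means smaller steps, so that actor's position changes little from one time point to the next.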

The variational Bayes approach is typically faster than the Gibbs sampler, but tends to underestimate the spread of the posterior.

Currently, only VB is implemented when M=0 (no clustering); hence method is ignored if M=0.

Ignorable missing data can be estimated within the Gibbs sampler (not using the VB algorithm) by adding the extra step of drawing the missing edges given the latent positions and the model parameters at each iteration.
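
The extra imputation step can be sketched as a Bernoulli draw for each missing dyad under the model's edge probability. This is an illustration based on the model stated above, not the package's internal code; all object names are hypothetical, and X, alpha, and s stand in for the current values in the chain.

```r
set.seed(4)
n <- 4; p <- 2; TT <- 3
Y <- array(rbinom(n * n * TT, 1, 0.3), dim = c(n, n, TT))
Y[1, 2, 1] <- NA; Y[3, 4, 2] <- NA
miss <- which(is.na(Y), arr.ind = TRUE)    # columns: row, column, time

# Stand-ins for the current state of the chain.
alpha <- -0.5
s     <- runif(n, 0.5, 1.5)
X     <- array(rnorm(p * TT * n), dim = c(p, TT, n))

# One imputation sweep: draw each missing edge from its conditional
# Bernoulli distribution given the latent positions and parameters.
for (k in seq_len(nrow(miss))) {
  i <- miss[k, 1]; j <- miss[k, 2]; t <- miss[k, 3]
  eta <- alpha + s[j] * sum(X[, t, i] * X[, t, j])
  Y[i, j, t] <- rbinom(1, 1, plogis(eta))
}
```

Repeating this sweep at every iteration and averaging the imputed values would give a posterior probability for each missing dyad.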

Using init is, in general, strongly discouraged, as it may have a non-negligible negative effect on the performance of the VB algorithm or on the length of the chain needed to reach convergence. Unless otherwise specified, both the initialization scheme and the hyperparameters are chosen according to Sewell and Chen (2016).

Value

An object of class dnc, for which other methods exist (e.g., methods(class="dnc")).

If method="VB" or M=0,

method

The estimation algorithm

Y

The original data

mu

A p x T x n array: Posterior mean of the latent positions

Sig

A (Tp) x p x n array: Posterior covariance matrices of the latent positions. The covariance matrix for X_{it} is dncObj$Sig[(t-1)*p+1:p,,i]

a0

Scalar: Posterior shape parameter for σ^2 in inverse gamma distribution (if M=0).

b0

Scalar: Posterior scale parameter for σ^2 in inverse gamma distribution (if M=0).

ai1

An n x 1 vector: Posterior mean parameter for the r_i's in truncated normal distribution (if M>0).

bi1

An n x 1 vector: Posterior variance parameter for the r_i's in truncated normal distribution (if M>0).

Er

An n x 1 vector: Posterior first moment for the r_i's (if M>0).

Er2

An n x 1 vector: Posterior second moment for the r_i's (if M>0).

ai2

An n x 1 vector: Posterior shape parameter for the τ_i's in gamma distribution.

bi2

An n x 1 vector: Posterior scale parameter for the τ_i's in gamma distribution.

a3

Scalar: Posterior mean for α.

b3

Scalar: Posterior variance for α.

ai4

An n x 1 vector: Posterior mean parameter for the s_j's in truncated normal distribution.

bi4

An n x 1 vector: Posterior variance parameter for the s_j's in truncated normal distribution.

Es

An n x 1 vector: Posterior first moment for the s_j's.

Es2

An n x 1 vector: Posterior second moment for the s_j's.

nu

An M x p matrix: Posterior mean directions for the M clusters/communities, i.e., for the u_m's (if M>0).

kappa

An M x 1 vector: Posterior concentration parameters for the M clusters/communities, i.e., for the u_m's (if M>0).

Z

An n x T matrix: Cluster assignments based on the maximum posterior probabilities, computed marginally at each time point (if M>0).

Bi0g

An n x M matrix: Posterior probabilities of community assignment for each actor at the first observed time point (if M>0).

Bithk

An (MT) x M x n array: Posterior transition probability matrices; π(Z_{itk}=1|Z_{i(t-1)h}=1,Y) = dncObj$Bithk[(t-1)*M+h,k,i]. Ignore the first M rows, which are for internal use only (if M>0).

Bitbar

A T x M x n array: Marginal posterior probabilities of community assignments, i.e., π(Z_{itk}=1|Y)= dncObj$Bitbar[t,k,i] (if M>0).

Gam

An (M+1) x M matrix: Posterior concentration parameters for β_0 (row 1) and for β_m, m≥1 (rows 2 to M+1), in Dirichlet distribution (if M>0).
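
The array layouts above can be awkward to index, so the following sketch works through two common extractions on mock arrays (no fitted object is used, and every value is made up). It assumes that the p x p covariance block for X_{it} occupies rows (t-1)*p+1 through t*p of Sig, and it recomputes hard assignments from Bitbar in the way Z is described; the names Zhat and Sig_it are hypothetical.

```r
set.seed(5)
n <- 3; p <- 2; TT <- 4; M <- 2

# Mock Bitbar: T x M x n array of marginal community probabilities.
Bitbar <- array(runif(TT * M * n), dim = c(TT, M, n))
# Normalize so the M probabilities sum to one for each (t, i) pair.
Bitbar <- sweep(Bitbar, c(1, 3), apply(Bitbar, c(1, 3), sum), `/`)

# n x T matrix of most probable communities, one time point at a time.
Zhat <- t(apply(Bitbar, c(1, 3), which.max))

# Mock Sig: (Tp) x p x n array of stacked p x p covariance blocks.
Sig <- array(0, dim = c(TT * p, p, n))
for (i in seq_len(n)) for (t in seq_len(TT)) {
  Sig[(t - 1) * p + 1:p, , i] <- diag(p) * (t + i)   # hypothetical block values
}

# Covariance block for actor i = 2 at time t = 3.
i <- 2; t <- 3
Sig_it <- Sig[(t - 1) * p + 1:p, , i]
```

In R, `:` binds more tightly than `*` and `+`, so (t - 1) * p + 1:p evaluates to the p consecutive row indices of the block.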

If method="Gibbs",

method

The estimation algorithm

Y

The original data

X

A p x T x n x nDraws array: Posterior samples for the latent positions.

r

An n x nDraws matrix: Posterior samples for the r_i's.

tau

An n x nDraws matrix: Posterior samples for the τ_i's.

alpha

An nDraws x 1 vector: Posterior samples for α.

s

An n x nDraws matrix: Posterior samples for the s_j's.

u

An M x p x nDraws array: Posterior draws for the communities, i.e., the u_m's.

Z

An n x T x nDraws array: Posterior draws for the community assignments for each actor at each time point.

beta

An (M+1) x M x n array: Posterior draws for β_0 (row 1) and β_m, m≥1 (rows 2 to M+1).

posterior

A (burnin+nDraws) x 1 vector: Posterior values for all iterations of the Gibbs sampler.

Missing

A matrix of four columns: The row, column, and time for each missing dyad, as well as the posterior probability that the dyad equals one.

Additionally, each dnc class object comes with a $pm value, which is a list of the MAP estimates for alpha, X, s, tau, r, u, Z, and beta.
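
As a complement to the MAP estimates, posterior summaries can be computed directly from the sample arrays. The sketch below uses mock draws with the documented shapes rather than a fitted object, so all values are made up. It also ignores any label switching or rotation of the latent space across draws, which may make naive averages of X or Z misleading in practice.

```r
set.seed(6)
n <- 3; p <- 2; TT <- 4; nDraws <- 200

# Mock posterior draws with the documented shapes.
alpha <- rnorm(nDraws, -1, 0.1)                       # nDraws x 1
X     <- array(rnorm(p * TT * n * nDraws), dim = c(p, TT, n, nDraws))
Z     <- array(sample(1:2, n * TT * nDraws, replace = TRUE),
               dim = c(n, TT, nDraws))

# Posterior mean and 95% credible interval for alpha.
alpha_mean <- mean(alpha)
alpha_ci   <- quantile(alpha, c(0.025, 0.975))

# Posterior mean latent positions: average over the draw dimension.
X_mean <- apply(X, 1:3, mean)                         # p x T x n

# Proportion of draws in which each actor is in community 1 at each time.
prob_comm1 <- apply(Z == 1, 1:2, mean)                # n x T
```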

References

Sewell, D. K., and Chen, Y. (2016). Latent Space Approaches to Community Detection in Dynamic Networks. Bayesian Analysis. doi: 10.1214/16-BA1000. http://projecteuclid.org/euclid.ba/1461603847

Examples

  data(friendship)
  set.seed(123)
  dncObj <- dnc(friendship,M=4,p=3,method="Gibbs",
                controls=list(nDraws=250,burnin=50,
                              MaxItStg2=25,epsilonStg2=1e-15))
  print(dncObj)
  BIC(dncObj)
  par(mar=rep(0,4)+0.05)
  plot(dncObj,plotRGL=FALSE,pch=16,phi=60,lwd=2,cex=1.5)