sienaGOF-auxiliary: Auxiliary functions for goodness of fit assessment by...

Description Usage Arguments Details Value Author(s) References See Also Examples

Description

The functions given here are auxiliary to function sienaGOF which assesses goodness of fit for actor-oriented models.

The auxiliary functions are, first, some functions of networks or behavior (i.e., statistics) for which the simulated values for the fitted model are compared to the observed value; second, some extraction functions to extract the observed and simulated networks and/or behavior from the sienaFit object produced by siena07 with returnDeps=TRUE.

These functions are exported here mainly to enable users to write their own versions. At the end of this help page some non-exported functions are listed. These are not exported because they depend on packages that are not in the R base distribution; and to show templates for readers wishing to contruct their own functions.

Usage

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
OutdegreeDistribution(i, obsData, sims, period, groupName, varName,
         levls=0:8, cumulative=TRUE)

IndegreeDistribution(i, obsData, sims, period, groupName, varName,
         levls=0:8, cumulative=TRUE)

BehaviorDistribution(i, obsData, sims, period, groupName, varName,
         levls=NULL, cumulative=TRUE)

TriadCensus(i, obsData, sims, period, groupName, varName, levls=1:16)

mixedTriadCensus(i, obsData, sims, period, groupName, varName)

dyadicCov(i, obsData, sims, period, groupName, varName, dc)

sparseMatrixExtraction(i, obsData, sims, period, groupName, varName)

networkExtraction(i, obsData, sims, period, groupName, varName)

behaviorExtraction(i, obsData, sims, period, groupName, varName)

Arguments

i

Index number of simulation to be extracted, ranging from 1 to length(sims); if NULL, the data observation will be extracted.

obsData

The observed data set to which the model was fitted; normally this is x$f where x is the sienaFit object for which the fit is being assessed.

sims

The simulated data sets to be compared with the observed data; normally this is x$sims where x is the sienaFit object for which the fit is being assessed.

period

Period for which data and simulations are used (may run from 1 to number of waves - 1).

groupName

Name of group; relevant for multi-group data sets; defaults in sienaGOF to "Data1".

varName

Name of dependent variable.

levls

Levels used as values of the auxiliary statistic. For BehaviorDistribution, this defaults to the observed range of values.

cumulative

Are the distributions to be considered as raw or cumulative (<=) distributions?

dc

Dyadic covariate: matrix with the same dimensions as adjacency matrix for dependent network variable.

Details

The statistics should be chosen to represent features of the network that are not explicitly fit by the estimation procedure but can be considered important properties that the model at hand should represent well. The three given here are far from a complete set; they will be supplemented in due time by statistics depending on networks and behavior jointly. The examples below give a number of other statistics, using the packages sna and igraph.

The levls parameter must be adapted to the range of values that is considered important. For indegrees and outdegrees, the whole range should usually be covered. If the range is large, which could be the case, e.g., for indegrees of two-mode networks where the second mode has few nodes, think about the possibility of making a selection such as levls=5*(0:20) or levls=c(0:4,5*(1:20)); which in most cases will make sense only if cumulative=TRUE.

The method signature for the auxiliary statistics generally is
function(i, obsData, sims, period, groupName, varName, ...). For constructing new auxiliary statistics, it is helpful to study the code of OutdegreeDistribution, IndegreeDistribution, and BehaviorDistribution and of the example functions below.

TriadCensus returns the distribution of the Holland-Leinhardt triad census according to the algorithm by Batagelj and Mrvar (implementation by Parimalarangan Slota, and Madduri). An alternative is the TriadCensus.sna function mentioned below, from package sna, which gives the same results. Here the levls parameter can be used to exclude some triads, e.g., for non-directed networks.
The Batagelj-Mrvar algorithm is optimized for sparse, large graphs and may be much faster than the procedure implemented in sna. For dense graphs the sna procedure may be faster.

dyadicCov assumes that dc is a categorical dyadic variable, and returns the frequencies of the non-zero values for realized ties. Since zero values of dc are not counted, it may be advisable to code dc so that all non-diagonal values are non-zero, and all diagonal values are zero.

Value

OutdegreeDistribution returns a named vector, the distribution of the observed or simulated outdegrees for the values in levls.

IndegreeDistribution returns a named vector, the distribution of the observed or simulated indegrees for the values in levls.

BehaviorDistribution returns a named vector, the distribution of the observed or simulated behavioral variable for the values in levls.

TriadCensus returns a named vector, the distribution of the Holland-Leinhardt triad census according to the algorithm by Batagelj and Mrvar.

mixedTriadCensus returns a named vector, the distribution of the mixed triad census of Hollway, Lomi, Pallotti,and Stadtfeld (2017).

dyadicCov returns a named vector, the frequencies of the non-missing non-zero values dc(ego,alter) of the observed or simulated (ego,alter) ties.

sparseMatrixExtraction returns the simulated network as a dgCMatrix; this is the "standard" class for sparse numeric matrices in the Matrix package. See the help file for dgCMatrix-class.
Tie variables for ordered pairs with a missing value for wave=period or period+1 are zeroed; note that this also is done in RSiena for calculation of target statistics. Tie variables that are structurally determined at the beginning of a period are used to replace observed values at the end of the period; tie variables that are structurally determined at the end, but not the beginning, of a period are used to replace simulated values at the end of the period.
To treat the objects returned by this function as regular matrices, it is necessary to attach the Matrix package in your session.

networkExtraction returns the network as an edge list of class "network" according to the network package (used for package sna). Missing values and structural values are treated as in sparseMatrixExtraction, see above.

behaviorExtraction returns the dependent behavior variable as an integer vector. Values for actors with a missing value for wave=period or period+1 are transformed to NA.

Author(s)

Josh Lospinoso, Tom Snijders

References

See Also

siena07, sienaGOF

Examples

  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
### For use out of the box:

mynet1 <- sienaDependent(array(c(s501, s502), dim=c(50, 50, 2)))
mybeh <- sienaDependent(s50a[,1:2], type="behavior")
mycov <- c(rep(1:3,16),1,2) # artificial, just for trying
mydycov <- matrix(rep(1:5, 500), 50, 50) # also artificial, just for trying
mydata <- sienaDataCreate(mynet1, mybeh)
myeff <- getEffects(mydata)
myeff <- includeEffects(myeff, transTies, cycle3)
# Shorter phases 2 and 3, just for example:
myalgorithm <- sienaAlgorithmCreate(nsub=1, n3=50, seed=122)
(ans <- siena07(myalgorithm, data=mydata, effects=myeff, returnDeps=TRUE,
   batch=TRUE))

# NULL for the observations:
OutdegreeDistribution(NULL, ans$f, ans$sims, period=1, groupName="Data1",
  levls=0:7, varName="mynet1")
dyadicCov(NULL, ans$f, ans$sims, period=1, groupName="Data1",
  dc=mydycov, varName="mynet1")
# An arbitrary selection for simulation run i:
IndegreeDistribution(5, ans$f, ans$sims, period=1, groupName="Data1",
  varName="mynet1")
BehaviorDistribution(20, ans$f, ans$sims, period=1, groupName="Data1",
  varName="mybeh")
sparseMatrixExtraction(50, ans$f, ans$sims, period=1, groupName="Data1",
  varName="mynet1")
networkExtraction(40, ans$f, ans$sims, period=1, groupName="Data1",
  varName="mynet1")
behaviorExtraction(50, ans$f, ans$sims, period=1, groupName="Data1",
  varName="mybeh")

gofi <- sienaGOF(ans, IndegreeDistribution, verbose=TRUE, join=TRUE,
  varName="mynet1")
gofi
plot(gofi)

(gofo <- sienaGOF(ans, OutdegreeDistribution, verbose=TRUE, join=TRUE,
    varName="mynet1", cumulative=FALSE))
# cumulative is an example of "\dots".
plot(gofo)

(gofdc <- sienaGOF(ans, dyadicCov, verbose=TRUE, join=TRUE,
    dc=mydycov, varName="mynet1"))
plot(gofdc)

# How to use dyadicCov for ego-alter combinations of a monadic variable:
mycov.egoalter <- outer(10*mycov, mycov ,'+')
diag(mycov.egoalter) <- 0
dim(mycov.egoalter) # 50 * 50 matrix
# This is a dyadic variable indicating ego-alter combinations of mycov.
# This construction works since mycov has integer values 
# not outside the interval from 1 to 9.
# All cells of this matrix contain a two-digit number,
# left digit is row (ego) value, right digit is column (alter) value.
# See the top left part of the matrix:
mycov.egoalter[1:10,1:12]
# The number of values is the square of the number of values of mycov;
# therefore, unwise to do this for a monadic covariate with more than 5 values.
gof.mycov <- sienaGOF(ans, dyadicCov, verbose=TRUE, varName="mynet1", 
    dc=mycov.egoalter)  
plot(gof.mycov)
descriptives.sienaGOF(gof.mycov, showAll=TRUE)

(gofb <- sienaGOF(ans, BehaviorDistribution, varName = "mybeh",
    verbose=TRUE, join=TRUE, cumulative=FALSE))
plot(gofb)

(goftc <- sienaGOF(ans, TriadCensus, verbose=TRUE, join=TRUE,
    varName="mynet1"))
plot(goftc, center=TRUE, scale=TRUE)
# For this type of auxiliary statistics
# it is advisable in the plot to center and scale.
# note the keys at the x-axis (widen the plot if they are not clear).
descriptives.sienaGOF(goftc)
round(descriptives.sienaGOF(goftc, center=TRUE, scale=TRUE), 0)


### The mixed triad census for co-evolution of one-mode and two-mode networks:
actors <- sienaNodeSet(50, nodeSetName="actors")
activities <- sienaNodeSet(20, nodeSetName="activities")
onemodenet <- sienaDependent(array(c(s501, s502), dim=c(50, 50, 2)),
                            nodeSet="actors")
twomodenet <- sienaDependent(array(c(s502[1:50, 1:20], s503[1:50, 1:20]),
                                                        dim=c(50, 20, 2)),
                            type= "bipartite", nodeSet=c("actors", "activities"))
twodata <- sienaDataCreate(onemodenet, twomodenet,
                        nodeSets=list(actors, activities))
twoeff <- getEffects(twodata)
twoeff <- includeEffects(twoeff, outActIntn, name="onemodenet",
                            interaction1="twomodenet")
twoeff <- includeEffects(twoeff, outActIntn, name="twomodenet",
                            interaction1="onemodenet")
twoeff <- includeEffects(twoeff, from, name="onemodenet",
                            interaction1="twomodenet")
twoeff <- includeEffects(twoeff, to, name="twomodenet",
                            interaction1="onemodenet")
twoeff
# Shorter phases 2 and 3, just for example:
twoalgorithm <- sienaAlgorithmCreate(projname="twomode", nsub=1, n3=100,
                                     seed=5634)
(ans <- siena07(twoalgorithm, data=twodata, effects=twoeff, returnDeps=TRUE,
   batch=TRUE))
(gof.two <- sienaGOF(ans, mixedTriadCensus,
                        varName=c("onemodenet", "twomodenet"), verbose=TRUE))
plot(gof.two, center=TRUE, scale=TRUE)

## Not run: 
### Here come some useful functions for building your own auxiliary statistics:
### First an extraction function.

# igraphNetworkExtraction extracts simulated and observed networks
# from the results of a siena07 run.
# It returns the network as an edge list of class "graph"
# according to the igraph package.
# Ties for ordered pairs with a missing value for wave=period or period+1
# are zeroed;
# note that this also is done in RSiena for calculation of target statistics.
# However, changing structurally fixed values are not taken into account.
igraphNetworkExtraction <- function(i, data, sims, period, groupName, varName) {
  require(igraph)
  dimsOfDepVar<- attr(data[[groupName]]$depvars[[varName]], "netdims")
  missings <- is.na(data[[groupName]]$depvars[[varName]][,,period]) |
    is.na(data[[groupName]]$depvars[[varName]][,,period+1])
  if (is.null(i)) {
    # sienaGOF wants the observation:
    original <- data[[groupName]]$depvars[[varName]][,,period+1]
    original[missings] <- 0
    returnValue <- graph.adjacency(original)
  }
  else
  {
    missings <- graph.adjacency(missings)
    #sienaGOF wants the i-th simulation:
    returnValue <- graph.difference(
      graph.empty(dimsOfDepVar) +
        edges(t(sims[[i]][[groupName]][[varName]][[period]][,1:2])),
      missings)
  }
  returnValue
}

### Then some auxiliary statistics.

# GeodesicDistribution calculates the distribution of non-directed
# geodesic distances; see ?sna::geodist
# The default for \code{levls} reflects that geodesic distances larger than 5
# do not differ appreciably with respect to interpretation.
# Note that the levels of the result are named;
# these names are used in the \code{plot} method.
GeodesicDistribution <- function (i, data, sims, period, groupName,
  varName, levls=c(1:5,Inf), cumulative=TRUE, ...) {
  x <- networkExtraction(i, data, sims, period, groupName, varName)
  require(network)
  require(sna)
  a <- sna::geodist(symmetrize(x))$gdist
  if (cumulative)
  {
    gdi <- sapply(levls, function(i){ sum(a<=i) })
  }
  else
  {
    gdi <- sapply(levls, function(i){ sum(a==i) })
  }
  names(gdi) <- as.character(levls)
  gdi
}

# Holland and Leinhardt Triad Census from sna; see ?sna::triad.census.
# For undirected networks, call this with levls=1:4
TriadCensus.sna <- function(i, data, sims, period, groupName, varName, levls=1:16){
  unloadNamespace("igraph") # to avoid package clashes
  require(network)
  require(sna)
  x <- networkExtraction(i, data, sims, period, groupName, varName)
  if (network.edgecount(x) <= 0){x <- symmetrize(x)}
  # because else triad.census(x) will lead to an error
  tc <- sna::triad.census(x)[levls]
  # names are transferred automatically
  tc
}

# Holland and Leinhardt Triad Census from igraph; see ?igraph::triad_census.
TriadCensus.i <- function(i, data, sims, period, groupName, varName){
  unloadNamespace("sna") # to avoid package clashes
  require(igraph)
  x <- igraphNetworkExtraction(i, data, sims, period, groupName, varName)
# suppressWarnings is used because else warnings will be generated
# when a generated network happens to be symmetric.
  setNames(suppressWarnings(triad_census(x)),
            c("003", "012", "102", "021D","021U", "021C", "111D", "111U",
            "030T", "030C", "201",  "120D", "120U", "120C", "210", "300"))
}

# CliqueCensus calculates the distribution of the clique census
# of the symmetrized network; see ?sna::clique.census.
CliqueCensus<-function (i, obsData, sims, period, groupName, varName, levls = 1:5){
  require(sna)
  x <- networkExtraction(i, obsData, sims, period, groupName, varName)
  cc0 <- sna::clique.census(x, mode='graph', tabulate.by.vertex = FALSE,
    enumerate=FALSE)[[1]]
  cc <- 0*levls
  names(cc) <- as.character(levls)
  levels.used <- as.numeric(intersect(names(cc0), names(cc)))
  cc[levels.used] <- cc0[levels.used]
  cc
}

# Distribution of Bonacich eigenvalue centrality; see ?igraph::evcent.
EigenvalueDistribution <- function (i, data, sims, period, groupName, varName,
  levls=c(seq(0,1,by=0.125)), cumulative=TRUE){
  require(igraph)
  x <- igraphNetworkExtraction(i, data, sims, period, groupName, varName)
  a <- igraph::evcent(x)$vector
  a[is.na(a)] <- Inf
  lel <- length(levls)
  if (cumulative)
  {
    cdi <- sapply(2:lel, function(i){sum(a<=levls[i])})
  }
  else
  {
    cdi <- sapply(2:lel, function(i){
      sum(a<=levls[i]) - sum(a <= levls[i-1])})
  }
  names(cdi) <- as.character(levls[2:lel])
  cdi
}

## Finally some examples of the three auxiliary statistics constructed above.

mynet1 <- sienaDependent(array(c(s501, s502, s503), dim=c(50, 50, 3)))
mybeh <- sienaDependent(s50a, type="behavior")
mydata <- sienaDataCreate(mynet1, mybeh)
myeff <- getEffects(mydata)
myeff <- includeEffects(myeff, transTrip, cycle3)
myeff <- includeEffects(myeff, outdeg, name="mybeh", interaction1="mynet1")
myeff <- includeEffects(myeff,  outdeg, name="mybeh", interaction1="mynet1")
# Shorter phases 2 and 3, just for example:
myalgorithm <- sienaAlgorithmCreate(nsub=2, n3=200, seed=765)
(ans2 <- siena07(myalgorithm, data=mydata, effects=myeff, returnDeps=TRUE,
   batch=TRUE))
gofc <- sienaGOF(ans2, EigenvalueDistribution, varName="mynet1",
  verbose=TRUE, join=TRUE)
plot(gofc)
descriptives.sienaGOF(gofc, showAll=TRUE)

goftc <- sienaGOF(ans2, TriadCensus, varName="mynet1", verbose=TRUE, join=TRUE)
plot(goftc, center=TRUE, scale=TRUE)
# For this type of auxiliary statistics
# it is advisable in the plot to center and scale.
# note the keys at the x-axis; these names are given by sna::triad.census
descriptives.sienaGOF(goftc)
round(descriptives.sienaGOF(goftc))

gofgd <- sienaGOF(ans2, GeodesicDistribution, varName="mynet1",
  verbose=TRUE, join=TRUE, cumulative=FALSE)
plot(gofgd)
# and without infinite distances:
gofgdd <- sienaGOF(ans2, GeodesicDistribution, varName="mynet1",
  verbose=TRUE, join=TRUE, levls=1:7, cumulative=FALSE)
plot(gofgdd)

## End(Not run)

RSienaTest documentation built on July 14, 2021, 3 a.m.