# R/lolog.R In lolog: Latent Order Logistic Graph Models

#### Documented in coef.lologlologplot.lologGmmprint.lologsimulate.lologsummary.lolog

#' Fits a LOLOG model via Monte Carlo Generalized Method of Moments
#'
#'
#' @description
#' \code{lolog} is used to fit Latent Order Logistic Graph (LOLOG) models. LOLOG models are
#' motivated by the idea of network growth where the network begins empty, and edge variables
#' are sequentially 'added' to the network with an either unobserved, or partially observed
#' order \eqn{s}. Conditional upon the inclusion order, the probability of an edge has a
#' logistic relationship with the change in network statistics.
#'
#'
#' @param formula A lolog formula for the sufficient statistics (see details).
#' @param auxFormula A lolog formula of statistics to use for moment matching.
#' @param theta Initial parameters values. Estimated via \code{\link{lologVariational}} if NULL.
#' @param nsamp The number of sample networks to draw at each iteration.
#' @param includeOrderIndependent If TRUE, all order independent terms in formula are used for
#' moment matching.
#' @param targetStats A vector of network statistics to use as the target for the moment equations.
#' If \code{NULL}, the observed statistics for the network are used.
#' @param weights The type of weights to use in the GMM objective. Either 'full' for the inverse
#' of the full covariance matrix or 'diagonal' for the inverse of the diagonal of the covariance matrix.
#' @param tol The Hotelling's T^2 p-value tolerance for convergence for the transformed moment conditions.
#' @param nHalfSteps The maximum number of half steps to take when the objective is not improved
#' in an iteration.
#' @param maxIter The maximum number of iterations.
#' @param minIter The minimum number of iterations.
#' @param startingStepSize The starting dampening of the parameter update.
#' @param maxStepSize The largest allowed value for dampening.
#' @param cluster A parallel cluster to use for graph simulation.
#' @param verbose Level of verbosity 0-3.
#'
#'
#' @details
#' LOLOG represents the probability of a tie, given the network grown up to a time point as
#' \deqn{
#'   \textrm{logit}\big(p(y_{s_t}=1 | \eta, y^{t-1}, s_{ \leq t})\big) = \theta \cdot c(y_{s_t}=1 | y^{t-1}, s_{ \leq t})
#' }
#' where \eqn{s_{\leq t}} is the growth order of the network up to time \eqn{t}, \eqn{y^{t-1}} is the
#' state of the graph at time \eqn{t-1}. \eqn{c(y_{s_t} | y^{t-1}, s_{ \leq t})} is a vector
#' representing the change in graph statistics from time \eqn{t-1} to \eqn{t} if an edge is present, and
#' \eqn{\theta} is a vector of parameters.
#'
#' The motivating growth order proceeds 'by vertex.' The network begins 'empty' and then vertices are 'added'
#' to the network sequentially. The order of vertex inclusion may be random or fixed. When a vertex 'enters' the
#' network, each of the edge variables connecting it and vertices already in the network are considered for
#' edge creation in a completely random order.
#'
#' LOLOG formulas contain a network, DirectedNet or UndirectedNet object on the left hand side.
#' the right hand side contains the model terms used. for example,
#'
#' \code{net ~ edges}
#'
#' represents and Erdos-Renyi model and
#'
#' \code{net ~ edges + preferentialAttachment()}
#'
#' represents a Barabasi-Albert model. See \code{\link{lolog-terms}} for a list of allowed model statistics
#'
#' Conditioning on (partial) vertex order can be done by
#' placing an ordering variable on the right hand side of the '|' operator, as in
#'
#' \code{net ~ edges + preferentialAttachment() | order}
#'
#' 'order' should be a numeric vector with as many elements as there are vertices in the network.
#' Ties are allowed. Vertices with higher order values will always be included later. Those with the same
#' values will be included in a random order in each simulated network.
#'
#' offsets and constraints are specified by wrapping them with either \code{offset()} or \code{constraint()},
#' for example, the following specifies an Erdos-Renyi model with the constraint that degrees must be less
#' that 10
#'
#' \code{net ~ edges + constraint(boundedDegree(0L, 10L))}
#'
#' If the model contains any order dependent statistics, additional moment constraints
#' must be specified in \code{auxFormula}. Ideally these should be chosen to capture
#' the features modeled by the order dependent statistics. For example, \code{preferentialAttachment}
#' models the degree structure, so we might choose two-stars as a moment constraint.
#'
#'  \code{lolog(net ~ edges + preferentialAttachment(), net ~ star(2))}
#'
#' will fit a Barabasi-Albert model with the number of edges and number of two-stars as moment constraints.
#'
#'
#' @return An object of class 'lolog'. If the model is dyad independent, the returned object will
#' also be of class "lologVariational" (see \code{\link{lologVariational}}, otherwise it will
#' also be a "lologGmm" object.
#'
#' lologGmm objects contain:
#'
#' \item{method}{"Method of Moments" for order independent models, otherwise "Generalized Method of Moments"}
#' \item{formula}{The model formula}
#' \item{auxFormula}{The formula containing additional moment conditions}
#' \item{theta}{The parameter estimates}
#' \item{stats}{The statistics for each network in the last iteration}
#' \item{estats}{The expected stats (G(y,s)) for each network in the last iteration}
#' \item{obsStats}{The observed h(y) network statistics}
#' \item{targetStats}{The target network statistics}
#' \item{obsModelStats}{The observed g(y,s) network statistics}
#' \item{net}{A network simulated from the fit model}
#' \item{vcov}{The asymptotic covariance matrix of the parameter estimates}
#' \item{likelihoodModel}{An object of class *LatentOrderLikelihood at the fit parameters}
#'
#'
#' @examples
#' library(network)
#' set.seed(1)
#' data(flo)
#' flomarriage <- network(flo,directed=FALSE)
#' flomarriage %v% "wealth" <- c(10,36,27,146,55,44,20,8,42,103,48,49,10,48,32,3)
#'
#' # A dyad independent model
#' fit <- lolog(flomarriage ~ edges + nodeCov("wealth"))
#' summary(fit)
#'
#' # A dyad dependent model with 2-stars and triangles
#' fit2 <- lolog(flomarriage ~ edges + nodeCov("wealth") + star(2) + triangles, verbose=FALSE)
#' summary(fit2)
#'
#' \dontrun{
#'
#' # An order dependent model
#' fit3 <- lolog(flomarriage ~ edges + nodeCov("wealth") + preferentialAttachment(),
#'               flomarriage ~ star(2:3), verbose=FALSE)
#' summary(fit3)
#'
#' # Try something a bit more real
#' data(ukFaculty)
#'
#' # Delete vertices missing group
#' delete.vertices(ukFaculty, which(is.na(ukFaculty %v% "Group")))
#'
#' fituk <- lolog(ukFaculty ~ edges() + nodeMatch("GroupC") + nodeCov("GroupC") + triangles + star(2))
#' summary(fituk)
#' plot(fituk$net, vertex.col= ukFaculty %v% "Group" + 2) #' #' } #' lolog <- function(formula, auxFormula = NULL, theta = NULL, nsamp = 1000, includeOrderIndependent = TRUE, targetStats = NULL, weights = "full", tol = .1, nHalfSteps = 10, maxIter = 100, minIter = 2, startingStepSize = .1, maxStepSize = .5, cluster = NULL, verbose = TRUE) { vcat <- function(..., vl=1){ if(verbose >= vl) cat(...) } vprint <- function(..., vl=1){ if(verbose >= vl) print(...) } #initialize theta via variational inference if (is.null(theta)) { vcat("Initializing Using Variational Fit\n") varFit <- lologVariational(formula, targetFrameSize = 1000000) if (varFit$allDyadIndependent) {
vcat("Model is dyad independent. Returning maximum likelihood estimate.\n")
return(varFit)
}
theta <- varFit$theta vcat("Initial Theta:\n", drop(theta),"\n") } lolik <- createLatentOrderLikelihood(formula, theta = theta) obsModelStats <- lolik$getModel()$statistics() statNames <- names(obsModelStats) orderIndependent <- lolik$getModel()$isIndependent(FALSE, TRUE) dyadIndependent <- lolik$getModel()$isIndependent(TRUE, TRUE) dyadIndependentOffsets <- lolik$getModel()$isIndependent(TRUE, FALSE) if (all(dyadIndependent) && all(dyadIndependentOffsets) && is.null(targetStats)) { vcat("Model is dyad independent. Returning maximum likelihood estimate.\n") varFit <- lologVariational(formula, dyadInclusionRate = 1) return(varFit) } obsModelStats[!orderIndependent] <- NA terms <- .prepModelTerms(formula) auxTerms <- .prepModelTerms(auxFormula) samp <- NULL obsStats <- NULL if (!is.null(auxFormula)) { auxModel <- createCppModel(auxFormula) auxModel$setNetwork(lolik$getModel()$getNetwork())
auxModel$calculate() obsStats <- auxModel$statistics()
}
if (includeOrderIndependent) {
obsStats <-
c(lolik$getModel()$statistics()[orderIndependent], obsStats)
}
if(!is.null(targetStats)){
if(any(is.na(targetStats)))
stop("targetStats may not have missing values")
if(length(obsStats)!=length(targetStats))
stop("Incorrect length of the targetStats vector: should be ",
length(obsStats), " but is ",length(targetStats),".")
}else{
targetStats <- obsStats
}
if(length(targetStats) < length(theta))
stop("Too few moment conditions. Specify more in auxFormula.")

stepSize <- startingStepSize
lastTheta <- NULL
lastObjective <- Inf
hsCount <- 0
iter <- 0
if (!is.null(cluster)) {
tmpNet <- lolik$getModel()$getNetwork()$clone() tmpNet$emptyGraph()
network <- as.network(tmpNet)
clusterExport(cluster, "terms", envir = environment())
clusterExport(cluster, "auxTerms", envir = environment())
clusterExport(cluster, "network", envir = environment())
clusterExport(cluster, "orderIndependent", envir = environment())
clusterExport(cluster, "includeOrderIndependent", envir = environment())
clusterEvalQ(cluster, {
# Load lolog on each node
library(lolog)

# Pull from package while making R CMD check happy
.createLatentOrderLikelihoodFromTerms <- eval(parse(text="lolog:::.createLatentOrderLikelihoodFromTerms"))
.makeCppModelFromTerms <- eval(parse(text="lolog:::.makeCppModelFromTerms"))

# Assign variables with external pointers
enet <- as.BinaryNet(network)
lolik2 <-
.createLatentOrderLikelihoodFromTerms(terms, enet)
if (!is.null(auxTerms))
auxModel2 <- .makeCppModelFromTerms(auxTerms, enet)
NULL
})
}
while (iter < maxIter) {
iter <- iter + 1
vcat("\n\nIteration", iter,"\n")

#generate networks
lolik$setThetas(theta) stats <- matrix(0, ncol = length(theta), nrow = nsamp) estats <- matrix(0, ncol = length(theta), nrow = nsamp) if (includeOrderIndependent) auxStats <- matrix(0, ncol = length(targetStats) - sum(orderIndependent), nrow = nsamp) else auxStats <- matrix(0, ncol = length(targetStats), nrow = nsamp) if (is.null(cluster)) { vcat("Drawing", nsamp, "Monte Carlo Samples:\n") if(verbose) pb <- utils::txtProgressBar(min = 0, max = nsamp, style = ifelse(interactive(),3,1)) for (i in 1:nsamp) { if (verbose) utils::setTxtProgressBar(pb, i) samp <- lolik$generateNetwork()
if (!is.null(auxFormula)) {
auxModel$setNetwork(samp$network)
auxModel$calculate() auxStats[i, ] <- auxModel$statistics()
}
stats[i, ] <- samp$stats + samp$emptyNetworkStats
estats[i, ] <- samp$expectedStats + samp$emptyNetworkStats
}
if(verbose)
close(pb)
vcat("\n")
if (includeOrderIndependent)
auxStats <- cbind(stats[, orderIndependent], auxStats)
} else{
worker <- function(i, theta) {
lolik2$setThetas(theta) samp <- lolik2$generateNetwork()
if (!is.null(auxTerms)) {
auxModel2$setNetwork(samp$network)
auxModel2$calculate() as <- auxModel2$statistics()
} else{
as <- numeric()
}
list(
stats = samp$stats + samp$emptyNetworkStats,
estats = samp$expectedStats + samp$emptyNetworkStats,
auxStats = as
)
}
results <-
parallel::parLapply(cluster, 1:nsamp, worker, theta = theta)
stats <- t(sapply(results, function(x)
x$stats)) estats <- t(sapply(results, function(x) x$estats))
colnames(stats) <- colnames(estats) <- statNames
if (!is.null(auxFormula))
auxStats <- t(sapply(results, function(x)
x$auxStats)) else auxStats <- NULL if (includeOrderIndependent) auxStats <- cbind(stats[, orderIndependent], drop(auxStats)) colnames(auxStats) <- names(obsStats) } # Calculate gradient of moment conditions grad <- matrix(0, ncol = length(theta), nrow = length(targetStats)) for (i in 1:length(targetStats)) { for (j in 1:length(theta)) { grad[i, j] <- -(cov(auxStats[, i], stats[, j]) - cov(auxStats[, i], estats[, j])) } } if (weights == "diagonal") W <- diag(1 / (diag(var(auxStats)))) else{ tr <- try(W <- solve(var(auxStats))) if(inherits(tr, "try-error")){ warning("Singular statistic covariance matrix. Using diagnoal.") W <- diag(1 / (diag(var(auxStats)))) } } # Calculate moment conditions and stat/observed stat differences transformed by W. mh <- colMeans(auxStats) diffs <- -sweep(auxStats, 2, targetStats) transformedDiffs <- t(t(grad) %*% W %*% t(diffs)) momentCondition <- colMeans(transformedDiffs) objective <- colMeans(diffs) %*% W %*% colMeans(diffs) vcat("Objective: ", drop(objective), "\n") objCrit <- max(-1000000, objective - lastObjective) / (lastObjective + 1) # Calculate inverse invFailed <- inherits(try(gradInv <- solve(t(grad) %*% W %*% grad), silent = TRUE) ,"try-error") if (verbose >= 3){ ns <- nrow(stats) os <- obsModelStats pairs(rbind(stats, os), pch='.', cex=c(rep(1, ns), 10), col=c(rep("black", ns), "red"), diag.panel = .panelHist, main=paste("Iteration",iter),cex.main=0.9) } vcat("Statistic GM Skewness Coef :", apply(stats,2, .gmSkewness),"\n", vl=2) # If inverse failed, or the objective has increased significantly, initiate half stepping if (hsCount < nHalfSteps && !is.null(lastTheta) && (invFailed || objCrit > .3)) { vcat("Half Step Back\n") theta <- (lastTheta + theta) / 2 hsCount <- hsCount + 1 stepSize <- stepSize / 2 vcat("Theta:", drop(theta), "\n", vl=2) next } else{ stepSize <- min(maxStepSize, stepSize * 1.25) hsCount <- 0 } vcat("Step size:", stepSize, "\n", vl=2) # Update theta lastTheta <- theta theta <- theta - stepSize * gradInv %*% momentCondition lastObjective <- objective # Print diagnostic Information algoState <- data.frame(lastTheta, theta,momentCondition) colnames(algoState) <- c("Theta","Next Theta","Moment Conditions") rownames(algoState) <- statNames if(!is.null(auxFormula)){ momState <- data.frame(colMeans(diffs)/sqrt(diag(var(diffs)))) colnames(momState) <- "(h(y) - E(h(Y))) / sd(h(Y))" rownames(momState) <- names(obsStats) vprint(algoState, vl=2) vprint(momState, vl=2) }else{ algoState[["(h(y) - E(h(Y))) / sd(h(Y))"]] <- colMeans(diffs)/sqrt(diag(var(diffs))) vprint(algoState, vl=2) } #Hotelling's T^2 test hotT <- momentCondition %*% solve(var(transformedDiffs) / nrow(transformedDiffs)) %*% momentCondition pvalue <- pchisq(hotT, df = length(theta), lower.tail = FALSE) vcat("Hotelling's T2 p-value: ", format.pval(pvalue,digits=5,eps=1e-5), "\n") if (pvalue > tol && iter >= minIter) { break } else if (iter < maxIter) { } } if (is.null(samp)) { samp <- lolik$generateNetwork()
}

# Calculate parameter covariances
omega <- var(auxStats)

# Some formatting of return items
lastTheta <- drop(lastTheta)
rownames(vcov) <- colnames(vcov) <- colnames(grad) <-
colnames(stats) <- colnames(estats) <- names(lastTheta) <- statNames

method <-
if (is.null(auxFormula))
"Method of Moments"
else
"Generalized Method Of Moments"

result <- list(
method = method,
formula = formula,
auxFromula = auxFormula,
theta = lastTheta,
stats = stats,
estats = estats,
auxStats = auxStats,
obsStats = obsStats,
targetStats = targetStats,
obsModelStats = obsModelStats,
net = samp$network, grad = grad, vcov = vcov, likelihoodModel = lolik ) class(result) <- c("lologGmm", "lolog", "list") result } #' Print a lolog object #' @param x the object #' @param ... additional parameters (unused) #' @method print lolog print.lolog <- function(x, ...) { cat(x$method, "Coefficients:\n")
print(x$theta) } #' Summary of a lolog object #' @param object the object #' @param ... additional parameters (unused) #' @method summary lolog #' @examples #' data(lazega) #' fit <- lologVariational(lazega ~ edges() + nodeMatch("office") + triangles, #' nReplicates=50L, dyadInclusionRate=1) #' summary(fit) #' @method summary lolog summary.lolog <- function(object, ...) { x <- object theta <- x$theta
se <- sqrt(diag(x$vcov)) pvalue <- 2 * pnorm(abs(theta / se), lower.tail = FALSE) stats <- x$likelihoodModel$getModel()$statistics()
orderInd <- x$likelihoodModel$getModel()$isIndependent(FALSE, TRUE) stats[!orderInd] <- NA result <- data.frame( observed_statistics = stats, theta = theta, se = se, pvalue = round(pvalue, 4) ) rownames(result) <- names(stats) result } #' Generates BinaryNetworks from a fit lolog object #' #' #' @param object A lolog object. #' @param nsim The number of simulated networks #' @param seed Either NULL or an integer that will be used in a call to set.seed before simulating #' @param convert convert to a network object#' #' @param ... unused #' #' @return A list of BinaryNet (or network if convert=TRUE) objects. Networks contain an additional #' vertex covariate "__order__" that indicates the sequence order in which the vertex was 'added' #' into the network. #' #' #' @examples #' library(network) #' data(flo) #' flomarriage <- network(flo,directed=FALSE) #' flomarriage %v% "wealth" <- c(10,36,27,146,55,44,20,8,42,103,48,49,10,48,32,3) #' fit <- lolog(flomarriage ~ edges + nodeCov("wealth")) #' net <- simulate(fit)[[1]] #' plot(net) #' #' @method simulate lolog simulate.lolog <- function(object, nsim = 1, seed = NULL, convert = FALSE, ...) { if (!is.null(seed)) set.seed(seed) l <- list() for (i in 1:nsim) { l[[i]] <- object$likelihoodModel$generateNetwork()$network
if (convert)
l[[i]] <- as.network(l[[i]])
}
l
}

#' Extracts estimated model coefficients.
#'
#' @param object A lolog object.
#' @param ... unused
#' @examples
#' # Extract parameter estimates as a numeric vector:
#' data(ukFaculty)
#' fit <- lolog(ukFaculty ~ edges)
#' coef(fit)
#' @method coef lolog
coef.lolog <- function(object, ...){
object$theta } #' Conduct Monte Carlo diagnostics on a lolog model fit #' #' This function creates simple diagnostic #' plots for MC sampled statistics produced from a lolog fit. #' #' Plots are produced that represent the distributions of the #' output sampled statistic values or the target statistics values. #' The values of the observed target statistics for the networks are #' also represented for comparison with the sampled statistics. #' #' @param x A model fit object to be diagnosed. #' @param type The type of diagnostic plot. "histograms", the default, produces histograms of the sampled #' output statistic values with the observed statistics represented by vertical lines. "target" produces a pairs plot of the #' target output statistic values with the pairs of observed target statistics represented by red squares. #' output statistic values with the observed statistics represented by vertical lines. "model" produces a pairs plot of the #' sampled output statistic values with the pairs of observed statistics represented by red squares. #' @param ... Additional parameters. Passed to \link{geom_histogram} if type="histogram" #' and \link{pairs} otherwise. #' #' @examples #' library(network) #' set.seed(1) #' data(flo) #' flomarriage <- network(flo,directed=FALSE) #' flomarriage %v% "wealth" <- c(10,36,27,146,55,44,20,8,42,103,48,49,10,48,32,3) #' #' #' # An order dependent model #' fit3 <- lolog(flomarriage ~ edges + nodeCov("wealth") + preferentialAttachment(), #' flomarriage ~ star(2:3), verbose=FALSE) #' plot(fit3) #' plot(fit3, "target") #' plot(fit3, "model") plot.lologGmm <- function(x, type=c("histograms", "target","model"), ...) { type <- match.arg(type, c("histograms", "target","model")) if(type == "target"){ stats <- x$auxStats
ns <- nrow(stats)
stats <- rbind(stats, x$targetStats) pch <- '.' cex <- c(rep(1, ns), 10) col <- c(rep("black", ns), "red") pairs(stats, pch=pch, cex=cex, col=col, ...) }else if(type == "model"){ stats <- x$stats
ns <- nrow(stats)
stats <- rbind(stats, x$obsModelStats) pch <- '.' cex <- c(rep(1, ns), 10) col <- c(rep("black", ns), "red") pairs(stats, pch=pch, cex=cex, col=col, diag.panel = .panelHist, ...) }else if(type == "histograms"){ stats <- x$auxStats
obs <- x$targetStats nin <- !(colnames(x$stats) %in% colnames(stats))
stats <- cbind(x$stats[,nin,drop=FALSE], stats) obs <- c(x$obsModelStats[nin], obs)

Var2 <- value <- NULL # for R CMD check
ss <- reshape2::melt(stats)
oo <- na.omit(data.frame(Var2=names(obs),value=obs))
pp <- ggplot2::ggplot(data=ss, ggplot2::aes(x=value)) +
ggplot2::geom_histogram(...) +
ggplot2::geom_vline(ggplot2::aes(xintercept=value), data=oo, color=I("red"), size=I(1)) +
ggplot2::facet_wrap(~Var2, scales="free") + ggplot2::xlab("") + ggplot2::theme_bw()
invisible(print(pp))
}else
stop("unknown plot type")
invisible()
}


## Try the lolog package in your browser

Any scripts or data that you put into this service are public.

lolog documentation built on July 1, 2021, 9:09 a.m.