Automultinomial Vignette"

knitr::opts_chunk$set(echo = TRUE)

This vignette explains the installation and use of the R package \texttt{automultinomial}. The \texttt{automultinomial} package is designed to be used for regressions similar to logistic or multinomial logit regression. However, unlike ordinary logistic and multinomial logit models, the autologistic/automultinomial model includes an autocorrelation parameter to account for spatial dependence between observations.

The organization of this document is:

  1. Description of the problem \texttt{automultinomial} solves
  2. cran and Github installation how-to
  3. A data example with binary response (2 response categories)
  4. A data example with 3 response categories
  5. Some comments on autologistic model parameterization

\section{The problem \texttt{automultinomial} solves}

Consider a data problem where covariates (the independent variables) and categorical outcomes (the dependent variables) are observed on a spatial grid or lattice. If the outcomes are spatially autocorrelated, then our coefficient estimates and inference procedures ought to take this into account. Intuitively, when outcomes are correlated, the effective sample size of the dataset is smaller than if the outcomes were independent. The statistical information we can gain based on samples from multiple nearby sites may be less than if we were able to sample sites that are ``more independent''.

A particular practical risk in neglecting spatial correlation is that confidence intervals may be too narrow. We might find ``statistically significant'' relationships that are in fact byproducts of noise or dependent sampling. The \texttt{automultinomial} package provides a tool for dealing with outcomes with spatial autocorrelation. It still might happen that nearby sites in a dataset are effectively independent. But we can only assess this using a method that accounts for the spatial arrangement of the samples.

Having given a qualitative description of the data problem, we will now move on to a discussion of the probability model used by \texttt{automultinomial}.

Mathematical description

To describe the setup in \texttt{automultinomial}, we will use the notations

We will assume that for each site $i$, a collection $N_i$ of neighboring sites is known. If the data are collected in a square lattice fashion, then a natural neighborhood setup is to set $N_i$ to be the 4 Up, Down, Left, and Right neighbors of site $i$ (sites at the boundary of a lattice may have less than 4 neighbors). We will assume $i\notin N_{i}$ for each $i$, so that site $i$ is not a neighbor of itself. It is allowable for sites to have no neighbors. In this case $N_i=\phi$, where $\phi$ denotes the empty set. We will use the notation $i'\sim i$ to indicate that site $i'$ is a neighbor of site $i$.

In the \texttt{automultinomial} package, we will require that the neighborhoods are symmetric: if $i'\in N_i$, then necessarily $i\in N_{i'}$ as well.

We will also define an energy function $H(\mathbf{z}|\boldsymbol{\beta},\gamma)$. When $K=2$ and the two categories are $k=1$ and $k=2$, then $\boldsymbol{\beta}$ is just a vector $\beta$ with length $p$, and \begin{eqnarray} H(\mathbf{z}|\boldsymbol{\beta},\gamma)=\sum_{i=1}^{n}x_i^T\beta I(z_i=2)+\gamma \sum_{i=1}^{n}\sum_{\substack{i'\sim i\i'>i}}\sum_{k=1}^{2}I(z_i=z_{i'}=k)\label{H} \end{eqnarray}

In general, for $K\geq 2$ we define $H(\mathbf{z}|\boldsymbol{\beta},\gamma)$ by

\begin{eqnarray} H(\mathbf{z}|\boldsymbol{\beta},\gamma)=\sum_{i=1}^{n}\sum_{k=1}^{K-1}x_i^T\beta_kI(z_i=k+1)+\gamma \sum_{i=1}^{n}\sum_{\substack{i'\sim i\i'>i}}\sum_{k=1}^{K}I(z_i=z_{i'}=k) \end{eqnarray}

With this preamble accomplished, we can define the probability model \texttt{automultinomial} seeks to estimate:

\begin{eqnarray} p(\mathbf{z}|\boldsymbol{\beta},\gamma)=\frac{\exp{H(\mathbf{z}|\boldsymbol{\beta},\gamma)}}{\sum_{\mathbf{z'}}\exp{H(\mathbf{z'}|\boldsymbol{\beta},\gamma)}}\label{density} \end{eqnarray}

The term $\sum_{\mathbf{z'}}$ indicates a sum over all possible categorical responses for the entire dataset.

Some intuition regarding \eqref{density} can be gained as follows: when $\gamma=0$, then $\gamma \sum_{i=1}^{n}\sum_{\substack{i'\sim i\i'>i}}\sum_{k=1}^{2}I(z_i=z_{i'}=k)=0$ so that the $z_i$ are completely independent. In this case the autologistic (automultinomial) model is the same as an ordinary logistic (multinomial logit) model. On the other hand, when $\gamma>0$, then response configurations $\mathbf{z}$ where $z_i=z_{i'}$ for many neighbor pairs $i\sim i'$ become more likely. In this way, $\gamma$ incorporates positive spatial correlation into the responses. When $\gamma<0$, we expect neighboring $z_i,z_{i'}$ to disagree more frequently than if the $z_i$ were independent, and the $\gamma<0$ case will in practice be less common.

Estimating $\boldsymbol{\beta}$ and $\gamma$

The \texttt{automultinomial} package follows the pseudolikelihood approach of (Besag, 1974) to estimate the parameters $\beta,\gamma$. We briefly explain the procedure and its motivation below.

When $\gamma$ is known to be $0$, then the responses $z_i$ are independent, and we can use common maximum likelihood techniques such as logistic or multinomial regression to estimate $\beta$. On the other hand, when spatial correlation is present, then we need to estimate $\boldsymbol{\beta}$ and $\gamma$ jointly. Unfortunately, maximum likelihood for the model in equation \eqref{density} is computationally infeasible due to the well known computational intractability of the denominator of $\eqref{density}$ when $\gamma\neq 0$.

Fortunately, a proposal of (Besag, 1974) provides a consistent estimation procedure, maximum pseudolikelihood, for the parameters of the model in equation \eqref{density}. In maximum pseudolikelihood, we maximize

\begin{eqnarray} \ell_{PL}(\boldsymbol{\beta},\gamma)=\sum_{i=1}^{n}\log{p(z_i|z_{-i},\boldsymbol{\beta},\gamma)}\label{pseudolikelihood} \end{eqnarray}

over $\boldsymbol{\beta}$ and $\gamma$. The expression $p(z_i|z_{-i},\boldsymbol{\beta},\gamma)$ refers to the conditional density of $z_i$ given the sites at all of the other grid locations. A short calculation based on equation \eqref{density} shows that for $K=2$,

\begin{eqnarray} p(z_i=1|z_{-i},\boldsymbol{\beta},\gamma)&=&\frac{\exp{\gamma\sum_{i'\sim i}I(z_{i'}=1)}}{\exp{\gamma\sum_{i'\sim i}I(z_{i'}=1)}+\exp{x_i^T\beta+\gamma\sum_{i'\sim i}I(z_{i'}=2)}}\label{p1}\ p(z_i=2|z_{-i},\boldsymbol{\beta},\gamma)&=&\frac{\exp{x_i^T\beta+\gamma\sum_{i'\sim i}I(z_{i'}=2)}}{\exp{\gamma\sum_{i'\sim i}I(z_{i'}=1)}+\exp{x_i^T\beta+\gamma\sum_{i'\sim i}I(z_{i'}=2)}}\label{p2} \end{eqnarray}

For $K\geq 2$ response categories, we have

\begin{eqnarray} p(z_i=1|z_{-i},\boldsymbol{\beta},\gamma)&=&\frac{\exp{\gamma\sum_{i'\sim i}I(z_{i'}=1)}}{\exp{\gamma\sum_{i'\sim i}I(z_{i'}=1)}+\sum_{k=1}^{K-1}\exp{x_i^T\beta_k+\gamma\sum_{i'\sim i}I(z_{i'}=k+1)}}\label{p3} \end{eqnarray}

and for $1\leq k<K$, \begin{eqnarray} p(z_i=k+1|z_{-i},\boldsymbol{\beta},\gamma)&=&\frac{\exp{x_i^T\beta_k+\gamma\sum_{i'\sim i}I(z_{i'}=k+1)}}}{\exp{\gamma\sum_{i'\sim i}I(z_{i'}=1)}+\sum_{k=1}^{K-1}\exp{x_i^T\beta_k+\gamma\sum_{i'\sim i}I(z_{i'}=k+1)}}\label{p4} \end{eqnarray}

In the \texttt{automultinomial} package, equations \eqref{p1}, \eqref{p2}, \eqref{p3}, and \eqref{p4} are plugged into equation \eqref{pseudolikelihood}, and \eqref{pseudolikelihood} is optimized over $\boldsymbol{\beta}$ and $\gamma$ using the \texttt{optim()} function.

\section{Installation}

cran installation

Simply use

install.packages("automultinomial")
library(automultinomial)

and the package will be ready to run.

Github installation

The most recent (development) version of the package can be installed from Github.

Installing from Github is as easy as installing from cran. First, make sure Hadley Wickham's package \texttt{devtools} is installed on your RStudio:

install.packages("devtools")

Then, run the following:

devtools::install_github(repo="stephenberg/automultinomial")
library(automultinomial)

Now the package is ready to run.

\section{Data example 1: K=2 response categories}

Here, we will demonstrate how to simulate data using \texttt{automultinomial}, how to fit data using \texttt{automultinomial}, and how to analyze the output.

Simulating data

First we will use the \texttt{drawSamples()} function to simulate some data from the autologistic model. The \texttt{drawSamples()} function takes the following arguments:

First, we will set up the simulation parameters:

library(automultinomial)
set.seed(33)

#10 predictors
p=5


#n times n grid
n=40

#make grid and adjacency matrix
latticeGraph=igraph::make_lattice(c(n,n))
A=igraph::get.adjacency(latticeGraph)

#set coefficient values 
beta=matrix(rnorm(p),ncol=1)*0.3
beta

#set covariate values
X=matrix(rnorm(n^2*p),ncol=p)

#set the correlation parameter value (0.7 is a moderate amount of spatial correlation)
gamma=0.7
load("data/2category.RData")
y=as.numeric(y)
y2=as.numeric(y2)

Then, we will use the \texttt{drawSamples()} function to generate simulated data.

#use drawSamples to simulate data with parameters beta and gamma by Gibbs sampling
y=drawSamples(beta,gamma,X,A,nSamples = 1)
y2=drawSamples(beta,0,X,A,nSamples = 1)

Figure \ref{fig:plotk2} shows plots of the responses on the grid. On the left plot, we can see "clumping" of the responses due to the positive autocorrelation parameter $\gamma=0.7$. The right hand plot is from the distribution with the same $\boldsymbol{\beta}$, and $\gamma=0$.

\begin{figure} \centering \includegraphics[width=1\textwidth]{plots/plotk2.png} \caption{On the left, a plot of the dataset we just simulated, where the truth is $\gamma=0.7$. For comparison, the plot on the right shows data taken from the same model but with $\gamma=0$.} \label{fig:plotk2} \end{figure}

Fitting an autologistic model to the data (K=2 categories)

Now we'll fit an autologistic model using the \texttt{MPLE()} function. The \texttt{MPLE()} function takes 3 primary arguments:

There are also several secondary arguments.

After fitting the model, we will examine confidence intervals for the fitted parameters. The asymptotic type confidence intervals can be computed very quickly, but may be less accurate in practice than bootstrap confidence intervals.

First, we will fit the model using the \texttt{MPLE()} function.

# responses must be input as a factor
y=factor(y)
fit1=MPLE(X=X,y=y,A=A,ciLevel = 0.95,method = "asymptotic")
fit2=MPLE(X=X,y=y,A=A,ciLevel = 0.95,method = "boot",nBoot = 500)

Then, we can use the \texttt{MPLE_summary()} function to view the model summary in table form.

#to see the information contained in fit1 and fit2, use str() (not run to save space)
#str(fit1)
#str(fit2)

fitSummary1=MPLE_summary(fit1)
fitSummary2=MPLE_summary(fit2)
library(ggplot2)
ciMat1=fit1$ciMatrix
ciMat2=fit1$ciMatrix
df=cbind(fit1$ciMatrix,fit2$ciMatrix,c(beta,gamma),c(fit1$betaHat,fit1$gammaHat))
df=as.data.frame(df)
df=cbind(df,factor(c(as.character(1:5),"gamma")))
colnames(df)[5:7]=c("Truth","MPLE","coefficient")
colnames(df)[1:4]=as.character(1:4)
cbPalette <- c("Asymptotic"="#009E73" ,"Bootstrap"="#CC79A7","MPLE"="#009E73" ,"Truth"="#CC79A7")
cbPalette2 <- c("MPLE"=21 ,"Truth"=23)
p1=ggplot()+geom_errorbar(data=df,aes(x=coefficient,ymin=df[,1],ymax=df[,2],color="Asymptotic"),size=1)+geom_errorbar(data=df,aes(x=coefficient,ymin=df[,3],ymax=df[,4],color="Bootstrap"),size=1)
p1=p1+geom_point(show.legend=TRUE)
p1=p1+geom_point(data=df,aes(x=coefficient,y=MPLE,shape="MPLE"))
p1=p1+geom_point(data=df,aes(x=coefficient,y=Truth,shape="Truth"))
p1=p1+scale_y_continuous(limits=c(-1.0,1.0),name = "Coefficient Estimates")
p1=p1+scale_colour_manual(name="95% CI",values=cbPalette)
p1=p1+scale_shape_manual(name="Type",values=cbPalette2)
p1

\section{Data example 2: K=3 response categories}

Now, we demonstrate the use of the package when there are 3 response categories. Essentially, everything is still the same, and most of the previous code doesn't change at all. The only difference is that now, the $\boldsymbol{\beta}$ parameter is a matrix with $2=K-1$ columns, rather than a vector as in the $K=2$ case.

Simulating data

Generating simulated data using the function \texttt{drawSamples()}.

set.seed(42)

#10 predictors
p=5

#n times n grid
n=40

#make grid and adjacency matrix
latticeGraph=igraph::make_lattice(c(n,n))
A=igraph::get.adjacency(latticeGraph)

#set coefficient values 
#with 3 categories in the response, beta is now a matrix
beta=matrix(rnorm(p*2),ncol=2)*0.3
beta

#set covariate values
X=matrix(rnorm(n^2*p),ncol=p)

#set the correlation parameter value (0.7 is a moderate amount of spatial correlation)
gamma=0.7
load("data/3category.RData")
y=as.numeric(y)
y2=as.numeric(y2)
#use drawSamples to simulate data with parameters beta and gamma by Gibbs sampling
y=drawSamples(beta,gamma,X,A,nSamples = 1)
y2=drawSamples(beta,0,X,A,nSamples = 1)

Figure \ref{fig:plotk3} shows plots of the 3-category responses on the grid. On the left plot, we again see "clumping" of the responses due to the positive autocorrelation parameter. The right hand plot is from the distribution with the same $\boldsymbol{\beta}$, and $\gamma=0$.

\begin{figure} \centering \includegraphics[width=1\textwidth]{plots/plotk3.png} \caption{On the left, a plot of the 3-category dataset we just simulated, where the truth is $\gamma=0.7$. For comparison, the plot on the right shows data taken from the same model but with $\gamma=0$.} \label{fig:plotk3} \end{figure}

Fitting an automultinomial model to the data (K=3 categories)

#responses must be input as a factor
y=factor(y)
fit1=MPLE(X=X,y=y,A=A,ciLevel = 0.95,method = "asymptotic")
fit2=MPLE(X=X,y=y,A=A,ciLevel = 0.95,method = "boot",nBoot = 500)
#to see the information contained in fit1 and fit2, use str() (not run to save space)
#str(fit1)
#str(fit2)

fitSummary1=MPLE_summary(fit1)
fitSummary2=MPLE_summary(fit2)
library(ggplot2)
ciMat1=fit1$ciMatrix
ciMat2=fit1$ciMatrix
df=cbind(fit1$ciMatrix,fit2$ciMatrix,c(beta,gamma),c(fit1$betaHat,fit1$gammaHat))
df=as.data.frame(df)
df=cbind(df,factor(rownames(ciMat1)))
colnames(df)[5:7]=c("Truth","MPLE","coefficient")
colnames(df)[1:4]=as.character(1:4)
cbPalette <- c("Asymptotic"="#009E73" ,"Bootstrap"="#CC79A7","MPLE"="#009E73" ,"Truth"="#CC79A7")
cbPalette2 <- c("MPLE"=21 ,"Truth"=23)
p1=ggplot()+geom_errorbar(data=df,aes(x=coefficient,ymin=df[,1],ymax=df[,2],color="Asymptotic"),size=1)+geom_errorbar(data=df,aes(x=coefficient,ymin=df[,3],ymax=df[,4],color="Bootstrap"),size=1)
p1=p1+geom_point(show.legend=TRUE)
p1=p1+geom_point(data=df,aes(x=coefficient,y=MPLE,shape="MPLE"))
p1=p1+geom_point(data=df,aes(x=coefficient,y=Truth,shape="Truth"))
p1=p1+scale_y_continuous(limits=c(-0.5,0.9),name = "Coefficient Estimates")
p1=p1+scale_colour_manual(name="95% CI",values=cbPalette)
p1=p1+scale_shape_manual(name="Type",values=cbPalette2)
p1

\section{A comparison of different autologistic parameterizations}

Finally, we'll conclude with a discussion of different parameterizations of the autologistic model. We will focus on the case $K=2$ here. The goal for now is to show by example that the parameterization used by \texttt{automultinomial} will be invariant to the choice of reference category (not guaranteed by all of the parameterizations) and to give some concrete data examples.

Commonly used autologistic parameterizations

Frequently, autologistic parameterizations are given in terms of the conditional probability distributions at each site, rather than in terms of the joint distribution over all sites. This convention might perhaps lead to confusion about what model is actually being fit, but we will follow the common convention here.

We suppose that the response at each site is binary, so that there are $K=2$ response categories.

\textbf{Parameterization 1 (Asymmetric)}: In what we will call the asymmetric parameterization, the set of allowed responses at each site is usually taken to be ${0,1}$. The conditional probabilities for the set ${0,1}$ are

\begin{eqnarray} p(z_i=1|z_{-i})=\frac{\exp{x_i^T\beta+\gamma\sum_{i'\sim i}z_{i'}}}{1+\exp{x_i^T\beta+\gamma\sum_{i'\sim i}z_{i'}}}\ p(z_i=0|z_{-i})=\frac{1}{1+\exp{x_i^T\beta+\gamma\sum_{i'\sim i}z_{i'}}} \end{eqnarray}

If we shift the values of the responses up by 1, so that we identify $0$ with $1$ and $1$ with $2$, then with a slight abuse of notation we have

\begin{eqnarray} p(z_i=2|z_{-i})=\frac{\exp{x_i^T\beta+\gamma\sum_{i'\sim i}I(z_{i'}=2)}}{1+\exp{x_i^T\beta+\gamma\sum_{i'\sim i}I(z_{i'}=2)}}\ p(z_i=1|z_{-i})=\frac{1}{1+\exp{x_i^T\beta+\gamma\sum_{i'\sim i}I(z_{i'}=2)}}\label{p12} \end{eqnarray}

This form of the conditional distributions corresponds to the model

\begin{eqnarray} p_{asy}(\mathbf{z}|\boldsymbol{\beta},\gamma)=\frac{\exp{H_{asy}(\mathbf{z}|\boldsymbol{\beta},\gamma)}}{\sum_{\mathbf{z'}}\exp{H_{asy}(\mathbf{z'}|\boldsymbol{\beta},\gamma)}}\label{asym} \end{eqnarray} where \begin{eqnarray} H_{asy}(\mathbf{z}|\boldsymbol{\beta},\gamma)=\sum_{i=1}^{n}x_i^T\beta I(z_i=2)+\gamma \sum_{i=1}^{n}\sum_{\substack{i'\sim i\i'>i}}I(z_i=z_{i'}=2) \end{eqnarray}

By the Hammersley-Clifford theorem, the density in equation \eqref{asym} is the unique joint distribution corresponding to the conditional distributions in equation \eqref{p12}.

\textbf{Parameterization 2 (Symmetric)}: This parameterization is sometimes referred to as the $\pm 1$ parameterization. We will now show that for $K=2$, the symmetric parameterization is equivalent to the parameterization used by \texttt{automultinomial}. On the other hand, the formulation given for the \texttt{automultinomial} parameterization in section 1 is much more cleanly generalized to $K>2$ categories.

In the symmetric parameterization, the set of allowed outcomes is taken to be ${-1,1}$, and the conditional probabilities at each site are \begin{eqnarray} p(z_i=1|z_{-i})=\frac{\exp{x_i^T\beta+\gamma\sum_{i'\sim i}z_{i'}}}{\exp{-\gamma\sum_{i'\sim i}z_{i'}}+\exp{x_i^T\beta+\gamma\sum_{i'\sim i}z_{i'}}}\ p(z_i=-1|z_{-i})=\frac{\exp{-\gamma\sum_{i'\sim i}z_{i'}}}{\exp{-\gamma\sum_{i'\sim i}z_{i'}}+\exp{x_i^T\beta+\gamma\sum_{i'\sim i}z_{i'}}} \end{eqnarray}

Now, as with the asymmetric model, we will map the response set ${-1,1}$ to ${1,2}$. We will identify $-1$ with $1$ and $1$ with $2$, so that the conditional probabilities become

\begin{eqnarray} p(z_i=2|z_{-i})&=&\frac{\exp\left[x_i^T\beta+\gamma\sum_{i'\sim i}{I(z_{i'}=2)-I(z_{i'}=1)}\right]}{1+\exp\left[x_i^T\beta+\gamma\sum_{i'\sim i}{I(z_{i'}=2)-I(z_{i'}=1)}\right]}\ &=&\frac{\exp{x_i^T\beta +\gamma\sum_{i'\sim i}I(z_{i'}=2)}}{\exp{\gamma\sum_{i'\sim i}I(z_{i'}=1)}+\exp{x_i^T\beta+\gamma\sum_{i'\sim i}I(z_{i'}=2)}}\ p(z_i=1|z_{-i})&=&\frac{\exp{\gamma\sum_{i'\sim i}I(z_{i'}=1)}}{\exp{\gamma\sum_{i'\sim i}I(z_{i'}=1)}+\exp{x_i^T\beta+\gamma\sum_{i'\sim i}I(z_{i'}=2)}}\label{p22} \end{eqnarray}

This form of the conditional distributions corresponds to the model

\begin{eqnarray} p_{sym}(\mathbf{z}|\boldsymbol{\beta},\gamma)=\frac{\exp{H_{sym}(\mathbf{z}|\boldsymbol{\beta},\gamma)}}{\sum_{\mathbf{z'}}\exp{H_{sym}(\mathbf{z'}|\boldsymbol{\beta},\gamma)}}\label{sym} \end{eqnarray}

\begin{eqnarray} H_{sym}(\mathbf{z}|\boldsymbol{\beta},\gamma)=\sum_{i=1}^{n}x_i^T\beta I(z_i=2)+\gamma \sum_{i=1}^{n}\sum_{\substack{i'\sim i\i'>i}}\sum_{k=1}^{2}I(z_i=z_{i'}=k)\label{hsym} \end{eqnarray}

By the Hammersley-Clifford theorem, the density in equation \eqref{sym} is the unique joint distribution corresponding to the conditional distributions in equation \eqref{p22}.

\textbf{Parameterization 3: the parameterization used in \texttt{automultinomial}}s: This parameterization is the parameterization defined in section 2 of this document.

\subsection{Comparison} Having defined the 3 parameterizations, we will now compare them. Our first conclusion is that when $K=2$, the symmetric ($\pm 1$ ) parameterization is exactly equivalent to the parameterization used by \texttt{automultinomial}. This conclusion follows from determining that the energies in equations \eqref{H} and \eqref{hsym} are exactly the same, so that the densities in equation \eqref{sym} and \eqref{density} are exactly the same.

Therefore, from now on we need only compare the \texttt{automultinomial} parameterization and the asymmetric parameterization. The following statements can be shown:

We will now give some examples on simulated data.

#making a square lattice graph and adjacency matrix with toroidal boundary conditions
#every site has 4 neighbors
t1=igraph::make_lattice(c(40,40),circular=TRUE)
a1=igraph::get.adjacency(t1)


#making a square lattice graph and adjacency matrix with free boundary conditions
#sites have 2 neighbors at the corner of the lattice, 3 neighbors on edges of the lattice,
#and 4 neighbors internal to the lattice
t2=igraph::make_lattice(c(40,40),circular=FALSE)
a2=igraph::get.adjacency(t2)

#making the X matrices: X does not have an intercept, but X_intercept has an intercept
X=matrix(rnorm(1600*2),ncol=2)
X_intercept=cbind(rep(1,1600),X)

beta=rnorm(3)
gamma=0.5
#simulated responses
y=drawSamples(beta=beta,gamma=gamma,X = X_intercept, A=a1,nSamples = 1)

\textbf{Case 1: same number of neighbors, covariate matrix includes intercept}

Here, we see that the pseudolikelihoods match and that coefficients (excepting $\gamma$ and the intercept) also match. The intercept and $\gamma$ from the two models are related by a simple linear transformation.

load("data/paramExample.RData")
fitAsymmetric=glm(factor(y)~cbind(X,as.matrix(a1%*%(y-1))),family=binomial())
fitAutomultinomial=MPLE(X=X_intercept,y=factor(y),A = a1,method="asymptotic")

#comparing pseudolikelihoods: identical
fitAsymmetric$deviance/-2
fitAutomultinomial$pseudolikelihood

#comparing beta: identical
fitAsymmetric$coefficients[2:3]
fitAutomultinomial$betaHat[2:3]

#comparing intercept and gamma: not identical, but there is a mapping between them
gammaAsymmetric=fitAsymmetric$coefficients[4]
gammaAutomultinomial=fitAutomultinomial$gammaHat
gammaAsymmetric
gammaAutomultinomial*2

interceptAutomultinomial=fitAutomultinomial$betaHat[1]
interceptAsymmetric=fitAsymmetric$coefficients[1]
interceptAsymmetric
interceptAutomultinomial-4*gammaAutomultinomial

\textbf{Case 2: different number of neighbors, covariate matrix includes intercept}

Here, we see that the pseudolikelihoods and coefficients no longer match between parameterizations. Additionally, while the \texttt{automultinomial} parameterization is invariant to changes in the reference category, the asymmetric parameterization is not.

fitAsymmetric1=glm(factor(y)~cbind(X,as.matrix(a2%*%(y-1))),family=binomial())
fitAutomultinomial1=MPLE(X=X_intercept,y=factor(y),A = a2,method="asymptotic")

#refitting with different reference category
y2=as.factor(y)
y2=as.numeric(relevel(y2,ref = "2"))
fitAsymmetric2=glm(factor(y2)~cbind(X,as.matrix(a2%*%(y2-1))),family=binomial())
fitAutomultinomial2=MPLE(X=X_intercept,y=factor(y2),A = a2,method="asymptotic")


#comparing pseudolikelihoods: not identical
fitAsymmetric1$deviance/-2 
fitAsymmetric2$deviance/-2
fitAutomultinomial1$pseudolikelihood
fitAutomultinomial2$pseudolikelihood

#comparing beta: not identical
fitAsymmetric1$coefficients[2:3]
fitAsymmetric2$coefficients[2:3]
fitAutomultinomial1$betaHat[2:3]
fitAutomultinomial2$betaHat[2:3]

#comparing gamma: not identical. Asymmetric parameterization not invariant to
#choice of reference category. Automultinomial parameterization invariant to 
#choice of reference category
gammaAsymmetric1=fitAsymmetric1$coefficients[4]
gammaAsymmetric2=fitAsymmetric2$coefficients[4]
gammaAutomultinomial1=fitAutomultinomial1$gammaHat
gammaAutomultinomial2=fitAutomultinomial2$gammaHat
gammaAsymmetric1
gammaAsymmetric2
gammaAutomultinomial1
gammaAutomultinomial2

\textbf{Case 3: same number of neighbors, covariate matrix does not include intercept}

Here, we see that the pseudolikelihoods and coefficients no longer match between parameterizations. Additionally, while the \texttt{automultinomial} parameterization is invariant to changes in the reference category, the asymmetric parameterization is not.

fitAsymmetric1=glm(factor(y)~cbind(X,as.matrix(a2%*%(y-1)))-1,family=binomial())
fitAutomultinomial1=MPLE(X=X,y=factor(y),A = a2,method="asymptotic")

#refitting with different reference category
y2=as.factor(y)
y2=as.numeric(relevel(y2,ref = "2"))
fitAsymmetric2=glm(factor(y2)~cbind(X,as.matrix(a2%*%(y2-1)))-1,family=binomial())
fitAutomultinomial2=MPLE(X=X,y=factor(y2),A = a2,method="asymptotic")


#comparing pseudolikelihoods: not identical
fitAsymmetric1$deviance/-2 
fitAsymmetric2$deviance/-2
fitAutomultinomial1$pseudolikelihood
fitAutomultinomial2$pseudolikelihood

#comparing beta: not identical
fitAsymmetric1$coefficients[1:2]
fitAsymmetric2$coefficients[1:2]
fitAutomultinomial1$betaHat[1:2]
fitAutomultinomial2$betaHat[1:2]

#comparing gamma: not identical. Asymmetric parameterization not invariant to
#choice of reference category. Automultinomial parameterization invariant to 
#choice of reference category
gammaAsymmetric1=fitAsymmetric1$coefficients[3]
gammaAsymmetric2=fitAsymmetric2$coefficients[3]
gammaAutomultinomial1=fitAutomultinomial1$gammaHat
gammaAutomultinomial2=fitAutomultinomial2$gammaHat
gammaAsymmetric1
gammaAsymmetric2
gammaAutomultinomial1
gammaAutomultinomial2


Try the automultinomial package in your browser

Any scripts or data that you put into this service are public.

automultinomial documentation built on May 2, 2019, 7:12 a.m.