suppressPackageStartupMessages(library("sparseMVN")) suppressPackageStartupMessages(library("Matrix")) suppressPackageStartupMessages(library("mvtnorm")) suppressPackageStartupMessages(library("tidyverse")) suppressPackageStartupMessages(library("kableExtra")) knitr::opts_chunk$set(prompt=TRUE, cache=FALSE,error=FALSE, echo=FALSE, eval=TRUE, comment="#", collapse=FALSE) #$ options(replace.assign=TRUE, width=77, digits=4, xtable.include.rownames=FALSE, dplyr.summarise.inform=FALSE) sanitize <- function(x) x
The mvtnorm package [@R_mvtnorm] provides the dmvnorm() function to compute the density of a multivariate normal (MVN) distribution, and the rmvnorm() function to simulate MVN random variables. These functions require the user to supply a full, "dense" covariance matrix; if starting with a precision matrix, the user must first invert it explicitly. This covariance matrix is dense in the sense that, for an $M$-dimensional MVN random variable, all $M^2$ elements are stored, so memory requirements grow quadratically with the size of the problem. Internally, both functions factor the covariance matrix using a Cholesky decomposition, whose complexity is $\mathcal{O}\left(M^3\right)$ [@GolubVanLoan1996].[^1] This factorization is performed every time the function is called, even if the covariance matrix does not change from call to call. Also, the internal algorithm in rmvnorm() involves multiplication by a triangular matrix, and the one in dmvnorm() involves solving a triangular linear system; both of these operations are $\mathcal{O}\left(M^2\right)$ [@GolubVanLoan1996] on dense matrices. MVN functions in other packages, such as MASS [@R_MASS] and LaplacesDemon [@R_LaplacesDemon], face similar limitations.[^2] Thus, existing tools for working with the MVN distribution in R are not practical for high-dimensional MVN random variables.
However, for many applications the covariance or precision matrix is sparse, meaning that the proportion of nonzero elements is small relative to the total size of the matrix. The functions in the sparseMVN package exploit that sparsity to reduce memory requirements and to gain computational efficiency. The dmvn.sparse() function computes the MVN density, and the rmvn.sparse() function samples from an MVN distribution. Instead of requiring the user to supply a dense covariance matrix, rmvn.sparse() and dmvn.sparse() accept a pre-computed Cholesky decomposition of either the covariance or the precision matrix in a compressed sparse format. This approach has several advantages:
- Memory requirements are smaller because only the nonzero elements of the matrix are stored, in a compressed sparse format.
- Linear algebra algorithms that are optimized for sparse matrices are more efficient because they avoid operations on matrix elements that are known to be zero.
- When the precision matrix is initially available, the user avoids explicitly inverting it into a covariance matrix. This preserves sparsity, because the inverse of a sparse matrix is not necessarily sparse.
- The Cholesky factor of the matrix is computed once, before the first function call, and is not recomputed on subsequent calls (as long as the matrix does not change).
The functions in sparseMVN rely on sparse matrix classes and functions defined in the Matrix package. The user creates the covariance or precision matrix as a sparse, symmetric dsCMatrix matrix, and computes the sparse Cholesky factor using the Matrix::Cholesky() function. Other than ensuring that the factor of the covariance or precision matrix is in the correct format, the sparseMVN functions behave in much the same way as the corresponding mvtnorm functions. Internally, sparseMVN uses standard methods of computing the MVN density and simulating MVN random variables (see the algorithms described below). Since a large proportion of elements in the matrix are zero, we need to store only the row and column indices, and the values, of the unique nonzero elements. The efficiency gains in sparseMVN come from storing the covariance or precision matrix in a compressed format without explicit zeros, and from applying linear algebra routines that are optimized for those sparse matrix structures. Matrix calls sparse linear algebra routines that are implemented in the CHOLMOD library [@ChenDavis2008; @DavisHager1999; @DavisHager2009].
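As a concrete illustration of this workflow, here is a minimal sketch using a small, hypothetical 4-dimensional precision matrix. The matrix values and object names are illustrative only, not taken from the package documentation.

```r
library(Matrix)
library(sparseMVN)

## Build a small sparse symmetric positive-definite precision matrix and
## coerce it to the symmetric (dsCMatrix) class, which stores one triangle.
Q <- bdiag(matrix(c(2, -1, -1, 2), 2, 2),
           matrix(c(3,  1,  1, 3), 2, 2))
Q <- as(Q, "symmetricMatrix")

mu <- rep(0, 4)
CH <- Cholesky(Q)       # sparse Cholesky factor, computed once and reused

x  <- rmvn.sparse(10, mu, CH, prec = TRUE)   # 10 draws from MVN(mu, Q^{-1})
ld <- dmvn.sparse(x, mu, CH, prec = TRUE)    # log densities of those draws
```

Because `CH` is passed directly to both functions, the factorization is not repeated across calls.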
Let $x\in\mathbb{R}^{M}$ be a realization of the random variable $X\sim\mathbf{MVN}\left(\mu,\Sigma\right)$, where $\mu\in\mathbb{R}^{M}$ is a vector, $\Sigma\in\mathbb{R}^{M\times M}$ is a positive-definite covariance matrix, and $\Sigma^{-1}\in\mathbb{R}^{M\times M}$ is a positive-definite precision matrix.
The log probability density of $x$ is
$$\begin{aligned} \log f(x)&=-\frac{1}{2}\left(M \log (2\pi) + \log|\Sigma| +z^\top z\right),\quad\text{where}~z^\top z=\left(x-\mu\right)^\top\Sigma^{-1}\left(x-\mu\right) \end{aligned}$$
The two computationally intensive steps in evaluating $\log f(x)$ are computing $\log|\Sigma|$ and computing $z$, and we want to perform them without explicitly inverting $\Sigma$ or repeating mathematical operations. Once we have $z$, computing the dot product $z^\top z$ is cheap. How one performs these steps efficiently in practice depends on whether the covariance matrix $\Sigma$ or the precision matrix $\Sigma^{-1}$ is available. In both cases, we start by finding a lower triangular matrix root: $\Sigma=LL^\top$ or $\Sigma^{-1}=\Lambda\Lambda^\top$. Since $\Sigma$ and $\Sigma^{-1}$ are positive definite, we use the Cholesky decomposition, which is the unique such root with all positive elements on the diagonal.
With the Cholesky decomposition in hand, we compute the log determinant of $\Sigma$ by adding the logs of the diagonal elements of the factor. $$\begin{aligned} \label{eq:logDet} \log|\Sigma|= \begin{cases} \phantom{-}2\sum_{m=1}^M\log L_{mm} & \text{when $\Sigma$ is given}\\ -2\sum_{m=1}^M\log \Lambda_{mm} & \text{when $\Sigma^{-1}$ is given} \end{cases}\end{aligned}$$
Having already computed the triangular matrix roots also speeds up the computation of $z^\top z$. If $\Sigma^{-1}$ is given, $z=\Lambda^\top(x-\mu)$ can be computed efficiently as the product of an upper triangular matrix and a vector. When $\Sigma$ is given, we find $z$ by solving the lower triangular system $Lz=x-\mu$. The subsequent $z^\top z$ computation is trivially fast.
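The following is a minimal dense-matrix sketch of this density calculation in base R; it is meant only to make the two cases concrete and is not the sparseMVN implementation. The function name and arguments are hypothetical.

```r
## Log MVN density of a single vector x, via a triangular factor of either a
## covariance matrix `Sigma` or a precision matrix `Prec` (supply exactly one).
mvn_logdensity <- function(x, mu, Sigma = NULL, Prec = NULL) {
  M <- length(mu)
  if (!is.null(Sigma)) {
    L <- t(chol(Sigma))              # lower-triangular L, with Sigma = L L'
    z <- forwardsolve(L, x - mu)     # solve L z = x - mu
    logdet <- 2 * sum(log(diag(L)))  # log |Sigma|
  } else {
    Lambda <- t(chol(Prec))          # lower-triangular Lambda, Prec = Lambda Lambda'
    z <- crossprod(Lambda, x - mu)   # z = Lambda' (x - mu)
    logdet <- -2 * sum(log(diag(Lambda)))
  }
  -0.5 * (M * log(2 * pi) + logdet + sum(z * z))
}
```

For a single vector `x`, either call should agree with `mvtnorm::dmvnorm(x, mu, Sigma, log = TRUE)`.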
The algorithm for simulating $X\sim\mathbf{MVN}\left(\mu,\Sigma\right)$ also depends on whether $\Sigma$ or $\Sigma^{-1}$ is given. As above, we start by computing the Cholesky decomposition of the given covariance or precision matrix. Define a random variable $Z\sim\mathbf{MVN}\left(0,I_M\right)$, and generate a realization $z$ as a vector of $M$ samples from a standard normal distribution. If $\Sigma$ is given, then evaluate $x=Lz+\mu$. If $\Sigma^{-1}$ is given, then solve for $x$ in the triangular linear system $\Lambda^\top\left(x-\mu\right)=z$. The resulting $x$ is a sample from $\mathbf{MVN}\left(\mu,\Sigma\right)$. The following shows that this approach is correct:
$$\begin{aligned}
\mathbf{E}\left(X\right)&=\mathbf{E}\left(LZ+\mu\right)=\mathbf{E}\left((\Lambda^\top)^{-1}Z+\mu\right)=\mu\\
\mathbf{cov}\left(X\right)&=\mathbf{cov}\left(LZ+\mu\right)=\mathbf{E}\left(LZZ^\top L^\top\right)=LL^\top=\Sigma\\
\mathbf{cov}\left(X\right)&=\mathbf{cov}\left((\Lambda^\top)^{-1}Z+\mu\right)=\mathbf{E}\left((\Lambda^\top)^{-1}ZZ^\top\Lambda^{-1}\right) =(\Lambda^\top)^{-1}\Lambda^{-1}=(\Lambda\Lambda^\top)^{-1}=\Sigma
\end{aligned}$$
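A corresponding dense-matrix sketch of the sampler (again illustrative, with a hypothetical helper name, and not the sparseMVN implementation):

```r
## Draw n samples from MVN(mu, Sigma), given either `Sigma` or `Prec`.
mvn_sample <- function(n, mu, Sigma = NULL, Prec = NULL) {
  M <- length(mu)
  Z <- matrix(rnorm(n * M), nrow = M, ncol = n)  # columns are draws from MVN(0, I_M)
  if (!is.null(Sigma)) {
    L <- t(chol(Sigma))            # Sigma = L L'
    X <- L %*% Z + mu              # x = L z + mu (mu recycles down each column)
  } else {
    Rt <- chol(Prec)               # upper-triangular Lambda', with Prec = Lambda Lambda'
    X  <- backsolve(Rt, Z) + mu    # solve Lambda' (x - mu) = z
  }
  t(X)                             # one sample per row
}
```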
These algorithms apply whether the covariance/precision matrix is sparse or dense. When the matrix is dense, the computational complexity is $\mathcal{O}\left(M^3\right)$ for a Cholesky decomposition, and $\mathcal{O}\left(M^2\right)$ for either solving the triangular linear system or multiplying a triangular matrix by another matrix [@GolubVanLoan1996]. Thus, the computational cost grows cubically with $M$ when the decomposition must be computed, and quadratically if the decomposition has already been completed. Additionally, the storage requirement for $\Sigma$ (or $\Sigma^{-1}$) grows quadratically with $M$.
The Matrix package [@R_Matrix] defines various classes for storing sparse matrices in compressed formats. The most important class for our purposes is dsCMatrix, which defines a symmetric matrix, with numeric (double precision) elements, in a column-compressed format. Three vectors define the underlying matrix: the unique nonzero values (just one triangle is needed), the indices in the value vector of the first value in each column, and the indices of the rows in which each value is located. The storage requirements for a sparse $M\times M$ symmetric matrix with $V$ unique nonzero elements in one triangle are $V$ double precision numbers, $V+M+1$ integers, and some metadata. In contrast, a dense representation of the same matrix stores $M^2$ double precision values, regardless of symmetry and the number of zeros. If $V$ grows more slowly than $M^2$, the matrix becomes increasingly sparse (a smaller percentage of elements are nonzero), with greater efficiency gains from storing the matrix in a compressed sparse format.
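The slots of a dsCMatrix make this compressed layout visible. The example below is a sketch using a small, arbitrary block-diagonal matrix, for illustration only.

```r
library(Matrix)

## A 4 x 4 symmetric matrix stored in column-compressed symmetric form.
S <- as(bdiag(matrix(c(2, 1, 1, 2), 2, 2),
              matrix(c(3, 1, 1, 3), 2, 2)), "symmetricMatrix")
class(S)   # "dsCMatrix"
S@x        # unique nonzero values from the stored triangle (V of them)
S@i        # 0-based row indices of those values (V integers)
S@p        # 0-based index of the first stored value in each column (M + 1 integers)
```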
To illustrate how sparse matrices require less memory when stored in a compressed format than when stored densely, consider the following example, which borrows heavily from the vignette of the sparseHessianFD package [@R_sparseHessianFD].
Suppose we have a dataset of $N$ households, each with $T$ opportunities to purchase a particular product. Let $y_i$ be the number of times household $i$ purchases the product, out of the $T$ purchase opportunities, and let $p_i$ be the probability of purchase. The heterogeneous parameter $p_i$ is the same for all $T$ opportunities, so $y_i$ is a binomial random variable.
Let $\beta_i\in\mathbb{R}^{k}$ be a heterogeneous coefficient vector that is specific to household $i$, such that $\beta_i=(\beta_{i1},\dotsc,\beta_{ik})$. Similarly, $w_i\in\mathbb{R}^{k}$ is a vector of household-specific covariates. Define each $p_i$ such that the log odds of $p_i$ is a linear function of $\beta_i$ and $w_i$, but does not depend directly on $\beta_j$ and $w_j$ for another household $j\neq i$: $$\begin{aligned} p_i=\frac{\exp(w_i'\beta_i)}{1+\exp(w_i'\beta_i)},~i=1,\dots, N\end{aligned}$$
The coefficient vectors $\beta_i$ are distributed across the population of households following an MVN distribution with mean $\mu\in\mathbb{R}^{k}$ and covariance $\mathbf{A}\in\mathbb{R}^{k\times k}$. Assume that we know $\mathbf{A}$, but not $\mu$, so we place a multivariate normal prior on $\mu$, with mean $0$ and covariance $\mathbf{\Omega}\in\mathbb{R}^{k\times k}$. Thus, the parameter vector $x\in\mathbb{R}^{(N+1)k}$ consists of the $Nk$ elements in the $N$ $\beta_i$ vectors, and the $k$ elements in $\mu$.
The log posterior density, ignoring any normalization constants, is $$\begin{aligned} \label{eq:LPD} \log \pi(\beta_{1:N},\mu|\mathbf{Y}, \mathbf{W}, \mathbf{A},\mathbf{\Omega})=\sum_{i=1}^N\left(\log\left(p_i^{y_i}(1-p_i)^{T-y_i}\right) -\frac{1}{2}\left(\beta_i-\mu\right)^\top\mathbf{A}^{-1}\left(\beta_i-\mu\right)\right) -\frac{1}{2}\mu^\top\mathbf{\Omega}^{-1}\mu\end{aligned}$$
Because one element of $\beta_i$ can be correlated with another element of $\beta_i$ (for the same unit), we allow for the cross-partials between elements of $\beta_i$ for any $i$ to be nonzero. Also, because the mean of each $\beta_i$ depends on $\mu$, the cross-partials between $\mu$ and any $\beta_i$ can be nonzero. However, since the $\beta_i$ and $\beta_j$ are independent samples, and the $y_i$ are conditionally independent, the cross-partial derivatives between an element of $\beta_i$ and any element of any $\beta_j$ for $j\neq i$, must be zero. When $N$ is much greater than $k$, there will be many more zero cross-partial derivatives than nonzero, and the Hessian of the log posterior density will be sparse.
N <- 5
k <- 2
p <- k   ## dimension of mu
nv1 <- N*k + p
nels1 <- nv1^2
nnz1 <- N*k^2 + 2*p*N*k + p^2
nnz1LT <- N*k*(k+1)/2 + p*N*k + p*(p+1)/2
Q <- 1000
nv2 <- Q*k + p
nels2 <- nv2^2
nnz2 <- Q*k^2 + 2*p*Q*k + p^2
nnz2LT <- Q*k*(k+1)/2 + p*Q*k + p*(p+1)/2
options(scipen=999)
The sparsity pattern depends on how the variables are ordered. One such ordering is to group all of the coefficients in the $\beta_i$ for each unit together, and place $\mu$ at the end.
$$\begin{aligned} \beta_{11},\dotsc,\beta_{1k},\beta_{21},\dotsc,\beta_{2k},~\dotsc~,\beta_{N1},\dotsc,\beta_{Nk},\mu_1,\dotsc,\mu_k\end{aligned}$$
In this case, the Hessian has a "block-arrow" pattern. The table below illustrates this pattern for $N=$ `r scales::comma(N)` and $k=$ `r k` (`r nv1` total variables).
Mat1a <- as(kronecker(diag(N), matrix(1, k, k)), "sparseMatrix")
Mat1a <- rbind(Mat1a, Matrix(1, p, N*k))
Mat1a <- cbind(Mat1a, Matrix(1, k*N+p, p))
tmp_a <- as.matrix(Mat1a)
colnames(tmp_a) <- letters[1:NCOL(tmp_a)]
tmp_a %>%
  as_tibble(.name_repair="minimal") %>%
  mutate(across(colnames(tmp_a), ~ifelse(.x == 1, "|", "."))) %>%
  kable(col.names=NULL, caption="A block-arrow pattern") %>%
  kable_paper()
Another possibility is to group the coefficients for each covariate together. $$\begin{aligned} \beta_{11},\dotsc,\beta_{N1},\beta_{12},\dotsc,\beta_{N2},~\dotsc~,\beta_{1k},\dotsc,\beta_{Nk},\mu_1,\dotsc,\mu_k\end{aligned}$$ Now the Hessian has a "banded" sparsity pattern, as shown in the table that follows.
Mat1b <- kronecker(Matrix(1, k, k), diag(N))
Mat1b <- rbind(Mat1b, Matrix(1, p, N * k))
Mat1b <- cbind(Mat1b, Matrix(1, k*N+p, p))
tmp_b <- as.matrix(Mat1b)
colnames(tmp_b) <- letters[1:NCOL(tmp_b)]
tmp_b %>%
  as_tibble(.name_repair="minimal") %>%
  mutate(across(colnames(tmp_b), ~ifelse(.x == 1, "|", "."))) %>%
  kable(col.names=NULL, caption="A banded sparsity pattern") %>%
  kable_paper()
Mat2 <- as(kronecker(diag(Q), matrix(1, k, k)), "lMatrix") %>%
  rbind(Matrix(TRUE, p, Q*k)) %>%
  cbind(Matrix(TRUE, k*Q+p, p)) %>%
  as("dgCMatrix") %>%
  as("symmetricMatrix")
A2 <- as(Mat2, "matrix")
In both cases, the number of nonzeros is the same. There are `r nels1` elements in this symmetric matrix. If the matrix is stored in the standard dense base R format, memory is reserved for all `r nels1` values, even though only `r nnz1` values are nonzero, and only `r nnz1LT` values are unique. For larger matrices, the reduction in memory requirements from storing the matrix in a sparse format can be substantial.[^3] If $N=$ `r scales::comma(Q)` and $k=$ `r k`, then $M=$ `r scales::comma(nv2)`, with more than `r floor(nels2/10^6)` million elements in the Hessian. However, only `r scales::comma(nnz2)` of those elements are nonzero, with `r scales::comma(nnz2LT)` unique values in the lower triangle. The dense matrix requires `r format(object.size(A2), units='Mb')` of RAM, while a sparse symmetric matrix of the dsCMatrix class requires only `r format(object.size(Mat2), units='Kb')`.
This example is relevant because, when evaluated at the posterior mode, the negative of the Hessian of the log posterior density is the precision matrix $\Sigma^{-1}$ of an MVN approximation to the posterior distribution of $\left(\beta_{1:N},\mu\right)$. If we were to simulate from this MVN using rmvnorm(), or evaluate MVN densities using dmvnorm(), we would need to invert $\Sigma^{-1}$ to $\Sigma$ first, and store it as a dense matrix. Internally, rmvnorm() and dmvnorm() invoke dense linear algebra routines, including matrix factorization.
The signatures of the key sparseMVN functions are
rmvn.sparse(n, mu, CH, prec=TRUE)
dmvn.sparse(x, mu, CH, prec=TRUE, log=TRUE)
The rmvn.sparse() function returns a matrix with $n$ rows and length(mu) columns, with one MVN sample per row. dmvn.sparse() returns a vector of length $n$: densities if log=FALSE, and log densities if log=TRUE.
| Argument | Description |
|-----|-----|
| x | A numeric matrix. Each row is an MVN sample. |
| mu | A numeric vector. The mean of the MVN random variable. |
| CH | Either a dCHMsimpl or dCHMsuper object representing the Cholesky decomposition of the covariance/precision matrix. |
| prec | Logical value that identifies CH as the Cholesky decomposition of either a covariance ($\Sigma$, prec=FALSE) or precision ($\Sigma^{-1}$, prec=TRUE) matrix. |
| n | Number of random samples to be generated. |
| log | If log=TRUE, the log density is returned; otherwise the density is returned. |
The table above describes the function arguments. These functions do require the user to compute the Cholesky decomposition CH beforehand, but this needs to be done only once (as long as $\Sigma$ or $\Sigma^{-1}$ does not change). CH should be computed using Matrix::Cholesky(), whose first argument is a sparse symmetric matrix stored as a dsCMatrix object. As far as we know, there is no particular need to deviate from the defaults of the remaining arguments. If Matrix::Cholesky() uses a fill-reducing permutation to compute CH, the sparseMVN functions will handle that directly, with no additional user intervention required. The chol() function in base R should not be used.
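For instance, the following sketch (with a small, hypothetical precision matrix) suggests how one might confirm that the permutation has no effect on the results:

```r
library(Matrix)
library(sparseMVN)

## Small illustrative positive-definite precision matrix in dsCMatrix form.
Q  <- as(bdiag(matrix(c(2, -1, -1, 2), 2, 2), Diagonal(3, x = 4)),
         "symmetricMatrix")
mu <- rep(0, 5)

CH_perm   <- Cholesky(Q, perm = TRUE)    # with a fill-reducing permutation
CH_noperm <- Cholesky(Q, perm = FALSE)   # natural ordering

x <- rmvn.sparse(5, mu, CH_perm, prec = TRUE)
all.equal(dmvn.sparse(x, mu, CH_perm,   prec = TRUE),
          dmvn.sparse(x, mu, CH_noperm, prec = TRUE))
```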
Suppose we want to generate samples from an MVN approximation to the posterior distribution of the example model from the previous section. sparseMVN includes functions to simulate data for the example (binary.sim()), and to compute the log posterior density (binary.f()), gradient (binary.grad()), and Hessian (binary.hess()). trustOptim::trust.optim() [@R_trustOptim] is a nonlinear optimizer that estimates the curvature of the objective function using a sparse Hessian.
D <- sparseMVN::binary.sim(N=50, k=2, T=50)
priors <- list(inv.A=diag(2), inv.Omega=diag(2))
start <- rep(c(-1,1), 51)
opt <- trustOptim::trust.optim(start,
                               fn=sparseMVN::binary.f,
                               gr=sparseMVN::binary.grad,
                               hs=sparseMVN::binary.hess,
                               data=D, priors=priors,
                               method="Sparse",
                               control=list(function.scale.factor=-1))
The call to trustOptim::trust.optim() returns the posterior mode and the Hessian evaluated at the mode. These serve as the mean and the negative of the precision of the MVN approximation to the posterior distribution of the model.
R <- 100
pm <- opt[["solution"]]
H <- -opt[["hessian"]]
CH <- Cholesky(H)
We can then sample from the posterior using an MVN approximation, and compute the MVN log density for each sample.
samples <- rmvn.sparse(R, pm, CH, prec=TRUE)
logf <- dmvn.sparse(samples, pm, CH, prec=TRUE)
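As a quick, rough sanity check (rough because $R=100$ draws is small), the sample moments can be compared with the posterior mode and the inverse of H:

```r
## Column means should be near the posterior mode, and the sample covariance
## near solve(H); agreement is only approximate with 100 draws.
max(abs(colMeans(samples) - pm))
max(abs(cov(samples) - as.matrix(solve(H))))
```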
The ability to accept a precision matrix, rather than requiring the user to invert it to a covariance matrix, is a valuable feature of sparseMVN, because the inverse of a sparse matrix is not necessarily sparse. In the following chunk, we invert the Hessian, and drop zero values to retain any remaining sparsity. Note that there are `r scales::comma(102^2)` total elements in the Hessian.
Matrix::nnzero(H)
Hinv <- drop0(solve(H))
Matrix::nnzero(Hinv)
Nevertheless, we should check that the log densities from dmvn.sparse() correspond to those that we would get from mvtnorm::dmvnorm().
logf_dense <- dmvnorm(samples, pm, as.matrix(Hinv), log=TRUE)
all.equal(logf, logf_dense)
load("runtimes.Rdata")
tab1 <- filter(runtimes, stat %in% c("density","rand")) %>%
  group_by(N, k, stat, pattern, type) %>%
  summarize(mean_ms=mean(time/1000000), sd_ms=sd(time/1000000)) %>%
  tidyr::gather(time, value, c(mean_ms, sd_ms)) %>%
  reshape2::dcast(N+k+stat+time~pattern+type)
tab2 <- filter(runtimes, stat %in% c("chol","solve")) %>%
  group_by(N, k, stat, pattern, type) %>%
  summarize(mean_ms=mean(time/1000000), sd_ms=sd(time/1000000)) %>%
  tidyr::gather(time, value, c(mean_ms, sd_ms)) %>%
  reshape2::dcast(N+k+time~stat+pattern)
In this section we show the efficiency gains from sparseMVN by comparing run times between rmvn.sparse() and rmvnorm(), and between dmvn.sparse() and dmvnorm(). In these tests, we construct covariance/precision matrices with the same structure as the Hessian of the log posterior density of the example model above. Parameters are ordered such that the matrix has the block-arrow pattern shown earlier. The size and sparsity of the test matrices vary through manipulation of the number of blocks ($N$), the size of each block ($k$), and the number of rows/columns in the margin (also $k$). Each test matrix has $(N+1)k$ rows and columns. The table below summarizes the case conditions.
cases %>%
  select(k, N, nvars, nels, nnz, nnzLT, pct.nnz) %>%
  kable(digits=c(rep(0,6),3), format.args=list(big.mark=","),
        col.names=c('k', 'N', 'variables', 'elements',
                    'full', 'lower tri', 'sparsity')) %>%
  kable_paper() %>%
  add_header_above(c(" "=4, "nonzeros"=3))
tmp <- filter(tab1, N==min(tab1[['N']]) & k==min(tab1[['k']]) &
                time=="mean_ms" & stat=="density")
sm <- with(tmp, c(dense_cov, sparse_cov))
Figure 1 compares mean run times to compute 1,000 MVN densities, and to generate 1,000 MVN samples, using rmvn.sparse() and dmvn.sparse() from sparseMVN, and mvtnorm::dmvnorm() and mvtnorm::rmvnorm(). Times were collected over 200 replications on a 2013-vintage Mac Pro with a 12-core 2.7 GHz Intel Xeon E5 processor and 64 GB of RAM.
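The exact benchmarking script is not reproduced here; a hedged sketch of how one replication of such a timing comparison could be collected (using system.time and purely illustrative inputs mu, Sigma, and CH) follows.

```r
## Time 1,000 draws from the dense (mvtnorm) and sparse (sparseMVN) samplers.
## `Sigma` is a dense covariance matrix, `CH` the sparse Cholesky factor of
## the corresponding precision matrix, and `mu` the mean vector.
time_draws <- function(mu, Sigma, CH, n = 1000) {
  dense  <- system.time(mvtnorm::rmvnorm(n, mu, Sigma))["elapsed"]
  sparse <- system.time(rmvn.sparse(n, mu, CH, prec = TRUE))["elapsed"]
  c(dense = unname(dense), sparse = unname(sparse))
}
```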
The mvtnorm functions are faster than the sparseMVN functions in low-dimensional cases ($N\leq 50$), but their run times grow quadratically in the number of variables.[^4] This is because the number of elements stored in a dense covariance matrix grows quadratically with the number of variables. In this example, storage and computation requirements for the sparse matrix grow only linearly with the number of variables, so the sparseMVN run times grow linearly as well [@BraunDamien2016, sec. 4]. The comparative advantage of sparseMVN increases with the sparsity of the covariance matrix.[^5]
DF1 <- filter(tab1, time=="mean_ms") %>%
  mutate(stat=plyr::revalue(stat, c(density="density", rand="random"))) %>%
  rename(dense=dense_cov, sparse=sparse_cov) %>%
  tidyr::gather(pattern, value, c(dense, sparse))
P1 <- ggplot(DF1, aes(x=N, y=value, color=pattern, shape=pattern, linetype=pattern)) +
  geom_line() +
  geom_point(size=2) +
  scale_x_continuous("Number of blocks (N)") +
  scale_y_continuous("Computation time (milliseconds)", labels=scales::comma) +
  scale_color_manual("Pattern", values=c(dense='red', sparse='blue')) +
  scale_shape("Pattern") +
  scale_linetype("Pattern") +
  facet_grid(stat~k, scales="free_y", labeller=label_bquote(cols = k==.(k))) +
  theme_bw() +
  theme(strip.background=element_rect(fill='white'))
print(P1)
The sparseMVN functions always require a sparse Cholesky decomposition of the covariance or precision matrix, while the mvtnorm functions require a dense precision matrix to be inverted into a dense covariance matrix. Figure 2 compares the computation times of these preparatory steps. There are three cases to consider: inverting a dense matrix using the solve() function, decomposing a sparse matrix using Matrix::Cholesky(), and decomposing a dense matrix using chol(). Applying chol() to a dense matrix is not a required step in advance of calling dmvnorm() or rmvnorm(), but those functions invoke some kind of decomposition internally; we include it in our comparison because it comprises a substantial part of their computation time. The decomposition and inversion times for the dense matrices grow cubically as the size of the matrix increases, while the sparse Cholesky decomposition time is negligible. For example, the mean run time for the $N=500$, $k=4$ case is about 0.39 ms.
fig.cap="Computation time for Cholesky decomposition of sparse and dense matrices, and inversion of dense matrices."
DF2 <- filter(tab2, time=="mean_ms") %>% rename(`dense inversion`=solve_dense, `dense Cholesky`=chol_dense, `sparse Cholesky`=chol_sparse) %>% tidyr::gather(pattern, value, c(`dense inversion`,`sparse Cholesky`,`dense Cholesky`)) P2 <- ggplot(DF2, aes(x=N, y=value, color=pattern, shape=pattern, linetype=pattern)) + geom_line() + geom_point(size=2) + scale_x_continuous("Number of blocks (N)") + scale_y_continuous("Computation time (milliseconds)", labels=scales::comma) + scale_color_manual("Pattern/Operation", values=c(`dense Cholesky`='red', `sparse Cholesky`='blue', `dense inversion`='black')) %>% scale_shape("Pattern/Operation") + scale_linetype("Pattern/Operation") + facet_grid(.~k, scales="free_y", labeller=label_bquote(cols = k==.(k))) + theme_bw() + theme(strip.background=element_rect(fill='white')) print(P2)
[^1]: rmvnorm() also offers eigen and singular value decompositions. These are both $\mathcal{O}\left(M^3\right)$ as well.
[^2]: LaplacesDemon does offer options for the user to supply pre-factored covariance and precision matrices. This avoids repeated calls to the $\mathcal{O}\left(M^3\right)$ factorization step, but not the $\mathcal{O}\left(M^2\right)$ matrix multiplication and linear system solution steps.
[^3]: Because sparse matrix structures store row and column indices of the nonzero values, they may use more memory than dense storage if the total number of elements is small.
[^4]: As an example, in the $N=10$, $k=2$ case, the mean time to compute 1,000 MVN densities is 1.1 ms using dmvnorm(), but more than 3.7 ms using dmvn.sparse().
[^5]: Across all cases there was hardly any difference in the run times when providing the precision matrix instead of the covariance.