lmrm: Bayesian semiparametric modelling of covariance matrices for...

View source: R/bnpLongMulti.R

lmrmR Documentation

Bayesian semiparametric modelling of covariance matrices for multivariate longitudinal data

Description

Implements an MCMC algorithm for posterior sampling based on a semiparametric model for continuous longitudinal multivariate responses. The overall model consists of 5 regression submodels and it utilizes spike-slab priors for variable selection and function regularization. See ‘Details’ section for a full description of the model.

Usage

lmrm(formula, data = list(), centre=TRUE, id, time,
sweeps, burn = 0, thin = 1, seed, StorageDir,
c.betaPrior = "IG(0.5,0.5*n*p)", pi.muPrior = "Beta(1,1)", 
c.alphaPrior = "IG(1.1,1.1)", pi.phiPrior = "Beta(1,1)", c.psiPrior = "HN(2)",
sigmaPrior = "HN(2)", pi.sigmaPrior = "Beta(1,1)", 
corr.Model = c("common", nClust = 1), DP.concPrior = "Gamma(5,2)",
c.etaPrior = "IG(0.5,0.5*samp)", pi.nuPrior = "Beta(1,1)", 
pi.fiPrior = "Beta(1,1)", c.omegaPrior = "IG(1.1,1.1)", sigmaCorPrior = "HN(2)",
tuneCalpha, tuneSigma2, tuneCbeta, tuneAlpha, tuneSigma2R, tuneR, tuneCpsi, 
tuneCbCor, tuneOmega, tuneComega, tau, FT = 1,...)

Arguments

formula

a formula defining the responses and the covariates in the 5 regression models e.g. y1 | y2 ~ x | w | z | t | t or for smooth effects y1 | y2 ~ sm(x) | sm(w) | sm(z) | sm(t) | sm(t). The package uses the extended formula notation, where the responses are defined on the left of ~ and the models on the right.

data

a data frame.

centre

Binary indicator. If set equal to TRUE, the design matrices are centred, to have column mean equl to zero, otherwise, if set to FALSE, the columns are not centred.

id

identifiers of the individuals or other sampling units that are observed over time.

time

a vector input that specifies the time of observation

sweeps

total number of posterior samples, including those discarded in burn-in period (see argument burn) and those discarded by the thinning process (see argument thin).

burn

length of burn-in period.

thin

thinning parameter.

seed

optional seed for the random generator.

StorageDir

a required directory to store files with the posterior samples of models parameters.

c.betaPrior

The inverse Gamma prior of c_{\beta}. The default is "IG(0.5,0.5*n*p)", that is, an inverse Gamma with parameters 1/2 and np/2, where n is the number of sampling units and p is the length of the response vector.

pi.muPrior

The Beta prior of \pi_{\mu}. The default is "Beta(1,1)". It can be of dimension 1, of dimension K (the number of effects that enter the mean model), or of dimension p K

c.alphaPrior

The inverse Gamma prior of c_{\alpha}^2. The default is "IG(1.1,1.1)". Half-normal priors for c_{\alpha} are also available, declared using "HN(a)", where "a" is a positive number. It can be of dimension 1 or p (the length of the multivariate response).

pi.phiPrior

The Beta prior of \pi_{\phi}. The default is "Beta(1,1)". It can be of dimension 1, of dimension B (the number of effects that enter the dependence model), or of dimension p^2 B

c.psiPrior

The prior of c_{\psi}^2. The default is "HN(2)", a half-normal prior for c_{\psi} with variance equal to two, c_{\psi} \sim N(0,2) I[c_{\psi} > 0]. Inverse Gamma priors for c_{\psi}^2 are also available, declared using "IG(a,b)". It can be of dimension 1 or p^2 (the number of dependence models).

sigmaPrior

The prior of \sigma_k^2, k=1,\dots,p. The default is "HN(2)", a half-normal prior for \sigma_k with variance equal to two, \sigma_k \sim N(0,2) I[\sigma>0]. Inverse Gamma priors for \sigma_k^2 are also available, declared using "IG(a,b)". It can be of dimension 1 or p (the length of the multivariate response).

pi.sigmaPrior

The Beta prior of \pi_{\sigma}. The default is "Beta(1,1)". It can be of dimension 1, of dimension L (the number of effects that enter the variance model), or of dimension pL

corr.Model

Specifies the model for the correlation matrices R_t. The three choices supported are "common", that specifies a common correlations model, "groupC", that specifies a grouped correlations model, and "groupV", that specifies a grouped variables model. When the model chosen is either "groupC" or "groupV", the upper limit on the number of clusters can also be specified, using corr.Model = c("groupC", nClust = d) or corr.Model = c("groupV", nClust = p). If the number of clusters is left unspecified, for the "groupV" model, it is taken to be p, the number of responses. For the "groupC" model, it is taken to be d = p(p-1)/2, the number of free elements in the correlation matrices.

DP.concPrior

The Gamma prior for the Dirichlet process concentration parameter.

c.etaPrior

The inverse Gamma prior of c_{\eta}. The default is "IG(0.5,0.5*samp)", that is, an inverse Gamma with parameters 1/2 and samp/2, where samp is the number of correlations observed over time, that is $samp=M*d$ where $M$ is the number of unique observation time points and $d$ is the number of non-redundant elements of $R$.

pi.nuPrior

The Beta prior of \pi_{\nu}. The default is "Beta(1,1)". It can be of dimension 1.

pi.fiPrior

The Beta prior of \pi_{\varphi}. The default is "Beta(1,1)". It can be of dimension 1.

c.omegaPrior

The prior of c_{\omega}^2. The default is "HN(2)", a half-normal prior for c_{\omega} with variance equal to two, c_{\omega} \sim N(0,2) I[c_{\omega} > 0]. Inverse Gamma priors for c_{\omega}^2 are also available, declared using "IG(a,b)". It can be of dimension 1.

sigmaCorPrior

The prior of \sigma^2. The default is "HN(2)", a half-normal prior for \sigma^2 with variance equal to two, \sigma \sim N(0,2) I[\sigma > 0]. Inverse Gamma priors for \sigma^2 are also available, declared using "IG(a,b)". It can be of dimension 1.

tuneCalpha

Starting value of the tuning parameter for sampling c_{\alpha k}, k=1,\dots,p. Defaults at a vector of $p$ ones. It could be of dimension p.

tuneSigma2

Starting value of the tuning parameter for sampling \sigma^2_k, k=1,\dots,p. Defaults at a vector of $p$ ones. It could be of dimension p.

tuneCbeta

Starting value of the tuning parameter for sampling c_{\beta}. Defaults at 100.

tuneAlpha

Starting value of the tuning parameter for sampling regression coefficients of the variance models \alpha_k, k=1,\dots,p. Defaults at a vector of 5s. It could be of dimension L p

tuneSigma2R

Starting value of the tuning parameter for sampling \sigma^2. Defaults at 1.

tuneR

Starting value of the tuning parameter for sampling correlation matrices. Defaults at 40*(p+2)^3. Can be of dimension 1 or M is the number of unique observation time points.

tuneCpsi

Starting value of the tuning parameter for sampling variances c_{\psi}^2. Defaults at 5. Can be of dimension 1 or p^2

.

tuneCbCor

Starting value of the tuning parameter for sampling c_{\eta}^2. Defaults at 10.

tuneOmega

Starting value of the tuning parameter for sampling regression coefficients of the variance models \omega. Defaults at 5.

tuneComega

Starting value of the tuning parameter for sampling c_{\omega}. Defaults at 1.

tau

The tau of the shadow prior. Defaults at 0.01.

FT

Binary indicator. If set equal to 1, the Fisher's z transform of the correlations is modelled, otherwise if set equal to 0, the untransformed correlations are modelled.

...

Other options that will be ignored.

Details

Function lmrm returns samples from the posterior distributions of the parameters of a regression model with normally distributed multivariate longitudinal responses. To describe the model, let Y_{ij} = (Y_{ij1},\dots,Y_{ijp})^{\top} denote the vector of p responses observed on individual i, i=1,\dots,n, at time point t_{ij}, j=1,\dots,n_i. The observational time points t_{ij} are allowed to be unequally spaced. Further, let u_{ij} denote the covariate vector that is observed along with Y_{ij} and that may include time, other time-dependent covariates and time-independent ones. In addition, let Y_{i}=(Y_{i1}^{\top},\dots,Y_{in_i}^{\top})^{\top} denote the ith response vector. With \mu_i=E(Y_{i}) and \Sigma_i = cov(Y_i), the model assumes multivariate normality, Y_{i} \sim N(\mu_i, \Sigma_i), i=1,2,\dots,n. The means \mu_i and covariance matrices \Sigma_i are modelled semiparametrically in terms of covariates. For the means one can specify semiparametric models,

\mu_{ijk} = \beta_{k0} + \sum_{l=1}^{K_1} u_{ijl} \beta_{kl} + \sum_{l=K_1+1}^{K} f_{\mu,k,l}(u_{ijl}).

This is the first of the 5 regression submodels.

To model the covariance matrix, first consider the modified Cholesky decomposition, L_i \Sigma_i L_i^{\top} = D_i, where L_i is a unit block lower triangular matrix and D_i is a block diagonal matrix,

\begin{array}{cc} L_i = \left[ \begin{array}{cccc} I & 0 & \dots & 0 \\ -\Phi_{i21} & I & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ -\Phi_{in_i1} & -\Phi_{in_i1} & \dots & I \\ \end{array} \right], & D_i = \left[ \begin{array}{cccc} D_{1} & 0 & \dots & 0 \\ 0 & D_{2} & 0 & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & D_{n_i} \\ \end{array} \right], \end{array}

For modelling D_{ij}, i=1,\dots,n, j=1,\dots,n_i in terms of covariates, first we separate the variances and the correlations D_{ij} = S_{ij}^{1/2} R_{ij} S_{ij}^{1/2}. It is easy to model matrix S_{ij} in terms of covariates as the only requirement on its diagonal elements is that they are nonnegative,

\log \sigma^2_{ijk} = \alpha_{k0} + \sum_{l=1}^{L_1} w_{ijl} \alpha_{kl} + \sum_{l=L_1+1}^{L} f_{\sigma,k,l}(w_{ijl})

This is the second of the 5 regression submodels.

For \phi_{ijklm}, the (l,m) element of \Phi_{ijk}, l,m=1,\dots,p, one can specify semiparametric models

\phi_{ijklm} = \psi_{lm0} + \sum_{b=1}^{B_1} v_{ijkb} \psi_{lmb} + \sum_{b=B_1+1}^{B} f_{\phi,l,m,b}(v_{ijkb})

This is the third of the 5 regression submodels.

The elements of the correlations matrices R_{ij} are modelled in terms of covariate time only, hence they are denoted by R_t. Subject to the positive definiteness constraint, the elements of R_t are modelled using a normal distribution with location and scale parameters, \mu_{ct} and \sigma^2_{ct}, modelled as

\mu_{ct} = \eta_0 + f_{\mu}(t),

\log \sigma^2_{ct} = \omega_0 + f_{\sigma}(t),

and these are the last 2 of the 5 submodels.

Value

Function lmrm returns samples from the posteriors of the model parameters.

Author(s)

Georgios Papageorgiou gpapageo@gmail.com

References

Papageorgiou, G. (2020). Bayesian semiparametric modelling of covariance matrices for multivariate longitudinal data. arXiv:2012.09833.

Examples

	# Fit a joint mean-covariance model on the simulated dataset simD2
	require(ggplot2)
	data(simD2)
	model <- Y1 | Y2 ~ time | sm(time) | sm(lag) | sm(time) | 1 
	# the above defines the responses and the regression models on the left and 
	# right of "~", respectively 
	# the first model, for the mean, is a linear function of time, this is sufficient as 
	# the 2 responses have constant mean.
	# the second model, for the variances, is a smooth function of time
	# the third model, for the dependence structure, is a smooth function of lag, 
	# that lmrm figures out and it does not need to be computed by the user
	# the fourth model, for location of the correlations, is a smooth function of time
	# the fifth model, for scale of the correlations, is just an intercept model
	## Not run: 
	m1 <- lmrm(formula = model, corr.Model = c("common", nClust = 1), data = simD2,
		       id = id, time = time, sweeps = 2500, burn = 500, thin = 2, 
		       StorageDir = getwd(), seed = 1)
	plot(m1)
	
## End(Not run)

BNSP documentation built on May 31, 2023, 7:05 p.m.