knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
library(BayesMVProbit)

A. Data augmentation Gibbs sampling for Multivariate Multinomial Probit model

Reference link of the paper

Bayesian Analysis of Multivariate Nominal Measures Using Multivariate Multinomial Probit Models; Xiao Zhang, W. John Boscardin, and Thomas R. Belin

Data structure

Suppose for each subject i, $i = 1,..,n$ there are g nominal measures, the first with $p_1$ levels, the next with $p_2$ levels, and so on up to the last with $p_g$ levels.

The data looks like,

$$ y_i = \begin{bmatrix} y_{11i}\ y_{12i}\ .\ y_{1p_1i}\ .\ .\ y_{g1i}\ .\ y_{g{p_g}i} \end{bmatrix} $$

We consider $y_{qji} = 1$ if i th subject has j th outcome in q th nominal measure, $y_{qji} = 0$ otherwise.

We assume each of these g nominal measures follows an Multinomial Probit (MNP) model.

To consider additive and multiplicative redundancy problem we assume, for the q-th measure, q = 1, ., g, there is a $p_{q-1}$ dimensional latent variable $z_{qi} = (z_{q1i},.. ,z_{q{p_q}i})$

$$z_{qji} = \beta_{qj}^t x_{qi} + \epsilon_{qji}$$ where $\beta_{qj}^t$ is the vector of regression coefficient for jth level of qth nominal measure and $x_{qi}$ be covariate vector for q th level and for ith subject and $\epsilon_{qji}$ is the corresponding error term.

$$z_{qi} = B_q x_{qi} + \epsilon_{qi}$$ where $B_q$ is the matrix of coefficients for qth nominal measure

Alternatively, for making computation easier, we write $\beta_q$ matrix in vector form.

$$z_{qi} = X_{qi} \beta_q + \epsilon_{qi} $$ $\beta_q$ is the vectorized form of $B_q$ matrix, (row wise stacked) where and $X_{qi} = I_{p_q-1} \otimes x_{qi}^t$ is matrix of covariates.

Here, $\epsilon_i \sim N(0, \Sigma)$

For compact notation, Now we define, $d_i = (d_{1i}, ., d_{gi})$ denote the index vector of the alternatives the i-th subject chooses for the g measures.

$d_{qi} = j \iff y_{qji} =1$ for j = 1, 2, .. , $p_q - 1$ and $d_{qi} = 0 \iff y_{qji} = p_q$

$d_{qi} = 0$ if $max_{1 \leq l \leq p_q -1} \quad z_{qli} < 0$

$d_{qi} = j$ if $max_{1 \leq l \leq p_q 1} \quad z_{qli} = z_{qji} > 0$

for i = 1, .. ,n, and q = 1, .. ,g

Data generation example [Binary data : 2 nominal measures, respectively have $p_1 = 3$ and $p_2 = 4$ levels]

variable_dim = 2 # no of variables
category = c(3,4) # no of levels for each variable
n = 50 # no of subjects
covariate_num = c(2,3) # no of covariates for each level

z_dim = sum(category) - length(category)  ## dimension of z for each subject
beta_dim = sum((category - 1) * covariate_num)

set.seed(1287)

beta_act1 = matrix(rep(2, (category[1]-1) * covariate_num[1]), nrow = category[1]-1)
  ## matrix of regression coefficient for 1st nominal measure
beta_act2 = matrix(rep(2, (category[2]-1) * covariate_num[2]), nrow = category[2]-1)
 ## matrix of regression coefficient for 1st nominal measure
beta_act = as.vector(c(beta_act1, beta_act2)) ## vectorized form of beta
## should be of same length  as beta_dim

x1_mat = matrix(rnorm(n * covariate_num[1]), nrow = covariate_num[1] ) # each column is for each subject, each row is for each  covariates
x2_mat = matrix(rnorm(n * covariate_num[2]), nrow = covariate_num[2] ) # each column is for each subject, each row is for each  covariates
x_list = list(x1_mat, x2_mat)  # input should be given as list 

z1_mat = beta_act1 %*% x1_mat 
z2_mat = beta_act2 %*% x2_mat

Sigma matrix (variance covariance matrix for error) should be defined by user. In this case Sigma matrix is considered as mentioned in the paper(mentioned in Readme)

#GEneration of sigma matrix 

sig1 = matrix(c(1, 0.36, 0.36, 0.81), nrow = 2)
sig2 = matrix(c(1, 0.45, 0.24, 0.45, 0.81, 0.41, 0.24, 0.41, 0.9), nrow =3)
sig3 = matrix(c(rep(0.2, 6)), nrow =2)
sig4 = matrix(c(rep(0.2, 6)), nrow =3)
sig_gen = as.matrix(rbind(cbind(sig1, sig3), cbind(sig4, sig2))) ## Final sigma 

Example of Matrix for variance covariance matrix for error (user should provide):

sig_gen
# GEneration of error 

e_mat = MASS::mvrnorm(n , mu = rep(0, z_dim) , Sigma = sig_gen )  # Generation of error term
e_mat = t(e_mat) # each column is for each subject

#Generation of  Z vectors for each subject, each column is for each subject
z_mat = matrix(rep(0, n * z_dim), ncol = n) ## each column is for each subject

for(i in 1: n){
  z_mat[, i] = c(z1_mat[, i], z2_mat[, i]) + e_mat[, i]

}


#Computation of d matrix ## user should provide this input

d = matrix(rep(0, n * length(category)), ncol = n)  # each col corresponds to each subject
q = rep(1, length(category) + 1)  ## the indicator to compute d matrix (as mentioned in the paper)
q[1] = 1
category_cum_sum = cumsum(category)

for(g in 2: length(q))
{
  q[g] = category_cum_sum[g-1] - (g-1 - 1)    ## ex: p1 + p2 + p3 - 2
}


for(i in 1: n)
{ 
  for(j in 1:variable_dim )
  {
    x = z_mat[ q[j] : (q[j+1] - 1) , i]
    # print(x)

    if(max(x) < 0)
    {
      d[j, i] = 0
    }
    if(max(x) >= 0)
    {
      d[j, i] = which.max(x)
    }
  }
}

Example of d matrix (user should provide):

d

Prior specification and likelihood

Prior: $\beta \sim N(\mu, C)$

Likelihood: $Z| \beta \sim N (X \beta, \Sigma)$

where Z is a vector of length $n* length(z_i) \times 1$ we get after stacking $z_i s$ from i = 1 (1) n.

and X is a matrix of order $n * nrows(X_i) \times ncols(x_i)$ we get after stacking $X_i s$ from i = 1,2, .., n.

Posterior

Posterior of $\beta$ : $\beta |Z, d \sim N(Q^{-1}b, Q^{-1})$ where $Q = (X^t \Sigma^{-1} X) + C^{-1}$ and and $b = X^t Z$

Posterior of Z (Holmes Held method):

$Z|d \sim N(\nu, \Sigma + X C X^t)$ truncated to $\xi = \xi_1 \otimes \xi_2 .... \otimes \xi_n$

$\xi_i = f_i z_i > 0$ where $\quad z_i = (z_{11i}, z_{12i},z_{21i}, z_{22i},z_{23i})^t \quad$ is a vector of length 5 and $f_i$ is a $5 \times 5$ matrix, whose first 2x2 diagonal matrix would from $z_{1i}$ and the second 3x3 matrix would be from $z_{2i}$, the rest of the elements would be zero.

Here the possible outcomes for p2= 4 is discussed and only the lower 3x3 matrix is shown here.

p2 = 4; Case-1

$d_{2i} = 0$ if $max(z_{21i}, z_{22i},z_{23i}) < 0$

$$f_{2i} = \begin{bmatrix} -1 & 0 & 0\ 0 & -1 & 0\ 0 & 0 & -1\ \end{bmatrix} $$

case-2

$d_{2i} = 1$ if $z_{21i} > max(z_{22i},z_{23i})$and also $z_{21i} > 0$

$$f_{2i} = \begin{bmatrix} 1 & 0 & 0\ 1 & -1 & 0\ 1 & 0 & -1\ \end{bmatrix} $$

case-3

$d_{2i} = 1$ if $z_{22i} > max(z_{21i},z_{23i})$ and also $z_{22i} > 0$

$$f_{2i} = \begin{bmatrix} -1 & 1 & 0\ 0 & 1 & 0\ 0 & 1 & -1\ \end{bmatrix} $$

case-4

$d_{2i} = 3$ if $z_{23i} > max(z_{21i},z_{22i})$ and also $z_{23i} > 0$

$$f_{2i} = \begin{bmatrix} -1 & 0 & 1\ 0 & -1 & 1\ 0 & 0 & 1\ \end{bmatrix} $$

Interpretation of the results:

The function 'nominal_post_beta' returns the following outputs:

  1. Posterior_mean: It gives posterior mean of beta in vector form indicating the ordinal measure and the corresponding covariate of the ordinal measure

  2. Credible_interval: It gives credible interval for beta vector at specified level (user input otherwise bydefault at 95%)

  3. trace_plot: Displays a plot of iterations vs. sampled values for each variable in the chain, with a separate plot per variable. It indicates about how well the mixing of posterior beta and z .

  4. density_plot: A density plot shows the distribution of variables, with a separate plot per variable. As normal prior is considered, so it is expected that density plot for each variable would be like normal distribution with mode at posterior mean.

  5. carter_plot: Creates plots of credible intervals for parameters from an MCMC simulation. It should match with the credible_interval outputs

Interpretation of the specific example:

Input Variables:

sig = sig_gen

nominal = nominal_post_beta(category = c(3,4), iter = 300, burn = 100, cred_level = 0.95, x_list, sig, d, prior_beta_mean = NULL, prior_beta_var = NULL)

nominal$Posterior_mean
nominal$Credible_interval

Output Variables

The data is generated from original beta vector all beta = 2

The posterior mean also around 2

95% credible interval is also around 2

Carter plot is also provided.

Conclusion

Hence the model and code well represent the original data.

-------------------------------------------------------------------------------------------------------------------------------------------------

B. Data augmentation Gibbs sampling for Multivariate Multinomial Probit model

Reference link of the paper

FITTING AND COMPARISON OF MODELS FOR MULTIVARIATE ORDINAL OUTCOMES; Ivan Jeliazkov, Jennifer Graves, Mark Kutzbach

Data structure

Suppose for each subject i, $i = 1,..,n$ there are q ordinal measures, the first with $p_1$ levels, the next with $p_2$ levels, and so on up to the last with $p_g$ levels.

The data looks like,

$$ y_i = \begin{bmatrix} y_{1i}\ y_{2i}\ .\ y_{qi}\ \end{bmatrix} $$

Here we consider q dimesndional vector of latent variables $$z_i = (z_{1i}, z_{2i},..,z_{qi})^t$$ We consider $$z_{ji} = \beta_j^t x_{ji} + \epsilon_{ji}$$ where $\beta_j$ is the vector of regression coefficients for j th level and $x_{ji}^t$ is covariate vector for i th subject for j th level.

Alternatively, $$ z_i = X_i \beta + \epsilon_i$$ where $\beta = (\beta_1, .., \beta_q)$ is a column vector and $X_i$ is a matrix $$X_i = \begin{bmatrix} x_{1i}^t\ \quad x_{2i}^t\ \quad \quad.\ \quad \quad \quad \quad x_{qi}^t\ \end{bmatrix} $$

Now the vector of observed responses $y_i$ is calculated according to the discretization imposed by the variable specific cutpoints, namely

$y_{ki} = j$ if $\gamma_{k,(j-1)} < z_{ki} < \gamma_{kj}$ for i = 1,..,n and k = 1,..,q.

Now, $\epsilon_i \sim N(0,\Sigma)$

Each element of $y_i$ can have different number of categories $J=(J_1, J_2, ..,J_q)^t$ and its own set of cutpoints $\gamma_k = (\gamma_1, ..,\gamma_{J_k})$ for k = 1,..,q

To consider the identification contraints $\gamma_{k0} = -\infty, \gamma_{k1} = 0, \gamma_{J_k} = \infty $ for k = 1,..,q.

Albert Chib(2001) simplified the sampling of cutpoints $\gamma$ by a one-to-one transpormation as follow:

$ \delta_{kj} = ln(\gamma_{kj} - \gamma_{(kj)-1}$ for $2 \leq kj \leq J_k - 1$

Data generation example [Binary data : 2 ordinal measures, respectively have $p_1 = 4$ and $p_2 = 3$ levels]

variable_dim = 2 # no of variables, i.e dimension of yi (can be obtained if y is given)
category = c(4, 3) ## no of categories(levels) for each variable
covariate_num = c(2, 3) # No of covariates for each level
nu_tot = category + 1 ## no of cut points 
nu = nu_tot - 3 ## no of cutpoints to be considered for each variable ( nu0, nu1, nuJ are fixed)
nu_cutoff1 = c(0, 2 , 2.5)  # of length nu + 1
nu_cutoff2 = c(0, 4.5)
beta_dim = sum(covariate_num) # dimension of beta vector

n = 50 # no of subjects

##  Generation of beta

beta_act1 = rep(2, covariate_num[1])
beta_act2 = rep(2,covariate_num[2])
beta_act = c(beta_act1, beta_act2)

set.seed(1287)

# Generation of X

x1 = matrix(rnorm(covariate_num[1]* n), nrow = covariate_num[1])  # each column is for each subject
x2 = matrix(rnorm(covariate_num[2]* n), nrow = covariate_num[2])  # each column is for each subject

x_list = list(x1, x2)

x = lapply(1:n, function(x) matrix(rep(0, variable_dim * beta_dim), nrow = variable_dim) ) ## initialization
a = matrix(rep(0, variable_dim * beta_dim), nrow = variable_dim) ## to store the value in the loop
col_indi = c(0, cumsum(covariate_num))  # of length variable_dim + 1
for(i in 1:n)
{
  for(l in 1: variable_dim)
  {
    a[l, (col_indi[l] + 1) : col_indi[l + 1]] = t(x_list[[l]][,i])
  }
  x[[i]] = a
}


# Generation of error 

sig = diag(variable_dim) ## error covariance matrix of order variable_dim x variable_dim ( to be mentioned in the input)

e = mvtnorm::rmvnorm(n = n, mean = rep(0, variable_dim) , sigma = sig)
e = t(e) # # each column is for each subject


# Generation of  Z

z = matrix(rep(0, variable_dim * n), nrow = variable_dim)  # matrix of order variable_dim x n # # each column is for each subject
for(i in 1:n)
{
  z[,i] =  x[[i]] %*% beta_act + e[, i]
}


## Generation of Y 

cutoff1 = c(-Inf, nu_cutoff1, Inf)  ## To also consider the fixed cutoffs 
cutoff2 = c(-Inf, nu_cutoff2, Inf)
cutoff = list(cutoff1, cutoff2)

y = matrix(rep(0, variable_dim * n), nrow = variable_dim) ## initialization of matrix of order variable_dim x n # each column is for each subject


for(i in 1:n)
{
  for(l in 1:variable_dim)
  {
    for(j in 1: length(cutoff[[l]]))
    {
      if(z[l, i] > cutoff[[l]][j] && z[l, i] <= cutoff[[l]][j + 1]) # making one side inclusive
        y[l, i] = j
    }
  }
}

Example of y matrix (user should provide):

y
table(y[1,])
table(y[2,])

Prior specification and likelihood

Prior for $\delta_k$: $\delta_k \sim N(\delta_{k0}, D_{k0})$

Prior for $\beta$: $\beta \sim N(\beta_0, B_0)$

Likelihood : $z_i \sim N(X_i \beta, \Sigma)$

Posterior

  1. Sample $\delta_k|y, \beta$ marginally of $z_k$ from proposal density student's t distribution , through Metropolis Hastings (MH) algorithm. The parameters for proposal density is obtained by the method described in th paper.

  2. Sample $z_k|y,\beta,\delta_k, z_{k^c}$ by drawing $z_{ik}|y\beta, \gamma,z_{k^c} \sim TN_{(\gamma_{j-1}, \gamma_j)}(\mu_{k.}, \sigma_{k.}^2)$ for i = 1,..,n where $\mu_{k.}, \sigma_{k.}^2$ are conditional mean and variance for a Gaussian Distribution.

  3. Sample $\beta|z \sim N(\hat{\beta}, \hat{B})$ where $\hat{B} = (B_0^{-1} + \sum_{i = 1} ^{n} X_i^t \Sigma^{-1} X_i)^{-1}$ and $\hat{\beta} =\hat{B}(B_0^{-1} \beta_0 + \sum_{i = 1} ^{n} X_i^t \Sigma^{-1} z_i )$

Interpretation of the results:

The function 'nominal_post_beta' returns the following outputs:

  1. Posterior_mean: It gives posterior mean of beta in vector form indicating the nominal measure, level of the nominal measure and the corresponding covariate of the nominal measure

  2. Credible_interval: It gives credible interval for beta vector at specified level (user input otherwise bydefault at 95%)

  3. trace_plot: Displays a plot of iterations vs. sampled values for each variable in the chain, with a separate plot per variable. It indicates about how well the mixing of posterior beta and z .

  4. density_plot: A density plot shows the distribution of variables, with a separate plot per variable. As normal prior is considered, so it is expected that density plot for each variable would be like normal distribution with mode at posterior mean.

  5. carter_plot: Creates plots of credible intervals for parameters from an MCMC simulation. It should match with the credible_interval outputs

Interpretation of the specific example:

Input Variables:

library(BayesMVProbit)
ordinal = ordinal_post_beta(category = c(4, 3), df_t = NULL, iter = 1000,  burn = 100, cred_level = 0.95, x_list, sig = diag(2), y , prior_delta_mean = NULL, prior_delta_var = NULL, prior_beta_mean = NULL, prior_beta_var = NULL)

ordinal$Posterior_mean
ordinal$Credible_interval

Output Variables

The data is generated from original beta vector all beta = 2

The posterior mean also around 2

95% credible interval is also around 2

Carter plot is also provided.

Conclusion

Hence the model and code well represent the original data.



rochita07/BayesMVProbit documentation built on Dec. 11, 2019, 2:07 a.m.