mvmise_b: A multivariate mixed-effects selection model with correlated...


Description

This function fits a multivariate mixed-effects selection model with correlated outcome-specific random intercepts allowing potential ignorable or non-ignorable missing values in the outcome. Here an outcome refers to a response variable, for example, a genomic feature. The proposed model and function jointly analyze multiple outcomes/features.

Usage

mvMISE_b(Y, X, id, maxIter = 100, tol = 0.001, verbose = FALSE, cov_miss = NULL, 
    miss_y = TRUE, sigma_diff = FALSE)

Arguments

Y

an outcome matrix. Each row is a sample, and each column is an outcome variable, with potential missing values (NAs).

X

a covariate matrix. Each row is a sample, and each column is a covariate. The covariates can be common among all of the outcomes (e.g., age, gender) or outcome-specific. If a covariate is specific for the k-th outcome, one may set all the values corresponding to the other outcomes to be zero. If X is common across outcomes, the row number of X equals the row number of Y. Otherwise, if X is outcome-specific, the row number of X equals the number of elements in Y, i.e., outcome-specific X is stacked across outcomes within each cluster. See the Examples for demonstration.

id

a vector of cluster/batch indices, matching the rows of Y (and the rows of X, if X is not outcome-specific).

maxIter

the maximum number of iterations for the EM algorithm.

tol

the tolerance level for the relative change in the observed-data log-likelihood function.

verbose

logical. If TRUE, the iteration history of each step of the EM algorithm will be printed. The default is FALSE.

cov_miss

the covariate that can be used in the missing-data model. If it is NULL, the missingness is assumed to be independent of the covariates. Check the Details for the missing-data model. If it is specified and the covariate is not outcome specific, its length equals the length of id. If it is outcome specific, the outcome-specific covariate is stacked across outcomes within each cluster.

miss_y

logical. If TRUE, the missingness depends on the outcome Y (see the Details). The default is TRUE. This outcome-dependent missing-data pattern was motivated by, and observed in, mass-spectrometry-based quantitative proteomics data.

sigma_diff

logical. If TRUE, the error variance of the first sample in each cluster/batch differs from that of the remaining samples in the same cluster/batch. This option is intended for batch-processed proteomics data in which the first sample in each cluster/batch is the common reference sample. The default is FALSE.
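
The stopping rule controlled by maxIter and tol can be sketched as follows. This is only an illustration: the helper rel_change and the log-likelihood values are hypothetical, and the formula |l_new - l_old| / |l_old| is an assumption about what "relative change" means here, not the package's code.

```r
# Hypothetical helper: relative change between successive log-likelihood values.
# Assumption: relative change is |new - old| / |old|.
rel_change <- function(ll_new, ll_old) {
  abs(ll_new - ll_old) / abs(ll_old)
}

# Made-up observed-data log-likelihood trace, for illustration only.
ll_trace <- c(-520.0, -480.0, -470.0, -469.8, -469.79)
tol <- 0.001      # default value of the tol argument
maxIter <- 100    # default value of the maxIter argument

iter <- 1
converged <- FALSE
while (iter < length(ll_trace) && iter < maxIter) {
  iter <- iter + 1
  # Stop as soon as the relative change drops below the tolerance.
  if (rel_change(ll_trace[iter], ll_trace[iter - 1]) < tol) {
    converged <- TRUE
    break
  }
}

converged
iter
```

A smaller tol demands a flatter log-likelihood before stopping, while maxIter caps the number of iterations regardless.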

Details

The multivariate mixed-effects selection model consists of two components, the outcome model and the missing-data model. Here the outcome model is a multivariate mixed-effects model, with correlations among multivariate outcomes modeled via correlated outcome-specific random intercepts with a factor-analytic structure

\mathbf{y}_{i} = \mathbf{X}_{i}\boldsymbol{\beta} + \left(\mathbf{I}_{K}\otimes\mathbf{1}_{n_{i}}\right) \boldsymbol{\tau}b_{i}+\mathbf{e}_{i},

where i denotes a cluster/batch, n_{i} is the number of samples/observations within the i-th cluster, \boldsymbol{\tau} is a K\times 1 vector of outcome-specific variance components corresponding to the random effect b_i (a standard normal random variable), and K is the number of outcomes. By default, a matrix whose columns are indicators for each outcome is generated and used as the random-effect design matrix (\mathbf{I}_{K}\otimes\mathbf{1}_{n_{i}}), and the model estimates the outcome-specific random intercepts. The factor-analytic structure assumes the outcome-specific random intercepts are identically correlated. This model is often used to capture highly structured experimental or biological correlations among naturally related outcomes, for example, the correlation among multiple phosphopeptides (i.e., phosphorylated segments) of the same protein. The model assumes that the random effects are derived from a latent variable b_i with a loading vector \boldsymbol{\tau}. With this specification, only K parameters instead of K(K+1)/2 are needed to estimate the covariance matrix of the random effects, which greatly facilitates the computation.
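
The covariance structure implied by the outcome model can be made concrete with a small base-R sketch. All numeric values are made up, and the sketch assumes independent errors with a single common variance sigma2 (as with sigma_diff = FALSE); the outcome vector is taken to be stacked outcome by outcome, matching the Kronecker design matrix.

```r
K <- 3                       # number of outcomes (illustrative)
n_i <- 4                     # number of samples in cluster i (illustrative)
tau <- c(0.8, 0.5, 1.2)      # hypothetical loading vector, one entry per outcome
sigma2 <- 0.3                # hypothetical common error variance (assumption)

# Random-effect design matrix I_K (x) 1_{n_i}: one indicator column per outcome.
Z_i <- kronecker(diag(K), rep(1, n_i))

# Covariance of the K random intercepts tau * b_i, with b_i ~ N(0, 1).
G <- tcrossprod(tau)         # tau %*% t(tau)

# Marginal covariance of the stacked outcome vector y_i.
V_i <- Z_i %*% G %*% t(Z_i) + sigma2 * diag(K * n_i)

dim(Z_i)                     # (K * n_i) rows, K columns
# Parameter counts for the random-effect covariance matrix.
c(factor_analytic = K, unstructured = K * (K + 1) / 2)
```

Because G is built from tau alone, the whole random-effect covariance matrix is determined by K numbers, which is the saving described above.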

The missing-data model can be written as

\mathrm{Pr}\left(r_{ik}=1 \mid \mathbf{y}_{ik}\right) = \exp\left(\phi_{0} + \phi_{1}/n_{i}\cdot \mathbf{1}^{\prime}\mathbf{y}_{ik} + \phi_{2}/n_{i}\cdot \mathbf{1}^{\prime}\mathbf{c}_{i}\right),

where r_{ik} is the missing indicator for the k-th outcome in the i-th cluster. If r_{ik}=1, the values of the k-th outcome in the i-th cluster, \mathbf{y}_{ik}, are missing altogether. The estimation is implemented via an EM algorithm. Parameters in the missing-data model can be specified via the arguments miss_y and cov_miss. If miss_y = TRUE, the missingness depends on the outcome values. If cov_miss is specified, the missingness can (additionally) depend on the specified covariate (cov_miss).
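
To see what the missing-data model computes, the expression can be evaluated directly. The helper miss_prob and all numeric values below are hypothetical and are not part of the package. Because the expression is an exponential, it is not bounded by one in general, so the illustrative parameters are chosen to keep the linear predictor negative.

```r
# Hypothetical helper evaluating the missing-data model as written above.
# phi = c(phi_0, phi_1, phi_2); c_i = NULL drops the covariate term.
miss_prob <- function(y_ik, c_i = NULL, phi) {
  eta <- phi[1] + phi[2] * mean(y_ik)      # phi_1 / n_i * 1'y_ik
  if (!is.null(c_i)) {
    eta <- eta + phi[3] * mean(c_i)        # phi_2 / n_i * 1'c_i
  }
  exp(eta)
}

phi <- c(-1.5, -0.4, 0.2)       # made-up values of (phi_0, phi_1, phi_2)
y_ik <- c(2.1, 1.8, 2.4, 2.0)   # outcome k in cluster i (illustrative)
c_i <- c(0.5, 0.1, 0.3, 0.7)    # covariate values in cluster i (illustrative)

# Missingness depending on the outcome only (miss_y = TRUE, cov_miss = NULL).
p_y_only <- miss_prob(y_ik, phi = phi)
# Missingness depending on both the outcome and a covariate.
p_y_cov <- miss_prob(y_ik, c_i = c_i, phi = phi)

c(outcome_only = p_y_only, outcome_and_covariate = p_y_cov)
```

With a negative phi_1, clusters with a smaller mean outcome get a larger missingness probability.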

The model also works for fully observed data if miss_y = FALSE and cov_miss = NULL. It would also work for a univariate outcome with potential missing values, if the outcome Y is a matrix with one column.
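
For the univariate case, the outcome has to be passed as a one-column matrix rather than a plain vector. A minimal base-R sketch with made-up values:

```r
# Made-up univariate outcome with missing values.
y <- c(1.2, NA, 0.7, 2.3, NA, 1.9)

# Convert the vector to a one-column outcome matrix.
Y1 <- matrix(y, ncol = 1)

# When taking one column from an existing matrix, drop = FALSE keeps the
# matrix shape instead of collapsing to a vector.
Y_multi <- cbind(y, y + 1)
Y1_from_matrix <- Y_multi[, 1, drop = FALSE]

dim(Y1)
dim(Y1_from_matrix)
```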

Value

A list containing

beta

the estimated fixed-effects.

var

the variance-covariance matrix of the estimated fixed effects. With the fixed-effect estimates and their covariance matrix, one can obtain the Wald statistics for testing the fixed effects as beta/sqrt(diag(var)).

pval

the parametric p-values for testing non-zero fixed effects. They are obtained as the two-sided p-values based on the Wald statistics beta/sqrt(diag(var)).

sigma2

the estimated sample error variance(s). If sigma_diff is TRUE, it returns a vector of two elements: the variance for the first sample and the variance for the remaining samples within each cluster.

tau

the estimated variance components for the outcome-specific factor-analytic random-effects.

phi

the estimated parameters for the missing-data mechanism. Check the Details for the missing-data model. A zero estimate implies that the parameter is ignored via the specification of miss_y and/or cov_miss.

loglikelihood

the observed-data log-likelihood values.

iter

the number of EM iterations performed when convergence was reached.
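
The relationship between beta, var and pval can be sketched in base R. The estimates below are made up, and using the standard normal as the reference distribution for the Wald statistics is an assumption; the page does not state which reference distribution the function uses.

```r
# Made-up fixed-effect estimates and covariance matrix, for illustration only.
beta <- c(0.50, -1.20, 0.05)
V <- diag(c(0.04, 0.09, 0.01))   # stands in for the returned var element

# Wald statistics: estimate divided by its standard error.
wald <- beta / sqrt(diag(V))

# Two-sided p-values (assumption: standard normal reference distribution).
pval <- 2 * pnorm(-abs(wald))

cbind(beta = beta, se = sqrt(diag(V)), wald = wald, pval = pval)
```

With a fitted object such as fit0 from the Examples, fit0$beta and fit0$var would take the place of beta and V.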

References

Jiebiao Wang, Pei Wang, Donald Hedeker, and Lin S. Chen. A multivariate mixed-effects selection model framework for labelling-based proteomics data with non-ignorable missingness. (In preparation).

Examples

data(sim_dat)

# Covariates X common across outcomes with common coefficients

fit0 = mvMISE_b(Y = sim_dat$Y, X = sim_dat$X, id = sim_dat$id)

## Not run: 

# The example below shows how to estimate outcome-specific
# coefficients for a common covariate. The second column of the
# sim_dat$X matrix is a common covariate, but it has different
# effects/coefficients on different outcomes.

nY = ncol(sim_dat$Y)
# stack X across outcomes
X_mat = sim_dat$X[rep(1:nrow(sim_dat$X), nY), ]
# Y_ind is the indicator matrix corresponding to different outcomes
Y_ind = kronecker(diag(nY), rep(1, nrow(sim_dat$Y)))
# generate outcome-specific covariates
cidx = 2  # the index for the covariate with outcome-specific coefficient
X_mat = cbind(1, X_mat[, cidx] * Y_ind)

# X_mat is a matrix of 460 (92*5) by 6, the first column is
# intercept and the next 5 columns are covariate for each outcome

fit1 = mvMISE_b(Y = sim_dat$Y, X = X_mat, id = sim_dat$id)


# A covariate only specific to the first outcome

X_mat1 = X_mat[, 1:2]

fit2 = mvMISE_b(Y = sim_dat$Y, X = X_mat1, id = sim_dat$id)


## An example that allows missingness depending on both a covariate
## and the outcome

fit3 = mvMISE_b(Y = sim_dat$Y, X = sim_dat$X, id = sim_dat$id, 
    cov_miss = sim_dat$X[, 2])


## End(Not run)

lschen-stat/mvMISE documentation built on May 14, 2019, 11:27 a.m.