mlmm: The multilevel model to handle missing dependants: mlmm().
In mlm4omics: Multilevel Model for Multivariate Responses with Missing Values

Description Usage Arguments Value Examples

View source: R/mlmm.R

mlmm() handles Bayesian multilevel model with response variable that has missing values that depends on the response value itself. Apart from the response value, the missingness is also known to associate with the other variables. The method is created for analyzing mass-spectrometry data when it has abundance-dependant missing and censored values, and there are prior information available for the associations between the probability of missing and the known variables. The function mlmm is written for response variable has no censored values while mlmc function include imputing censored values.

mlmm(formula_completed, formula_missing, formula_subject, pdata,
    respond_dep_missing = TRUE, pidname, sidname, prec_prior = NULL,
    alpha_prior = NULL, iterno = 100, chains = 3, thin = 1, seed = 125,
    algorithm = "NUTS", warmup = floor(iterno/2), adapt_delta_value = 0.9,
    savefile = FALSE)

`formula_completed`	The main regression model formula. It has the same formula format as lmr() and it is used to define the first level response and its explanatory variables.
`formula_missing`	The logistic regression model formula. It has the same formula as formula_completed.
`formula_subject`	The second level formula in the multilevel model which is used to define second level unit such as subject and explanatory variables.
`pdata`	The dataset contains response and predictors in a long format. Response is a vector with an indicator variable to define the corresponding unit. The data needs to have the following rudimental variables: the indicator variable for first level response, the indicator variable for second level unit such as subject or a sampling unit, an indicator for missingness and indicator of censoring. Missingness and censored are two different classification, there should not have any overlap between missingness and censored. Data structure can be referenced from the example and reference papers.
`respond_dep_missing`	An indicator of whether response value is missing-dependant.
`pidname`	Variable name to define the multilevel response unit, i.e. protein name or gene name
`sidname`	Vriable name to define the subject unit, i.e. patient id or sampling id
`prec_prior`	prior precision matrix of the explanatory variables at the first level unit in the multilevel model, for example, the variables to predict the ion intensity. The dimension will be q x q, where q is the number of explanatory variables at the right-hand side of formula_completed. The default is a matrix with diagonal value of 0.01 and off-diagonal value of 0.005.
`alpha_prior`	prior for coefficients of predictors to missing probability in the logistic regression. Its length will be equal to the number of variables at the right-hand side of the formula_missing. Default is a vector of zeros.
`iterno`	Number of iterations for the posterior samplings
`chains`	rstan() parameter to define number of chains of posterior samplings.
`thin`	rstan() parameter to define the frequency of iterations saved.
`seed`	random seed for rstan() function
`algorithm`	rstan() parameter which has three options c(NUTS,HMC, Fixed_param).
`warmup`	Number of iterations for burn-out in stan.
`adapt_delta_value`	Adaptive delta value is an adaptation parameters for sampling algorithms,default is 0.85, value between 0-1.
`savefile`	A logical variable to indicate if the sampling files are to be saved.

Return of the function is the result fitted by stan(). It will have the summarized parameters from all chains and summary results for each chain. Plot() function will return the visualization of the mean and parameters.

library(MASS)
set.seed(150)
var2 <- abs(rnorm(1000,0,1)); treatment <- c(rep(0,500),rep(1,500))
geneid <- rep(seq_len(20),50); 
sid <- c(rep(seq_len(25),20),rep(seq_len(25)+25,20))
cov1 <- rWishart(1,df=100,Sigma <- diag(rep(1,100)))
u <- rnorm(100,0,1)
mu <- mvrnorm(n=1,mu=u,cov1[,,1])
sdd <- rgamma(1,shape=1,scale=1/10)
var1=(1/0.85)*var2+2*treatment
for (i in seq_len(1000)) {var1[i]=var1[i]+rnorm(1,mu[geneid[i]],sdd)}
miss_logit <- var2*(-0.9)+var1*(0.01)
probmiss <- exp(miss_logit)/(exp(miss_logit)+1)
miss <- rbinom(1000,1,probmiss); table(miss)
pdata <- data.frame(var1,var2,treatment,miss,geneid,sid)
for ( i in seq_len(1000)) if (pdata$miss[i]==1) pdata$var1[i]=NA;
pidname="geneid"; sidname="sid";
#copy and paste the following formulas to the mmlm() function respectively
formula_completed=var1~var2+treatment
formula_missing=miss~var2
formula_censor=censor~1
formula_subject=~treatment
model3 <- mlmm(formula_completed=var1~var2+treatment, 
formula_missing=miss~var2,
formula_subject=~treatment, pdata=pdata, respond_dep_missing=TRUE,
pidname="geneid", sidname="sid", iterno=10, chains=2,
savefile=FALSE)