mlmm: The multilevel model to handle missing dependants: mlmm().

Description Usage Arguments Value Examples

View source: R/mlmm.R

Description

mlmm() handles Bayesian multilevel model with response variable that has missing values that depends on the response value itself. Apart from the response value, the missingness is also known to associate with the other variables. The method is created for analyzing mass-spectrometry data when it has abundance-dependant missing and censored values, and there are prior information available for the associations between the probability of missing and the known variables. The function mlmm is written for response variable has no censored values while mlmc function include imputing censored values.

Usage

1
2
3
4
5
mlmm(formula_completed, formula_missing, formula_subject, pdata,
    respond_dep_missing = TRUE, pidname, sidname, prec_prior = NULL,
    alpha_prior = NULL, iterno = 100, chains = 3, thin = 1, seed = 125,
    algorithm = "NUTS", warmup = floor(iterno/2), adapt_delta_value = 0.9,
    savefile = FALSE)

Arguments

formula_completed

The main regression model formula. It has the same formula format as lmr() and it is used to define the first level response and its explanatory variables.

formula_missing

The logistic regression model formula. It has the same formula as formula_completed.

formula_subject

The second level formula in the multilevel model which is used to define second level unit such as subject and explanatory variables.

pdata

The dataset contains response and predictors in a long format. Response is a vector with an indicator variable to define the corresponding unit. The data needs to have the following rudimental variables: the indicator variable for first level response, the indicator variable for second level unit such as subject or a sampling unit, an indicator for missingness and indicator of censoring. Missingness and censored are two different classification, there should not have any overlap between missingness and censored. Data structure can be referenced from the example and reference papers.

respond_dep_missing

An indicator of whether response value is missing-dependant.

pidname

Variable name to define the multilevel response unit, i.e. protein name or gene name

sidname

Vriable name to define the subject unit, i.e. patient id or sampling id

prec_prior

prior precision matrix of the explanatory variables at the first level unit in the multilevel model, for example, the variables to predict the ion intensity. The dimension will be q x q, where q is the number of explanatory variables at the right-hand side of formula_completed. The default is a matrix with diagonal value of 0.01 and off-diagonal value of 0.005.

alpha_prior

prior for coefficients of predictors to missing probability in the logistic regression. Its length will be equal to the number of variables at the right-hand side of the formula_missing. Default is a vector of zeros.

iterno

Number of iterations for the posterior samplings

chains

rstan() parameter to define number of chains of posterior samplings.

thin

rstan() parameter to define the frequency of iterations saved.

seed

random seed for rstan() function

algorithm

rstan() parameter which has three options c(NUTS,HMC, Fixed_param).

warmup

Number of iterations for burn-out in stan.

adapt_delta_value

Adaptive delta value is an adaptation parameters for sampling algorithms,default is 0.85, value between 0-1.

savefile

A logical variable to indicate if the sampling files are to be saved.

Value

Return of the function is the result fitted by stan(). It will have the summarized parameters from all chains and summary results for each chain. Plot() function will return the visualization of the mean and parameters.

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
library(MASS)
set.seed(150)
var2 <- abs(rnorm(1000,0,1)); treatment <- c(rep(0,500),rep(1,500))
geneid <- rep(seq_len(20),50); 
sid <- c(rep(seq_len(25),20),rep(seq_len(25)+25,20))
cov1 <- rWishart(1,df=100,Sigma <- diag(rep(1,100)))
u <- rnorm(100,0,1)
mu <- mvrnorm(n=1,mu=u,cov1[,,1])
sdd <- rgamma(1,shape=1,scale=1/10)
var1=(1/0.85)*var2+2*treatment
for (i in seq_len(1000)) {var1[i]=var1[i]+rnorm(1,mu[geneid[i]],sdd)}
miss_logit <- var2*(-0.9)+var1*(0.01)
probmiss <- exp(miss_logit)/(exp(miss_logit)+1)
miss <- rbinom(1000,1,probmiss); table(miss)
pdata <- data.frame(var1,var2,treatment,miss,geneid,sid)
for ( i in seq_len(1000)) if (pdata$miss[i]==1) pdata$var1[i]=NA;
pidname="geneid"; sidname="sid";
#copy and paste the following formulas to the mmlm() function respectively
formula_completed=var1~var2+treatment
formula_missing=miss~var2
formula_censor=censor~1
formula_subject=~treatment
model3 <- mlmm(formula_completed=var1~var2+treatment, 
formula_missing=miss~var2,
formula_subject=~treatment, pdata=pdata, respond_dep_missing=TRUE,
pidname="geneid", sidname="sid", iterno=10, chains=2,
savefile=FALSE)

mlm4omics documentation built on Oct. 31, 2019, 9:43 a.m.