mlmc: The multilevel function for missing and censored dependents:...

Description Usage Arguments Value Examples

View source: R/mlmc.R

Description

mlmc() handles Bayesian multilevel model with response variable that has left-censored values, and missing values that depends on the response value itself. Apart from the response value, the missingness is also known to associate with the other variables. The method is created for analyzing mass-spectrometry data when it has abundance-dependant missing and censored values, and there are prior information available for the associations between the probability of missing and the known variables. The imputed values for the censored response are outputted as part of the parameters.

Usage

1
2
3
4
5
6
mlmc(formula_completed, formula_missing, formula_censor = NULL,
    formula_subject, pdata, respond_dep_missing = TRUE,
    response_censorlim = NULL, pidname, sidname, prec_prior = NULL,
    alpha_prior = NULL, iterno = 100, chains = 3, thin = 1, seed = 125,
    algorithm = "NUTS", warmup = floor(iterno/2), adapt_delta_value = 0.9,
    savefile = FALSE)

Arguments

formula_completed

The main regression model formula; It has the same formula format as lmr() and it is used to define the first level response and its explanatory variables.

formula_missing

The logistic regression model formula; It has the same formula as formula_completed.

formula_censor

The formula used in the program to define the observations with censored values.

formula_subject

The second level formula in the multilevel model which is used to define responses such as subject and its explanatory variables.

pdata

The dataset contains response and predictors in a long format. Response is a vector with an indicator variable to define the corresponding unit. The data needs to have the following rudimental variables: the indicator variable for first level response, second level indicator variable for subject such as subject id or a sampling unit, an indicator for missingness and indicator of censoring. Missingness and censored are two different classifications, when these two variables are tabulated, there must not have any observation defined as censored and missing. Data structure can be referred from the example and vignette.

respond_dep_missing

A logical variable to indicate whether response value is missing-dependant.

response_censorlim

The detectable limit for the response value, i.e. 1 mg per Liter for intensity value.

pidname

Variable name to define the multilevel response unit, i.e. protein name or gene name.

sidname

Variable name to define the subject unit, i.e. patient id or sampling id.

prec_prior

prior precision matrix of the explanatory variables at the first level unit in the multilevel model, for example, the variables to predict the ion intensity. The dimension will be q x q, where q is the number of explanatory variables at the right-hand side of formula_completed. The default is a matrix with diagonal values of 0.01 and off-diagonal values of 0.005.

alpha_prior

prior for coefficients of predictors to missing probability in the logistic regression. Its length will be equal to the number of variables at the right-hand side of the formula_missing. Default is a vector of zeros.

iterno

Number of iterations for the posterior samplings.

chains

rstan parameter to define number of chains of posterior samplings.

thin

rstan parameter to define the frequency of iterations saved.

seed

random seed for rstan function.

algorithm

rstan parameter which has three options NUTS, HMC, Fixed param.

warmup

Number of iterations for burn-out in stan.

adapt_delta_value

Adaptive delta value is an adaptation parameters for sampling algorithms, default is 0.85, value between 0-1.

savefile

A logical variable to indicate if the sampling files are to be saved.

Value

Return of the function is the result fitted by stan. It will have the summarized parameters from all chains and summary results for each chain.

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
## Not run: 
set.seed(150)
library(MASS)
var2 <- abs(rnorm(800,0,1)); treatment <- c(rep(0,400), rep(1,400));
var1 <- (1/0.85)*var2+2*treatment;
geneid <- rep(seq_len(50),16);
sid <- c(rep(seq_len(50),8), rep(seq_len(50)+50,8))
cov1 <- rWishart(1,df=50, Sigma <- diag(rep(1,50)))
u <- rnorm(50,0,1);mu <- mvrnorm(n=1, mu=u, cov1[,,1])
sdd <- rgamma(1, shape=1, scale=1/10);
for (i in seq_len(800)) {var1[i] <- var1[i]+rnorm(1, mu[geneid[i]], sdd)}
miss_logit <- var2*(-0.9)+var1*(0.001);
miss <- rbinom(800, 1, exp(miss_logit)/(exp(miss_logit)+1));
censor <- rep(0,800)
for (i in seq_len(800)) {if (var1[i]<0.002) censor[i]=1}
pdata <- data.frame(var1, var2, treatment, miss, censor, geneid, sid);
for ( i in seq_len(800)) 
{if ((pdata$miss[i]==1) & (pdata$censor[i]==1)) pdata$miss[i]=0};
for ( i in seq_len(800)) {
if (pdata$miss[i]==1) pdata$var1[i]=NA;
if (pdata$censor[i]==1) pdata$var1[i]=0.002};
pidname="geneid";sidname="sid";
#copy and paste the following formulas to the mlmm() function respectively
formula_completed=var1~var2+treatment;
formula_missing=miss~var2;
formula_censor=censor~1;
formula_subject=~treatment;
response_censorlim=0.002;

model1 <- mlmc(formula_completed=var1~var2+treatment,
formula_missing=miss~var2,
formula_censor=censor~1,
formula_subject=~treatment,
pdata=pdata,
response_censorlim=0.002,
respond_dep_missing=TRUE,
pidname="geneid",sidname="sid",
iterno=50,
chains=2,
savefile=FALSE)

## End(Not run)

mlm4omics documentation built on Oct. 31, 2019, 9:43 a.m.