mlmc: The multilevel function for missing and censored dependents:...
In mlm4omics: Multilevel Model for Multivariate Responses with Missing Values

Description Usage Arguments Value Examples

View source: R/mlmc.R

mlmc() handles Bayesian multilevel model with response variable that has left-censored values, and missing values that depends on the response value itself. Apart from the response value, the missingness is also known to associate with the other variables. The method is created for analyzing mass-spectrometry data when it has abundance-dependant missing and censored values, and there are prior information available for the associations between the probability of missing and the known variables. The imputed values for the censored response are outputted as part of the parameters.

mlmc(formula_completed, formula_missing, formula_censor = NULL,
    formula_subject, pdata, respond_dep_missing = TRUE,
    response_censorlim = NULL, pidname, sidname, prec_prior = NULL,
    alpha_prior = NULL, iterno = 100, chains = 3, thin = 1, seed = 125,
    algorithm = "NUTS", warmup = floor(iterno/2), adapt_delta_value = 0.9,
    savefile = FALSE)

`formula_completed`	The main regression model formula; It has the same formula format as lmr() and it is used to define the first level response and its explanatory variables.
`formula_missing`	The logistic regression model formula; It has the same formula as formula_completed.
`formula_censor`	The formula used in the program to define the observations with censored values.
`formula_subject`	The second level formula in the multilevel model which is used to define responses such as subject and its explanatory variables.
`pdata`	The dataset contains response and predictors in a long format. Response is a vector with an indicator variable to define the corresponding unit. The data needs to have the following rudimental variables: the indicator variable for first level response, second level indicator variable for subject such as subject id or a sampling unit, an indicator for missingness and indicator of censoring. Missingness and censored are two different classifications, when these two variables are tabulated, there must not have any observation defined as censored and missing. Data structure can be referred from the example and vignette.
`respond_dep_missing`	A logical variable to indicate whether response value is missing-dependant.
`response_censorlim`	The detectable limit for the response value, i.e. 1 mg per Liter for intensity value.
`pidname`	Variable name to define the multilevel response unit, i.e. protein name or gene name.
`sidname`	Variable name to define the subject unit, i.e. patient id or sampling id.
`prec_prior`	prior precision matrix of the explanatory variables at the first level unit in the multilevel model, for example, the variables to predict the ion intensity. The dimension will be q x q, where q is the number of explanatory variables at the right-hand side of formula_completed. The default is a matrix with diagonal values of 0.01 and off-diagonal values of 0.005.
`alpha_prior`	prior for coefficients of predictors to missing probability in the logistic regression. Its length will be equal to the number of variables at the right-hand side of the formula_missing. Default is a vector of zeros.
`iterno`	Number of iterations for the posterior samplings.
`chains`	rstan parameter to define number of chains of posterior samplings.
`thin`	rstan parameter to define the frequency of iterations saved.
`seed`	random seed for rstan function.
`algorithm`	rstan parameter which has three options NUTS, HMC, Fixed param.
`warmup`	Number of iterations for burn-out in stan.
`adapt_delta_value`	Adaptive delta value is an adaptation parameters for sampling algorithms, default is 0.85, value between 0-1.
`savefile`	A logical variable to indicate if the sampling files are to be saved.

Return of the function is the result fitted by stan. It will have the summarized parameters from all chains and summary results for each chain.

## Not run: 
set.seed(150)
library(MASS)
var2 <- abs(rnorm(800,0,1)); treatment <- c(rep(0,400), rep(1,400));
var1 <- (1/0.85)*var2+2*treatment;
geneid <- rep(seq_len(50),16);
sid <- c(rep(seq_len(50),8), rep(seq_len(50)+50,8))
cov1 <- rWishart(1,df=50, Sigma <- diag(rep(1,50)))
u <- rnorm(50,0,1);mu <- mvrnorm(n=1, mu=u, cov1[,,1])
sdd <- rgamma(1, shape=1, scale=1/10);
for (i in seq_len(800)) {var1[i] <- var1[i]+rnorm(1, mu[geneid[i]], sdd)}
miss_logit <- var2*(-0.9)+var1*(0.001);
miss <- rbinom(800, 1, exp(miss_logit)/(exp(miss_logit)+1));
censor <- rep(0,800)
for (i in seq_len(800)) {if (var1[i]<0.002) censor[i]=1}
pdata <- data.frame(var1, var2, treatment, miss, censor, geneid, sid);
for ( i in seq_len(800)) 
{if ((pdata$miss[i]==1) & (pdata$censor[i]==1)) pdata$miss[i]=0};
for ( i in seq_len(800)) {
if (pdata$miss[i]==1) pdata$var1[i]=NA;
if (pdata$censor[i]==1) pdata$var1[i]=0.002};
pidname="geneid";sidname="sid";
#copy and paste the following formulas to the mlmm() function respectively
formula_completed=var1~var2+treatment;
formula_missing=miss~var2;
formula_censor=censor~1;
formula_subject=~treatment;
response_censorlim=0.002;

model1 <- mlmc(formula_completed=var1~var2+treatment,
formula_missing=miss~var2,
formula_censor=censor~1,
formula_subject=~treatment,
pdata=pdata,
response_censorlim=0.002,
respond_dep_missing=TRUE,
pidname="geneid",sidname="sid",
iterno=50,
chains=2,
savefile=FALSE)

## End(Not run)