EstimateMissing: Estimate the metabolite-dependent missingness mechanisms

View source: R/AnalyzeMetaboliteData.R

Description

Estimate the metabolite-dependent missingness mechanisms with a hierarchical generalized method of moments (GMM). This function only needs to be run once per metabolite dataset, and its output should be stored with the metabolite data. The user need only specify Y and, optionally, K; the default K = 10 should suffice.

Usage

EstimateMissing(
  Y,
  K = 10,
  max.missing.consider = 0.5,
  Cov = NULL,
  max.miss.C = 0.05,
  n_cores = NULL,
  max.iter.C = 400,
  n.repeat.Sigma.C = 1,
  n.K.GMM = 2,
  min.a = 0.1,
  max.a = 7,
  min.y0 = 10,
  max.y0 = 30,
  t.df = 4,
  p.min.1 = 0,
  p.min.2 = 0,
  n.boot.J = 150,
  Model.Pvalue = T,
  BH.analyze.min = 0.2,
  min.quant.5 = 5,
  shrink.Est = T,
  prop.y0.sd = 0.2,
  prop.a.sd = 0.2,
  n.iter.MCMC = 20000,
  n.burn.MCMC = 1000,
  min.prob.MCMC = 1/n,
  Bayes.est = c("EmpBayes", "FullBayes", "FullBayes_ind"),
  simple.average.EB = F
)
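
A minimal sketch of a typical call, assuming the MetabMiss package is installed. The simulated matrix, its dimensions, and the crude censoring rule are illustrative assumptions and not part of the package; whether such toy data are rich enough for the estimator is not checked here.

```r
set.seed(1)
p <- 200   # number of metabolites (illustrative)
n <- 50    # number of samples (illustrative)

# Simulated log2 intensities; real data would replace this matrix.
Y <- matrix(rnorm(p * n, mean = 20, sd = 2), nrow = p, ncol = n)

# Crudely mimic intensity-dependent missingness: low values are more
# likely to be set to NA (assumed rule, for illustration only).
Y[runif(p * n) < pt(-(Y - 17), df = 4)] <- NA

# Only Y and K are supplied; every other argument keeps its default.
if (requireNamespace("MetabMiss", quietly = TRUE)) {
  miss.out <- MetabMiss::EstimateMissing(Y = Y, K = 10)
}
```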

Arguments

Y

A p x n data matrix of log2-transformed metabolite intensities, where p is the number of metabolites and n is the number of samples. Missing values should be left as NA.

K

A number >= 2 giving the number of latent covariates used to estimate the missingness mechanism. We recommend using Num.Instruments to estimate it. The default is 10. Y and K are the only arguments the user needs to specify.

max.missing.consider

The maximum fraction of missing data a metabolite is allowed to have. Missingness mechanisms will NOT be estimated for metabolites with more missing data than this. The default, and recommended value, is 0.5.

Cov

An optional n x d matrix of covariates. It is recommended that the user not specify anything other than the intercept. The default (NULL) uses only the intercept.

max.miss.C

The maximum fraction of missing data a metabolite can have for its missingness mechanism to be ignored in downstream estimation and inference. The default, and recommended value, is 0.05.
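
The two thresholds max.miss.C and max.missing.consider split metabolites into three groups by their fraction of missing values. The following base R sketch shows that split on a toy matrix; the group labels and the treatment of values exactly on a threshold are assumptions made for illustration, not the package's internal code.

```r
# Toy data: 5 metabolites, 20 samples, with a known number of NAs per row.
Y <- matrix(20, nrow = 5, ncol = 20)
n.na <- c(0, 2, 8, 10, 12)
for (g in seq_along(n.na)) Y[g, seq_len(n.na[g])] <- NA

# Fraction of missing values per metabolite (row).
frac.missing <- rowMeans(is.na(Y))

max.miss.C <- 0.05            # default from the usage above
max.missing.consider <- 0.5   # default from the usage above

# Right-closed intervals: (-Inf, 0.05], (0.05, 0.5], (0.5, Inf).
group <- cut(
  frac.missing,
  breaks = c(-Inf, max.miss.C, max.missing.consider, Inf),
  labels = c("ignore mechanism", "estimate mechanism", "not estimated")
)

data.frame(frac.missing, group)
```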

n_cores

The number of cores to use. The default is the maximum number of usable cores minus 1.
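
To set n_cores explicitly, one option is to compute the same kind of value by hand. This sketch assumes the base parallel package is the right way to count cores; whether EstimateMissing derives its default in exactly this way is an assumption.

```r
# detectCores() can return NA on some platforms, so fall back to 1 core.
n_cores <- max(parallel::detectCores() - 1, 1, na.rm = TRUE)
```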

max.iter.C

Maximum number of iterations to estimate the latent covariates C. Default is 400 and should not be changed.

n.K.GMM

Number of additional terms (besides the intercept) to be considered in GMM when estimating the missingness mechanism. The default, and recommended value, is 2. If changed, this must be >= 2.

t.df

The missingness mechanism is the CDF of a scaled and centered t-distribution with t.df degrees of freedom. The default, and recommended value, is 4.
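
As an illustration of such a mechanism, the sketch below evaluates a t-distribution CDF at a scaled and centered log2 intensity. The helper name prob.observed and the parameterization pt(a * (y - y0), df = t.df), with scale a and location y0, are assumptions made for this example.

```r
# Hypothetical helper: P(metabolite observed | y) under a t-CDF mechanism.
# Assumed parameterization: scale a > 0 and location y0.
prob.observed <- function(y, a, y0, t.df = 4) {
  pt(a * (y - y0), df = t.df)
}

y <- c(12, 16, 20, 24)   # log2 intensities (illustrative)
pi.obs <- prob.observed(y, a = 1, y0 = 16)

# Under this parameterization the probability is 0.5 at y = y0 and
# increases with y, so low-intensity values are more likely to be missing.
pi.obs
```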

n.boot.J

The number of bootstrap samples used to compute the J-statistics. The default is 150.

Model.Pvalue

A logical value. If T, a missingness model P-value is computed. The default, and recommended value, is T.

Value

A list that should be saved immediately. It can be used directly as input into CC.Missing to estimate latent factors and the coefficients of interest in a multivariate linear model.
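
One way to store the returned list is with base R's saveRDS. The sketch below uses a small stand-in list rather than real output; the file name, the temporary directory, and the column names "a" and "y0" are assumptions.

```r
# Stand-in for the list returned by EstimateMissing; real output has
# more elements and one row per metabolite.
miss.out <- list(
  Post.Theta = matrix(c(1.2, NA, 15.8, NA), nrow = 2,
                      dimnames = list(NULL, c("a", "y0")))
)

# Save next to the metabolite data in practice; tempdir() is used here
# only so the example leaves no files behind.
out.file <- file.path(tempdir(), "EstimateMissing_output.rds")
saveRDS(miss.out, out.file)

# Reload later instead of re-running the estimation.
miss.out.reloaded <- readRDS(out.file)
```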

Post.Theta

A p x 2 matrix containing the posterior expectations of the missingness scale and location parameters (a, y0) for each metabolite. Returns NA for metabolites without a missingness mechanism.

Post.Var

A list of p 2x2 matrices containing the posterior variances for (a,y0) for each metabolite.

Post.W

A p x n matrix containing the posterior expectations of 1/P(Metab is observed | y, a, y0), where the expectation is taken with respect to (a, y0) | y.

Post.VarW

A p x n matrix containing the posterior variances of 1/P(Metab is observed | y, a, y0), where the variance is taken with respect to (a, y0) | y.

Post.Pi

A p x n matrix containing the posterior expectations of P(Metab is observed | y, a, y0), where the expectation is taken with respect to (a, y0) | y.
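
Post.W is the posterior expectation of an inverse probability, which in general differs from the inverse of Post.Pi. The Monte Carlo sketch below illustrates this for a single matrix entry. The posterior draws of (a, y0) are simulated from arbitrary distributions, and the t-CDF parameterization pt(a * (y - y0), df = t.df) is an assumption; neither reproduces the package's sampler.

```r
set.seed(3)
t.df <- 4
y <- 15   # one observed log2 intensity (illustrative)

# Hypothetical posterior draws of the scale a and location y0.
a.draws  <- exp(rnorm(5000, mean = 0, sd = 0.2))
y0.draws <- rnorm(5000, mean = 16, sd = 0.5)

# P(observed | y, a, y0) for each draw.
pi.draws <- pt(a.draws * (y - y0.draws), df = t.df)

post.pi   <- mean(pi.draws)       # analogue of one entry of Post.Pi
post.w    <- mean(1 / pi.draws)   # analogue of one entry of Post.W
post.varw <- var(1 / pi.draws)    # analogue of one entry of Post.VarW

# By Jensen's inequality, E(1/Pi) is at least 1/E(Pi).
c(post.w = post.w, inv.post.pi = 1 / post.pi)
```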

Pi.MAR

A p x n matrix containing estimates of P(Metab is observed | Latent covariates). This helps stabilize the inverse probability weights in downstream estimation.

Theta.Miss

A p x 2 matrix with the unshrunk GMM estimates of the scale and location parameters (a, y0) for each metabolite's missingness mechanism. Returns NA if a missingness mechanism was not estimated.

Pvalue.value

The J-test P-value for the null hypothesis H_0: the missingness mechanism is correct.

Ind.Confident

A logical p-vector indicating the metabolites whose missingness mechanisms we are confident in.

Emp.Bayes.loga

Empirical Bayes estimate of E(log(a))

Emp.Bayes.y0

Empirical Bayes estimate of E(y0)


chrismckennan/MetabMiss documentation built on March 1, 2020, 10:03 p.m.