fit.meta.GMCM: Estimate GMCM parameters of the special model

Description Usage Arguments Details Value Note Author(s) References See Also Examples

View source: R/fit.meta.GMCM.R

Description

This function estimates the parameters of the special restricted Gaussian mixture copula model (GMCM) proposed by Li et. al. (2011). It is used to perform reproducibility (or meta) analysis using GMCMs. It features various optimization routines to identify the maximum likelihood estimate of the special GMCMs.

Usage

1
2
3
4
5
6
7
fit.meta.GMCM(u, init.par, method = c("NM", "SANN", "L-BFGS", "L-BFGS-B",
  "PEM"), max.ite = 1000, verbose = TRUE, positive.rho = TRUE,
  trace.theta = FALSE, ...)

fit.special.GMCM(u, init.par, method = c("NM", "SANN", "L-BFGS",
  "L-BFGS-B", "PEM"), max.ite = 1000, verbose = TRUE,
  positive.rho = TRUE, trace.theta = FALSE, ...)

Arguments

u

An n by d matrix of test statistics. Rows correspond to features and columns to experiments. Larger values are assumed to be indicative of stronger evidence and reproducibility.

init.par

A 4-dimensional vector of the initial parameters where, init.par[1] is the mixture proportion of spurious signals, init.par[2] is the mean, init.par[3] is the standard deviation, init.par[4] is the correlation.

method

A character vector of length 1. The optimization method used. Should be either "NM", "SANN", "L-BFGS", "L-BFGS-B", or "PEM" which are abbreviations of Nelder-Mead, Simulated Annealing, limited-memory quasi-Newton method, limited-memory quasi-Newton method with box constraints, and the pseudo EM algorithm, respectively. Default is "NM". See optim for further details.

max.ite

The maximum number of iterations. If the method is "SANN" this is the number of iterations as there is no other stopping criterion. (See optim)

verbose

Logical. If TRUE, the log-likelihood values are printed.

positive.rho

logical. If TRUE, the correlation parameter is restricted to be positive.

trace.theta

logical. Extra convergence information is appended as a list to the output returned if TRUE. The exact behavior is dependent on the value of method. If method equals "PEM", the argument is passed to trace.theta in PseudoEMAlgorithm. Otherwise it is passed to the control argument trace in optim.

...

Arguments passed to the control-list in optim or PseudoEMAlgorithm if method is "PEM".

Details

The "L-BFGS-B" method does not perform a transformation of the parameters.

fit.special.GMCM is simply an alias of fit.meta.gmcm.

Value

A vector par of length 4 of the fitted parameters where par[1] is the probability of being from the first (or null) component, par[2] is the mean, par[3] is the standard deviation, and par[4] is the correlation.

If trace.theta is TRUE, then a list is returned where the first entry is as described above and the second entry is the trace information (dependent of method.).

Note

Simulated annealing is strongly dependent on the initial values and the cooling scheme.

See optim for further details.

Author(s)

Anders Ellern Bilgrau <anders.ellern.bilgrau@gmail.com>

References

Li, Q., Brown, J. B. J. B., Huang, H., & Bickel, P. J. (2011). Measuring reproducibility of high-throughput experiments. The Annals of Applied Statistics, 5(3), 1752-1779. doi:10.1214/11-AOAS466

See Also

optim

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
set.seed(1)

# True parameters
true.par <- c(0.9, 2, 0.7, 0.6)
# Simulation of data from the GMCM model
data <- SimulateGMCMData(n = 1000, par = true.par)
uhat <- Uhat(data$u) # Ranked observed data

init.par <- c(0.5, 1, 0.5, 0.9)  # Initial parameters

# Optimization with Nelder-Mead
nm.par   <- fit.meta.GMCM(uhat, init.par = init.par, method = "NM")

## Not run: 
# Comparison with other optimization methods
# Optimization with simulated annealing
sann.par <- fit.meta.GMCM(uhat, init.par = init.par, method = "SANN",
                          max.ite = 3000, temp = 1)
# Optimization with the Pseudo EM algorithm
pem.par  <- fit.meta.GMCM(uhat, init.par = init.par, method = "PEM")

# The estimates agree nicely
rbind("True" = true.par, "Start" = init.par,
      "NM" = nm.par, "SANN" = sann.par, "PEM" = pem.par)

## End(Not run)

# Get estimated cluster
Khat <- get.IDR(x = uhat, par = nm.par)$Khat
plot(uhat, col = Khat, main = "Clustering\nIDR < 0.05")

GMCM documentation built on Nov. 6, 2019, 1:08 a.m.