fit.meta.GMCM: Estimate GMCM parameters of the special model
In AEBilgrau/GMCM: Fast Estimation of Gaussian Mixture Copula Models

Description Usage Arguments Details Value Note Author(s) References See Also Examples

This function estimates the parameters of the special restricted Gaussian mixture copula model (GMCM) proposed by Li et. al. (2011). It is used to perform reproducibility (or meta) analysis using GMCMs. It features various optimization routines to identify the maximum likelihood estimate of the special GMCMs.

fit.meta.GMCM(
  u,
  init.par,
  method = c("NM", "SANN", "L-BFGS", "L-BFGS-B", "PEM"),
  max.ite = 1000,
  verbose = TRUE,
  positive.rho = TRUE,
  trace.theta = FALSE,
  ...
)

fit.special.GMCM(
  u,
  init.par,
  method = c("NM", "SANN", "L-BFGS", "L-BFGS-B", "PEM"),
  max.ite = 1000,
  verbose = TRUE,
  positive.rho = TRUE,
  trace.theta = FALSE,
  ...
)

`u`	An `n` by `d` matrix of test statistics. Rows correspond to features and columns to experiments. Larger values are assumed to be indicative of stronger evidence and reproducibility.
`init.par`	A 4-dimensional vector of the initial parameters where, `init.par[1]` is the mixture proportion of spurious signals, `init.par[2]` is the mean, `init.par[3]` is the standard deviation, `init.par[4]` is the correlation.
`method`	A character vector of length 1. The optimization method used. Should be either `"NM"`, `"SANN"`, `"L-BFGS"`, `"L-BFGS-B"`, or `"PEM"` which are abbreviations of Nelder-Mead, Simulated Annealing, limited-memory quasi-Newton method, limited-memory quasi-Newton method with box constraints, and the pseudo EM algorithm, respectively. Default is `"NM"`. See `optim` for further details.
`max.ite`	The maximum number of iterations. If the `method` is `"SANN"` this is the number of iterations as there is no other stopping criterion. (See `optim`)
`verbose`	Logical. If `TRUE`, the log-likelihood values are printed.
`positive.rho`	`logical`. If `TRUE`, the correlation parameter is restricted to be positive.
`trace.theta`	`logical`. Extra convergence information is appended as a list to the output returned if `TRUE`. The exact behavior is dependent on the value of `method`. If `method` equals `"PEM"`, the argument is passed to `trace.theta` in `PseudoEMAlgorithm`. Otherwise it is passed to the control argument `trace` in `optim`.
`...`	Arguments passed to the `control`-list in `optim` or `PseudoEMAlgorithm` if `method` is `"PEM"`.

The "L-BFGS-B" method does not perform a transformation of the parameters.

fit.special.GMCM is simply an alias of fit.meta.gmcm.

A vector par of length 4 of the fitted parameters where par[1] is the probability of being from the first (or null) component, par[2] is the mean, par[3] is the standard deviation, and par[4] is the correlation.

If trace.theta is TRUE, then a list is returned where the first entry is as described above and the second entry is the trace information (dependent of method.).

Simulated annealing is strongly dependent on the initial values and the cooling scheme.

See optim for further details.

Anders Ellern Bilgrau <anders.ellern.bilgrau@gmail.com>

Li, Q., Brown, J. B. J. B., Huang, H., & Bickel, P. J. (2011). Measuring reproducibility of high-throughput experiments. The Annals of Applied Statistics, 5(3), 1752-1779. doi:10.1214/11-AOAS466

optim

set.seed(1)

# True parameters
true.par <- c(0.9, 2, 0.7, 0.6)
# Simulation of data from the GMCM model
data <- SimulateGMCMData(n = 1000, par = true.par)
uhat <- Uhat(data$u) # Ranked observed data

init.par <- c(0.5, 1, 0.5, 0.9)  # Initial parameters

# Optimization with Nelder-Mead
nm.par   <- fit.meta.GMCM(uhat, init.par = init.par, method = "NM")

## Not run: 
# Comparison with other optimization methods
# Optimization with simulated annealing
sann.par <- fit.meta.GMCM(uhat, init.par = init.par, method = "SANN",
                          max.ite = 3000, temp = 1)
# Optimization with the Pseudo EM algorithm
pem.par  <- fit.meta.GMCM(uhat, init.par = init.par, method = "PEM")

# The estimates agree nicely
rbind("True" = true.par, "Start" = init.par,
      "NM" = nm.par, "SANN" = sann.par, "PEM" = pem.par)

## End(Not run)

# Get estimated cluster
Khat <- get.IDR(x = uhat, par = nm.par)$Khat
plot(uhat, col = Khat, main = "Clustering\nIDR < 0.05")