hyperEM: Estimate hyperparameters using an EM algorithm


View source: R/f_EM_hyperEstimation.R


Description

hyperEM finds hyperparameter estimates using a variation on the Expectation-Maximization (EM) algorithm known as the Expectation/Conditional Maximization (ECM) algorithm (Meng and Rubin, 1993). The algorithm estimates each element of the hyperparameter vector, θ, while holding fixed (conditioning on) the other parameters in the vector. Alternatively, it can estimate both parameters for each distribution in the mixture while holding the parameters of the other distribution and the mixing fraction fixed.
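As a toy illustration of the conditional-maximization idea (this is a sketch, not openEBGM's internal code), the snippet below maximizes a two-parameter objective by repeatedly maximizing over one parameter while the other is held fixed:

```r
# Toy ECM-style loop: maximize f(a, b) one coordinate at a time.
# The objective and starting values are made up for illustration.
f <- function(a, b) -((a - 2)^2 + (b - 3)^2 + 0.1 * a * b)

a <- 0; b <- 0
for (iter in 1:50) {
  # CM-step 1: maximize over a with b held fixed
  a <- optimize(function(x) f(x, b), interval = c(-10, 10), maximum = TRUE)$maximum
  # CM-step 2: maximize over b with a held fixed
  b <- optimize(function(x) f(a, x), interval = c(-10, 10), maximum = TRUE)$maximum
}
c(a = a, b = b)  # approaches the joint maximizer
```

Each conditional maximization cannot decrease the objective, which is what makes the cycle converge toward a (local) maximum.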


Usage

hyperEM(data, theta_init_vec, squashed = TRUE, zeroes = FALSE,
  N_star = 1, method = c("score", "nlminb"), profile = c("parameter",
  "distribution"), LL_tol = 1e-04, consecutive = 100,
  param_lower = 1e-05, param_upper = 20, print_level = 2,
  max_iter = 5000, conf_ints = FALSE, conf_level = c("95", "80",
  "90", "99"), track = FALSE)



Arguments

data: A data frame from processRaw or squashData containing columns named N, E, and (if squashed) weight.

theta_init_vec: A numeric vector of initial hyperparameter guesses ordered as: α_1, β_1, α_2, β_2, P.

squashed: A scalar logical (TRUE or FALSE) indicating whether or not data squashing was used.

zeroes: A scalar logical specifying if zero counts are included.

N_star: A positive scalar whole number for the minimum count size used for hyperparameter estimation. If zeroes are used, set N_star to NULL.

method: A scalar string indicating which method (i.e., score functions or log-likelihood function) to use for the maximization steps. Possible values are "score" and "nlminb".

profile: A scalar string indicating which method to use to optimize the log-likelihood function if method = "nlminb" (ignored if method = "score"). profile = "parameter" optimizes one parameter (α or β) from the log-likelihood function at a time. profile = "distribution" optimizes one distribution from the mixture at a time (α and β simultaneously).

LL_tol: A scalar numeric value for the tolerance used to determine when the change in log-likelihood at each iteration is sufficiently small. Used for convergence assessment.

consecutive: A positive scalar whole number for the number of consecutive iterations during which the change in log-likelihood must be below LL_tol in order to reach convergence. Larger values reduce the chance of getting stuck in a flat region of the curve.

param_lower: A scalar numeric value for the smallest acceptable value for each α and β estimate.

param_upper: A scalar numeric value for the largest acceptable value for each α and β estimate.

print_level: A value that determines how much information is printed during execution. Possible values are 0 for no printing, 1 for minimal information, and 2 for maximal information.

max_iter: A positive scalar whole number for the maximum number of iterations to use.

conf_ints: A scalar logical indicating if confidence intervals and standard errors should be returned.

conf_level: A scalar string for the confidence level used if confidence intervals are requested.

track: A scalar logical indicating whether or not to retain the hyperparameter estimates and log-likelihood value at each iteration. Can be used for plotting to better understand the algorithm's behavior.
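The interplay between LL_tol and consecutive can be sketched with a hypothetical helper (not part of openEBGM): convergence is declared only once the iteration-to-iteration change in log-likelihood has stayed below LL_tol for `consecutive` iterations in a row.

```r
# Return the iteration at which the convergence rule is first met,
# or NA if it never is. `ll_trace` is a vector of log-likelihoods.
converged_after <- function(ll_trace, LL_tol = 1e-4, consecutive = 100) {
  run <- 0
  for (i in 2:length(ll_trace)) {
    run <- if (abs(ll_trace[i] - ll_trace[i - 1]) < LL_tol) run + 1 else 0
    if (run >= consecutive) return(i)
  }
  NA_integer_
}
```

A single small step is not enough: the change must remain small for `consecutive` iterations, which guards against stopping on a flat stretch of the likelihood surface.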


Details

If method = "score", the maximization step finds a root of the score function. If method = "nlminb", nlminb is used to find a minimum of the negative log-likelihood function.

If method = "score" and zeroes = FALSE, then 'N_star' must equal 1.

If method = "score", the model is reparameterized. The parameters are transformed to force the parameter space to include all real numbers. This approach addresses numerical issues at the edge of the parameter space. The reparameterization follows: α_{prime} = log(α), β_{prime} = log(β), and P_{prime} = tan(pi * P - pi / 2). However, the values returned in estimates are on the original scale (back-transformed).
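A minimal sketch of that reparameterization and its inverse (the helper names here are hypothetical):

```r
# Map theta = (alpha1, beta1, alpha2, beta2, P) to the real line:
# log() for the alphas and betas, a tangent transform for P in (0, 1).
to_unconstrained <- function(theta) {
  c(log(theta[1:4]), tan(pi * theta[5] - pi / 2))
}
# Back-transform to the original (constrained) scale.
to_original <- function(theta_prime) {
  c(exp(theta_prime[1:4]), (atan(theta_prime[5]) + pi / 2) / pi)
}

theta <- c(0.2, 0.1, 2, 4, 0.33)
to_original(to_unconstrained(theta))  # recovers theta
```

Because the transformed space is all of R^5, the optimizer never runs into the boundary constraints α, β > 0 and 0 < P < 1.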

On every 100th iteration, the procedure described in Millar (2011) is used to accelerate the estimate of θ.

The score vector and its Euclidean norm should be close to zero at a local maximum and can therefore be used to help assess the reliability of the results. A local maximum might not be the global MLE, however.
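For intuition, here is a generic version of that diagnostic on a toy normal log-likelihood (not the mixture model hyperEM actually fits): a finite-difference score is near zero at the maximum and large away from it.

```r
# Toy data and log-likelihood for a normal mean with known sd = 1.
x <- c(4.1, 5.3, 4.8, 5.6, 5.2)
ll <- function(mu) sum(dnorm(x, mean = mu, sd = 1, log = TRUE))

# Central finite-difference score and its Euclidean norm (scalar case).
score   <- function(mu, h = 1e-6) (ll(mu + h) - ll(mu - h)) / (2 * h)
norm_at <- function(mu) sqrt(score(mu)^2)

norm_at(mean(x))  # near zero: mean(x) is the MLE here
norm_at(3)        # large: far from the maximum
```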

Asymptotic normal confidence intervals, if requested, use standard errors calculated from the observed Fisher information matrix as discussed in DuMouchel (1999).
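The general recipe, shown on a toy normal model rather than openEBGM's code, is: take the Hessian of the negative log-likelihood at the MLE as the observed information, invert it for the variance matrix, and form Wald intervals.

```r
# Fit a normal model by maximum likelihood; sd is estimated on the log
# scale (exp transform) to keep it positive, echoing the reparameterization.
set.seed(1)
x <- rnorm(200, mean = 5, sd = 2)
negLL <- function(par) -sum(dnorm(x, mean = par[1], sd = exp(par[2]), log = TRUE))

fit <- optim(c(0, 0), negLL, hessian = TRUE)
se  <- sqrt(diag(solve(fit$hessian)))                  # SEs from observed information
ci  <- cbind(fit$par - 1.96 * se, fit$par + 1.96 * se) # 95% Wald intervals
```

The intervals here are on the working (transformed) scale; back-transforming the endpoints gives intervals on the original scale.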


Value

A list including the following:


References

DuMouchel W (1999). "Bayesian Data Mining in Large Frequency Tables, With an Application to the FDA Spontaneous Reporting System." The American Statistician, 53(3), 177-190.

Meng X-L, Rubin D (1993). "Maximum likelihood estimation via the ECM algorithm: A general framework." Biometrika, 80(2), 267-278.

Millar RB (2011). Maximum Likelihood Estimation and Inference. John Wiley & Sons, Ltd, 111-112.

See Also

uniroot for finding a zero of the score function and nlminb for minimizing the negative log-likelihood function

Other hyperparameter estimation functions: autoHyper, exploreHypers


Examples

library(openEBGM)
data(caers)
proc <- processRaw(caers)
squashed <- squashData(proc, bin_size = 100, keep_pts = 0)
squashed <- squashData(squashed, count = 2, bin_size = 12, keep_pts = 24)
hyperEM(squashed, theta_init_vec = c(1, 1, 2, 2, .3), consecutive = 10)

openEBGM documentation built on Aug. 17, 2018, 1:05 a.m.