# hyperEM: Estimate hyperparameters using an EM algorithm

*In openEBGM: EBGM Scores for Mining Large Contingency Tables*

## Description

`hyperEM` finds hyperparameter estimates using a variation of the Expectation-Maximization (EM) algorithm known as the Expectation/Conditional Maximization (ECM) algorithm (Meng and Rubin, 1993). The algorithm estimates each element of the hyperparameter vector, θ, while holding the other parameters in the vector fixed (i.e., conditioning on them). Alternatively, it can estimate both parameters of each distribution in the mixture while holding the parameters of the other distribution and the mixing fraction fixed.
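The conditional-maximization cycle described above can be sketched in a few lines of R. This is an illustrative skeleton only; `cond_max` stands in for the conditional update that `hyperEM` performs internally and is not part of the package's API:

```r
# Sketch of one ECM cycle: update each component of theta in turn,
# conditioning on the current values of the remaining components.
# 'cond_max' is a hypothetical stand-in for the conditional maximizer.
ecm_cycle <- function(theta, cond_max) {
  for (j in seq_along(theta)) {
    theta[j] <- cond_max(theta, j)   # maximize over component j only
  }
  theta
}
```

Each pass updates one component of θ while holding the rest fixed at their current values, which is the defining feature of ECM.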

## Usage

```r
hyperEM(data, theta_init_vec, squashed = TRUE, zeroes = FALSE,
        N_star = 1, method = c("score", "nlminb"),
        profile = c("parameter", "distribution"), LL_tol = 1e-04,
        consecutive = 100, param_lower = 1e-05, param_upper = 20,
        print_level = 2, max_iter = 5000, conf_ints = FALSE,
        conf_level = c("95", "80", "90", "99"), track = FALSE)
```

## Arguments

• `data`: A data frame from `processRaw` or `squashData` containing columns named N, E, and (if squashed) weight.

• `theta_init_vec`: A numeric vector of initial hyperparameter guesses ordered as: α_1, β_1, α_2, β_2, P.

• `squashed`: A scalar logical (`TRUE` or `FALSE`) indicating whether or not data squashing was used.

• `zeroes`: A scalar logical specifying if zero counts are included.

• `N_star`: A positive scalar whole number for the minimum count size to be used for hyperparameter estimation. If zeroes are used, set `N_star` to `NULL`.

• `method`: A scalar string indicating which method (i.e., score functions or log-likelihood function) to use for the maximization steps. Possible values are `"score"` and `"nlminb"`.

• `profile`: A scalar string indicating which method to use to optimize the log-likelihood function if `method = "nlminb"` (ignored if `method = "score"`). `profile = "parameter"` optimizes one parameter (α or β) from the log-likelihood function at a time; `profile = "distribution"` optimizes one distribution from the mixture at a time (α and β simultaneously).

• `LL_tol`: A scalar numeric tolerance for determining when the change in log-likelihood at each iteration is sufficiently small. Used for convergence assessment.

• `consecutive`: A positive scalar whole number for the number of consecutive iterations the change in log-likelihood must remain below `LL_tol` in order to reach convergence. Larger values reduce the chance of getting stuck in a flat region of the curve.

• `param_lower`: A scalar numeric value for the smallest acceptable value of each α and β estimate.

• `param_upper`: A scalar numeric value for the largest acceptable value of each α and β estimate.

• `print_level`: A value that determines how much information is printed during execution. Possible values are `0` for no printing, `1` for minimal information, and `2` for maximal information.

• `max_iter`: A positive scalar whole number for the maximum number of iterations to use.

• `conf_ints`: A scalar logical indicating if confidence intervals and standard errors should be returned.

• `conf_level`: A scalar string for the confidence level used if confidence intervals are requested.

• `track`: A scalar logical indicating whether or not to retain the hyperparameter estimates and log-likelihood value at each iteration. Can be used for plotting to better understand the algorithm's behavior.

## Details

If `method = "score"`, the maximization step finds a root of the score function. If `method = "nlminb"`, `nlminb` is used to find a minimum of the negative log-likelihood function.

If `method = "score"` and `zeroes = FALSE`, then `N_star` must equal `1`.

If `method = "score"`, the model is reparameterized. The parameters are transformed so that the parameter space includes all real numbers, which addresses numerical issues at the edge of the parameter space. The reparameterization is: α′ = log(α), β′ = log(β), and P′ = tan(π · P − π / 2). However, the values returned in `estimates` are on the original scale (back-transformed).
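The transformation and its inverse can be written out directly from the formulas above. This is a minimal sketch; the helper names are illustrative and not part of openEBGM's exported API:

```r
# Map theta = (alpha_1, beta_1, alpha_2, beta_2, P) to the unconstrained
# scale used when method = "score": logs for the shape/rate parameters,
# a shifted tangent for the mixing fraction P in (0, 1).
to_unconstrained <- function(theta) {
  c(log(theta[1:4]),
    tan(pi * theta[5] - pi / 2))
}

# Back-transform to the original scale, as done for 'estimates'.
from_unconstrained <- function(theta_prime) {
  c(exp(theta_prime[1:4]),
    (atan(theta_prime[5]) + pi / 2) / pi)
}
```

The inverse of P′ = tan(π · P − π / 2) is P = (atan(P′) + π / 2) / π, so a round trip recovers the original vector.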

On every 100th iteration, the procedure described in Millar (2011) is used to accelerate the estimate of θ.

The score vector and its Euclidean norm should be close to zero at a local maximum and can therefore be used to help assess the reliability of the results. A local maximum might not be the global MLE, however.
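The norm used in this check is just the Euclidean (L2) norm of the score vector, which can be computed directly; this one-liner is an illustration, not openEBGM code:

```r
# Euclidean norm of a score vector; values near zero suggest the
# algorithm stopped near a stationary point of the log-likelihood.
score_norm <- function(score_vec) sqrt(sum(score_vec^2))
```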

Asymptotic normal confidence intervals, if requested, use standard errors calculated from the observed Fisher information matrix as discussed in DuMouchel (1999).
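Asymptotic normal intervals of this kind have the standard Wald form, estimate ± z · SE. A minimal sketch follows; the function name `wald_ci` is illustrative and not part of openEBGM:

```r
# Wald-type confidence interval from a point estimate and its standard
# error (derived, e.g., from the observed Fisher information matrix).
wald_ci <- function(est, se, conf = 0.95) {
  z <- qnorm(1 - (1 - conf) / 2)   # e.g., ~1.96 for a 95% interval
  c(lower = est - z * se, upper = est + z * se)
}
```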

## Value

A list including the following:

• estimates: The maximum likelihood estimate (MLE) of the hyperparameter, θ.

• conf_int: A data frame including the standard errors and confidence limits for `estimates`. Only included if `conf_ints = TRUE`.

• maximum: The log-likelihood function evaluated at `estimates`.

• method: The method used in the maximization step.

• elapsed: The elapsed function execution time in seconds.

• iters: The number of iterations used.

• score: The score functions (i.e. score vector) evaluated at `estimates`. All elements should be close to zero.

• score_norm: The Euclidean norm of the score vector evaluated at `estimates`. Should be close to zero.

• tracking: The estimates of θ at each iteration and the log-likelihood function evaluated at those estimates. Unless `track = TRUE`, only shows the starting point of the algorithm.

## References

DuMouchel W (1999). "Bayesian Data Mining in Large Frequency Tables, With an Application to the FDA Spontaneous Reporting System." The American Statistician, 53(3), 177-190.

Meng X-L, Rubin D (1993). "Maximum Likelihood Estimation via the ECM Algorithm: A General Framework." Biometrika, 80(2), 267-278.

Millar RB (2011). Maximum Likelihood Estimation and Inference. John Wiley & Sons, Ltd, 111-112.

## See Also

`uniroot` for finding a zero of the score function and `nlminb` for minimizing the negative log-likelihood function.

Other hyperparameter estimation functions: `autoHyper`, `exploreHypers`
## Examples

```r
data(caers)
proc <- processRaw(caers)
squashed <- squashData(proc, bin_size = 100, keep_pts = 0)
squashed <- squashData(squashed, count = 2, bin_size = 12, keep_pts = 24)
hyperEM(squashed, theta_init_vec = c(1, 1, 2, 2, .3), consecutive = 10)
```