EL0: Uni-variate empirical likelihood via direct lambda search

View source: R/elfunctions.R


Description

Empirical likelihood with counts to solve one-dimensional problems efficiently with Brent's root-finding algorithm. Conducts an empirical likelihood ratio test of the hypothesis that the mean of z is mu. The names of the elements in the returned list are consistent with the original R code in Owen (2017).

Usage

EL0(
  z,
  mu = NULL,
  ct = NULL,
  shift = NULL,
  return.weights = FALSE,
  SEL = FALSE,
  weight.tolerance = NULL,
  boundary.tolerance = 1e-09,
  trunc.to = 0,
  chull.fail = c("taylor", "wald", "adjusted", "balanced", "none"),
  uniroot.control = list(),
  verbose = FALSE
)

Arguments

z

Numeric data vector.

mu

Hypothesized mean of z in the moment condition.

ct

Numeric count variable with non-negative values that indicates the multiplicity of observations. Can be fractional. Very small counts below the threshold weight.tolerance are zeroed.

shift

The value to add in the denominator (useful in case there are extra Lagrange multipliers): 1 + lambda'Z + shift.

return.weights

Logical: if TRUE, individual EL weights are computed and returned. Setting this to FALSE gives huge memory savings in large data sets, especially when smoothing is used.

SEL

If FALSE, the boundaries for the lambda search are based on the total sum of counts, as in vanilla empirical likelihood, following formula (2.9) in Owen (2001); otherwise, they follow Cosma et al. (2019, p. 170, the topmost formula).

weight.tolerance

Weight tolerance for counts to improve numerical stability (similar to the one in Art B. Owen's 2017 code, but adapted to the sample size).

boundary.tolerance

Relative tolerance for determining when lambda is not an interior solution because it is too close to the boundary. Corresponds to a fraction of the length of the search interval.

trunc.to

Counts below weight.tolerance will be set to this value. In most cases, setting this to 0 or to weight.tolerance is a viable solution to the zero-denominator problem.

chull.fail

A character string: what to do if the convex hull of z does not contain mu (i.e. the spanning condition does not hold). "taylor" creates a Taylor approximation of the log-ELR function near the ends of the sample. "wald" smoothly transitions from the log-ELR function into -0.5 * the Wald statistic for the weighted mean of z. "adjusted" invokes the method of Chen et al. (2008), where an extra observation is added to ensure that the convex hull contains the mean, and "balanced" calls the method of Emerson and Owen (2009) and Liu and Chen (2010) with two extra points.

uniroot.control

A list of control parameters passed to brentZero.

verbose

Logical: if TRUE, prints warnings.

Details

This function provides the core functionality for univariate empirical likelihood. The technical details are given in Cosma et al. (2019), although the algorithm used in that paper is slower than the one provided by this function.

Since we know that the EL probabilities belong to (0, 1), the interval (bracket) for the \lambda search can be determined in the spirit of formula (2.9) from Owen (2001). Let z^*_i := z_i - \mu be the recentred observations. Then,

p_i = c_i/N \cdot (1 + \lambda z^*_i + s)^{-1}

The probabilities are bounded from above: p_i<1 for all i, therefore,

c_i/N \cdot (1 + \lambda z^*_i + s)^{-1} < 1

c_i/N - 1 - s < \lambda z^*_i

Two cases: either z^*_i<0, or z^*_i>0 (cases with z^*_i=0 are trivially excluded because they do not affect the EL). Then,

(c_i/N - 1 - s)/z^*_i > \lambda,\ \forall i: z^*_i<0

(c_i/N - 1 - s)/z^*_i < \lambda,\ \forall i: z^*_i>0

which defines the search bracket:

\lambda_{\min} := \max_{i: z^*_i>0} (c_i/N - 1 - s)/z^*_i

\lambda_{\max} := \min_{i: z^*_i<0} (c_i/N - 1 - s)/z^*_i

\lambda_{\min} < \lambda < \lambda_{\max}

(This derivation contains s, which is the extra shift that extends the function to allow mixed conditional and unconditional estimation; Owen's textbook formula corresponds to s = 0.)
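As a sanity check, this bracket can be reproduced directly in base R. The sketch below is purely illustrative: the function lambda.bracket and its arguments are hypothetical and not part of the package.

lambda.bracket <- function(z, ct, mu, s = 0) {
  zstar <- z - mu                   # recentred observations z*_i
  N <- sum(ct)                      # total count
  b <- (ct / N - 1 - s) / zstar     # per-observation bounds on lambda
  c(lower = max(b[zstar > 0]), upper = min(b[zstar < 0]))
}
lambda.bracket(z = c(4.9, 5.3, 5.6, 5.8), ct = rep(1, 4), mu = 5.4)
# lower: maximum over i with z*_i > 0; upper: minimum over i with z*_i < 0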

The actual tolerance of the lambda search in brentZero is 2 |\lambda_{\max}| \epsilon_m + \mathrm{tol}/2, where tol can be set in uniroot.control and \epsilon_m is .Machine$double.eps.
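For instance, a tighter tolerance can be requested through the tol element of uniroot.control (the only element documented here); the call below is a sketch with arbitrary data.

EL0(z = c(4.9, 5.3, 5.6, 5.8), mu = 5.4, uniroot.control = list(tol = 1e-12))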

The sum of log-weights is maximised without Taylor expansion, which requires mu to lie inside the convex hull of z. If this condition is violated, consider using the chull.fail argument or switching to Euclidean likelihood via [EuL()].

Value

A list with the following elements:

logelr

Logarithm of the empirical likelihood ratio.

lam

The Lagrange multiplier.

wts

Observation weights/probabilities (of the same length as z).

converged

TRUE if the algorithm converged, FALSE otherwise (usually means that mu is not within the range of z, i.e. the one-dimensional convex hull of z).

iter

The number of iterations used (from brentZero).

bracket

The admissible interval for lambda (that is, yielding weights between 0 and 1).

estim.prec

The approximate estimated precision of lambda (from brentZero).

f.root

The value of the derivative of the objective function w.r.t. lambda at the root (from brentZero). Values > sqrt(.Machine$double.eps) indicate convergence problems.

exitcode

An integer indicating the reason for termination.

message

Character string describing the optimisation termination status.
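A small worked call illustrating how these elements fit together (the data are arbitrary; the chi-squared calibration of -2 * logelr is the standard Wilks-type approximation for a scalar mean):

res <- EL0(z = c(4.9, 5.3, 5.6, 5.8), mu = 5.4, return.weights = TRUE)
res$logelr      # log empirical likelihood ratio (non-positive)
res$lam         # Lagrange multiplier at the root
res$converged   # TRUE if the root search succeeded
sum(res$wts)    # weights/probabilities; should sum to approximately 1 at an interior solution
pchisq(-2 * res$logelr, df = 1, lower.tail = FALSE)  # approximate p-value for H0: mean = 5.4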

References

Chen, J., Variyath, A. M., and Abraham, B. (2008). Adjusted empirical likelihood and its properties. Journal of Computational and Graphical Statistics.

Cosma, A., Kostyrka, A. V., and Tripathi, G. (2019). Inference in conditional moment-restriction models when there is selection due to stratification. In The Econometrics of Complex Survey Data (Advances in Econometrics, vol. 39).

Emerson, S. C., and Owen, A. B. (2009). Calibration of the empirical likelihood method for a vector mean. Electronic Journal of Statistics.

Liu, Y., and Chen, J. (2010). Adjusted empirical likelihood with high-order precision. The Annals of Statistics.

Owen, A. B. (2001). Empirical Likelihood. Chapman & Hall/CRC.

Owen, A. B. (2017). R code for weighted empirical likelihood computation.

See Also

[EL()]

Examples

# Figure 2.4 from Owen (2001) -- with a slightly different data point
earth <- c(
  5.5, 5.61, 4.88, 5.07, 5.26, 5.55, 5.36, 5.29, 5.58, 5.65, 5.57, 5.53, 5.62, 5.29,
  5.44, 5.34, 5.79, 5.1, 5.27, 5.39, 5.42, 5.47, 5.63, 5.34, 5.46, 5.3, 5.75, 5.68, 5.85
)
set.seed(1)
system.time(r1 <- replicate(40, EL(sample(earth, replace = TRUE), mu = 5.517)))
set.seed(1)
system.time(r2 <- replicate(40, EL0(sample(earth, replace = TRUE), mu = 5.517)))
plot(apply(r1, 2, "[[", "logelr"), apply(r1, 2, "[[", "logelr") - apply(r2, 2, "[[", "logelr"),
     bty = "n", xlab = "log(ELR) computed via dampened Newthon method",
     main = "Discrepancy between EL and EL0", ylab = "")
abline(h = 0, lty = 2)

# Handling the convex hull violation differently
EL0(1:9, chull.fail = "none")
EL0(1:9, chull.fail = "taylor")
EL0(1:9, chull.fail = "wald")

# Interpolation to well-defined branches outside the convex hull
mu.seq <- seq(-1, 7, 0.1)
wEL1 <- -2*sapply(mu.seq, function(m) EL0(1:9, mu = m, chull.fail = "none")$logelr)
wEL2 <- -2*sapply(mu.seq, function(m) EL0(1:9, mu = m, chull.fail = "taylor")$logelr)
wEL3 <- -2*sapply(mu.seq, function(m) EL0(1:9, mu = m, chull.fail = "wald")$logelr)
plot(mu.seq, wEL1)
lines(mu.seq, wEL2, col = 2)
lines(mu.seq, wEL3, col = 4)

# Warning: depending on the compiler, the discrepancy between EL and EL0
# can be one million times larger than the machine epsilon despite both of them
# being written in pure R
# The results from Apple clang-1400.0.29.202 and Fortran GCC 12.2.0 are different from
# those obtained under Ubuntu 22.04.4 + GCC 11.4.0-1ubuntu1~22.04,
# Arch Linux 6.6.21 + GCC 14.1.1, and Windows Server 2022 + GCC 13.2.0
out1 <- EL(earth, mu = 5.517)[1:4]
out2 <- EL0(earth, mu = 5.517, return.weights = TRUE)[1:4]
print(c(out1$lam, out2$lam), 16)

# Value of lambda                                 EL                  EL0
# aarch64-apple-darwin20         -1.5631313955??????   -1.5631313957?????
# Windows, Ubuntu, Arch           -1.563131395492627   -1.563131395492627
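
# A hypothetical way (not part of the package examples) to express the EL/EL0
# gap on the current platform in units of the machine epsilon:
abs(out1$lam - out2$lam) / .Machine$double.eps
abs(out1$logelr - out2$logelr) / .Machine$double.eps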
