library(geex) library(knitr) opts_knit$set(progress = TRUE, verbose = TRUE)
In the basic set-up, M-estimation applies to estimators of the $p \times 1$ parameter $\theta=(\theta_1, \theta_2, \dots, \theta_p)^{\intercal}$ which can be obtained as solutions to an equation of the form
\begin{equation} \label{eq:psi} \sum_{i = 1}^m \psi(O_i, \theta) = 0, \end{equation}
\noindent where $O_1, \ldots, O_m$ are $m$ independent and identically distributed (iid) random variables, and the function $\psi$ returns a vector of length $p$ and does not depend on $i$ or $m$ [@stefanski2002, hereafter SB]. See SB for the case where the $O_i$ are independent but not necessarily identically distributed. The roots of (1) are referred to as M-estimators and denoted by $\hat \theta$. M-estimators can be solved for analytically in some cases or computed numerically in general. Under certain regularity conditions, the asymptotic properties of $\hat{\theta}$ can be derived from Taylor series approximations, the law of large numbers, and the central limit theorem. In brief, let $\theta_0$ be the true parameter value defined by $\int \psi(o, \theta_0) dF(o) = 0$, where $F$ is the distribution function of $O$. Let $\dot{\psi}(o, \theta) = {\partial \psi(O_i, \theta)/\partial \theta}^{\intercal}$, $A(\theta_0) = E[-\dot{\psi}(O_1, \theta_0)]$, and $B(\theta_0) = E[\psi(O_1, \theta_0)\psi(O_1, \theta_0)^{\intercal}]$. Then under suitable regularity assumptions, $\hat{\theta}$ is consistent and asymptotically Normal, i.e.,
\begin{equation} \label{eq:variance} \sqrt{m}(\hat{\theta} - \theta_0) \overset{d}{\to} N(0, V(\theta_0)) \text{ as } m \to \infty, \end{equation}
\noindent where $V(\theta_0) = A(\theta_0)^{-1} B(\theta_0) {A(\theta_0)^{-1} }^{\intercal}$. The sandwich form of $V(\theta_0)$ suggests several possible large sample variance estimators. For some problems, the analytic form of $V(\theta_0)$ can be derived and estimators of $\theta_0$ and other unknowns simply plugged into $V(\theta_0)$. Alternatively, $V(\theta_0)$ can be consistently estimated by the empirical sandwich variance estimator, where the expectations in $A(\theta)$ and $B(\theta)$ are replaced with their empirical counterparts. Let $A_i = - \dot{\psi}(O_i, \theta)|{\theta = \hat{\theta}}$, $A_m = m^{-1} \sum{i = 1}^m A_i$, $B_i = \psi(O_i, \hat{\theta}) \psi(O_i, \hat{\theta})^{\intercal}$, and $B_m = m^{-1} \sum_{i = 1}^m B_i$. The empirical sandwich estimator of the variance of $\hat{\theta}$ is:
\begin{equation} \label{eq:esve} \hat{\Sigma} = A_m^{-1} B_m {A_m^{-1}}^{\intercal}/m . \end{equation}
The geex
package provides an application programming interface (API) for carrying out M-estimation. The analyst provides a function, called estFUN
, corresponding to $\psi(O_i, \theta)$ that maps data $O_i$ to a function of $\theta$. Numerical derivatives approximate $\dot{\psi}$ so that evaluating $\hat{\Sigma}$ is entirely a computational exercise. No analytic derivations are required from the analyst.
Consider estimating the population mean $\theta = E[Y_i]$ using the sample mean $\hat{\theta} = m^{-1} \sum_{i=1}^m Y_i$ of $m$ iid random variables $Y_1, \dots, Y_m$. The estimator $\hat{\theta}$ can be expressed as the solution to the following estimating equation:
[ \sum_{i = 1}^m (Y_i - \theta) = 0. ]
\noindent which is equivalent to solving \eqref{eq:psi} where $O_i = Y_i$ and $\psi(O_i, \theta) = Y_i - \theta$. An estFUN
is a translation of $\psi$ into a function whose first argument is data
and returns a function whose first argument is theta
. An estFUN
corresponding to the estimating equation for the sample mean of $Y$ is:
meanFUN <- function(data){ function(theta){ data$Y - theta } } .
If an estimator fits into the above framework, then the user need only specify estFUN
. No other programming is required to obtain point and variance estimates. The remaining sections provide examples of translating $\psi$ into an estFUN
.
The three examples of M-estimation from SB are presented here for demonstration. For these examples, the data are $O_i = {Y_{1i}, Y_{2i}}$ where $Y_1 \sim N(5, 16)$ and $Y_2 \sim N(2, 1)$ for $m = 100$ and are included in the geexex
dataset.
The first example estimates the population mean ($\theta_1$) and variance ($\theta_2$) of $Y_1$. The solution to the estimating equations below are the sample mean $\hat{\theta}1 = m^{-1} \sum{i = 1}^m Y_{1i}$ and sample variance $\hat{\theta}2 = m^{-1} \sum{i = 1}^m (Y_{1i} - \hat{\theta}_1)^2$.
[ \psi(Y_{1i}, \theta) = \begin{pmatrix} Y_{1i} - \theta_1 \ (Y_{1i} - \theta_1)^2 - \theta_2 \end{pmatrix} ]
SB1_estfun <- function(data){ Y1 <- data$Y1 function(theta){ c(Y1 - theta[1], (Y1 - theta[1])^2 - theta[2]) } }
The primary geex
function is m_estimate
, which requires two inputs: estFUN
(the $\psi$ function), data
(the data frame containing $O_i$ for $i = 1, \dots, m$). The package defaults to rootSolve::multiroot
or estimating roots of \eqref{eq:psi}, though the solver algorithm can be specified in the root_control
argument. Starting values for rootSolve::multiroot
are passed via the root_control
argument; e.g.,
library(geex) results <- m_estimate( estFUN = SB1_estfun, data = geexex, root_control = setup_root_control(start = c(1,1)))
n <- nrow(geexex) A <- diag(1, nrow = 2) B <- with(geexex, { Ybar <- mean(Y1) B11 <- mean( (Y1 - Ybar)^2 ) B12 <- mean( (Y1 - Ybar) * ((Y1 - Ybar)^2 - B11) ) B22 <- mean( ((Y1 - Ybar)^2 - B11)^2 ) matrix( c(B11, B12, B12, B22), nrow = 2 ) }) # closed form roots theta_cls <- c(mean(geexex$Y1), # since var() divides by n - 1, not n var(geexex$Y1) * (n - 1)/ n ) # closed form sigma Sigma_cls <- (solve(A) %*% B %*% t(solve(A))) / n comparison <- list(geex = list(estimates = coef(results), vcov = vcov(results)), cls = list(estimates = theta_cls, vcov = Sigma_cls))
The m_estimate
function returns an object of the S4
class geex
, which contains an estimates
slot and vcov
slot for $\hat{\theta}$ and $\hat{\Sigma}$, respectively. These slots can be accessed by the functions coef
(or roots
) and vcov
. The point estimates obtained for $\theta_1$ and $\theta_2$ are analogous to the base R functions mean
and var
(after multiplying by $m-1/m$ for the latter). SB gave a closed form for $A(\theta_0)$ (an identity matrix) and $B(\theta_0)$ (not shown here) and suggest plugging in sample moments to compute $B(\hat{\theta})$. The point and variance estimates using both geex
and the analytic solutions are shown below}. The maximum absolute difference between either the point or variance estimates is r format(max(abs(comparison$geex$estimates - comparison$cls$estimates), abs(comparison$geex$vcov - comparison$cls$vcov)), digits = 1)
, thus demonstrating excellent agreement between the numerical results obtained from geex
and the closed form solutions for this set of estimating equations and data.
comparison
This example calculates a ratio estimator and illustrates the delta method via M-estimation. The estimating equations target the means of $Y_1$ and $Y_2$, labelled $\theta_1$ and $\theta_2$, as well as the estimand $\theta_3 = \theta_1/ \theta_2$.
[ \psi(Y_{1i}, Y_{2i}, \theta) = \begin{pmatrix} Y_{1i} - \theta_1 \ Y_{2i} - \theta_2 \ \theta_1 - \theta_3\theta_2 \end{pmatrix} ]
\noindent The solution to \eqref{eq:psi} for this $\psi$ function yields $\hat{\theta}3 = \bar{Y}_1 / \bar{Y}_2$, where $\bar{Y}_j$ denotes the sample mean of $Y{j1}, \ldots,Y_{jm}$ for $j=1,2$.
SB2_estfun <- function(data){ Y1 <- data$Y1; Y2 <- data$Y2 function(theta){ c(Y1 - theta[1], Y2 - theta[2], theta[1] - (theta[3] * theta[2]) ) } }
results <- m_estimate( estFUN = SB2_estfun, data = geexex, root_control = setup_root_control(start = c(1, 1, 1)))
# Comparison to an analytically derived sanwich estimator A <- with(geexex, { matrix( c(1 , 0, 0, 0 , 1, 0, -1, mean(Y1)/mean(Y2), mean(Y2)), byrow = TRUE, nrow = 3) }) B <- with(geexex, { matrix( c(var(Y1) , cov(Y1, Y2), 0, cov(Y1, Y2), var(Y2) , 0, 0, 0, 0), byrow = TRUE, nrow = 3) }) ## closed form roots theta_cls <- c(mean(geexex$Y1), mean(geexex$Y2)) theta_cls[3] <- theta_cls[1]/theta_cls[2] ## closed form covariance Sigma_cls <- (solve(A) %*% B %*% t(solve(A))) / nrow(geexex) comparison <- list(geex = list(estimates = coef(results), vcov = vcov(results)), cls = list(estimates = theta_cls, vcov = Sigma_cls))
SB gave closed form expressions for $A(\theta_0)$ and $B(\theta_0)$, into which we plug in appropriate estimates for the matrix components and compare to the results from geex
. The point estimates again show excellent agreement (maximum absolute difference r format(max(abs(comparison$geex$estimates - comparison$cls$estimates)), digits = 2)
), while the covariance estimates differ by the third decimal (maximum absolute difference r format(max(abs(comparison$geex$vcov - comparison$cls$vcov)), scientific = TRUE, digits = 2)
).
comparison
This example extends Example 1 to again illustrate the delta method. The estimating equations target not only the mean ($\theta_1$) and variance ($\theta_2$) of $Y_1$, but also the standard deviation ($\theta_3$) and the log of the variance ($\theta_4$) of $Y_1$.
[ \psi(Y_{1i}, \mathbf{\theta}) = \begin{pmatrix} Y_{1i} - \theta_1 \ (Y_{1i} - \theta_1)^2 - \theta_2 \ \sqrt{\theta_2} - \theta_3 \ \log(\theta_2) - \theta_4 \end{pmatrix} ]
SB3_estfun <- function(data){ Y1 <- data$Y1 function(theta){ c(Y1 - theta[1], (Y1 - theta[1])^2 - theta[2], sqrt(theta[2]) - theta[3], log(theta[2]) - theta[4]) } }
results <- m_estimate( estFUN= SB3_estfun, data = geexex, root_control = setup_root_control(start = rep(2, 4, 4, 4)))
## closed form roots theta_cls <- numeric(4) theta_cls[1] <- mean(geexex$Y1) theta_cls[2] <- sum((geexex$Y1 - theta_cls[1])^2)/nrow(geexex) theta_cls[3] <- sqrt(theta_cls[2]) theta_cls[4] <- log(theta_cls[2]) ## Compare to closed form ## theta2 <- theta_cls[2] mu3 <- moments::moment(geexex$Y1, order = 3, central = TRUE) mu4 <- moments::moment(geexex$Y1, order = 4, central = TRUE) ## closed form covariance Sigma_cls <- matrix( c(theta2, mu3, mu3/(2*sqrt(theta2)), mu3/theta2, mu3, mu4 - theta2^2, (mu4 - theta2^2)/(2*sqrt(theta2)), (mu4 - theta2^2)/theta2, mu3/(2 * sqrt(theta2)), (mu4 - theta2^2)/(2*sqrt(theta2)), (mu4 - theta2^2)/(4*theta2), (mu4 - theta2^2)/(2*theta2^(3/2)), mu3/theta2, (mu4 - theta2^2)/theta2, (mu4 - theta2^2)/(2*theta2^(3/2)), (mu4/theta2^2) - 1) , nrow = 4, byrow = TRUE) / nrow(geexex) ## closed form covariance # Sigma_cls <- (solve(A) %*% B %*% t(solve(A))) / n comparison <- list(geex = list(estimates = coef(results), vcov = vcov(results)), cls = list(estimates = theta_cls, vcov = Sigma_cls))
SB again provided a closed form for $A(\theta_0)$ and $B(\theta_0)$, which we compare to the geex
results. The maximum absolute difference between geex
and the closed form estimates for both the parameters and the covariance is r format(max(abs(comparison$geex$estimates - comparison$cls$estimates), abs(comparison$geex$vcov - comparison$cls$vcov)), digits = 2)
.
comparison
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.