```r
library(geex)
library(knitr)
opts_knit$set(progress = TRUE, verbose = TRUE)
```
In the basic set-up, M-estimation applies to estimators of the $p \times 1$ parameter $\theta=(\theta_1, \theta_2, \dots, \theta_p)^{\intercal}$ which can be obtained as solutions to an equation of the form
\begin{equation} \label{eq:psi} \sum_{i = 1}^m \psi(O_i, \theta) = 0, \end{equation}
\noindent where $O_1, \ldots, O_m$ are $m$ independent and identically distributed (iid) random variables, and the function $\psi$ returns a vector of length $p$ and does not depend on $i$ or $m$ [@stefanski2002, hereafter SB]. See SB for the case where the $O_i$ are independent but not necessarily identically distributed. The roots of \eqref{eq:psi} are referred to as M-estimators and denoted by $\hat{\theta}$. M-estimators can be solved for analytically in some cases or computed numerically in general. Under certain regularity conditions, the asymptotic properties of $\hat{\theta}$ can be derived from Taylor series approximations, the law of large numbers, and the central limit theorem. In brief, let $\theta_0$ be the true parameter value defined by $\int \psi(o, \theta_0) dF(o) = 0$, where $F$ is the distribution function of $O$. Let $\dot{\psi}(o, \theta) = \partial \psi(o, \theta)/\partial \theta^{\intercal}$, $A(\theta_0) = E[-\dot{\psi}(O_1, \theta_0)]$, and $B(\theta_0) = E[\psi(O_1, \theta_0)\psi(O_1, \theta_0)^{\intercal}]$. Then under suitable regularity assumptions, $\hat{\theta}$ is consistent and asymptotically Normal, i.e.,
\begin{equation} \label{eq:variance} \sqrt{m}(\hat{\theta} - \theta_0) \overset{d}{\to} N(0, V(\theta_0)) \text{ as } m \to \infty, \end{equation}
\noindent where $V(\theta_0) = A(\theta_0)^{-1} B(\theta_0) {A(\theta_0)^{-1}}^{\intercal}$. The sandwich form of $V(\theta_0)$ suggests several possible large sample variance estimators. For some problems, the analytic form of $V(\theta_0)$ can be derived and estimators of $\theta_0$ and other unknowns simply plugged into $V(\theta_0)$. Alternatively, $V(\theta_0)$ can be consistently estimated by the empirical sandwich variance estimator, where the expectations in $A(\theta)$ and $B(\theta)$ are replaced with their empirical counterparts. Let $A_i = - \dot{\psi}(O_i, \theta)\big|_{\theta = \hat{\theta}}$, $A_m = m^{-1} \sum_{i = 1}^m A_i$, $B_i = \psi(O_i, \hat{\theta}) \psi(O_i, \hat{\theta})^{\intercal}$, and $B_m = m^{-1} \sum_{i = 1}^m B_i$. The empirical sandwich estimator of the variance of $\hat{\theta}$ is:
\begin{equation} \label{eq:esve} \hat{\Sigma} = A_m^{-1} B_m {A_m^{-1}}^{\intercal}/m . \end{equation}
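To make \eqref{eq:esve} concrete before introducing the package, take the simplest case $\psi(Y_i, \theta) = Y_i - \theta$ (the sample mean, revisited below). Then $A_i = 1$ and $B_i = (Y_i - \hat{\theta})^2$, and the sandwich collapses to the familiar variance of a sample mean. A base-R sketch on simulated stand-in data (not part of the `geexex` examples):

```r
set.seed(1)
y <- rnorm(50, mean = 5, sd = 4)   # simulated stand-in data
m <- length(y)

theta_hat <- mean(y)               # the root of sum(y - theta) = 0
A_m <- 1                           # -d/dtheta (y_i - theta) = 1 for every i
B_m <- mean((y - theta_hat)^2)     # average outer product of the psi values
Sigma_hat <- A_m^-1 * B_m * A_m^-1 / m

# agrees with the usual (maximum likelihood) variance of a sample mean
all.equal(Sigma_hat, var(y) * (m - 1) / m / m)  # TRUE
```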
The `geex` package provides an application programming interface (API) for carrying out M-estimation. The analyst provides a function, called `estFUN`, corresponding to $\psi(O_i, \theta)$ that maps data $O_i$ to a function of $\theta$. Numerical derivatives approximate $\dot{\psi}$, so that evaluating $\hat{\Sigma}$ is entirely a computational exercise; no analytic derivations are required from the analyst.
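The idea behind the numerical derivative step can be sketched with a hand-rolled central-difference Jacobian; this is for illustration only (`geex` itself delegates differentiation to the numDeriv package):

```r
# Central-difference Jacobian of a vector-valued f at x -- illustration only
num_jacobian <- function(f, x, eps = 1e-6) {
  J <- matrix(0, nrow = length(f(x)), ncol = length(x))
  for (j in seq_along(x)) {
    h <- numeric(length(x)); h[j] <- eps
    J[, j] <- (f(x + h) - f(x - h)) / (2 * eps)
  }
  J
}

# psi summed over three toy observations, targeting a mean and a variance
y <- c(4.2, 5.1, 6.3)
psi_sum <- function(theta) c(sum(y - theta[1]),
                             sum((y - theta[1])^2 - theta[2]))

num_jacobian(psi_sum, c(5, 1))
# analytically: matrix(c(-3, 0, -2 * sum(y - 5), -3), byrow = TRUE, nrow = 2)
```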
Consider estimating the population mean $\theta = E[Y_i]$ using the sample mean $\hat{\theta} = m^{-1} \sum_{i=1}^m Y_i$ of $m$ iid random variables $Y_1, \dots, Y_m$. The estimator $\hat{\theta}$ can be expressed as the solution to the following estimating equation:
\[ \sum_{i = 1}^m (Y_i - \theta) = 0, \]
\noindent which is equivalent to solving \eqref{eq:psi} with $O_i = Y_i$ and $\psi(O_i, \theta) = Y_i - \theta$. An `estFUN` is a translation of $\psi$ into an R function whose first argument is `data` and which returns a function whose first argument is `theta`. An `estFUN` corresponding to the estimating equation for the sample mean of $Y$ is:
```r
meanFUN <- function(data){
  function(theta){
    data$Y - theta
  }
}
```
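The whole pipeline can be seen in miniature without `geex`: sum the returned contributions over observations and hand the result to a root finder. A base-R check on toy data with an assumed column `Y`:

```r
meanFUN <- function(data){ function(theta){ data$Y - theta } }

dat <- data.frame(Y = c(2, 4, 9))   # toy data with an assumed column Y
f   <- meanFUN(dat)

# the stacked estimating equation sum_i psi(O_i, theta)
ee   <- function(theta) sum(f(theta))
root <- uniroot(ee, interval = range(dat$Y), tol = 1e-9)$root

all.equal(root, mean(dat$Y))  # TRUE: the root is the sample mean (5)
```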
If an estimator fits into the above framework, then the user need only specify `estFUN`. No other programming is required to obtain point and variance estimates. The remaining sections provide examples of translating $\psi$ into an `estFUN`.
The three examples of M-estimation from SB are presented here for demonstration. For these examples, the data are $O_i = \{Y_{1i}, Y_{2i}\}$, where $Y_1 \sim N(5, 16)$ and $Y_2 \sim N(2, 1)$ with $m = 100$; the data are included in the `geexex` dataset.
The first example estimates the population mean ($\theta_1$) and variance ($\theta_2$) of $Y_1$. The solutions to the estimating equations below are the sample mean $\hat{\theta}_1 = m^{-1} \sum_{i = 1}^m Y_{1i}$ and sample variance $\hat{\theta}_2 = m^{-1} \sum_{i = 1}^m (Y_{1i} - \hat{\theta}_1)^2$.
\[ \psi(Y_{1i}, \theta) = \begin{pmatrix} Y_{1i} - \theta_1 \\ (Y_{1i} - \theta_1)^2 - \theta_2 \end{pmatrix} \]
```r
SB1_estfun <- function(data){
  Y1 <- data$Y1
  function(theta){
    c(Y1 - theta[1],
      (Y1 - theta[1])^2 - theta[2])
  }
}
```
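As a quick sanity check in plain R (with simulated data standing in for `geexex`), the sample mean and the $m$-denominator sample variance make the stacked equations sum to zero:

```r
SB1_estfun <- function(data){
  Y1 <- data$Y1
  function(theta){
    c(Y1 - theta[1],
      (Y1 - theta[1])^2 - theta[2])
  }
}

set.seed(2)
dat <- data.frame(Y1 = rnorm(100, mean = 5, sd = 4))  # stand-in for geexex

# applying the estFUN to the full data frame stacks all m contributions:
# the first m values belong to equation 1, the next m to equation 2
theta_hat <- c(mean(dat$Y1), mean((dat$Y1 - mean(dat$Y1))^2))
psi_mat   <- matrix(SB1_estfun(dat)(theta_hat), ncol = 2)

colSums(psi_mat)  # both components are numerically zero
```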
The primary `geex` function is `m_estimate`, which requires two inputs: `estFUN` (the $\psi$ function) and `data` (the data frame containing $O_i$ for $i = 1, \dots, m$). The package defaults to `rootSolve::multiroot` for estimating roots of \eqref{eq:psi}, though the solver algorithm can be specified via the `root_control` argument. Starting values for `rootSolve::multiroot` are also passed via the `root_control` argument; e.g.,
```r
library(geex)
results <- m_estimate(
  estFUN = SB1_estfun,
  data   = geexex,
  root_control = setup_root_control(start = c(1, 1)))
```
```r
n <- nrow(geexex)
A <- diag(1, nrow = 2)
B <- with(geexex, {
  Ybar <- mean(Y1)
  B11  <- mean( (Y1 - Ybar)^2 )
  B12  <- mean( (Y1 - Ybar) * ((Y1 - Ybar)^2 - B11) )
  B22  <- mean( ((Y1 - Ybar)^2 - B11)^2 )
  matrix(c(B11, B12, B12, B22), nrow = 2)
})

# closed form roots
theta_cls <- c(mean(geexex$Y1),
               # var() divides by n - 1, not n
               var(geexex$Y1) * (n - 1) / n)
# closed form sigma
Sigma_cls <- (solve(A) %*% B %*% t(solve(A))) / n

comparison <- list(geex = list(estimates = coef(results), vcov = vcov(results)),
                   cls  = list(estimates = theta_cls, vcov = Sigma_cls))
```
The `m_estimate` function returns an object of the S4 class `geex`, which contains an `estimates` slot and a `vcov` slot for $\hat{\theta}$ and $\hat{\Sigma}$, respectively. These slots can be accessed with the functions `coef` (or `roots`) and `vcov`. The point estimates obtained for $\theta_1$ and $\theta_2$ are analogous to the base R functions `mean` and `var` (after multiplying by $(m-1)/m$ for the latter). SB gave a closed form for $A(\theta_0)$ (an identity matrix) and $B(\theta_0)$ (not shown here) and suggested plugging sample moments into $B(\hat{\theta})$. The point and variance estimates using both `geex` and the analytic solutions are shown below. The maximum absolute difference between either the point or variance estimates is `r format(max(abs(comparison$geex$estimates - comparison$cls$estimates), abs(comparison$geex$vcov - comparison$cls$vcov)), digits = 1)`, demonstrating excellent agreement between the numerical results obtained from `geex` and the closed form solutions for this set of estimating equations and data.
```r
comparison
```
This example calculates a ratio estimator and illustrates the delta method via M-estimation. The estimating equations target the means of $Y_1$ and $Y_2$, labelled $\theta_1$ and $\theta_2$, as well as the estimand $\theta_3 = \theta_1/ \theta_2$.
\[ \psi(Y_{1i}, Y_{2i}, \theta) = \begin{pmatrix} Y_{1i} - \theta_1 \\ Y_{2i} - \theta_2 \\ \theta_1 - \theta_3\theta_2 \end{pmatrix} \]
\noindent The solution to \eqref{eq:psi} for this $\psi$ function yields $\hat{\theta}_3 = \bar{Y}_1 / \bar{Y}_2$, where $\bar{Y}_j$ denotes the sample mean of $Y_{j1}, \ldots, Y_{jm}$ for $j = 1, 2$.
```r
SB2_estfun <- function(data){
  Y1 <- data$Y1; Y2 <- data$Y2
  function(theta){
    c(Y1 - theta[1],
      Y2 - theta[2],
      theta[1] - (theta[3] * theta[2]))
  }
}
```
```r
results <- m_estimate(
  estFUN = SB2_estfun,
  data   = geexex,
  root_control = setup_root_control(start = c(1, 1, 1)))
```
```r
# Comparison to an analytically derived sandwich estimator
A <- with(geexex, {
  matrix(c(1,  0,                 0,
           0,  1,                 0,
          -1,  mean(Y1)/mean(Y2), mean(Y2)),
         byrow = TRUE, nrow = 3)
})
B <- with(geexex, {
  matrix(c(var(Y1),     cov(Y1, Y2), 0,
           cov(Y1, Y2), var(Y2),     0,
           0,           0,           0),
         byrow = TRUE, nrow = 3)
})

## closed form roots
theta_cls <- c(mean(geexex$Y1), mean(geexex$Y2))
theta_cls[3] <- theta_cls[1]/theta_cls[2]
## closed form covariance
Sigma_cls <- (solve(A) %*% B %*% t(solve(A))) / nrow(geexex)

comparison <- list(geex = list(estimates = coef(results), vcov = vcov(results)),
                   cls  = list(estimates = theta_cls, vcov = Sigma_cls))
```
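The $(3,3)$ entry of $A^{-1} B {A^{-1}}^{\intercal}/m$ reproduces the textbook delta-method variance of a ratio, which is the sense in which this example "illustrates the delta method". A quick numeric check in base R (simulated data in place of `geexex`):

```r
set.seed(3)
dat <- data.frame(Y1 = rnorm(100, 5, 4), Y2 = rnorm(100, 2, 1))
m  <- nrow(dat)
t1 <- mean(dat$Y1); t2 <- mean(dat$Y2); t3 <- t1/t2

A <- matrix(c(1,  0,  0,
              0,  1,  0,
             -1, t3, t2), byrow = TRUE, nrow = 3)
B <- matrix(c(var(dat$Y1),         cov(dat$Y1, dat$Y2), 0,
              cov(dat$Y1, dat$Y2), var(dat$Y2),         0,
              0,                   0,                   0),
            byrow = TRUE, nrow = 3)
sandwich_33 <- ((solve(A) %*% B %*% t(solve(A))) / m)[3, 3]

# delta method: gradient of g(t1, t2) = t1/t2 applied to Cov(Y1bar, Y2bar)
g <- c(1/t2, -t1/t2^2)
delta_var <- c(t(g) %*% B[1:2, 1:2] %*% g) / m

all.equal(sandwich_33, delta_var)  # TRUE
```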
SB gave closed form expressions for $A(\theta_0)$ and $B(\theta_0)$, into which we plug appropriate estimates of the matrix components and compare to the results from `geex`. The point estimates again show excellent agreement (maximum absolute difference `r format(max(abs(comparison$geex$estimates - comparison$cls$estimates)), digits = 2)`), while the covariance estimates differ only in the third decimal place (maximum absolute difference `r format(max(abs(comparison$geex$vcov - comparison$cls$vcov)), scientific = TRUE, digits = 2)`).
```r
comparison
```
This example extends Example 1 to again illustrate the delta method. The estimating equations target not only the mean ($\theta_1$) and variance ($\theta_2$) of $Y_1$, but also the standard deviation ($\theta_3$) and the log of the variance ($\theta_4$) of $Y_1$.
\[ \psi(Y_{1i}, \theta) = \begin{pmatrix} Y_{1i} - \theta_1 \\ (Y_{1i} - \theta_1)^2 - \theta_2 \\ \sqrt{\theta_2} - \theta_3 \\ \log(\theta_2) - \theta_4 \end{pmatrix} \]
```r
SB3_estfun <- function(data){
  Y1 <- data$Y1
  function(theta){
    c(Y1 - theta[1],
      (Y1 - theta[1])^2 - theta[2],
      sqrt(theta[2]) - theta[3],
      log(theta[2])  - theta[4])
  }
}
```
```r
results <- m_estimate(
  estFUN = SB3_estfun,
  data   = geexex,
  root_control = setup_root_control(start = rep(2, 4)))
```
```r
## closed form roots
theta_cls <- numeric(4)
theta_cls[1] <- mean(geexex$Y1)
theta_cls[2] <- sum((geexex$Y1 - theta_cls[1])^2)/nrow(geexex)
theta_cls[3] <- sqrt(theta_cls[2])
theta_cls[4] <- log(theta_cls[2])

## central moments used in the closed form covariance
theta2 <- theta_cls[2]
mu3 <- moments::moment(geexex$Y1, order = 3, central = TRUE)
mu4 <- moments::moment(geexex$Y1, order = 4, central = TRUE)

## closed form covariance
Sigma_cls <- matrix(
  c(theta2, mu3, mu3/(2*sqrt(theta2)), mu3/theta2,
    mu3, mu4 - theta2^2, (mu4 - theta2^2)/(2*sqrt(theta2)),
      (mu4 - theta2^2)/theta2,
    mu3/(2*sqrt(theta2)), (mu4 - theta2^2)/(2*sqrt(theta2)),
      (mu4 - theta2^2)/(4*theta2), (mu4 - theta2^2)/(2*theta2^(3/2)),
    mu3/theta2, (mu4 - theta2^2)/theta2,
      (mu4 - theta2^2)/(2*theta2^(3/2)), (mu4/theta2^2) - 1),
  nrow = 4, byrow = TRUE) / nrow(geexex)

comparison <- list(geex = list(estimates = coef(results), vcov = vcov(results)),
                   cls  = list(estimates = theta_cls, vcov = Sigma_cls))
```
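The fourth diagonal entry of the closed form covariance, $(\mu_4/\theta_2^2 - 1)/m$, shows the delta-method mechanics directly: for $\theta_4 = \log\theta_2$, the asymptotic variance is $\mathrm{Var}(\hat{\theta}_2)/\theta_2^2$. A base-R check with central moments computed by hand on simulated data:

```r
set.seed(4)
y <- rnorm(100, mean = 5, sd = 4)  # simulated stand-in data
n <- length(y)

t2  <- mean((y - mean(y))^2)  # theta_2: variance with n in the denominator
mu4 <- mean((y - mean(y))^4)  # fourth central moment

var_t2 <- (mu4 - t2^2) / n    # asymptotic variance of theta_2-hat
var_t4 <- (mu4 / t2^2 - 1) / n  # the (4,4) entry of the closed form covariance

all.equal(var_t4, var_t2 / t2^2)  # TRUE: delta method for the log transform
```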
SB again provided closed forms for $A(\theta_0)$ and $B(\theta_0)$, which we compare to the `geex` results. The maximum absolute difference between `geex` and the closed form estimates for both the parameters and the covariance is `r format(max(abs(comparison$geex$estimates - comparison$cls$estimates), abs(comparison$geex$vcov - comparison$cls$vcov)), digits = 2)`.
```r
comparison
```