The design and structure of geex
In geex: An API for M-Estimation

The details below are for those interested in how geex is organized. It is not necessary for using geex.

The Estimating Function

The design of geex starts with the key to M-estimation, the estimating function:

[ \psi(O_i, \theta) . ]

geex composes $\psi$ with two R functions: the "outer" estFUN and the "inner" psiFUN. In pseudocode, $\psi(O_i, \theta) =$:

estFUN <- function(O_i){
  psiFUN <- function(theta){
     psi(O_i, theta)
  }
  return(psiFUN)
}

The reason for composing the $\psi$ function in this way is that in order to do estimation (finding roots) and inference (computing the empirical sandwich variance estimator), $\psi$ needs to be function of $\theta$. M-estimation theory gives the following instructions:

To estimate $\hat{\theta}$, we need to find roots of $G_m = \sum_i \psi(O_i, \theta) = 0$.
To estimate the empirical sandwich variance estimator, we need two quantities for each unit: $A_i = - (\partial \psi(O_i, \theta)/\partial \theta)|_{\theta = \hat{\theta}}$ and $B_i = \psi(O_i, \hat{\theta})\psi(O_i, \theta)^{\intercal}$.

With $\hat{\theta}$ in hand, the quantity $B_i$ is simple to compute. The computational challenges of M-estimation, then, are finding roots of $G_m$ and calculating the derivative $A_i$. By composing $\psi$ of two functions in geex, one can first do all the manipulations of $O_i$ (data) that are independent of $\theta$. In a sense, estFUN "fixes" the data so that numerical routines only need deal with $\theta$ in psiFUN.

M-estimation basis

Before describing the mechanics of how geex finding roots of $G_m$ and computes derivatives of $\psi$, let's look at the m_estimation_basis S4 object which forms the basis of all computations in geex.

An m_estimation_basis object, at a minimum needs two objects: an estFUN and a data.frame. Let's use a simple estFUN that estimates the mean and variance of Y1 in the geexex dataset.

library(geex)
library(dplyr)

myee <- function(data){
  Y1 <- data$Y1
  function(theta){
    c(Y1 - theta[1],
      (Y1 - theta[1])^2 - theta[2])
  }
}

Now we can create a basis:

mybasis <- new("m_estimation_basis",
               .estFUN = myee,
               .data   = geexex)

And look at what this object contains:

slotNames(mybasis)

Two slots are worth examining. First, .psiFUN_list is a list of functions:

mybasis@.psiFUN_list[1:2]

This object is essentially equivalent to:

m <- nrow(geexex)
lapply(split(geexex, f = 1:m), function(O_i){
  myee(O_i)
})

From this list of functions, we can compute $A_i$, and by summing across the list, form $G_m$. The latter is found in:

mybasis@.GFUN

Finding roots

Now that we have $G_m$ as a function of theta, we can found its roots using a root-finding algorithm such as rootSolve::multiroot:

rootSolve::multiroot(
  f = mybasis@.GFUN, 
  start = c(0, 0))

Within geex this is done with the estimate_GFUN_roots function. To illustrate, I first need to update the .control slot in mybasis with starting values for multiroot.

mycontrol <- new('geex_control', .root = setup_root_control(start = c(1, 1)))
mybasis@.control <- mycontrol
roots <- mybasis %>%
  estimate_GFUN_roots()
roots

Note that is bad form to assign S4 slot with someS4object@aslot <- something, but I do so here because I have not created a generic function for setting the .control slot.

Computing the Empirical Sandwich Variance Estimator

In the last section, we found $\hat{\theta}$, which we now use to compute the $A_i$ and $B_i$ matrices.

geex uses the numDeriv::jacobian function to numerically evaluate derivatives. For example, $A_1 = - (\partial \psi(O_1, \theta)/\partial \theta)|_{\theta = \hat{\theta}}$ for this example is:

-numDeriv::jacobian(func = mybasis@.psiFUN_list[[1]], x = roots$root)

geex performs this operation for each $i = 1, \dots, m$ to yield a list of $A_i$ matrices. Then summing across this list yields $A = \sum_i A_i$. The estimate_sandwich_matrices function computes the list of $A_i$, $B_i$ and $A$ and $B$:

mats <- mybasis %>%
  estimate_sandwich_matrices(.theta = roots$root) 

# Compare to the numDeriv computation above
grab_bread_list(mats)[[1]]

Finally, computing $\hat{\Sigma} = A^{-1} B (A^{-1})^{\intercal}$ is accomplished with the compute_sigma function.

mats %>%
  {compute_sigma(A = grab_bread(.), B = grab_meat(.))}

M-estimation with `m_estimate`

All of the operations described above are wrapped and packaged in the m_estimate function:

m_estimate(
  estFUN = myee,
  data   = geexex,
  root_control = setup_root_control(start = c(0, 0))
)

Any scripts or data that you put into this service are public.

geex documentation built on Aug. 8, 2022, 5:05 p.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

geex
An API for M-Estimation

The design and structure of geex
In geex: An API for M-Estimation

The Estimating Function

M-estimation basis

Finding roots

Computing the Empirical Sandwich Variance Estimator

M-estimation with `m_estimate`

Try the geex package in your browser

R Package Documentation

Browse R Packages

We want your feedback!

geex An API for M-Estimation

The design and structure of geex In geex: An API for M-Estimation

The Estimating Function

M-estimation basis

Finding roots

Computing the Empirical Sandwich Variance Estimator

M-estimation with m_estimate

Try the geex package in your browser

R Package Documentation

Browse R Packages

We want your feedback!

geex
An API for M-Estimation

The design and structure of geex
In geex: An API for M-Estimation

M-estimation with `m_estimate`