In mkvasnicka/stargate: Simple Formatted Regression Tables

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

# library(stargate)

This vignette discusses why and how create a new sg_model() method.

Why new methods for `sg_model()`

Let's see it in the case of fixest package. The package allows us to do several things at once: to estimate fixed effects, to compute heteroskedastic clustered standard errors, etc. Let's see a simple example modified from its vignette. It estimate a Poisson model with four fixed effects, heteroskedastic standard errors clustered on Origin:

library(fixest)
data(trade)
gravity_pois <- fepois(Euros ~ log(dist_km) | Origin + Destination + Product + Year,
                       se = "hetero",
                       trade)

fixest origina function presents all these pieces of information in a nice table:

etable(gravity_pois)

However, these pieces are partially lost with the summary() function:

summary(gravity_pois)

They are completely lost with broom's tidy() function:

library(broom)
tidy(gravity_pois)

As the default method of sg_model() uses broom's tidy() and glance() functions, the information is lost too. This way, we have only three options:

We stick with fixest's etable(). This way we have to use a special presentation function for each estimation package we use---and we cannot mix estimates from different packages in one table. Moreover, most packages simply do not provide such a nice function as fixest does.
We use the default method of sg_model(). This way we can use the same workflow for every estimation package we use (provided that broom has methods for it) and we can combine different models in one table. However, we loose important pieces of information.
We write a new method for sg_model() which keeps all the information.

Moreover, there are two other reasons to write new sg_model() methods:

broom does not implement tidy() and glance for each estimation function in the wild and these packages provide at most a specific method of summary() function. In this case, the new method of sg_model() may be our only chance.
Even when there is a broom method, it does not cover well various ways to compute standard errors.

The structure of sg_model class

A sg_model class is a list with the class attribute set to c("sg_model", "stargate"). Presently, it has three slots---each of them is a tibble

coefs ... the estimated coefficients; in the default method, the result of broom::tidy()
stats ... the model statistics---one statistic per row, each enveloped in a list (i.e.\ the value column is a list); in the default mehtods, the result of broom::glance() enveloped in list and pivotted longer
name ... the model's name ("" if not set)

The look at the default method shows more precisely how it looks like:

sg_model.default <- function(m, name = "", ...) {
    if (inherits(m, "stargate"))
        return(m)
    coefs <- broom::tidy(m)
    stats <- broom::glance(m) |>
        dplyr::mutate(dplyr::across(dplyr::everything(), list)) |>
        tidyr::pivot_longer(dplyr::everything(),
                            names_to = "stat", values_to = "val")
    name <- tibble::tibble(name = name)
    structure(list(coefs = coefs, stats = stats, name = name),
              class = c("sg_model", "stargate"))
}

The present structure is not sufficient to cover the results of fixest's estimation functions. It lacks at least two things:

the ability to include "word" parameters, e.g.\ "Yes" for the fixed effects and
the ability to keep the information on the kind of standard errors the model estimates, e.g.\ "heteroskedasticity-robust"

There are many ways how to handle it:

"word parameters":
- add them to coefs table to the estimate column---it would have to be a list column
- create a new table
- ???
standard error information:
- add them to stats table
- add a new table
- ???

It is necessary to design how to include these things into the data structure. Moreover, it is necessary to find out which other items are presently missing to make the structure rich enough to be able to encompass most of the estimated models without a loss of information.

TODO