In parsnip: A Common API to Modeling and Analysis Functions

#| child: aaa.Rmd
#| include: false

r descr_models("poisson_reg", "gee")

Tuning Parameters

This model has no formal tuning parameters. It may be beneficial to determine the appropriate correlation structure to use, but this typically does not affect the predicted value of the model. It does have an effect on the inferential results and parameter covariance values.

Translation from parsnip to the original package

r uses_extension("poisson_reg", "gee", "regression")

#| label: gee-csl
library(multilevelmod)

poisson_reg(engine = "gee") |> 
  set_engine("gee") |> 
  translate()

multilevelmod::gee_fit() is a wrapper model around gee().

Preprocessing requirements

There are no specific preprocessing needs. However, it is helpful to keep the clustering/subject identifier column as factor or character (instead of making them into dummy variables). See the examples in the next section.

Case weights

#| child: template-no-case-weights.Rmd

Other details

Both gee:gee() and gee:geepack() specify the id/cluster variable using an argument id that requires a vector. parsnip doesn't work that way so we enable this model to be fit using a artificial function id_var() to be used in the formula. So, in the original package, the call would look like:

gee(breaks ~ tension, id = wool, data = warpbreaks, corstr = "exchangeable")

With parsnip, we suggest using the formula method when fitting:

library(tidymodels)

poisson_reg() |> 
  set_engine("gee", corstr = "exchangeable") |> 
  fit(y ~ time + x + id_var(subject), data = longitudinal_counts)

When using tidymodels infrastructure, it may be better to use a workflow. In this case, you can add the appropriate columns using add_variables() then supply the GEE formula when adding the model:

library(tidymodels)

gee_spec <- 
  poisson_reg() |> 
  set_engine("gee", corstr = "exchangeable")

gee_wflow <- 
  workflow() |> 
  # The data are included as-is using:
  add_variables(outcomes = y, predictors = c(time, x, subject)) |> 
  add_model(gee_spec, formula = y ~ time + x + id_var(subject))

fit(gee_wflow, data = longitudinal_counts)

#| child: template-gee-silent.Rmd

Also, because of issues with the gee() function, a supplementary call to glm() is needed to get the rank and QR decomposition objects so that predict() can be used.

References

Liang, K.Y. and Zeger, S.L. (1986) Longitudinal data analysis using generalized linear models. Biometrika, 73 13–22.
Zeger, S.L. and Liang, K.Y. (1986) Longitudinal data analysis for discrete and continuous outcomes. Biometrics, 42 121–130.