knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.path = "man/figures/README-", out.width = "100%" )
brglm2 provides tools for the estimation and inference from generalized linear models using various methods for bias reduction. brglm2 supports all generalized linear models supported in R, and provides methods for multinomial logistic regression (nominal responses), adjacent category models (ordinal responses), and negative binomial regression (for potentially overdispered count responses).
Reduction of estimation bias is achieved by solving either the mean-bias reducing adjusted score equations in Firth (1993) and Kosmidis & Firth (2009) or the median-bias reducing adjusted score equations in Kenne et al (2017), or through the direct subtraction of an estimate of the bias of the maximum likelihood estimator from the maximum likelihood estimates as prescribed in Cordeiro and McCullagh (1991). Kosmidis et al (2020) provides a unifying framework and algorithms for mean and median bias reduction for the estimation of generalized linear models.
In the special case of generalized linear models for binomial and multinomial responses (both ordinal and nominal), the adjusted score equations return estimates with improved frequentist properties, that are also always finite, even in cases where the maximum likelihood estimates are infinite (e.g. complete and quasi-complete separation). See, Kosmidis & Firth (2021) for the proof of the latter result in the case of mean bias reduction for logistic regression (and, for more general binomial-response models where the likelihood is penalized by a power of the Jeffreys' invariant prior).
The core model fitters are implemented by the functions brglm_fit()
(univariate generalized linear models), brmultinom()
(baseline
category logit models for nominal multinomial responses), bracl()
(adjacent category logit models for ordinal multinomial responses),
and brnb()
for negative binomial regression.
Install the current version from CRAN:
install.packages("brglm2")
or the development version from github:
# install.packages("remotes") remotes::install_github("ikosmidis/brglm2", ref = "develop")
Below we follow the example of Heinze and Schemper
(2002) and fit a logistic
regression model using maximum likelihood (ML) to analyze data from a
study on endometrial cancer (see ?brglm2::endometrial
for details
and references).
library("brglm2") data("endometrial", package = "brglm2") modML <- glm(HG ~ NV + PI + EH, family = binomial("logit"), data = endometrial) summary(modML)
The ML estimate of the parameter for NV
is
actually infinite, as can be quickly verified using the
detectseparation R package
# install.packages("detectseparation") library("detectseparation") update(modML, method = "detect_separation")
The reported, apparently finite estimate r
round(coef(summary(modML))["NV", "Estimate"], 3)
for NV
is merely
due to false convergence of the iterative estimation procedure for
ML. The same is true for the estimated standard error, and, hence the
value r round(coef(summary(modML))["NV", "z value"], 3)
for the
$z$-statistic cannot be trusted for inference on the size of the
effect for NV
.
As mentioned earlier, many of the estimation methods implemented in brglm2 not only return estimates with improved frequentist properties (e.g. asymptotically smaller mean and median bias than what ML typically delivers), but also estimates and estimated standard errors that are always finite in binomial (e.g. logistic, probit, and complementary log-log regression) and multinomial regression models (e.g. baseline category logit models for nominal responses, and adjacent category logit models for ordinal responses). For example, the code chunk below refits the model on the endometrial cancer study data using mean bias reduction.
summary(update(modML, method = "brglm_fit"))
A quick comparison of the output from mean bias reduction to that from
ML reveals a dramatic change in the $z$-statistic for NV
, now that
estimates and estimated standard errors are finite. In particular, the
evidence against the null of NV
not contributing to the model in the
presence of the other covariates being now stronger.
See ?brglm_fit
and ?brglm_control
for more examples and the other
estimation methods for generalized linear models, including median
bias reduction and maximum penalized likelihood with Jeffreys' prior
penalty. Also do not forget to take a look at the vignettes
(vignette(package = "brglm2")
) for details and more case studies.
See, also ?expo
for a method to estimate the exponential of
regression parameters, such as odds ratios from logistic regression
models, while controlling for other covariate information. Estimation
can be performed using maximum likelihood or various estimators with
smaller asymptotic mean and median bias, that are also guaranteed to
be finite, even if the corresponding maximum likelihood estimates are
infinite. For example, modML
is a logistic regression fit, so the
exponential of each coefficient is an odds ratio while controlling for
other covariates. To estimate those odds ratios using the
correction*
method for mean bias reduction (see ?expo
for details)
we do
expoRB <- expo(modML, type = "correction*") expoRB
The odds ratio between presence of neovasculation and high histology
grade (HG
) is estimated to be r round(coef(expoRB)["NV"], 3)
, while
controlling for PI and EH. So, for each value of PI
and EH
, the
estimated odds of high histology grade are about r round(coef(expoRB)["NV"], 1)
times higher when neovasculation is present. An approximate 95\% interval for the latter odds ratio is
r paste0("(", paste(round(expoRB$ci["NV",], 1), collapse = ", "), ")")
providing evidence of association between NV
and HG
while
controlling for PI
and EH
. Note here that, the maximum likelihood estimate of the odds ratio is not as useful as the correction*
estimate, because it is $+\infty$ with an infinite standard error (see previous section).
The workhorse function in brglm2 is
brglm_fit
(or equivalently brglmFit
if you like camel case), which, as we did
in the example above, can be passed directly to the method
argument
of the glm
function. brglm_fit
implements a quasi Fisher
scoring procedure,
whose special cases result in a range of explicit and implicit bias
reduction methods for generalized linear models for more
details). Bias reduction for multinomial logistic regression (nominal
responses) can be performed using the function brmultinom
, and for
adjacent category models (ordinal responses) using the function
bracl
. Both brmultinom
and bracl
rely on brglm_fit
.
The iteration vignette and Kosmidis et al (2020) present the iteration and give mathematical details for the bias-reducing adjustments to the score functions for generalized linear models.
The classification of bias reduction methods into explicit and implicit is as given in Kosmidis (2014).
brglm2 was presented by Ioannis Kosmidis at the useR! 2016 international conference at University of Stanford on 16 June 2016. The presentation was titled "Reduced-bias inference in generalized linear models".
Motivation, details and discussion on the methods that brglm2 implements are provided in
Kosmidis, I, Kenne Pagui, E C, Sartori N. (2020). Mean and median bias reduction in generalized linear models. Statistics and Computing 30, 43–59.
Please note that the brglm2 project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.