logisregr: Logistic Regression Models for Binary Data

View source: R/wrappers.R

Description

Obtains the parameter estimates from logistic regression models with binary data.

Usage

logisregr(
  data,
  rep = "",
  event = "event",
  covariates = "",
  freq = "",
  weight = "",
  offset = "",
  id = "",
  robust = FALSE,
  firth = FALSE,
  flic = FALSE,
  plci = FALSE,
  alpha = 0.05
)

Arguments

data

The input data frame that contains the following variables:

  • rep: The replication for by-group processing.

  • event: The event indicator (1 = event, 0 = no event).

  • covariates: The values of baseline covariates.

  • freq: The frequency for each observation.

  • weight: The weight for each observation.

  • offset: The offset for each observation.

  • id: The optional subject ID used to group the score residuals when computing the robust sandwich variance.

rep

The name(s) of the replication variable(s) in the input data.

event

The name of the event variable in the input data.

covariates

The vector of names of baseline covariates in the input data.

freq

The name of the frequency variable in the input data. The frequencies must be the same for all observations within each cluster defined by id, so freq is the cluster frequency.

weight

The name of the weight variable in the input data.

offset

The name of the offset variable in the input data.

id

The name of the id variable in the input data.

robust

Whether a robust sandwich variance estimate should be computed. If id is specified, the score residuals are aggregated within each id when computing the robust sandwich variance estimate. The default is FALSE.

firth

Whether Firth's bias-reducing penalized likelihood should be used. The default is FALSE.

flic

Whether to apply Firth's logistic regression with intercept correction (FLIC) to obtain more accurate predicted probabilities. The default is FALSE.

plci

Whether to obtain profile likelihood confidence intervals. The default is FALSE.

alpha

The two-sided significance level. The default is 0.05.
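
The robust and id arguments describe a cluster-robust sandwich variance in which the score residuals are summed within each id before the middle ("meat") matrix is formed. The base-R sketch below illustrates that idea for an ordinary, unweighted fit from glm(); it does not call lrstat, it ignores freq, weight, and firth, and the helper name robust_vcov() and the simulated clusters are hypothetical.

```r
# Minimal sketch (not lrstat code): cluster-robust sandwich variance for an
# unweighted logistic regression fitted with base R's glm().
robust_vcov <- function(fit, id) {
  X <- model.matrix(fit)
  y <- fit$y
  p <- fitted(fit)
  # Score residuals: row i is x_i * (y_i - p_i)
  U <- X * (y - p)
  # Aggregate the score residuals within each cluster (one row per id)
  Uc <- rowsum(U, group = id)
  # "Bread": naive covariance, the inverse Fisher information (X'WX)^{-1}
  bread <- solve(crossprod(X, X * (p * (1 - p))))
  # Sandwich: bread %*% meat %*% bread, with meat = sum of outer products of Uc
  bread %*% crossprod(Uc) %*% bread
}

# Hypothetical example: 50 simulated clusters of 4 observations each
set.seed(1)
n <- 200
id <- rep(1:50, each = 4)
x <- rnorm(n)
y <- rbinom(n, 1, plogis(-0.5 + x))
fit <- glm(y ~ x, family = binomial)

V_robust <- robust_vcov(fit, id)
sqrt(diag(V_robust))   # robust standard errors
sqrt(diag(vcov(fit)))  # naive standard errors
```

With one observation per id, the same calculation reduces to the ordinary (HC0-type) sandwich estimator.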

Details

Fitting a logistic regression model using Firth's bias reduction method is equivalent to penalization of the log-likelihood by the Jeffreys prior. Firth's penalized log-likelihood is given by

l(\beta) + \frac{1}{2} \log(\mbox{det}(I(\beta)))

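where l(\beta) is the log-likelihood and I(\beta) is the Fisher information matrix. The j-th component of the gradient of the penalized log-likelihood is

g_j(\beta) + \frac{1}{2} \mbox{trace}\left(I(\beta)^{-1} \frac{\partial I(\beta)}{\partial \beta_j}\right)

where g_j(\beta) is the j-th component of the unpenalized gradient (score) g(\beta).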
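
For the logistic model the trace term has a closed form. A short derivation, assuming no weights or frequencies, with \pi_i the fitted event probability for observation i and x_i its covariate row (an offset, if present, enters only through \pi_i):

```latex
I(\beta) = X^\top W X, \quad W = \mbox{diag}\{\pi_i(1-\pi_i)\}

\frac{\partial I(\beta)}{\partial \beta_j} = X^\top \mbox{diag}\{\pi_i(1-\pi_i)(1-2\pi_i)\,x_{ij}\}\, X

\frac{1}{2}\mbox{trace}\left(I(\beta)^{-1}\frac{\partial I(\beta)}{\partial \beta_j}\right)
  = \sum_{i=1}^{n} h_i\left(\frac{1}{2}-\pi_i\right) x_{ij},
  \quad h_i = \pi_i(1-\pi_i)\, x_i^\top I(\beta)^{-1} x_i

\sum_{i=1}^{n} (y_i-\pi_i)\,x_{ij} + \frac{1}{2}\mbox{trace}\left(I(\beta)^{-1}\frac{\partial I(\beta)}{\partial \beta_j}\right)
  = \sum_{i=1}^{n}\left\{y_i-\pi_i+h_i\left(\frac{1}{2}-\pi_i\right)\right\} x_{ij}
```

Here h_i is the i-th diagonal element of the hat matrix W^{1/2} X I(\beta)^{-1} X^\top W^{1/2}, which gives the modified score in the form used by Heinze and Schemper (2002).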

The Hessian matrix is not modified by this penalty.
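
To make the iteration concrete, here is a minimal base-R sketch of Firth-penalized logistic regression that uses the closed-form penalized score and keeps the unpenalized Fisher information I(\beta) = X'WX for the Newton-type step. It assumes no weights, frequencies, or offsets and an intercept column in X; the function name firth_logit() is hypothetical, and the step cap (maxstep) is a safeguard added for this sketch, not something documented for logisregr.

```r
# Minimal sketch of Firth-penalized logistic regression (not the lrstat code).
# Assumptions: no weights, frequencies, or offsets; the first column of X is
# the intercept. The step uses the unpenalized Fisher information I(beta).
firth_logit <- function(X, y, maxit = 200, tol = 1e-8, maxstep = 5) {
  beta <- rep(0, ncol(X))
  for (iter in seq_len(maxit)) {
    p <- plogis(drop(X %*% beta))
    w <- p * (1 - p)
    # Inverse of the Fisher information X'WX
    info_inv <- solve(crossprod(X, X * w))
    # Leverages h_i = w_i * x_i' I^{-1} x_i
    h <- w * rowSums((X %*% info_inv) * X)
    # Penalized score: sum_i {y_i - p_i + h_i (1/2 - p_i)} x_i
    score <- crossprod(X, y - p + h * (0.5 - p))
    step <- drop(info_inv %*% score)
    # Safeguard (an assumption of this sketch): cap the largest step
    if (max(abs(step)) > maxstep) step <- step * maxstep / max(abs(step))
    beta <- beta + step
    if (max(abs(step)) < tol) break
  }
  names(beta) <- colnames(X)
  # vbeta is I(beta)^{-1} evaluated at the last iterate before the final step
  list(beta = beta, vbeta = info_inv, niter = iter)
}

# Toy data with complete separation: the ordinary MLE does not exist
x <- 1:6
y <- c(0, 0, 0, 1, 1, 1)
X <- cbind("(Intercept)" = 1, x = x)
fit <- firth_logit(X, y)
fit$beta
```

On this separated toy dataset, glm() typically warns that fitted probabilities numerically 0 or 1 occurred and the slope drifts toward infinity, whereas the penalized estimates stay finite.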

Firth's method reduces bias in maximum likelihood estimates of coefficients, but it introduces a bias toward one-half in the predicted probabilities.

A straightforward modification of Firth's logistic regression that achieves unbiased average predicted probabilities is a post hoc adjustment of the intercept. This approach, known as Firth's logistic regression with intercept correction (FLIC), preserves the bias-corrected effect estimates. Because the intercept is excluded from the penalization, the predictions improve without sacrificing the accuracy of the effect estimates.
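
The intercept correction can be sketched in a few lines of base R: keep the Firth slopes, form the linear predictor without the intercept, and refit only the intercept by ordinary maximum likelihood with that predictor as an offset. Because the logit link is canonical, the refitted intercept makes the average predicted probability equal the observed event rate. The function name flic_intercept() and the coefficient values are hypothetical; in practice the slopes would come from a Firth fit.

```r
# Minimal sketch of FLIC: keep the Firth slopes, then re-estimate only the
# intercept by unpenalized ML with the slope part of the linear predictor
# as an offset. Assumes the first column of X is the intercept.
flic_intercept <- function(X, y, beta_firth) {
  # Linear predictor without the intercept
  eta_slopes <- drop(X[, -1, drop = FALSE] %*% beta_firth[-1])
  # Ordinary ML fit of the intercept alone, holding the slopes fixed
  refit <- glm(y ~ 1, family = binomial, offset = eta_slopes)
  beta <- c(coef(refit)[[1]], beta_firth[-1])
  names(beta) <- colnames(X)
  list(beta = beta, fitted_values = fitted(refit))
}

# Simulated rare-event data (illustration only)
set.seed(42)
n <- 500
x <- rnorm(n)
y <- rbinom(n, 1, plogis(-3 + x))
X <- cbind("(Intercept)" = 1, x = x)
beta_firth <- c(-2.8, 0.9)  # hypothetical Firth estimates, for illustration only

fl <- flic_intercept(X, y, beta_firth)
c(mean_fitted = mean(fl$fitted_values), event_rate = mean(y))
```

Only the intercept changes; the slope estimates, and therefore the effect estimates, are carried over unchanged.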

Value

A list with the following components:

  • sumstat: The data frame of summary statistics of model fit with the following variables:

    • n: The number of subjects.

    • nevents: The number of events.

    • loglik0: The (penalized) log-likelihood under the null model.

    • loglik1: The maximum (penalized) log-likelihood.

    • niter: The number of Newton-Raphson iterations.

    • p: The number of parameters, including the intercept and the regression coefficients associated with the covariates.

    • robust: Whether a robust sandwich variance estimate should be computed.

    • firth: Whether Firth's penalized likelihood is used.

    • flic: Whether to apply intercept correction.

    • loglik0_unpenalized: The unpenalized log-likelihood under the null model.

    • loglik1_unpenalized: The maximum unpenalized log-likelihood.

    • rep: The replication.

  • parest: The data frame of parameter estimates with the following variables:

    • param: The name of the covariate for the parameter estimate.

    • beta: The parameter estimate.

    • sebeta: The standard error of the parameter estimate.

    • z: The Wald test statistic for the parameter.

    • expbeta: The exponentiated parameter estimate.

    • vbeta: The covariance matrix for parameter estimates.

    • lower: The lower limit of confidence interval.

    • upper: The upper limit of confidence interval.

    • p: The p-value from the chi-square test.

    • method: The method to compute the confidence interval and p-value.

    • sebeta_naive: The naive standard error of the parameter estimate.

    • vbeta_naive: The naive covariance matrix of parameter estimates.

    • rep: The replication.

  • fitted: The data frame with the following variables:

    • linear_predictors: The linear fit on the logit scale.

    • fitted_values: The fitted probabilities of having an event, obtained by transforming the linear predictors by the inverse of the logit link.

    • rep: The replication.

  • p: The number of parameters.

  • param: The parameter names.

  • beta: The parameter estimate.

  • vbeta: The covariance matrix for parameter estimates.

  • vbeta_naive: The naive covariance matrix for parameter estimates.

  • linear_predictors: The linear fit on the logit scale.

  • fitted_values: The fitted probabilities of having an event.

  • terms: The terms object.

  • xlevels: A record of the levels of the factors used in fitting.

  • data: The input data.

  • rep: The name(s) of the replication variable(s).

  • event: The name of the event variable.

  • covariates: The names of baseline covariates.

  • freq: The name of the freq variable.

  • weight: The name of the weight variable.

  • offset: The name of the offset variable.

  • id: The name of the id variable.

  • robust: Whether a robust sandwich variance estimate should be computed.

  • firth: Whether to use Firth's bias-reducing penalized likelihood.

  • flic: Whether to apply intercept correction.

  • plci: Whether to obtain profile likelihood confidence intervals.

  • alpha: The two-sided significance level.
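
The Wald columns of parest and a likelihood-ratio comparison of loglik1 with loglik0 follow standard formulas. The sketch below reproduces them with an ordinary glm() fit; it does not call lrstat and assumes an unpenalized fit, Wald limits (presumably the plci = FALSE case), and an intercept-only null model. The simulated data are illustrative only.

```r
# Sketch: Wald quantities and a likelihood-ratio test from an unpenalized
# glm() fit, mirroring the parest and sumstat fields (not lrstat code).
set.seed(7)
n <- 300
x <- rnorm(n)
y <- rbinom(n, 1, plogis(0.3 + 0.8 * x))
fit1 <- glm(y ~ x, family = binomial)
fit0 <- glm(y ~ 1, family = binomial)  # null model: intercept only (assumption)
alpha <- 0.05

beta <- coef(fit1)
sebeta <- sqrt(diag(vcov(fit1)))
z <- beta / sebeta
zcrit <- qnorm(1 - alpha / 2)
parest <- data.frame(
  param = names(beta),
  beta = beta,
  sebeta = sebeta,
  z = z,
  expbeta = exp(beta),
  lower = beta - zcrit * sebeta,                # Wald lower limit
  upper = beta + zcrit * sebeta,                # Wald upper limit
  p = pchisq(z^2, df = 1, lower.tail = FALSE)   # Wald chi-square p-value
)

# Likelihood-ratio test of all covariates, on p - 1 degrees of freedom
loglik0 <- as.numeric(logLik(fit0))
loglik1 <- as.numeric(logLik(fit1))
p <- length(beta)
lr_stat <- 2 * (loglik1 - loglik0)
lr_p <- pchisq(lr_stat, df = p - 1, lower.tail = FALSE)

parest
c(lr_stat = lr_stat, lr_p = lr_p)
```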

Author(s)

Kaifeng Lu, kaifenglu@gmail.com

References

David Firth. Bias Reduction of Maximum Likelihood Estimates. Biometrika 1993; 80:27–38.

Georg Heinze and Michael Schemper. A solution to the problem of separation in logistic regression. Statistics in Medicine 2002; 21:2409–2419.

Rainer Puhr, Georg Heinze, Mariana Nold, Lara Lusa, and Angelika Geroldinger. Firth's logistic regression with rare events: accurate effect estimates and predictions? Statistics in Medicine 2017; 36:2302–2317.

Examples


library(lrstat)

(fit1 <- logisregr(
  ingots, event = "NotReady", covariates = "Heat*Soak", freq = "Freq"))


lrstat documentation built on Oct. 18, 2024, 9:06 a.m.