check_low_freq: Check for low frequencies/counts when doing logistic...

View source: R/check-low-freq.R

check_low_freq    R Documentation

Check for low frequencies/counts when doing logistic regression

Description

Low counts in certain categories or levels of a variable can cause problems in logistic regression. This function helps to identify low counts that might be problematic. A common rule of thumb is to collapse categories that contain 1% or less of the data.
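As a rough illustration of the idea, the sketch below tabulates one categorical predictor from the base R infert data set and flags levels whose share of the rows is at or below a cutoff. The helper flag_low_levels() is hypothetical and written only for this example; it is not part of latable and may not match how check_low_freq() works internally.

```r
# Hypothetical helper for illustration only; not part of latable.
# Returns one row per level with its count, proportion, and a flag.
flag_low_levels <- function(x, threshold = 0.01) {
  counts <- table(x, useNA = "no")          # count rows per level
  props <- as.numeric(counts) / sum(counts) # share of non-missing rows
  data.frame(
    level = names(counts),
    n = as.integer(counts),
    prop = props,
    low = props <= threshold,               # TRUE when at or below the cutoff
    stringsAsFactors = FALSE
  )
}

# `infert` ships with base R (datasets package)
res <- flag_low_levels(infert$education, threshold = 0.05)
print(res)
```

A level flagged this way is a candidate for merging with a neighbouring level before the model is fitted. Whether to merge remains a judgment call that depends on the variable.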

Usage

check_low_freq(fit, data, threshold = 0.01)

check_low_freq2(data, outcome, predictors, threshold = 0.01)

Arguments

fit

An object of class "glm", which inherits from the class "lm".

data

A tibble or data frame with the full data set.

threshold

The proportion of the data used as the cutoff for flagging low-frequency categories. Default is 0.01 (1%).

outcome

Character string. The dependent variable (outcome) for logistic regression.

predictors

Character vector. Independent variables (predictors/covariates) for univariable and/or multivariable modelling.
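To show how the two calling styles relate, the sketch below turns an outcome string and a predictors vector into the model formula that would otherwise be passed through fit. It uses base R's reformulate(). That check_low_freq2() builds its model this way is an assumption made for illustration, not something this page confirms.

```r
# Inputs in the same form that check_low_freq2() accepts
outcome <- "case"
predictors <- c("induced", "spontaneous", "education")

# Build the formula case ~ induced + spontaneous + education
model_formula <- reformulate(predictors, response = outcome)

# Fit the corresponding logistic regression on the base R `infert` data
fit <- glm(model_formula, family = binomial, data = infert)

print(formula(fit))
print(class(fit))
```

The resulting fit is a "glm" object, so it has the form expected by the fit argument of check_low_freq().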

Value

A tibble

Examples

## Not run: 
library(epiDisplay)
library(dplyr)

dplyr::glimpse(infert)
model0 <- glm(case ~ induced + spontaneous + education,
              family = binomial,
              data = infert)
summary(model0)


check_low_freq(fit = model0,
               data = infert)

check_low_freq(fit = model0,
               data = infert, 
               threshold = 0.05)

check_low_freq2(data = infert,
                outcome = "case",
                predictors = c("induced", "spontaneous", "education"), 
                threshold = 0.05)


#### Another data set --------------------------------

library(compareGroups)
data(predimed)
dplyr::glimpse(predimed)
# as.double() strips attributes (for example, variable labels) from numeric columns
predimed <- predimed %>%
  mutate(across(where(is.double), as.double))

fit <- glm(htn ~ sex + bmi + smoke,
           family = binomial(link = "logit"),
           data = predimed)

check_low_freq(fit = fit,
               data = predimed)

check_low_freq(fit = fit,
               data = predimed,
               threshold = 0.05)

check_low_freq2(data = predimed,
                outcome = "htn",
                predictors = c("sex", "bmi", "smoke"), 
                threshold = 0.01)


check_low_freq2(data = predimed,
                outcome = "htn",
                predictors = c("sex", "bmi", "smoke"), 
                threshold = 0.05)




## End(Not run)

emilelatour/latable documentation built on Sept. 14, 2023, 9:32 a.m.