logitgof: Hosmer-Lemeshow Tests for Logistic Regression Models

Description

Performs the Hosmer-Lemeshow goodness of fit tests for binary, multinomial and ordinal logistic regression models.

Usage

logitgof(obs, exp, g = 10, ord = FALSE)

Arguments

obs

a vector of observed values. See details.

exp

expected values fitted by the model. See details.

g

number of quantiles of risk, 10 by default.

ord

logical indicating whether to run the ordinal version, FALSE by default.

Details

The Hosmer-Lemeshow tests

The Hosmer-Lemeshow tests are goodness of fit tests for binary, multinomial and ordinal logistic regression models. logitgof can perform all three. They compare observed with expected frequencies of the outcome within quantiles of risk and compute a test statistic that follows a chi-squared distribution. The degrees of freedom depend upon the number of quantiles used and the number of outcome categories. A non-significant p value indicates that there is no evidence that the observed and expected frequencies differ (i.e., no evidence of poor fit).
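To make the comparison of observed and expected frequencies concrete, the following is a minimal base R sketch of the binary version of the statistic. It is not the code used by logitgof; the simulated data, the variable names and the use of g - 2 degrees of freedom for the binary case are assumptions made for illustration.

```r
# Illustrative sketch only: simulated data, not the logitgof implementation.
set.seed(1)
n <- 500
x <- rnorm(n)
y <- rbinom(n, 1, plogis(-0.5 + x))

fit <- glm(y ~ x, family = binomial)
p_hat <- fitted(fit)

g <- 10
# Split observations into g quantiles of fitted risk
breaks <- quantile(p_hat, probs = seq(0, 1, length.out = g + 1))
grp <- cut(p_hat, breaks = breaks, include.lowest = TRUE)

# Observed and expected counts of events (1) and non-events (0) per group
obs1 <- tapply(y, grp, sum)
exp1 <- tapply(p_hat, grp, sum)
obs0 <- tapply(1 - y, grp, sum)
exp0 <- tapply(1 - p_hat, grp, sum)

# Pearson-type statistic summed over both outcome categories and all groups
chisq <- sum((obs1 - exp1)^2 / exp1 + (obs0 - exp0)^2 / exp0)
df <- g - 2  # assumed degrees of freedom for the binary case
p_value <- pchisq(chisq, df = df, lower.tail = FALSE)

c(statistic = chisq, df = df, p.value = p_value)
```

Because the data are simulated from the fitted model form, a large p value would usually be expected here, although any single simulated data set can differ.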

Binary version

If obs is a vector of 1s and 0s or a factor vector with 2 levels, then the binary version of the test is run. exp must be the fitted values obtained from the model, which can be accessed using the fitted() function.

Multinomial version

If obs is a factor with three or more levels and ord = FALSE, the multinomial version of the test is run. If the model was fitted with the mlogit package, pass outcome = FALSE to the fitted() function. See examples.

Ordinal version

If obs is a factor with three or more levels and ord = TRUE, the ordinal version of the test is run. See examples for how to extract fitted values from models constructed using MASS::polr or ordinal::clm.

Note that Fagerland and Hosmer (2013) point out that the model needs to have at least as many covariate patterns as groups. This is achieved easily where there are continuous predictors or several categorical variables. The test will not be valid where there are only one or two categorical predictor variables. Fagerland and Hosmer (2016) also recommend running the Hosmer-Lemeshow test for ordinal models alongside the Lipsitz test (lipsitz.test) and the Pulkstenis-Robinson tests (pulkrob.chisq and pulkrob.deviance), as each detects different types of lack of fit.
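One way to check the covariate pattern requirement before running the test is to count the distinct combinations of predictor values and compare that count with g. The sketch below uses base R and the mtcars predictors from the examples; the variable names are illustrative and not part of the package.

```r
data(mtcars)
g <- 10

# Distinct covariate patterns for a model with predictors cyl and mpg
n_patterns <- nrow(unique(mtcars[, c("cyl", "mpg")]))
n_patterns >= g       # TRUE: mpg is continuous, so there are many patterns

# With a single categorical predictor there are too few patterns
n_patterns_cat <- nrow(unique(mtcars[, "cyl", drop = FALSE]))
n_patterns_cat >= g   # FALSE: cyl takes only three distinct values
```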

Finally, it has been observed that the results from this implementation of the ordinal Hosmer-Lemeshow test and the Lipsitz test differ slightly from those of the Fagerland and Hosmer (2017) Stata implementation. It is not yet clear why, but this is under investigation. The discrepancy is very minor.

Value

A list of class htest containing:

statistic

the value of the relevant test statistic.

parameter

the number of degrees of freedom used.

p.value

the p-value.

method

a character string indicating whether the binary or multinomial version of the test was performed.

data.name

a character string containing the names of the data passed to obs and exp.

observed

a table of observed frequencies with g rows. Either an xtabs generated table (used in the binary version) or a cast generated data frame (multinomial version).

expected

a table of expected frequencies with g rows. Either an xtabs generated table or a cast generated data frame.

stddiffs

a table of the standardised differences. See Hosmer, Lemeshow and Sturdivant (2013), p 162.
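The components listed above can be extracted from the returned object with the usual $ accessor. The sketch below reuses the binary mtcars model from the examples and assumes that the generalhoslem package is installed; the requireNamespace() guard and the name res are illustrative only.

```r
res <- NULL
if (requireNamespace("generalhoslem", quietly = TRUE)) {
  data(mtcars)
  mod <- glm(vs ~ cyl + mpg, data = mtcars, family = binomial)
  # mtcars is small, so warnings may be produced (see the note in Examples)
  res <- generalhoslem::logitgof(mtcars$vs, fitted(mod))

  res$statistic   # test statistic
  res$parameter   # degrees of freedom
  res$p.value     # p-value
  res$observed    # observed frequencies by quantile of risk
  res$expected    # expected frequencies by quantile of risk
  res$stddiffs    # standardised differences
}
```

Inspecting observed and expected side by side is often more informative than the p-value alone, since it shows in which quantiles of risk the model fits poorly.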

Author(s)

Matthew Alexander Jay, with code adapted from ResourceSelection::hoslem.test by Peter Solymos.

References

See Also

lipsitz.test, pulkrob.chisq.

Examples

### The mtcars dataset is a terrible example to use due to its small
### size - some of the models will return warnings as a result.
## Binary model
# 1/0 coding
data(mtcars)
mod1 <- glm(vs ~ cyl + mpg, data = mtcars, family = binomial)
logitgof(mtcars$vs, fitted(mod1))

# factor name coding
mtcars$engine <- factor(ifelse(mtcars$vs==0, "V", "S"), levels = c("V", "S"))
mod2 <- glm(engine ~ cyl + mpg, data = mtcars, family = binomial)
logitgof(mtcars$engine, fitted(mod2))

## Multinomial model
# with nnet
library(nnet)
mod3 <- multinom(gear ~ mpg + cyl, data = mtcars)
logitgof(mtcars$gear, fitted(mod3))

# with mlogit
library(mlogit)
data("Fishing", package = "mlogit")
Fish <- mlogit.data(Fishing, varying = c(2:9), shape = "wide", choice = "mode")
mod4 <- mlogit(mode ~ 0 | income, data = Fish)
logitgof(Fishing$mode, fitted(mod4, outcome = FALSE))

## Ordinal model
# polr in package MASS
library(MASS)
mod5 <- polr(as.factor(gear) ~ mpg + cyl, data = mtcars)
logitgof(mtcars$gear, fitted(mod5), g = 5, ord = TRUE)

# clm in package ordinal
library(ordinal)
mtcars$gear <- as.factor(mtcars$gear)
mod6 <- clm(gear ~ mpg + cyl, data = mtcars)
predprob <- data.frame(mpg = mtcars$mpg, cyl = mtcars$cyl)
fv <- predict(mod6, newdata = predprob, type = "prob")$fit
logitgof(mtcars$gear, fv, g = 5, ord = TRUE)

matthewjay15/generalhoslem documentation built on June 6, 2019, 11:24 a.m.