View source: R/hltest_functions.R
hltest implements a goodness-of-fit test for logistic regression models in large samples.
...: Additional arguments (ignored).
y, prob: Numeric vectors with binary responses and predicted probabilities to be evaluated. The vectors must have equal length. Missing values are dropped.
G: Number of groups to be used in the Hosmer-Lemeshow statistic. By default,
outsample: A boolean specifying whether the model has been fit on the data provided (
epsilon0: Value of the parameter epsilon0, which characterizes the models to be considered as acceptable in terms of goodness of fit. By default (NULL), epsilon0 is set to the value of epsilon expected from a model attaining a p-value of the traditional Hosmer-Lemeshow test of 0.05 in a sample of one million observations. The case
conf.level: Confidence level for the confidence interval of epsilon. Equal to
citype: Type of confidence interval of epsilon to be computed: one-sided (
cimethod: Method to be used to compute the two-sided confidence interval: symmetric (
glmObject: In alternative to the vectors
The modification of the Hosmer-Lemeshow test evaluates the hypotheses:
H0: epsilon <= epsilon0 vs. Ha: epsilon > epsilon0,
where epsilon is a parameter that measures the goodness of fit of a model. This parameter is based on a standardization of the noncentrality parameter that characterizes the distribution of the Hosmer-Lemeshow statistic. The case epsilon=0 corresponds to a model with perfect fit.
Because the null hypothesis of the traditional Hosmer-Lemeshow test is the condition of perfect fit, it can be interpreted as a test for H0: epsilon = 0 vs. Ha: epsilon > 0. Therefore, the traditional Hosmer-Lemeshow test can be performed by setting the argument epsilon0=0.
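For orientation, the traditional Hosmer-Lemeshow statistic can be sketched in base R. This is a minimal illustration, not the code used by hltest: the helper name hl_traditional is hypothetical, groups are formed from quantiles of the predicted probabilities, and the degrees of freedom are taken as the number of groups minus 2, which is the usual choice when the model was fit on the same data.

```r
# Hypothetical helper (not part of the documented function): traditional
# Hosmer-Lemeshow statistic with G groups defined by quantiles of prob.
hl_traditional <- function(y, prob, G = 10) {
  stopifnot(length(y) == length(prob))
  keep <- !is.na(y) & !is.na(prob)          # drop missing values
  y <- y[keep]
  prob <- prob[keep]
  # Group observations by quantiles of the predicted probabilities
  breaks <- unique(quantile(prob, probs = seq(0, 1, length.out = G + 1)))
  grp <- droplevels(cut(prob, breaks = breaks, include.lowest = TRUE))
  n_g   <- tapply(y, grp, length)           # group sizes
  obs_g <- tapply(y, grp, sum)              # observed events per group
  exp_g <- tapply(prob, grp, sum)           # expected events per group
  # Sum of squared standardized differences between observed and expected
  stat <- sum((obs_g - exp_g)^2 / (exp_g * (1 - exp_g / n_g)))
  dof <- length(n_g) - 2                    # assumption: in-sample evaluation
  p_value <- pchisq(stat, df = dof, lower.tail = FALSE)
  list(statistic = stat, dof = dof, p.value = p_value)
}

# Small simulated example with a correctly specified model
set.seed(1)
x <- rnorm(2000)
y <- rbinom(2000, size = 1, prob = plogis(-1 + x))
fit <- glm(y ~ x, family = binomial)
res <- hl_traditional(y, fitted(fit), G = 10)
print(res)
```

The p-value here comes from a central chi-squared distribution, which corresponds to the case epsilon0=0 described above.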
If epsilon0>0, the implemented test evaluates whether the fit of a model is "acceptable", albeit not perfect. The value of epsilon0 defines what is meant by "acceptable" in terms of goodness of fit. By default, epsilon0 is the value of epsilon expected from a model attaining a p-value of 0.05 in the traditional Hosmer-Lemeshow test in a sample of one million observations. In other words, the test assesses whether the fit of a model is worse than the fit of a model that would be considered borderline significant (i.e., attaining a p-value of 0.05) in a sample of one million observations.
The function also estimates the parameter epsilon and constructs its confidence interval. The confidence interval of this parameter is based on the confidence interval of the noncentrality parameter that characterizes the distribution of the Hosmer-Lemeshow statistic, which is noncentral chi-squared. Two types of two-sided confidence intervals are implemented: symmetric (default) and central. See Kent and Hainsworth (1995) for further details.
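As an illustration of the central type of interval, the sketch below inverts the noncentral chi-squared distribution function numerically to obtain a confidence interval for the noncentrality parameter. It is a minimal sketch under stated assumptions, not the implementation used by hltest: the helper name ncp_central_ci and the search bound are hypothetical, the symmetric interval (the default) is not shown, and the conversion from the noncentrality parameter to epsilon is omitted because the standardization is not spelled out here.

```r
# Hypothetical helper: central confidence interval for the noncentrality
# parameter of a chi-squared distribution, given one observed statistic.
ncp_central_ci <- function(stat, dof, conf.level = 0.95) {
  alpha <- 1 - conf.level
  upper_max <- 10 * stat + 100   # assumed search bound, ample for this sketch
  # pchisq(stat, dof, ncp) decreases as ncp grows, so each bound is the ncp
  # at which the distribution function equals a target probability.
  solve_ncp <- function(target) {
    # If even ncp = 0 gives a CDF at or below the target, the bound is 0
    if (pchisq(stat, df = dof, ncp = 0) <= target) return(0)
    uniroot(function(l) pchisq(stat, df = dof, ncp = l) - target,
            lower = 0, upper = upper_max, tol = 1e-8)$root
  }
  c(lower = solve_ncp(1 - alpha / 2), upper = solve_ncp(alpha / 2))
}

# Illustrative values only (not output of hltest)
ci <- ncp_central_ci(stat = 40, dof = 8)
print(ci)
```

Each limit leaves probability (1 - conf.level)/2 in one tail, which is what distinguishes the central interval from the symmetric one discussed by Kent and Hainsworth (1995).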
References:
Kent, J. T., & Hainsworth, T. J. (1995). Confidence intervals for the noncentral chi-squared distribution. Journal of Statistical Planning and Inference, 46(2), 147–159.
Nattino, G., Pennell, M. L., & Lemeshow, S. Assessing the goodness of fit of logistic regression models in large samples: A modification of the Hosmer-Lemeshow test. In preparation.
A list of class htest containing the following components:
The value of epsilon0 used in the test.
The value of the Hosmer-Lemeshow statistic.
The p-value of the test.
A vector with the parameters of the noncentral chi-squared distribution used to compute the p-value: degrees of freedom (dof) and noncentrality parameter (lambda).
The estimate of the noncentrality parameter lambda.
The estimate of epsilon.
The confidence interval of epsilon.
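To make the role of the parameters dof and lambda concrete, the sketch below computes an upper-tail p-value from a noncentral chi-squared distribution with base R. All numbers are illustrative assumptions rather than output of hltest, and treating the p-value as the upper-tail probability is an assumption based on the one-sided alternative Ha: epsilon > epsilon0.

```r
# Illustrative values only (assumptions, not output of hltest)
stat   <- 25    # hypothetical Hosmer-Lemeshow statistic
dof    <- 8     # hypothetical degrees of freedom
lambda <- 10    # hypothetical noncentrality parameter under H0

# Upper-tail probability under the noncentral chi-squared distribution
p_noncentral <- pchisq(stat, df = dof, ncp = lambda, lower.tail = FALSE)

# With lambda = 0 this reduces to the central chi-squared p-value,
# i.e. the traditional test
p_central <- pchisq(stat, df = dof, lower.tail = FALSE)

print(c(noncentral = p_noncentral, central = p_central))
```

For the same statistic, a positive noncentrality parameter gives a larger p-value than the central distribution, which is why a model can be rejected by the traditional test yet not by the modified one.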
numeric: Method for vectors of responses and predicted probabilities.
glm: Method for the result of a glm fit.
#Generate fake data with two variables: one continuous and one binary.
set.seed(1234)
dat <- data.frame(x1 = rnorm(5e5),
                  x2 = rbinom(5e5, size=1, prob=.5))
#The true probabilities of the response depend on a negligible interaction
dat$prob <- 1/(1+exp(-(-1 + dat$x1 + dat$x2 + 0.05*dat$x1*dat$x2)))
dat$y <- rbinom(5e5, size = 1, prob = dat$prob)
#Fit an acceptable model (does not include the negligible interaction)
model <- glm(y ~ x1 + x2, data = dat, family = binomial(link="logit"))
#Check: predicted probabilities are very close to true probabilities
dat$phat <- predict(model, type = "response")
boxplot(abs(dat$prob-dat$phat))
#Traditional Hosmer-Lemeshow test: reject H0
hltest(model, epsilon0 = 0)
#Modified Hosmer-Lemeshow test: fail to reject H0
hltest(model)
#Same output with vectors of responses and predicted probabilities
hltest(y=dat$y, prob=dat$phat)