hltest: Modified Hosmer-Lemeshow Test for Large Samples

In gnattino/largesamplehl: A Modification of the Hosmer-Lemeshow Test for Large Samples

View source: R/hltest_functions.R

Description

hltest implements a modification of the Hosmer-Lemeshow goodness-of-fit test for logistic regression models, designed for large samples.

Usage

hltest(...)

## S3 method for class 'numeric'
hltest(y, prob, G = 10, outsample = FALSE,
  epsilon0 = NULL, conf.level = 0.95, citype = "one.sided",
  cimethod = ifelse(citype == "one.sided", NULL, "symmetric"), ...)

## S3 method for class 'glm'
hltest(glmObject, ...)

Arguments

`...`	Additional arguments (ignored).
`y, prob`	Numeric vectors with binary responses and predicted probabilities to be evaluated. The vectors must have equal length. Missing values are dropped.
`G`	Number of groups to be used in the Hosmer-Lemeshow statistic. By default, `G=10`.
`outsample`	A logical value specifying whether the model was fit on the data provided (`outsample=FALSE`, default) or developed on an external sample (`outsample=TRUE`). The Hosmer-Lemeshow statistic is assumed to have `G-2` degrees of freedom if `outsample=FALSE` and `G` degrees of freedom if `outsample=TRUE`.
`epsilon0`	Value of the parameter epsilon0, which determines which models are considered acceptable in terms of goodness of fit. By default (`NULL`), epsilon0 is set to the value of epsilon expected from a model attaining a p-value of 0.05 in the traditional Hosmer-Lemeshow test in a sample of one million observations. Setting `epsilon0=0` yields the traditional Hosmer-Lemeshow test. See the section "Details" for further information.
`conf.level`	Confidence level for the confidence interval of epsilon. Equal to `.95` by default.
`citype`	Type of confidence interval of epsilon to be computed: one-sided (`citype="one.sided"`, default) or two-sided (`citype="two.sided"`).
`cimethod`	Method to be used to compute the two-sided confidence interval: symmetric (`cimethod="symmetric"`, default) or central (`cimethod="central"`). See section "Details" for further information.
`glmObject`	As an alternative to the vectors `y` and `prob`, the `glm` object containing the model to be evaluated.

Details

The modification of the Hosmer-Lemeshow test evaluates the hypotheses:

H0: epsilon <= epsilon0 vs. Ha: epsilon > epsilon0,

where epsilon is a parameter that measures the goodness of fit of a model. This parameter is based on a standardization of the noncentrality parameter that characterizes the distribution of the Hosmer-Lemeshow statistic. The case epsilon=0 corresponds to a model with perfect fit.

Because the null hypothesis of the traditional Hosmer-Lemeshow test is the condition of perfect fit, it can be interpreted as a test for H0: epsilon = 0 vs. Ha: epsilon > 0. Therefore, the traditional Hosmer-Lemeshow test can be performed by setting the argument epsilon0=0.
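To make the traditional statistic concrete, here is a minimal base-R sketch of how the Hosmer-Lemeshow statistic and its p-value are usually computed. It is not the package's implementation: the helper name `hl_statistic_sketch` is hypothetical, and forming groups from quantiles of the predicted probabilities is an assumption (the package's grouping and tie handling may differ). The degrees of freedom follow the `outsample` rule described above.

```r
# Minimal sketch of the traditional Hosmer-Lemeshow statistic (base R only).
# Assumptions: groups are quantile-based cuts of the predicted probabilities and
# the probabilities are essentially continuous (few ties); the package may differ.
hl_statistic_sketch <- function(y, prob, G = 10, outsample = FALSE) {
  ok <- !is.na(y) & !is.na(prob)                      # drop missing values
  y <- y[ok]
  prob <- prob[ok]
  # Cut points at quantiles of the predicted probabilities (duplicates removed)
  breaks <- unique(quantile(prob, probs = seq(0, 1, length.out = G + 1)))
  group <- cut(prob, breaks = breaks, include.lowest = TRUE)
  obs  <- tapply(y, group, sum)                       # observed events per group
  expd <- tapply(prob, group, sum)                    # expected events per group
  n    <- tapply(prob, group, length)                 # group sizes
  pbar <- expd / n                                    # mean predicted probability
  stat <- sum((obs - expd)^2 / (n * pbar * (1 - pbar)))
  G_used <- nlevels(group)                            # may be < G if cut points tie
  dof <- if (outsample) G_used else G_used - 2
  list(statistic = stat, dof = dof,
       p.value = pchisq(stat, df = dof, lower.tail = FALSE))
}
```

For a fitted `glm` object `model`, `hl_statistic_sketch(model$y, fitted(model))` would give the in-sample version of the statistic.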

If epsilon0>0, the implemented test evaluates whether the fit of a model is "acceptable", albeit not perfect. The value of epsilon0 defines what is meant by "acceptable" in terms of goodness of fit. By default, epsilon0 is the value of epsilon expected from a model attaining a p-value of 0.05 in the traditional Hosmer-Lemeshow test in a sample of one million observations. In other words, the test assesses whether the fit of a model is worse than that of a model that would be borderline significant (i.e., attaining a p-value of 0.05) in a sample of one million observations.
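Because the p-value is computed from a noncentral chi-squared distribution (see the `parameter` component under Value), the test of H0: epsilon <= epsilon0 can be pictured on the noncentrality scale: the p-value is the upper-tail probability of the observed statistic when the noncentrality equals the value corresponding to epsilon0. The sketch below assumes that this value, called `lambda0` here, is already known; the standardization mapping epsilon0 to `lambda0` is not reproduced, and the function name is hypothetical. The numbers in the example calls are illustrative only.

```r
# Sketch: p-value for H0: lambda <= lambda0 vs. Ha: lambda > lambda0, treating the
# Hosmer-Lemeshow statistic as noncentral chi-squared with 'dof' degrees of freedom.
# Assumption: lambda0 (the noncentrality matching epsilon0) is supplied directly.
modified_hl_pvalue_sketch <- function(stat, dof, lambda0) {
  # Upper-tail probability at the boundary of H0 (lambda = lambda0);
  # a larger lambda0 gives a larger p-value for the same statistic.
  pchisq(stat, df = dof, ncp = lambda0, lower.tail = FALSE)
}

# Same statistic, two null hypotheses (illustrative numbers)
modified_hl_pvalue_sketch(stat = 30, dof = 8, lambda0 = 0)   # traditional test
modified_hl_pvalue_sketch(stat = 30, dof = 8, lambda0 = 40)  # tolerates some misfit
```

With `lambda0 = 0` the sketch reduces to the traditional test, mirroring the role of `epsilon0=0`.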

The function also estimates the parameter epsilon and constructs its confidence interval. The confidence interval of this parameter is based on the confidence interval of the noncentrality parameter that characterizes the distribution of the Hosmer-Lemeshow statistic, which is noncentral chi-squared. Two types of two-sided confidence intervals are implemented: symmetric (default) and central. See Kent and Hainsworth (1995) for further details.
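The central two-sided interval for the noncentrality parameter can be obtained by inverting the noncentral chi-squared CDF at the observed statistic. The sketch below does this with `uniroot()`. It covers only the central interval (the symmetric interval of Kent and Hainsworth (1995), the default here, is not implemented), and the point estimate max(0, stat - dof), based on the mean dof + lambda of the noncentral chi-squared distribution, is an assumed moment estimator rather than necessarily the package's `lambdaHat`. The function name is hypothetical, and converting the endpoints to the epsilon scale would require the package's standardization, which is not reproduced.

```r
# Sketch: moment estimate and central confidence interval for the noncentrality
# parameter lambda of a noncentral chi-squared statistic with 'dof' degrees of freedom.
# Assumptions: max(0, stat - dof) as the estimate and the central interval are
# illustrative; the symmetric interval of Kent and Hainsworth (1995) is not implemented.
lambda_ci_central_sketch <- function(stat, dof, conf.level = 0.95) {
  alpha <- 1 - conf.level
  # P(X <= stat) when X ~ noncentral chi-squared(dof, lambda); decreasing in lambda
  cdf_at <- function(lambda) pchisq(stat, df = dof, ncp = lambda)
  # Solve cdf_at(lambda) == target for lambda >= 0, clamping at 0 if needed
  solve_for <- function(target) {
    if (cdf_at(0) <= target) return(0)
    upper <- max(10, 2 * stat)
    while (cdf_at(upper) > target) upper <- 2 * upper  # widen the bracket
    uniroot(function(l) cdf_at(l) - target,
            lower = 0, upper = upper, tol = 1e-10)$root
  }
  c(estimate = max(0, stat - dof),      # since E[X] = dof + lambda
    lower = solve_for(1 - alpha / 2),   # P(X <= stat) = 1 - alpha/2
    upper = solve_for(alpha / 2))       # P(X <= stat) = alpha/2
}

lambda_ci_central_sketch(stat = 30, dof = 8)
```

Because the CDF decreases in lambda, an endpoint is set to 0 whenever even lambda = 0 cannot reach the target probability, which keeps the interval inside the parameter space.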

References:

Kent, J. T., & Hainsworth, T. J. (1995). Confidence intervals for the noncentral chi-squared distribution. Journal of Statistical Planning and Inference, 46(2), 147–159.

Nattino, G., Pennell, M. L., & Lemeshow, S. Assessing the goodness of fit of logistic regression models in large samples: A modification of the Hosmer-Lemeshow test. In preparation.

Value

A list of class `htest` containing the following components:

null.value: The value of epsilon0 used in the test.
statistic: The value of the Hosmer-Lemeshow statistic.
p.value: The p-value of the test.
parameter: A vector with the parameters of the noncentral chi-squared distribution used to compute the p-value: degrees of freedom (dof) and noncentrality parameter (lambda).
lambdaHat: The estimate of the noncentrality parameter lambda.
estimate: The estimate of epsilon.
conf.int: The confidence interval of epsilon.

Methods (by class)

numeric: Method for vectors of responses and predicted probabilities.
glm: Method for the result of a glm fit.
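The two methods follow R's S3 dispatch on the first argument passed to `hltest(...)`. The sketch below mimics that pattern with hypothetical `hl_sketch*` functions that are not part of the package: the `glm` method extracts the stored responses and fitted probabilities from the model object and delegates to the `numeric` method. This is one plausible arrangement consistent with the example in which `hltest(model)` and `hltest(y=dat$y, prob=dat$phat)` give the same output; the `numeric` method here only drops incomplete pairs and stands in for the actual test computation.

```r
# Hypothetical S3 sketch (not package API): a generic with only '...' dispatches on
# the class of the first argument it receives, as in the usage hltest(...).
hl_sketch <- function(...) UseMethod("hl_sketch")

# 'numeric' method: drops incomplete pairs; stands in for the actual test computation
hl_sketch.numeric <- function(y, prob, ...) {
  ok <- complete.cases(y, prob)
  list(n = sum(ok), y = y[ok], prob = prob[ok])
}

# 'glm' method: extract the stored responses and fitted probabilities, then delegate
hl_sketch.glm <- function(glmObject, ...) {
  hl_sketch.numeric(y = glmObject$y, prob = fitted(glmObject), ...)
}

# Both calls reach the same 'numeric' code path
set.seed(2)
d <- data.frame(x = rnorm(200))
d$y <- rbinom(200, size = 1, prob = plogis(d$x))
m <- glm(y ~ x, data = d, family = binomial)
a <- hl_sketch(m)                # dispatches to hl_sketch.glm
b <- hl_sketch(d$y, fitted(m))   # dispatches to hl_sketch.numeric
```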

Examples

library(largesamplehl)

# Generate fake data with two variables: one continuous and one binary.
set.seed(1234)
dat <- data.frame(x1 = rnorm(5e5),
                  x2 = rbinom(5e5, size = 1, prob = .5))
# The true probabilities of the response depend on a negligible interaction.
dat$prob <- 1/(1+exp(-(-1 + dat$x1 + dat$x2 + 0.05*dat$x1*dat$x2)))
dat$y <- rbinom(5e5, size = 1, prob = dat$prob)

# Fit an acceptable model (it omits the negligible interaction).
model <- glm(y ~ x1 + x2, data = dat, family = binomial(link = "logit"))

# Check: predicted probabilities are very close to the true probabilities.
dat$phat <- predict(model, type = "response")
boxplot(abs(dat$prob - dat$phat))

# Traditional Hosmer-Lemeshow test: reject H0
hltest(model, epsilon0 = 0)

# Modified Hosmer-Lemeshow test: fail to reject H0
hltest(model)

# Same output with vectors of responses and predicted probabilities
hltest(y = dat$y, prob = dat$phat)

gnattino/largesamplehl documentation built on March 22, 2021, 3:48 p.m.
