hosmer_test | R Documentation |
The Hosmer-Lemeshow goodness-of-fit test checks the quality of logistic regression models. Note that this function divides the data into subgroups in its own way; see Details.
hosmer_test(model, g = 10, simple = FALSE, force = FALSE)
model |
a |
g |
numeric, the number for how many subgroups the data should be divided into. |
simple |
logical, If |
force |
If |
The Hosmer-Lemeshow goodness-of-fit test computes its statistic by dividing the observed and expected values into several subgroups.
The subgroups are generally formed from quantiles of the expected values, for example deciles.
This method is used in the hoslem.test() function of the ResourceSelection package and the performance_hosmer() function of the performance package.
It has been suggested that dividing the subgroups by quantiles, such as deciles, may be more accurate.
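As a rough illustration of the quantile-based division described above, the following sketch splits simulated expected values at their deciles using base R only. The simulated data and all variable names are assumptions made for illustration; this is not the code used by hoslem.test() or performance_hosmer().

```r
# Simulated fitted probabilities (illustrative values only).
set.seed(1)
expected <- runif(200)

# Break points at the deciles of the expected values (g = 10 subgroups).
g <- 10
breaks <- quantile(expected, probs = seq(0, 1, length.out = g + 1))

# Assign every expected value to its quantile interval.
group <- cut(expected, breaks = breaks, include.lowest = TRUE)

# Number of values that fall into each subgroup.
table(group)
```

When the expected values are heavily tied, some quantile break points can coincide, and cut() then stops with an error because the breaks are not unique.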
However, there are several ways to divide the subgroups. This function orders the expected values from smallest to largest and divides them so that the subgroups contain as nearly the same number of samples as possible.
If simple is TRUE, the expected values, sorted in decreasing order, are simply split into the specified number of subgroups so that the subgroups are as evenly sized as possible.
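A minimal sketch of such an even split, assuming the subgroups are assigned purely by sorted position. The data, the variable names, and the sort direction used here are illustrative assumptions (the direction does not change the subgroup sizes), and this is not the function's actual implementation.

```r
# Illustrative expected values; 0.35 occurs three times.
expected <- c(0.10, 0.80, 0.35, 0.35, 0.60, 0.20, 0.90, 0.35, 0.50, 0.70)
g <- 3
n <- length(expected)

# Sort positions, then give the i-th sorted value the label ceiling(i * g / n).
ord <- order(expected)
group <- integer(n)
group[ord] <- ceiling(seq_len(n) * g / n)

# Subgroup sizes are as even as possible (3, 3 and 4 here).
sizes <- table(group)
sizes

# Identical expected values can land in different subgroups.
split_ties <- unique(group[expected == 0.35])
split_ties
```

The last line shows the drawback of a position-based split: the three tied values of 0.35 end up in two different subgroups.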
If simple is FALSE, identical expected values are always placed in the same subgroup, and the subgroups are adjusted so that the minimum number of values in a subgroup is maximized and the variance of the number of values per subgroup is minimized. In other words, the function tries to keep the subgroup sizes as equal as possible while ensuring that identical expected values stay together.

The algorithm starts from the initial disjoint state, in which each distinct expected value forms its own subgroup. The subgroup with the fewest values is tentatively merged with each of its neighboring subgroups (the one with smaller and the one with larger expected values), and the merge that gives the smaller variance is adopted to create a new subgroup. This step is then repeated, each time starting from the subgroup that currently has the fewest values.

This procedure yields the requested number of subgroups, with homogeneous sizes, when the counts of the distinct expected values are relatively disparate. It does not produce the requested number of subgroups when those counts are nearly homogeneous (e.g., only 1 or 2 of each).
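The merging procedure can be sketched roughly as follows. This is an illustrative reading of the description, not the package's code: the function name merge_tied_groups() is hypothetical, and the stopping rule (merge until at most g subgroups remain) and the tie-breaking between equal variances are assumptions.

```r
# Hypothetical helper: greedy merging of neighboring subgroups.
merge_tied_groups <- function(expected, g) {
  stopifnot(g >= 2)
  values <- sort(unique(expected))
  # Number of observations for each distinct expected value.
  counts <- tabulate(match(expected, values), nbins = length(values))
  # Initial state: each distinct value is its own subgroup.
  grp <- seq_along(values)

  # Merge subgroup j with subgroup j + 1 and renumber the rest.
  merge_pair <- function(grp, j) {
    grp[grp > j] <- grp[grp > j] - 1L
    grp
  }
  group_sizes <- function(grp) as.vector(tapply(counts, grp, sum))

  while (max(grp) > g) {
    sizes <- group_sizes(grp)
    i <- which.min(sizes)  # smallest subgroup
    candidates <- list()
    if (i > 1) candidates[["lower"]] <- merge_pair(grp, i - 1L)
    if (i < length(sizes)) candidates[["upper"]] <- merge_pair(grp, i)
    # Adopt the merge that gives the smaller variance of subgroup sizes.
    variances <- vapply(candidates,
                        function(cand) var(group_sizes(cand)),
                        numeric(1))
    grp <- candidates[[which.min(variances)]]
  }
  # Map the subgroup of each distinct value back to the observations.
  grp[match(expected, values)]
}

# Five distinct values with counts 5, 1, 4, 2 and 6.
expected <- rep(c(0.1, 0.2, 0.3, 0.5, 0.9), times = c(5, 1, 4, 2, 6))
group <- merge_tied_groups(expected, g = 3)
tabulate(group)  # subgroup sizes: 5 7 6
```

In this example the first merge joins the single 0.2 with the 0.3 values and the second joins the 0.5 values to that subgroup, so identical expected values never end up in different subgroups.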
However, this algorithm may not minimize the variance.
For this reason, force can be set to TRUE to use the grouping calculated by brute force. This requires a large amount of computation, however, and may consume a large amount of memory and take a long time to return a result.
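How the brute-force search works internally is not described here. The sketch below assumes it means trying every way of cutting the ordered distinct expected values into g contiguous subgroups and keeping the division with the smallest variance of subgroup sizes. The function name brute_force_sizes() and this criterion are assumptions for illustration only.

```r
# Hypothetical helper: exhaustive search over contiguous divisions.
# counts: number of observations per distinct expected value, in ascending order.
brute_force_sizes <- function(counts, g) {
  k <- length(counts)
  stopifnot(g >= 2, g <= k)
  cum <- cumsum(counts)
  # Each column holds one choice of g - 1 cut positions among k - 1 boundaries.
  cuts <- combn(k - 1, g - 1)
  sizes_for <- function(cut) diff(c(0, cum[cut], cum[k]))
  variances <- apply(cuts, 2, function(cut) var(sizes_for(cut)))
  sizes_for(cuts[, which.min(variances)])
}

counts <- c(5, 1, 4, 2, 6)
best <- brute_force_sizes(counts, g = 3)
best  # 6 6 6

# The number of candidate divisions grows combinatorially,
# e.g. for 50 distinct values and 10 subgroups:
choose(49, 9)
```

For these counts the exhaustive search finds subgroups of sizes 6, 6 and 6 (variance 0), whereas greedily merging the smallest subgroup with its better neighbor ends at sizes 5, 7 and 6 (variance 1). The price is that choose(k - 1, g - 1) candidates have to be evaluated.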
A list with class "htest" containing the following components:
statistic |
the value of the chi-squared test statistic. |
parameter |
the degrees of freedom of the approximate chi-squared distribution of the test statistic. |
p.value |
the p-value for the test. |
method |
a character string indicating the test performed. |
data.name |
the expression (object) for which the logistic regression analysis was performed. |
observed |
the observed frequencies in a |
expected |
the expected frequencies in a |
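To show how the statistic, parameter and p.value components relate to the observed and expected frequencies, here is a sketch of the standard Hosmer-Lemeshow computation on simulated data. It assumes the usual chi-squared form with g - 2 degrees of freedom and a simple position-based grouping; the simulated data and variable names are illustrative, and hosmer_test() may group the data differently.

```r
# Simulated data and a fitted logistic regression (illustrative only).
set.seed(42)
n <- 500
x <- rnorm(n)
y <- rbinom(n, 1, plogis(-0.5 + x))
fit <- glm(y ~ x, family = binomial())
p_hat <- fitted(fit)

# Ten equally sized subgroups by rank of the fitted probability.
g <- 10
group <- ceiling(rank(p_hat, ties.method = "first") * g / n)

# Observed and expected counts of events (1) and non-events (0) per subgroup.
n_g  <- tapply(y, group, length)
obs1 <- tapply(y, group, sum)
exp1 <- tapply(p_hat, group, sum)
obs0 <- n_g - obs1
exp0 <- n_g - exp1

# Chi-squared statistic, degrees of freedom and p-value.
statistic <- sum((obs1 - exp1)^2 / exp1 + (obs0 - exp0)^2 / exp0)
df <- g - 2
p_value <- pchisq(statistic, df = df, lower.tail = FALSE)

c(statistic = statistic, df = df, p.value = p_value)
```

A small p-value indicates that the observed frequencies deviate from the expected ones by more than chance would explain, that is, a poor fit.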
Hosmer, D.W. and Lemeshow, S. (1980). Goodness of fit tests for the multiple logistic regression model. Communications in Statistics - Theory and Methods, 9(10), 1043-1069. doi:10.1080/03610928008827941
Hosmer, D.W., Hosmer, T., Le Cessie, S. and Lemeshow, S. (1997). A comparison of goodness-of-fit tests for the logistic regression model. Statistics in Medicine, 16(9), 965-980. doi:10.1002/(SICI)1097-0258(19970515)16:9<965::AID-SIM509>3.0.CO;2-O
# Expand the aggregated Titanic table to one row per passenger
data("Titanic")
df <- data.frame(Titanic)
df <- data.frame(Class = rep(df$Class, df$Freq),
                 Sex = rep(df$Sex, df$Freq),
                 Age = rep(df$Age, df$Freq),
                 Survived = rep(df$Survived, df$Freq))
# Fit a logistic regression model and test its goodness of fit
model <- glm(Survived ~ ., data = df, family = binomial())
hosmer_test(model)