assert_no_separation: Assert no (quasi) separation for a binary classification model


View source: R/assert.R

Description

The function throws an error if separation is detected in a binary classification problem, which implies that the maximum likelihood estimate does not exist. Otherwise it returns TRUE invisibly.

Usage

assert_no_separation(model, solver, ...)

Arguments

model

the classification model

solver

a length 1 character vector that controls which solver is used to solve the underlying linear program. Please see the ROI package for a list of available solvers.

...

further arguments passed as parameters to ROI_solve.

Details

The function formulates a linear programming (LP) model and solves it using the ROI package. ROI offers a unified interface to a range of LP solvers (i.e. specialized packages for solving LPs).
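As an illustration of the idea, the following is a minimal sketch of one LP-based separation check written directly against ROI. It is not taken from the package source, so the exact formulation used by assert_no_separation may differ. The sketch assumes that ROI and ROI.plugin.glpk are installed; the bounds of -1 and 1 on the coefficients and the tolerance of 1e-8 are arbitrary choices made for this example. Rows of the design matrix with y = 0 have their sign flipped, so a separating direction is any beta for which all entries of Xs %*% beta are non-negative and at least one is positive; the LP maximizes their sum.

```r
library(ROI)
library(ROI.plugin.glpk)

# Same toy data as in the Examples section: level x = 3 only has y = 1.
data <- data.frame(
  x = factor(c(1, 1, 1, 2, 2, 2, 3, 3)),
  y = c(1, 1, 0, 1, 1, 0, 1, 1)
)
X <- model.matrix(~ -1 + x, data = data)
n <- nrow(X)
p <- ncol(X)

# Flip the sign of every row with y = 0 (row-wise scaling by +1 / -1).
s <- ifelse(data$y == 1, 1, -1)
Xs <- unname(diag(s) %*% X)

# maximize sum(Xs %*% beta)  subject to  Xs %*% beta >= 0,  -1 <= beta <= 1
lp <- OP(
  objective = L_objective(colSums(Xs)),
  constraints = L_constraint(L = Xs, dir = rep(">=", n), rhs = rep(0, n)),
  bounds = V_bound(li = seq_len(p), ui = seq_len(p),
                   lb = rep(-1, p), ub = rep(1, p), nobj = p),
  maximum = TRUE
)
res <- ROI_solve(lp, solver = "glpk")

# A strictly positive optimum means a separating direction exists.
separated <- res$objval > 1e-8
print(separated)
```

Bounding the coefficients keeps the LP bounded, so the check only has to compare the optimal objective value with zero instead of interpreting a solver-specific "unbounded" status.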

The rationale is best described by quoting Kjell Konis (2007) directly:

> The parameter estimates of a binary logistic regression model fit using the method of maximum likelihood sometimes do not converge to finite values.

> This phenomenon (also known as monotone likelihood or infinite parameters) occurs because of a condition among the sample points known as separation. There are two classes of separation.

> When complete separation is present among the sample points, iterative procedures for maximizing the likelihood tend to break down, when it would be clear that there is a problem with the model.

> However, when quasicomplete separation is present among the sample points, the iterative procedures for maximizing the likelihood tend to satisfy their convergence criterion before revealing any indication of separation.
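The behaviour described in the last quote can be seen with base R alone. The sketch below fits the toy data from the Examples section, in which level x = 3 only has y = 1, so the data are quasi-completely separated. No output is shown because the exact numbers depend on the glm convergence settings. What one would expect is a very large estimate for x3 with an even larger standard error, while the estimates for x1 and x2 look unremarkable.

```r
# Toy data with quasi-complete separation: level x = 3 only has y = 1.
data <- data.frame(
  x = factor(c(1, 1, 1, 2, 2, 2, 3, 3)),
  y = c(1, 1, 0, 1, 1, 0, 1, 1)
)
fit <- glm(y ~ -1 + x, data = data, family = "binomial")

# Did the iterative fit report convergence?
print(fit$converged)

# The estimate for x3 drifts towards infinity; x1 and x2 stay finite.
print(coef(fit))

# Standard errors: a huge value for x3 is a typical symptom of separation.
print(sqrt(diag(vcov(fit))))
```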

References

Kjell Konis (2007). Linear programming algorithms for detecting separated data in binary logistic regression models. Ph. D. thesis, University of Oxford.

Kjell Konis (2013). safeBinaryRegression: Safe Binary Regression. R package version 0.1-3. https://CRAN.R-project.org/package=safeBinaryRegression

Kurt Hornik, David Meyer, Florian Schwendinger and Stefan Theussl (2019). ROI: R Optimization Infrastructure. R package version 0.3-2. https://CRAN.R-project.org/package=ROI

Examples

library(losep) # package name assumed from the repository dirkschumacher/losep
library(ROI.plugin.glpk)
data <- data.frame(
  x = factor(c(1, 1, 1, 2, 2, 2, 3, 3)),
  y = c(1, 1, 0, 1, 1, 0, 1, 1)
)

model <- glm(y ~ -1 + x, data = data, family = "binomial")

# throws an error if the data are separated (here, level x = 3 only has y = 1)
try(assert_no_separation(model)) # uses any compatible loaded solver

# or solve it using GLPK with the presolve option enabled
try(assert_no_separation(model, solver = "glpk", presolve = TRUE))

dirkschumacher/losep documentation built on Nov. 10, 2019, 7:03 a.m.