detect_separation: Method for 'glm' that tests for data separation and finds...


View source: R/detect_separation.R

Description

detect_separation is a method for glm that tests for complete or quasi-complete separation in datasets for binomial-response generalized linear models, and finds which of the parameters will have infinite maximum likelihood estimates. detect_separation relies on the linear programming methods developed in Konis (2007).

Usage

detect_separation(
  x,
  y,
  weights = rep(1, nobs),
  start = NULL,
  etastart = NULL,
  mustart = NULL,
  offset = rep(0, nobs),
  family = gaussian(),
  control = list(),
  intercept = TRUE,
  singular.ok = TRUE
)

detectSeparation(
  x,
  y,
  weights = rep(1, nobs),
  start = NULL,
  etastart = NULL,
  mustart = NULL,
  offset = rep(0, nobs),
  family = gaussian(),
  control = list(),
  intercept = TRUE,
  singular.ok = TRUE
)

Arguments

x

x is a design matrix of dimension n * p.

y

y is a vector of observations of length n.

weights

an optional vector of ‘prior weights’ to be used in the fitting process. Should be NULL or a numeric vector.

start

currently not used.

etastart

currently not used.

mustart

currently not used.

offset

this can be used to specify an a priori known component to be included in the linear predictor during fitting. This should be NULL or a numeric vector of length equal to the number of cases. One or more offset terms can be included in the formula instead or as well, and if more than one is specified their sum is used. See model.offset.

family

a description of the error distribution and link function to be used in the model. For glm this can be a character string naming a family function, a family function or the result of a call to a family function. For glm.fit only the third option is supported. (See family for details of family functions.)

control

a list of parameters controlling separation detection. See detect_separation_control for details.

intercept

logical. Should an intercept be included in the null model?

singular.ok

logical. If FALSE, a singular model is an error.
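
When detect_separation is supplied as the method of glm, glm builds x and y from the formula and data and passes them on. The following base R sketch shows how such a design matrix and response can be constructed by hand; the toy data frame and its column names are hypothetical and serve only as an illustration.

```r
# Hypothetical toy data: a binary response and two covariates.
toy <- data.frame(
  y  = c(0, 0, 0, 1, 1, 1),
  x1 = c(-3, -2, -1, 1, 2, 3),
  x2 = c(1, 0, 1, 0, 1, 0)
)

# Build the model frame, then extract the design matrix and the response.
mf <- model.frame(y ~ x1 + x2, data = toy)
x <- model.matrix(attr(mf, "terms"), mf)  # n * p matrix, including "(Intercept)"
y <- model.response(mf)                   # vector of length n

dim(x)
length(y)
```

Here n is the number of rows of the data frame and p counts the intercept plus the two covariates.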

Details

For the definition of complete and quasi-complete separation, see Albert and Anderson (1984).
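
As a rough illustration of the two cases, the base R sketch below builds one completely separated and one quasi-completely separated toy dataset and fits them with the default glm fitter. The data are hypothetical; the point is only that the slope estimate drifts to a large value instead of settling, which is the numerical symptom of an infinite maximum likelihood estimate.

```r
# Hypothetical toy covariate.
x <- c(-3, -2, -1, 1, 2, 3)

# Complete separation: y is 0 whenever x < 0 and 1 whenever x > 0.
y_complete <- c(0, 0, 0, 1, 1, 1)

# Quasi-complete separation: the same pattern, plus two observations
# on the boundary x = 0 with different responses.
x_quasi <- c(x, 0, 0)
y_quasi <- c(y_complete, 0, 1)

# glm() warns about fitted probabilities numerically 0 or 1;
# the warnings are suppressed here so the fits can be inspected.
fit_complete <- suppressWarnings(glm(y_complete ~ x, family = binomial()))
fit_quasi <- suppressWarnings(glm(y_quasi ~ x_quasi, family = binomial()))

# The slope estimates and their standard errors are very large in both fits.
coef(fit_complete)
coef(fit_quasi)
summary(fit_complete)$coefficients[, "Std. Error"]
```

The reported values depend on the convergence settings of glm, so they should be read as a sign of divergence rather than as estimates.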

detect_separation is a wrapper around the separator function from the **safeBinaryRegression** R package that can be passed directly as a method to the glm function; see the Examples.

The interface to separator was designed by Ioannis Kosmidis after correspondence with Kjell Konis, and a port of separator has been included in **brglm2** with the permission of Kjell Konis.

detectSeparation is an alias for detect_separation.

Note

detect_separation will be removed from brglm2 at version 0.8. A new version of detect_separation is now maintained in the detectseparation R package at https://cran.r-project.org/package=detectseparation. To use the version in detectseparation, load brglm2 first and then detectseparation, i.e. library(brglm2); library(detectseparation).
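
As an alternative to relying on load order, the version to use can be named explicitly. The sketch below assumes that the detectseparation package is installed and that it exports detect_separation, as the note above indicates. The toy data frame is hypothetical, and passing the function itself (rather than its name) as the method argument of glm is a choice made here for illustration.

```r
# Hypothetical toy data with complete separation in x1.
toy <- data.frame(
  y  = c(0, 0, 0, 1, 1, 1),
  x1 = c(-3, -2, -1, 1, 2, 3)
)

if (requireNamespace("detectseparation", quietly = TRUE)) {
  # glm() accepts a fitting function as 'method'; naming the namespace
  # avoids depending on which package was attached last.
  toy_sep <- glm(y ~ x1, data = toy, family = binomial(),
                 method = detectseparation::detect_separation)
  print(toy_sep)
} else {
  message("Package 'detectseparation' is not installed; skipping.")
}
```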

Author(s)

Ioannis Kosmidis [aut, cre] ioannis.kosmidis@warwick.ac.uk, Kjell Konis [ctb] kjell.konis@me.com

References

Konis K (2007). *Linear Programming Algorithms for Detecting Separated Data in Binary Logistic Regression Models*. DPhil. University of Oxford. https://ora.ox.ac.uk/objects/uuid:8f9ee0d0-d78e-4101-9ab4-f9cbceed2a2a

Konis K (2013). safeBinaryRegression: Safe Binary Regression. R package version 0.1-3. https://CRAN.R-project.org/package=safeBinaryRegression

Kosmidis I, Firth D (2020). Jeffreys-prior penalty, finiteness and shrinkage in binomial-response generalized linear models. *Biometrika* doi: 10.1093/biomet/asaa052

See Also

brglm_fit, glm.fit and glm

Examples

## endometrial data from Heinze & Schemper (2002) (see ?endometrial)
data("endometrial", package = "brglm2")
endometrial_sep <- glm(HG ~ NV + PI + EH, data = endometrial,
                       family = binomial("logit"),
                       method = "detect_separation")
endometrial_sep
## The maximum likelihood estimate for NV is infinite
summary(update(endometrial_sep, method = "glm.fit"))

## Not run: 
## Example inspired by unpublished microeconometrics lecture notes by
## Achim Zeileis https://eeecon.uibk.ac.at/~zeileis/
## The maximum likelihood estimate of southernyes is infinite
data("MurderRates", package = "AER")
murder_sep <- glm(I(executions > 0) ~ time + income +
                  noncauc + lfp + southern, data = MurderRates,
                  family = binomial(), method = "detect_separation")
murder_sep
## which is also evident from the large estimated standard error for southernyes
murder_glm <- update(murder_sep, method = "glm.fit")
summary(murder_glm)
## and is also revealed by the divergence of the southernyes column of the
## result from the more computationally intensive check
check_infinite_estimates(murder_glm)
## Mean bias reduction via adjusted scores results in finite estimates
update(murder_glm, method = "brglm_fit")

## End(Not run)

ikosmidis/brglm2 documentation built on Feb. 10, 2021, 3:27 a.m.