diagsep: Detailed separation diagnostic for all categorical outcomes.

View source: R/diagsep.R

diagsep R Documentation

Description

This function checks whether there is (quasi-)complete separation and, if so, of which type. It reports the dimension of the recession cone, the number and names of the columns of the design matrix that give rise to the separation, and the rows in X/S for which separation occurs.
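To make the two types of separation concrete, here is a minimal base-R sketch for a binary outcome and a single covariate. The toy data and variable names are made up for illustration, and the threshold check is only a one-dimensional special case; it is not how diagsep itself works.

```r
# Toy binary data (hypothetical): every y = 1 has a larger x than every y = 0.
x <- c(-2.0, -1.5, -0.5, 0.5, 1.0, 2.5)
y <- c(0, 0, 0, 1, 1, 1)

# Complete separation along x: the two groups do not overlap at all.
complete <- max(x[y == 0]) < min(x[y == 1])

# Moving one observation so that the groups share a boundary value gives
# quasi-complete separation: separable only with ties on the boundary.
x_q <- c(-2.0, -1.5, 0.5, 0.5, 1.0, 2.5)
quasi <- max(x_q[y == 0]) == min(x_q[y == 1])

complete  # TRUE
quasi     # TRUE
```

With several covariates or more than two outcome categories such a simple threshold check no longer suffices, which is why a linear-programming diagnostic is used.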

Usage

diagsep(
  y,
  X,
  S,
  rational = FALSE,
  model = c("bcl", "b", "cl", "acl", "sl", "osm"),
  backend = c("rcdd", "ROI"),
  solver = NULL
)

diagnose_separation(..., rational, backend, solver)

## Default S3 method:
diagnose_separation(
  y,
  X,
  rational = FALSE,
  backend = c("rcdd", "ROI"),
  solver = NULL,
  ...
)

## S3 method for class 'factor'
diagnose_separation(
  y,
  X,
  rational = FALSE,
  backend = c("rcdd", "ROI"),
  solver = NULL,
  ...
)

## S3 method for class 'matrix'
diagnose_separation(
  S,
  rational = FALSE,
  backend = c("rcdd", "ROI"),
  solver = NULL,
  ...
)

## S3 method for class 'formula'
diagnose_separation(
  formula,
  data,
  model = c("bcl", "b", "cl", "acl", "osm", "sl"),
  rational = FALSE,
  contrasts = NULL,
  backend = c("rcdd", "ROI"),
  solver = NULL,
  ...
)

## S3 method for class 'osm'
diagnose_separation(
  object,
  rational = FALSE,
  backend = c("rcdd", "ROI"),
  solver = NULL,
  ...
)

## S3 method for class 'clm'
diagnose_separation(
  object,
  rational = FALSE,
  backend = c("rcdd", "ROI"),
  solver = NULL,
  ...
)

## S3 method for class 'polr'
diagnose_separation(
  object,
  rational = FALSE,
  backend = c("rcdd", "ROI"),
  solver = NULL,
  ...
)

## S3 method for class 'multinom'
diagnose_separation(
  object,
  rational = FALSE,
  backend = c("rcdd", "ROI"),
  solver = NULL,
  ...
)

## S3 method for class 'glm'
diagnose_separation(
  object,
  rational = FALSE,
  backend = c("rcdd", "ROI"),
  solver = NULL,
  ...
)

## S3 method for class 'bracl'
diagnose_separation(
  object,
  rational = FALSE,
  backend = c("rcdd", "ROI"),
  solver = NULL,
  ...
)

## S3 method for class 'brmultinom'
diagnose_separation(
  object,
  rational = FALSE,
  backend = c("rcdd", "ROI"),
  solver = NULL,
  ...
)

Arguments

y

the outcome variable. Can be binary, categorical or ordinal. Works best if it is an ordered or unordered factor, but it can also be numeric, logical or character. If y is not a factor, it is treated as a nominal (categorical) outcome.

X

a design matrix, e.g. generated via a call to 'model.matrix'. This means we expect that X already contains the desired contrasts for factors (e.g., dummies) and any other expanded columns (e.g., for polynomials).
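As a small illustration of what such a design matrix looks like, the following base-R sketch builds X with model.matrix. The data frame dat and its columns are hypothetical.

```r
# Hypothetical toy data: one numeric covariate and one three-level factor.
dat <- data.frame(
  y  = factor(c("a", "b", "c", "a", "b", "c")),
  x1 = c(0.1, 1.2, 2.3, 0.4, 1.5, 2.6),
  g  = factor(c("u", "v", "w", "u", "v", "w"))
)

# model.matrix() adds the intercept, expands the factor g into treatment
# dummies, and expands poly() into one column per polynomial degree.
X <- model.matrix(~ poly(x1, 2) + g, data = dat)

colnames(X)
dim(X)  # 6 rows, 5 columns
```

The resulting X already contains the dummy and polynomial columns, so it can be passed on as is.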

S

a matrix of structure vectors

rational

logical; should rational arithmetic be used? Defaults to FALSE.

model

model string. One of "bcl", "b", "cl", "acl", "osm", "sl".

backend

which backend to use for the linear program. Can be "rcdd" (default and only option for rational=TRUE) or "ROI".

solver

the solver to be used in the backend. Defaults to "DualSimplex" for "rcdd" and to the first LP solver returned by 'ROI_applicable_solvers()' for "ROI".

...

arguments passed to the generic. For pre-fit diagnostics, supply y, a vector of type factor, character, logical, numeric or integer; this is the y argument of diagsep. In this case one also needs to supply the argument X and, optionally but recommended, a model. Alternatively, one can supply a matrix S, which is treated as the S argument of diagsep. For post-fit diagnostics, this can currently be an object of class glm, polr, clm, osm or nnet.

formula

An object of class ‘"formula"’ (or one that can be coerced to that class): a symbolic description of the model to be fitted. The details of model specification are given under ‘Details’ in glm.

data

Either a standard data frame, list or environment (or object coercible by as.data.frame to a data frame) containing the variables in the model. If not found in data, the variables are taken from environment(formula), typically the environment from which the function is called. Alternatively, data can be a data frame or matrix containing rational numbers as per the definition in rcdd (i.e. columns are characters and the entries are either integer numbers or ratios of integer numbers, e.g. "1" or "-234/19008"). This is checked internally; see the Details for what happens when this structure is discovered.
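The rational format can be sketched as follows. The data frame dat_q and the helper is_rational are hypothetical and only illustrate the character representation; the helper is not the internal check used by the package.

```r
# Hypothetical data stored as rational numbers: every column is character,
# and each entry is an integer or a ratio of integers.
dat_q <- data.frame(
  y  = c("0", "0", "1", "1"),
  x1 = c("1/2", "-3/4", "5", "-234/19008"),
  stringsAsFactors = FALSE
)

# Illustrative base-R check that all entries match the patterns
# "integer" or "integer/integer" (not the package's own check).
is_rational <- function(v) all(grepl("^-?[0-9]+(/[0-9]+)?$", v))
all(vapply(dat_q, is_rational, logical(1)))

# If rcdd is installed, d2q() converts doubles to this representation
# and q2d() converts back.
if (requireNamespace("rcdd", quietly = TRUE)) {
  q <- rcdd::d2q(c(0.5, -0.75))
  print(q)
  print(rcdd::q2d(q))
}
```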

contrasts

an optional list. See the contrasts.arg argument of model.matrix.default. Only effective for standard data frames.

object

a fitted model object of one of the supported classes (see Usage).

Details

The function uses either a response vector y and a design matrix X, or a structure vector matrix S. If S is given, y, X and model are ignored.

diagnose_separation is an S3 generic. For developers: if a method is to be provided for the generic, it is best to have that method create a matrix of structure vectors S and call the low-level function diagsep with it.

The 'formula' method is for standard data frames and formulas that work the same way as with glm. It does not support extended formulas and may not work with functions that process formulas differently. A data frame or matrix given as rational numbers in the rcdd format is recognized, but in that case the formula is not expanded and is taken literally: variables in formula must match the column names in data exactly, and factors must be converted to dummies beforehand (the rational format allows no other way anyway).

Value

An object of class 'sepmod', which is a list with the components:

  • separation logical; whether there is separation ('TRUE' means separation)

  • septype the type of separation, if any. One of the strings "Overlap", "Quasi-Complete Separation" or "Complete Separation".

  • reccdim dimension of recession cone

  • offrows offending rows in X

  • nr.offcols number of columns of the design matrix that have separation

  • offcols columns of the design matrix that have separation. It is given as category::effect.
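The components listed above can be inspected as in the following sketch. It assumes that divoRce is installed, that diagsep and the qcsepdatm data set are available from it as in the Examples, and that the result is the list described above; no output is shown because it depends on the data.

```r
# Guarded so the sketch does nothing when divoRce is not installed.
if (requireNamespace("divoRce", quietly = TRUE)) {
  data("qcsepdatm", package = "divoRce")
  y <- as.factor(qcsepdatm$y)
  X <- cbind("(Intercept)" = 1, qcsepdatm[, 2:ncol(qcsepdatm)])

  res <- divoRce::diagsep(y, X, model = "bcl")

  # Component names as documented in the Value section.
  if (res$separation) {
    cat("Type:", res$septype, "\n")
    cat("Dimension of recession cone:", res$reccdim, "\n")
    cat("Number of offending columns:", res$nr.offcols, "\n")
    print(res$offcols)  # given as category::effect
    print(res$offrows)  # offending rows
  } else {
    cat("No separation detected:", res$septype, "\n")
  }
}
```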

Examples

data(qcsepdatm)

# low-level interface: outcome vector and design matrix with intercept column
y <- qcsepdatm$y
X <- cbind(1, qcsepdatm[, 2:ncol(qcsepdatm)])
diagsep(y, X, model = "bcl")

# pre fit
y <- as.factor(qcsepdatm$y)
X <- cbind("(Intercept)" = 1, qcsepdatm[, 2:ncol(qcsepdatm)])
diagnose_separation(y, X = X, model = "bcl")

# post fit
if (require('nnet')) {
  m1 <- nnet::multinom(y ~ x1 + x2, data = qcsepdatm)
  diagnose_separation(m1)
}

divoRce documentation built on April 28, 2026, 3:01 a.m.
