logbin: Log-Binomial Regression
In logbin: Relative Risk Regression Using the Log-Binomial Model

logbin

R Documentation

Log-Binomial Regression

Description

logbin fits relative risk (log-link) binomial regression models.

Usage

logbin(formula, mono = NULL, data, subset, na.action, start = NULL,
       offset, control = list(...), model = TRUE, 
       method = c("cem", "em", "glm", "glm2", "ab"),
       accelerate = c("em", "squarem", "pem", "qn"), 
       control.method = list(), warn = TRUE, ...)

Arguments

`formula`	an object of class `"formula"` (or one that can be coerced into that class): a symbolic description of the model to be fitted. The details of model specification are given under "Details". Note that the model must contain an intercept, and 2nd-order terms (such as interactions) or above are currently not supported by the `"cem"` and `"em"` methods — see "Note".
`mono`	a vector indicating which terms in `formula` should be restricted to have a monotonically non-decreasing relationship with the outcome. May be specified as names or indices of the terms. `method = "glm"` and `"glm2"` cannot impose monotonicity constraints, and they are not currently supported for `method = "ab"`.
`data`	an optional data frame, list or environment (or object coercible by `as.data.frame` to a data frame) containing the variables in the model. If not found in `data`, the variables are taken from `environment(formula)`, typically the environment from which `logbin` is called.
`subset`	an optional vector specifying a subset of observations to be used in the fitting process.
`na.action`	a function which indicates what should happen when the data contain `NA`s. The default is set be the `na.action` setting of `options`, and is `na.fail` if that is unset. The ‘factory-fresh’ default is `na.omit`. Another possible value is `NULL`, no action. Value `na.exclude` can be useful.
`start`	starting values for the parameters in the linear predictor.
`offset`	this can be used to specify an a priori known component to be included in the linear predictor during fitting. This should be `NULL` or a non-positive numeric vector of length equal to the number of cases. One or more `offset` terms can be included in the formula instead or as well, and if more than one is specified their sum is used. See `model.offset`.
`control`	a list of parameters for controlling the fitting process, passed to `logbin.control`. With `method = "cem"`, `epsilon` should be smaller than `bound.tol`.
`model`	a logical value indicating whether the model frame should be included as a component of the returned value.
`method`	a character string that determines which algorithm to use to find the MLE. The main purpose of `logbin` is the implementation of stable EM-type algorithms: `"cem"` for the combinatorial EM algorithm, which cycles through a sequence of constrained parameter spaces, or `"em"` for a single EM algorithm based on an overparameterised model. `"ab"` implements an adaptive barrier method, using the `constrOptim` function. `"glm"` or `"glm2"` may be used to compare the results from the usual IWLS algorithms on the same model.
`accelerate`	for the `"cem"` and `"em"` methods, a character string that determines the acceleration algorithm to be used, (partially) matching one of `"em"` (no acceleration — the default), `"squarem"`, `"pem"` or `"qn"`. See `turboem` for further details. Note that `"decme"` is not permitted.
`control.method`	a list of control parameters for the fitting algorithm. This is passed to the `control.method` argument of `turboem` if `method = "cem"` or `"em"`. If `method = "ab"`, this is passed to the `control` argument of `constrOptim` (and hence to `optim` — see this documentation for full details). Note that the `trace` and `maxit` elements are ignored and the equivalent items from the supplied `logbin.control` argument are used instead. May also contain element `method` (default `"BFGS"`), which is passed to the `method` argument of `constrOptim`. If any items are not specified, the defaults are used.
`warn`	a logical indicating whether or not warnings should be provided for non-convergence or boundary values.
`...`	arguments to be used to form the default `control` argument if it is not supplied directly.

Details

logbin fits a generalised linear model (GLM) with a binomial error distribution and log link function. Predictors are assumed to be continuous, unless they are of class factor, or are character or logical (in which case they are converted to factors). Specifying a predictor as monotonic using the mono argument means that for continuous terms, the associated coefficient will be restricted to be non-negative, and for categorical terms, the coefficients will be non-decreasing in the order of the factor levels. This allows semi-parametric monotonic regression functions, in the form of unsmoothed step-functions. For smooth regression functions see logbin.smooth.

As well as allowing monotonicity constraints, the function is useful when a standard GLM routine, such as glm, fails to converge with a log-link binomial model. For convenience in comparing convergence on the same model, logbin can be used as a wrapper function to glm and glm2 through the method argument.

If glm does achieve successful convergence, and logbin converges to an interior point, then the two results will be identical. However, as illustrated in one of the examples below, glm may still experience convergence problems even when logbin converges to an interior point. Note that if logbin converges to a boundary point, then it may differ slightly from glm even if glm successfully converges, because of differences in the definition of the parameter space. logbin produces valid fitted values for covariate values within the Cartesian product of the observed range of covariate values, whereas glm produces valid fitted values just for the observed covariate combinations (assuming it successfully converges). This issue is only relevant when logbin converges to a boundary point. The adaptive barrier approach defines the parameter space in the same way as glm, so the same comments apply when comparing its results to those from method = "cem" or "em".

The main computational method is an EM-type algorithm which accommodates the parameter contraints in the model and is more stable than iteratively reweighted least squares. This is done in one of two ways, depending on the choice of the method argument.

method = "cem" implements a CEM algorithm (Marschner, 2014), in which a collection of restricted parameter spaces is defined that covers the full parameter space, and an EM algorithm is applied within each restricted parameter space in order to find a collection of restricted maxima of the log-likelihood function, from which can be obtained the global maximum over the full parameter space. See Marschner and Gillett (2012) for further details.

method = "em" implements a single EM algorithm on an overparameterised model, and the MLE of this model is transformed back to the original parameter space.

Acceleration of the EM algorithm in either case can be achieved through the methods of the turboem package, specified through the accelerate argument. However, note that these methods do not have the guaranteed convergence of the standard EM algorithm, particularly when the MLE is on the boundary of its (possibly constrained) parameter space.

Alternatively, an adaptive barrier method can be used by specifying method = "ab", which maximises the likelihood subject to constraints on the fitted values.

Value

logbin returns an object of class "logbin", which inherits from classes "glm" and "lm". The function summary.logbin can be used to obtain or print a summary of the results.

The generic accessor functions coefficients, fitted.values and residuals can be used to extract various useful features of the value returned by logbin. Note that effects will not work.

An object of class "logbin" is a list containing the same components as an object of class "glm" (see the "Value" section of glm). It also includes:

`loglik`	the maximised log-likelihood.
`aic.c`	a small-sample corrected version of Akaike's An Information Criterion (Hurvich, Simonoff and Tsai, 1998). This is used by `logbin.smooth` to choose the optimal number of knots for smooth terms.
`xminmax`	the minimum and maximum observed values for each of the continuous covariates, to help define the covariate space of the model.

As well as:

`np.coefficients`	estimated coefficients associated with the non-positive parameterisation corresponding to the MLE.
`nn.x`	non-negative model matrix associated with `np.coefficients`.
`coefhist`	(if `control$coeftrace = TRUE`), a matrix or list of matrices containing the coefficient estimates after each EM iteration.

Note

Due to the way in which the covariate space is defined in the CEM algorithm, models that include terms that are functionally dependent on one another — such as interactions and polynomials — may give unexpected results. Categorical covariates should always be entered directly as factors rather than dummy variables. 2-way interactions between factors can be included by calculating a new factor term that has levels corresponding to all possible combinations of the factor levels (see the Example). Non-linear relationships can be included by using logbin.smooth.

Author(s)

Mark W. Donoghoe markdonoghoe@gmail.com

References

Hurvich, C. M., J. S. Simonoff and C.-L. Tsai (1998). Smoothing parameter selection in non-parametric regression using an improved Akaike information criterion. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 60(2): 271–293.

Donoghoe, M. W. and I. C. Marschner (2018). logbin: An R package for relative risk regression using the log-binomial model. Journal of Statistical Software 86(9): 1–22.

Marschner, I. C. (2014). Combinatorial EM algorithms. Statistics and Computing 24(6): 921–940.

Marschner, I. C. and A. C. Gillett (2012). Relative risk regression: reliable and flexible methods for log-binomial models. Biostatistics 13(1): 179–192.

Examples

require(glm2)
data(heart)

#======================================================
#  Model with periodic non-convergence when glm is used
#======================================================

start.p <- sum(heart$Deaths) / sum(heart$Patients)

fit.glm <- glm(cbind(Deaths, Patients-Deaths) ~ factor(AgeGroup) + factor(Severity) +
  factor(Delay) + factor(Region), family = binomial(log), 
  start = c(log(start.p), rep(c(0.2, 0.4), 4)), data = heart,
  trace = TRUE, maxit = 100)

fit.logbin <- logbin(formula(fit.glm), data = heart, 
                     start = c(log(start.p), rep(c(0.2, 0.4), 4)),
                     trace = 1)
summary(fit.logbin)

# Speed up convergence by using single EM algorithm
fit.logbin.em <- update(fit.logbin, method = "em")

# Speed up convergence by using acceleration methods
fit.logbin.acc <- update(fit.logbin, accelerate = "squarem")
fit.logbin.em.acc <- update(fit.logbin.em, accelerate = "squarem")


#=============================
#  Model with interaction term
#=============================

heart$AgeSev <- 10 * heart$AgeGroup + heart$Severity

fit.logbin.int <- logbin(cbind(Deaths, Patients-Deaths) ~ factor(AgeSev) +
  factor(Delay) + factor(Region), data = heart, trace = 1, maxit = 100000)
  
summary(fit.logbin.int)
vcov(fit.logbin.int)
confint(fit.logbin.int)
summary(predict(fit.logbin.int, type = "response"))
anova(fit.logbin, fit.logbin.int, test = "Chisq")

logbin documentation built on April 12, 2025, 1:12 a.m.