pmlr: Penalized maximum likelihood estimation for multinomial logistic regression



Description

Extends the approach proposed by Firth (1993) for bias reduction of MLEs in exponential family models to the multinomial logistic regression model with general covariate types. Modification of the logistic regression score function to remove first-order bias is equivalent to penalizing the likelihood by the Jeffreys prior, and yields penalized maximum likelihood estimates (PLEs) that always exist. Hypothesis testing is conducted via likelihood ratio statistics. Profile confidence intervals (CI) are constructed for the PLEs.

Usage

pmlr(formula, data, weights = NULL, alpha = 0.05, penalized = TRUE, 
  method = c("likelihood", "wald")[1], joint = FALSE) 

Arguments

formula

an object of class formula (or one that can be coerced to that class): a symbolic description of the model to be fitted. Typically, formula has the form response ~ terms where response is either a factor with J+1 levels (the first is used as the baseline category) or a J-column indicator matrix, and terms is a series of terms which specifies a linear predictor for the response.

data

a data frame containing the variables in the model

weights

an optional vector of weights to be used in the fitting process. Should be a numeric vector of counts in the case of frequency data.

alpha

the significance level (default is α = 0.05).

penalized

a logical variable indicating whether penalized maximum likelihood estimation should be used; the default is TRUE

method

a character string specifying whether p-values and confidence intervals should be based on the profile likelihood or the Wald method. Must be one of “likelihood” (default) or “wald”.

joint

a logical variable indicating whether joint hypothesis tests should be performed in addition to individual parameter tests. If TRUE, H_0: β_{1i} = \cdots = β_{Ji} = 0 and H_0: β_{1i} = \cdots = β_{Ji} are also tested for covariate i, i = 1, …, P.
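
For instance, Wald-based inference and joint tests can be requested together; a minimal sketch using the hepatitis data documented under Examples:

# Wald-based p-values and CIs, plus joint tests for each covariate
data(hepatitis)
fit <- pmlr(cbind(HCV, nonABC) ~ group + time + group:time,
  data = hepatitis, weights = counts, method = "wald", joint = TRUE)
summary(fit)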

Details

Logistic regression is one of the most widely used regression models in practice, but alternatives to conventional maximum likelihood estimation methods may be more appropriate for small or sparse samples. It is well known that the usual maximum likelihood estimates (MLEs) of the log odds ratios are biased in finite samples, and there is a non-zero probability that an MLE is infinite (i.e., does not exist). This corresponds to the problem of separation (Lesaffre and Albert, 1989).

In this package, we extend the approach proposed by Firth (1993) for bias reduction of MLEs in exponential family models to the multinomial logistic regression model with general covariate types. Modification of the logistic regression score function to remove first order bias is equivalent to penalizing the likelihood by the Jeffreys prior, and yields penalized likelihood estimates (PLEs) that always exist, even in samples in which MLEs are infinite. PLEs are an attractive alternative in small to moderate-sized samples, and are preferred to exact conditional MLEs when there are continuous covariates.

We consider a multicategory outcome y that is a multinomial variable with J + 1 categories. For each category j (j = 1, …, J) there is a regression function in which the log odds of response in category j, relative to category 0, is a linear function of regression parameters and a vector \bold{x} of P covariates (including a constant): \log\{\mathrm{prob}(y = j| \bold{x})/\mathrm{prob}(y = 0 | \bold{x})\} = \bold{β}_{j}^T \bold{x}. Let \bold{y}_i be a J \times 1 vector of indicators for the observed response category for observation i, with the corresponding J \times 1 vector of probabilities \bold{Θ}_{i} = (Θ_{i1}, …, Θ_{iJ})^T. The vector of MLEs, \hat{B} = vec[(\bold{\hat{β}}_1, …, \bold{\hat{β}}_J)^T], is estimated from observations (\bold{y}_i,\bold{x}_i), i = 1, …, n, by solving the score equations of the log-likelihood l(B). We denote the score function by U(B) and the PJ \times PJ Fisher information matrix by A = A(B).
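
As a purely expository illustration of the inverse of this link (hypothetical values, not part of the package API), the J + 1 category probabilities implied by the J linear predictors η_j = \bold{β}_j^T \bold{x} are \exp(η_j)/\{1 + \sum_k \exp(η_k)\} for j = 1, …, J, with the remainder assigned to the baseline category:

# Recover category probabilities from baseline-category log-odds (J = 2)
eta <- c(0.5, -1.2)                            # hypothetical eta_j = beta_j' x
probs <- c(1, exp(eta)) / (1 + sum(exp(eta)))  # baseline category 0 first
sum(probs)                                     # probabilities sum to 1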

The order n^{-1} bias of estimates based on the usual likelihood L(B) is removed by applying the penalty |A|^{1/2} and basing estimation on the penalized likelihood L^*(B) = L(B) |A|^{1/2}. The vector \hat{B}^* of penalized estimates (PLEs) is the solution to the score equations of the penalized log-likelihood l^*(B) = l(B) + \frac{1}{2} \log{|A|}. The introduction of bias into the score function through the penalty removes the leading term in the asymptotic bias of the MLEs. The modified score function proposed by Firth for the binomial logistic model extends directly to the multinomial model as U^*(B) = U(B) - A \, b(B). The bias term b(B) is the leading term in the asymptotic bias of the multinomial MLEs, obtained from the Taylor series expansion of the log-likelihood (Cox and Snell, 1968), and is a function of the matrix of third derivatives of l(B) with respect to B (Bull et al., 2002).
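
In code, the penalty amounts to adding half the log-determinant of the Fisher information to the log-likelihood. A sketch only, assuming hypothetical loglik() and fisher.info() helpers (not exported by pmlr) that evaluate l(B) and A(B):

# Sketch: l*(B) = l(B) + 0.5 * log|A(B)|, with hypothetical helpers
penalized.loglik <- function(B, y, X) {
  A <- fisher.info(B, X)  # PJ x PJ Fisher information for the MLEs
  loglik(B, y, X) + 0.5 * c(determinant(A, logarithm = TRUE)$modulus)
}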

The PLEs are obtained by a modified Fisher scoring algorithm. Using t to denote the iteration number, the modified iterative updating equations are:

B^*_{(t+1)} = B^*_{(t)} + A^{-1}_{(t)}U^*(B^*_{(t)}) = B^*_{(t)} + b(B^*_{(t)}) + A^{-1}_{(t)}U(B^*_{(t)}).

Thus, in comparison to the usual updating equation used to obtain the MLEs, the score function modification operates by applying the asymptotic bias correction at each step in the iterative process. This prevents estimates from going off to infinity and failing to converge when there is separation in the data.
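
One iteration of this update could be sketched as follows, again with hypothetical score(), fisher.info(), and bias() helpers standing in for U(B), A(B), and b(B):

# One modified scoring step: B + b(B) + A^{-1} U(B)
scoring.step <- function(B, y, X) {
  A <- fisher.info(B, X)
  B + bias(B, X) + solve(A, score(B, y, X))  # solve() avoids forming A^{-1}
}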

Symmetric Wald-type CIs for β_{jp} can be constructed using Var^{* 1/2}(\hat{β}^*_{jp}), obtained from the inverse of A^*; however, performance is expected to be poor in situations where separation is likely to occur. Asymmetric CIs for the PLEs can be constructed from the profile log-likelihood for β_{jp}, which is the function l^*_0(B(s)), where B(s) is the argument that maximizes l^* under the single-parameter constraint H_0: β_{jp} = s. The 100(1 - α)\% CI for β_{jp} is given by all parameter values that are compatible with the data, i.e., all s such that the likelihood ratio statistic LR_P(s) ≤ q, where q is the 1 - α percentile of the χ^2 distribution with 1 degree of freedom. This is equivalent to l^*_0(B(s)) ≥ l^*(\hat{B}^*) - \frac{1}{2} q. The endpoints of the interval are then found by numerically solving the equality for values of s. Based on the algorithm employed in SAS PROC LOGISTIC for MLEs, our method for finding these roots does not require computing l^*_0(B(s)), which would itself involve maximizing l^*(B), but proceeds directly by solving the constrained optimization problem: maximize (or, for the lower endpoint, minimize) s = β_{jp} subject to l^*(B) = l^*(\hat{B}^*) - \frac{1}{2} q. We, however, modify the starting values in the iterative scheme used by SAS to obtain a new algorithm that is slower, but simpler and more robust (see Bull et al. 2007 for details).
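
A simplified sketch of the root-finding idea (the naive approach that the constrained algorithm above avoids) could apply uniroot to the profile log-likelihood, assuming a hypothetical profile.loglik(s) that maximizes l^* subject to β_{jp} = s, with beta.hat and lstar.max taken as known and beta.hat + 10 assumed to bracket the root:

# Upper endpoint: the s > beta.hat solving profile.loglik(s) = lstar.max - q/2
alpha <- 0.05
q <- qchisq(1 - alpha, df = 1)
f <- function(s) profile.loglik(s) - (lstar.max - q / 2)
upper <- uniroot(f, lower = beta.hat, upper = beta.hat + 10)$root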

Value

pmlr returns an object of class “pmlr” with the following components:

coefficients

an array containing the coefficients of the P parameters for the J categories

var

an array containing the variance-covariance matrices for the J categories

stat

an array containing the test statistics from the individual parameter tests

pval

an array containing the p-values from the individual parameter tests

CI.lower

an array containing the lower confidence limits from the individual parameter tests

CI.upper

an array containing the upper confidence limits from the individual parameter tests

separation

an array indicating coefficients for which separation occurred

stat.joint

an array containing the test statistics from the joint tests

pval.joint

an array containing the p-values from the joint tests

call

the matched call

method

a character string indicating the method used to compute p-values and confidence limits

joint

a logical variable indicating whether the joint hypothesis tests were performed

Note

This implementation is not a true scoring or Newton-type algorithm because it updates with the inverse of A, the Fisher information matrix for the MLEs, rather than the information for the PLEs, A^*, which includes an additional term corresponding to the second derivatives of the penalty \frac{1}{2} \log |A|. As a result, convergence using the modified scoring algorithm for the PLEs is slower than a scoring algorithm based on A^*, which converges at a quadratic rate. For well-behaved and larger datasets this usually means no more than 2 or 3 steps beyond those required for the MLEs. Starting values of β_{jp} = 0 are used and are generally satisfactory. For smaller datasets (i.e., fewer than 50 observations), and especially datasets in which there are infinite MLEs, convergence is slower and can take up to 35 or 40 iterations. In datasets with separation, starting values other than 0 can lead to divergence problems.

Note

The first argument to the pmlr function is a formula of the form response ~ terms, where response can be either a J-column indicator matrix or a factor with J + 1 levels. In the case of frequency data (e.g., the hepatitis dataset) the baseline category is determined by the J-column indicator matrix, with baseline category 0. In the case of data with individual records (e.g., the enzymes dataset) the baseline category is determined by the outcome, which is coded as a factor. The first level (i.e., levels(x)[1]) is taken to be the baseline. Note that the default levels attribute of a factor is the unique set of values taken by as.character(x), sorted into increasing order of x. For example, the enzymes dataset has categories 1, 2, and 3 with baseline category 1. The baseline category can be changed to category 2 via enzymes$Group <- factor(enzymes$Group, levels = c("2", "1", "3")).

References

Bull, S. B., Greenwood, C. M. T., and Hauck, W. W. (1997) Jackknife bias reduction for polychotomous logistic regression. Statistics in Medicine, 16, 545–560; Bull, S. B. (1997) Correction. Statistics in Medicine, 16, 2928.

Bull, S. B., Mak, C. and Greenwood, C. M. T. (2002) A modified score function estimator for multinomial logistic regression in small samples. Computational Statistics & Data Analysis, 39, 57–74.

Bull, S. B., Lewinger, J. P., Lee, S. S. F. (2005) Penalized maximum likelihood estimation for multinomial logistic regression using the Jeffreys prior. Technical Report No. 0505, Department of Statistics, University of Toronto.

Bull, S. B., Lewinger, J. P. and Lee, S. S. F. (2007) Confidence intervals for multinomial logistic regression in sparse data. Statistics in Medicine, 26, 903–918.

Cox, D. R. and Snell, E. J. (1968) A general definition of residuals. Journal of the Royal Statistical Society, Series B, 30, 248–275.

Firth, D. (1993) Bias reduction of maximum likelihood estimates. Biometrika, 80, 27–38.

Lesaffre, E. and Albert, A. (1989) Partial separation in logistic discrimination. Journal of the Royal Statistical Society, Series B, 51, 109–116.

SAS Institute Inc. (1999) SAS OnlineDoc, Version 8, The LOGISTIC Procedure, Confidence Intervals for Parameters, Chapter 39, Section 26, Cary, NC.

Examples

# As reported in Bull et al. (2007)

data(hepatitis)
fit <- pmlr(cbind(HCV, nonABC) ~ group + time + group:time, 
  data = hepatitis, weights = counts)
summary(fit)

data(enzymes)
# Exclude patients in Group 4 (post-necrotic cirrhosis)
enzymes <- enzymes[enzymes$Group != 4,]
# Center and scale covariates
AST <- scale(log(enzymes$AST))
ALT <- scale(log(enzymes$ALT))
GLDH <- scale(log(enzymes$GLDH))
OCT <- scale(log(enzymes$OCT))
enzymes <- data.frame(Patient = enzymes$Patient,
  Group = enzymes$Group, AST, ALT, GLDH, OCT)
# Remove 10 observations to create separation
enzymes <- enzymes[-c(9, 18, 33, 58, 61, 77, 94, 97, 99, 100),]

# Multinomial: acute viral hepatitis and aggressive chronic hepatitis
# vs. persistent chronic hepatitis
# Assign Group 2 (persistent chronic hepatitis) as baseline category
enzymes$Group <- factor(enzymes$Group, levels=c("2","1","3"))
fit <- pmlr(Group ~ AST + GLDH, data = enzymes)
summary(fit)

# Binomial: Acute viral hepatitis vs. persistent chronic hepatitis
# Exclude patients in Group 3 (aggressive chronic hepatitis)
enzymes.1vs2 <- enzymes[enzymes$Group != 3,]
# Assign Group 2 (persistent chronic hepatitis) as baseline category
enzymes.1vs2$Group <- factor(enzymes.1vs2$Group, levels=c("2","1"))
fit <- pmlr(Group ~ AST + GLDH, data = enzymes.1vs2)
summary(fit)

# Binomial: Aggressive chronic hepatitis vs. persistent chronic hepatitis
# Exclude patients in Group 1 (acute viral hepatitis)
enzymes.3vs2 <- enzymes[enzymes$Group != 1,]
# Assign Group 2 (persistent chronic hepatitis) as baseline category
enzymes.3vs2$Group <- factor(enzymes.3vs2$Group, levels=c("2","3"))
fit <- pmlr(Group ~ AST + GLDH, data = enzymes.3vs2)
summary(fit)
