pmlr: Penalized maximum likelihood estimation for multinomial logistic regression



Description

Extends the approach proposed by Firth (1993) for bias reduction of MLEs in exponential family models to the multinomial logistic regression model with general covariate types. Modification of the logistic regression score function to remove first-order bias is equivalent to penalizing the likelihood by the Jeffreys prior, and yields penalized maximum likelihood estimates (PLEs) that always exist. Constrained hypothesis testing is conducted via likelihood ratio statistics. Profile or Wald confidence intervals (CI) are constructed for the PLEs.

Usage

pmlr(formula, data, weights = NULL, penalized = TRUE,
  method = c("likelihood", "wald", "score", "none")[1], joint = FALSE,
  CI.calculate = FALSE, CI.alpha = 0.05, tol = 1e-06, verbose = FALSE)

Arguments

formula

an object of class formula (or one that can be coerced to that class): a symbolic description of the model to be fitted. Typically, formula has the form response ~ terms where response is either a factor with J+1 levels (the first is used as the baseline category) or a J-column indicator matrix, and terms is a series of terms which specifies a linear predictor for the response.
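
For orientation, here is a minimal sketch of both response forms, using the package's hepatitis (frequency data, indicator-matrix response) and enzymes (individual records, factor response) datasets from the Examples below:

data(hepatitis); data(enzymes)

# Response as a J-column indicator matrix, with frequency weights:
fit <- pmlr(cbind(HCV, nonABC) ~ group + time, data = hepatitis, weights = counts)

# Response as a factor with J + 1 levels:
fit <- pmlr(Group ~ AST + GLDH, data = enzymes)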

data

a data frame containing the variables in the model

weights

an optional vector of weights to be used in the fitting process. Should be a numeric vector of counts in the case of frequency data.

penalized

a logical variable indicating whether penalized maximum likelihood should be used (default)

method

a character string specifying whether p-values and confidence intervals should be based on the profile likelihood, the score method, or the Wald method. Must be one of “likelihood” (default), “wald”, “score” or “none”. The “none” option provides parameter estimates and Wald standard errors only, without confidence intervals or hypothesis tests; this may be useful for diagnostic purposes. The “score” option provides hypothesis tests only, because score-based confidence limits are rarely used in practice.

joint

a logical variable indicating whether joint hypothesis tests should be performed in addition to the individual parameter tests (illustrated in the sketch below). If TRUE, the following three (constrained) hypotheses are additionally tested for covariates p = 1, ..., P:

H[0]: β[p1] = ... = β[pJ] = 0;

H[0]: β[p1] = ... = β[pJ];

H[0]: β[pJ] = J β[p1], β[p,J-1] = (J-1) β[p1], ..., β[p2] = 2 β[p1].

These tests are meaningful only when the number J + 1 of response categories is greater than 2, so an error occurs if joint = TRUE and binomial data are supplied.
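
A usage sketch, assuming the enzymes data prepared as in the Examples section below (a response with three categories, so the joint tests apply):

fit <- pmlr(Group ~ AST + GLDH, data = enzymes, joint = TRUE)
summary(fit)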

CI.calculate

a logical variable indicating whether the profile (if method="likelihood") or Wald (if method="wald") confidence limits should be computed. Default is FALSE.

CI.alpha

the significance level for the profile or Wald confidence intervals (default is 0.05). Meaningful only if CI.calculate=TRUE.

tol

the tolerance for convergence, which can be adjusted for diagnostic purposes; the default is 1e-6. Note that this does not correspond (exactly) to the precision of each estimate: the tolerance is defined on the sum of the absolute values of the elements of the step A^-1[(t)] U^*(B^*[(t)]), where A^-1[(t)] is the inverse of the Fisher information matrix and B^*[(t)] is the vector of penalized coefficient estimates, both evaluated at the t-th iteration of the modified Newton-Raphson algorithm. Larger tolerance values may reduce computation time and facilitate convergence in some cases.
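
For instance, a looser tolerance can be tried when convergence is slow, at the cost of less precise estimates (a sketch, reusing the enzymes example from below):

fit <- pmlr(Group ~ AST + GLDH, data = enzymes, tol = 1e-4)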

verbose

a logical variable: if TRUE, parameter values are displayed after each iteration and the completion of each routine is reported. Default is FALSE.

Details

Logistic regression is one of the most widely used regression models in practice, but alternatives to conventional maximum likelihood estimation methods may be more appropriate for small or sparse samples. It is well known that the usual maximum likelihood estimates (MLEs) of the log odds ratios are biased in finite samples, and there is a non-zero probability that an MLE is infinite (i.e., does not exist). This corresponds to the problem of separation (Lesaffre and Albert, 1989).

In this package, we extend the approach proposed by Firth (1993) for bias reduction of MLEs in exponential family models to the multinomial logistic regression model with general covariate types. Modification of the logistic regression score function to remove first-order bias is equivalent to penalizing the likelihood by the Jeffreys prior, and yields penalized likelihood estimates (PLEs) that always exist, even in samples in which MLEs are infinite. PLEs are an attractive alternative in small to moderate-sized samples, and are preferred to exact conditional MLEs when there are continuous covariates.

We consider a multicategory outcome y that is a multinomial variable with J + 1 categories. For each category j (j = 1, ..., J) there is a regression function in which the log odds of response in category j, relative to category 0, is a linear function of regression parameters and a vector x of P covariates (including a constant): log{prob(y = j | x)/prob(y = 0 | x)} = β[j]^T x. Let y[i] be a Jx1 vector of indicators for the observed response category for observation i, with the corresponding Jx1 vector of probabilities θ[i] = (θ[i1], ..., θ[iJ])^T. The vector of MLEs, \hat{B} = vec[(\hat{β}[1], ..., \hat{β}[J])^T], is estimated from observations (y[i], x[i]), i = 1, ..., n, by solving the score equations of the log-likelihood l(B). We denote the score function by U(B) and the PJxPJ Fisher information matrix by A = A(B).
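
As a minimal illustration (not part of the package's API), the category probabilities implied by this parameterization can be recovered from the J linear predictors, with category 0 as the baseline:

# theta_j = exp(x' beta_j) / (1 + sum_k exp(x' beta_k)); theta_0 = 1 / (1 + sum_k exp(x' beta_k))
multinom_probs <- function(x, beta) {  # x: length-P covariate vector; beta: P x J coefficient matrix
  eta <- drop(crossprod(beta, x))      # J linear predictors x' beta_j
  num <- c(1, exp(eta))                # baseline category 0 has linear predictor 0
  num / sum(num)                       # probabilities for categories 0, 1, ..., J
}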

The order n^-1 bias of estimates based on the usual likelihood L(B) is removed by applying the penalty |A|^{1/2}, and basing estimation on the penalized likelihood L^*(B) = L(B) |A|^{1/2}. The vector \hat{B}^* of penalized estimates (PLEs) is the solution to the score equations of the penalized log-likelihood l^*(B) = l(B) + 1/2 log{|A|}. The introduction of bias into the score function through the penalty removes the leading term in the asymptotic bias of the MLEs. The modified score function proposed by Firth for the binomial logistic model extends directly to the multinomial model as U^*(B) = U(B) - A b(B). The bias term b(B) is the leading term in the asymptotic bias of the multinomial MLEs, obtained from the Taylor series expansion of the log-likelihood (Cox and Snell, 1968), and is a function of the matrix of third derivatives of l(B) with respect to B (Bull et al., 2002).
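
A one-line sketch of the penalty, assuming the ordinary log-likelihood value l and the Fisher information matrix A are already available:

# l*(B) = l(B) + (1/2) log|A(B)|; determinant() works on the log scale for numerical stability
penalized_loglik <- function(l, A) {
  l + 0.5 * as.numeric(determinant(A, logarithm = TRUE)$modulus)
}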

The PLEs are obtained by a modified Fisher scoring algorithm. Using t to denote the iteration number, the modified iterative updating equations are:

B^*[(t+1)] = B^*[(t)] + A^-1[(t)] U^*(B^*[(t)]) = B^*[(t)] - b(B^*[(t)]) + A^-1[(t)] U(B^*[(t)])

Thus, in comparison to the usual updating equation used to obtain the MLEs, the score function modification operates by applying the asymptotic bias correction at each step in the iterative process. This prevents estimates from going off to infinity and failing to converge when there is separation in the data.
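
A sketch of a single update, assuming hypothetical helpers score_U(), info_A(), and bias_b() that evaluate U(B), A(B), and the Cox–Snell bias term b(B) at the current estimate:

step_pmle <- function(B) {
  A <- info_A(B)                                 # Fisher information at B (hypothetical helper)
  U_star <- score_U(B) - drop(A %*% bias_b(B))   # modified score U*(B) = U(B) - A b(B)
  B + solve(A, U_star)                           # equals B - b(B) + A^{-1} U(B)
}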

Symmetric Wald-type CIs for β[jp] can be constructed using Var^{*1/2}(\hat{β}^*[jp]), for p = 1, ..., P and j = 1, ..., J, obtained from the inverse of A^*; however, performance is expected to be poor in situations where separation is likely to occur. Asymmetric CIs for the PLEs can be constructed from the profile log-likelihood for β[jp], which is the function l^*[0](B(s)), where B(s) is the argument that maximizes l^* under the single-parameter constraint H[0]: β[jp] = s. The 100(1 - α)% CI for β[jp] is given by all parameter values that are compatible with the data, i.e., all s such that the likelihood ratio statistic LR[P](s) <= q, where q is the 1 - α percentile of the χ^2 distribution with 1 degree of freedom. This is equivalent to l^*[0](B(s)) >= l^*(\hat{B}^*) - (1/2)q. The endpoints of the interval are then found by numerically solving this equality for values of s. Based on the algorithm employed in SAS PROC LOGISTIC for MLEs, our method for finding these roots does not require computing l^*[0](B(s)), which would itself involve maximizing l^*(B), but proceeds directly by solving the constrained optimization problem: maximize l^*(B) such that l^*(B) = l^*(\hat{B}^*) - (1/2)q and β[jp] = s. We, however, modify the starting values in the iterative scheme used by SAS to obtain a new algorithm that is slower, but simpler and more robust (see Bull et al. 2007 for details).
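
For intuition only, here is a direct root-finding sketch of one CI endpoint; it assumes a hypothetical function profile_loglik(s) returning the constrained maximum l^*[0](B(s)), which the package's actual algorithm deliberately avoids computing:

profile_ci_endpoint <- function(profile_loglik, l_star_hat, alpha, lower, upper) {
  q <- qchisq(1 - alpha, df = 1)   # 1 - alpha percentile of the chi-square(1) distribution
  # the endpoint s solves l*_0(B(s)) = l*(B-hat*) - q/2; lower/upper must bracket the root
  uniroot(function(s) profile_loglik(s) - (l_star_hat - q / 2),
          lower = lower, upper = upper)$root
}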

Value

pmlr returns an object of class “pmlr”, a list with the following components:

call

the matched call.

method

a character string indicating the method used to carry out hypothesis testing and/or confidence limit calculation.

separation

an array indicating the coefficients for which separation occurred; NA is returned in the absence of separation.

converged

a logical value indicating whether the fitting algorithm is judged to have converged.

coefficients

an array containing the coefficients of the P parameters for the J categories.

var

an array containing the variance-covariance matrices for the J categories. If penalized=TRUE, var is obtained based on A^*, the information matrix for the PLEs.

CI.calculate

a logical value indicating whether the confidence limits were computed.

CI.lower

returned only if CI.calculate=TRUE: an array containing the lower confidence limits from the individual parameter tests.

CI.upper

returned only if CI.calculate=TRUE: an array containing the upper confidence limits from the individual parameter tests.

statistic

an array containing the test statistics from the individual parameter tests.

pvalue

an array containing the p-values from the individual parameter tests.

logLik

the value of the log-likelihood function for the fitted model (under no constraints).

df

the degrees of freedom, i.e., the number of estimated parameters in the model.

converged.H0

an array of logical values indicating whether the fitting algorithm for the null model for each individual parameter is judged to have converged.

logLik.H0

an array containing the value of the log-likelihood function for the fitted model under the null hypothesis for each individual parameter.

joint

a logical value indicating whether the joint hypothesis tests were performed.

beta0all0

When a joint likelihood test of all betas = 0 is called, the estimated betas are provided. This is for information only and not displayed in the output. Not returned when method="wald".

var0all0

When a joint likelihood test of all betas = 0 is called, the estimated variances are also provided. This is for information only and not displayed in the output. Not returned when method="wald".

beta0allequal

same as beta0all0 except for the null hypothesis that all betas are equal.

var0allequal

same as var0all0 except for the null hypothesis that all betas are equal.

beta0proportion

same as beta0all0 except for the null hypothesis that all betas are proportional.

var0proportion

same as var0all0 except for the null hypothesis that all betas are proportional.

logLik.C

an array containing the values of the log-likelihood function under the constraints: all betas = 0; all betas are equal; and betas are proportional. This is for information only and not displayed in the output.

joint.test.all0

a list containing the following three components evaluated for the constrained hypothesis that all betas = 0.

joint.test.all0$h0

a character string describing the constrained hypothesis

joint.test.all0$converged

an array containing logical values indicating whether the fitting algorithm achieved convergence under the constrained hypothesis for each term.

joint.test.all0$test.h0

an array containing the test statistics and p-values from the constrained hypothesis tests that all betas = 0.

joint.test.allequal

same as 'joint.test.all0', except that the constrained hypothesis is that all betas are equal, and there is an additional component 'test.all0.vs.constraint' (see below).

joint.test.allequal$test.all0.vs.constraint

a data frame containing the test statistics and p-values obtained from comparing the likelihoods under the hypothesis that all betas are equal vs. all betas = 0.

joint.test.proportion

same as 'joint.test.allequal', except that the constrained hypothesis is that the betas are proportional.

Note

This implementation is not a true scoring or Newton-type algorithm, because it updates with the inverse of A, the Fisher information matrix for the MLEs, rather than with the information matrix for the PLEs, A^*, which includes an additional term corresponding to the second derivatives of the penalty (1/2)log(|A|). As a result, convergence of the modified scoring algorithm for the PLEs is slower than that of a scoring algorithm based on A^*, which converges at a quadratic rate. For well-behaved and larger datasets this usually means no more than 2 or 3 steps beyond those required for the MLEs. Starting values of β[pj] = 0 are used and are generally satisfactory. For smaller datasets (i.e., fewer than 50 observations), and especially datasets in which there are infinite MLEs, convergence is slower and can take up to 35 or 40 iterations. In datasets with separation, starting values other than 0 can lead to divergence problems.

The first argument to the pmlr function is a formula of the form response ~ terms, where response can be either a J-column indicator matrix or a factor with J + 1 levels. In the case of frequency data (e.g., the hepatitis dataset) the baseline category is determined by the J-column indicator matrix, with baseline category 0. In the case of data with individual records (e.g., the enzymes dataset) the baseline category is determined by the outcome, which is coded as a factor; the first level (i.e., levels(x)[1]) is taken to be the baseline. Note that the default levels attribute of a factor is the unique set of values taken by as.character(x), sorted into increasing order of x. For example, the enzymes dataset has categories 1, 2, and 3 with baseline category 1. The baseline category can be changed to category 2 via enzymes$Group <- factor(enzymes$Group, levels = c("2", "1", "3")).
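
A quick base-R illustration of the default level ordering and of changing the baseline:

g <- factor(c(3, 1, 2, 1))
levels(g)                                  # "1" "2" "3": sorted unique values; baseline is "1"
g <- factor(g, levels = c("2", "1", "3"))
levels(g)[1]                               # "2" is now the baseline category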

References

Bull, S. B., Greenwood, C. M. T., and Hauck, W. W. (1997) Jackknife bias reduction for polychotomous logistic regression. Statistics in Medicine, 16, 545–560; Bull, S. B. (1997) Correction. Statistics in Medicine, 16, 2928.

Bull, S. B., Mak, C. and Greenwood, C. M. T. (2002) A modified score function estimator for multinomial logistic regression in small samples. Computational Statistics & Data Analysis, 39, 57–74.

Bull, S. B., Lewinger, J. P., Lee, S. S. F. (2005) Penalized maximum likelihood estimation for multinomial logistic regression using the Jeffreys prior. Technical Report No. 0505, Department of Statistics, University of Toronto.

Bull, S. B., Lewinger, J. P. and Lee, S. S. F. (2007) Confidence intervals for multinomial logistic regression in sparse data. Statistics in Medicine, 26, 903–918.

Cox, D. R. and Snell, E. J. (1968) A general definition of residuals. Journal of the Royal Statistical Society, Series B, 30, 248–275.

Firth, D. (1993) Bias reduction of maximum likelihood estimates. Biometrika, 80, 27–38.

Lesaffre, E. and Albert, A. (1989) Partial separation in logistic discrimination. Journal of the Royal Statistical Society, Series B, 51, 109–116.

SAS Institute Inc. (1999) SAS OnlineDoc, Version 8, The LOGISTIC Procedure, Confidence Intervals for Parameters, Chapter 39, Section 26, Cary, NC.

See Also

summary.pmlr

Examples

data(hepatitis)

# As reported in Bull et al. (2007)
fit <- pmlr(cbind(HCV, nonABC) ~ group + time + group:time, data = hepatitis,
            weights = counts, method="score")
summary(fit)

data(enzymes)
# Exclude patients in Group 4 (post-necrotic cirrhosis)
enzymes <- enzymes[enzymes$Group != 4,]

# Center and scale covariates
AST <- scale(log(enzymes$AST))
ALT <- scale(log(enzymes$ALT))
GLDH <- scale(log(enzymes$GLDH))
OCT <- scale(log(enzymes$OCT))
enzymes <- data.frame(Patient = enzymes$Patient,
                      Group = enzymes$Group, AST, ALT, GLDH, OCT)

# Remove 10 observations to create separation
enzymes <- enzymes[-c(9, 18, 33, 58, 61, 77, 94, 97, 99, 100),]

# Multinomial: Acute viral hepatitis and aggressive chronic hepatitis
# vs. persistent chronic hepatitis
# Assign Group 2 (persistent chronic hepatitis) as baseline category
enzymes$Group <- factor(enzymes$Group, levels=c("2","1","3"))
fit <- pmlr(Group ~ AST + GLDH, data = enzymes, method="wald", CI.calculate=TRUE)
summary(fit)

# Binomial: Acute viral hepatitis vs. persistent chronic hepatitis
# Exclude patients in Group 3 (aggressive chronic hepatitis)
enzymes.1vs2 <- enzymes[enzymes$Group != 3,]
# Assign Group 2 (persistent chronic hepatitis) as baseline category
enzymes.1vs2$Group <- factor(enzymes.1vs2$Group, levels=c("2","1"))
fit <- pmlr(Group ~ AST + GLDH, data = enzymes.1vs2, method="none")
summary(fit)

# Binomial: Aggressive chronic hepatitis vs. persistent chronic hepatitis
# Exclude patients in Group 1 (acute viral hepatitis)
enzymes.3vs2 <- enzymes[enzymes$Group != 1,]
# Assign Group 2 (persistent chronic hepatitis) as baseline category
enzymes.3vs2$Group <- factor(enzymes.3vs2$Group, levels=c("2","3"))
fit <- pmlr(Group ~ AST + GLDH, data = enzymes.3vs2, method="none")
summary(fit)
