# logistf: Firth's Bias-Reduced Logistic Regression In logistf: Firth's Bias-Reduced Logistic Regression

## Description

Implements Firth's bias-Reduced penalized-likelihood logistic regression.

## Usage

 ``` 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15``` ```logistf( formula, data, pl = TRUE, alpha = 0.05, control, plcontrol, firth = TRUE, init, weights, plconf = NULL, flic = FALSE, model = TRUE, ... ) ```

## Arguments

 `formula` A formula object, with the response on the left of the operator, and the model terms on the right. The response must be a vector with 0 and 1 or `FALSE` and `TRUE` for the outcome, where the higher value (1 or `TRUE`) is modeled. It is possible to include contrasts, interactions, nested effects, cubic or polynomial splines and all S features as well, e.g. Y ~ X1*X2 + ns(X3, df=4). `data` A data.frame where the variables named in the formula can be found, i. e. the variables containing the binary response and the covariates. `pl` Specifies if confidence intervals and tests should be based on the profile penalized log likelihood (`pl=TRUE`, the default) or on the Wald method (`pl=FALSE`). `alpha` The significance level (1-α the confidence level, 0.05 as default). `control` Controls Newton-Raphson iteration. Default is ```control= logistf.control(maxstep, maxit, maxhs, lconv, gconv, xconv)``` `plcontrol` Controls Newton-Raphson iteration for the estimation of the profile likelihood confidence intervals. Default is ```plcontrol= logistpl.control(maxstep, maxit, maxhs, lconv, xconv, ortho, pr)``` `firth` Use of Firth's penalized maximum likelihood (`firth=TRUE`, default) or the standard maximum likelihood method (`firth=FALSE`) for the logistic regression. Note that by specifying `pl=TRUE` and `firth=FALSE` (and probably a lower number of iterations) one obtains profile likelihood confidence intervals for maximum likelihood logistic regression parameters. `init` Specifies the initial values of the coefficients for the fitting algorithm `weights` specifies case weights. Each line of the input data set is multiplied by the corresponding element of weights `plconf` specifies the variables (as vector of their indices) for which profile likelihood confidence intervals should be computed. Default is to compute for all variables `flic` If `TRUE`, intercept is altered such that the predicted probabilities become unbiased while keeping all other coefficients constant `model` If TRUE the corresponding components of the fit are returned. `...` Further arguments to be passed to `logistf`

## Details

`logistf` is the main function of the package. It fits a logistic regression model applying Firth's correction to the likelihood. The following generic methods are available for logistf's output object: ```print, summary, coef, vcov, confint, anova, extractAIC, add1, drop1, profile, terms, nobs, predict```. Furthermore, forward and backward functions perform convenient variable selection. Note that anova, extractAIC, add1, drop1, forward and backward are based on penalized likelihood ratios.

## Value

The object returned is of the class `logistf` and has the following attributes:

 `coefficients` the coefficients of the parameter in the fitted model. `alpha` the significance level (1- the confidence level) as specified in the input. `terms` the column names of the design matrix `var` the variance-covariance-matrix of the parameters. `df` the number of degrees of freedom in the model. `loglik` a vector of the (penalized) log-likelihood of the restricted and the full models. `iter` the number of iterations needed in the fitting process. `n` the number of observations. `y` the response-vector, i. e. 1 for successes (events) and 0 for failures. `formula` the formula object. `call` the call object. `terms` the model terms (column names of design matrix). `linear.predictors` a vector with the linear predictor of each observation. `predict` a vector with the predicted probability of each observation. `hat.diag` a vector with the diagonal elements of the Hat Matrix. `conv` the convergence status at last iteration: a vector of length 3 with elements: last change in log likelihood, max(abs(score vector)), max change in beta at last iteration. `method` depending on the fitting method 'Penalized ML' or ‘Standard ML’. `method.ci` the method in calculating the confidence intervals, i.e. ‘profile likelihood’ or ‘Wald’, depending on the argument pl and plconf. `ci.lower` the lower confidence limits of the parameter. `ci.upper` the upper confidence limits of the parameter. `prob` the p-values of the specific parameters. `pl.iter` only if pl==TRUE: the number of iterations needed for each confidence limit. `betahist` only if pl==TRUE: the complete history of beta estimates for each confidence limit. `pl.conv` only if pl==TRUE: the convergence status (deviation of log likelihood from target value, last maximum change in beta) for each confidence limit. `control` a copy of the control parameters. `flic` logical, is TRUE if intercept was altered such that the predicted probabilities become unbiased while keeping all other coefficients constant. According to input of logistf. `model` if requested (the default), the model frame used.

## Author(s)

Georg Heinze and Meinhard Ploner

## References

Firth D (1993). Bias reduction of maximum likelihood estimates. Biometrika 80, 27-38. Heinze G, Schemper M (2002). A solution to the problem of separation in logistic regression. Statistics in Medicine 21: 2409-2419.

Heinze G, Ploner M (2003). Fixing the nonconvergence bug in logistic regression with SPLUS and SAS. Computer Methods and Programs in Biomedicine 71: 181-187.

Heinze G, Ploner M (2004). Technical Report 2/2004: A SAS-macro, S-PLUS library and R package to perform logistic regression without convergence problems. Section of Clinical Biometrics, Department of Medical Computer Sciences, Medical University of Vienna, Vienna, Austria. http://www.meduniwien.ac.at/user/georg.heinze/techreps/tr2_2004.pdf

Heinze G (2006). A comparative investigation of methods for logistic regression with separated or nearly separated data. Statistics in Medicine 25: 4216-4226.

Venzon DJ, Moolgavkar AH (1988). A method for computing profile-likelihood based confidence intervals. Applied Statistics 37:87-94.

 ``` 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30``` ```data(sex2) fit<-logistf(case ~ age+oc+vic+vicl+vis+dia, data=sex2) summary(fit) nobs(fit) drop1(fit) plot(profile(fit,variable="dia")) extractAIC(fit) fit1<-update(fit, case ~ age+oc+vic+vicl+vis) extractAIC(fit1) anova(fit,fit1) data(sexagg) fit2<-logistf(case ~ age+oc+vic+vicl+vis+dia, data=sexagg, weights=COUNT) summary(fit2) # simulated SNP example set.seed(72341) snpdata<-rbind( matrix(rbinom(2000,2,runif(2000)*0.3),100,20), matrix(rbinom(2000,2,runif(2000)*0.5),100,20)) colnames(snpdata)<-paste("SNP",1:20,"_",sep="") snpdata<-as.data.frame(snpdata) for(i in 1:20) snpdata[,i]<-as.factor(snpdata[,i]) snpdata\$case<-c(rep(0,100),rep(1,100)) fitsnp<-logistf(data=snpdata, formula=case~1, pl=FALSE) add1(fitsnp, scope=paste("SNP",1:20,"_",sep="")) fitf<-forward(fitsnp, scope = paste("SNP",1:20,"_",sep="")) fitf ```