# logistf: Firth's Bias-Reduced Logistic Regression In logistf: Firth's Bias-Reduced Logistic Regression

## Description

Implements Firth's bias-Reduced penalized-likelihood logistic regression.

## Usage

 ```1 2 3``` ```logistf(formula = attr(data, "formula"), data = sys.parent(), pl = TRUE, alpha = 0.05, control, plcontrol, firth = TRUE, init, weights, plconf = NULL, dataout = TRUE, ...) ```

## Arguments

 `formula` a formula object, with the response on the left of the operator, and the model terms on the right. The response must be a vector with 0 and 1 or FALSE and TRUE for the outcome, where the higher value (1 or TRUE) is modeled. It is possible to include contrasts, interactions, nested effects, cubic or polynomial splines and all S features as well, e.g. `Y ~ X1*X2 + ns(X3, df=4)`. From version 1.10, you may also include offset() terms. `data` a data.frame where the variables named in the formula can be found, i. e. the variables containing the binary response and the covariates. `pl` specifies if confidence intervals and tests should be based on the profile penalized log likelihood (`pl=TRUE`, the default) or on the Wald method (`pl=FALSE`). `alpha` the significance level (1-α the confidence level, 0.05 as default). `control` Controls Newton-Raphson iteration. Default is `control=` `logistf.control(maxstep,` `maxit, maxhs, lconv, gconv, xconv`) `plcontrol` Controls Newton-Raphson iteration for the estimation of the profile likelihood confidence intervals. Default is `plcontrol=` `logistpl.control(maxstep,` `maxit,` `maxhs, lconv, xconv, ortho, pr`) `firth` use of Firth's penalized maximum likelihood (`firth=TRUE`, default) or the standard maximum likelihood method (`firth=FALSE`) for the logistic regression. Note that by specifying `pl=TRUE` and `firth=FALSE` (and probably a lower number of iterations) one obtains profile likelihood confidence intervals for maximum likelihood logistic regression parameters. `init` specifies the initial values of the coefficients for the fitting algorithm. `weights` specifies case weights. Each line of the input data set is multiplied by the corresponding element of `weights`. `plconf` specifies the variables (as vector of their indices) for which profile likelihood confidence intervals should be computed. Default is to compute for all variables. `dataout` If TRUE, copies the `data` set to the output object. `...` Further arguments to be passed to logistf.

## Details

`logistf` is the main function of the package. It fits a logistic regression model applying Firth's correction to the likelihood. The following generic methods are available for `logistf`'s output object: `print`, `summary`, `coef`, `vcov`, `confint`, `anova`, `extractAIC`, `add1`, `drop1`, `profile`, `terms`, `nobs`. Furthermore, `forward` and `backward` functions perform convenient variable selection. Note that `anova`, `extractAIC`, `add1`, `drop1`, `forward` and `backward` are based on penalized likelihood ratios.

## Value

The object returned is of the class logistf and has the following attributes:

 `coefficients` the coefficients of the parameter in the fitted model. `alpha` the significance level (1- the confidence level) as specified in the input. `terms` the column names of the design matrix `var` the variance-covariance-matrix of the parameters. `df` the number of degrees of freedom in the model. `loglik` a vector of the (penalized) log-likelihood of the full and the restricted models. `iter` the number of iterations needed in the fitting process. `n` the number of observations. `y` the response-vector, i. e. 1 for successes (events) and 0 for failures. `formula` the formula object. `call` the call object. `terms` the model terms (column names of design matrix). `linear.predictors` a vector with the linear predictor of each observation. `predict` a vector with the predicted probability of each observation. `hat.diag` a vector with the diagonal elements of the Hat Matrix. `conv` the convergence status at last iteration: a vector of length 3 with elements: last change in log likelihood, max(abs(score vector)), max change in beta at last iteration. `method` depending on the fitting method ‘Penalized ML’ or ‘Standard ML’. `method.ci` the method in calculating the confidence intervals, i.e. ‘profile likelihood’ or ‘Wald’, depending on the argument pl. `ci.lower` the lower confidence limits of the parameter. `ci.upper` the upper confidence limits of the parameter. `prob` the p-values of the specific parameters. `pl.iter` only if pl==TRUE: the number of iterations needed for each confidence limit. `betahist` only if pl==TRUE: the complete history of beta estimates for each confidence limit. `pl.conv` only if pl==TRUE: the convergence status (deviation of log likelihood from target value, last maximum change in beta) for each confidence limit.

If `dataout=TRUE`, additionally:

 `data` a copy of the input data set `weights` the weights variable (if applicable)

## Author(s)

Georg Heinze and Meinhard Ploner

## References

Firth D (1993). Bias reduction of maximum likelihood estimates. Biometrika 80, 27–38.

Heinze G, Schemper M (2002). A solution to the problem of separation in logistic regression. Statistics in Medicine 21: 2409-2419.

Heinze G, Ploner M (2003). Fixing the nonconvergence bug in logistic regression with SPLUS and SAS. Computer Methods and Programs in Biomedicine 71: 181-187.

Heinze G, Ploner M (2004). Technical Report 2/2004: A SAS-macro, S-PLUS library and R package to perform logistic regression without convergence problems. Section of Clinical Biometrics, Department of Medical Computer Sciences, Medical University of Vienna, Vienna, Austria. http://www.meduniwien.ac.at/user/georg.heinze/techreps/tr2_2004.pdf

Heinze G (2006). A comparative investigation of methods for logistic regression with separated or nearly separated data. Statistics in Medicine 25: 4216-4226.

Venzon DJ, Moolgavkar AH (1988). A method for computing profile-likelihood based confidence intervals. Applied Statistics 37:87-94.

`drop1.logistf` `add1.logistf` `anova.logistf`
 ``` 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35``` ```data(sex2) fit<-logistf(case ~ age+oc+vic+vicl+vis+dia, data=sex2) summary(fit) nobs(fit) drop1(fit) plot(profile(fit,variable="dia")) extractAIC(fit) fit1<-update(fit, case ~ age+oc+vic+vicl+vis) extractAIC(fit1) anova(fit,fit1) data(sexagg) fit2<-logistf(case ~ age+oc+vic+vicl+vis+dia, data=sexagg, weights=COUNT) summary(fit2) # simulated SNP example # not run set.seed(72341) snpdata<-rbind( matrix(rbinom(2000,2,runif(2000)*0.3),100,20), matrix(rbinom(2000,2,runif(2000)*0.5),100,20)) colnames(snpdata)<-paste("SNP",1:20,"_",sep="") snpdata<-as.data.frame(snpdata) for(i in 1:20) snpdata[,i]<-as.factor(snpdata[,i]) snpdata\$case<-c(rep(0,100),rep(1,100)) fitsnp<-logistf(data=snpdata, formula=case~1, pl=FALSE) add1(fitsnp) fitf<-forward(fitsnp) fitf ```