RidgeBinaryLogistic: Ridge Binary Logistic Regression for Binary data

View source: R/RidgeBinaryLogistic.R

RidgeBinaryLogistic {MultBiplotR}    R Documentation

Ridge Binary Logistic Regression for Binary data

Description

This function performs a logistic regression between a dependent binary variable y and some independent variables x, solving the separation problem in this type of regression using ridge penalization.

Usage

RidgeBinaryLogistic(y, X = NULL, data = NULL, freq = NULL, 
tolerance = 1e-05, maxiter = 100, penalization = 0.2, 
cte = FALSE, ref = "first", bootstrap = FALSE, nmB = 100, 
RidgePlot = FALSE, MinLambda = 0, MaxLambda = 2, StepLambda = 0.1)

Arguments

y

A binary dependent variable or a formula

X

A set of independent variables when y is not a formula.

data

Data frame for the formula

freq

Frequencies for each observation (usually 1)

tolerance

Tolerance for convergence

maxiter

Maximum number of iterations

penalization

Ridge penalization: a non-negative constant added to the diagonal of the information matrix to avoid singularities.

cte

Should the model have a constant?

ref

Category of reference

bootstrap

Should bootstrap confidence intervals be calculated?

nmB

Number of bootstrap samples.

RidgePlot

Should the ridge plot be plotted?

MinLambda

Minimum value of lambda for the ridge plot

MaxLambda

Maximum value of lambda for the ridge plot

StepLambda

Step for increasing the values of lambda

Details

Logistic regression is a widely used technique in applied work when a binary, nominal or ordinal response variable is available, because classical regression methods are not applicable to this kind of variable. The method is available in most statistical packages, commercial or free. Maximum likelihood, together with a numerical method such as Newton-Raphson, is used to estimate the parameters of the model. When the space generated by the independent variables contains hyperplanes that separate the individuals belonging to the different groups defined by the response, maximum likelihood does not converge and the estimates tend to infinity. This is known in the literature as the separation problem in logistic regression. Even when the separation is not complete, the numerical solution of the maximum likelihood has stability problems. From a practical point of view, that means the estimated model is not accurate precisely when there should be a perfect, or almost perfect, fit to the data.

The problem of the existence of the estimators in logistic regression is studied in Albert and Anderson (1984); a solution for the binary case, based on the Firth method (Firth, 1993), is proposed by Heinze and Schemper (2002). The extension to the nominal logistic model was made by Bull et al. (2002). All those procedures were initially developed to remove bias but also work well to avoid the separation problem. Here we have chosen a simpler solution based on ridge estimators for logistic regression (Le Cessie and Van Houwelingen, 1992).

Rather than maximizing the likelihood {L_j}({\bf{G}} \mid {{\bf{b}}_{j0}},{{\bf{B}}_j}) we maximize the penalized likelihood

{L_j}({\bf{G}} \mid {{\bf{b}}_{j0}},{{\bf{B}}_j}) - \lambda \left( {\left\| {{{\bf{b}}_{j0}}} \right\| + \left\| {{{\bf{B}}_j}} \right\|} \right)

By changing the value of \lambda we obtain slightly different solutions that are not affected by the separation problem.
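The penalized maximization above can be sketched with a Newton-Raphson loop in which the ridge term is added to the score and information. The following is a minimal illustration only, not the package's implementation; the function name ridge_logistic and its arguments are hypothetical, with lambda playing the role of the penalization argument described above.

```r
# Minimal sketch of ridge-penalized binary logistic regression (hypothetical
# helper, not the package's internal code). X: numeric design matrix,
# y: 0/1 response, lambda: ridge penalization constant.
ridge_logistic <- function(X, y, lambda = 0.2, maxiter = 100, tol = 1e-5) {
  beta <- rep(0, ncol(X))
  for (i in seq_len(maxiter)) {
    p <- 1 / (1 + exp(-X %*% beta))   # fitted probabilities
    W <- as.vector(p * (1 - p))       # IRLS weights
    # Penalized score and information: the ridge term keeps the
    # information matrix non-singular even under separation.
    score <- t(X) %*% (y - p) - 2 * lambda * beta
    info  <- t(X) %*% (X * W) + 2 * lambda * diag(ncol(X))
    step  <- solve(info, score)
    beta  <- beta + step
    if (max(abs(step)) < tol) break
  }
  as.vector(beta)
}
```

With lambda = 0 this reduces to ordinary Newton-Raphson for maximum likelihood, which diverges on separated data; any positive lambda bounds the estimates.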

Value

An object of class RidgeBinaryLogistic with the following components

beta

Estimates of the coefficients

fitted

Fitted probabilities

residuals

Residuals of the model

Prediction

Predictions of presences and absences

Covariances

Covariances among the estimates

Deviance

Deviance of the current model

NullDeviance

Deviance of the null model

Dif

Difference between the deviances of the current and null models

df

Degrees of freedom of the difference

p

p-value

CoxSnell

Cox-Snell pseudo R-squared

Nagelkerke

Nagelkerke pseudo R-squared

MacFaden

McFadden pseudo R-squared

R2

Pseudo R-squared using the residuals

Classification

Classification table

PercentCorrect

Percentage of correct classification

Author(s)

Jose Luis Vicente Villardon

References

Agresti, A. (1990) An Introduction to Categorical Data Analysis. John Wiley and Sons, Inc.

Albert, A. and Anderson, J. A. (1984) On the existence of maximum likelihood estimates in logistic regression models. Biometrika, 71(1): 1-10.

Anderson, J. A. (1972), Separate sample logistic discrimination. Biometrika, 59(1): 19-35.

Anderson, J. A. & Philips P. R. (1981) Regression, discrimination and measurement models for ordered categorical variables. Appl. Statist, 30: 22-31.

Bull, S. B., Mak, C. and Greenwood, C. M. (2002) A modified score function for multinomial logistic regression. Computational Statistics and Data Analysis, 39: 57-74.

Cortinhas Abrantes, J. & Aerts, M. (2012) A solution to separation for clustered binary data. Statistical Modelling, 12 (1): 3-27.

Cox, D. R. (1970), Analysis of Binary Data. Methuen. London.

Demey, J., Vicente-Villardon, J. L., Galindo, M. P. and Zambrano, A. (2008) Identifying molecular markers associated with classification of genotypes using external logistic biplots. Bioinformatics, 24(24): 2832-2838.

Firth, D. (1993) Bias reduction of maximum likelihood estimates. Biometrika, 80(1): 27-38.

Fox, J. (1984) Linear Statistical Models and Related Methods. Wiley. New York.

Harrell, F. E. (2012). rms: Regression Modeling Strategies. R package version 3.5-0. http://CRAN.R-project.org/package=rms

Harrell, F. E. (2001). Regression Modeling Strategies: With Applications to Linear Models, Logistic Regression, and Survival Analysis (Springer Series in Statistics). Springer. New York.

Heinze, G. and Schemper, M. (2002) A solution to the problem of separation in logistic regression. Statist. Med., 21: 2409-2419.

Heinze, G. and Ploner, M. (2004) Fixing the nonconvergence bug in logistic regression with SPLUS and SAS. Computer Methods and Programs in Biomedicine, 71: 181-187.

Heinze, G. (2006) A comparative investigation of methods for logistic regression with separated or nearly separated data. Statist. Med., 25:4216-4226.

Heinze, G. and Puhr, R. (2010) Bias-reduced and separation-proof conditional logistic regression with small or sparse data sets. Statist. Med. 29: 770-777.

Hoerl, A. E. and Kennard, R. W. (1971) Ridge regression: biased estimators for nonorthogonal problems. Technometrics, 21: 55-67.

Sun, H. and Wang, S. Penalized logistic regression for high-dimensional DNA methylation data with case-control studies. Bioinformatics, 28(10): 1368-1375.

Hosmer, D. and Lemeshow, L. (1989) Applied Logistic Regression. John Wiley and Sons. Inc.

Le Cessie, S. and Van Houwelingen, J.C. (1992) Ridge Estimators in Logistic Regression. Appl. Statist. 41 (1): 191-201.

Malo, N., Libiger, O. and Schork, N. J. (2008) Accommodating Linkage Disequilibrium in Genetic-Association Analyses via Ridge Regression. Am J Hum Genet. 82(2): 375-385.

Silvapulle, M. J. (1981) On the existence of maximum likelihood estimates for the binomial response models. J. R. Statist. Soc. B, 43: 310-313.

Vicente-Villardon, J. L., Galindo, M. P. and Blazquez, A. (2006) Logistic biplots. In Multiple Correspondence Analysis and Related Methods. Greenacre, M. and Blasius, J. (Eds.), Chapman and Hall, Boca Raton.

Walker, S. and Duncan, D. (1967) Estimation of the probability of an event as a function of several variables. Biometrika, 54: 167-179.

Wedderburn, R. W. M. (1976) On the existence and uniqueness of the maximum likelihood estimates for certain generalized linear models. Biometrika 63, 27-32.

Zhu, J. and Hastie, T. (2004) Classification of gene microarrays by penalized logistic regression. Biostatistics. 5(3):427-43.

Examples

# No examples are shipped with the package yet.
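In the absence of packaged examples, the following is a minimal sketch of how the function might be called, based only on the Usage and Value sections above; the simulated data and chosen argument values are illustrative assumptions, and the call is wrapped in ## Not run: because it requires MultBiplotR to be installed.

```r
## Not run:
library(MultBiplotR)

# Simulated, perfectly separated data: ordinary maximum likelihood would
# diverge here, but the ridge penalization keeps the estimates finite.
set.seed(1)
X <- matrix(rnorm(100 * 2), ncol = 2)
y <- as.numeric(X[, 1] + X[, 2] > 0)

fit <- RidgeBinaryLogistic(y, X, penalization = 0.2, cte = TRUE)

fit$beta            # penalized coefficient estimates
fit$PercentCorrect  # percentage of correct classification
## End(Not run)
```

The components accessed at the end (beta, PercentCorrect) are those documented in the Value section.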

MultBiplotR documentation built on Nov. 21, 2023, 5:08 p.m.