Logistic regression with a quadratic penalization on the coefficients

Description

This function fits a logistic regression model with a penalty on the squared L2 norm of the coefficients.

Usage

  plr(x, y, weights = rep(1,length(y)),
      offset.subset = NULL, offset.coefficients = NULL,
      lambda = 1e-4, cp = "bic")

Arguments

x

matrix of features

y

binary response

weights

optional vector of weights for observations

offset.subset

optional vector of indices for the predictors for which the coefficients are preset to offset.coefficients. If offset.coefficients is not NULL, offset.subset must be provided.

offset.coefficients

optional vector of preset coefficient values for the predictors in offset.subset. If offset.subset is not NULL, offset.coefficients must be provided.

lambda

regularization parameter for the L2 norm of the coefficients. The criterion minimized in plr is -log-likelihood + λ*||β||^2. Default is lambda=1e-4.

cp

complexity parameter to be used when computing the score. score=deviance+cp*df. If cp="aic" or cp="bic", these are converted to cp=2 or cp=log(sample size), respectively. Default is cp="bic".
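
The criterion given under lambda can be written out directly in base R. The following is only a minimal sketch, not the package's implementation: penalized_nll is a hypothetical helper, the model here has no intercept, every entry of beta is penalized, and optim with BFGS stands in for whatever fitting algorithm plr actually uses.

```r
# Hypothetical helper (not part of the package): evaluates
#   -log-likelihood + lambda * ||beta||^2
# for a logistic model with linear predictor x %*% beta.
penalized_nll <- function(beta, x, y, lambda, weights = rep(1, length(y))) {
  eta <- drop(x %*% beta)                               # linear predictor
  loglik <- sum(weights * (y * eta - log1p(exp(eta))))  # Bernoulli log-likelihood
  -loglik + lambda * sum(beta^2)                        # add the quadratic penalty
}

set.seed(1)
n <- 100
p <- 3
x <- matrix(rnorm(n * p), nrow = n)
y <- sample(c(0, 1), n, replace = TRUE)

# Minimize the criterion for a small and a large penalty (assumed optimizer: BFGS).
fit_small <- optim(rep(0, p), penalized_nll, x = x, y = y,
                   lambda = 1e-4, method = "BFGS")
fit_large <- optim(rep(0, p), penalized_nll, x = x, y = y,
                   lambda = 100, method = "BFGS")

# A larger lambda pulls the coefficients closer to zero.
sum(fit_small$par^2)
sum(fit_large$par^2)
```

Comparing the two squared norms shows the shrinkage effect of lambda on the estimated coefficients.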

Details

We proposed logistic regression with a quadratic penalty on the coefficients for detecting gene interactions, as described in "Penalized Logistic Regression for Detecting Gene Interactions" (2008) by Park and Hastie. However, plr can also be used as a general-purpose penalized logistic regression function.

Value

A plr object is returned. predict, print, and summary functions can be applied.

coefficients

vector of the coefficient estimates

covariance

sandwich estimate of the covariance matrix for the coefficients

deviance

deviance of the fitted model

null.deviance

deviance of the null model

df

degrees of freedom of the fitted model

score

deviance + cp*df

nobs

number of observations

cp

complexity parameter used when computing the score

fitted.values

fitted probabilities

linear.predictors

linear predictors computed with the estimated coefficients

level

If any categorical factors are supplied, level (the list of level sets) is automatically generated and returned. See step.plr for details of how it is generated.
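
The relation among score, deviance, df and cp can be illustrated with a short sketch. The helpers resolve_cp and plr_score are hypothetical names introduced here, and the numeric inputs are made-up illustrative values rather than output from plr; only the formula score = deviance + cp*df and the conversion of "aic" and "bic" come from the descriptions above.

```r
# Hypothetical helper: map cp = "aic" / "bic" to its numeric value,
# and pass a numeric cp through unchanged.
resolve_cp <- function(cp, nobs) {
  if (identical(cp, "aic")) return(2)
  if (identical(cp, "bic")) return(log(nobs))
  cp
}

# Hypothetical helper: score = deviance + cp * df
plr_score <- function(deviance, dof, cp, nobs) {
  deviance + resolve_cp(cp, nobs) * dof
}

# Made-up illustrative values (not results from a fitted model).
dev  <- 120
dof  <- 4
nobs <- 100

score_aic <- plr_score(dev, dof, "aic", nobs)  # 120 + 2 * 4 = 128
score_bic <- plr_score(dev, dof, "bic", nobs)  # 120 + log(100) * 4
score_num <- plr_score(dev, dof, 3, nobs)      # 120 + 3 * 4 = 132
```

For a fitted object, the same arithmetic would presumably apply to its deviance, df, cp and nobs components.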

Author(s)

Mee Young Park and Trevor Hastie

References

Mee Young Park and Trevor Hastie (2008) Penalized Logistic Regression for Detecting Gene Interactions

See Also

predict.plr, step.plr

Examples

n <- 100

p <- 10
x <- matrix(rnorm(n*p),nrow=n)
y <- sample(c(0,1),n,replace=TRUE)
fit <- plr(x,y,lambda=1)

p <- 3
z <- matrix(sample(seq(3),n*p,replace=TRUE),nrow=n)
x <- data.frame(x1=factor(z[ ,1]),x2=factor(z[ ,2]),x3=factor(z[ ,3]))
y <- sample(c(0,1),n,replace=TRUE)
fit <- plr(x,y,lambda=1)
# 'level' is automatically generated. Check 'fit$level'.