lassoscore: Lasso penalized score test


Description

Tests for association between y and each column of X, adjusting for the remaining columns via lasso regression, as described in Voorman et al. (2014).

Usage

lassoscore(y,X, lambda=0, family=c("gaussian","binomial","poisson"), 
    tol = .Machine$double.eps, maxit=1000, 
    resvar = NULL, verbose=FALSE, subset = NULL)

Arguments

y

outcome variable

X

matrix of predictors

lambda

tuning parameter value (see details)

family

the family used for the likelihood: "gaussian", "binomial", or "poisson"

tol,maxit

convergence tolerance and maximum number of iterations in glmnet

resvar

value for the residual variance, for "gaussian" family. If not specified, the residual variance from lasso regression on all features is used (see details).

verbose

whether or not to print progress bars (defaults to FALSE)

subset

a subset of columns to test

Details

For each column of X, denoted by x*, this function computes the score statistic

T_λ = x*^T (y - yhat) / √n,

where yhat are the fitted values from lasso regression of y on X[,-x*] (see Note 2).

The variance of the score statistic is estimated in four ways:

(i) a model-based estimate

(ii) a sandwich variance

(iii/iv) conservative versions of (i) and (ii), which do not depend on the selected model

Note 1: in lasso regression of y on X, the coefficient of x* is non-zero if and only if

|T_λ| > λ√n

Note 2: For lasso regression of y on X, we minimize -l(b) + lambda*||b||_1 over vectors b, where -l(b) is RSS/(2n) for the "gaussian" family and the negative log-likelihood of a generalized linear model otherwise. See the details of glmnet for more information.

Note 3: Each feature x is rescaled to have mean zero and x^Tx/n = 1; y is centered but not rescaled.
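
The score statistic and the three notes can be illustrated with a small base-R sketch for the "gaussian" family. This is not the package's implementation: lasso_cd is a hypothetical coordinate-descent helper standing in for glmnet, and the simulated data, n, p, and lambda are arbitrary choices made for illustration.

```r
set.seed(1)
n <- 100; p <- 8; lambda <- 0.1

# Simulated data (illustrative only): the first two features affect y.
X <- matrix(rnorm(n * p), n, p)
y <- drop(X[, 1:2] %*% c(1, -0.5)) + rnorm(n)

# Note 3: features get mean zero and x^T x / n = 1; y is centered only.
X <- scale(X, center = TRUE, scale = FALSE)
X <- sweep(X, 2, sqrt(colSums(X^2) / n), "/")
y <- y - mean(y)

# Soft-thresholding operator used by coordinate descent.
soft <- function(z, g) sign(z) * pmax(abs(z) - g, 0)

# Hypothetical helper: minimise RSS/(2n) + lambda * ||b||_1 by cyclic
# coordinate descent. It relies on the x^T x / n = 1 scaling above.
lasso_cd <- function(X, y, lambda, maxit = 1000, tol = 1e-12) {
  n <- nrow(X)
  b <- numeric(ncol(X))
  r <- y                                  # current residual y - X b
  for (it in seq_len(maxit)) {
    delta <- 0
    for (j in seq_along(b)) {
      bj <- soft(b[j] + sum(X[, j] * r) / n, lambda)
      r <- r - X[, j] * (bj - b[j])       # keep the residual in sync
      delta <- max(delta, abs(bj - b[j]))
      b[j] <- bj
    }
    if (delta < tol) break
  }
  b
}

# Score statistic for column j: lasso fit of y on X[, -j], then
# T = x*^T (y - yhat) / sqrt(n).
score_stat <- function(j) {
  Xr <- X[, -j, drop = FALSE]
  yhat <- drop(Xr %*% lasso_cd(Xr, y, lambda))
  sum(X[, j] * (y - yhat)) / sqrt(n)
}
scores <- vapply(seq_len(p), score_stat, numeric(1))

# Note 1: compare |T| > lambda * sqrt(n) with the non-zero pattern of
# the lasso regression of y on all of X.
b_full <- lasso_cd(X, y, lambda)
data.frame(score = round(scores, 3),
           exceeds_threshold = abs(scores) > lambda * sqrt(n),
           nonzero_in_full_fit = b_full != 0)
```

The last two columns of the printed data frame should agree row by row, which is the equivalence stated in Note 1.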

Value

Object of class ‘lassoscore’, which is an R ‘list’, with elements:

fit

Elements of the fitted lasso regression of y on X (see glmnet for details).

scores

the score statistics

resvar

the value used for the residual variance

scorevar.model

the variance of the score statistics, estimated using a model-based approximation

scorevar.sand

the variance of the score statistics, estimated using a model-agnostic (sandwich) formula

scorevar.model.cons,scorevar.sand.cons

conservative versions of the variances

p.model

p-value, using a model-based variance

p.sand

p-value, using sandwich variance

p.model.cons,p.sand.cons

p-values, using the conservative variance formulas
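
Because the p-value elements hold one entry per tested feature, they can be post-processed like any numeric vector. The sketch below is one possible workflow rather than anything the package prescribes: mod is a hypothetical stand-in list with made-up values, used so the snippet runs without a fitted model, and the multiplicity adjustment uses p.adjust from base R's stats package.

```r
# Hypothetical stand-in for the list returned by lassoscore();
# the p-values below are invented for illustration.
mod <- list(p.model = c(0.87, 0.0019, 5.1e-22, 0.14),
            p.sand  = c(0.90, 0.0030, 2.0e-20, 0.11))

# Collect the p-values and add multiplicity-adjusted versions of the
# model-based ones.
adj <- data.frame(
  feature = seq_along(mod$p.model),
  p.model = mod$p.model,
  p.sand  = mod$p.sand,
  holm    = p.adjust(mod$p.model, method = "holm"),
  BH      = p.adjust(mod$p.model, method = "BH")
)

# Features still significant at the 0.05 level after Holm adjustment.
adj[adj$holm < 0.05, ]
```

With a real fit, the same code applies after replacing the stand-in list with the object returned by lassoscore.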

Author(s)

Arend Voorman voorma@uw.edu

References

Voorman, A., Shojaie, A., and Witten, D. (2014). Inference in high dimensions with the penalized score test. http://arxiv.org/abs/1401.2678.

See Also

glassoscore, qqpval

Examples

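# Simulation from Voorman et al. (2014)
# Loading lassoscore also attaches glmnet and Matrix (the latter provides
# forceSymmetric), as the loading messages in the example output show.
library(lassoscore)

set.seed(20)
n <- 300   # observations
p <- 100   # features
q <- 10    # features with a non-zero effect

beta <- numeric(p)
beta[sample(p, q)] <- 0.4

# Covariance with entries 0.5^|i - j|, and its Cholesky factor
Sigma <- forceSymmetric(t(0.5^outer(1:p, 1:p, "-")))
cSigma <- chol(Sigma)

# Correlated, standardized features and a Gaussian outcome
x <- scale(replicate(p, rnorm(n)) %*% cSigma)
y <- rnorm(n, x %*% beta, 1)

mod <- lassoscore(y, x, 0.02)
summary(mod)
plot(mod, type = "all")

# Test only features 10:20:
mod0 <- lassoscore(y, x, 0.02, subset = 10:20)

######## Diabetes data set:
# Test features in the diabetes data set, using 2 different values of `lambda',
# and compare results.
# Residual variance from the unpenalized linear model:
resvar <- with(lm(y ~ x, data = diabetes), sum(residuals^2) / df.residual)

mod2 <- with(diabetes, lassoscore(y, x, lambda = 4, resvar = resvar))
mod3 <- with(diabetes, lassoscore(y, x, lambda = 0.5, resvar = resvar))
data.frame(
  "variable" = colnames(diabetes$x),
  "lambda_4" = format(mod2$p.model, digits = 2),
  "lambda_0.5" = format(mod3$p.model, digits = 2))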

Example output

Loading required package: glasso
Loading required package: glmnet
Loading required package: Matrix
Loading required package: foreach
Loaded glmnet 2.0-16

An object of class `lassoscore'
based on n = 300  observations on d = 100 
 features, with 66 non-zero coefficients in regression of `y' on `X'

Model-based p-values:
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 0.0000  0.1951  0.4020  0.4499  0.7106  0.9970 

Sandwich p-values:
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 0.0000  0.1597  0.3964  0.4462  0.7034  0.9973 
   variable lambda_4 lambda_0.5
1       age  8.7e-01    9.3e-01
2       sex  1.9e-03    1.6e-04
3       bmi  5.1e-22    3.2e-16
4       map  2.6e-08    5.5e-07
5        tc  1.4e-01    2.8e-03
6       ldl  2.1e-01    9.4e-01
7       hdl  3.6e-15    1.0e-02
8       tch  1.7e-01    3.1e-01
9       ltg  2.6e-18    2.0e-13
10      glu  5.7e-02    2.4e-01

lassoscore documentation built on May 2, 2019, 5:12 a.m.