Description

Compute p-values and confidence intervals for the lasso estimate, at a fixed value of the tuning parameter lambda.
Usage

fixedLassoInf(x, y, beta, lambda, family = c("gaussian", "binomial", "cox"),
    intercept = TRUE, add.targets = NULL, status = NULL, sigma = NULL,
    alpha = 0.1, type = c("partial", "full"), tol.beta = 1e-5, tol.kkt = 0.1,
    gridrange = c(-100, 100), bits = NULL, verbose = FALSE, linesearch.try = 10)

Arguments
x: Matrix of predictors (n by p).

y: Vector of outcomes (length n).

beta: Estimated lasso coefficients (e.g., from glmnet). This is of length p (so the intercept is not included as the first component). Be careful! This function uses the "standard" lasso objective 1/2 ||y - x β||_2^2 + λ ||β||_1. In contrast, glmnet multiplies the first term by a factor of 1/n. So after running glmnet, to extract the beta corresponding to a value lambda, you need to use beta = coef(gfit, x=x, y=y, s=lambda/n, exact=TRUE)[-1], where gfit is the fitted glmnet object; see the examples below and the sketch after this argument list.
lambda: Value of lambda used to compute beta. See the above warning.

family: Response type: "gaussian" (default), "binomial", or "cox" (for censored survival data).

sigma: Estimate of the error standard deviation. If NULL (default), this is estimated using the mean squared residual of the full least squares fit when n >= 2p, and using the standard deviation of y when n < 2p. In the latter case, the user should use the estimateSigma function for a more accurate estimate (see the sketch after this argument list).
alpha: Significance level for confidence intervals (target is miscoverage alpha/2 in each tail).

intercept: Was the lasso problem solved (e.g., by glmnet) with an intercept in the model? Default is TRUE. Must be TRUE for the "binomial" family. Not used for the "cox" family, where no intercept is assumed.

add.targets: Optional vector of predictors to be included as targets of inference, regardless of whether or not they are selected by the lasso. Default is NULL.

status: Censoring status for the Cox model; 1=failure, 0=censored.
type: Contrast type for p-values and confidence intervals: the default, "partial", means that the contrasts tested are the partial population regression coefficients, within the active set of predictors; the alternative, "full", means that the full population regression coefficients are tested. The latter does not make sense when p > n.

tol.beta: Tolerance for determining if a coefficient is zero.

tol.kkt: Tolerance for determining if an entry of the subgradient is zero.

gridrange: Grid range for constructing confidence intervals, on the standardized scale.
bits: Number of bits to be used for p-value and confidence interval calculations. Default is NULL, in which case standard floating point calculations are performed. When not NULL, multiple precision floating point calculations are performed with the specified number of bits, using the R package Rmpfr (if this package is not installed, standard floating point calculations are performed).
verbose: Print out progress along the way? Default is FALSE.

linesearch.try: When running type="full" (i.e., the debiased lasso), the number of attempts in the line search.
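To make the lambda scaling and sigma estimation described above concrete, here is a minimal sketch (assuming a fitted glmnet object gfit on centered x, with n, y, and lambda as in the examples below; the n < 2p branch for sigma is illustrative):

# extract lasso coefficients on the "standard" objective scale;
# glmnet scales the squared-error term by 1/n, hence s = lambda/n
beta = coef(gfit, x = x, y = y, s = lambda/n, exact = TRUE)[-1]
# when n < 2p, estimateSigma (from this package) gives a more
# accurate estimate of the error standard deviation than sd(y)
sigmahat = estimateSigma(x, y)$sigmahat
out = fixedLassoInf(x, y, beta, lambda, sigma = sigmahat)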
Details

This function computes selective p-values and confidence intervals for the lasso, given a fixed value of the tuning parameter lambda. Three different response types are supported: gaussian, binomial, and Cox.
The confidence interval construction involves numerical search and can be fragile: if the observed statistic is too close to either end of the truncation interval (vlo and vup, see references), then one or possibly both endpoints of the interval of desired coverage cannot be computed, and default to +/- Inf. The output tailarea gives the achieved Gaussian tail areas for the reported intervals; these should be close to alpha/2, and can be used for error-checking purposes, as in the sketch below.
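For instance, fragile intervals can be flagged along these lines (a minimal sketch, assuming out holds a fixedLassoInf result with the default alpha = 0.1, and an illustrative tolerance of 0.01):

alpha = 0.1
# realized tail areas should be near alpha/2; large deviations or
# infinite endpoints flag intervals whose numerical search was fragile
fragile = apply(abs(out$tailarea - alpha/2) > 0.01, 1, any) |
  !is.finite(out$ci[, 1]) | !is.finite(out$ci[, 2])
which(fragile)
# such intervals can be recomputed with higher precision, e.g. bits=200
# (this requires the Rmpfr package)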
Important! Before running glmnet (or some other lasso solver), x should be centered, that is, x <- scale(x,TRUE,FALSE). In addition, if standardization of the predictors is desired, x should be scaled as well: x <- scale(x,TRUE,TRUE). Then when running glmnet, set standardize=FALSE. See the example below.
The penalty.factor facility in glmnet, which allows a different penalty weight for each predictor, is not yet implemented in fixedLassoInf. However, you can finesse this by rescaling the columns of x; see the example below. One caveat: with this approach, a penalty factor of zero (forcing a predictor into the model) is not allowed.
Note that the coefficients and standard errors reported are unregularized. For example, for the Gaussian family, they are the usual least squares estimates and standard errors for the model fit to the active set from the lasso, as illustrated below.
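A minimal sketch of that correspondence for the Gaussian case (assuming x, y, and out from the first example below, fit with an intercept):

# unpenalized least squares fit on the lasso's active set; the estimates
# and standard errors correspond to what fixedLassoInf reports
active = lm(y ~ x[, out$vars])
summary(active)$coefficients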
Value

A list with the following components:

type: Type of coefficients tested (partial or full).

lambda: Value of the tuning parameter lambda used.

pv: One-sided p-values for the active variables; these use the fact that we have conditioned on the signs.

ci: Confidence intervals.

tailarea: Realized tail areas (lower and upper) for each confidence interval.

vlo: Lower truncation limits for the statistics.

vup: Upper truncation limits for the statistics.

vmat: Linear contrasts that define the observed statistics.

y: Vector of outcomes.

vars: Variables in the active set.

sign: Signs of the active coefficients.

alpha: Desired coverage (alpha/2 in each tail).

sigma: Value of the error standard deviation (sigma) used.

call: The call to fixedLassoInf.
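A brief sketch of pulling these components out of a result out (as in the examples below):

# assemble active variables, signs, p-values, and interval endpoints
cbind(var = out$vars, sign = out$sign, pv = out$pv,
      lower = out$ci[, 1], upper = out$ci[, 2])
out$tailarea  # realized lower/upper tail areas (target alpha/2 each)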
Author(s)

Ryan Tibshirani, Rob Tibshirani, Jonathan Taylor, Joshua Loftus, Stephen Reid
References

Jason Lee, Dennis Sun, Yuekai Sun, and Jonathan Taylor (2013). Exact post-selection inference, with application to the lasso. arXiv:1311.6238.

Jonathan Taylor and Robert Tibshirani (2016). Post-selection inference for L1-penalized likelihood models. arXiv:1602.07358.
Examples

library(selectiveInference)
library(glmnet)
library(survival)  # for Surv(), used in the Cox example

set.seed(43)
n = 50
p = 10
sigma = 1
x = matrix(rnorm(n*p),n,p)
x = scale(x,TRUE,TRUE)
beta = c(3,2,rep(0,p-2))
y = x%*%beta + sigma*rnorm(n)
# first run glmnet
gfit = glmnet(x,y,standardize=FALSE)
# extract coef for a given lambda; note the 1/n factor!
# (and we don't save the intercept term)
lambda = .8
beta = coef(gfit, x=x, y=y, s=lambda/n, exact=TRUE)[-1]
# compute fixed lambda p-values and selection intervals
out = fixedLassoInf(x,y,beta,lambda,sigma=sigma)
out
## as above, but use lar function instead to get initial
## lasso fit (should get same results)
lfit = lar(x,y,normalize=FALSE)
beta = coef(lfit, s=lambda, mode="lambda")
out2 = fixedLassoInf(x, y, beta, lambda, sigma=sigma)
out2
## mimic different penalty factors by first scaling x
set.seed(43)
n = 50
p = 10
sigma = 1
x = matrix(rnorm(n*p),n,p)
x=scale(x,TRUE,TRUE)
beta = c(3,2,rep(0,p-2))
y = x%*%beta + sigma*rnorm(n)
pf=c(rep(1,7),rep(.1,3)) #define penalty factors
pf=p*pf/sum(pf) # penalty factors should be rescaled so they sum to p
xs=scale(x,FALSE,pf) #scale cols of x by penalty factors
# first run glmnet
gfit = glmnet(xs, y, standardize=FALSE)
# extract coef for a given lambda; note the 1/n factor!
# (and we don't save the intercept term)
lambda = .8
beta_hat = coef(gfit, x=xs, y=y, s=lambda/n, exact=TRUE)[-1]
# compute fixed lambda p-values and selection intervals
out = fixedLassoInf(xs,y,beta_hat,lambda,sigma=sigma)
# rescale confidence interval endpoints to undo the penalty factors
out$ci=t(scale(t(out$ci),FALSE,pf[out$vars]))
out
#logistic model
set.seed(43)
n = 50
p = 10
sigma = 1
x = matrix(rnorm(n*p),n,p)
x=scale(x,TRUE,TRUE)
beta = c(3,2,rep(0,p-2))
y = x%*%beta + sigma*rnorm(n)
y=1*(y>mean(y))
# first run glmnet
gfit = glmnet(x,y,standardize=FALSE,family="binomial")
# extract coef for a given lambda; note the 1/n factor!
# (and here we DO include the intercept term)
lambda = .8
beta_hat = coef(gfit, x=x, y=y, s=lambda/n, exact=TRUE)
# compute fixed lambda p-values and selection intervals
out = fixedLassoInf(x,y,beta_hat,lambda,family="binomial")
out
# Cox model
set.seed(43)
n = 50
p = 10
sigma = 1
x = matrix(rnorm(n*p), n, p)
x = scale(x, TRUE, TRUE)
beta = c(3,2,rep(0,p-2))
tim = as.vector(x%*%beta + sigma*rnorm(n))
tim= tim-min(tim)+1
status=sample(c(0,1),size=n,replace=TRUE)
# first run glmnet
y = Surv(tim,status)
gfit = glmnet(x, y, standardize=FALSE, family="cox")
# extract coef for a given lambda; note the 1/n factor!
lambda = 1.5
beta_hat = as.numeric(coef(gfit, x=x, y=y, s=lambda/n, exact=TRUE))
# compute fixed lambda p-values and selection intervals
out = fixedLassoInf(x, tim, beta_hat, lambda, status=status, family="cox")
out
# Debiased lasso or "full"
n = 50
p = 100
sigma = 1
x = matrix(rnorm(n*p),n,p)
x = scale(x,TRUE,TRUE)
beta = c(3,2,rep(0,p-2))
y = x%*%beta + sigma*rnorm(n)
# first run glmnet
gfit = glmnet(x, y, standardize=FALSE, intercept=FALSE)
# extract coef for a given lambda; note the 1/n factor!
# (and we don't save the intercept term)
lambda = 2.8
beta = coef(gfit, x=x, y=y, s=lambda/n, exact=TRUE)[-1]
# compute fixed lambda p-values and selection intervals
out = fixedLassoInf(x, y, beta, lambda, sigma=sigma, type='full', intercept=FALSE)
out
# When n > p and "full" we use the full inverse
# instead of Javanmard and Montanari's approximate inverse
n = 200
p = 50
sigma = 1
x = matrix(rnorm(n*p),n,p)
x = scale(x,TRUE,TRUE)
beta = c(3,2,rep(0,p-2))
y = x%*%beta + sigma*rnorm(n)
# first run glmnet
gfit = glmnet(x, y, standardize=FALSE, intercept=FALSE)
# extract coef for a given lambda; note the 1/n factor!
# (and we don't save the intercept term)
lambda = 2.8
beta = coef(gfit, x=x, y=y, s=lambda/n, exact=TRUE)[-1]
# compute fixed lambda p-values and selection intervals
out = fixedLassoInf(x, y, beta, lambda, sigma=sigma, type='full', intercept=FALSE)
out