qut: Fit a low dimensional GLM or Square-root lasso using the...
In qut: Quantile Universal Threshold

Description Usage Arguments Value Author(s) References See Also Examples

View source: R/qut.R

Variable selection with GLM-lasso or Square-root lasso choosing the penalty parameter λ with the Quantile Universal Threshold. The procedure goes towards sparse estimation of the coefficients for good selection of the important predictors.

qut(y,X,fit,family=gaussian,alpha.level=0.05,M=1000,qut.standardize=TRUE,
intercept=TRUE,offset=NULL,bootstrap=TRUE,sigma=ifelse(n>2*p,'ols','qut'),beta0='iterglm',
estimator='unbiased',type=c('glmnet','lars','flare'),lambda.seq=0,penalty.factor=rep(1,p),
lambda.min.ratio=ifelse(n<p,0.01,0.0001),nlambda=ifelse(type=='flare',2,100),
lambda=NULL,...)

`y`	response variable. Quantitative for family=`gaussian`, or family=`poisson` (non-negative counts). For family=`binomial` should be a factor with two levels.
`X`	input matrix, of dimension n x p; each row is an observation vector.
`fit`	a user supplied `glmnet` or lars object. Typical usage is to leave it empty so that the program computes the regularization path using the algorithm selected in `type`. WARNING: use with care, if supplied, object options must match with user supplied options.
`family`	response type (see above). Default is `gaussian`.
`alpha.level`	level, such that quantile τ=(1-`alpha.level`). Default is 0.05.
`M`	number of Monte Carlo Simulations to estimate the distribution Λ. Default is 1000.
`qut.standardize`	standardize matrix X with a quantile-based standardization. Default is TRUE. It is not used for sqrt-lasso.
`intercept`	should intercept(s) be fitted (default=TRUE) or set to zero (FALSE).
`offset`	a vector of length `n` that is included in the linear predictor. Useful for the `poisson` family (e.g. log of exposure time), or for refining a model by starting at a current fit. Default is NULL.
`bootstrap`	set TRUE if it is desired to bootstrap matrix X when computing the Quantile Universal Threshold (Random scenario). Default is TRUE.
`sigma`	standard deviation of the Gaussian errors. Used only if family=`gaussian`. When sigma = 'qut', it is estimated based on the Quantile Universal Threshold (default if `n <= 2p`); when sigma = 'rcv', it is estimated using Refitted Cross Validation in Fan et al. 2012; and when sigma = 'cv', it is estimated using cross validation as in Reid et al. 2013. If `sigma` is a positive real number, then that value is used for the standard deviation. If `n>p` and sigma='ols' it is estimated using the ordinary least squeares estimator (default if `n>2p`)
`beta0`	coefficients of the unpenalized covariates for generating the null data for the Quantile Universal Threshold. When is 'iterglm' (Default) or 'iter', it is estimated using one step iteration of the entire procedure with maximum likelihood estimation or the lasso estimation, respectively. If 'noiter' then it is estimated without iterating. If it is desired to set `beta0` in advance, then it should be a vector of size the number of unpenalized covariates including the intercept if `intercept=TRUE`, in the same order. If there are not unpenalized covariates and `intercept=TRUE`, then it must be a real number.
`estimator`	type of estimation of sigma when `sigma` = 'qut'. It can be equal to 'unbiased' (standard unbiased formula), or 'mle' (maximum likelihood formula).
`type`	algorithm for solving the optimization problem. It can be `lars` (`type`='lars') or `glmnet` (`type`='glmnet') for GLM-lasso, or `flare` (`type`='flare') for Square-root lasso. For GLM-lasso, if family is not gaussian, penalty.factor is different from default, or offset different from NULL, glmnet will be always used. Default is 'glmnet'.
`lambda.seq`	preset lambda sequence when type = 'glmnet'. If `lambda.seq`<2 the sequence of lambdas goes decreasing from `lambda.max` to `lambda.qut`. If `lambda.seq`= 0, `lambda` sequence is equispaced. If `lambda.seq`= 1, `lambda` sequence is equispaced in the log scale. Use `lambda.seq`=2 for `glmnet` default options. Default is 0.
`penalty.factor`	separate penalty factors can be applied to each coefficient. This is a number that multiplies lambda to allow differential shrinkage. Can be 0 for some variables, which implies no shrinkage, and that variable is always included in the model. Default is 1 for all variables (and implicitly infinity for variables listed in exclude). Note: the penalty factors are internally rescaled to sum to n, and the lambda sequence will reflect this change.
`lambda.min.ratio`	smallest value for lambda, as a fraction of `lambda.max`. As in `glmnet`.
`nlambda`	the number of `lambda`. As in `glmnet`. Default is 100.
`lambda`	a user supplied `lambda` sequence. As in `glmnet`. Not used when `type`='flare'.
`...`	`glmnet` or `lars` options.

`lambda`	value of the Quantile Universal Threshold.
`fit`	object fitted by `glmnet` or `lars`.
`beta`	coefficients obtained with the Quantile Universal Threshold.
`betaglm`	coefficients obtained fitting GLM with the non zero coefficients in `beta`.
`beta0`	estimated value of the intercept when family is not `gaussian`.
`family`	response type
`sigma`	standard deviation estimate of the errors (when family=`gaussian`)
`scale.factor`	scale factor used for standardizing X.

Jairo Diaz Rodriguez

C. Giacobino, J. Diaz, S. Sardy, N. Hengartner. Quantile universal threshold for model selection. 2016 Jianqing Fan, Shaojun Guo and Ning Hao. Variance estimation using refitted cross-validation in ultrahigh dimensional regression. Journal of the Royal Statistical Society: Series B. 2012 Stephen Reid, Robert Tibshirani, and Jerome Friedman. A Study of Error Variance Estimation in Lasso Regression. 2013

lambdaqut

set.seed(1234)
X=matrix(rnorm(50*500),50,500)
beta=c(rep(10,5),rep(0,500-5))
y=X %*% beta+rnorm(50)

outqut=qut(y,X,type='glmnet',family=gaussian,sigma=1) #Fitting with qut
betaqut=outqut$beta[-1]

outcv=cv.glmnet(X,y,family='gaussian') #fitting with Cross-Validation
betacv=coef(outcv$glmnet.fit,s=outcv$lambda.min)[-1]

results=rbind( 	c(sum(betaqut[1:5]!=0),sum(betaqut[-(1:5)]!=0)),
				c(sum( betacv[1:5]!=0), sum(betacv[-(1:5)]!=0)) )
colnames(results)=c('True Positive','False Positive')
rownames(results)=c('qut','cv')

print(results)