d2wlasso: d2wlasso package
In rakheon/d2wlasso: Structured variable selection with weighted lasso


This package provides functions to perform variable selection with the weighted lasso for both linear regression and Cox proportional hazards regression. The weights are chosen to direct the variable selection procedure so that covariates that are highly associated with the response are likely to be selected and covariates that are weakly associated with the response are less likely to be selected. The association between the response and each covariate is measured using results from simpler linear/Cox regressions of the response on that covariate; the available measures include, for example, q-values, partial correlation coefficients, and t-statistics of regression coefficients.

Performs variable selection with covariates multiplied by weights that direct which variables are likely to be associated with the response.
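
The per-covariate association measures described above can be illustrated in base R. The following is a minimal sketch, not the package's internal code: it regresses a simulated response on each covariate separately and collects the Pearson correlation, the t-statistic, the p-value, and the Benjamini-Hochberg adjusted p-value. The simulated data and all variable names are assumptions made for illustration.

```r
set.seed(1)
n <- 100
m <- 5
x <- matrix(rnorm(n * m), n, m)
y <- 2 * x[, 1] - 2 * x[, 2] + rnorm(n)

# Regress y on each covariate separately; keep the slope's t-statistic and p-value
univariate <- t(sapply(seq_len(m), function(j) {
  coefs <- summary(lm(y ~ x[, j]))$coefficients
  c(tstat = coefs[2, "t value"], pvalue = coefs[2, "Pr(>|t|)"])
}))

tstat <- univariate[, "tstat"]
pvalue <- univariate[, "pvalue"]

# Benjamini-Hochberg adjustment of the univariate p-values
bh.pvalue <- p.adjust(pvalue, method = "BH")

# Pearson correlation between each covariate and the response
corr <- as.vector(cor(x, y))

print(round(cbind(corr, tstat, pvalue, bh.pvalue), 4))
```

Covariates with small p-values (here the first two, which drive the simulated response) receive small weights, while weakly associated covariates receive weights closer to one.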

d2wlasso(
  x,
  z,
  y,
  cox.delta = NULL,
  factor.z = TRUE,
  regression.type = c("linear", "cox")[1],
  weight.type = c("one", "corr.estimate", "corr.pvalue", "corr.bh.pvalue", "corr.tstat",
    "corr.qvalue", "parcor.estimate", "parcor.pvalue", "parcor.bh.pvalue", "parcor.tstat",
    "parcor.qvalue", "exfrequency.random.partition.aic",
    "exfrequency.random.partition.bic", "exfrequency.kmeans.partition.aic",
    "exfrequency.kmeans.partition.bic", "exfrequency.kquartiles.partition.aic",
    "exfrequency.kquartiles.partition.bic", "exfrequency.ksorted.partition.aic",
    "exfrequency.ksorted.partition.bic")[1],
  weight_fn = function(x) x,
  ttest.pvalue = TRUE,
  q_opt_tuning_method = c("bootstrap", "smoother")[2],
  qval.alpha = 0.15,
  alpha.bh = 0.05,
  robust = TRUE,
  show.plots = FALSE,
  pi0.known = FALSE,
  pi0.val = 0.9,
  penalty.choice = c("cv.mse", "cv.penalized.loss", "penalized.loss",
    "deviance.criterion")[3],
  est.MSE = c("est.var", "step")[1],
  cv.folds = 10,
  mult.cv.folds = 0,
  penalized.loss.delta = 2,
  nboot = 100,
  k.split = 4,
  step.direction = "backward"
)

`x`	(n by m) matrix of main covariates where m is the number of covariates and n is the sample size.
`z`	(n by 1) matrix of additional fixed covariate affecting response variable. This covariate should always be selected. Can be NULL.
`y`	(n by 1) a matrix corresponding to the response variable. If `regression.type` is "cox", `y` contains the observed event times.
`cox.delta`	(n by 1) a matrix that denotes censoring when `regression.type` is "cox" (1 denotes survival event is observed, 0 denotes the survival event is censored). Can be NULL.
`factor.z`	logical. If TRUE, the fixed variable z is a factor variable.
`regression.type`	a character indicator that is either "linear" for linear regression or "cox" for Cox proportional hazards regression. Default is "linear".
`weight.type`	character value denoting which weights are used for the weighted lasso, where each covariate in `x` is multiplied by a scalar weight. Default is "one". Options are:
  one: The scalar weight is one.
  corr.estimate: The scalar weight for covariate x_j is the Pearson correlation between x_j and y.
  corr.pvalue: The scalar weight for covariate x_j is the p-value of the coefficient of x_j in the regression of y on x_j.
  corr.bh.pvalue: The scalar weight for covariate x_j is the Benjamini-Hochberg adjusted p-value from `corr.pvalue`.
  corr.qvalue: The scalar weight for covariate x_j is the q-value transform of the p-value from `corr.pvalue`.
  corr.tstat: The scalar weight for covariate x_j is the t-statistic associated with testing the significance of x_j in the regression of y on x_j.
  parcor.estimate: The scalar weight for covariate x_j is the partial correlation between x_j and y after adjustment for z.
  parcor.pvalue: The scalar weight for covariate x_j is the p-value of the coefficient of x_j in the regression of y on z and x_j.
  parcor.bh.pvalue: The scalar weight for covariate x_j is the Benjamini-Hochberg adjusted p-value from `parcor.pvalue`.
  parcor.qvalue: The scalar weight for covariate x_j is the q-value transform of the p-value from `parcor.pvalue`.
  parcor.tstat: The scalar weight for covariate x_j is the t-statistic associated with testing the significance of x_j in the regression of y on z and x_j.
  exfrequency.random.partition.aic: The scalar weight for covariate x_j is an exclusion frequency, obtained as follows. The covariates are partitioned into `k.split` random groups, and a stepwise linear/Cox regression of the response is fitted on each group of covariates. The final model within each stepwise regression is selected using an AIC criterion, and we record whether x_j is excluded from the final model. This procedure is repeated `nboot` times, and the exclusion frequency is the proportion of repetitions in which x_j is excluded.
  exfrequency.random.partition.bic: As exfrequency.random.partition.aic, except that the final model within each stepwise regression is selected using a BIC criterion.
  exfrequency.kmeans.partition.aic: The scalar weight for covariate x_j is an exclusion frequency, obtained as follows. A ridge regression of the response on all covariates gives a ridge regression coefficient for each covariate. The covariates are partitioned into `k.split` groups using a K-means criterion on the ridge regression coefficients, and a stepwise linear/Cox regression of the response is fitted on each group of covariates. The final model is selected using an AIC criterion, and we record whether x_j is excluded from the final model. This procedure is repeated `nboot` times, and the exclusion frequency is the proportion of repetitions in which x_j is excluded.
  exfrequency.kmeans.partition.bic: As exfrequency.kmeans.partition.aic, except that the final model within each stepwise regression is selected using a BIC criterion.
  exfrequency.kquartiles.partition.aic: The scalar weight for covariate x_j is an exclusion frequency, obtained as follows. A ridge regression of the response on all covariates gives a ridge regression coefficient for each covariate. The covariates are partitioned into `k.split` groups using k-quantiles of the ridge regression coefficients, and a stepwise linear/Cox regression of the response is fitted on each group of covariates. The final model is selected using an AIC criterion, and we record whether x_j is excluded from the final model. This procedure is repeated `nboot` times, and the exclusion frequency is the proportion of repetitions in which x_j is excluded.
  exfrequency.kquartiles.partition.bic: As exfrequency.kquartiles.partition.aic, except that the final model within each stepwise regression is selected using a BIC criterion.
  exfrequency.ksorted.partition.aic: The scalar weight for covariate x_j is an exclusion frequency, obtained as follows. A ridge regression of the response on all covariates gives a ridge regression coefficient for each covariate. The ridge regression coefficients are sorted in descending order and split into `k.split` groups, and a stepwise linear/Cox regression of the response is fitted on each group of covariates. The final model is selected using an AIC criterion, and we record whether x_j is excluded from the final model. This procedure is repeated `nboot` times, and the exclusion frequency is the proportion of repetitions in which x_j is excluded.
  exfrequency.ksorted.partition.bic: As exfrequency.ksorted.partition.aic, except that the final model within each stepwise regression is selected using a BIC criterion.
`weight_fn`	A user-defined function to be applied to the weights for the weighted lasso. Default is the identity function.
`ttest.pvalue`	logical indicator used when `weight.type` is "corr.pvalue", "corr.bh.pvalue", "corr.qvalue", "parcor.pvalue", "parcor.bh.pvalue", or "parcor.qvalue". If TRUE, the p-value for each covariate is computed from a univariate linear/Cox regression of the response on that covariate. If FALSE, the p-value is computed from the correlation coefficient between the response and each covariate. Default is TRUE.
`q_opt_tuning_method`	character indicator used when `weight.type` is "corr.qvalue" or "parcor.qvalue". Options are "bootstrap" or "smoother" to specify how the optimal tuning parameter is obtained when computing q-values from Storey and Tibshirani (2003). Default is "smoother" (smoothing spline).
`qval.alpha`	scalar value used when `weight.type` is "corr.qvalue" or "parcor.qvalue". The choice of `qval.alpha` indicates the cut-off for q-values used to obtain the result `threshold.selection`, which contains all covariates whose q-value is less than `qval.alpha`. Default is 0.15.
`alpha.bh`	scalar value used when `weight.type` is "corr.pvalue", "corr.bh.pvalue", "parcor.pvalue", or "parcor.bh.pvalue". The choice of `alpha.bh` indicates the cut-off for p-values used to obtain the result `threshold.selection`, which contains all covariates whose p-value is less than `alpha.bh`. Default is 0.05.
`robust`	logical indicator used when `weight.type` is "corr.qvalue" or "parcor.qvalue". If TRUE, q-values computed as in Storey and Tibshirani (2003) are robust for small p-values.
`show.plots`	logical indicator. When `weight.type` is "corr.qvalue" or "parcor.qvalue", `show.plots` refers to figures associated with the q-value computations proposed in Storey and Tibshirani (2003). If `show.plots` is TRUE, we display the density histogram of the original p-values, the density histogram of the q-values, a scatter plot of the estimate \hat{π} versus λ in the computation of q-values, and a scatter plot of the number of significant tests versus the q-value cut-off. When `penalty.choice` is "penalized.loss", `show.plots` refers to plots associated with the penalized loss criterion. If TRUE, a plot of the penalized loss criterion versus steps in the LARS algorithm of Efron et al. (2004) is displayed. Default is FALSE.
`pi0.known`	logical indicator used when `weight.type` is "corr.qvalue" or "parcor.qvalue". If TRUE, when computing q-values, the estimate of the true proportion of the null hypothesis is set to the value of pi0.val given by the user. If FALSE, the estimate of the true proportion of the null hypothesis is computed by bootstrap or smoothing spline as proposed in Storey and Tibshirani (2003). Default is FALSE.
`pi0.val`	scalar used when `weight.type` is "corr.qvalue" or "parcor.qvalue". A user supplied estimate of the true proportion of the null hypothesis. Used only when pi0.known is TRUE. Default is 0.9.
`penalty.choice`	character that indicates the variable selection criterion. Options are "cv.mse" for the K-fold cross-validated mean squared prediction error, "penalized.loss" for the penalized loss criterion, which requires specification of the penalization parameter `penalized.loss.delta`, "cv.penalized.loss" for the K-fold cross-validated criterion to determine delta in the penalized loss criterion, and "deviance.criterion" for optimizing the Cox proportional hazards deviance (only available when `regression.type` is "cox"). Default is "penalized.loss".
`est.MSE`	character that indicates how the mean squared error is estimated in the penalized loss criterion when `penalty.choice` is "penalized.loss" or "cv.penalized.loss". Options are "est.var" which means the MSE is sd(y) * sqrt(n/(n-1)) where n is the sample size, and "step" which means we use the MSE from forward stepwise regression with AIC as the selection criterion. Default is "est.var".
`cv.folds`	scalar denoting the number of folds for cross-validation when `penalty.choice` is "cv.mse" or "cv.penalized.loss". Default is 10.
`mult.cv.folds`	scalar denoting the number of times the cross-validation procedure is repeated when `penalty.choice` is "cv.mse" or "cv.penalized.loss". Default is 0.
`penalized.loss.delta`	scalar to indicate the choice of the penalization parameter delta in the penalized loss criterion when `penalty.choice` is "penalized.loss". Default is 2.
`nboot`	scalar denoting the number of bootstrap samples obtained for exclusion frequency weights when `weight.type` is "exfrequency.random.partition.aic", "exfrequency.random.partition.bic", "exfrequency.kmeans.partition.aic", "exfrequency.kmeans.partition.bic","exfrequency.kquartiles.partition.aic", "exfrequency.kquartiles.partition.bic","exfrequency.ksorted.partition.aic","exfrequency.ksorted.partition.bic". Default is 100.
`k.split`	scalar that indicates the number of partitions used to compute the exclusion frequency weights when `weight.type` is "exfrequency.random.partition.aic", "exfrequency.random.partition.bic", "exfrequency.kmeans.partition.aic", "exfrequency.kmeans.partition.bic","exfrequency.kquartiles.partition.aic", "exfrequency.kquartiles.partition.bic","exfrequency.ksorted.partition.aic","exfrequency.ksorted.partition.bic". Default is 4.
`step.direction`	character that indicates the direction of stepwise regression used to compute the exclusion frequency weights when `weight.type` is "exfrequency.random.partition.aic", "exfrequency.random.partition.bic", "exfrequency.kmeans.partition.aic", "exfrequency.kmeans.partition.bic","exfrequency.kquartiles.partition.aic", "exfrequency.kquartiles.partition.bic","exfrequency.ksorted.partition.aic","exfrequency.ksorted.partition.bic". One of "both", "forward" or "backward". Default is "backward".
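
The exclusion frequency weights for the random partition case can be sketched in base R as follows. This is an illustrative approximation under stated assumptions, not the package's implementation: it redraws only the random partition at each of the `nboot` repetitions (it does not resample observations), it handles the linear case only, and it uses `step()` with its default AIC penalty. The data, the covariate names `x1` to `x8`, and the reduced `nboot` are hypothetical choices made to keep the example fast.

```r
set.seed(1)
n <- 100
m <- 8
x <- matrix(rnorm(n * m), n, m, dimnames = list(NULL, paste0("x", seq_len(m))))
y <- 2 * x[, 1] - 2 * x[, 2] + rnorm(n)

k.split <- 4   # number of random groups of covariates
nboot <- 20    # number of repetitions (the documented default is 100)

# excluded[b, j] is 1 when covariate j is dropped in repetition b
excluded <- matrix(0, nboot, m, dimnames = list(NULL, colnames(x)))

for (b in seq_len(nboot)) {
  # Randomly assign each covariate to one of k.split groups
  groups <- sample(rep_len(seq_len(k.split), m))
  for (g in seq_len(k.split)) {
    vars <- colnames(x)[groups == g]
    dat <- data.frame(y = y, x[, vars, drop = FALSE])
    # Backward stepwise regression within the group, selected by AIC (k = 2);
    # using k = log(n) in step() would give a BIC-type criterion instead
    fit <- step(lm(y ~ ., data = dat), direction = "backward", trace = 0)
    kept <- intersect(vars, names(coef(fit)))
    excluded[b, setdiff(vars, kept)] <- 1
  }
}

# Proportion of repetitions in which each covariate was excluded
exfrequency <- colMeans(excluded)
print(exfrequency)
```

Covariates that truly drive the response are rarely dropped and so get weights near zero, whereas noise covariates are dropped in most repetitions and get weights near one.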

weights: weights used in the weighted lasso. The weights computed depend on the `weight.type` selected.
weighted.lasso.results: variable selection results from the lasso when the covariates are multiplied by weights as specified by `weight.type`.
threshold.selection: variable selection results when weights are below a specified threshold. Results are reported only when `weight.type` is "corr.pvalue", "corr.bh.pvalue", "corr.qvalue", "parcor.pvalue", "parcor.bh.pvalue", or "parcor.qvalue".
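
The returned components can be inspected as in the sketch below. It assumes that the result is a named list whose elements are accessible with `$`, which the component names above suggest but this page does not state explicitly; the internal structure of each component is not documented here, so `str()` is used instead of assuming a particular layout. The call is guarded so that the snippet does nothing when the package is not installed.

```r
set.seed(1)
x <- matrix(rnorm(100 * 5), 100, 5)
z <- matrix(rbinom(100, 1, 0.5), 100, 1)
y <- matrix(z[, 1] + 2 * x[, 1] - 2 * x[, 2] + rnorm(100), 100)

if (requireNamespace("d2wlasso", quietly = TRUE)) {
  fit <- d2wlasso::d2wlasso(x, z, y, weight.type = "parcor.qvalue")
  # Assumption: the result is a list with the three documented components
  str(fit$weights)
  str(fit$weighted.lasso.results)
  str(fit$threshold.selection)
}
```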

Garcia, T.P. and Müller, S. (2016). Cox regression with exclusion frequency-based weights to identify neuroimaging markers relevant to Huntington’s disease onset. Annals of Applied Statistics, 10, 2130-2156.

Garcia, T.P. and Müller, S. (2014). Influence of measures of significance-based weights in the weighted Lasso. Journal of the Indian Society of Agricultural Statistics (Invited paper), 68, 131-144.

Garcia, T.P., Müller, S., Carroll, R.J., Dunn, T.N., Thomas, A.P., Adams, S.H., Pillai, S.D., and Walzem, R.L. (2013). Structured variable selection with q-values. Biostatistics, DOI:10.1093/biostatistics/kxt012.

Efron, B., Hastie, T., Johnstone, I. and Tibshirani, R. (2004). Least angle regression. Annals of Statistics, 32, 407-499.

Storey, J. D. and Tibshirani, R. (2003). Statistical significance for genomewide studies. Proceedings of the National Academy of Sciences 100, 9440-9445.

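library(d2wlasso)
set.seed(1)  # make the simulated data reproducible
## Linear model: 5 covariates, a binary fixed covariate z, response driven by z, x1 and x2
x = matrix(rnorm(100*5, 0, 1),100,5)
z = matrix(rbinom(100, 1, 0.5),100,1)
y = matrix(z[,1] + 2*x[,1] - 2*x[,2] + rnorm(100, 0, 1), 100)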

dwl0 = d2wlasso(x,z,y)
dwl1 = d2wlasso(x,z=NULL,y,weight.type="corr.pvalue")
dwl2 = d2wlasso(x,z,y,weight.type="parcor.qvalue")
dwl3 = d2wlasso(x,z,y,weight.type="parcor.bh.pvalue")
dwl4 = d2wlasso(x,z,y,weight.type="parcor.qvalue",mult.cv.folds=100)
dwl5 = d2wlasso(x,z,y,weight.type="exfrequency.random.partition.aic")
dwl6 = d2wlasso(x,z,y,weight.type="exfrequency.kmeans.partition.aic")
dwl7 = d2wlasso(x,z,y,weight.type="exfrequency.kquartiles.partition.aic")
dwl8 = d2wlasso(x,z,y,weight.type="exfrequency.ksorted.partition.aic")

## Cox model
x = matrix(rnorm(100*5, 0, 1),100,5)
z = matrix(rbinom(100, 1, 0.5),100,1)
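# simulated event times (log-linear in z, x1 and x2)
y = matrix(exp(z[,1] + 2*x[,1] - 2*x[,2] + rnorm(100, 0, 2)), 100)
# censoring indicator: 1 = event observed; here no observation is censored
cox.delta = matrix(1,nrow=length(y),ncol=1)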
dwl0.cox = d2wlasso(x,z,y,cox.delta,regression.type="cox",penalty.choice="cv.mse")
dwl1.cox = d2wlasso(x,z=NULL,y,cox.delta,
  regression.type="cox",weight.type="corr.pvalue",penalty.choice="cv.mse")
dwl2.cox = d2wlasso(x,z,y,cox.delta,
  regression.type="cox",weight.type="parcor.qvalue",penalty.choice="cv.mse")
dwl3.cox = d2wlasso(x,z,y,cox.delta,
  regression.type="cox",weight.type="parcor.bh.pvalue",penalty.choice="cv.mse")
dwl4.cox = d2wlasso(x,z,y,cox.delta,
  regression.type="cox",weight.type="parcor.qvalue",
  mult.cv.folds=100,penalty.choice="cv.mse")
dwl5.cox = d2wlasso(x,z,y,cox.delta,regression.type="cox",
  weight.type="exfrequency.random.partition.aic",penalty.choice="cv.mse")
dwl6.cox = d2wlasso(x,z,y,cox.delta,regression.type="cox",
  weight.type="exfrequency.kmeans.partition.aic",penalty.choice="cv.mse")
dwl7.cox = d2wlasso(x,z,y,cox.delta,regression.type="cox",
  weight.type="exfrequency.kquartiles.partition.aic",penalty.choice="cv.mse")
dwl8.cox = d2wlasso(x,z,y,cox.delta,regression.type="cox",
  weight.type="exfrequency.ksorted.partition.aic",penalty.choice="cv.mse")