View source: R/prioritylasso.R
prioritylasso | R Documentation |
Fits successive Lasso models for several ordered blocks of (omics) data and takes the predicted values as an offset for the next block.
prioritylasso(
X,
Y,
weights,
family = c("gaussian", "binomial", "cox"),
type.measure,
blocks,
max.coef = NULL,
block1.penalization = TRUE,
lambda.type = "lambda.min",
standardize = TRUE,
nfolds = 10,
foldid,
cvoffset = FALSE,
cvoffsetnfolds = 10,
mcontrol = missing.control(),
scale.y = FALSE,
return.x = TRUE,
...
)
X |
a (nxp) matrix of predictors with observations in rows and predictors in columns. |
Y |
n-vector giving the value of the response (either continuous, numeric-binary 0/1, or Surv object). |
weights |
observation weights. Default is 1 for each observation. |
family |
should be "gaussian" for continuous Y, "binomial" for binary Y, "cox" for Y of type Surv. |
type.measure |
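accuracy/error measure computed in cross-validation. It should be "class" (classification error) or "auc" (area under the ROC curve) if family = "binomial", "mse" (mean squared error) if family = "gaussian" and "deviance" if family = "cox". |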
blocks |
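list of the format list(bp1 = ..., bp2 = ..., ...), where the dots are replaced by the indices of the predictors included in the respective block. |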
max.coef |
vector with integer values which specify the maximal number of coefficients for each block. The first entry is omitted if block1.penalization = FALSE. Default is NULL. |
block1.penalization |
whether the first block should be penalized. Default is TRUE. |
lambda.type |
specifies the value of lambda used for the predictions. |
standardize |
logical, whether the predictors should be standardized or not. Default is TRUE. |
nfolds |
the number of cross-validation folds. Default is 10. |
foldid |
an optional vector of values between 1 and nfolds identifying which fold each observation is in. |
cvoffset |
logical, whether CV should be used to estimate the offsets. Default is FALSE. |
cvoffsetnfolds |
the number of folds in the CV procedure that is performed to estimate the offsets. Default is 10. Only relevant if cvoffset = TRUE. |
mcontrol |
controls how to deal with blockwise missing data. For details see below or missing.control. |
scale.y |
determines if y gets scaled before being passed to glmnet. Can only be used for family = "gaussian". |
return.x |
logical, determines if the input data should be returned by prioritylasso. Default is TRUE. |
... |
other arguments that can be passed to the function cv.glmnet. |
For block1.penalization = TRUE, the function fits a Lasso model for each block. First, a standard Lasso for the first entry of blocks (block of priority 1) is fitted. The predictions are then taken as an offset in the Lasso fit of the block of priority 2, etc.
For block1.penalization = FALSE, the function fits a model without penalty to the block of priority 1 (recommended as a block with clinical predictors where p < n). This is either a generalized linear model for family "gaussian" or "binomial", or a Cox model. The predicted values are then taken as an offset in the following Lasso fit of the block with priority 2, etc.
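The offset chaining described above can be sketched directly with glmnet for two blocks. This is a minimal illustration under assumptions, not the internal code of prioritylasso: the simulated data, the block sizes and the object names (cv1, cv2, offset1, offset2) are made up for the example, and the handling of max.coef, cvoffset and missing data is left out.

```r
library(glmnet)

set.seed(1)
n <- 50
X <- matrix(rnorm(n * 40), n, 40)   # simulated predictors (assumption)
Y <- rnorm(n)                       # simulated continuous response (assumption)
blocks <- list(bp1 = 1:10, bp2 = 11:40)

# Priority 1: ordinary cross-validated Lasso on the first block
cv1 <- cv.glmnet(X[, blocks$bp1], Y, family = "gaussian", nfolds = 5)

# Link-scale predictions of block 1 at lambda.min, used as offset below
offset1 <- as.numeric(predict(cv1, newx = X[, blocks$bp1], s = "lambda.min"))

# Priority 2: Lasso on the second block, with the block 1 predictions as offset
cv2 <- cv.glmnet(X[, blocks$bp2], Y, family = "gaussian",
                 offset = offset1, nfolds = 5)

# Predictions from a fit with an offset need the offset supplied again
offset2 <- as.numeric(predict(cv2, newx = X[, blocks$bp2],
                              newoffset = offset1, s = "lambda.min"))
```

With more blocks, offset2 would play the role of offset1 in the next fit, and so on.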
The first entry of blocks contains the indices of the variables of the block with priority 1 (first block included in the model). Assume that blocks = list(1:100, 101:200, 201:300); then the block with priority 1 consists of the first 100 variables of the data matrix. Analogously, the block with priority 2 consists of the variables 101 to 200 and the block with priority 3 of the variables 201 to 300.
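To make the indexing concrete, the following base R sketch splits a simulated matrix into the three blocks from the example above. The matrix X, the names bp1 to bp3 and the final coverage check are illustrative assumptions; the check is a sanity test one might run before fitting, not something this page states that prioritylasso performs.

```r
set.seed(1)
X <- matrix(rnorm(20 * 300), 20, 300)   # simulated data (assumption)
blocks <- list(bp1 = 1:100, bp2 = 101:200, bp3 = 201:300)

# Sub-matrix of X for each block, in order of priority
X_by_block <- lapply(blocks, function(idx) X[, idx, drop = FALSE])
sapply(X_by_block, ncol)

# Sanity check: every column of X is assigned to exactly one block
covers_all <- identical(sort(unlist(blocks, use.names = FALSE)),
                        seq_len(ncol(X)))
covers_all
```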
standardize = TRUE leads to a standardisation of the covariables (X) in glmnet, which is recommended by glmnet. In case of an unpenalized first block, the covariables for the first block are not standardized. Please note that the returned coefficients are rescaled to the original scale of the covariates as provided in X. Therefore, new data in predict.prioritylasso should be on the same scale as X.
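Why rescaled coefficients apply to data on the original scale can be seen in a small base R sketch. It uses an unpenalized lm fit as a stand-in for the Lasso (an assumption made to keep the example exact and dependency-free), and all object names are illustrative: a coefficient estimated on a standardized covariate, divided by that covariate's standard deviation, equals the coefficient on the raw covariate.

```r
set.seed(1)
n <- 40
X <- cbind(a = rnorm(n, sd = 10), b = rnorm(n, sd = 0.1))
y <- 2 * X[, "a"] - 3 * X[, "b"] + rnorm(n)

# Fit on standardized covariates
sds <- apply(X, 2, sd)
X_std <- scale(X, center = TRUE, scale = sds)
beta_std <- coef(lm(y ~ X_std))[-1]

# Rescale the slopes back to the original scale of X
beta_orig <- beta_std / sds

# Reference fit on the raw covariates; the slopes should agree
# up to numerical tolerance
beta_raw <- coef(lm(y ~ X))[-1]
all.equal(unname(beta_orig), unname(beta_raw))
```

Because the coefficients refer to the raw scale, new data should be passed to prediction without being standardized first.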
To use the method with blockwise missing data, one can set handle.missingdata = ignore. Then, to calculate the coefficients for a given block, only the observations with values for this block are used. For the observations with missing values, the result from the previous block is used as the offset for the next block. Cross-validated offsets are not supported with handle.missingdata = ignore. Please note that dealing with single missing values is not supported.
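The bookkeeping behind this can be sketched in base R. The example below is a simplified illustration under assumptions, not the package's implementation: an unpenalized lm fit stands in for the Lasso, the data are simulated, and names such as missing_by_block, lin_pred and lin_pred_path are hypothetical. Each block is fitted on the observations that have it, and observations lacking the block keep the linear predictor from the previous block.

```r
set.seed(1)
n <- 30
X <- matrix(rnorm(n * 6), n, 6)
y <- rnorm(n)
blocks <- list(bp1 = 1:3, bp2 = 4:6)
X[1:5, blocks$bp2] <- NA   # observations 1 to 5 lack the whole second block

# TRUE means the observation is missing for that block
missing_by_block <- lapply(blocks, function(idx) {
  rowSums(is.na(X[, idx, drop = FALSE])) > 0
})

lin_pred <- rep(0, n)      # running offset, starts at zero
lin_pred_path <- list()    # offset after each block, kept for inspection
for (b in names(blocks)) {
  obs <- !missing_by_block[[b]]
  xb <- X[obs, blocks[[b]], drop = FALSE]
  y_b <- y[obs]
  off_b <- lin_pred[obs]
  # Stand-in for the Lasso fit: least squares with the previous offset
  fit <- lm(y_b ~ xb, offset = off_b)
  # Update only the observed rows; rows missing this block keep their value
  lin_pred[obs] <- off_b + drop(cbind(1, xb) %*% coef(fit))
  lin_pred_path[[b]] <- lin_pred
}
```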
Normally, every observation gets a unique foldid which stays the same across all blocks for the call to cv.glmnet. However, when handle.missingdata != none, a new foldid is set for every block.
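A fold assignment of the kind expected by the foldid argument can be built in base R. This is a minimal sketch; the sample size and the number of folds are arbitrary choices for illustration.

```r
set.seed(1)
n <- 50
nfolds <- 5

# Balanced random assignment of the n observations to folds 1..nfolds
foldid <- sample(rep(seq_len(nfolds), length.out = n))
table(foldid)
```

Supplying the same vector to repeated calls keeps the cross-validation splits identical between fits, which can make results easier to compare.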
object of class prioritylasso with the following elements. If these elements are lists, they contain the results for each penalized block.
lambda.ind
list with indices of lambda for lambda.type.
lambda.type
type of lambda which is used for the predictions.
lambda.min
list with values of lambda for lambda.type.
min.cvm
list with the mean cross-validated errors for lambda.type.
nzero
list with numbers of non-zero coefficients for lambda.type.
glmnet.fit
list of fitted glmnet objects.
name
a text string indicating type of measure.
block1unpen
if block1.penalization = FALSE, the results of either the fitted glm or coxph object corresponding to best.blocks.
coefficients
vector of estimated coefficients. If block1.penalization = FALSE and family = "gaussian" or "binomial", the first entry contains an intercept.
call
the function call.
X
the original data used for the calculation, or NA if return.x = FALSE.
missing.data
list with logical entries for every block indicating which observations are missing (TRUE means missing).
imputation.models
if handle.missingdata = "impute.offsets", it contains the imputation models used.
blocks.used.for.imputation
if handle.missingdata = "impute.offsets", it contains the blocks which were used for the imputation model of every block.
y.scale.param
if scale.y = TRUE, it contains the mean and sd used for scaling.
blocks
list describing which variables belong to which block
mcontrol
the missing control settings used
family
the family of the fitted data
dim.x
the dimension of the used training data
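The elements listed above can be inspected on a fitted object with the usual list accessors. The sketch below reuses the simulated Gaussian setting from the examples of this page; the seed and the object name fit are illustrative, and since the data are pure noise the printed values carry no meaning beyond showing the structure.

```r
library(prioritylasso)

set.seed(1)
X <- matrix(rnorm(50 * 500), 50, 500)
Y <- rnorm(50)

fit <- prioritylasso(X = X, Y = Y, family = "gaussian", type.measure = "mse",
                     blocks = list(bp1 = 1:75, bp2 = 76:200, bp3 = 201:500),
                     nfolds = 5)

class(fit)               # class of the returned object
fit$lambda.type          # type of lambda used for the predictions
fit$lambda.min           # lambda values, one entry per penalized block
fit$nzero                # numbers of non-zero coefficients
head(fit$coefficients)   # first few estimated coefficients
fit$dim.x                # dimension of the training data
```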
The function description and the first example are based on the R package ipflasso. The second example is inspired by the example of cv.glmnet from the glmnet package.
Simon Klau, Roman Hornung, Alina Bauer
Maintainer: Roman Hornung (hornung@ibe.med.uni-muenchen.de)
Klau, S., Jurinovic, V., Hornung, R., Herold, T., Boulesteix, A.-L. (2018). Priority-Lasso: a simple hierarchical approach to the prediction of clinical outcome using multi-omics data. BMC Bioinformatics 19, 322
pl_data, cvm_prioritylasso, cvr.ipflasso, cvr2.ipflasso, missing.control
library(prioritylasso)

# gaussian
prioritylasso(X = matrix(rnorm(50 * 500), 50, 500), Y = rnorm(50),
              family = "gaussian", type.measure = "mse",
              blocks = list(bp1 = 1:75, bp2 = 76:200, bp3 = 201:500),
              max.coef = c(Inf, 8, 5), block1.penalization = TRUE,
              lambda.type = "lambda.min", standardize = TRUE,
              nfolds = 5, cvoffset = FALSE)
## Not run:
# cox
# simulation of survival data:
n <- 50
p <- 300
nzc <- trunc(p / 10)
x <- matrix(rnorm(n * p), n, p)
beta <- rnorm(nzc)
fx <- x[, seq(nzc)] %*% beta / 3
hx <- exp(fx)
# survival times:
ty <- rexp(n, hx)
# censoring indicator:
tcens <- rbinom(n = n, prob = .3, size = 1)
library(survival)
y <- Surv(ty, 1 - tcens)
blocks <- list(bp1 = 1:20, bp2 = 21:200, bp3 = 201:300)
# run prioritylasso:
prioritylasso(x, y, family = "cox", type.measure = "deviance", blocks = blocks,
              block1.penalization = TRUE, lambda.type = "lambda.min",
              standardize = TRUE, nfolds = 5)

# binomial
# using pl_data:
prioritylasso(X = pl_data[, 1:1028], Y = pl_data[, 1029], family = "binomial",
              type.measure = "auc",
              blocks = list(bp1 = 1:4, bp2 = 5:9, bp3 = 10:28, bp4 = 29:1028),
              standardize = FALSE)
## End(Not run)