prioritylasso

R Documentation

Patient outcome prediction based on multi-omics data taking practitioners' preferences into account

Description

Fits successive Lasso models for several ordered blocks of (omics) data and takes the predicted values as an offset for the next block.

Usage

prioritylasso(
  X,
  Y,
  weights,
  family = c("gaussian", "binomial", "cox"),
  type.measure,
  blocks,
  max.coef = NULL,
  block1.penalization = TRUE,
  lambda.type = "lambda.min",
  standardize = TRUE,
  nfolds = 10,
  foldid,
  cvoffset = FALSE,
  cvoffsetnfolds = 10,
  mcontrol = missing.control(),
  scale.y = FALSE,
  return.x = TRUE,
  ...
)

Arguments

`X`	an (n x p) matrix of predictors with observations in rows and predictors in columns.
`Y`	n-vector giving the value of the response (either continuous, numeric-binary 0/1, or `Surv` object).
`weights`	observation weights. Default is 1 for each observation.
`family`	should be "gaussian" for continuous `Y`, "binomial" for binary `Y`, "cox" for `Y` of type `Surv`.
`type.measure`	accuracy/error measure computed in cross-validation. It should be "class" (classification error) or "auc" (area under the ROC curve) if `family="binomial"`, "mse" (mean squared error) if `family="gaussian"` and "deviance" if `family="cox"` which uses the partial-likelihood.
`blocks`	list of the format `list(bp1=...,bp2=...,)`, where the dots should be replaced by the indices of the predictors included in this block. The blocks should form a partition of 1:p.
`max.coef`	vector of integer values specifying the maximum number of non-zero coefficients for each block. The first entry is ignored if `block1.penalization = FALSE`. Default is `NULL`.
`block1.penalization`	whether the first block should be penalized. Default is TRUE.
`lambda.type`	specifies the value of lambda used for the predictions. `lambda.min` gives the lambda with the minimum cross-validated error. `lambda.1se` gives the largest value of lambda such that the error is within 1 standard error of the minimum. Note that `lambda.1se` can only be chosen when `max.coef` imposes no restriction.
`standardize`	logical, whether the predictors should be standardized or not. Default is TRUE.
`nfolds`	the number of folds in the cross-validation (CV). Default is 10.
`foldid`	an optional vector of values between 1 and `nfolds` identifying which fold each observation is in.
`cvoffset`	logical, whether CV should be used to estimate the offsets. Default is FALSE.
`cvoffsetnfolds`	the number of folds in the CV procedure that is performed to estimate the offsets. Default is 10. Only relevant if `cvoffset=TRUE`.
`mcontrol`	controls how to deal with blockwise missing data. For details see below or `missing.control`.
`scale.y`	determines whether `Y` is scaled before being passed to glmnet. Can only be used for `family = 'gaussian'`. Default is `FALSE`.
`return.x`	logical, determines if the input data should be returned by `prioritylasso`. Default is `TRUE`.
`...`	other arguments that can be passed to the function `cv.glmnet`.
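
As an illustration of the `foldid` argument, the following minimal sketch builds a balanced fold assignment in base R. The variable names and the sizes `n = 50` and `nfolds = 5` are assumptions chosen for the example, not values prescribed by the package.

```r
# Sketch: assign each of n observations to one of nfolds folds
set.seed(1)                       # only for reproducibility of the example
n <- 50
nfolds <- 5
# repeat the fold labels 1..nfolds until length n, then shuffle them
foldid <- sample(rep(seq_len(nfolds), length.out = n))
table(foldid)                     # fold sizes
```

A vector built this way could be passed as `foldid` so that repeated calls use the same folds.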

Details

For block1.penalization = TRUE, the function fits a Lasso model for each block. First, a standard Lasso for the first entry of blocks (block of priority 1) is fitted. The predictions are then taken as an offset in the Lasso fit of the block of priority 2, etc. For block1.penalization = FALSE, the function fits a model without penalty to the block of priority 1 (recommended as a block with clinical predictors where p < n). This is either a generalized linear model for family "gaussian" or "binomial", or a Cox model. The predicted values are then taken as an offset in the following Lasso fit of the block with priority 2, etc.
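
The offset idea for block1.penalization = TRUE can be sketched directly with cv.glmnet. This is a simplified illustration, not the internal code of prioritylasso: it assumes the glmnet package is installed, uses simulated data, and ignores max.coef, cvoffset and missing data. All variable names are chosen for the example.

```r
library(glmnet)

set.seed(1)
n <- 50
p <- 30
X <- matrix(rnorm(n * p), n, p)
Y <- rnorm(n)
blocks <- list(bp1 = 1:10, bp2 = 11:20, bp3 = 21:30)

offset <- rep(0, n)                       # no prior information before block 1
fits <- vector("list", length(blocks))
for (b in seq_along(blocks)) {
  Xb <- X[, blocks[[b]], drop = FALSE]
  # Lasso on the current block, with the previous predictions as offset
  fits[[b]] <- cv.glmnet(Xb, Y, family = "gaussian", type.measure = "mse",
                         offset = offset, nfolds = 5)
  # linear predictor (including the offset) becomes the offset of the next block
  offset <- as.numeric(predict(fits[[b]], newx = Xb, newoffset = offset,
                               s = "lambda.min"))
}
```

After the loop, `offset` holds the fitted values that accumulate the contributions of all three blocks in priority order.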

The first entry of blocks contains the indices of the variables in the block with priority 1 (the first block included in the model). For example, if blocks = list(1:100, 101:200, 201:300), the block with priority 1 consists of the first 100 variables of the data matrix. Analogously, the block with priority 2 consists of variables 101 to 200 and the block with priority 3 of variables 201 to 300.
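
Since the blocks should form a partition of 1:p, it can be worth checking a block list before fitting. The helper below is a hypothetical base R sketch (`check_blocks` is not part of the package):

```r
# Hypothetical helper: TRUE if the blocks cover 1:p exactly once
check_blocks <- function(blocks, p) {
  idx <- unlist(blocks, use.names = FALSE)
  length(idx) == p && all(sort(idx) == seq_len(p))
}

blocks <- list(bp1 = 1:100, bp2 = 101:200, bp3 = 201:300)
ok <- check_blocks(blocks, p = 300)                       # valid partition
bad <- check_blocks(list(1:100, 100:300), p = 300)        # index 100 appears twice
```

Here `ok` is TRUE and `bad` is FALSE, because the second list assigns variable 100 to two blocks.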

standardize = TRUE leads to a standardization of the covariates (X) in glmnet, which is recommended by glmnet. In case of an unpenalized first block, the covariates of the first block are not standardized. Please note that the returned coefficients are rescaled to the original scale of the covariates as provided in X. Therefore, new data in predict.prioritylasso should be on the same scale as X.
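
To see why rescaled coefficients apply to data on the original scale, consider the following sketch. It swaps in ordinary least squares (lm) for the Lasso so that the result can be checked exactly; the data and names are made up for the illustration, and glmnet's internal standardization differs in detail.

```r
set.seed(1)
n <- 40
X <- cbind(a = rnorm(n, 10, 5), b = rnorm(n, -3, 0.2))
y <- 1 + 2 * X[, "a"] - 4 * X[, "b"] + rnorm(n)

sds <- apply(X, 2, sd)
Xs <- scale(X, center = TRUE, scale = sds)   # standardized covariates

beta_std <- coef(lm(y ~ Xs))[-1]             # slopes on the standardized scale
beta_orig <- beta_std / sds                  # rescaled to the original scale
beta_direct <- coef(lm(y ~ X))[-1]           # slopes fitted on the original data

same <- isTRUE(all.equal(unname(beta_orig), unname(beta_direct)))
```

Dividing each standardized slope by the column's standard deviation recovers the slope on the original scale, so `same` is TRUE here.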

To use the method with blockwise missing data, one can set handle.missingdata = "ignore" via mcontrol = missing.control(handle.missingdata = "ignore"). Then, to calculate the coefficients for a given block, only the observations with values for this block are used. For the observations with missing values, the result from the previous block is used as the offset for the next block. Cross-validated offsets are not supported with handle.missingdata = "ignore". Please note that single missing values (individual missing entries within a block) are not supported. Normally, every observation gets a unique foldid which stays the same across all blocks for the call to cv.glmnet. However, when handle.missingdata is not "none", the foldid is newly assigned for every block.
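
What "blockwise missing" means can be made concrete with a small base R sketch. It is an illustration under assumed toy data, not package code: for each block it records which observations are missing entirely and stops if a row is only partially missing within a block.

```r
set.seed(1)
n <- 6
X <- matrix(rnorm(n * 5), n, 5)
blocks <- list(bp1 = 1:2, bp2 = 3:5)
X[c(2, 5), blocks$bp2] <- NA      # observations 2 and 5 lack block 2 entirely

missing_by_block <- lapply(blocks, function(idx) {
  n_na <- rowSums(is.na(X[, idx, drop = FALSE]))
  # a row must be fully observed or fully missing within a block
  if (any(n_na > 0 & n_na < length(idx))) {
    stop("single missing values within a block are not supported")
  }
  n_na == length(idx)             # TRUE means the observation is missing
})
```

The resulting list has one logical vector per block, with TRUE marking the observations whose values for that block are missing.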

Value

object of class prioritylasso with the following elements. If these elements are lists, they contain the results for each penalized block.

lambda.ind: list with indices of lambda for lambda.type.
lambda.type: type of lambda which is used for the predictions.
lambda.min: list with values of lambda for lambda.type.
min.cvm: list with the mean cross-validated errors for lambda.type.
nzero: list with numbers of non-zero coefficients for lambda.type.
glmnet.fit: list of fitted glmnet objects.
name: a text string indicating type of measure.
block1unpen: if block1.penalization = FALSE, the results of either the fitted glm or coxph object corresponding to best.blocks.
coefficients: vector of estimated coefficients. If block1.penalization = FALSE and family = "gaussian" or "binomial", the first entry contains an intercept.
call: the function call.
X: the original data used for the calculation or NA if return.x = FALSE.
missing.data: list with one logical vector per block indicating which observations are missing (TRUE means missing).
imputation.models: if handle.missingdata = "impute.offsets", it contains the used imputation models
blocks.used.for.imputation: if handle.missingdata = "impute.offsets", it contains the blocks which were used for the imputation model for every block
y.scale.param: if scale.y = TRUE, then it contains the mean and sd used for scaling.
blocks: list with the description which variables belong to which block
mcontrol: the missing control settings used
family: the family of the fitted data
dim.x: the dimension of the used training data

Note

The function description and the first example are based on the R package ipflasso. The second example is inspired by the example of cv.glmnet from the glmnet package.

Author(s)

Simon Klau, Roman Hornung, Alina Bauer
Maintainer: Roman Hornung (hornung@ibe.med.uni-muenchen.de)

References

Klau, S., Jurinovic, V., Hornung, R., Herold, T., Boulesteix, A.-L. (2018). Priority-Lasso: a simple hierarchical approach to the prediction of clinical outcome using multi-omics data. BMC Bioinformatics 19, 322

Examples

# gaussian
  prioritylasso(X = matrix(rnorm(50*500),50,500), Y = rnorm(50), family = "gaussian",
                type.measure = "mse", blocks = list(bp1=1:75, bp2=76:200, bp3=201:500),
                max.coef = c(Inf,8,5), block1.penalization = TRUE,
                lambda.type = "lambda.min", standardize = TRUE, nfolds = 5, cvoffset = FALSE)
## Not run: 
  # cox
  # simulation of survival data:
  n <- 50
  p <- 300
  nzc <- trunc(p/10)
  x <- matrix(rnorm(n*p), n, p)
  beta <- rnorm(nzc)
  fx <- x[, seq(nzc)]%*%beta/3
  hx <- exp(fx)
  # survival times:
  ty <- rexp(n,hx)
  # censoring indicator:
  tcens <- rbinom(n = n,prob = .3,size = 1)
  library(survival)
  y <- Surv(ty, 1-tcens)
  blocks <- list(bp1=1:20, bp2=21:200, bp3=201:300)
  # run prioritylasso:
  prioritylasso(x, y, family = "cox", type.measure = "deviance", blocks = blocks,
                block1.penalization = TRUE, lambda.type = "lambda.min", standardize = TRUE,
                nfolds = 5)

  # binomial
  # using pl_data:
  prioritylasso(X = pl_data[,1:1028], Y = pl_data[,1029], family = "binomial", type.measure = "auc",
                blocks = list(bp1=1:4, bp2=5:9, bp3=10:28, bp4=29:1028), standardize = FALSE)
## End(Not run)

prioritylasso documentation built on April 11, 2023, 6:02 p.m.