prioritylasso: Patient outcome prediction based on multi-omics data taking...


prioritylasso    R Documentation

Patient outcome prediction based on multi-omics data taking practitioners' preferences into account

Description

Fits successive Lasso models for several ordered blocks of (omics) data and takes the predicted values as an offset for the next block.

Usage

prioritylasso(
  X,
  Y,
  weights,
  family = c("gaussian", "binomial", "cox"),
  type.measure,
  blocks,
  max.coef = NULL,
  block1.penalization = TRUE,
  lambda.type = "lambda.min",
  standardize = TRUE,
  nfolds = 10,
  foldid,
  cvoffset = FALSE,
  cvoffsetnfolds = 10,
  mcontrol = missing.control(),
  scale.y = FALSE,
  return.x = TRUE,
  ...
)

Arguments

X

an (n x p) matrix of predictors with observations in rows and predictors in columns.

Y

n-vector giving the value of the response (either continuous, numeric-binary 0/1, or Surv object).

weights

observation weights. Default is 1 for each observation.

family

should be "gaussian" for continuous Y, "binomial" for binary Y, "cox" for Y of type Surv.

type.measure

accuracy/error measure computed in cross-validation. It should be "class" (classification error) or "auc" (area under the ROC curve) if family = "binomial", "mse" (mean squared error) if family = "gaussian", and "deviance" (based on the partial likelihood) if family = "cox".

blocks

list of the format list(bp1 = ..., bp2 = ...), where the dots should be replaced by the indices of the predictors included in the respective block. The blocks should form a partition of 1:p.

max.coef

vector of integer values specifying the maximum number of non-zero coefficients for each block. The first entry is omitted if block1.penalization = FALSE. Default is NULL.

block1.penalization

whether the first block should be penalized. Default is TRUE.

lambda.type

specifies the value of lambda used for the predictions. "lambda.min" gives the lambda with the minimum cross-validated error. "lambda.1se" gives the largest value of lambda such that the error is within 1 standard error of the minimum. Note that "lambda.1se" can only be chosen if max.coef imposes no restrictions.

standardize

logical, whether the predictors should be standardized or not. Default is TRUE.

nfolds

the number of folds in the cross-validation (CV) procedure. Default is 10.

foldid

an optional vector of values between 1 and nfolds identifying which fold each observation is in.

cvoffset

logical, whether CV should be used to estimate the offsets. Default is FALSE.

cvoffsetnfolds

the number of folds in the CV procedure that is performed to estimate the offsets. Default is 10. Only relevant if cvoffset=TRUE.

mcontrol

controls how to deal with blockwise missing data. For details see below or missing.control.

scale.y

logical, whether Y is scaled before being passed to glmnet. Can only be used for family = "gaussian". Default is FALSE.

return.x

logical, determines if the input data should be returned by prioritylasso. Default is TRUE.

...

other arguments that can be passed to the function cv.glmnet.
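As an illustration of the foldid argument, the following base R sketch builds a balanced, reproducible fold assignment. The seed and the sizes n and nfolds are arbitrary values chosen for the example and are not taken from the package.

```r
# Sketch: build a reproducible, balanced fold assignment for `foldid`.
set.seed(1234)          # arbitrary seed, only for reproducibility
n <- 50                 # number of observations (rows of X)
nfolds <- 5             # number of CV folds

# Repeat 1..nfolds until n values are available, then shuffle them.
foldid <- sample(rep(seq_len(nfolds), length.out = n))

table(foldid)           # number of observations per fold
```

A vector built this way could be passed as foldid = foldid, together with a matching nfolds, in the call to prioritylasso.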

Details

For block1.penalization = TRUE, the function fits a Lasso model for each block. First, a standard Lasso is fitted for the first entry of blocks (block of priority 1). The predictions are then taken as an offset in the Lasso fit of the block of priority 2, and so on. For block1.penalization = FALSE, the function fits a model without penalty to the block of priority 1 (recommended when this is a block of clinical predictors with p < n). This is a generalized linear model for family "gaussian" or "binomial", or a Cox model for family "cox". The predicted values are then taken as an offset in the following Lasso fit of the block of priority 2, and so on.
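The offset idea can be sketched for two blocks by calling cv.glmnet directly. This is only an illustration under simple assumptions (simulated Gaussian data, lambda.min, no cross-validated offsets) and is not the internal code of prioritylasso; the object names fit1, fit2, offset1 and offset2 are made up for the example.

```r
# Sketch of the offset idea with two blocks, using glmnet directly.
# NOT the package's internal code, only an illustration.
library(glmnet)

set.seed(1)
n <- 50
X <- matrix(rnorm(n * 30), n, 30)
Y <- rnorm(n)
blocks <- list(bp1 = 1:10, bp2 = 11:30)

# Block of priority 1: ordinary cross-validated Lasso.
fit1 <- cv.glmnet(X[, blocks$bp1], Y, family = "gaussian", nfolds = 5)
offset1 <- as.numeric(predict(fit1, newx = X[, blocks$bp1], s = "lambda.min"))

# Block of priority 2: Lasso with the block 1 predictions as offset.
fit2 <- cv.glmnet(X[, blocks$bp2], Y, family = "gaussian", nfolds = 5,
                  offset = offset1)

# Predictions of the second fit include the offset from the first block.
offset2 <- as.numeric(predict(fit2, newx = X[, blocks$bp2], s = "lambda.min",
                              newoffset = offset1))
```

With more than two blocks, offset2 would play the role of the offset for the block of priority 3.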

The first entry of blocks contains the indices of variables of the block with priority 1 (first block included in the model). Assume that blocks = list(1:100, 101:200, 201:300) then the block with priority 1 consists of the first 100 variables of the data matrix. Analogously, the block with priority 2 consists of the variables 101 to 200 and the block with priority 3 of the variables 201 to 300.
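Before fitting, it can be useful to check that blocks really forms a partition of 1:p. The following base R sketch does this for the example above; the helper names idx and is_partition are hypothetical and not part of the package.

```r
# Sketch: check that `blocks` forms a partition of 1:p.
p <- 300
blocks <- list(bp1 = 1:100, bp2 = 101:200, bp3 = 201:300)

# All indices used by any block, in block order.
idx <- unlist(blocks, use.names = FALSE)

# A partition covers every index in 1:p exactly once.
is_partition <- length(idx) == p &&
  !anyDuplicated(idx) &&
  setequal(idx, seq_len(p))

is_partition
```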

standardize = TRUE leads to a standardization of the covariates (X) in glmnet, which is recommended by glmnet. In case of an unpenalized first block, the covariates of the first block are not standardized. Please note that the returned coefficients are rescaled to the original scale of the covariates as provided in X. Therefore, new data in predict.prioritylasso should be on the same scale as X.
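The rescaling to the original scale can be illustrated with a small base R sketch. It uses an unpenalized linear model instead of the Lasso, purely to show the arithmetic: a slope estimated on a standardized covariate, divided by that covariate's standard deviation, gives the slope on the original scale. The data and object names are assumptions made for the example.

```r
# Sketch: relation between coefficients on the standardized and original scale.
set.seed(42)
n <- 100
X <- cbind(x1 = rnorm(n, sd = 5), x2 = rnorm(n, sd = 0.1))
Y <- 2 * X[, "x1"] - 3 * X[, "x2"] + rnorm(n)

# Standardize each column to mean 0 and standard deviation 1.
sds <- apply(X, 2, sd)
X_std <- scale(X, center = TRUE, scale = sds)

beta_std  <- coef(lm(Y ~ X_std))[-1]   # slopes on the standardized scale
beta_orig <- coef(lm(Y ~ X))[-1]       # slopes on the original scale

# Dividing by the column standard deviations recovers the original scale.
all.equal(unname(beta_std / sds), unname(beta_orig))
```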

To use the method with blockwise missing data, one can set handle.missingdata = "ignore" in missing.control. Then, only the observations with values for a given block are used to calculate the coefficients of this block. For the observations with missing values, the result from the previous block is used as the offset for the next block. Cross-validated offsets are not supported with handle.missingdata = "ignore". Please note that dealing with single missing values is not supported. Normally, every observation gets a unique foldid which stays the same across all blocks for the call to cv.glmnet. However, when handle.missingdata is not "none", a new foldid is assigned for every block.
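What "blockwise missing" means can be sketched in base R: an observation is missing for a block if it has no values for any variable of that block. The simulated data and the name missing_by_block are assumptions for this illustration; the code does not call the package.

```r
# Sketch: find, per block, the observations whose block is entirely missing.
set.seed(7)
n <- 20
X <- matrix(rnorm(n * 12), n, 12)
blocks <- list(bp1 = 1:4, bp2 = 5:8, bp3 = 9:12)

# Simulate blockwise missingness: observations 1-5 lack block 2,
# observations 16-20 lack block 3.
X[1:5, blocks$bp2] <- NA
X[16:20, blocks$bp3] <- NA

# TRUE means the observation has no values for that block.
missing_by_block <- lapply(blocks, function(idx) {
  rowSums(is.na(X[, idx, drop = FALSE])) == length(idx)
})

# Number of observations missing in each block.
sapply(missing_by_block, sum)
```

With handle.missingdata = "ignore", the observations flagged TRUE for a block would not contribute to the coefficients of that block.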

Value

object of class prioritylasso with the following elements. If these elements are lists, they contain the results for each penalized block.

lambda.ind

list with indices of lambda for lambda.type.

lambda.type

type of lambda which is used for the predictions.

lambda.min

list with values of lambda for lambda.type.

min.cvm

list with the mean cross-validated errors for lambda.type.

nzero

list with numbers of non-zero coefficients for lambda.type.

glmnet.fit

list of fitted glmnet objects.

name

a text string indicating type of measure.

block1unpen

if block1.penalization = FALSE, the results of either the fitted glm or coxph object corresponding to best.blocks.

coefficients

vector of estimated coefficients. If block1.penalization = FALSE and family = "gaussian" or "binomial", the first entry contains an intercept.

call

the function call.

X

the original data used for the calculation, or NA if return.x = FALSE.

missing.data

list with a logical vector for every block indicating which observations are missing in this block (TRUE means missing).

imputation.models

if handle.missingdata = "impute.offsets", it contains the used imputation models

blocks.used.for.imputation

if handle.missingdata = "impute.offsets", it contains the blocks which were used for the imputation model for every block

y.scale.param

if scale.y = TRUE, then it contains the mean and sd used for scaling.

blocks

list with the description which variables belong to which block

mcontrol

the missing control settings used

family

the family of the fitted data

dim.x

the dimension of the used training data
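A few of the elements listed above can be inspected as follows. This is a minimal sketch on simulated noise data, assuming the prioritylasso package is installed; the seed and the object name fit are arbitrary.

```r
# Sketch: fit a model on simulated data and inspect some returned elements.
library(prioritylasso)

set.seed(1)
fit <- prioritylasso(X = matrix(rnorm(50 * 500), 50, 500), Y = rnorm(50),
                     family = "gaussian", type.measure = "mse",
                     blocks = list(bp1 = 1:75, bp2 = 76:200, bp3 = 201:500),
                     nfolds = 5)

class(fit)                   # class of the returned object
fit$lambda.type              # type of lambda used for the predictions
fit$nzero                    # non-zero coefficients per penalized block
sum(fit$coefficients != 0)   # non-zero entries of the coefficient vector
```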

Note

The function description and the first example are based on the R package ipflasso. The second example is inspired by the example of cv.glmnet from the glmnet package.

Author(s)

Simon Klau, Roman Hornung, Alina Bauer
Maintainer: Roman Hornung (hornung@ibe.med.uni-muenchen.de)

References

Klau, S., Jurinovic, V., Hornung, R., Herold, T., Boulesteix, A.-L. (2018). Priority-Lasso: a simple hierarchical approach to the prediction of clinical outcome using multi-omics data. BMC Bioinformatics 19, 322

See Also

pl_data, cvm_prioritylasso, cvr.ipflasso, cvr2.ipflasso, missing.control

Examples

# gaussian
  prioritylasso(X = matrix(rnorm(50*500),50,500), Y = rnorm(50), family = "gaussian",
                type.measure = "mse", blocks = list(bp1=1:75, bp2=76:200, bp3=201:500),
                max.coef = c(Inf,8,5), block1.penalization = TRUE,
                lambda.type = "lambda.min", standardize = TRUE, nfolds = 5, cvoffset = FALSE)
## Not run: 
  # cox
  # simulation of survival data:
  n <- 50; p <- 300
  nzc <- trunc(p / 10)
  x <- matrix(rnorm(n * p), n, p)
  beta <- rnorm(nzc)
  fx <- x[, seq(nzc)] %*% beta / 3
  hx <- exp(fx)
  # survival times:
  ty <- rexp(n, hx)
  # censoring indicator:
  tcens <- rbinom(n = n, prob = .3, size = 1)
  library(survival)
  y <- Surv(ty, 1-tcens)
  blocks <- list(bp1=1:20, bp2=21:200, bp3=201:300)
  # run prioritylasso:
  prioritylasso(x, y, family = "cox", type.measure = "deviance", blocks = blocks,
                block1.penalization = TRUE, lambda.type = "lambda.min", standardize = TRUE,
                nfolds = 5)

  # binomial
  # using pl_data:
  prioritylasso(X = pl_data[,1:1028], Y = pl_data[,1029], family = "binomial", type.measure = "auc",
                blocks = list(bp1=1:4, bp2=5:9, bp3=10:28, bp4=29:1028), standardize = FALSE)
## End(Not run)


prioritylasso documentation built on April 11, 2023, 6:02 p.m.