prioritylasso: Patient outcome prediction based on multi-omics data taking...



Description

Fits successive Lasso models for several ordered blocks of (omics) data and takes the predicted values as an offset for the next block.

Usage

prioritylasso(
  X,
  Y,
  weights,
  family,
  type.measure,
  blocks,
  max.coef = NULL,
  block1.penalization = TRUE,
  lambda.type = "lambda.min",
  standardize = TRUE,
  nfolds = 10,
  foldid,
  cvoffset = FALSE,
  cvoffsetnfolds = 10,
  ...
)

Arguments

X

an (n x p) matrix of predictors, with observations in rows and predictors in columns.

Y

vector of length n giving the response: continuous, numeric binary (0/1), or a Surv object.

weights

observation weights. Default is 1 for each observation.

family

should be "gaussian" for continuous Y, "binomial" for binary Y, "cox" for Y of type Surv.

type.measure

accuracy/error measure computed in cross-validation. It should be "class" (classification error) or "auc" (area under the ROC curve) if family="binomial", "mse" (mean squared error) if family="gaussian", and "deviance" (based on the partial likelihood) if family="cox".

blocks

list of the form list(bp1=..., bp2=..., ...), where the dots are replaced by the indices of the predictors belonging to the respective block. The blocks should form a partition of 1:p.

max.coef

vector of integers specifying the maximal number of nonzero coefficients for each block. If block1.penalization = FALSE, the entry for the first block is omitted. Default is NULL.

block1.penalization

whether the first block should be penalized. Default is TRUE.

lambda.type

specifies the value of lambda used for the predictions. lambda.min gives the lambda with the minimum mean cross-validated error. lambda.1se gives the largest lambda whose cross-validated error is within one standard error of the minimum. Note that lambda.1se can only be chosen when max.coef imposes no restriction.

standardize

logical, whether the predictors should be standardized or not. Default is TRUE.

nfolds

the number of folds in the CV procedure. Default is 10.

foldid

an optional vector of values between 1 and nfolds identifying the fold each observation belongs to.

cvoffset

logical, whether CV should be used to estimate the offsets. Default is FALSE.

cvoffsetnfolds

the number of folds in the CV procedure that is performed to estimate the offsets. Default is 10. Only relevant if cvoffset=TRUE.

...

other arguments that can be passed to the function cv.glmnet.
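The constraints on family, type.measure and foldid listed above can be checked before calling prioritylasso. The following is a minimal base R sketch; allowed_measures, measure_ok and make_foldid are hypothetical names, not functions of the package. make_foldid assumes a balanced random fold assignment, similar to the one cv.glmnet draws when no foldid is supplied.

```r
# Hypothetical lookup table built from the `type.measure` description above
# (not exported by prioritylasso).
allowed_measures <- list(
  gaussian = "mse",
  binomial = c("class", "auc"),
  cox      = "deviance"
)

# TRUE if `type.measure` is listed for `family`; unknown families give FALSE.
measure_ok <- function(family, type.measure) {
  type.measure %in% allowed_measures[[family]]
}

# Hypothetical helper: balanced random fold labels 1..nfolds for n
# observations, in the format expected by `foldid`.
make_foldid <- function(n, nfolds = 10) {
  sample(rep(seq_len(nfolds), length.out = n))
}

set.seed(1)
foldid <- make_foldid(n = 50, nfolds = 5)
table(foldid)                    # 10 observations in each of the 5 folds
measure_ok("binomial", "auc")    # TRUE
measure_ok("gaussian", "auc")    # FALSE
```

Passing a fixed foldid (together with a matching nfolds) makes the cross-validation folds reproducible across repeated calls.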

Details

For block1.penalization = TRUE, the function fits a Lasso model to each block. First, a standard Lasso is fitted to the first entry of blocks (the block of priority 1). Its predictions are then used as an offset in the Lasso fit for the block of priority 2, and so on. For block1.penalization = FALSE, the function fits a model without penalty to the block of priority 1 (recommended when this block contains clinical predictors and p < n). This is a generalized linear model for family "gaussian" or "binomial", and a Cox model for family "cox". Its predicted values are then used as an offset in the subsequent Lasso fit for the block of priority 2, and so on.
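The offset chain described above can be sketched for family = "gaussian" with cv.glmnet from the glmnet package (assumed to be installed). priority_lasso_sketch is a hypothetical helper and not the prioritylasso implementation: it ignores max.coef, weights and cvoffset, and for block1.penalization = FALSE it uses lm as the unpenalized model.

```r
# Minimal sketch of the priority-Lasso offset chain, family = "gaussian" only.
# Assumes the glmnet package; `priority_lasso_sketch` is a hypothetical
# helper, not the prioritylasso implementation (no max.coef, no cvoffset).
library(glmnet)

priority_lasso_sketch <- function(X, Y, blocks, block1.penalization = TRUE,
                                  nfolds = 5, s = "lambda.min") {
  offset <- rep(0, nrow(X))        # block of priority 1 starts from a zero offset
  coefs  <- numeric(ncol(X))
  for (b in seq_along(blocks)) {
    idx <- blocks[[b]]
    Xb  <- X[, idx, drop = FALSE]
    if (b == 1 && !block1.penalization) {
      # unpenalized first block: ordinary linear model (requires length(idx) < n)
      fit1 <- lm(Y ~ Xb)
      coefs[idx] <- unname(coef(fit1)[-1])
      offset <- as.numeric(fitted(fit1))
    } else {
      # Lasso on this block, with the earlier predictions held fixed as offset
      cvfit <- cv.glmnet(Xb, Y, family = "gaussian", offset = offset,
                         nfolds = nfolds)
      coefs[idx] <- as.matrix(coef(cvfit, s = s))[-1, 1]   # drop the intercept
      # new offset = this block's linear predictor, which includes the old offset
      offset <- as.numeric(predict(cvfit, newx = Xb, newoffset = offset, s = s))
    }
  }
  list(coefficients = coefs, offset = offset)
}
```

Because each later block sees the earlier predictions only through the fixed offset, its Lasso fit roughly targets what the higher-priority blocks left unexplained. Note that glmnet requires at least two columns per block.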

The first entry of blocks contains the indices of the variables in the block with priority 1 (the first block included in the model). For example, if blocks = list(1:100, 101:200, 201:300), the block with priority 1 consists of the first 100 variables of the data matrix, the block with priority 2 of variables 101 to 200, and the block with priority 3 of variables 201 to 300.
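Whether a blocks list forms a partition of 1:p, and which priority a given predictor receives, can be checked in base R. is_block_partition is a hypothetical helper written for this illustration, not part of prioritylasso.

```r
# Hypothetical helper (not part of prioritylasso): TRUE if `blocks` is a
# partition of 1:p, i.e. every predictor index occurs in exactly one block.
is_block_partition <- function(blocks, p) {
  idx <- unlist(blocks, use.names = FALSE)
  length(idx) == p && setequal(idx, seq_len(p))
}

blocks <- list(bp1 = 1:100, bp2 = 101:200, bp3 = 201:300)
is_block_partition(blocks, p = 300)                  # TRUE
is_block_partition(list(1:100, 90:300), p = 300)     # FALSE: 90:100 appear twice

# Priority of a given predictor, e.g. column 150 of X
which(vapply(blocks, function(b) 150 %in% b, logical(1)))   # 2 (named "bp2")
```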

Value

An object of class prioritylasso with the following elements. Elements that are lists contain the results for each penalized block.

lambda.ind

list with indices of lambda for lambda.type.

lambda.type

type of lambda which is used for the predictions.

lambda.min

list with values of lambda for lambda.type.

min.cvm

list with the mean cross-validated errors for lambda.type.

nzero

list with numbers of non-zero coefficients for lambda.type.

glmnet.fit

list of fitted glmnet objects.

name

a text string indicating the type of measure.

block1unpen

if block1.penalization = FALSE, the results of either the fitted glm or coxph object corresponding to best.blocks.

coefficients

vector of estimated coefficients. If block1.penalization = FALSE and family = gaussian or binomial, the first entry contains an intercept.

call

the function call.
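The two lambda.type choices described under Arguments, which determine the lambda.min and min.cvm elements above, can be illustrated on a toy cross-validation curve. pick_lambdas and the numbers are hypothetical; the rule follows the lambda.1se definition given for lambda.type (the largest lambda whose mean CV error is within one standard error of the minimum).

```r
# Hypothetical helper: select lambda.min and lambda.1se from a CV error curve.
# lambda: candidate values; cvm: mean CV error; cvsd: its standard error.
pick_lambdas <- function(lambda, cvm, cvsd) {
  i_min     <- which.min(cvm)                    # minimum mean CV error
  threshold <- cvm[i_min] + cvsd[i_min]          # minimum plus one standard error
  c(lambda.min = lambda[i_min],
    lambda.1se = max(lambda[cvm <= threshold]))  # largest lambda within 1 SE
}

# Toy CV curve (made-up numbers, lambda in decreasing order)
lambda <- c(1, 0.5, 0.25, 0.125)
cvm    <- c(2.0, 1.2, 1.0, 1.1)
cvsd   <- c(0.1, 0.1, 0.3, 0.1)
pick_lambdas(lambda, cvm, cvsd)   # lambda.min = 0.25, lambda.1se = 0.5
```

A larger lambda means a stronger penalty, so lambda.1se typically yields fewer nonzero coefficients than lambda.min.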

Note

The function description and the first example are based on the R package ipflasso. The second example is inspired by the example of cv.glmnet from the glmnet package.

Author(s)

Simon Klau, Roman Hornung, Alina Bauer
Maintainer: Simon Klau (simonklau@ibe.med.uni-muenchen.de)

References

Klau, S., Jurinovic, V., Hornung, R., Herold, T., Boulesteix, A.-L. (2018). Priority-Lasso: a simple hierarchical approach to the prediction of clinical outcome using multi-omics data. BMC Bioinformatics 19, 322

See Also

pl_data, cvm_prioritylasso, cvr.ipflasso, cvr2.ipflasso

Examples

library(prioritylasso)

# gaussian
  prioritylasso(X = matrix(rnorm(50*500),50,500), Y = rnorm(50), family = "gaussian",
                type.measure = "mse", blocks = list(bp1=1:75, bp2=76:200, bp3=201:500),
                max.coef = c(Inf,8,5), block1.penalization = TRUE,
                lambda.type = "lambda.min", standardize = TRUE, nfolds = 5, cvoffset = FALSE)
## Not run: 
  # cox
  # simulation of survival data:
  n <- 50; p <- 300
  nzc <- trunc(p/10)                  # number of predictors with nonzero effect (30)
  x <- matrix(rnorm(n*p), n, p)
  beta <- rnorm(nzc)
  fx <- x[, seq(nzc)] %*% beta/3      # true linear predictor
  hx <- exp(fx)                       # hazard of each observation
  # survival times:
  ty <- rexp(n, hx)
  # censoring indicator (1 = censored, probability 0.3):
  tcens <- rbinom(n = n, prob = .3, size = 1)
  library(survival)
  y <- Surv(ty, 1-tcens)              # event indicator is 1 - tcens
  blocks <- list(bp1=1:20, bp2=21:200, bp3=201:300)
  # run prioritylasso:
  prioritylasso(x, y, family = "cox", type.measure = "deviance", blocks = blocks,
                block1.penalization = TRUE, lambda.type = "lambda.min", standardize = TRUE,
                nfolds = 5)

  # binomial
  # using pl_data (columns 1 to 1028 are predictors, column 1029 is the response):
  prioritylasso(X = as.matrix(pl_data[,1:1028]), Y = pl_data[,1029], family = "binomial",
                type.measure = "auc", blocks = list(bp1=1:4, bp2=5:9, bp3=10:28, bp4=29:1028),
                standardize = FALSE)
## End(Not run)

prioritylasso documentation built on Jan. 13, 2021, 8:45 p.m.