modelGeneExpression: Gene expression modeling pipeline
In bkaczkowski/xcore: xcore expression regulators inference

modelGeneExpression

R Documentation

Gene expression modeling pipeline

Description

modelGeneExpression uses parallelization if parallel backend is registered. For that reason we advise against passing parallel argument to internally called cv.glmnet routine.

Usage

modelGeneExpression(
  mae,
  yname = "Y",
  uname = "U",
  xnames,
  design = NULL,
  standardize = TRUE,
  parallel = FALSE,
  pvalues = TRUE,
  precalcmodels = NULL,
  ...
)

Arguments

`mae`	MultiAssayExperiment object such as produced by `prepareCountsForRegression`.
`yname`	string indicating experiment in `mae` to use as the expression input.
`uname`	string indicating experiment in `mae` to use as the basal expression level.
`xnames`	character indicating experiments in `mae` to use as molecular signatures.
`design`	matrix giving the design matrix for the samples. Default (`NULL`) is to use design found in `mae` metadata. Columns corresponds to samples groups and rows to samples names. Only samples included in the design will be processed.
`standardize`	logical flag indicating if the molecular signatures should be scaled. Advised to be set to `TRUE`.
`parallel`	parallel argument to internally used `cv.glmnet` function. Advised to be set to `FALSE` as it might interfere with parallelization used in `modelGeneExpression`.
`pvalues`	logical flag indicating if significance testing for the estimated molecular signatures activities should be performed.
`precalcmodels`	optional list of precomputed `'cv.glmnet'` objects for each molecular signature and sample. The elements of this list should be matching the `xnames` vector. Each of those elements should be a named list holding `'cv.glmnet'` objects for each sample. If provided those models will be used instead of running regression from scratch.
`...`	arguments passed to glmnet::cv.glmnet.

Details

For speeding up the calculations consider lowering number of folds used in internally run cv.glmnet by specifying nfolds argument. By default 10 fold cross validation is used.

The relationship between the expression (Y) and molecular signatures (X) is described using linear model formulation. The pipeline attempts to model the change in expression between basal expression level (u) and each sample, with the goal of finding the unknown molecular signatures activities. Linear models are fit using popular ridge regression implementation glmnet (Friedman, Hastie, and Tibshirani 2010).

If pvalues is set to TRUE the significance of the estimated molecular signatures activities is tested using methodology introduced by (Cule, Vineis, and De Iorio 2011) which original implementation can be found in ridge-package.

If replicates are available the signatures activities estimates and their standard error estimates can be combined. This is done by averaging signatures activities estimates and pooling their significance estimates using Stouffer's method for the Z-scores and Fisher's method for the p-values.

For detailed pipeline description we refer interested user to paper accompanying this package.

Value

Nested list with following elements

regression_models: Named list with elements corresponding to signatures specified in xnames. Each of these is a list holding 'cv.glmnet' objects corresponding to each sample.
pvalues: Named list with elements corresponding to signatures specified in xnames. Each of these is a list holding data.frame of signature's p-values and test statistics estimated for each sample.
zscore_avg: Named list with elements corresponding to signatures specified in xnames. Each of these is a matrix holding replicate average Z-scores with columns corresponding to groups in the design.
coef_avg: Named list with elements corresponding to signatures specified in xnames. Each of these is a matrix holding replicate averaged signatures activities with columns corresponding to groups in the design.
results: Named list of a data.frames holding replicate average molecular signatures, overall molecular signatures Z-score and p-values calculated over groups using Stouffer's and Fisher's methods.

Examples

data("rinderpest_mini", "remap_mini")
base_lvl <- "00hr"
design <- matrix(
  data = c(1, 0, 0,
           1, 0, 0,
           1, 0, 0,
           0, 1, 0,
           0, 1, 0,
           0, 1, 0,
           0, 0, 1,
           0, 0, 1,
           0, 0, 1),
  ncol = 3,
  nrow = 9,
  byrow = TRUE,
  dimnames = list(colnames(rinderpest_mini), c("00hr", "12hr", "24hr")))
mae <- prepareCountsForRegression(
  counts = rinderpest_mini,
  design = design,
  base_lvl = base_lvl)
mae <- addSignatures(mae, remap = remap_mini)
mae <- filterSignatures(mae)
res <- modelGeneExpression(
  mae = mae,
  xnames = "remap",
  nfolds = 5)

bkaczkowski/xcore documentation built on Jan. 26, 2024, 6:24 p.m.