createObjectMApath: Creation of the object to use in gene set enrichment...

View source: R/createObjectMApath.R

createObjectMApathR Documentation

Creation of the object to use in gene set enrichment meta-analysis

Description

It allows the creation of an object to perform gene set enrichment meta-analysis.

Usage

createObjectMApath(
  listEX,
  listPheno = NULL,
  namePheno = c(rep(1, length(listEX))),
  expGroups = c(rep(1, length(listEX))),
  refGroups = c(rep(2, length(listEX))),
  geneSets,
  pathMethod = c("GSVA", "Zscore", "ssGSEA", "Singscore"),
  minSize = 7,
  kcdf = "Gaussian",
  normalize = TRUE,
  n.cores = 1,
  internal.n.cores = 1
)

Arguments

listEX

A list of dataframes or matrix (genes in rows and sample in columns). A list of ExpressionSets can be used too

listPheno

A list of phenodatas (dataframes or matrix). If the object listEX is a list of ExpressionSets this element can be null.

namePheno

A list or vector of the different colunm names or positions from the phenodatas where the experimental and reference groups are identified. Each element of namePheno correspont to its equivalent element in the listPheno (default a vector of 1, all the first columns of each elements of listPheno are selected).

expGroups

A list of vectors or a vector containing the names or the positions with which we identify the elements of the experiment groups (cases) of the namePheno element (default a vector of 1, all the first groups are selected)

refGroups

A list of vectors or a vector containing the names or the positions with which we identify the elements of the reference groups (control) of the namePheno elements (default a vector of 1, all the first groups are selected)

geneSets

List of gene sets to check. Object similar to the one used in the fgsea package

pathMethod

The single sample enrichment method used to obtain the enrichment score of each sample and gene set. See details for more information

minSize

Minimum size of the resulting gene sets after gene identifier mapping. By default, the minimum size is 7.

kcdf

Only neccesary for the GSVA method. Character vector of length 1 denoting the kernel to use during the non-parametric estimation of the cumulative distribution function of expression levels across samples. By default, kcdf="Gaussian" which is suitable when input expression values are continuous, such as microarray fluorescent units in logarithmic scale, RNA-seq log-CPMs, log-RPKMs or log-TPMs. When input expression values are integer counts, such as those derived from RNA-seq experiments, then this argument should be set to kcdf="Poisson".

normalize

boolean specifying if the gen set matrices should be normalized. Default value "TRUE".

n.cores

Number of cores to use in the parallelization of the datsets. By default, n.cores=1.

internal.n.cores

Number of cores to use in the parallelization of the single sample enrichment methods. By default internal.n.cores= 1.

Details

The single sample scoring methods that can be used to obtain the enrichment score of each sample and gene set are:

  1. "GSVA": Gene Set Variation method (Hänzelmann S, 2013)

  2. "Zscore": Z-score method (Lee E, 2008)

  3. "ssGSEA": Single Sample Gene Set Enrichment Analysis method (Barbie DA, 2009)

  4. "Singscore": Single sample scoring of molecular phenotypes (Foroutan M, 2018)

In parallelization, several aspects must be considered. n.cores refers to the parallelization of studies or datasets. Therefore, if we have 3 studies, the maximum number for n.cores will be 3. internal.n.cores refers to the parallelization of single sample enrichment methods. This is especially recommended for the ssGSEA method. For Singscore and GSVA, it may also be advisable. The process is parallelized based on the samples in each study. Therefore, the larger the number of samples, the slower the process will be. The number of cores that the computer will use is the multiplication of both parameters n.cores * internal.n.cores = total cores.

Value

The object needed to perform gene set enrichment meta-analysis. Each list contains two elements: The first element is the gene set matrix (gene sets in rows and samples in columns) The second element is a vector of zeros and ones that represents the state of the diffenrent samples of the gene sets matrix. 0 represents reference group (controls) and 1 represents experimental group (cases).

References

Hänzelmann S, Castelo R, Guinney J. (2013) GSVA: gene set variation analysis for microarray and RNA-Seq data. BMC Bioinformatics. 2013;14: 7. doi:10.1186/1471-2105-14-7

Lee E, Chuang H-Y, Kim J-W, Ideker T, Lee D. (2008) Inferring Pathway Activity toward Precise Disease Classification. PLOS Computational Biology. 2008;4: e1000217. doi:10.1371/journal.pcbi.1000217

Barbie DA, Tamayo P, Boehm JS, Kim SY, Moody SE, Dunn IF, et al. (2009) Systematic RNA interference reveals that oncogenic KRAS-driven cancers require TBK1. Nature. 2009;462: 108–112. doi:10.1038/nature08460

Foroutan M, Bhuva DD, Lyu R, Horan K, Cursons J, Davis MJ. (2018) Single sample scoring of molecular phenotypes. BMC Bioinformatics. 2018;19: 404. doi:10.1186/s12859-018-2435-4

Korotkevich G, Sukhov V, Budin N, Shpak B, Artyomov MN, Sergushichev A. (2021) Fast gene set enrichment analysis. bioRxiv; 2021. p. 060012. doi:10.1101/060012

Examples


data("simulatedData")
listMatrices <- list(study1Ex, study2Ex)
listPhenodata <- list(study1Pheno, study2Pheno)
phenoGroups <- c("Condition","Condition")
phenoCases <- list("Case", "Case")
phenoControls <- list("Healthy", "Healthy")
objectMApathSim <- createObjectMApath(
   listEX = listMatrices,
   listPheno = listPhenodata, namePheno = phenoGroups,
   expGroups = phenoCases, refGroups = phenoControls,
   geneSets = GeneSets,
   pathMethod = "Zscore")




GSEMA documentation built on Oct. 14, 2024, 5:09 p.m.