TCGAanalyze_DEA: Differential expression analysis (DEA) using edgeR or limma...


Description

TCGAanalyze_DEA allows the user to perform differential expression analysis (DEA) with either the edgeR or the limma package in order to identify differentially expressed genes (DEGs). It is possible to do a two-class analysis.

TCGAanalyze_DEA performs DEA using the following functions from edgeR:

  1. edgeR::DGEList converts the count matrix into an edgeR object.

  2. edgeR::estimateCommonDisp estimates a common dispersion, so that each gene is assigned the same dispersion estimate.

  3. edgeR::exactTest performs pair-wise tests for differential expression between two groups.

  4. edgeR::topTags takes the output from exactTest(), adjusts the raw p-values using the False Discovery Rate (FDR) correction, and returns the top differentially expressed genes.
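The four edgeR steps above can be sketched on simulated data. This is only an illustration of the listed calls, not the internal code of TCGAanalyze_DEA: the count matrix, the group labels, and all object names (counts, group, dge, et, tab) are assumptions made for the example, and the edgeR package is assumed to be installed.

```r
# Sketch only: simulated counts, not TCGA data.
library(edgeR)

set.seed(1)
n_genes <- 200

# 200 genes x 6 samples of negative-binomial counts (3 "Normal", 3 "Tumor")
counts <- matrix(
  rnbinom(n_genes * 6, mu = 100, size = 10),
  nrow = n_genes,
  dimnames = list(paste0("gene", seq_len(n_genes)), paste0("sample", 1:6))
)
group <- factor(rep(c("Normal", "Tumor"), each = 3))

dge <- DGEList(counts = counts, group = group)      # 1. build the edgeR object
dge <- estimateCommonDisp(dge)                      # 2. one dispersion shared by all genes
et  <- exactTest(dge, pair = c("Normal", "Tumor"))  # 3. two-group exact test
tab <- topTags(et, n = n_genes)$table               # 4. table with FDR-adjusted p-values

head(tab)
```

The resulting table carries the logFC, logCPM, PValue, and FDR columns produced by topTags.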

TCGAanalyze_DEA performs DEA using the following functions from limma:

  1. limma::makeContrasts constructs a matrix of custom contrasts.

  2. limma::lmFit fits a linear model for each gene given a series of arrays.

  3. limma::contrasts.fit takes a linear model fit to microarray data and computes estimated coefficients and standard errors for a given set of contrasts.

  4. limma::eBayes takes a microarray linear model fit and computes moderated t-statistics, moderated F-statistics, and log-odds of differential expression by empirical Bayes moderation of the standard errors towards a common value.

  5. limma::topTable extracts a table of the top-ranked genes from a linear model fit.
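The five limma steps above can be sketched in the same way. Again, this is an illustration under stated assumptions and not the internal code of TCGAanalyze_DEA: the expression values are simulated on a log-like scale, the design matrix and the contrast name TumorVsNormal are invented for the example, and the limma package is assumed to be installed.

```r
# Sketch only: simulated log-scale expression, not TCGA data.
library(limma)

set.seed(1)
n_genes <- 200

# 200 genes x 6 samples (3 "Normal", 3 "Tumor")
expr <- matrix(
  rnorm(n_genes * 6, mean = 8, sd = 1),
  nrow = n_genes,
  dimnames = list(paste0("gene", seq_len(n_genes)), paste0("sample", 1:6))
)
group <- factor(rep(c("Normal", "Tumor"), each = 3))

# Design matrix with one column per group (no intercept)
design <- model.matrix(~ 0 + group)
colnames(design) <- levels(group)

contr <- makeContrasts(TumorVsNormal = Tumor - Normal, levels = design)  # 1. contrast matrix
fit   <- lmFit(expr, design)                                             # 2. per-gene linear model
fit2  <- contrasts.fit(fit, contr)                                       # 3. coefficients for the contrast
fit2  <- eBayes(fit2)                                                    # 4. moderated statistics
tab   <- topTable(fit2, coef = "TumorVsNormal", number = Inf)            # 5. ranked gene table

head(tab)
```

Here topTable reports logFC, AveExpr, t, P.Value, adj.P.Val, and B for each gene.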

Usage

TCGAanalyze_DEA(
  mat1,
  mat2,
  metadata = TRUE,
  Cond1type,
  Cond2type,
  pipeline = "edgeR",
  method = "exactTest",
  fdr.cut = 1,
  logFC.cut = 0,
  elementsRatio = 30000,
  batch.factors = NULL,
  ClinicalDF = data.frame(),
  paired = FALSE,
  log.trans = FALSE,
  voom = FALSE,
  trend = FALSE,
  MAT = data.frame(),
  contrast.formula = "",
  Condtypes = c()
)

Arguments

mat1

numeric matrix, each row represents a gene, each column represents a sample with Cond1type

mat2

numeric matrix, each row represents a gene, each column represents a sample with Cond2type

metadata

boolean indicating whether to add metadata

Cond1type

a string containing the class label of the samples in mat1 (e.g., control group)

Cond2type

a string containing the class label of the samples in mat2 (e.g., case group)

pipeline

a string to specify which package to use ("limma" or "edgeR")

method

a string specifying the edgeR test to use, either 'glmLRT' or 'exactTest'. 'glmLRT' fits a negative binomial generalized log-linear model to the read counts for each gene; 'exactTest' computes genewise exact tests for differences in the means between two groups of negative-binomially distributed counts.

fdr.cut

a threshold used to filter DEGs according to their corrected p-value (FDR)

logFC.cut

a threshold used to filter DEGs according to their logFC

elementsRatio

the number of elements processed per second, used to estimate the running time

batch.factors

a vector containing strings to specify options for batch correction. Options are "Plate", "TSS", "Year", "Portion", "Center", and "Patients"

ClinicalDF

a dataframe returned by GDCquery_clinic() to be used to extract year data

paired

boolean to account for paired or non-paired samples. Set to TRUE for paired case

log.trans

boolean to perform log cpm transformation. Set to TRUE for log transformation

voom

boolean to perform voom transformation for limma-voom pipeline. Set to TRUE for voom transformation

trend

boolean to perform limma-trend pipeline. Set to TRUE to go through limma-trend

MAT

matrix containing the expression set, with samples in columns and genes in rows. Do not provide if mat1 and mat2 are used

contrast.formula

string input to determine coefficients and to design contrasts in a customized way

Condtypes

vector of grouping for samples in MAT

Value

a table of DEGs containing logFC, logCPM, pValue, and FDR for each gene and for each contrast
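To make the role of fdr.cut and logFC.cut concrete, the following base R sketch applies both thresholds to a small hand-made table. The gene names and values are invented for illustration, the column names follow edgeR's topTags output, and the exact comparison operators (<= for FDR, >= on the absolute logFC) are an assumption rather than a statement about the function's internals.

```r
# Hypothetical result table; values are invented for illustration only.
degs <- data.frame(
  logFC  = c(2.5, -1.8, 0.3, -0.1),
  logCPM = c(5.1, 6.3, 4.0, 7.2),
  PValue = c(1e-6, 2e-4, 0.20, 0.80),
  FDR    = c(4e-6, 4e-4, 0.27, 0.80),
  row.names = c("geneA", "geneB", "geneC", "geneD")
)

fdr.cut   <- 0.01  # keep genes whose FDR is at or below this value (assumed rule)
logFC.cut <- 1     # keep genes whose |logFC| is at or above this value (assumed rule)

keep <- degs$FDR <= fdr.cut & abs(degs$logFC) >= logFC.cut
degs_filtered <- degs[keep, ]

rownames(degs_filtered)  # "geneA" "geneB"
```

With the defaults fdr.cut = 1 and logFC.cut = 0, a filter of this form keeps every gene, which is why the default call returns the full table.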

Examples

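library(TCGAbiolinks)
# Normalize the counts, then drop genes below the 25% expression quantile
dataNorm <- TCGAanalyze_Normalization(dataBRCA, geneInfo)
dataFilt <- TCGAanalyze_Filtering(tabDF = dataNorm, method = "quantile", qnt.cut = 0.25)
# Select normal tissue (NT) and primary tumor (TP) sample barcodes
samplesNT <- TCGAquery_SampleTypes(colnames(dataFilt), typesample = c("NT"))
samplesTP <- TCGAquery_SampleTypes(colnames(dataFilt), typesample = c("TP"))
# Compare Normal (mat1) with Tumor (mat2) using the default edgeR exactTest
dataDEGs <- TCGAanalyze_DEA(mat1 = dataFilt[, samplesNT],
                            mat2 = dataFilt[, samplesTP],
                            Cond1type = "Normal",
                            Cond2type = "Tumor")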

TCGAbiolinks documentation built on Nov. 8, 2020, 5:37 p.m.