Linnorm-limma pipeline for Differential Expression Analysis

Description

This function first applies the Linnorm transformation to the dataset, then runs limma for differentially expressed gene (DEG) analysis. Please cite both Linnorm and limma when you use this function in a publication.

Usage

Linnorm.limma(datamatrix, design = NULL, input = "Raw",
  output = "DEResults", noINF = TRUE, showinfo = FALSE,
  perturbation = 10, minZeroPortion = 2/3, keepAll = TRUE,
  robust = TRUE)

Arguments

datamatrix

The matrix or data frame that contains your dataset. Each row is a feature (or gene) and each column is a sample (or replicate). Raw counts, CPM, RPKM, FPKM and TPM are supported. Undefined values such as NA are not supported, and log transformed datasets are not compatible. If a Linnorm transformed dataset is being used, please set the "input" argument to "Linnorm".

design

A design matrix required for limma. Please see limma's documentation or our vignettes for more detail.
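As a minimal sketch of what such a design matrix can look like, the base R code below builds a one-column-per-group matrix from sample labels. The labels, the group names and the sample count are hypothetical and chosen only for illustration; consult limma's documentation for the form your experiment needs.

```r
# Hypothetical sample labels: three tumor and three normal samples,
# listed in the same order as the columns of the data matrix.
condition <- factor(c("tumor", "tumor", "tumor", "normal", "normal", "normal"),
                    levels = c("tumor", "normal"))

# One indicator column per group, without an intercept term.
design <- model.matrix(~ 0 + condition)

# Replace the default column names with the plain group names.
colnames(design) <- levels(condition)

print(design)
```

Each row of the design matrix corresponds to one sample, so its number of rows should equal the number of columns in datamatrix.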

input

Character. "Raw" or "Linnorm". If you have already transformed your dataset with Linnorm, set input to "Linnorm" so that the Linnorm transformed dataset can be passed to the "datamatrix" argument. Defaults to "Raw".

output

Character. "DEResults" or "Both". Set to "DEResults" to output a matrix that contains Differential Expression Analysis Results. Set to "Both" to output a list that contains both Differential Expression Analysis Results and the transformed data matrix.

noINF

Logical. Prevents INF values in the fold change column by using Linnorm's lambda and adding one. If set to FALSE, INF is generated when one of the conditions has zero expression. Defaults to TRUE.

showinfo

Logical. Show the calculated lambda value. Defaults to FALSE.

perturbation

Integer >=2. To search for an optimal minimal deviation parameter (please see the article), Linnorm uses the iterated local search algorithm, which perturbs away from the initial local minimum. The range of the area searched in each perturbation increases exponentially as the area gets farther from the initial local minimum; the distance is expressed by the area's index. This range is calculated as 10 * (perturbation ^ index).
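To make the growth of the search range concrete, the sketch below evaluates 10 * (perturbation ^ index) for the first few indices. The helper name search_range is hypothetical, and the assumption that indices start at 1 is ours; Linnorm's internal indexing may differ.

```r
# Hypothetical helper: range searched at a given perturbation index,
# following the formula 10 * (perturbation ^ index) from the text.
search_range <- function(perturbation, index) {
  stopifnot(perturbation >= 2)
  10 * (perturbation ^ index)
}

# Assumption: indices start at 1. With the default perturbation of 10,
# each step away from the initial local minimum widens the range tenfold.
ranges <- search_range(perturbation = 10, index = 1:3)
print(ranges)
```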

minZeroPortion

Double >=0, <= 1. For example, setting minZeroPortion to 0.5 removes genes in which more than half of the data values are zero from the calculation of the normalizing parameter. It is strongly suggested to change this to 0 for single cell RNA-seq data. Defaults to 2/3.
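The filtering rule can be sketched in base R as follows. This is one reading of the behavior described above (keep a gene when its proportion of non-zero values is at least the threshold), not Linnorm's actual implementation; the helper name filter_by_zero_portion and the toy counts are hypothetical.

```r
# Toy count matrix: rows are genes, columns are samples (made-up values).
counts <- matrix(
  c(5, 0, 3, 2,
    0, 0, 0, 7,
    1, 4, 0, 0,
    0, 0, 0, 0),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("gene", 1:4), paste0("s", 1:4))
)

# Hypothetical helper: keep genes whose proportion of non-zero values
# is at least min_portion.
filter_by_zero_portion <- function(x, min_portion) {
  nonzero_portion <- rowMeans(x != 0)
  x[nonzero_portion >= min_portion, , drop = FALSE]
}

# A threshold of 0.5 drops genes with more than half of their values at zero.
kept_half <- filter_by_zero_portion(counts, 0.5)

# A threshold of 0 keeps every gene, as suggested for single cell data.
kept_all <- filter_by_zero_portion(counts, 0)

print(rownames(kept_half))
print(nrow(kept_all))
```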

keepAll

Logical. After applying minZeroPortion filtering, should Linnorm keep all genes in the results? Defaults to TRUE.

robust

Logical. Value passed to the robust setting of limma's eBayes function. Defaults to TRUE.

Details

This function performs both Linnorm and limma for users who are interested in differential expression analysis. Please note that if you use a Linnorm normalized dataset directly with limma, the output fold change and average expression will be wrong. (P values and adjusted p values will be fine.) This is because the voom-limma pipeline assumes the input to be raw counts. This function is written to fix this problem.

Value

If output is set to "DEResults", this function will output a matrix of Differential Expression Analysis Results with the following columns:

  • logFC: Log 2 Fold Change

  • XPM: Average Expression. If input is raw count or CPM, this column has the CPM unit. If input is RPKM, FPKM or TPM, this column has the TPM unit.

  • t: moderated t-statistic

  • P.Value: p value

  • adj.P.Val: Adjusted p value. This is also called False Discovery Rate or q value.

  • B: log odds that the feature is differential

If output is set to "Both", this function will output a list with the following objects:

  • DEResults: Differential Expression Analysis Results as described above.

  • Linnorm: Linnorm transformed and filtered data matrix.
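As an illustration of working with these return values, the sketch below builds a mock list shaped like the "Both" output and selects features by adjusted p value and fold change. All numbers are made up, the Linnorm element is a placeholder, and the cutoffs (0.05 and a two-fold change) are hypothetical choices rather than recommendations from the package.

```r
# Mock results with the columns described above (values are made up).
de <- cbind(
  logFC     = c(2.1, -0.3, -1.8, 0.1),
  XPM       = c(50, 120, 8, 300),
  t         = c(6.2, -0.9, -5.1, 0.2),
  P.Value   = c(1e-5, 0.40, 2e-4, 0.85),
  adj.P.Val = c(4e-5, 0.53, 4e-4, 0.85),
  B         = c(4.0, -5.5, 2.1, -6.0)
)
rownames(de) <- paste0("gene", 1:4)

# Mimic the list returned when output = "Both"; the Linnorm element
# here is only a placeholder for the transformed data matrix.
both <- list(DEResults = de, Linnorm = matrix(0, nrow = 4, ncol = 2))

# Hypothetical cutoffs: adjusted p value below 0.05 and an absolute
# log 2 fold change of at least 1 (a two-fold change).
res <- both$DEResults
significant <- res[res[, "adj.P.Val"] < 0.05 & abs(res[, "logFC"]) >= 1, ,
                   drop = FALSE]

print(rownames(significant))
```

When output is "DEResults", the same filtering applies directly to the returned matrix without the list lookup.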

Examples

# Load the package that provides Linnorm.limma and the LIHC dataset
library(Linnorm)
# Obtain example matrix
data(LIHC)
# Create limma design matrix (first 10 columns are tumor, last 10 columns are normal)
groups <- c(rep(1, 10), rep(2, 10))
designmatrix <- model.matrix(~ 0 + factor(groups))
colnames(designmatrix) <- c("group1", "group2")
rownames(designmatrix) <- colnames(LIHC)
# DEG analysis
DEGResults <- Linnorm.limma(LIHC, designmatrix)