RankDEGs: Rank genes based on differential expression statistics


RankDEGs    R Documentation

Rank genes based on differential expression statistics

Description

Rank genes from a list of pairwise comparisons by significance or effect size.

Usage

RankDEGs(
  res,
  delim = "_vs_",
  signif.column = "FDR",
  signif.threshold = 0.05,
  effect.column = "logFC",
  effect.threshold = 0,
  gene.column = "Gene",
  rnk.column = "PValue",
  rnk.method = "increasing"
)

Arguments

res

a named list of pairwise DE results, see details.

delim

a string that delimits the comparison groups in names(res), e.g. for celltype1_vs_celltype2 the delimiter is "_vs_"

signif.column

name of the column storing the significance values used for filtering, e.g. FDR

signif.threshold

keep only genes with signif.column below this threshold

effect.column

name of the column storing the effect size, e.g. logFC. This must be a zero-centered effect size, so that a value > 0 means higher in one group and a value < 0 means lower. Do not use something like AUCs from a Wilcoxon test, where > 0.5 means higher and < 0.5 means lower per group.

effect.threshold

keep only genes with effect.column above this threshold. This can serve as a minimum effect size, although it is recommended to explicitly test against the desired minimum effect size rather than to postfilter, see details.

gene.column

name of the column storing genes or any other kind of row identifier. The identifiers meeting the above criteria are returned in the output.

rnk.column

use this column for the ranking

rnk.method

either 'decreasing' or 'increasing' ranking based on rnk.column

Details

For an example of what the input should look like, see the examples. The pairwise comparisons must be unique: if something like celltype1_vs_celltype2 is present, do not also include celltype2_vs_celltype1 in res, as this is the identical comparison with only the sign of the effect size (e.g. the logFC) changed. The function handles this internally for every celltype.
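The idea of reading one comparison from either group's perspective can be sketched in base R. This is a minimal illustration under stated assumptions, not the package's internal code: the helper view_from is hypothetical, and the dummy tables only carry Gene and logFC columns.

```r
# Hypothetical illustration, not the internals of RankDEGs.
set.seed(1)
res <- sapply(c("gr1_vs_gr2", "gr2_vs_gr3", "gr1_vs_gr3"), function(x) {
  data.frame(Gene = paste0("Gene", 1:5), logFC = rnorm(5))
}, simplify = FALSE)

delim <- "_vs_"

# Split every comparison name into its two groups, e.g. "gr1_vs_gr2" -> "gr1", "gr2"
groups <- strsplit(names(res), delim, fixed = TRUE)
names(groups) <- names(res)

# Hypothetical helper: return a comparison from the perspective of `group`.
# The sign is kept if `group` is listed first and flipped if it is listed second.
view_from <- function(res, groups, comparison, group) {
  tab <- res[[comparison]]
  if (groups[[comparison]][2] == group) {
    tab$logFC <- -tab$logFC
  }
  tab
}

gr1_view <- view_from(res, groups, "gr1_vs_gr2", "gr1")
gr2_view <- view_from(res, groups, "gr1_vs_gr2", "gr2")
```

Because gr2_view is just gr1_view with the sign of logFC inverted, supplying gr2_vs_gr1 as a separate entry would add no information.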

The signif.column and effect.column are used first to filter the data, e.g. by FDR and logFC. The ranking is then done based on rnk.column, taking into account the direction of change given by effect.column. In its current state, only genes with a positive effect size are taken into account. The rnk.column could e.g. be the nominal PValue or a t-statistic column, both of which have the advantage over FDR that they usually have no ties.
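The filter-then-rank logic for a single comparison could look roughly like the following sketch. It assumes strict inequalities for both thresholds and uses a small table with made-up values purely for illustration; the actual implementation in the package may differ in detail.

```r
# Toy table with made-up values, for illustration only
tab <- data.frame(
  Gene   = c("GeneA", "GeneB", "GeneC", "GeneD", "GeneE"),
  logFC  = c(2.1, -1.5, 0.8, 1.2, 3.0),
  PValue = c(0.004, 0.0001, 0.001, 0.03, 0.002),
  FDR    = c(0.02, 0.001, 0.01, 0.2, 0.015)
)

signif.threshold <- 0.05
effect.threshold <- 0

# Step 1: filter by significance and by positive effect size
# (assumption: strict "<" and ">" comparisons)
keep <- tab$FDR < signif.threshold & tab$logFC > effect.threshold
filtered <- tab[keep, ]

# Step 2: rank the remaining genes by PValue, smallest first
# (corresponds to rnk.method = "increasing")
ranked_genes <- filtered$Gene[order(filtered$PValue, decreasing = FALSE)]
ranked_genes
# "GeneC" "GeneE" "GeneA"
```

GeneB is dropped despite having the smallest PValue because its logFC is negative, and GeneD is dropped because its FDR exceeds the threshold.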

The output is a nested list with the ranked genes for every celltype compared to every other celltype, based on the entries of res, see the examples. It looks something like:

$celltype1
 ..$celltype2
 ..$celltype3
$celltype2
 ..$celltype1
 ..$celltype3
$celltype3
 ..$celltype1
 ..$celltype2
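A nested list of this kind is indexed first by the celltype of interest and then by the celltype it was compared against. The sketch below uses a hand-built mock list (ranked_mock, with invented gene names) rather than real RankDEGs output, so it only illustrates the access pattern.

```r
# Hand-built mock of the nested output, invented for illustration
ranked_mock <- list(
  celltype1 = list(celltype2 = c("GeneA", "GeneB"), celltype3 = c("GeneB", "GeneC")),
  celltype2 = list(celltype1 = c("GeneD"),          celltype3 = c("GeneD", "GeneE")),
  celltype3 = list(celltype1 = c("GeneF", "GeneG"), celltype2 = c("GeneG"))
)

# Ranked genes for celltype1 relative to celltype2
ranked_mock$celltype1$celltype2

# The celltypes that celltype1 was compared against
names(ranked_mock$celltype1)

# Number of ranked genes per comparison for celltype1
lengths(ranked_mock$celltype1)
```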

For an example with real data, see the examples of the CreateGeneSignatures function of this package.

Author(s)

Alexander Toenges

Examples

# first make some dummy DE results, then rank:
set.seed(1)
res <- sapply(c("gr1_vs_gr2","gr2_vs_gr3","gr1_vs_gr3"), function(x){
  data.frame(Gene=paste0("Gene",1:10), 
             logFC=rnorm(10,1,2),
             PValue=jitter(rep(0.001, 10), 20),
             FDR=jitter(rep(0.04, 10), 20))
},simplify=FALSE)

# this is how the results tables look:
res$gr1_vs_gr2

ranked <- RankDEGs(res=res, rnk.column="PValue", rnk.method="increasing")


ATpoint/CreateGeneSignatures documentation built on Dec. 1, 2023, 12:44 a.m.