Home

/

GitHub

/

ATpoint/CreateGeneSignatures

/

CreateGeneSignatures: Create gene signatures in a rank-based style

CreateGeneSignatures: Create gene signatures in a rank-based style
In ATpoint/CreateGeneSignatures: A rank-based method to define gene signatures from gene expression data.

View source: R/CreateGeneSignatures.R

CreateGeneSignatures

R Documentation

Create gene signatures in a rank-based style

Description

Convert lists of ranked genes into celltype-specific signatures

Usage

CreateGeneSignatures(
  ranked,
  use_groups = NULL,
  exclude_groups = NULL,
  delim = "_vs_",
  keep.n = Inf,
  min.prop = 1,
  extended = FALSE
)

Arguments

`ranked`	nested lists of ranked genes for each celltype, see details and the RankDEGs function
`use_groups`	trigger group-aware mode. A vector with celltype names to be used to define markers for as a group, so this group versus all others. See details.
`exclude_groups`	vector, remove these groups from the signature creation
`delim`	a string indicating the delimiter of names(ranked), e.g. celltype1_vs_celltype2 would be "_vs_"
`keep.n`	number of genes to keep per signature, by default all candidates are returned for manual posthoc filtering
`min.prop`	minimum proportion of comparisons per celltype that a gene must be included in. See details.
`extended`	logical, whether to output the signature genes as a data.frame that indicates whether the gene qualified as a marker against the other groups (1) or not (0)

Details

The function takes as input a nested list that, for every celltype, contains the ranked genes from pairwise comparisons against every other celltype in this analysis, see also the RankDEGs function of this package. For every celltype first a rank matrix is created that stores the ranked genes from every pairwise comparison, assigning low scores to genes that ranked high in the input ranking and vice versa. Genes to form the signatures are then selected by choosing those genes with minimal ranks across all comparisons. In an iterative fashion genes that ranked high all comparisons are chosen first (and then ordered by median score), then genes ranked high in all but one, in all but two, all but three etc comparisons. The user can choose via min.prop the fraction of comparisons that a gene be highly ranked in. For example a setup with four celltypes and min.prop=1 would mean that only genes are included that ranked high in all comparisons. If min.prop=2/3 is chosen then signature genes must rank high in at least two out of three comparisons. The higher this value between (between zero and one) the more stringent the filtering. If celltypes are decently separated it might be trivial to find large numbers of signature genes even with min.prop of 1. In cases where cells are very similar to each other, e.g. cells being close in a developmental continuum, one will need to lower this threshold in order to get some (or any) signature genes.

The use_groups argument allows to specify a group of celltypes and the function will then infer markers for this combined group versus the rest. In the example above we use it to define combined CD4/CD8 markers. The advantage here is that these cells have many markers in common so by defining them as a group we only look for "pan"-T cell markers and that retains more genes that when treating both celltypes as independent groups.

Author(s)

Alexander Toenges

Examples

# load RNA-seq data for CD4T-, CD8T and naive B cells from Haemopedia:
counts <- readRDS(paste0(
            system.file("extdata",package="CreateGeneSignatures"),
            "/haemopedia_subset.rds"))

# Use edgeR to perform all pairwise comparisons

library(edgeR)
y <- DGEList(counts=counts,group=gsub("\\..", "", colnames(counts)))
design <- model.matrix(~0+group,y$samples)
colnames(design) <- gsub("group", "", colnames(design))
y <- y[filterByExpr(y),,keep.lib.size=FALSE]
y <- calcNormFactors(y)
y <- estimateDisp(y,design)
fit <- glmQLFit(y,design)

# all unique pairwise contrasts:
contrasts <- makeContrasts(CD4T_vs_CD8T  = CD4T-CD8T,
                           CD4T_vs_NK    = CD4T-NK,
                           CD4T_vs_NveB  = CD4T-NveB,
                           CD8T_vs_NK    = CD8T-NK,
                           CD8T_vs_NveB  = CD8T-NveB,
                           NK_vs_NveB    = NK-NveB,
                           levels = design)
                           
# test using glmTreat against a minumum fold change of ~1.5                          
res <- sapply(colnames(contrasts), function(con){
  tt<-topTags(glmTreat(fit,contrast=contrasts[,con],lfc=log2(1.5)),n=Inf)$table
return(data.frame(Gene=rownames(tt), tt))
}, simplify = FALSE)

# Rank the DEGs:
ranked <- RankDEGs(res)

# Create signatures, keeping top 50 signature genes that separate the respective celltype
# from all other celltypes:
signatures <- CreateGeneSignatures(ranked=ranked, keep.n=50)
# check number of genes. for CD8T cells we found < 50 genes:
lengths(signatures)

# Inspect signatures using heatmaps plotting the scaled logcpms of the signature genes
library(pheatmap)
logcpm <- log2(edgeR::cpm(y,log=FALSE)+1)

# plot a heatmap in the order of names(signatures)
col_order <- unlist(lapply(names(ranked), 
                           function(x) grep(paste0("^", x), colnames(logcpm))))
                           
# use scaled logCPMs                           
logcpmZ <- t(scale(t(logcpm[unique(unlist(signatures)),])))
pheatmap(mat=logcpmZ[,col_order],
         show_rownames=FALSE, cluster_rows=FALSE, cluster_cols=FALSE)
         
# or each signature individually:
pheatmap(mat=t(scale(t(logcpm[signatures$CD4T,]))), show_rownames=FALSE)
pheatmap(mat=t(scale(t(logcpm[signatures$CD8T,]))), show_rownames=FALSE)
pheatmap(mat=t(scale(t(logcpm[signatures$NveB,]))), show_rownames=FALSE)
pheatmap(mat=t(scale(t(logcpm[signatures$NK,]))), show_rownames=FALSE)
         
# or genes that separate the CD4T and CD8T cells from the rest
signatures2 <- CreateGeneSignatures(ranked=ranked, use_groups=c("CD4T", "CD8T"), extended=FALSE)
logcpmZ2 <- t(scale(t(logcpm[signatures2,])))
pheatmap(mat=logcpmZ2[,col_order],
         show_rownames=TRUE, cluster_rows=FALSE, cluster_cols=FALSE)
         
# find signatures for each group but exclude the CD4T categorically
signatures3 <- CreateGeneSignatures(ranked=ranked, exclude_groups=c("CD4T"), extended=FALSE)

# Find all markers for each group against all but one group and output the extended table
# to easily see which group the gene does not qualify as a marker for.
# Here, the genes in the below tail command are markers for CD4T against NK and NveB but do not separate CD4T from CD8T.
signatures4 <- CreateGeneSignatures(ranked=ranked, min.prop=2/3, extended=TRUE)
tail(signatures4$CD4T)

ATpoint/CreateGeneSignatures documentation built on Dec. 1, 2023, 12:44 a.m.

ATpoint/CreateGeneSignatures index

README.md

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

ATpoint/CreateGeneSignatures
A rank-based method to define gene signatures from gene expression data.

CreateGeneSignatures: Create gene signatures in a rank-based style
In ATpoint/CreateGeneSignatures: A rank-based method to define gene signatures from gene expression data.

Create gene signatures in a rank-based style

Description

Usage

Arguments

Details

Author(s)

Examples

Related to CreateGeneSignatures in ATpoint/CreateGeneSignatures...

R Package Documentation

Browse R Packages

We want your feedback!

ATpoint/CreateGeneSignatures A rank-based method to define gene signatures from gene expression data.

CreateGeneSignatures: Create gene signatures in a rank-based style In ATpoint/CreateGeneSignatures: A rank-based method to define gene signatures from gene expression data.

Create gene signatures in a rank-based style

Description

Usage

Arguments

Details

Author(s)

Examples

Related to CreateGeneSignatures in ATpoint/CreateGeneSignatures...

R Package Documentation

Browse R Packages

We want your feedback!

ATpoint/CreateGeneSignatures
A rank-based method to define gene signatures from gene expression data.

CreateGeneSignatures: Create gene signatures in a rank-based style
In ATpoint/CreateGeneSignatures: A rank-based method to define gene signatures from gene expression data.