knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
Clustering genes by differential expression value allow to locate genes which were affected the most.
Here we use the wrapper function 'clusterGenes' to produce a heatmap of significant DE genes.
library(vic3PCD) suppressMessages(library(pheatmap)) suppressMessages(library(kableExtra)) suppressMessages(library(DESeq2)) suppressMessages(library(RColorBrewer))
Creating annotation object. Here we combine novel and existing annotations in a single data frame.
# Load Novel transcripts annotation annot_novel <- readRDS(system.file("extdata", "GenesTableFull_cp_NOVEL_annotation.rda", package = "vic3PCD")) # Load default annotation and join it with novel data annot <- readRDS(system.file("extdata", "GenesTableFull_cp_annotation.rda", package = "vic3PCD")) %>% dplyr::bind_rows(annot_novel) %>% tibble::remove_rownames() # rownames(annot) <- annot$gene_id
Creading DESeq2 data object using SE object
# Load SE dataset se <- readRDS(system.file("extdata", "se_vic3_2020.RData", package = "vic3PCD")) # DESeq dataset dds <- DESeqDataSet(se, design = ~ condition) # To make sure we have right category used as reference in the analysis dds$condition <- relevel(dds$condition, ref = "Control") # DESeq analysis dds <- DESeq(dds, test = "Wald", sfType = "poscounts", useT = FALSE, minReplicatesForReplace = 7) # Filter genes with more than 10 aligned reads keep <- rowSums(counts(dds)) >= 10 dds <- dds[keep,]
To perform clustering with 'clusterGenes' first we define a significance threshold.
Genes will be filtered based on significance p-value produced by DESeq2 and DE value.
To identify the number of clusters use the tree on the left. Imagine a vertical line going through the tree. Place it so that it crosses horizontal lines on the tree without crossing any vertical line. Count how many horizontal lines it crosses. That will give an approximate number of clusters.
The heatamp the right most column contains sample means for each gene (row means).
# Threshold values val = 1.9 pval = 0.001 clastTree = 6 groups <- clusterGenes(dds = dds, annotationTbl = annot, summarise_clusters = FALSE, value = val, pvalue = pval, cutTree = clastTree, distRows = "euclidean", clusterMethod = "average")
Here is a partial view on clustering table.
tbl <- dplyr::arrange(groups,desc(log2FoldChange)) tbl <- tbl[c(1:10), c("gene_id", "proteinId", "X6_3_EP155mix", "X6_4_EP155mix", "X6_5_EP155mix", "Mean", "baseMean", "log2FoldChange", "pvalue")] kable(tbl, escape = F, linesep = "", booktabs = T, longtable = F, caption = "Hierarchical Clustering (partial)") %>% kable_styling(latex_options = c("scale_down", "repeat_header"), bootstrap_options = c("condensed"), font_size = 12, full_width = F) %>% column_spec(1, bold = T)
Another useful option for 'clusterGenes' function is to make a heatmap of average DE values in each cluster for in individual samples. This is a summarised version of Hierarchical clustering plot. In addition to row means column it also creates a variance column for each cluster.
groups <- clusterGenes(dds = dds, annotationTbl = annot, summarise_clusters = TRUE, value = val, pvalue = pval, cutTree = clastTree, distRows = "euclidean", clusterMethod = "average")
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.