clusterGenes: Hierarchical clustering heatmap


View source: R/clusterGenes.R

Description

clusterGenes creates a hierarchically clustered heatmap of differentially expressed genes.

Usage

clusterGenes(
  value = 0,
  pvalue = 0.1,
  clusterColumns = NA,
  summarise_clusters = FALSE,
  cutTree = NA,
  distRows = "euclidean",
  clusterMethod = "average",
  clusterNames = NULL,
  annotationTbl = NA,
  dds = NA,
  titleExperiment = NA,
  shrinkLFC = FALSE,
  test = FALSE
)

Arguments

value

Numeric. Log fold change (LFC) threshold used to subset the dataset based on DESeq2 results values. The threshold is applied to the absolute value of the LFC.

pvalue

Numeric. P-value threshold.

clusterColumns

Character or integer. Columns of the counts matrix to use for clustering. By default, the function uses the samples for which colData(dds)[,"clustering"] == 'experiment'.

summarise_clusters

Logical. Show a summarised version of the heatmap. Instead of the full heatmap, the function outputs the mean values of each cluster in the samples.

cutTree

Integer. Number of clusters to cut the tree into.

distRows

Default "euclidean". The distance measure to be used. This must be one of "euclidean", "maximum", "manhattan", "canberra", "binary" or "minkowski". Any unambiguous substring can be given.

clusterMethod

Default "average". The agglomeration method to be used for hierarchical clustering. This should be (an unambiguous abbreviation of) one of "ward.D", "ward.D2", "single", "complete", "average" (= UPGMA), "mcquitty" (= WPGMA), "median" (= WPGMC) or "centroid" (= UPGMC).

clusterNames

Character, optional. Names for the clusters. There should be one name per cluster, i.e. the vector length should equal the value of 'cutTree'.

annotationTbl

Data frame with a 'gene_id' column corresponding to the gene IDs in the results(dds) table. The annotation data frame is used to add additional information to the results output table.

dds

DESeq2 dataset, output of DESeq(x).

titleExperiment

Character. Heatmap main title. If not specified, it is generated automatically.

shrinkLFC

Logical. Apply the lfcShrink function to generate the DESeq2 results.

test

Logical. Output the tree only. Can be used to determine the number of clusters.
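
The descriptions of 'distRows', 'clusterMethod' and 'cutTree' match the arguments of the base R functions dist, hclust and cutree. The sketch below shows how such settings interact on a toy matrix. It assumes, without confirmation from the package source, that clusterGenes passes them on in a similar way. The matrix 'm' and the names "A", "B", "C" are invented for illustration.

```r
# Toy matrix: 6 genes (rows) x 4 samples (columns); values are random.
set.seed(1)
m <- matrix(rnorm(24), nrow = 6,
            dimnames = list(paste0("gene", 1:6), paste0("s", 1:4)))

# distRows: distance measure between rows, as accepted by stats::dist().
d <- dist(m, method = "euclidean")

# clusterMethod: agglomeration method, as accepted by stats::hclust().
tree <- hclust(d, method = "average")

# Inspecting a tree like this one (e.g. with plot(tree)) is a common way
# to pick the number of clusters before setting cutTree.

# cutTree: number of clusters to cut the tree into.
k <- 3
membership <- cutree(tree, k = k)

# clusterNames: one name per cluster, so its length equals k.
clusterNames <- c("A", "B", "C")
stopifnot(length(clusterNames) == k)

# Map each gene's cluster number to its cluster name.
named <- clusterNames[membership]
names(named) <- names(membership)
```

Here 'membership' holds a cluster number for each gene and 'named' holds the corresponding cluster name.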

Details

The function requires several specific settings, in particular a 'clustering' column in colData(dds) that labels samples as 'control' or 'experiment'. It first estimates differential expression using the results function from DESeq2. It then subsets the counts matrix (assay(dds)) using the LFC threshold and then the p-value threshold, if provided. Next, using the data in colData(dds)[,"clustering"], it subtracts 'control' samples from 'experiment' samples, generating preliminary DE values, and adds a 'means' column containing the average value for each gene. Finally, using the samples labelled 'experiment' in colData(dds)[,"clustering"], it performs hierarchical clustering of the differentially expressed genes.
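
The steps above can be sketched in base R on invented data. This is an illustration under stated assumptions, not the package's implementation: 'expr', 'clustering' and 'res' are hypothetical stand-ins for assay(dds), colData(dds)[,"clustering"] and the DESeq2 results table. The exact comparison operators, the p-value column used, and the way control samples are subtracted (here, the control mean) are all assumptions.

```r
# Stand-in for the counts matrix: 5 genes x 4 samples (invented values).
expr <- matrix(
  c(10, 11,  2,  3,
     5,  5,  5,  6,
     1,  2,  9,  8,
     7,  7,  7,  7,
     3,  4, 12, 11),
  ncol = 4, byrow = TRUE,
  dimnames = list(paste0("gene", 1:5), c("c1", "c2", "e1", "e2"))
)
# Stand-in for colData(dds)[, "clustering"].
clustering <- c("control", "control", "experiment", "experiment")
# Stand-in for the DESeq2 results table.
res <- data.frame(
  log2FoldChange = c(-2.1, 0.2, 2.3, 0.0, 1.7),
  pvalue         = c(0.001, 0.5, 0.01, 0.9, 0.04),
  row.names      = rownames(expr)
)
value  <- 1
pvalue <- 0.05

# 1. Keep genes passing the absolute LFC threshold and the p-value threshold.
keep <- abs(res$log2FoldChange) >= value & res$pvalue < pvalue
sub  <- expr[keep, , drop = FALSE]

# 2. Subtract control from experiment (assumption: per-gene control mean).
ctrl_mean <- rowMeans(sub[, clustering == "control", drop = FALSE])
de <- sub[, clustering == "experiment", drop = FALSE] - ctrl_mean

# 3. Add a 'means' column with the per-gene average.
de <- cbind(de, means = rowMeans(de))

# 4. Cluster genes using the 'experiment' columns only.
exp_cols <- colnames(expr)[clustering == "experiment"]
tree     <- hclust(dist(de[, exp_cols], method = "euclidean"), method = "average")
clusters <- cutree(tree, k = 2)

# Summarised view: mean of each experiment sample within each cluster.
cluster_means <- aggregate(as.data.frame(de[, exp_cols]),
                           by = list(cluster = clusters), FUN = mean)
```

In this toy data three genes pass both thresholds, and the down-regulated gene separates from the two up-regulated ones when the tree is cut into two clusters.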

Examples

# Load required packages
library(DESeq2)
library(vic3PCD)

# Load SE dataset
se <- readRDS(system.file("extdata", "se_vic3_2020.RData", package = "vic3PCD"))

## DESeq dataset
dds <- DESeqDataSet(se, design = ~ condition)
## Make sure the right category is used as the reference in the analysis
dds$condition <- relevel(dds$condition, ref = "Control")
## DESeq analysis
dds <- DESeq(dds, test = "Wald", sfType = "poscounts", useT = FALSE, minReplicatesForReplace = 7)
## Keep genes with at least 10 aligned reads in total
keep <- rowSums(counts(dds)) >= 10
dds <- dds[keep,]

## Annotation Table
annot <- readRDS(system.file("extdata", "GenesTableFull_cp_annotation.rda", package = "vic3PCD"))

## Threshold values
val <- 1.9
pval <- 0.001
clustTree <- 6

## Clustering
groups <- clusterGenes(dds = dds,
                       annotationTbl = annot,
                       summarise_clusters = FALSE,
                       value = val,
                       pvalue = pval,
                       cutTree = clustTree,
                       distRows = "euclidean",
                       clusterMethod = "average")

anabeloff/vic3PCD documentation built on Dec. 2, 2020, 11:03 a.m.