plotClusters: Plot static parallel coordinate clusters

Description Usage Arguments Value See Also Examples

View source: R/plotClusters.R

Description

Perform hierarchical clustering analysis and visualize results with parallel coordinate plots. Optionally, save gene IDs within each cluster to .rds files for later use.

Usage

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
plotClusters(
  data,
  dataMetrics = NULL,
  dataSE = NULL,
  geneList = NULL,
  geneLists = NULL,
  threshVar = "FDR",
  threshVal = 0.05,
  clusterAllData = TRUE,
  showPairs = TRUE,
  nC = 4,
  colList = rainbow(nC),
  aggMethod = c("ward.D", "ward.D2", "single", "complete", "average", "mcquitty",
    "median", "centroid"),
  yAxisLabel = "Count",
  xAxisLabel = "Sample",
  lineSize = 0.1,
  lineAlpha = 0.5,
  vxAxis = FALSE,
  outDir = tempdir(),
  saveFile = TRUE,
  verbose = FALSE
)

Arguments

data

DATA FRAME | Read counts

dataMetrics

LIST | Differential expression metrics; default NULL

dataSE

SUMMARIZEDEXPERIMENT | Summarized experiment format that can be used in lieu of data and dataMetrics; default NULL

geneList

CHARACTER ARRAY | Array of ID values of genes to be drawn from data as parallel coordinate lines. Use this parameter if you have predetermined genes to be drawn. These genes will be clustered. Otherwise, use dataMetrics, threshVar, and threshVal to create clusters to be overlaid as parallel coordinate lines; default NULL. See package website for examples

geneLists

LIST | List of ID values of genes already clustered to be drawn from data as parallel coordinate lines. Each list item is an array of genes ID values that are already grouped as a cluster. Unlike the singular geneList object, the plural geneLists object is not be clustered. If you instead wish to cluster genes, use dataMetrics, threshVar, and threshVal or geneList to create clusters to be overlaid as parallel coordinate lines; default NULL. See package website for examples

threshVar

CHARACTER STRING | Name of column in dataMetrics object that is used to threshold significance; default "FDR"

threshVal

INTEGER | Maximum value to threshold significance from threshVar object; default 0.05

clusterAllData

BOOLEAN [TRUE | FALSE] | Create clusters based on the whole dataset and then assign significant genes to those clusters; default is TRUE. If FALSE, create clusters based on just the significant genes. With either option, the side-by-side boxplot will represent the whole dataset (from data input) and the parallel coordinate lines will represent only the significant genes (those that pass threshVal for threshVar)

showPairs

BOOLEAN [TRUE | FALSE] | When more than three treatment groups are present, for each pairwise comparison, show only the results for that pair of treatment groups; default is TRUE. If FALSE, show results for all treatment groups even though clusters and significance are determined in pairwise fashion. Note this parameter will not make a difference when the data only contains two treatment groups

nC

INTEGER | Number of clusters; default 4

colList

CHARACTER ARRAY | List of colors for each cluster; default is rainbow(nC)

aggMethod

CHARACTER STRING ["ward.D" | "ward.D2" | "single" | "complete" | "average" | "mcquitty" | "median" | "centroid"] | The agglomeration method to be used in the hierarchical clustering; default "ward.D"

yAxisLabel

CHARACTER STRING | Vertical axis label; default "Count"

xAxisLabel

CHARACTER STRING | Horizontal axis label; default "Sample"

lineSize

INTEGER | Size of plotted parallel coordinate lines; default 0.1

lineAlpha

INTEGER | Alpha value of plotted parallel coordinate lines, default 0.5

vxAxis

BOOLEAN [TRUE | FALSE] | Flip x-axis text labels to vertical orientation; default FALSE

outDir

CHARACTER STRING | Output directory to save all images; default tempdir()

saveFile

BOOLEAN [TRUE | FALSE] | Save file to outDir; default TRUE

verbose

BOOLEAN [TRUE | FALSE] | Print each cluster from each cluster size into separate files and print the associated IDs of each cluster from each cluster size into separate .rds files; default is FALSE

Value

List of n elements each containing a grid of parallel coordinate plots, where n is the number of treatment pair combinations in the data object. If the saveFile parameter has a value of TRUE, then each grid of parallel coordinate plots is saved to the location specified in the outDir parameter as a JPG file. If the verbose parameter has a value of TRUE, then a JPG file for each parallel coordinate plot in each grid, RDS file containing the superimposed IDs for each parallel coordinate plot in each grid, and the JPG file of each grid of parallel coordinate plots is saved to the location specified in the outDir parameter.

See Also

hclust https://lindsayrutter.github.io/bigPint/articles/clusters.html

Examples

  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
# The first set of five examples use data and dataMetrics
# objects as input. The last set of five examples create the same plots now
# using the SummarizedExperiment (i.e. dataSE) object input.

# Example 1: Perform hierarchical clustering of size four using the 
# default agglomeration method "ward.D". Cluster only on the genes that have
# FDR < 1e-7 (n = 113) and overlay these genes.

library(grid)
library(matrixStats)
library(ggplot2)
data(soybean_ir_sub)
soybean_ir_sub[,-1] <- log(soybean_ir_sub[-1]+1)
data(soybean_ir_sub_metrics)
colList = c("#00A600FF", rainbow(5)[c(1,4,5)])
ret <- plotClusters(data=soybean_ir_sub,
    dataMetrics = soybean_ir_sub_metrics, nC=4, colList = colList,
    clusterAllData = FALSE, threshVal = 1e-7, saveFile = FALSE)
grid.draw(ret[["N_P_4"]])

# Example 2: Perform the same analysis, only now create the four groups by 
# clustering on all genes in the data (n = 5,604). Then, overlay the genes 
# that have FDR < 1e-7 (n = 113) into their corresponding clusters.

ret <- plotClusters(data=soybean_ir_sub,
    dataMetrics = soybean_ir_sub_metrics, nC=4, colList = colList,
    clusterAllData = TRUE, threshVal = 1e-7, saveFile = FALSE)
grid.draw(ret[["N_P_4"]])

# Example 3: Perform the same analysis, only now overlay all genes in the 
# data by keeping the dataMetrics object as its default value of NULL.

ret <- plotClusters(data=soybean_ir_sub, nC=4, colList = colList,
    clusterAllData = TRUE, saveFile = FALSE)
grid.draw(ret[["N_P_4"]])

# Example 4: Visualization of gene clusters is usually performed on
# standardized data. Here, hierarchical clustering of size four is performed
# using the agglomeration method "average" on standardized data. Only genes 
# with FDR < 0.05 are used for the clustering. Only two of the three 
# pairwise combinations of treatment groups (S1 and S2; S1 and S3) have any 
# genes with FDR < 0.05. The output plots for these two pairs are examined. 

data(soybean_cn_sub)
data(soybean_cn_sub_metrics)
soybean_cn_sub_st <- as.data.frame(t(apply(as.matrix(soybean_cn_sub[,-1]),
    1, scale)))
soybean_cn_sub_st$ID <- as.character(soybean_cn_sub$ID)
soybean_cn_sub_st <- soybean_cn_sub_st[,c(length(soybean_cn_sub_st),
    1:length(soybean_cn_sub_st)-1)]
colnames(soybean_cn_sub_st) <- colnames(soybean_cn_sub)
nID <- which(is.nan(soybean_cn_sub_st[,2]))
soybean_cn_sub_st[nID,2:length(soybean_cn_sub_st)] <- 0
ret <- plotClusters(data=soybean_cn_sub_st,
    dataMetrics = soybean_cn_sub_metrics, nC=4,
    colList = c("#00A600FF", "#CC00FFFF", "red", "darkorange"),
    lineSize = 0.5, lineAlpha = 1, clusterAllData = FALSE,
    aggMethod = "average", yAxisLabel = "Standardized read count",
    saveFile = FALSE)
names(ret)
grid.draw(ret[["S1_S2_4"]])
grid.draw(ret[["S1_S3_4"]])

# Example 5: Run the same analysis, only now set the verbose parameter to 
# value TRUE. This will save images of each individual cluster, .rds files 
# that contain the IDs within each cluster, and images of the conglomerate 
# clusters to outDir (default tempdir()).

## Not run: 
plotClusters(data=soybean_cn_sub_st, dataMetrics = soybean_cn_sub_metrics,
  nC=4, colList = c("#00A600FF", "#CC00FFFF", "red", "darkorange"),
  lineSize = 0.5, lineAlpha = 1, clusterAllData = FALSE,
  aggMethod = "average", yAxisLabel = "Standardized read count",
  verbose = TRUE)

## End(Not run)

# Below are the same five examples, only now using the
# SummarizedExperiment (i.e. dataSE) object as input.

# Example 1: Perform hierarchical clustering of size four using the 
# default agglomeration method "ward.D". Cluster only on the genes that have
# FDR < 1e-7 (n = 113) and overlay these genes.

## Not run: 
library(grid)
library(matrixStats)
library(ggplot2)
data(se_soybean_ir_sub)
assay(se_soybean_ir_sub) <- log(as.data.frame(assay(se_soybean_ir_sub))+1)
colList = c("#00A600FF", rainbow(5)[c(1,4,5)])
ret <- plotClusters(dataSE=se_soybean_ir_sub, nC=4, colList = colList,
    clusterAllData = FALSE, threshVal = 1e-7, saveFile = FALSE)
grid.draw(ret[["N_P_4"]])

## End(Not run)

## Not run: 
# Example 2: Perform the same analysis, only now create the four groups by 
# clustering on all genes in the data (n = 5,604). Then, overlay the genes 
# that have FDR < 1e-7 (n = 113) into their corresponding clusters.

ret <- plotClusters(dataSE=se_soybean_ir_sub, nC=4, colList = colList,
    clusterAllData = TRUE, threshVal = 1e-7, saveFile = FALSE)
grid.draw(ret[["N_P_4"]])

## End(Not run)

# Example 3: Perform the same analysis, only now overlay all genes in the 
# data by setting the rowData() to NULL.

## Not run: 
se_soybean_ir_sub_nm <- se_soybean_ir_sub
rowData(se_soybean_ir_sub_nm) <- NULL
ret <- plotClusters(dataSE=se_soybean_ir_sub_nm, nC=4, colList = colList,
    clusterAllData = TRUE, saveFile = FALSE)
grid.draw(ret[["N_P_4"]])

## End(Not run)

# Example 4: Visualization of gene clusters is usually performed on
# standardized data. Here, hierarchical clustering of size four is performed
# using the agglomeration method "average" on standardized data. Only genes 
# with FDR < 0.05 are used for the clustering. Only two of the three 
# pairwise combinations of treatment groups (S1 and S2; S1 and S3) have any 
# genes with FDR < 0.05. The output plots for these two pairs are examined. 

## Not run: 
data(se_soybean_cn_sub)
se_soybean_cn_sub_st = se_soybean_cn_sub
assay(se_soybean_cn_sub_st) <-as.data.frame(t(apply(as.matrix(as.data.frame(
    assay(se_soybean_cn_sub))), 1, scale)))
nID <- which(is.nan(as.data.frame(assay(se_soybean_cn_sub_st))[,1]))
assay(se_soybean_cn_sub_st)[nID,] <- 0
ret <- plotClusters(dataSE=se_soybean_cn_sub_st, nC=4,
    colList = c("#00A600FF", "#CC00FFFF", "red", "darkorange"),
    lineSize = 0.5, lineAlpha = 1, clusterAllData = FALSE,
    aggMethod = "average", yAxisLabel = "Standardized read count",
    saveFile = FALSE)
names(ret)
grid.draw(ret[["S1_S2_4"]])
grid.draw(ret[["S1_S3_4"]])

## End(Not run)

# Example 5: Run the same analysis, only now set the verbose parameter to 
# value TRUE. This will save images of each individual cluster, .rds files 
# that contain the IDs within each cluster, and images of the conglomerate 
# clusters to outDir (default tempdir()).

## Not run: 
plotClusters(dataSE=se_soybean_cn_sub_st, nC=4,
    colList = c("#00A600FF", "#CC00FFFF", "red", "darkorange"),
    lineSize = 0.5, lineAlpha = 1, clusterAllData = FALSE,
    aggMethod = "average", yAxisLabel = "Standardized read count",
    verbose = TRUE)

## End(Not run)

bigPint documentation built on Nov. 8, 2020, 5:07 p.m.