This function generates color-coded Clustered Image Maps (CIMs) ("heat maps") to represent "high-dimensional" data sets.
cim(
  mat = NULL,
  color = NULL,
  row.names = TRUE,
  col.names = TRUE,
  row.sideColors = NULL,
  col.sideColors = NULL,
  row.cex = NULL,
  col.cex = NULL,
  cutoff = 0,
  cluster = "both",
  dist.method = c("euclidean", "euclidean"),
  clust.method = c("complete", "complete"),
  cut.tree = c(0, 0),
  transpose = FALSE,
  symkey = TRUE,
  keysize = c(1, 1),
  keysize.label = 1,
  zoom = FALSE,
  title = NULL,
  xlab = NULL,
  ylab = NULL,
  margins = c(5, 5),
  lhei = NULL,
  lwid = NULL,
  comp = NULL,
  center = TRUE,
  scale = FALSE,
  mapping = "XY",
  legend = NULL,
  save = NULL,
  name.save = NULL
)
mat |
numeric matrix of values to be plotted. Alternatively, an object
of class inheriting from |
color |
a character vector of colors such as that generated by
|
row.names, col.names |
logical, should the name of rows and/or columns
of |
row.sideColors |
(optional) character vector of length |
col.sideColors |
(optional) character vector of length |
row.cex, col.cex |
positive numbers, used as |
cutoff |
numeric between 0 and 1. Variables with correlations below this threshold in absolute value are not plotted. Used only when mapping is "XY". |
cluster |
character string indicating whether to cluster |
dist.method |
character vector of length two. The distance measure used
in clustering rows and columns. Possible values are |
clust.method |
character vector of length two. The agglomeration method
to be used for rows and columns. Accepts the same values as in
hclust. |
cut.tree |
numeric vector of length two with components in [0,1]. The height proportions where the trees should be cut for rows and columns, if these are clustered. |
transpose |
logical indicating if the matrix should be transposed for
plotting. Defaults to FALSE. |
symkey |
logical indicating whether the color key should be made
symmetric about 0. Defaults to TRUE. |
keysize |
vector of length two, indicating the size of the color key. |
keysize.label |
vector of length 1, indicating the size of the labels and title of the color key. |
zoom |
logical. Whether to enable interactive zooming. See Details. |
title, xlab, ylab |
title, x- and y-axis titles; default to none. |
margins |
numeric vector of length two containing the margins (see
|
lhei, lwid |
arguments passed to layout to control the row heights and column widths of the plot. See Details. |
comp |
atomic or vector of positive integers. The components used to
account for the association in the data. For a non-sparse method, the
similarity matrix is computed based on the variates and loading vectors of
the specified components. For a sparse approach, the similarity matrix is
computed based on the variables selected on the specified components. See
Examples. Defaults to NULL. |
center |
either a logical value or a numeric vector of length equal to
the number of columns of |
scale |
either a logical value or a numeric vector of length equal to
the number of columns of |
mapping |
character string indicating whether to map |
legend |
A list indicating the legend for each group, the color vector, title of the legend and cex. |
save |
should the plot be saved? If so, argument to be set to either
|
name.save |
character string for the name of the file to be saved. |
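As a rough illustration of the cut.tree argument, the sketch below cuts a row dendrogram at a proportion of its total height using base R only. The conversion from proportion to height (prop * max(hc$height)) and all object names are assumptions made for illustration; cim() may compute the cut differently.

```r
# Illustrative only: cut a tree at a proportion of its maximum height.
set.seed(1)
mat <- matrix(rnorm(60), nrow = 12, ncol = 5)

# Same defaults as dist.method / clust.method in the usage above
hc <- hclust(dist(mat, method = "euclidean"), method = "complete")

prop <- 0.5                           # hypothetical value for cut.tree[1]
cut.height <- prop * max(hc$height)   # assumed conversion, not taken from cim()
groups <- cutree(hc, h = cut.height)  # cluster membership of each row

table(groups)                         # number of rows per cluster
```

A proportion close to 0 leaves every row in its own cluster, while a proportion of 1 merges all rows into one.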
The one-matrix Clustered Image Map (default method) is a 2-dimensional
visualization of a real-valued matrix (basically image(t(mat))) with rows
and/or columns reordered according to some hierarchical clustering method to
identify interesting patterns. The dendrograms generated by the clustering
are added to the left side and to the top of the image. By default, rows and
columns are clustered with the complete linkage method and the Euclidean
distance.
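The idea of the default method can be sketched with base R alone: cluster rows and columns, reorder the matrix accordingly, and draw it with image(). This is a minimal sketch on simulated data, not the code used by cim(); it omits the dendrograms, color key and labels.

```r
# Simulated matrix standing in for real data
set.seed(42)
mat <- matrix(rnorm(200), nrow = 20, ncol = 10,
              dimnames = list(paste0("r", 1:20), paste0("c", 1:10)))

# Hierarchical clustering of rows and of columns
# (Euclidean distance, complete linkage)
row.hc <- hclust(dist(mat, method = "euclidean"), method = "complete")
col.hc <- hclust(dist(t(mat), method = "euclidean"), method = "complete")

# Reorder rows and columns following the leaf order of each tree
mat.ord <- mat[row.hc$order, col.hc$order]

# Draw the reordered matrix as an image
image(t(mat.ord), axes = FALSE)
```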
For the "pca", "spca", "ipca", "sipca", "plsda", "splsda" methods and their
multilevel variants, the mat matrix is object$X.
For the remaining methods, if mapping = "X" or mapping = "Y", the mat matrix
is object$X or object$Y respectively. If mapping = "XY":
in the rcc method, the matrix mat is created where element (j,k) is the
scalar product between the pair of vectors in dimension length(comp)
representing the variables X_j and Y_k on the axes defined by Z_i with
i in comp, where Z_i is the equiangular vector between the i-th X and Y
canonical variates.
in the pls, spls and multilevel spls methods, if object$mode is "regression",
element (j,k) of the matrix mat is given by the scalar product between the
pair of vectors in dimension length(comp) representing the variables X_j and
Y_k on the axes defined by U_i with i in comp, where U_i is the i-th X
variate. If object$mode is "canonical", then X_j and Y_k are represented on
the axes defined by U_i and V_i respectively.
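The scalar-product construction for the "regression" case can be sketched in base R as follows. Two things here are assumptions rather than statements about cim(): the coordinates of each variable are taken to be its correlations with the variates, and the variates U are simple stand-ins obtained from an SVD of the cross-product matrix instead of a fitted pls model.

```r
set.seed(7)
n <- 30
X <- matrix(rnorm(n * 6), n, 6, dimnames = list(NULL, paste0("X", 1:6)))
Y <- matrix(rnorm(n * 4), n, 4, dimnames = list(NULL, paste0("Y", 1:4)))

# Stand-in X variates; a real analysis would take them from a fitted model
comp <- 1:2
U <- X %*% svd(crossprod(X, Y))$u[, comp]   # n x length(comp)

# Coordinates of each variable on the axes defined by U
# (assumed here to be correlations with the variates)
coord.X <- cor(X, U)   # 6 x length(comp)
coord.Y <- cor(Y, U)   # 4 x length(comp)

# Element (j, k) is the scalar product of the vectors for X_j and Y_k
sim <- tcrossprod(coord.X, coord.Y)   # 6 x 4
```

The resulting matrix has one row per X variable and one column per Y variable, which is the shape then clustered and displayed.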
By default four components will be displayed in the plot. At the top left is
the color key, top right is the column dendrogram, bottom left is the row
dendrogram, bottom right is the image plot. When sideColors are provided, an
additional row or column is inserted in the appropriate location. This layout
can be overridden by specifying appropriate values for lwid and lhei. lwid
controls the column width, and lhei controls the row height. See the help
page for layout for details on how to use these arguments.
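The four-panel arrangement described above can be previewed with layout() directly. The panel numbering and the width and height values below are hypothetical choices for illustration; they are not the defaults used by cim().

```r
# 2 x 2 arrangement:
#   1 = color key (top left),         2 = column dendrogram (top right)
#   3 = row dendrogram (bottom left), 4 = image plot (bottom right)
lmat <- matrix(1:4, nrow = 2, byrow = TRUE)
lwid <- c(1, 4)   # hypothetical relative column widths
lhei <- c(1, 4)   # hypothetical relative row heights

n.panels <- layout(lmat, widths = lwid, heights = lhei)
layout.show(n.panels)   # outlines the four regions on the current device
```

Larger second entries in lwid and lhei give more room to the image plot relative to the dendrograms and the color key.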
For visualization of "high-dimensional" data sets, a zooming tool is
available. zoom = TRUE opens a new device, so that one device shows the CIM
and the other shows the zoomed region, and starts an interactive 'zoom'
process: click two points in the image map region with the first mouse
button. A rectangle is drawn around the selected region, which is then
displayed, enlarged, in the new device. The process can be repeated to zoom
in on other regions of interest.
The zoom process is terminated by clicking the second button and selecting 'Stop' from the menu, or from the 'Stop' menu on the graphics window.
A list containing the following components:
M |
the mapped
matrix used by |
rowInd, colInd |
row and column index
permutation vectors as returned by |
ddr, ddc |
object of class |
mat.cor |
the correlation matrix used for the heatmap. Available only when mapping = "XY". |
row.names, col.names |
character vectors with row and column labels used. |
row.sideColors, col.sideColors |
character vector containing the color names for vertical and horizontal side bars used to annotate the rows and columns. |
Ignacio González, Francois Bartolo, Kim-Anh Lê Cao, Al J Abadi
Eisen, M. B., Spellman, P. T., Brown, P. O. and Botstein, D. (1998). Cluster analysis and display of genome-wide expression patterns. Proceedings of the National Academy of Sciences of the USA 95, 14863-14868.
Weinstein, J. N., Myers, T. G., O'Connor, P. M., Friend, S. H., Fornace Jr., A. J., Kohn, K. W., Fojo, T., Bates, S. E., Rubinstein, L. V., Anderson, N. L., Buolamwini, J. K., van Osdol, W. W., Monks, A. P., Scudiero, D. A., Sausville, E. A., Zaharevitz, D. W., Bunow, B., Viswanadhan, V. N., Johnson, G. S., Wittes, R. E. and Paull, K. D. (1997). An information-intensive approach to the molecular pharmacology of cancer. Science 275, 343-349.
González I., Lê Cao K.A., Davis M.J., Déjean S. (2012). Visualising associations between paired 'omics' data sets. BioData Mining; 5(1).
mixOmics article:
Rohart F, Gautier B, Singh A, Lê Cao K-A. mixOmics: an R package for 'omics feature selection and multiple data integration. PLoS Comput Biol 13(11): e1005752
heatmap, hclust, plotVar, network and
http://mixomics.org/graphics/ for more details on all options available.
## default method: shows cross correlation between 2 data sets
#------------------------------------------------------------------
data(nutrimouse)
X <- nutrimouse$lipid
Y <- nutrimouse$gene
cim(cor(X, Y), cluster = "none")
## Not run:
## CIM representation for objects of class 'rcc'
#------------------------------------------------------------------
nutri.rcc <- rcc(X, Y, ncomp = 3, lambda1 = 0.064, lambda2 = 0.008)
cim(nutri.rcc, xlab = "genes", ylab = "lipids", margins = c(5, 6))
#-- interactive 'zoom' available as below
cim(nutri.rcc, xlab = "genes", ylab = "lipids", margins = c(5, 6),
    zoom = TRUE)
#-- select the region and "see" the zoom-out region
#-- cim from X matrix with a side bar to indicate the diet
diet.col <- palette()[as.numeric(nutrimouse$diet)]
cim(nutri.rcc, mapping = "X", row.names = nutrimouse$diet,
    row.sideColors = diet.col, xlab = "lipids",
    clust.method = c("ward", "ward"), margins = c(6, 4))
#-- cim from Y matrix with a side bar to indicate the genotype
geno.col <- color.mixo(as.numeric(nutrimouse$genotype))
cim(nutri.rcc, mapping = "Y", row.names = nutrimouse$genotype,
    row.sideColors = geno.col, xlab = "genes",
    clust.method = c("ward", "ward"))
#-- save the result as a jpeg file
jpeg(filename = "test.jpeg", res = 600, width = 4000, height = 4000)
cim(nutri.rcc, xlab = "genes", ylab = "lipids", margins = c(5, 6))
dev.off()
## CIM representation for objects of class 'spca' (also works for sipca)
#------------------------------------------------------------------
data(liver.toxicity)
X <- liver.toxicity$gene
liver.spca <- spca(X, ncomp = 2, keepX = c(30, 30), scale = FALSE)
dose.col <- color.mixo(as.numeric(as.factor(liver.toxicity$treatment[, 3])))
# side bar, no variable names shown
cim(liver.spca, row.sideColors = dose.col, col.names = FALSE,
    row.names = liver.toxicity$treatment[, 3],
    clust.method = c("ward", "ward"))
## CIM representation for objects of class '(s)pls'
#------------------------------------------------------------------
data(liver.toxicity)
X <- liver.toxicity$gene
Y <- liver.toxicity$clinic
liver.spls <- spls(X, Y, ncomp = 3,
                   keepX = c(20, 50, 50), keepY = c(10, 10, 10))
# default
cim(liver.spls)
# transpose matrix, choose clustering method
cim(liver.spls, transpose = TRUE,
    clust.method = c("ward", "ward"), margins = c(5, 7))
# Here we visualise only the X variables selected
cim(liver.spls, mapping = "X")
# Here we visualise only the Y variables selected
cim(liver.spls, mapping = "Y")
# Here we only visualise the similarity matrix between the variables by spls
cim(liver.spls, cluster = "none")
# plotting two data sets with the similarity matrix as input in the function
# (see our BioData Mining paper for more details)
# Only the variables selected by the sPLS model in X and Y are represented
cim(liver.spls, mapping = "XY")
# on the X matrix only, side col var to indicate dose
dose.col <- color.mixo(as.numeric(as.factor(liver.toxicity$treatment[, 3])))
cim(liver.spls, mapping = "X", row.sideColors = dose.col,
    row.names = liver.toxicity$treatment[, 3])
# CIM default representation includes the total of 120 genes selected, with the dose color
# with a sparse method, show only the variables selected on specific components
cim(liver.spls, comp = 1)
cim(liver.spls, comp = 2)
cim(liver.spls, comp = c(1,2))
cim(liver.spls, comp = c(1,3))
## CIM representation for objects of class '(s)plsda'
#------------------------------------------------------------------
data(liver.toxicity)
X <- liver.toxicity$gene
# Setting up the Y outcome first
Y <- liver.toxicity$treatment[, 3]
#set up colors for cim
dose.col <- color.mixo(as.numeric(as.factor(liver.toxicity$treatment[, 3])))
liver.splsda <- splsda(X, Y, ncomp = 2, keepX = c(40, 30))
cim(liver.splsda, row.sideColors = dose.col, row.names = Y)
## CIM representation for objects of class splsda 'multilevel'
# with a two level factor (repeated sample and time)
#------------------------------------------------------------------
data(vac18.simulated)
X <- vac18.simulated$genes
design <- data.frame(samp = vac18.simulated$sample)
Y <- data.frame(time = vac18.simulated$time,
                stim = vac18.simulated$stimulation)
res.2level <- splsda(X, Y = Y, ncomp = 2, multilevel = design,
                     keepX = c(120, 10))
#define colors for the levels: stimulation and time
stim.col <- c("darkblue", "purple", "green4", "red3")
stim.col <- stim.col[as.numeric(Y$stim)]
time.col <- c("orange", "cyan")[as.numeric(Y$time)]
# The row side bar indicates the two levels of the factor, stimulation and time.
# the sample names have been modified on the plot.
cim(res.2level, row.sideColors = cbind(stim.col, time.col),
    row.names = paste(Y$time, Y$stim, sep = "_"),
    col.names = FALSE,
    #setting up legend:
    legend = list(legend = c(levels(Y$time), levels(Y$stim)),
                  col = c("orange", "cyan", "darkblue", "purple", "green4", "red3"),
                  title = "Condition", cex = 0.7)
)
## CIM representation for objects of class spls 'multilevel'
#------------------------------------------------------------------
data(liver.toxicity)
repeat.indiv <- c(1, 2, 1, 2, 1, 2, 1, 2, 3, 3, 4, 3, 4, 3, 4, 4, 5, 6, 5, 5,
                  6, 5, 6, 7, 7, 8, 6, 7, 8, 7, 8, 8, 9, 10, 9, 10, 11, 9, 9,
                  10, 11, 12, 12, 10, 11, 12, 11, 12, 13, 14, 13, 14, 13, 14,
                  13, 14, 15, 16, 15, 16, 15, 16, 15, 16)
# sPLS is a non supervised technique, and so we only indicate the sample repetitions
# in the design (1 factor only here, sample)
# sPLS takes as an input 2 data sets, and the variables selected
design <- data.frame(sample = repeat.indiv)
res.spls.1level <- spls(X = liver.toxicity$gene,
                        Y = liver.toxicity$clinic,
                        multilevel = design,
                        ncomp = 2,
                        keepX = c(50, 50), keepY = c(5, 5),
                        mode = 'canonical')
stim.col <- c("darkblue", "purple", "green4","red3")
# showing only the Y variables, and only those selected in comp 1
cim(res.spls.1level, mapping = "Y",
    row.sideColors = stim.col[factor(liver.toxicity$treatment[, 3])], comp = 1,
    #setting up legend:
    legend = list(legend = unique(liver.toxicity$treatment[, 3]), col = stim.col,
                  title = "Dose", cex = 0.9))
# showing only the X variables, for all selected on comp 1 and 2
cim(res.spls.1level, mapping = "X",
    row.sideColors = stim.col[factor(liver.toxicity$treatment[, 3])],
    #setting up legend:
    legend = list(legend = unique(liver.toxicity$treatment[, 3]), col = stim.col,
                  title = "Dose", cex = 0.9))
# These are the cross correlations between the variables selected in X and Y.
# The similarity matrix is obtained as in our BioData Mining paper
cim(res.spls.1level, mapping = "XY")
## End(Not run)