A nice power-user-friendly utility for EmbedSOM (http://bioinfo.uochb.cas.cz/embedsom/).
First, install FlowSOM from Bioconductor, using the code snippet from here: https://www.bioconductor.org/packages/release/bioc/html/FlowSOM.html
Then you can install EmbedSOM and DiffSOM using devtools
:
devtools::install_github('exaexa/EmbedSOM')
devtools::install_github('exaexa/DiffSOM')
To start out, please see the documentation of the following functions. Using
the standard R help (?Load
) should give you sufficient examples to run the
analysis. The example below follows the same structure. The functions should be
run roughly in this order:
Load
for gathering the files from diskEmbed
for computing the embedding and clusteringPlot
for getting overview of the resultPlotDiff
for getting overview of sample differencesPlotClusters
for getting cleaner picture of clustersPlotExprs
for a view of actual expression values (NEW!)Gate
for selecting subpopulationsPlotSignificance
and GetProbs
for getting a bit of statistics about the subpopulationsMost of extra parameters (not yet documented) are named similarly as in FlowSOM or in R standard libraries; they are usually just forwarded to corresponding FlowSOM/R functions.
We demonstrate the functionality on data from study of Michelle Makiya (see https://www.ncbi.nlm.nih.gov/pubmed/29885264), downloadable from FlowRepository http://flowrepository.org/id/FR-FCM-ZYND.
First, download all the files en masse using Unix shell, and start R:
mkdir diffsom-test
cd diffsom-test
wget -r -np --content-disposition http://flowrepository.org/experiments/1773/download_ziped_files
find -iname '*.fcs' -exec mv {} . \;
rm -fr flowrepository.org/
R
We now want to load DiffSOM (assuming it is installed using devtools, as shown
above), gather the interesting files (we will compare Baseline_None
vs.
12min_VH
, 6 files from each) and load them to DiffSOM:
library(DiffSOM)
baseline_files <- dir(pattern='^Patient_.*_Baseline_None.fcs$')
vh_files <- dir(pattern='^Patient_.*_120min_VH.fcs$')
ds <- Load(c(baseline_files, vh_files),
transform=T, toTransform=c(7:11),
cells=300000)
We have reduced the cell count to 300k to speed up the preliminary analysis, final analysis may be run on much more cells (the slowdown caused by increased cell count is deterministic and linear, i.e., using 1 million of cells will make all computations roughly 3.3x slower).
Let's run the embedding and get some pictures about what the result looks like. We additionally fix the marker names to something tangible:
# optional: fix the column names in the loaded FlowSOM object
ds$fs$prettyColnames[7:11] <- c('CCR1', 'SIGLEC8', 'CXCR4', 'CD45', 'CCR3')
colnames(ds$fs$data)[7:11] <- c('CCR1', 'SIGLEC8', 'CXCR4', 'CD45', 'CCR3')
# embedding (put extra importance on CD45 marker)
ds <- Embed(ds, colsToUse=c(1,4,7:11),
xdim=16, ydim=16,
importance=c(1,1,1,1,1,2,1))
# verifying that the data is OK
PlotExprs(ds, files=T)
PlotExprs(ds, clusters=T)
#plotting
png('out1.png', 900, 900)
Plot(ds)
dev.off()
The Embed
function does the main part of the computation. In this case it
takes around 15 seconds on a modest laptop.
This produces the following nice image.
ExportData
function that can be easily used to create a data frame with all
relevant information. That is pretty useful in many situations; most notably,
it integrates well with ggplot
and allows easy export to various formats.
ExportData(ds)
Additionally to the marker expressions from original files, the cells in the
exported data frame also have File
parameter with the number of the file
where each cell originated from, dimensions EmbedSOM1
and EmbedSOM2
with
the embedding, dimensions Precluster1
and Precluster2
that describe the
mapping of the cells to the SOM (also, there's a slightly redundant
Precluster
with the raw SOM vertex ID), and, finally, the Metacluster
column with the assigned metacluster ID.
Note: Previously, there was a function ExportFCS
function for saving the
contents of your DiffSOM object (ds
, or any other, like ds2
produced in the
next step) to FCS format. That was removed because saving stuff to FCS is too
error-prone to be simplified to a single function (moreover, the FCS standard
specification is so unhelpful that I have exactly zero idea about how to write
the file correctly). You can still get the original functionality using this
code:
flowCore::write.FCS('exported.fcs',
x=new('flowFrame',
exprs=as.matrix(ExportData(ds))))
Let us pretend that we are interested in the CD45-negative part of the cells,
which is formed by clusters 9 and 10 in this case. Following code first shows a
better picture of the clusters, so that you can verify whether 9 and 10 are
really the desired clusters, then it gates the clusters out and saves the
subpopulation to DiffSOM object ds2
:
PlotClusters(ds, alpha=.3)
# check out the cluster images
ds2 <- Gate(ds, clusters=c(9,10))
ds2 <- Embed(ds2, xdim=16, ydim=16, colsToUse=c(1,4,7,8,9,11))
png('out2.png', 1200, 600)
Plot(ds2)
dev.off()
Now let's see how this population differs among the samples. Also, we use larger points for plotting to make the relatively low cell number more visible:
png('out2d.png', 1200, 900)
PlotDiff(ds2, pch=20, cex=0.3)
dev.off()
This doesn't give a very good view of the groups, so let's instead use p-value plots to detect the relevant changes in whole metaclusters and in FlowSOM pre-clusters:
png('out3.png', 600, 300)
par(mar=c(0,0,0,0), mfrow=c(1,2))
PlotSignificance(ds2, control=c(1:6), experiment=c(7:12), paired=T)
PlotSignificance(ds2, control=c(1:6), experiment=c(7:12), paired=T, meta=F)
dev.off()
Orange colors in the plot show that the p-values hint at significant increase of
the cell count in the population in experiment group, blue shows that p-values
hint at significant decrease. You can also use paired=F
if your samples do
not correspond one-to-one.
Colors are only informative; but rigorous statistical testing can be run on a
table of relative cell abundances (a.k.a. probabilities) obtained using
GetProbs(ds2)$probs
:
[,1] [,2] [,3] [,4] [,5] ...
[1,] 0.0010019036 0.2774271 0.0016030458 0.0032060916 0.033663962 ...
[2,] 0.0023727351 0.4236411 0.0002157032 0.0002157032 0.008412425 ...
[3,] 0.0008337502 0.4305486 0.0006670002 0.0010005003 0.020010005 ...
[4,] 0.0032122560 0.2940450 0.0053125772 0.0013590314 0.040400297 ...
[5,] 0.0005069158 0.2591788 0.0010138316 0.0017379970 0.014338475 ...
[6,] 0.0018617500 0.3176986 0.0028827097 0.0016815807 0.017536484 ...
[7,] 0.0010257287 0.3289170 0.0003419096 0.0022224122 0.007949397 ...
[8,] 0.0006872852 0.4828866 0.0032989691 0.0004123711 0.012371134 ...
[9,] 0.0011308728 0.4344608 0.0018505192 0.0008224530 0.016654673 ...
[10,] 0.0059383892 0.3122603 0.0013608809 0.0007422987 0.006680688 ...
[11,] 0.0005499878 0.2864214 0.0023832804 0.0011610853 0.023588365 ...
[12,] 0.0017746772 0.3596475 0.0052628358 0.0013463068 0.020622973 ...
Each line in the matrix corresponds to one file from the input, each row
corresponds to one metacluster (or pre-cluster if meta=F
). Testing can be
then run manually using any favorite testing procedure. Note that t.test
is
provided only as an example -- wilcox.test
will probably be better for this
general use-case, and exact test to choose highly depends on the experiment
type. We examine the combined contents of clusters 2 and 5 to get the exact
test results and p-values:
ps <- GetProbs(ds2)$probs
t.test(ps[7:12,2]+ps[7:12,5],
ps[1:6,2]+ps[1:6,5],
alternative='greater', paired=T)
THIS IS A PROOF OF CONCEPT that only exists for research and demonstration purposes. Please do not rely on consistency of DiffSOM, availability of updates or long-term existence of this repository. If you find DiffSOM useful and want to use it for anything serious, please maintain a separate forked version to avoid conflicts with possible breaking changes that I will need to add to this repository.
A more sophisticated, fast and clickable software for running this kind of analyses will be made available as soon as possible.
As usual, you are welcome to report any encountered issues or possible ideas for improvement using the GitHub issue tracker.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.