ShinySOM is a Shiny application for quick, highly interactive exploration and dissection of multidimensional flow cytometry data. The analysis model is derived from FlowSOM: The user inputs prepared samples in FCS files, uses SOMs and related analysis algorithms to dissect the data, and runs various analyses and clustering algorithms on the result.
This tutorial assumes that you have ShinySOM and all dependencies installed and running (as described in the README). If not, please make sure ShinySOM can be run, or ask your local technical staff for help.
The general ShinySOM workflow is as follows:
In this tutorial, we demonstrate this workflow on a well-explored dataset used for monitoring immune cells in spleen (Saeys et al., 2016). The dataset can be obtained from FlowRepository under accession number ZZQY.
ShinySOM is a server application and operates its own storage of intermediate files and datasets. Data and dataset management is controlled from the top bar:
The buttons, from left to right, allow:
After downloading the data from FlowRepository, we first want to upload them to the server. After clicking the Upload/download data button, we can choose the 4 FCS files from the dataset to upload using the Browse button:
When the upload is complete, we continue by aggregating the files into a dataset. After clicking the Manage datasets button, an interface for creating datasets appears:
In the interface, you will be able to
After a short while, the newly created dataset should appear in the list in the top bar, from where we can open it using the Load button:
Opened datasets can be explored in the usual 2-dimensional scatterplots; these are available in the Data overview tab. We can choose a set of parameters for horizontal and vertical axes, and a color scheme for the points different from the default density:
The screenshot shows a combination of forward and side scatter plots, colored by side-scatter signal width. (This clearly identifies e.g. the doublet clusters.)
We note that all expression colors in ShinySOM are plotted using the relatively standard ColorBrewer's RdYlBu palette, where blue shades represent "negative", gray-yellow "average" and red shades "positive" expression of the marker. The color scaling is based on the virtual normal distribution, which produces good results in a wide spectrum of use-cases. See EmbedSOM source code for details.
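As a rough illustration of color scaling based on a normal distribution, the sketch below passes expression values through the normal CDF before picking a palette color. This is one plausible reading of the description above, not the EmbedSOM implementation. The helper name scale_colors is hypothetical, and hcl.colors("RdYlBu") (available in R 3.6 and newer) only approximates the ColorBrewer palette.

```r
# Hypothetical helper: normal-CDF color scaling (an assumption, not EmbedSOM's code).
scale_colors <- function(x, n = 256) {
  # Reverse the palette so that low values are blue and high values are red.
  pal <- rev(hcl.colors(n, palette = "RdYlBu"))
  # Map values to (0, 1) through the CDF of a normal distribution
  # fitted to the data (sample mean and standard deviation).
  p <- pnorm(x, mean = mean(x), sd = sd(x))
  # Convert the probabilities to palette indices in 1..n.
  idx <- pmin(n, pmax(1, ceiling(p * n)))
  pal[idx]
}

set.seed(1)
expr <- rnorm(1000, mean = 2, sd = 0.5)   # synthetic marker expression
cols <- scale_colors(expr)
```

With this kind of scaling, a few extreme outliers do not compress the colors of the remaining cells into a narrow band, which may be why it behaves well across different markers.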
Among others, the plot with the CD11b marker clearly shows that the data still need to be transformed, which is the next step in the workflow.
The transformation editor is available under the Transform&Scale tab. Individual data dimensions (i.e. cell parameters) can be selected and subjected to some of the common transformation methods:
In the screenshot, we have
Finally, the Apply button is pressed (see 5 in the screenshot) to save the transformation in the dataset.
Other transforms can be applied as well, e.g. the ArcSinh transform is viable for mass-cytometry samples.
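For reference, an arcsinh transform can be sketched in a few lines of base R. The function name asinh_transform and the cofactor of 5 are assumptions made for this illustration (5 is a common convention for mass cytometry, not a documented ShinySOM default).

```r
# Arcsinh transform with a cofactor; both the helper name and the default
# cofactor are illustrative assumptions.
asinh_transform <- function(x, cofactor = 5) asinh(x / cofactor)

raw <- c(-10, 0, 5, 100, 10000)          # synthetic intensities
transformed <- asinh_transform(raw)
```

The transform is roughly linear near zero and logarithmic for large values, so it tolerates the zero and negative intensities that a plain logarithm cannot handle.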
Population identification (or "gating") is carried out in two steps, using FlowSOM to train a self-organizing map of the data, and then clustering on the resulting map. Additionally, ShinySOM uses the EmbedSOM view to show a quickly apprehensible 2-dimensional "guiding" picture for the dataset.
Here, we will first focus on "cleaning", i.e. removing the doublets, cell debris, dead cells and various other events, in order to create a dataset with only live singlets. (We will further dissect the individual cell types later.)
SOM training and embedding are done in the Embedding tab, as shown in the screenshot:
The performed actions are as follows:
(Alternatively, we could have chosen to cluster the populations using all available markers at once. Although that would work correctly (and the detailed cell populations would be identifiable right away), for the purpose of this tutorial we split the dissection into 2 separate steps, in order to demonstrate the functionality of creating cell subsets.)
Cell populations are selected in a dendrogram that is built upon the FlowSOM clustering. This allows quick identification of large datasets of interest, and provides a relatively effortless way of separating the data in multiple dimensions at once. Additionally, gating bias is reduced, since the dendrogram is precomputed to separate the populations well in the multidimensional space, and diverging from it requires additional user effort.
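The idea of a dendrogram over SOM nodes can be sketched with base R hierarchical clustering. The codebook below is synthetic, and the linkage method is an assumption; the source does not state which method ShinySOM uses.

```r
set.seed(42)
# Synthetic stand-in for a SOM codebook: 100 nodes x 3 markers,
# drawn around two well-separated centers.
codes <- rbind(
  matrix(rnorm(150, mean = 0, sd = 0.3), ncol = 3),
  matrix(rnorm(150, mean = 3, sd = 0.3), ncol = 3))

# Build a dendrogram over the nodes (the linkage method is an assumption).
tree <- hclust(dist(codes), method = "average")

# Cutting the tree is analogous to "painting" its top-level branches.
node_cluster <- cutree(tree, k = 2)
table(node_cluster)
```

Selecting a branch in the interactive dendrogram plays the same role as cutree here, except that the user chooses which branches to label instead of fixing the number of clusters.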
The populations are selected in a shinyDendro-based interface:
Cluster keys are single characters (a to z and 0 to 9); a cluster key represents a unique assigned classification, which is, put more simply, the "cluster color". Clicking in the dendrogram causes the clicked branch of the tree to be painted by the chosen color. The resulting classification is immediately visible in the embedding on the right. Additionally, the embedding plot can be used for brushing -- drawing a rectangle around cells in the view highlights the selected cells in the dendrogram.
For demonstration, we will extract the identified live singlet cells and save them in a separate dataset using the Dissection tab:
The "Spleen-clean" dataset will appear in the top bar, next to the original "Spleen" dataset.
After reducing the dataset, we have embedded and dissected it again, to get a complete view of the subpopulations:
Compared to the processing of the original dataset, there are three main changes:
The screenshot shows dissection of the populations into B cells, T cells, and several other cell types. (Clusters of macrophages, dendritic cells and neutrophils were selected by brushing in the embedded cells, which highlighted their corresponding data in the dendrogram interface.)
ShinySOM offers several useful analyses for getting a good overview of the contents of the selected populations and the differences between individual files. These are available as sub-tabs in the Analysis tab:
Tab Side-by-side comparison allows seeing the difference between two different file groups in the embedding, giving a quick visual comparison of presence of various cell populations.
Tab Difference testing improves the previous view by precisely expressing the significance of the population size differences by coloring based on statistical testing results. P-values from testing the cluster sizes from the "control" and "experiment" groups for one-sided inequality are used as a basis for the coloring. Significance plots are designed for detection of small statistically significant differences in the size of the populations. In our example, the significance plot confirms the findings from the heatmap. Because the statistical significance of the difference is relatively low in this dataset (the p-value is greater than 0.15 for both B and T cell clusters), the p-value slider needs to be adjusted in order to see the coloring:
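A one-sided comparison of population sizes between two groups can be sketched as follows. The per-file fractions are synthetic, and the choice of a t-test is an assumption; the text above does not name the test ShinySOM applies.

```r
# Synthetic per-file fractions of one cell population (hypothetical values).
control    <- c(0.60, 0.62, 0.58, 0.61)
experiment <- c(0.66, 0.64, 0.69, 0.65)

# One-sided test: is the population smaller in control than in experiment?
res <- t.test(control, experiment, alternative = "less")
res$p.value
```

A small p-value here would correspond to a strongly colored cluster in the significance plot, while clusters above the chosen p-value threshold would stay uncolored.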
Tab Export data provides a way to export the generated data for external programs:
There are various limitations that prevent efficient interactive work with large datasets. Those consist mostly of necessary delays while processing large data -- although ShinySOM tries to do its best with large datasets, the timing and resource usage of the involved algorithms will always cross the "bearable" limit for interactive usage if the datasets grow enough.
One possible alleviation of this problem is to downsample the datasets. Because ShinySOM is designed to cope well with a few million loaded cells, downsampling is not even required for many datasets (at least not for common experiments), and it creates only a minor statistical loss even in larger experiments. Despite that, downsampled datasets should not be used for obtaining any final results.
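Uniform random downsampling of a cell matrix is straightforward in base R. The sketch below uses a synthetic matrix and a hypothetical helper named downsample; it does not describe how ShinySOM itself subsamples data.

```r
set.seed(1)
# Synthetic expression matrix: 10000 cells x 4 parameters.
cells <- matrix(rnorm(40000), ncol = 4,
                dimnames = list(NULL, c("FSC-A", "SSC-A", "CD3", "CD19")))

# Keep a uniform random subset of rows (cells), sampled without replacement.
downsample <- function(m, n) {
  m[sample(nrow(m), min(n, nrow(m))), , drop = FALSE]
}

small <- downsample(cells, 1000)
dim(small)
```

Rare populations suffer most from this kind of reduction, since only a handful of their cells may survive in the subset.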
The batch API of ShinySOM is designed to alleviate this problem: you can prepare the analysis (create a "gating scheme") in the interactive interface using a slightly reduced cell sample, export the analysis, and automatically apply it to any number of incoming FCS files.
The dataset objects exported from the ShinySOM Export data tab are useful both as sources of actual cell data and as sources of metadata about the analysis. The analysis can be reduced to metadata while still carrying the full information necessary for reproducing the workflow -- the reduced datasets are obtained using the Export analysis RDS button. The main advantage of exporting just the metadata is the size of the resulting data, which is reduced to several kilobytes.
For demonstration, we have exported the analysis files from both datasets we have created before, using the Export analysis RDS button to create files step1.shinysom and step2.shinysom. After that, we have transferred these files to the batch processing environment.
The dataset files are formatted as standard RDS and can be directly loaded in R:
step1 <- readRDS('step1.shinysom')
step2 <- readRDS('step2.shinysom')
The loaded structures contain some interesting data about the analysis, e.g. the list of all transformations applied to the data is available in step1$transforms and the FlowSOM-compatible map object for the final data can be obtained as step2$map. While this may already be useful, ShinySOM provides its own functions that simplify the batch processing.
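The .shinysom extension does not change how R reads the file. As a minimal sketch, the round trip below uses a made-up list whose fields are placeholders, not the real ShinySOM structure.

```r
# A made-up list standing in for an exported analysis object.
analysis <- list(transforms = list(), note = "placeholder")

path <- tempfile(fileext = ".shinysom")
saveRDS(analysis, path)        # write as standard RDS
loaded <- readRDS(path)        # readRDS does not care about the extension
str(loaded, max.level = 1)     # inspect the top-level fields
```

Calling names() or str() with max.level = 1 on a real exported object is a quick way to see which fields it carries before relying on any of them.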
LoadCells creates dataset objects from FCS files, in a manner similar to AggregateFlowFrames from FlowSOM.
Process applies the stored analysis to a data file.
ExportDF, ExportFlowFrame and PopulationSizes can be used to export data and statistics from the processed datasets (in a similar manner as in the Export data tab).
Dissect can be used to reduce the datasets to annotated subsets, just like in the Dissection tab. To help with that, the function PopulationKeys returns all annotated subsets available in the dataset.
Finally, batch processing of the original, non-subsampled datasets is done by reading the whole FCS files using the LoadCells function and applying the analysis and dissection steps using the Process and Dissect functions.
First, we read the analysis objects and the full FCS contents (2 million cells):
library(ShinySOM)
step1 <- readRDS('step1.shinysom')
step2 <- readRDS('step2.shinysom')
dataset <- LoadCells(
  c('21-10-15_Tube_028.fcs',
    '21-10-15_Tube_030.fcs',
    '21-10-15_Tube_031.fcs',
    '21-10-15_Tube_032.fcs'))
After the cells are loaded, we can apply the analysis and look at the result:
dataset <- Process(dataset, step1)
PopulationKeys(dataset)
After the processing (mapping and embedding) is done, the latter command should print out the keys for the available populations:
             2              b              d              l
    "Doublets"       "Debris"   "Dead cells" "Single cells"
From that, we want to reduce the dataset to the subset marked with the l key, which contains the live singlets:
dataset <- Dissect(dataset, c('l'))
(Note that more keys can be specified.)
The reduced dataset is ready for being processed by the second step:
dataset <- Process(dataset, step2)
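The steps above can be collected into a single function for repeated use. The wrapper below is a hypothetical convenience (its name and arguments are illustrative); it relies only on the ShinySOM functions named in this tutorial, called the same way as in the commands above.

```r
# Hypothetical wrapper around the two-step workflow described above.
# The wrapper name and its arguments are illustrative, not part of ShinySOM.
run_batch <- function(fcs_files, step1, step2, keep_keys = c('l')) {
  ds <- ShinySOM::LoadCells(fcs_files)     # read the full FCS files
  ds <- ShinySOM::Process(ds, step1)       # apply the cleaning analysis
  ds <- ShinySOM::Dissect(ds, keep_keys)   # keep only the selected populations
  ShinySOM::Process(ds, step2)             # apply the detailed analysis
}
```

With ShinySOM installed, it could be called as run_batch(list.files(pattern = '\\.fcs$'), step1, step2), assuming the FCS files sit in the working directory.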
After the results are ready, we can obtain the population statistics using e.g. PopulationSizes(dataset):
                   Annotation
File              B cells Dendritic cells Macrophages Neutrophils NK cells NK T cells T cells <NA>
21-10-15_Tube_028  207493           11006       10279        3225     7177       1425   78099 7778
21-10-15_Tube_030  173869            8783       11310        3368     3706       1246   92268 6548
21-10-15_Tube_031  198724            9642        7469        3025     6366       1472  107295 6879
21-10-15_Tube_032  218421            9904       11307        3017     7789       1673   75288 5373
The column marked <NA> represents the cells that were not assigned any annotation, i.e. those that were left out as "gray" in the clustering interface.
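Absolute counts are often easier to compare as per-file percentages. The sketch below re-enters the counts printed above into a plain matrix by hand. This is an assumption about the data layout, since the exact class returned by PopulationSizes is not stated here.

```r
# Counts copied from the PopulationSizes output above.
sizes <- rbind(
  "21-10-15_Tube_028" = c(207493, 11006, 10279, 3225, 7177, 1425,  78099, 7778),
  "21-10-15_Tube_030" = c(173869,  8783, 11310, 3368, 3706, 1246,  92268, 6548),
  "21-10-15_Tube_031" = c(198724,  9642,  7469, 3025, 6366, 1472, 107295, 6879),
  "21-10-15_Tube_032" = c(218421,  9904, 11307, 3017, 7789, 1673,  75288, 5373))
colnames(sizes) <- c("B cells", "Dendritic cells", "Macrophages", "Neutrophils",
                     "NK cells", "NK T cells", "T cells", "<NA>")

# Row-wise percentages: each file sums to 100.
percent <- round(100 * prop.table(sizes, margin = 1), 1)
percent
```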
For various purposes, it may also be beneficial to export the data. ExportDF(dataset) exports a large data frame that contains a lot of information about the dataset:
> ExportDF(dataset)[1:5,]
      FSC-A FSC-H    FSC-W     SSC-A  SSC-H    SSC-W    FITC-A
1 104919.57 91624 75045.94  45042.46  41000 71997.62  47.96475
2 104539.68 72977 93880.44 129815.17 104981 81039.11 145.31468
3  92914.92 82279 74007.61  23377.52  20195 75863.80 -49.92402
4  21442.05 20042 70114.07  36617.53  34942 68678.57 174.06121
5  79960.23 74381 70451.77  34708.58  33134 68650.38  34.66024
  MHCII#PerCp-Cy5-5 (PerCP-Cy5-5-A) CD49b#eFluor450 (Pacific Blue-A) AmCyan-A
1                          2.879356                        1.1410280 1.204577
2                          3.092070                        1.3442259 1.484064
3                          1.249903                        1.1925033 1.100682
4                          2.155605                        0.5509763 1.563597
5                          1.115314                        1.1847186 1.047064
  CD11b#BV605 (BV605-A) CD64#BV711 (BV711-A) FcERI#BV786 (BV786-A)
1             2.4137815            1.3216869             0.9004143
2             1.5304053            1.1014624             1.1421245
3             0.7284499            1.0002436             0.8207423
4             1.7426230            2.0427317             2.2640543
5             1.2014615            0.9959151             0.9562646
  CD161#APC (APC-A) Ly-6G#AF700 (Alexa Fluor 700-A) L/D#eFluor780 (APC-Cy7-A)
1         1.0150891                      1.1384442                 0.9560442
2         1.0533812                      0.9488114                 1.9926031
3         0.8836237                      0.8330915                 1.1454782
4         0.9893947                      0.9126866                 0.9558701
5         0.8401625                      0.7957797                 1.0021014
  CD3#PE (PE-A) CD19#PE-Cy5 (PE-Cy5-A) CD11c#PE-Cy7 (PE-Cy7-A)   Time CellFile
1     1.6601674              2.0630345                3.456712 1254.5        1
2     1.7487420              1.5789861                3.840369 5520.4        1
3     0.7130027              2.7802491                1.011603 1846.4        1
4     1.2357820              0.7009728                1.572579 2042.0        1
5     2.8881256              0.6032841                1.165158  287.9        1
  EmbedSOM1 EmbedSOM2 SOM1 SOM2 ClusterKey      Population
1  8.383144 22.866970    9   24          2 Dendritic cells
2  8.215800 21.121189    8   21          2 Dendritic cells
3 19.121624  9.981588   20    9          1         B cells
4  3.686083 22.917351    4   24          5     Macrophages
5 19.597309 18.423843   20   19          7         T cells
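Because the export is an ordinary data frame, base R summaries apply to it directly. The sketch below computes per-population marker medians on a small synthetic data frame that borrows two column names from the output above; the values are made up for illustration.

```r
# Small synthetic data frame using column names from the ExportDF output;
# the values are invented for this example.
df <- data.frame(
  `CD19#PE-Cy5 (PE-Cy5-A)` = c(2.1, 2.8, 0.7, 0.6, 2.5),
  `CD3#PE (PE-A)`          = c(0.7, 0.8, 2.9, 2.7, 0.6),
  Population = c("B cells", "B cells", "T cells", "T cells", "B cells"),
  check.names = FALSE)   # keep the non-syntactic marker names intact

# Median expression of every marker within each annotated population.
markers <- setdiff(names(df), "Population")
medians <- aggregate(df[, markers],
                     by = list(Population = df$Population),
                     FUN = median)
medians
```

On a real export, the markers vector would need to exclude the other non-marker columns as well (for example Time, CellFile, SOM1, SOM2 and ClusterKey).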
Similarly, one can save this data to an FCS file and analyze it in different software:
flowCore::write.FCS(ExportFlowFrame(dataset), "exported_data.fcs")
Raw data available in the dataset can be used for plotting high-quality graphics suitable for publishing. Importantly, the data frame obtained from ExportDF can be directly used in ggplot2:
library(ggplot2)
# Plot the embedding colored by MHCII expression and save it as a PNG.
ggsave('plot.png', units='in', width=4, height=4,
  ggplot(ExportDF(dataset)) +
    geom_point(size=.1, shape=16, alpha=.1,
      aes(EmbedSOM1, EmbedSOM2, color=`MHCII#PerCp-Cy5-5 (PerCP-Cy5-5-A)`)) +
    EmbedSOM::ExpressionGradient(guide='none') +  # hide the color legend
    xlim(-3,26) + ylim(-3,26) +
    ggtitle("MHCII expression") +
    theme_classic()
)
The code above may produce the following plot: