knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
Thank you for your interest in the Infinity Flow approach. This vignette describes how to apply the package to your massively parallel cytometry (e.g. LEGENDScreen or Lyoplates kits) experiment. Massively parallel cytometry experiments are cytometry experiments where a sample is aliquoted in n subsamples, each stained with a fixed panel of "Backbone" antibodies. Each aliquot is in addition stained with a unique "Infinity" exploratory antibody. The goal of the infinityFlow package is to use information from the ubiquitous Backbone staining to predict the expression of sparsely-measured Infinity antibodies across the entire dataset. To learn more about this type of experiments and details about the Infinity Flow approach, please consult Becht et al, 2020. In this vignette we achieve this by using the XGBoost machine-learning framework implemented in the xgboost R package. This vignette aims at explaining how to apply a basic infinityFlow analysis. Advanced usages, including different machine learning models and custom hyperparameters values, are covered in a dedicated vignette.
This vignette will cover:
You can install the package from Bioconductor using
if (!requireNamespace("BiocManager", quietly = TRUE)){ install.packages("BiocManager") } BiocManager::install("infinityFlow")
Now that the package is installed, we load the package and attach its example data. We also load flowCore that will be used to manipulate FCS files in R.
library(infinityFlow) data(steady_state_lung)
The example data is a subset of a massively parallel cytometry experiment of the mouse lung at steady state. The example data contains 10 FCS files. To mimick real world-conditions, we will write this set of FCS files to disk. In this vignette we will use a temporary directory, but you can use the directory of your choice.
dir <- file.path(tempdir(), "infinity_flow_example") print(dir) input_dir <- file.path(dir, "fcs") write.flowSet(steady_state_lung, outdir = input_dir) ## Omit this if you already have FCS files list.files(input_dir)
The second input we have to manually produce is the annotation of the experiment. In the context of a massively parallel cytometry experiment, we need to know what is the protein target of each Infinity (usually PE-conjugated or APC-conjugated) antibody, and what is its isotype. For the example dataset, the annotation is provided in the package and looks like this:
data(steady_state_lung_annotation) print(steady_state_lung_annotation)
The steady_state_lung_annotation
data.frame contains one line per FCS file, with rownames(steady_state_lung_annotation) == sampleNames(steady_state_lung)
. The first column specifies the proteins targeted by the Infinity antibody in each FCS file, and the second column specifies its isotype (species and constant region of the antibody). If you load an annotation file from disk, use this command:
steady_state_lung_annotation = read.csv("path/to/targets/and/isotypes/annotation/file", row.names = 1, stringsAsFactors = FALSE)
That is all we need in terms of inputs! To recap you only need
Now that we have our input data, we need to specify which antibodies are part of the Backbone, and which one is the Infinity antibody. We provide an interactive function to specify this directly in R. This function can be run once, its output saved for future use by downstream functions in the infinityFlow package. The select_backbone_and_exploratory_markers
function will parse an FCS file in the input directory, and for each acquisition channel, ask the user whether it should be used as a predictor (Backbone), exploratory target (Infinity antibodies), or omitted (e.g. Time or Event ID columns...).
backbone_specification <- select_backbone_and_exploratory_markers(list.files(input_dir, pattern = ".fcs", full.names = TRUE))
Below is an example of the interactive execution of the select_backbone_and_exploratory_markers
function for the example data. The resulting data.frame is printed too.
For each data channel, enter either: backbone, exploratory or discard (can be abbreviated) FSC-A (FSC-A):discard FSC-H (FSC-H):backbone FSC-W (FSC-W):b SSC-A (SSC-A):d SSC-H (SSC-H):b SSC-W (SSC-W):b CD69-CD301b (FJComp-APC-A):b Zombie (FJComp-APC-eFlour780-A):b MHCII (FJComp-Alexa Fluor 700-A):b CD4 (FJComp-BUV395-A):b CD44 (FJComp-BUV737-A):b CD8 (FJComp-BV421-A):b CD11c (FJComp-BV510-A):b CD11b (FJComp-BV605-A):b F480 (FJComp-BV650-A):b Ly6C (FJComp-BV711-A):b Lineage (FJComp-BV786-A):b CD45a488 (FJComp-GFP-A):b Legend (FJComp-PE(yg)-A):exploratory CD24 (FJComp-PE-Cy7(yg)-A):b CD103 (FJComp-PerCP-Cy5-5-A):b Time (Time):d name desc type $P1 FSC-A <NA> discard $P2 FSC-H <NA> backbone $P3 FSC-W <NA> backbone $P4 SSC-A <NA> discard $P5 SSC-H <NA> backbone $P6 SSC-W <NA> backbone $P7 FJComp-APC-A CD69-CD301b backbone $P8 FJComp-APC-eFlour780-A Zombie backbone $P9 FJComp-Alexa Fluor 700-A MHCII backbone $P10 FJComp-BUV395-A CD4 backbone $P11 FJComp-BUV737-A CD44 backbone $P12 FJComp-BV421-A CD8 backbone $P13 FJComp-BV510-A CD11c backbone $P14 FJComp-BV605-A CD11b backbone $P15 FJComp-BV650-A F480 backbone $P16 FJComp-BV711-A Ly6C backbone $P17 FJComp-BV786-A Lineage backbone $P18 FJComp-GFP-A CD45a488 backbone $P19 FJComp-PE(yg)-A Legend exploratory $P20 FJComp-PE-Cy7(yg)-A CD24 backbone $P21 FJComp-PerCP-Cy5-5-A CD103 backbone $P22 Time <NA> discard Is selection correct? (yes/no): yes
We cannot run this function interactively from this vignette, so we load the result from the package instead:
data(steady_state_lung_backbone_specification) print(head(steady_state_lung_backbone_specification))
You need to save this backbone specification file as a CSV file for future use.
write.csv(steady_state_lung_backbone_specification, file = file.path(dir, "backbone_selection_file.csv"), row.names = FALSE)
Now that we have our input data, FCS files annotation and specification of the Backbone and Infinity antibodies, we have everything we need to run the pipeline.
All the pipeline is packaged into a single function, infinity_flow()
.
Here is a description of the basic arguments it requires:
We have everything we need in our input folder to fill these arguments:
list.files(dir)
First, input FCS files:
path_to_fcs <- file.path(dir, "fcs") head(list.files(path_to_fcs, pattern = ".fcs"))
Output directory. It will be created if it doesn't already exist
path_to_output <- file.path(dir, "output") ```` Backbone selection file: ```r list.files(dir) backbone_selection_file <- file.path(dir, "backbone_selection_file.csv") head(read.csv(backbone_selection_file))
Annotation of Infinity antibody targets and isotypes:
targets <- steady_state_lung_annotation$Infinity_target names(targets) <- rownames(steady_state_lung_annotation) isotypes <- steady_state_lung_annotation$Infinity_isotype names(isotypes) <- rownames(steady_state_lung_annotation) head(targets) head(isotypes)
Other arguments are optional, but it is notably worth considering the number of input cells and the number of output cells. This will notably be important if you are using a computer with limited RAM. For the example data it does not matter as we only have access to 2,000 cells per well, but if you run the pipeline on your own data I suggest you start by low values, and ramp up (to 20,000 or 50,000 input cells, and e.g. 10,000 output cells per well) once everything is setup. Another optional argument is cores
which controls multicore computing, which can speed up execution at the cost of memory usage. In this vignette we use cores = 1, but you probably want to increase this to 4 or 8 or more if your computer can accomodate it.
input_events_downsampling <- 1000 prediction_events_downsampling <- 500 cores = 1L
There is also an argument to store temporary files, which can be useful to further analyze the data in R. If missing, this argument will default to a temporary directory.
path_to_intermediary_results <- file.path(dir, "tmp")
At last, now let us execute the pipeline:
imputed_data <- infinity_flow( path_to_fcs = path_to_fcs, path_to_output = path_to_output, path_to_intermediary_results = path_to_intermediary_results, backbone_selection_file = backbone_selection_file, annotation = targets, isotype = isotypes, input_events_downsampling = input_events_downsampling, prediction_events_downsampling = prediction_events_downsampling, verbose = TRUE, cores = cores )
The above command populated our output directory with new sets of files which we describe in the next section.
The output mainly consists of
Each of the above comes in two flavours, either raw or background-corrected.
At the end of the pipeline, input FCS files are augmented with imputed data. Feel free to explore these files in whatever flow cytometry software you are comfortable with! They should look pretty much like regular FCS files, although they are computationnally derived. You can find the output FCS files in the path_to_output
directory, specifically:
head(list.files(path_to_fcs)) ## Input files fcs_raw <- file.path(path_to_output, "FCS", "split") head(list.files(fcs_raw)) ## Raw output FCS files fcs_bgc <- file.path(path_to_output, "FCS_background_corrected", "split") ## Background-corrected output FCS files head(list.files(fcs_bgc)) ## Background-corrected output FCS files
Finally, the pipeline produces two PDF files, with a UMAP embedding of the backbone data color-coded by the imputed data. This is a very informative output and a good way to start analyzing your data. These files are present in the path_to_output
directory. The example dataset is very small but feel free to look at the result for illustration purposes. This PDF is available at
file.path(path_to_output, "umap_plot_annotated.pdf") ## Raw plot file.path(path_to_output, "umap_plot_annotated_backgroundcorrected.pdf") ## Background-corrected plot
knitr::include_graphics(file.path(path_to_output, "umap_plot_annotated.pdf"))
Thank you for following this vignette, I hope you made it through the end without too much headache and that it was informative. General questions about proper usage of the package are best asked on the Bioconductor support site to maximize visibility for future users. If you encounter bugs, feel free to raise an issue on infinityFlow's github.
sessionInfo()
files = list.files(dir, recursive = TRUE) sapply( files, function(x){ file.copy(from = file.path(dir, x), to = "~/Desktop/test/", recursive = TRUE) } )
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.