knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
The goal of cytotypr is to identify flow cytometry cell populations efficiently using either Fluorescent Minus One controls (FMOs) or distinct population differences.
You can install the development version of cytotypr from GitHub with:
# install.packages("devtools") devtools::install_github("aef1004/cytotypr")
This is a basic example which shows you how to obtain basic results and plots for flow cytometry data using FMOs:
library(cytotypr) ## basic example code
There are a few additional packages that will need to be downloaded from CRAN:
if (!require("data.table")) install.packages("data.table") if (!require("dplyr")) install.packages("dplyr") if (!require("stringr")) install.packages("stringr") if (!require("tidyr")) install.packages("tidyr") if (!require("tibble")) install.packages("tibble") if (!require("readxl")) install.packages("readxl") if (!require("ggplot2")) install.packages("ggplot2") if (!require("broom")) install.packages("broom") if (!require("purrr")) install.packages("purrr") if (!require("pheatmap")) install.packages("pheatmap")
Other packages will need to be downloaded from bioconductor:
if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager") if (!require("openCyto")) BiocManager::install("openCyto") if (!require("ncdfFlow")) BiocManager::install("ncdfFlow") if (!require("flowWorkspace")) BiocManager::install("flowWorkspace") if (!require("ggcyto")) BiocManager::install("ggcyto")
Load additional packages needed to run this example
library(openCyto) library(ncdfFlow) library(flowWorkspace) library(ggcyto) library(data.table) library(dplyr) library(stringr) library(tidyr) library(tibble) library(ggplot2) library(readxl) library(broom) library(purrr) library(pheatmap)
The data that is being used in this example is from the paper "Cyto-feature engineering...." I have taken a few rows for analysis of one of the FMOs and one of the samples. The samples are from the lungs of C57BL/6 mice that were either sham-vaccinated or vaccinated with an Mycobacterium tuberculosis vaccine, Bacillus Calmette–Guérin (BCG).
The first thing that you need to do is develop an intial gating strategy. This includes typical data cleaning normally performed on every flow cytometry sample. A typical gating strategy will first gate on "singlets" or single cells, then "lymphocytes" to remove debris and then only the live cells. Information on developing a .csv with different gating strategies can be found here: http://opencyto.org/articles/HowToWriteCSVTemplate.html
Briefly, the "alias" column is what you will call each of the gating populations, for example, when gating on singlets, I'll probably want to give the alias of "singlets". The "pop" column takes in either a "+" or "-". When you want to take the positive cells, or the cells on the right side (or within) a the gate, you add a "+" here. The "parent" column lists the name of the cells that you want to gate. For example, the first parent will be "root" because because we're gating on all of the available cells. If we gate our root cells to look at our siglets, then our next gate will use "singlets" as the parent. The "dims" is the name of the flow cytometry marker name listed in the data that you want to gate on. For example, when gating on singlets, we look at SSC-A and SSC-H, so our dims will be "SSC-A,SSC-H" when gating on CD4 cells, we will write "CD4." Note that the "dims" name must match exactly do the colunn names in your data. Finally include the name of the gating method that you want to use. Different gating options can be found here: http://opencyto.org/articles/HowToAutoGating.html.
Read in the initial gating strategy, which gates to live lymphocytes, into a gatingTemplate
class.
# Identify the file with the gating strategy ws <- list.files("../inst/extdata/", pattern = "gating_strategy.csv", full = TRUE) ws
# View this template dtTemplate <- fread(ws) dtTemplate
# Read in the gating strategy to a 'gatingTemplate' object initial_gate <- gatingTemplate(ws) plot(initial_gate)
# Identify the file names of all 20 FCS flow cytometry experiments to # read in. FMO_fcsFiles <- list.files("../inst/extdata/simple_example/", pattern = "FMO", full = TRUE) FMO_fcsFiles
# Read these files into a 'ncdfFlowSet' object. This will taken a minute # or so to run. The resulting 'ncdfFlowset' object contains row names with the # individual samples and column names with the markers/parameters used in the flow cytometer. ncfs_FMO <- read.ncdfFlowSet(FMO_fcsFiles, channel_alias = data.frame(alias = c("Zombie Nir-A"), channels = c("Zombie_NIR-A, Zombie NIR-A, Zombie Nir-A"))) ncfs_FMO
Apply the initial gating to filter the data to only measurements on live lymphocyte cells.
# Convert to a 'GatingSet' object, so we can apply the initial gating to this data. gs_FMO <- GatingSet(ncfs_FMO) gs_FMO
# Apply the initial gating to this data, to filter to only measurements # on live lymphocyte cells. This may take a minute. gt_gating(initial_gate, gs_FMO) # You can plot the results of this gating with `autoplot`. For example, to plot # the gating for the first sample, run: ggcyto::autoplot(gs_FMO[[1]])
Now that the initial gating has been applied, to limit the data to measurements oflive, singlet lymphocyte cells, we convert the data to a "tidy data" format, to allow us to work with "tidyverse" tools for further analysis and visualization.
# Pull out the data from the 'live' node of the gating set (the last node # in the initial gating strategy). flowset_FMO_gated_data <- gs_pop_get_data(gs_FMO, "live") %>% cytoset_to_flowSet()
Apply the tidy_flow_set
function to the 'flowSet' of gated FMO data to output a dataframe:
FMO_gated_data <- tidy_flow_set(flowset_FMO_gated_data)
Note that when you plot the FMOs here, you should see all of the FMOs that you want to use. If you don't see all of them, ensure that all ofyour filenames and column names for each of the markers matches exactly. For example, if the filename says "CD103_f" but the corresponding column name for that marker is "CD103", you need to either change the filename or column name so that they are exactly the same.
# note that here the filename and the column marker names need to match exactly df_FMO_gated_data <- FMO_gated_data %>% select(ends_with("-A"), -`FSC-A`, `SSC-A`, filename) %>% dplyr::rename(`CD3` = "Alexa Fluor 532-A", `CD4` = "BV480-A", `CD8` = "BV570-A") %>% na.omit()%>% dplyr::filter(`SSC-A` != max(`SSC-A`)) %>% mutate(filename = str_replace(filename, "_FMO.fcs", "")) FMO_filtered_data <- filter_FMO(df_FMO_gated_data) add_quantile <- get_99(FMO_filtered_data) plot_FMOs(FMO_filtered_data, add_quantile)
I checked that all D30, D60, and D90 samples have Zombie and will run all the way through the gating when separated by day. Looking at the data, some of them are labeled Zombie NIR-A and Zombie Nir-A which means that they are not recognized as the same. - D90 has uncapitalized. I will need to read them in separately and then rename one of the two so that the cases match
fcsFiles <- list.files("../inst/extdata/simple_example", pattern = "sample", full = TRUE) # ncdfFlowset object contains row names with the individual samples and column names with the markers/parameters used in the flow cytometer ncfs <- read.ncdfFlowSet(fcsFiles, channel_alias = data.frame(alias = c("Zombie Nir-A"), channels = c("Zombie_NIR-A, Zombie NIR-A, Zombie Nir-A"))) # apply gating set gs <- GatingSet(ncfs) # gate the samples gt_gating(initial_gate, gs) # View the gates autoplot(gs[[1]])
Pull out the gated data
Want to change gated_flowset to flowset_gated_data
# Pull out the gated data - could potentially add to the function below flowset_gated_data <- gs_pop_get_data(gs, "live") %>% cytoset_to_flowSet() # tidy the flowset and convert to a dataframe df_all_gated_simple <- tidy_flow_set(flowset_gated_data) unique(df_all_gated_simple$filename)
Feature cut the data to get all of the possible populations for each file with the cell count and number of cells. Feature engineering is based on the 99%.
The "count_calc" function at the end calculates the cell counts and percentage of cells in each sample for each population. The dataframe that is input into this function should only contain the markers that you're interesting in looking at, and should remove SSC-A, FSC-A, etc.
all_fe <- df_all_gated_simple %>% select(ends_with("-A"), -`FSC-A`, filename) %>% dplyr::rename(`CD3` = "Alexa Fluor 532-A", `CD4` = "BV480-A", `CD8` = "BV570-A") %>% dplyr::filter(`SSC-A` != max(`SSC-A`)) %>% mutate(CD3 = fe(add_quantile, CD3, "CD3"), CD4 = fe(add_quantile, CD4, "CD4"), CD8 = fe(add_quantile, CD8, "CD8")) %>% select(-`Zombie Nir-A`, -`SSC-A`) %>% count_calc()
Visulatizations
Initial identification of populations plot
We first want to view all of the different cell phenotypes within the data
# this is the order of markers that we want for all of our plots order_of_markers <- c("CD3", "CD4", "CD8") # to view all of the possible combinations total_phenotypes <- filter_for_total_pheno(all_fe, marker_order = order_of_markers) heatmap_all_pheno(total_phenotypes) # gives the total number of populations nrow(total_phenotypes)
After identifying all phenotypes, we can filter the data to see the ones that we're interested in, for example, CD3+ cells that constitute >0.5% of total live leukocytes in a sample.
# view the specific cell phenotypes we're interested in sample_populations <- all_fe %>% dplyr::filter(CD3 == 1 & percentage > 1) %>% filter_pops() sample_populations_all_groups <- identified_pop_perc(sample_populations, all_fe, marker_vector = order_of_markers)
Plot sample populations
simple_pop_df <- sample_populations %>% column_to_rownames("population") simple_pop_df %>% dplyr::select(all_of(order_of_markers)) %>% mutate_all(~convert_factor_numeric(.)) %>% pheatmap::pheatmap(cluster_rows = FALSE, cluster_cols = FALSE, labels_row = rownames(simple_pop_df), cellwidth = 15, cellheight = 15, angle_col = 45, color = c("#3F4788FF", "#56C667FF"), cutree_rows = 2, legend = FALSE)
Plot percentages
sample_populations_all_groups %>% ggplot(aes(x = filename, y = percentage, fill = population)) + geom_bar(stat = "identity") + facet_wrap(~population) + ylab("Average Percent of Total Live Leukocytes")
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.