README.md

premessa

premessa is an R package for pre-processing of flow and mass cytometry data, that includes panel editing/renaming for FCS files, bead-based normalization and debarcoding.

Copyright 2016. Parker Institute for Cancer Immunotherapy

---> Make sure to have a backup copy of your data before you use the software! <---

New in version 0.3.0: - Added UI for file concatenation under the normalizer GUI - Much faster debarcoding. Note that for the purpose of debarcoder plotting, data will now be downsampled to 100000 events. This means that absolute cell numbers in the plots will not reflect the absolute cell numbers in the final data (but the ratios and trends will be correct). The final debarcoded data will always include all events

Installation

Install required R packages

You need to install the devtools package, available from CRAN, and the flowCore package from Bioconductor. The rest of the dependencies for premessa will be automatically installed

devtools

Open an R session, type the following command and select a CRAN mirror when prompted.

install.packages("devtools")

flowCore

Open an R session and type the following commands

source("http://bioconductor.org/biocLite.R")
biocLite("flowCore")

Install premessa

Start an R session and type the following commands

library(devtools)
install_github("ParkerICI/premessa")

This will install the premessa R package together with all the required dependencies.

Note: the latest version of devtools seems to be occasionally having problems installing dependencies on Windows. If the installation of premessa fails due to a missing package, please install the offending packages manually, using the R install.packages function

Usage

The software allows you to perform four operations: - Panel editing and renaming - FCS file concatenation - Bead-based normalization - De-barcoding

For each operation there is a separate GUI, and an associated set of R functions that can be called wihtout using the GUI, if desired

Panel editing and renaming

premessa includes a component for editing and renaming the panel of a set of FCS files. This is useful when you need to harmonize panels across a number of files, so that they can be prepared for downstream analysis (most analysis tools expect files that are part of the same analysis to have identical panels).

A few warnings. premessa is opinionated in the way it handles the information in the FCS panels, and is specifically tailored to handle the most common use case. The FCS file specification is problematic in a lot of ways and most instrument and analysis software packages do not use or interpret the information correctly anyways. There are 3 ways to refer to a channel in an FCS file, by number (e.g. the order in which they appear in the file), by name (e.g. Dy161Di ) and by description (e.g. CD3). premessa uses the name as unique channel identifier for two reasons: - it is guaranteed to be unique in a valid FCS file - it minimizes the risk of confusion when matching channels between multiple FCS files, as it corresponds to the intuitive notion of matching channels based on their identity instead of their ordering.

The consequence of this choice is that the ordering of the channels is not preserved during the processing. Also at present premessa only preseves the name and description parameter keywords (i.e. $PnN and $PnS). All other parameter keywords (e.g. $PnG, $PnL, $PnO etc.) are discarded. Most of these keywords are used incorrectly anyways, but please feel free to open an issue if this is impacting your workflow.

Starting the GUI and selecting the working directory

You can start the panel editor GUI by typing the following commands in your R session

library(premessa)
paneleditor_GUI()

This will open a new web browser window, which is used for displaying the GUI. Upon starting, a file selection window will also appear from your R session. You should use this window to navigate to the directory containing the data you want to analyze, and select any file in that directory. The directory itself will then become the working directory for the software.

To stop the software simply hit the "ESC" key in your R session.

Usage

Once you have selected the working directory, the software will extract the panel information from all the FCS files contained in the directory. This information is then displayed in a table, where each row corresponds to a different parameter name ($PnN keyword), indicated by the row names (leftmost column), and each column corresponds to a different file, indicated in the column header. Each cell represents the description string ($PnS keyword) of a specific parameter in a given file. If a parameter is missing from a file, the word absent is displayed in the corresponding cell, which will be colored orange (note that this means that absent cannot be a valid parameter name)

Whatever is written in the table when the Process files button is pressed, represents what the parameters will be renamed to. In other words the table represents the current state of the files, and you have to edit the individual cells as necessary to reflect the desired final state of the files. You can use the same shortcuts you use in Excel to facilitate the editing process (e.g. shift-click to select multiple rows or columns, ctrl-C and ctrl-V for copy and paste respectively, etc.). However be careful that pressing ctrl-Z (the conventional undo shortcut) will undo all you changes

The table begins with three special columns: - Remove: if the box is checked the corresponding parameter is removed from all the files, and the row is grayed out - Parameter: this column represent the parameter name ($PnN keyword). Initially it will be identical to the first column of row names, but this column is editable. You can edit this column if you want to change the parameter names in the output files. - Most common: this column indicates what is the most common description value for that parameter, across all the files under analysis (i.e. the most common string across the row). Cells whose value differs from the value indicated in this column are displayed with a light pink background.

The table columns are sorted by the number of problematic columns, i.e. by the number of pink and orange cells in the column. The first three columns of the table are fixed and always visible when you scroll the table horizontally. Please note that the browser included with the current vesion of RStudio seems to have a problem where the column headers do not scroll correctly. If that is the case, open the application in a regular web browser, by click on the "open browser" button in the top right corner of the RStudio browser.

Two controls are located at the top of the table - Output folder name: a text box where you can input the name of the output folder. If this folder does not exist, it will be created as a sub-folder of the current working directory. - Process files: this button will start file processing. A file will be created in the output folder, with the same name as the original input file. If a file of the same name exists already, it will be overwritten. No change is made to the original files.

FCS file concatenation

The premessa package contains a simple function for concatenating multiple FCS files together. This is useful in case the acquisition of a single samples has been split across multiple files. The function is called concatenate_fcs_files and its documentation can be acessed directly from R. Note that this function assumes that the files are all identical panel-wise, i.e. they have the same parameter names and descriptions. Please use the panel editor before concatenation if that is not the case.

Bead-based normalization

The idea behind the method is described in this publication. This software represents an R re-implementation of the original normalization software developed for Matlab.

Sample data for testing is available here (only download the FCS in the top level directory, not the contents of the beads and normed sub-folders).

Inputs and outputs

The normalization workflow involves the following steps:

  1. Beads identification through gating
  2. Data normalization
  3. Beads removal (optional)

Assuming the working directory is called working_directory and contains two FCS files called A.fcs and B.fcs, at the end of the workflow the following directory structure and output files will be generated

working_directory
|--- A.fcs
|--- B.fcs
|--- normed
     |--- A_normalized.fcs
     |--- B_normalized.fcs
     |--- beads_before_and_after.pdf
     |--- beads_gates.json
     |--- beads_vs_time
          |--- A.pdf
          |--- B.pdf
     |--- beads_removed
          |--- A_normalized_beadsremoved.fcs
          |--- B_normalized_beadsremoved.fcs
          |--- removed_events
               |--- A_normalized_removedEvents.fcs
               |--- B_normalized_removedEvents.fcs
     |--- beads
          |--- A_beads.fcs
          |--- B_beads.fcs

Starting the GUI and selecting the working directory

You can start the normalizer GUI by typing the following commands in your R session

library(premessa)
normalizer_GUI()

This will open a new web browser window, which is used for displaying the GUI. Upon starting, a file selection window will also appear from your R session. You should use this window to navigate to the directory containing the data you want to analyze, and select any file in that directory. The directory itself will then become the working directory for the software.

To stop the software simply hit the "ESC" key in your R session.

The GUI is organized in two tabs: - Normalize data: used for beads gating and data normalization - Remove beads: used for beads removal

Normalize data panel

This panel contains the following controls:

The workflow involves cycling through all the files and adjusting the beads gates in the plot, in order to identify the beads. Only events that are included in all the beads gates are identified as beads. As detailed in the dialog box that is above the row of buttons, only files for which the gates have been defined will be used as input for normalization.

You can cycle back and forth between different files, as the GUI will remember the gates you have selected for each file.

Remove beads panel

This panel has the following controls

The bead removal procedure is based on the idea of looking at the distance between each event and the centroid of the beads population, and removing all the events that are closer than a given threshold to the beads population, and therefore are likely to represent beads as opposed to true cells.

To this end, during normalization, the software calculates the square root of the Mahalanobis distance of each event from the centroid of the beads population, and records this information in the beadDist parameter in the FCS file with the normalized data (i.e. the _normalized.fcs files in the normed sub-folder).

During the beads removal step, all the events whose beadDist is less or equal than the Cutoff for bead removal parameter are removed from the FCS. The removed events are saved in the removed_events sub-folder (see above).

The plots in the bottom half of the panel help you select an appropriate cutoff. They display all the pairs of beads channels. Beads should appear as a population in the upper right corner (as they will be double-positives for all the channel pairs). The color of the points represent the distance from the beads population. You should choose a cutoff so that most of the bead events are below the cutoff, and most of the non-beads events are above it. The legend for the color scale is located above the plots.

Differences with the Matlab Normalizer

The normalization algorithm is exactly identical to the one used in the original Matlab implementation of the normalizer. The only differences relate to the way the GUI manages the workflow, and to the organization of the output directory structure.

With the Matlab implementation, when you do gating and beads removal, you process one file at the time, and there is no way to look back at previously analyzed files, for instance for adjusting the gates.

premessa separates the two steps of the workflow. First you setup the gates, moving back and forth between the input files as needed. This is useful if, for instance, you want to use the exact same gates for all the files, because you need to visualize all the data, before you can identify gates that will work across all the files. Once the gates have been setup, all the files are normalized and the results are saved.

You then switch to the beads removal step, once again visualizing the files back and forth as needed, until you have selected an appropriate cutoff. You can then either remove beads from a single file, or from all the files simultaneously. Because the intermediate normalization results have been saved, you can repeat the beads removal step if needed.

De-barcoding

The idea behind the method is described in this publication. This software represents an R re-implementation of the original debarcoding software developed for Matlab.

Sample data for testing is available here (you only need the .csv and .fcs files)

Operation

You can start the debarcoder GUI by typing the following commands in your R session

library(premessa)
debarcoder_GUI()

Upon launching the GUI you will have access to the following controls:

Assuming the FCS file barcoded_data.fcs is located in the directory dir, and the barcode key defines 3 barcoded populations (A, B, C), the following directories and output files will be created at the end of the debarcoding process:

dir
|--- barcoded_data.fcs
|--- debarcoded
     |--- barcoded_data_A.fcs
     |--- barcoded_data_B.fcs
     |--- barcoded_data_C.fcs
     |--- barcoded_data_Unassigned.fcs

Plot types

There are four types of visualization that allow you to inspect the results of the debarcoding process, and choose the optimal seperation and Mahalanobis distance thresholds. Each plot window is subdivided into a top and bottom section, as described below:

De-barcoding without the GUI

All the R functions necessary to perform debarcoding can be accessed directly, and documentation is available in the standard R documentation format, accessible through the R session. The main wrapper function that executes the entire procedure is called debarcode_fcs. Inspecting the code of this function should give you an idea of what are the steps involved, and which functions perform each one.



ParkerICI/premessa documentation built on Sept. 16, 2022, 3:06 p.m.