knitr::opts_chunk$set(echo = TRUE, results = "markup", message = FALSE)
The purpose of this package is to provide the infrastructure to store, represent and exchange gated flow data. By this we mean accessing the samples, groups, transformations, compensation matrices, gates, and population statistics in the gating tree, which is represented as a GatingSet object in R.
There are several ways to generate a GatingSet:
- built from scratch within R (demonstrated later)
- imported from XML workspace files exported by other software (e.g. FlowJo, Diva, CytoBank); details on importing XML are documented in the CytoML package
- generated by the automated gating framework in the openCyto package
- loaded from an existing GatingSet archive (previously saved by save_gs())
Here we simply load an example GatingSet archive to illustrate how to interact with a GatingSet object.
library(flowWorkspace)
dataDir <- system.file("extdata", package = "flowWorkspaceData")
gs_archive <- list.files(dataDir, pattern = "gs_bcell_auto", full.names = TRUE)
gs <- load_gs(gs_archive)
gs
We have loaded a GatingSet. It contains length(gs) samples, each with length(gs_get_pop_paths(gs)) - 1 associated gates (every node except the root).
To list the samples stored in the GatingSet, call sampleNames(gs).
Subsets of a GatingSet can be accessed using the standard R subset syntax ([), for example gs[1].
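As a small sketch of the subset syntax, assuming the flowWorkspaceData package is installed: a GatingSet can be indexed by position or by sample name, and the result is again a GatingSet. The names gs_demo, first_by_index and first_by_name are illustrative only.

```r
library(flowWorkspace)

# Load the example archive shipped with flowWorkspaceData (assumed installed)
dataDir <- system.file("extdata", package = "flowWorkspaceData")
gs_demo <- load_gs(list.files(dataDir, pattern = "gs_bcell_auto", full.names = TRUE))

# Subset by position: the result is still a GatingSet, now holding one sample
first_by_index <- gs_demo[1]

# Subset by sample name, as returned by sampleNames()
first_by_name <- gs_demo[sampleNames(gs_demo)[1]]

length(first_by_index)
sampleNames(first_by_name)
```

Note that single brackets keep the GatingSet class, whereas double brackets extract a single GatingHierarchy.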
We can plot the gating tree:
plot(gs, bool = TRUE)
The boolean gates (nodes) are highlighted in blue.
We can list the nodes (populations) in the gating hierarchy:
gs_get_pop_paths(gs, path = 2)
Note that the path argument specifies the depth of the gating path for each population. A depth of 1 (i.e. the leaf or terminal node name) may not be sufficient to uniquely identify each population. The issue can be resolved by increasing the path depth or simply returning the full path of the node:
gs_get_pop_paths(gs, path = "full")
The full path may not be necessary, however, and can be too long to display. So we provide the path = "auto" option to determine the shortest path that is still unique within the gating tree.
nodelist <- gs_get_pop_paths(gs, path = "auto")
nodelist
We can get the gate associated with a specific population:
node <- nodelist[3]  # pick a single population; the index 3 is an arbitrary illustration
g <- gs_pop_get_gate(gs, node)
g
We can also retrieve the population statistics for every node.
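A minimal sketch of retrieving population statistics, assuming flowWorkspaceData is installed. It uses gs_pop_get_count_fast() and gs_pop_get_stats(); to my understanding, type = "percent" reports each population relative to its parent, so treat that reading as an assumption. The object names (gs_demo, counts, pct) are illustrative.

```r
library(flowWorkspace)

# Load the example archive shipped with flowWorkspaceData (assumed installed)
dataDir <- system.file("extdata", package = "flowWorkspaceData")
gs_demo <- load_gs(list.files(dataDir, pattern = "gs_bcell_auto", full.names = TRUE))

# Event counts for every population in every sample
counts <- gs_pop_get_count_fast(gs_demo)
head(counts)

# The same populations summarised as proportions instead of raw counts
pct <- gs_pop_get_stats(gs_demo, type = "percent")
head(pct)
```

Both calls return one row per sample and population, which makes the result easy to merge with pData(gs_demo) for downstream analysis.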
We can plot individual gates. Note the scale of the transformed axes. The second argument is the node path of any depth as long as it is uniquely identifiable.
library(ggcyto)
autoplot(gs, node)
More details about gate visualization can be found in the ggcyto package documentation.
If we have metadata associated with the experiment, it can be attached to the GatingSet:
d <- data.frame(sample = factor(c("sample 1", "sample 2")),
                treatment = factor(c("sample", "control")))
pd <- pData(gs)
pd <- cbind(pd, d)
pData(gs) <- pd
pData(gs)
We can subset the GatingSet by its pData directly:
subset(gs, treatment == "control")
The underlying flow data can be retrieved by:
cs <- gs_pop_get_data(gs)
class(cs)
nrow(cs[[1]])
Because GatingSet is a purely reference class, the class type returned by gs_pop_get_data is a cytoset, which is the purely reference class analog of a flowSet and will be discussed in more detail below. Also note that the data is already compensated and transformed during the parsing.
We can retrieve the subset of data associated with a population node:
cs <- gs_pop_get_data(gs, node)
nrow(cs[[1]])
We can retrieve a single gating hierarchy (corresponding to one sample) by using the [[ extraction operator:
gh <- gs[[1]]
gh
Note that the index can be either numeric or character (the guid returned by the sampleNames method).
The autoplot method, called on a GatingHierarchy without specifying any node, lays out all the gates in the same plot, e.g. autoplot(gh).
We can retrieve the indices specifying whether each event falls inside or outside a gate with gh_pop_get_indices(gh, node).
The indices returned are relative to the parent population (member of parent AND member of current gate), so they reflect the true hierarchical gating structure.
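The following sketch shows one way to inspect these membership indices, assuming flowWorkspaceData is installed. The population is picked by position purely for illustration, and gs_demo, gh_demo, node_demo and idx are hypothetical names.

```r
library(flowWorkspace)

# Load the example archive shipped with flowWorkspaceData (assumed installed)
dataDir <- system.file("extdata", package = "flowWorkspaceData")
gs_demo <- load_gs(list.files(dataDir, pattern = "gs_bcell_auto", full.names = TRUE))

# One sample's gating hierarchy
gh_demo <- gs_demo[[1]]

# Pick one population by its shortest unique path (index 3 is an arbitrary choice)
node_demo <- gs_get_pop_paths(gs_demo, path = "auto")[3]

# Logical vector: TRUE for events that belong to the population
idx <- gh_pop_get_indices(gh_demo, node_demo)
table(idx)
```

A logical vector like this can be used to tabulate membership or to compare two populations event by event.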
GatingSet provides methods to build a gating tree from raw FCS files and add or remove flowCore gates (or populations) to or from it.
We start from a flowSet that contains two ungated flow samples:
library(flowCore)
data(GvHD)
# select raw flow data
fs <- GvHD[1:2]
Then construct a GatingSet from the flowSet:
gs <- GatingSet(fs)
Then compensate it:
cfile <- system.file("extdata", "compdata", "compmatrix", package = "flowCore")
comp.mat <- read.table(cfile, header = TRUE, skip = 2, check.names = FALSE)
# create a compensation object
comp <- compensation(comp.mat)
# compensate the GatingSet
gs <- compensate(gs, comp)
You can also pass a list of compensation objects, with elements named by sampleNames(gs), to apply sample-specific compensation, e.g.
gs <- compensate(gs, comp.list)
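The comp.list object is not constructed in the text above. A minimal sketch of how such a list could be built is shown here; it reuses the same example matrix for every sample purely for illustration, whereas in practice each element would hold that sample's own matrix. The name gs_demo is hypothetical.

```r
library(flowCore)
library(flowWorkspace)

data(GvHD)
gs_demo <- GatingSet(GvHD[1:2])

# Read the example compensation matrix shipped with flowCore
cfile <- system.file("extdata", "compdata", "compmatrix", package = "flowCore")
comp.mat <- read.table(cfile, header = TRUE, skip = 2, check.names = FALSE)
comp <- compensation(comp.mat)

# One compensation object per sample, named by sample name
# (the same matrix is repeated here only to keep the sketch short)
comp.list <- rep(list(comp), length(gs_demo))
names(comp.list) <- sampleNames(gs_demo)

gs_demo <- compensate(gs_demo, comp.list)
```

Naming the list by sampleNames() is what lets each sample be matched to its own compensation object.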
Then we can transform it with any user-defined transformation created through the trans_new function of the scales package.
library(scales)
trans.func <- asinh
inv.func <- sinh
trans.obj <- trans_new("myAsinh", trans.func, inv.func)
The inverse transformation is required so that the gates and data can be visualized in the transformed scale while the axis labels remain in the raw scale. Optionally, the breaks and format functions can be supplied to further customize the appearance of axis labels.
Besides doing all of this by hand, we also provide some built-in transformations, such as asinhtGml2_trans and logicle_trans. These are all commonly used transformations in flow data analysis. A transformer object can be constructed in a single line of code, e.g.
trans.obj <- asinhtGml2_trans()
trans.obj
Once a transformer object is created, it must be converted to a transformerList for the GatingSet to use.
chnls <- colnames(fs)[3:6]
transList <- transformerList(chnls, trans.obj)
Alternatively, the overloaded estimateLogicle method can be used directly on a GatingHierarchy to generate a transformerList object automatically.
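A short sketch of this alternative, using the GvHD example data from flowCore. The channel selection mirrors the one used above, and the names fs_demo, gs_demo, chnls_demo and transList_demo are illustrative.

```r
library(flowCore)
library(flowWorkspace)

data(GvHD)
fs_demo <- GvHD[1:2]
gs_demo <- GatingSet(fs_demo)

# Channels to transform (same selection as in the text above)
chnls_demo <- colnames(fs_demo)[3:6]

# Estimate logicle parameters from the first sample's data
transList_demo <- estimateLogicle(gs_demo[[1]], chnls_demo)
class(transList_demo)
```

The result can be passed to transform() in the same way as a transformerList built by hand.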
Now we can transform our GatingSet with this transformerList object. The transformation is also stored in the GatingSet and can be used to inverse-transform the data.
gs <- transform(gs, transList)
gs_get_pop_paths(gs)
It now only contains the root node. We can add our first rectangleGate:
rg <- rectangleGate("FSC-H" = c(200, 400), "SSC-H" = c(250, 400), filterId = "rectangle")
nodeID <- gs_pop_add(gs, rg)
nodeID
gs_get_pop_paths(gs)
Note that the gate is added to the root node by default if the parent is not specified.
Then we add a quadGate to the new population generated by the rectangleGate, which is named after the filterId of the gate because no name was specified when gs_pop_add was called.
qg <- quadGate("FL1-H" = 0.2, "FL2-H" = 0.4)
nodeIDs <- gs_pop_add(gs, qg, parent = "rectangle")
nodeIDs
gs_get_pop_paths(gs)
A quadGate produces four populations, named after the dimensions of the gate if names are not specified.
A Boolean gate can also be defined and added to the GatingSet:
bg <- booleanFilter(`CD15 FITC-CD45 PE+|CD15 FITC+CD45 PE-`)
bg
nodeID2 <- gs_pop_add(gs, bg, parent = "rectangle")
nodeID2
gs_get_pop_paths(gs)
The gating hierarchy can be plotted with plot(gs, bool = TRUE). Note that Boolean gates are skipped by default and thus need to be enabled explicitly through the bool argument.
Now all the gates have been added to the gating tree, but the actual data has not been gated yet. This is done by calling the recompute method explicitly, i.e. recompute(gs).
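A compact sketch of this add-then-recompute workflow on a fresh GatingSet, using the GvHD example data from flowCore. The name gs_demo is illustrative, and the gate boundaries are the ones used above.

```r
library(flowCore)
library(flowWorkspace)

data(GvHD)
gs_demo <- GatingSet(GvHD[1:2])

# Add a rectangle gate under the root; at this point only the gate definition is stored
rg <- rectangleGate("FSC-H" = c(200, 400), "SSC-H" = c(250, 400), filterId = "rectangle")
gs_pop_add(gs_demo, rg)

# Apply the gates to the data so that the populations are actually computed
recompute(gs_demo)

# Population counts are available once gating has been performed
gs_pop_get_count_fast(gs_demo)
```

Keeping gate definition and gating separate means several gates can be added first and the data processed once at the end.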
After gating is finished, gating results can be visualized by the autoplot method:
autoplot(gs, "rectangle")  # plot one gate
Multiple gates can be plotted on the same panel:
autoplot(gs, gs_pop_get_children(gs[[1]], "rectangle")[1:4])
We may also want to plot all the gates at once by calling autoplot on a GatingHierarchy without specifying any node.
We can retrieve all the compensation matrices from the GatingHierarchy in case we wish to apply the same compensation or transformation to new data:
gh <- gs[[1]]
gh_get_compensations(gh)
Or we can retrieve transformations:
trans <- gh_get_transformations(gh)
names(trans)
trans[[1]]
If we want to remove one node, simply:
gs_pop_remove(gs, "rectangle")
gs_get_pop_paths(gs)
As we see, removing one node causes all its descendants to be removed as well.
Oftentimes, we need to save a GatingSet, including the gated flow data, gates, and populations, to disk and reload it later on. This can be done by:
tmp <- tempdir()
save_gs(gs, path = file.path(tmp, "my_gs"))
gs <- load_gs(file.path(tmp, "my_gs"))
We also provide the gs_clone method to make a full copy of an existing GatingSet:
gs1 <- gs_clone(gs)
To copy only the gates and populations without copying the underlying cytometry data:
gs2 <- gs_copy_tree_only(gs)
This is a lightweight copy that is faster than gs_clone. But be aware that the new GatingSet shares the same event data (i.e. gs_cyto_data(gs)) with the original one.
Note that the GatingSet is a purely reference class with an external pointer to an internal C data structure. Be sure to use these methods to save or copy an existing GatingSet object. Regular R assignment (<-) or the save routine does not work as expected for GatingSet objects.
The GatingSet class no longer uses flowFrame and flowSet objects to hold the underlying flow data, but rather the analogous cytoframe and cytoset classes. cytoframe and cytoset are essentially reference classes with pointers to internal C data structures and thus enable GatingSet operations to be performed more efficiently.
While working with GatingSet objects will often entail working with cytoframe and cytoset objects implicitly, it is also possible to work directly with objects of both of these classes.
cytoframe objects can be created from FCS files with the load_cytoframe_from_fcs() method. The optional num_threads argument allows for parallelization of the read operation.
files <- list.files(dataDir, "Cyto", full.names = TRUE)
cf <- load_cytoframe_from_fcs(files[1], num_threads = 4)  # a cytoframe holds a single file
cf
Instead of using read.FCSheader() to obtain only the header of the file, just use the text.only argument to load_cytoframe_from_fcs().
cfh <- load_cytoframe_from_fcs(files[1], text.only = TRUE)
cfh
The accessor methods function the same as they would for a flowFrame. Because cytoframe and cytoset are reference classes, copying objects of either class with the assignment operator (<-) simply provides a copy of the external pointer, so changes made to the copy will also affect the original object.
cf1 <- cf  # cf1 is another reference to the same data, not a copy
colnames(cf1)[1]
colnames(cf1)[1] <- "t"
colnames(cf)[1]  # the change affects the original cf object
Extracting a subset of a cytoframe is not computationally intensive, as it merely constructs a view of the data of the original cytoframe. However, both objects still share the same underlying pointer to all of the data, and thus changes to a view will affect the data of the original cytoframe.
cf1 <- cf[1:10, 2:3]
dim(cf1)
exprs(cf)[2, 3]
exprs(cf1)[2, 2] <- 0  # the data change affects the original cf
exprs(cf)[2, 3]
To construct a new view of an entire cytoframe, use the [] method rather than the <- operator. This ensures that a new view of the full underlying dataset is created.
cf1 <- cf[]
It is also possible to perform a deep copy of a cytoframe or a view of it, resulting in two objects pointing to distinct C-level representations of the data. This is accomplished with the realize_view() method.
cf <- load_cytoframe_from_fcs(files[1], num_threads = 4)  # starting fresh
cf1 <- realize_view(cf[1:10, 2:3])
dim(cf1)
exprs(cf)[2, 3]
exprs(cf1)[2, 2] <- 0  # the data change no longer affects the original cf
exprs(cf)[2, 3]
exprs(cf1)[2, 2]  # but does affect the separate data of cf1
Similarly, if a deep copy of all of the data is desired (not a subset), simply call realize_view on the original cytoframe.
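The difference between a view and a deep copy of a whole cytoframe can be sketched as follows. To stay independent of any FCS file on disk, this example builds the cytoframe from the in-memory GvHD data via flowFrame_to_cytoframe(); the names cf_demo, view_demo and copy_demo are illustrative, and -1 is just a sentinel value.

```r
library(flowCore)
library(flowWorkspace)

data(GvHD)
# Build a cytoframe from an in-memory flowFrame
cf_demo <- flowFrame_to_cytoframe(GvHD[[1]])
original_value <- exprs(cf_demo)[1, 1]

view_demo <- cf_demo[]              # new view onto the same underlying data
copy_demo <- realize_view(cf_demo)  # deep copy with its own data

# Writing through the view changes the data shared with cf_demo
exprs(view_demo)[1, 1] <- -1

exprs(cf_demo)[1, 1]    # reflects the change made through the view
exprs(copy_demo)[1, 1]  # unaffected, because the copy owns separate data
```

Use a deep copy whenever the data will be modified and the original must stay intact.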
Conversion of objects between the cytoframe and flowFrame classes is accomplished with a few coercion methods:
fr <- cytoframe_to_flowFrame(cf)
class(fr)
cf_back <- flowFrame_to_cytoframe(fr)
class(cf_back)
Of course (as a side note), here flowFrame_to_cytoframe() had no knowledge of the cytoframe origin of fr, so cf_back points to a new copy of the underlying data.
identical(cf@pointer, cf_back@pointer) # These point to distinct copies of the data
A couple of methods handle the task of writing or reading a cytoframe in the HDF5 format on disk:
tmpfile <- tempfile(fileext = ".h5")
cf_write_h5(cf, tmpfile)
loaded <- load_cytoframe(tmpfile)
Most of the above methods for cytoframe objects have cytoset analogs.
For reading in a cytoset from FCS files, use load_cytoset_from_fcs:
files <- list.files(dataDir, "Cyto", full.names = TRUE)
cs <- load_cytoset_from_fcs(files, num_threads = 4)
cs
Once constructed, it can be saved and loaded through a more efficient archive format.
tmp <- tempfile()
save_cytoset(cs, tmp)
cs <- load_cytoset(tmp, backend_readonly = FALSE)
Note that backend_readonly is set to TRUE by default to protect the data from accidental changes, so it has to be turned off explicitly if you want to modify the loaded cytoset.
The accessor methods function the same as they would for a flowSet.
Subsetting using [ works in a manner similar to that for a flowSet, but results in another cytoset that is a view into the data of the original cytoset. The Subset() method, when called on a cytoset, also returns a cytoset that is a view into the original data rather than a deep copy.
sub_cs <- cs[1]
Important: extraction using [[ on a cytoset will by default return a cytoframe, which is a reference to the underlying data. Thus, altering the result of the extraction will alter the underlying data of the original cytoset.
sub_fr <- cs[[1]]
exprs(cs[[1]])[2, 2]
exprs(sub_fr)[2, 2] <- 0  # this WILL affect the original data
exprs(cs[[1]])[2, 2]
To return a flowFrame that represents a copy of the data of the original cytoset, you need to use the returnType argument.
sub_cf <- cs[[1, returnType = "flowFrame"]]
exprs(cs[[1]])[2, 2]
exprs(sub_cf)[2, 2] <- 100  # this WILL NOT affect the original data
exprs(cs[[1]])[2, 2]
Alternatively, if it is easier to remember, get_cytoframe_from_cs will accomplish the same goal:
sub_cf <- get_cytoframe_from_cs(cs,1)
Finally, the [] and realize_view() methods work in a similar manner for cytoset objects as for cytoframe objects: [] will return a view into the original data, while realize_view() will perform a deep copy.
If this package is throwing errors when parsing your workspace, contact the package author by email or post an issue at https://github.com/RGLab/flowWorkspace/issues. If you can send your workspace by email, we can test, debug, and fix the package so that it works for you. Our goal is to provide a tool that works, and that people find useful.