Parse a flowJo Workspace

Description

Parses a flowJo workspace and generates a GatingHierarchy or GatingSet object together with the associated flowCore gates. When execute is set to FALSE, the data are not loaded or acted upon until an explicit call to recompute() is made on the GatingHierarchy objects in the GatingSet.

Usage

## S4 method for signature 'flowJoWorkspace'
parseWorkspace(obj, ...)

Arguments

obj

A flowJoWorkspace to be parsed.

...
  • name numeric or character. The name or index of the group of samples to be imported. If NULL, the groups are printed to the screen and one can be selected interactively. Usually, multiple groups are defined in the flowJo workspace file.

  • execute TRUE|FALSE. A logical specifying whether the gates, transformations, and compensation should be calculated immediately after the flowJo workspace has been imported. TRUE by default.

  • isNcdf TRUE|FALSE. A logical specifying whether to use netcdf to store the data or to keep all the flowFrames in memory. For a small data set you can safely set this to FALSE, but for larger data we suggest using netcdf. You will need the netcdf C library installed.

  • subset One of: a numeric vector specifying the subset of samples in a group to import; a character vector specifying the FCS filenames to be imported; or an expression to be passed to the 'subset' function to filter samples by 'pData'. (Note that the columns referred to by the expression must also be explicitly specified in the 'keywords' argument.)

  • requiregates logical. Should samples that have no gates be included?

  • includeGates logical. Should gates be imported, or just the data with compensation and transformation?

  • path either a character scalar or a data.frame. When it is a character, it is a path to the FCS files that are to be imported. The code will search recursively, so you can point it to a location above the files. When it is a data.frame, it is expected to contain two columns, 'sampleID' and 'file', which are used as the mapping between 'sampleID' and the (absolute) FCS file path. When such a mapping is provided, the file system search is skipped.

  • sampNloc a character scalar indicating where to get the sample name (or FCS filename) within the xml workspace. It is either "keyword" or "sampleNode".

  • compensation=NULL: a compensation or a list of compensations, allowing a customized compensation matrix to be used instead of the one specified in the flowJo workspace.

  • options=0: an integer option passed to xmlTreeParse.

  • channel.ignore.case a logical flag indicating whether matching of channel names (colnames) should ignore case (e.g. for compensation, gating).

  • extend_val numeric. The threshold that determines whether the gates need to be extended. Default is 0. Extension is triggered when gate coordinates are below this value.

  • extend_to numeric. The value that gate coordinates are extended to. Default is -4000. Usually this value is detected automatically from the real data range. But when the gates need to be extended without loading the raw data (i.e. execute is set to FALSE), this hard-coded value is used.

  • leaf.bool a logical specifying whether to compute the leaf boolean gates. Default is TRUE. Turning it off speeds up parsing when the statistics of these leaf boolean gates are not important for the analysis (e.g. the COMPASS package calculates them by itself). If needed, they can be calculated at a later stage by calling the recompute method.

  • additional.keys character vector. The keywords (parsed from the FCS header) to be combined (concatenated with "_") with the FCS filename to uniquely identify samples. Default is '$TOT' (total number of cells); more keywords can be added to make up this GUID.

  • keywords character vector specifying the keywords to be extracted as pData of the GatingSet.

  • keywords.source character. The place the keywords are extracted from; can be either "XML" or "FCS".

  • keyword.ignore.case a logical flag indicating whether keyword matching should ignore case.

  • ...: Additional arguments to be passed to read.ncdfFlowSet or read.flowSet.
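
The way additional.keys extends the sample identifier can be pictured with a small base R sketch. This is only an illustration of the concatenation described above, not the package's internal code: the helper make_guid, the keyword table, and all keyword values are hypothetical.

```r
# Illustrative sketch only; make_guid() is a hypothetical helper,
# not a flowWorkspace function.
# Two samples that share one FCS file name but differ in $TOT (made-up values).
fcs_files <- c("CytoTrol_CytoTrol_1.fcs", "CytoTrol_CytoTrol_1.fcs")
keyword_values <- data.frame(
  `$TOT` = c("119531", "115728"),
  `PATIENT ID` = c("P01", "P01"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Concatenate the file name with the selected keyword values using "_".
make_guid <- function(files, keys, additional.keys = "$TOT") {
  parts <- c(list(files),
             unname(as.list(keys[, additional.keys, drop = FALSE])))
  do.call(paste, c(parts, sep = "_"))
}

guid_default  <- make_guid(fcs_files, keyword_values)
guid_extended <- make_guid(fcs_files, keyword_values,
                           additional.keys = c("$TOT", "PATIENT ID"))
```

In this made-up case the file names alone are duplicated, while the identifiers built with '$TOT' are distinct. Adding further keywords helps only when file name and cell count together are still ambiguous.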

Details

A flowJoWorkspace is generated with a call to openWorkspace(), passing the name of the xml workspace file. The returned flowJoWorkspace can then be parsed using the parseWorkspace() method. The function can be called non-interactively by passing the index or name of the group of samples to be imported via parseWorkspace(obj, name = x), where x is either the numeric index or the name. The subset argument allows one to select a set of files from the chosen sample group. The routine takes the intersection of the files in the sample group, the files specified in subset, and the files available on disk, and imports them.
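
The file selection described above (recursive search on disk, then a three-way intersection) can be sketched in base R. This is a minimal sketch under stated assumptions, not the package's implementation: the directory layout, the file names, and the variables group_files and subset_files are all hypothetical.

```r
# Illustrative sketch only; not flowWorkspace internals.
# Create a throwaway directory with two (empty) FCS files in a subfolder.
data_dir <- file.path(tempdir(), "fcs_demo")
dir.create(file.path(data_dir, "plate1"), recursive = TRUE, showWarnings = FALSE)
invisible(file.create(file.path(data_dir, "plate1", c("a.fcs", "b.fcs"))))

group_files  <- c("a.fcs", "b.fcs", "c.fcs")  # files listed in the sample group (hypothetical)
subset_files <- c("a.fcs", "c.fcs")           # files requested via subset (hypothetical)

# Search recursively, so data_dir may sit above the folder holding the files.
disk_files <- basename(list.files(data_dir, pattern = "\\.fcs$", recursive = TRUE))

# Only files present in all three sets would be imported.
to_import <- Reduce(intersect, list(group_files, subset_files, disk_files))
```

Here only a.fcs survives: b.fcs is not in the subset, and c.fcs is not on disk.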

Value

A GatingSet, which is a wrapper around a list of GatingHierarchy objects, each representing a single sample in the workspace. The GatingHierarchy objects contain graphNEL trees that represent the gating hierarchy of each sample. Each node in the GatingHierarchy has associated data, including the population counts from flowJo, the parent population counts, and the flowCore gates generated from the flowJo workspace gate definitions. Data are not yet loaded or acted upon at this stage. To execute the gating of each data file, a call to execute() must be made on each GatingHierarchy object in the GatingSet. This is done automatically by default (execute = TRUE), and there is no longer any reason to set this argument to FALSE.

See Also

getSampleGroups, GatingSet

Examples

## Not run: 
# f is the file name of a flowJo xml workspace
ws <- openWorkspace(f)

# parse the second group; assumes the FCS files are in the same folder as the workspace
gs <- parseWorkspace(ws, name = 2)

gs <- parseWorkspace(ws, name = 4,
                     path = dataDir,                      # specify the FCS path
                     subset = "CytoTrol_CytoTrol_1.fcs",  # subset the parsing by FCS filename
                     isNcdf = FALSE)                      # turn off cdf storage mode (normally not advisable when parsing a large dataset)

gs <- parseWorkspace(ws, path = dataDir, name = 4,
                     keywords = c("PATIENT ID", "SAMPLE ID", "$TOT", "EXPERIMENT NAME"),  # extract these keywords as pData
                     keywords.source = "XML",             # take keywords from the xml workspace (alternatively "FCS")
                     additional.keys = c("PATIENT ID"),   # combine with the FCS filename to uniquely identify samples
                     execute = FALSE)                     # parse without the actual gating (saves time if only the xml info is needed)

# subset by pData (extracted from keywords)
gs <- parseWorkspace(ws, path = dataDir, name = 4,
                     subset = `TUBE NAME` %in% c("CytoTrol_1", "CytoTrol_2"),
                     keywords = "TUBE NAME")

# override the default compensation defined in the xml with customized compensations
# comps is either a compensation object or a list of compensation objects
gs <- parseWorkspace(ws, name = 2, compensation = comps)

## End(Not run)
