HDCytoData | R Documentation |
Data package containing a collection of high-dimensional cytometry datasets saved in SummarizedExperiment and flowSet Bioconductor object formats, hosted on Bioconductor ExperimentHub.
Overview
This package contains a set of publicly available high-dimensional flow cytometry and mass cytometry (CyTOF) datasets, which have been formatted into SummarizedExperiment and flowSet Bioconductor object formats.
The objects contain the cell-level expression values, as well as row and column metadata. The row metadata includes sample IDs, group IDs, and true cell population labels or cluster labels (where available). The column metadata includes channel names, protein marker names, and protein marker classes (cell type, cell state, and columns that are not protein markers).
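As an illustration of how the marker classes in the column metadata might be used, the following sketch selects the cell type marker columns from an expression matrix. It uses base R only and a small mock table: the column names (channel_name, marker_name, marker_class), the class labels ("none", "type", "state"), and all values are assumptions made for this example, so check the actual objects before relying on them.

```r
# Mock column metadata: one row per column of the expression matrix.
# Column names and class labels are assumptions for illustration only.
marker_info <- data.frame(
  channel_name = c("ch1", "ch2", "ch3", "ch4"),
  marker_name  = c("Time", "DNA1", "CD19", "pS6"),
  marker_class = c("none", "none", "type", "state"),
  stringsAsFactors = FALSE
)

# Mock expression matrix: 3 cells (rows) x 4 columns, values are made up.
expr <- matrix(seq_len(12), nrow = 3,
               dimnames = list(NULL, marker_info$marker_name))

# Keep only the columns whose marker class is "type".
type_cols <- marker_info$marker_name[marker_info$marker_class == "type"]
expr_type <- expr[, type_cols, drop = FALSE]
```

Using drop = FALSE keeps the result as a matrix even when only one column is selected, which is usually what downstream clustering code expects.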
These datasets have been used in our previous work and publications for benchmarking purposes, e.g. to benchmark clustering algorithms or methods for differential analysis. They are provided here in the SummarizedExperiment and flowSet formats to make them easier to access.
The package contains the following datasets, which can be grouped into datasets useful for benchmarking either (i) clustering algorithms or (ii) methods for differential analysis.
Clustering:
Levine_32dim
Levine_13dim
Samusik_01
Samusik_all
Nilsson_rare
Mosmann_rare
Differential analysis:
Krieg_Anti_PD_1
Bodenmiller_BCR_XL
Programmatic access to list of datasets
An updated list of all available datasets can also be obtained programmatically using the ExperimentHub accessor functions, as follows. This retrieves a table of metadata from the ExperimentHub database, which includes information such as the ExperimentHub ID, title, and description for each dataset.
library(HDCytoData) # load package (assumed to attach ExperimentHub as a dependency)
ehub <- ExperimentHub() # create ExperimentHub instance
ehub <- query(ehub, "HDCytoData") # find HDCytoData datasets
md <- as.data.frame(mcols(ehub)) # retrieve metadata table
Additional details
For additional details on each dataset, including references and raw data sources, see the help files for each dataset.
For a short tutorial showing how to load the data objects, see the "HDCytoData package" vignette.
Note that flow and mass cytometry datasets should be transformed prior to performing any downstream analyses, such as clustering. A standard choice is the asinh transform with the cofactor parameter set to 5 (for mass cytometry data) or 150 (for flow cytometry data).
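A minimal sketch of the asinh transform, assuming the common convention of dividing the raw values by the cofactor before applying asinh. The matrix below is mock data with made-up values and marker names; with a real object the transform would be applied to the marker columns of the expression matrix only.

```r
# Mock expression matrix: 4 cells (rows) x 2 markers, values are made up.
expr <- matrix(c(0, 5, 50, 500,
                 0, 10, 100, 1000),
               nrow = 4,
               dimnames = list(NULL, c("CD3", "CD4")))

# Cofactor of 5 for mass cytometry data; use 150 for flow cytometry data.
cofactor <- 5

# Assumed form of the transform: asinh(x / cofactor), applied element-wise.
expr_t <- asinh(expr / cofactor)

# The transform is invertible, so raw values can be recovered if needed.
expr_back <- sinh(expr_t) * cofactor
```

The cofactor controls the width of the approximately linear region around zero, which is why a larger value is used for flow cytometry data.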
The steps to prepare each data object from the raw data files are included in the make-data scripts in the directory inst/scripts.
File sizes are listed in the help files for the datasets. The removeCache function from the ExperimentHub package can be used to clear the local download cache.