Bodenmiller_BCR_XL_SE: 'Bodenmiller_BCR_XL' dataset

Bodenmiller_BCR_XL R Documentation

'Bodenmiller_BCR_XL' dataset

Description

Mass cytometry (CyTOF) dataset from Bodenmiller et al. (2012), consisting of 8 pairs of samples (16 samples in total) of stimulated (BCR-XL) and unstimulated peripheral blood cells from healthy individuals. This dataset can be used to benchmark differential analysis algorithms used to test for differential states within cell populations.

Usage

Bodenmiller_BCR_XL_SE(metadata = FALSE)
Bodenmiller_BCR_XL_flowSet(metadata = FALSE)

Arguments

metadata

Logical value indicating whether to return only the ExperimentHub metadata (describing the overall dataset) instead of loading the whole dataset. Default = FALSE, which loads the whole dataset.

Details

This is a mass cytometry (CyTOF) dataset from Bodenmiller et al. (2012), consisting of paired samples of peripheral blood cells from healthy individuals, where one sample from each pair was stimulated with B cell receptor / Fc receptor cross-linker (BCR-XL), and the other sample is the reference. The dataset contains strong differential expression of several signaling markers in several cell populations; one of the strongest effects is differential expression of phosphorylated S6 (pS6) in the population of B cells.

This dataset can be used to benchmark differential analysis algorithms used to test for differential states within cell populations (e.g. differential expression of pS6 in B cells).

There are 8 pairs of samples (i.e. 16 samples in total), and a total of 172,791 cells. The dataset contains expression levels of 24 protein markers (10 surface lineage markers used to define cell populations or clusters, and 14 intracellular signaling functional markers). The surface markers are classified as 'cell type' markers, and the signaling markers as 'cell state' markers.

Cell population or cluster labels are available from Nowicka et al. (2017), where these were generated using a strategy of expert-guided manual merging of automatically generated clusters from the FlowSOM clustering algorithm (Van Gassen et al., 2015).

The dataset is provided in two Bioconductor object formats: SummarizedExperiment and flowSet. In each case, cells are stored in rows, and protein markers in columns (this is the usual format used for flow and mass cytometry data).

For the SummarizedExperiment, row and column metadata can be accessed with the rowData and colData accessor functions from the SummarizedExperiment package. The row data contains group IDs, patient IDs, sample IDs, and cell population IDs. The column data contains channel names, protein marker names, and a factor marker_class to identify the class of each protein marker ('cell type', 'cell state', as well as 'none' for any non-protein marker columns that are not needed for downstream analyses). The expression values are formatted as a single table covering all cells, which can be accessed with assay.
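As a rough illustration of the accessors described above, the following sketch builds a small toy SummarizedExperiment with the same layout (cells in rows, markers in columns) and then subsets it using the row and column metadata. The toy values, the column names sample_id, population_id and marker_name, and the marker_class level labels "type", "state" and "none" are assumptions made for this example; the labels used in the real object can be checked with levels(colData(d_SE)$marker_class).

```r
library(SummarizedExperiment)

set.seed(1)

# Toy expression table: 6 cells (rows) x 3 columns (2 markers + 1 non-marker)
toy_exprs <- matrix(
  rexp(18, rate = 0.1), nrow = 6,
  dimnames = list(NULL, c("CD3", "pS6", "Time"))
)

# Row metadata: one entry per cell (hypothetical column names)
row_data <- S4Vectors::DataFrame(
  sample_id = factor(rep(c("s1", "s2"), each = 3)),
  population_id = factor(c("B", "T", "T", "B", "B", "T"))
)

# Column metadata: one entry per column of the expression table
col_data <- S4Vectors::DataFrame(
  marker_name = colnames(toy_exprs),
  marker_class = factor(c("type", "state", "none"),
                        levels = c("none", "type", "state")),
  row.names = colnames(toy_exprs)
)

toy_SE <- SummarizedExperiment(
  assays = list(exprs = toy_exprs),
  rowData = row_data,
  colData = col_data
)

# Select 'cell state' markers for cells labelled as population "B"
is_state <- colData(toy_SE)$marker_class == "state"
is_B <- rowData(toy_SE)$population_id == "B"
state_B <- assay(toy_SE)[is_B, is_state, drop = FALSE]
dim(state_B)
```

The same rowData, colData and assay calls should apply to the object returned by Bodenmiller_BCR_XL_SE(), with the real metadata column names substituted for the hypothetical ones.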

For the flowSet, the expression values are stored in a separate table for each sample. Each sample is represented by one flowFrame object within the overall flowSet. The expression values can be accessed with the exprs function from the flowCore package. Row metadata is stored as additional columns of data within the flowFrame for each sample; note that factor values are converted to numeric values, since the expression tables must be numeric matrices. Channel names are stored in the column names of the expression tables. Additional row and column metadata is stored in the description slots, which can be accessed with the description accessor function for the individual flowFrames; this includes filenames, additional sample information, additional marker information, and cell population information.
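The conversion of factor values to numeric values mentioned above can be sketched in base R. This is a toy example, not the package's own code: the population labels and expression values are invented, and it simply assumes that the original factor levels are available (for example, from the cell population information stored in the description slots) so that the integer codes can be mapped back to labels.

```r
# Toy cell population labels for 4 cells (invented for illustration)
population_id <- factor(c("B-cells", "T-cells", "B-cells", "NK cells"))

# Toy expression table with a single marker column
toy_exprs <- matrix(c(1.2, 0.4, 3.1, 0.9), ncol = 1,
                    dimnames = list(NULL, "pS6"))

# A numeric matrix cannot hold factors, so the labels are stored as
# their integer codes in an additional column
toy_table <- cbind(toy_exprs, population_id = as.numeric(population_id))

# Recover the labels from the codes using the original factor levels
lev <- levels(population_id)
recovered <- factor(lev[toy_table[, "population_id"]], levels = lev)
recovered
```

Without the original levels, the numeric codes alone do not identify the populations, which is why the level information needs to be kept alongside the expression tables.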

Prior to performing any downstream analyses, the expression values should be transformed. A standard transformation used for mass cytometry data is the inverse hyperbolic sine (asinh) with cofactor = 5, i.e. asinh(x / 5).
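A minimal sketch of this transformation in base R, applied to a toy matrix of raw values (the numbers and marker names are invented for illustration). On the real data, the same expression would typically be applied only to the protein marker columns, not to columns with marker_class 'none'.

```r
cofactor <- 5

# Toy raw expression values: 2 cells (rows) x 2 markers (columns)
raw <- matrix(c(0, 5, 50, 500), ncol = 2,
              dimnames = list(NULL, c("CD3", "pS6")))

# asinh transformation with cofactor 5: asinh(x / 5)
transformed <- asinh(raw / cofactor)

# The transformation is invertible: sinh(y) * cofactor recovers the raw values
back <- sinh(transformed) * cofactor
```

The transformation is approximately linear near zero and approximately logarithmic for large values, and the cofactor sets the scale at which the transition occurs.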

File sizes: 24.6 MB (SummarizedExperiment), 24.8 MB (flowSet).

Original source: Bodenmiller et al. (2012): https://www.ncbi.nlm.nih.gov/pubmed/22902532

Original link to raw data (Cytobank, experiment 15713): https://community.cytobank.org/cytobank/experiments/15713/download_files

Additional information (Citrus wiki page): https://github.com/nolanlab/citrus/wiki/PBMC-Example-1

Cell population labels from: Nowicka et al. (2017), v2: https://f1000research.com/articles/6-748/v2

This dataset has previously been used to benchmark algorithms for differential analysis by ourselves and other authors, including Bruggner et al. (2014) (https://www.ncbi.nlm.nih.gov/pubmed/24979804/), Nowicka et al. (2017) (https://f1000research.com/articles/6-748/v2), and Weber et al. (2019) (https://www.ncbi.nlm.nih.gov/pubmed/31098416).

Data files are also available from FlowRepository (FR-FCM-ZYL8): http://flowrepository.org/id/FR-FCM-ZYL8

Value

Returns a SummarizedExperiment or flowSet object.

References

Bodenmiller et al. (2012). "Multiplexed mass cytometry profiling of cellular states perturbed by small-molecule regulators." Nature Biotechnology, 30(9), 858-867: https://www.ncbi.nlm.nih.gov/pubmed/22902532

Bruggner et al. (2014). "Automated identification of stratifying signatures in cellular subpopulations." PNAS, 111(26), E2770-E2777: https://www.ncbi.nlm.nih.gov/pubmed/24979804/

Nowicka et al. (2017). "CyTOF workflow: differential discovery in high-throughput high-dimensional cytometry datasets." F1000Research, v2: https://f1000research.com/articles/6-748/v2

Van Gassen et al. (2015). "FlowSOM: Using self-organizing maps for visualization and interpretation of cytometry data." Cytometry Part A, 87A, 636-645: https://www.ncbi.nlm.nih.gov/pubmed/25573116

Weber et al. (2019). "diffcyt: Differential discovery in high-dimensional cytometry via high-resolution clustering." Communications Biology, 2:183: https://www.ncbi.nlm.nih.gov/pubmed/31098416

Examples

library(HDCytoData)
# Load the dataset in each of the two available formats
d_SE <- Bodenmiller_BCR_XL_SE()
d_flowSet <- Bodenmiller_BCR_XL_flowSet()

lmweber/HDCytoData documentation built on March 19, 2024, 4:41 a.m.