Weber_AML_sim | R Documentation |
Semi-simulated mass cytometry (CyTOF) datasets from Weber et al. (2019), constructed by computationally 'spiking in' small percentages of AML (acute myeloid leukemia) blast cells into samples of healthy BMMCs (bone marrow mononuclear cells), simulating the phenotype of minimal residual disease (MRD) in AML patients. These datasets can be used to benchmark differential analysis algorithms used to test for differentially abundant rare cell populations. Raw data sourced from Levine et al. (2015), and data generation strategy modified from Arvaniti et al. (2017). See Weber et al. (2019), Supplementary Note 1, for more details.
Weber_AML_sim_main_5pc_SE(metadata = FALSE)
Weber_AML_sim_main_5pc_flowSet(metadata = FALSE)
Weber_AML_sim_main_1pc_SE(metadata = FALSE)
Weber_AML_sim_main_1pc_flowSet(metadata = FALSE)
Weber_AML_sim_main_0.1pc_SE(metadata = FALSE)
Weber_AML_sim_main_0.1pc_flowSet(metadata = FALSE)
Weber_AML_sim_main_blasts_all_SE(metadata = FALSE)
Weber_AML_sim_main_blasts_all_flowSet(metadata = FALSE)
Weber_AML_sim_null_SE(metadata = FALSE)
Weber_AML_sim_null_flowSet(metadata = FALSE)
Weber_AML_sim_random_seeds_5pc_SE(metadata = FALSE)
Weber_AML_sim_random_seeds_5pc_flowSet(metadata = FALSE)
Weber_AML_sim_random_seeds_1pc_SE(metadata = FALSE)
Weber_AML_sim_random_seeds_1pc_flowSet(metadata = FALSE)
Weber_AML_sim_random_seeds_0.1pc_SE(metadata = FALSE)
Weber_AML_sim_random_seeds_0.1pc_flowSet(metadata = FALSE)
Weber_AML_sim_less_distinct_5pc_SE(metadata = FALSE)
Weber_AML_sim_less_distinct_5pc_flowSet(metadata = FALSE)
Weber_AML_sim_less_distinct_1pc_SE(metadata = FALSE)
Weber_AML_sim_less_distinct_1pc_flowSet(metadata = FALSE)
Weber_AML_sim_less_distinct_0.1pc_SE(metadata = FALSE)
Weber_AML_sim_less_distinct_0.1pc_flowSet(metadata = FALSE)
metadata |
|
This is a set of semi-simulated mass cytometry (CyTOF) datasets, generated for benchmarking purposes in our paper introducing the 'diffcyt' framework (Weber et al., 2019).
The datasets are constructed by computationally 'spiking in' small percentages of AML (acute myeloid leukemia) blast cells into samples of healthy BMMCs (bone marrow mononuclear cells), simulating the phenotype of minimal residual disease (MRD) in AML patients. Blast cells are spiked in at 3 different thresholds of abundance (5%, 1%, and 0.1%), to create multiple simulations with varying levels of difficulty.
These datasets can be used to benchmark differential analysis algorithms used to test for differentially abundant rare cell populations.
The raw data consists of 5 healthy samples (H1-H5), 1 AML (diseased) sample from condition CN (cytogenetically normal), and 1 AML sample from condition CBF (core-binding factor translocation). The dataset is constructed by splitting the healthy samples into 3 parts; one part is kept as the healthy reference condition, and small proportions of either CN or CBF blast cells are spiked into the other two parts to create the semi-simulated MRD conditions CN and CBF. We are then interested in detecting the differentially abundant rare population of either CN or CBF blasts in differential comparisons between CN and healthy, or CBF and healthy.
The dataset contains 16 surface protein markers used to define cell populations. For additional details, including numbers of cells per sample, see Weber et al. (2019), Supplementary Note 1 (in particular Supplementary Tables 1 and 2).
Multiple simulations are available, as described in our paper (Weber et al., 2019). These are stored in the objects listed below.
In each case, the objects are available in both SummarizedExperiment
and
flowSet
formats, with cells stored in rows, and protein markers in columns (i.e.
the usual format for cytometry data). After loading the datasets, they can be inspected using the
standard accessor functions for either SummarizedExperiments
or flowSets
(e.g. for
SummarizedExperiments
: rowData
, colData
, assays
,
and metadata
).
For the SummarizedExperiments
: assays
contain tables of expression values
(multiple assays
for datasets with multiple replicates);
rowData
contains group IDs, patient IDs, sample IDs, and a column identifying spike-in cells;
colData
contains channel names, marker names, and marker classes; and
metadata
contains experiment information and number of cells.
For the flowSets
: individual flowFrames
within the flowSet
contain tables
of expression values (multiple flowFrames
for datasets with multiple replicates);
row data is stored as additional columns of numeric values within the expression tables;
column data is stored in the pData(parameters())
slot of the individual flowFrames
; and
additional information (e.g. experiment information, marker information, replicate information,
and lookup tables to identify row data values) is stored in the description()
slot of
the flowFrames
.
Main simulations
Separate files for each threshold of abundance (5%, 1%, and 0.1%), as well additional objects containing all blast cells.
Weber_AML_sim_main_5pc_SE (31.2 MB)
Weber_AML_sim_main_5pc_flowSet (31.2 MB)
Weber_AML_sim_main_1pc_SE (30.4 MB)
Weber_AML_sim_main_1pc_flowSet (30.4 MB)
Weber_AML_sim_main_0.1pc_SE (30.2 MB)
Weber_AML_sim_main_0.1pc_flowSet (30.2 MB)
Weber_AML_sim_main_blasts_all_SE (11.0 MB)
Weber_AML_sim_main_blasts_all_flowSet (11.0 MB)
Additional simulations: null simulations
Weber_AML_sim_null_SE (90.3 MB)
Weber_AML_sim_null_flowSet (90.3 MB)
Additional simulations: modified random seeds
Separate files for each threshold of abundance (5%, 1%, and 0.1%).
Weber_AML_sim_random_seeds_5pc_SE (93.6 MB)
Weber_AML_sim_random_seeds_5pc_flowSet (93.6 MB)
Weber_AML_sim_random_seeds_1pc_SE (91.1 MB)
Weber_AML_sim_random_seeds_1pc_flowSet (91.1 MB)
Weber_AML_sim_random_seeds_0.1pc_SE (90.6 MB)
Weber_AML_sim_random_seeds_0.1pc_flowSet (90.6 MB)
Additional simulations: 'less distinct' spike-in cells
Separate files for each threshold of abundance (5%, 1%, and 0.1%).
Weber_AML_sim_less_distinct_5pc_SE (64.1 MB)
Weber_AML_sim_less_distinct_5pc_flowSet (64.1 MB)
Weber_AML_sim_less_distinct_1pc_SE (61.1 MB)
Weber_AML_sim_less_distinct_1pc_flowSet (61.1 MB)
Weber_AML_sim_less_distinct_0.1pc_SE (60.4 MB)
Weber_AML_sim_less_distinct_0.1pc_flowSet (60.4 MB)
Note that prior to performing any downstream analyses, the expression values should be
transformed. A standard transformation used for mass cytometry data is the asinh
with cofactor = 5
.
The raw data is sourced from Levine et al. (2015), and the data generation strategy is modified from Arvaniti et al. (2017). See Weber et al. (2019), Supplementary Note 1, for more details.
Original links to raw data from Cytobank:
- all cells (also contains gating scheme for CD34+CD45mid cells, i.e. blasts): https://community.cytobank.org/cytobank/experiments/46098/illustrations/121588
- blasts (repository cloned from the one for 'all cells' above, using the gating scheme for CD34+CD45mid cells; this allows .fcs files for the subset to be exported): https://community.cytobank.org/cytobank/experiments/63534/illustrations/125318
Data files are also available from FlowRepository (FR-FCM-ZYL8): http://flowrepository.org/id/FR-FCM-ZYL8
Returns a SummarizedExperiment
or flowSet
object.
Arvaniti and Claassen (2017), "Sensitive detection of rare disease-associated cell subsets via representation learning", Nature Communications, 8:14825: https://www.ncbi.nlm.nih.gov/pubmed/28382969
Levine et al. (2015), "Data-driven phenotypic dissection of AML reveals progenitor-like cells that correlate with prognosis", Cell, 162, 184-197: https://www.ncbi.nlm.nih.gov/pubmed/26095251
Weber et al. (2019). "diffcyt: Differential discovery in high-dimensional cytometry via high-resolution clustering." Communications Biology, 2:183: https://www.ncbi.nlm.nih.gov/pubmed/31098416
Weber_AML_sim_main_5pc_SE()
Weber_AML_sim_main_5pc_flowSet()
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.