JacksonFischer2020Data: Obtain the jackson-fischer-2020 dataset

JacksonFischer2020Data    R Documentation

Obtain the jackson-fischer-2020 dataset

Description

Obtain the jackson-fischer-2020 dataset, which consists of three data objects: single cell data, multichannel images and cell segmentation masks. The data was obtained by imaging mass cytometry of tumour tissue from patients with breast cancer.

Usage

JacksonFischer2020Data(
  data_type = c("sce", "images", "masks"),
  metadata = FALSE,
  on_disk = FALSE,
  h5FilesPath = NULL,
  force = FALSE
)

Arguments

data_type

type of object to load, should be 'sce' for single cell data, 'images' for multichannel images or 'masks' for cell segmentation masks.

metadata

if FALSE (default), the data object selected in data_type is returned. If TRUE, only the metadata associated with this object is returned.

on_disk

logical indicating whether images should be stored on disk as HDF5Array objects (in .h5 files) rather than in memory. This setting only applies when downloading images and masks.

h5FilesPath

path to the directory where the .h5 files for the on-disk representation are stored. This path needs to be defined when on_disk = TRUE. To store files on disk only temporarily, set h5FilesPath = getHDF5DumpDir().

force

logical indicating if images should be overwritten when files with the same name already exist on disk.

Details

This is an Imaging Mass Cytometry (IMC) dataset from Jackson, Fischer et al. (2020), consisting of three data objects:

  • images contains a hundred 42-channel images in the form of a CytoImageList class object.

  • masks contains the cell segmentation masks associated with the images, in the form of a CytoImageList class object.

  • sce contains the single cell data extracted from the multichannel images using the cell segmentation masks, as well as the associated metadata, in the form of a SingleCellExperiment. This represents a total of 285,851 cells x 42 channels.

All data are downloaded from ExperimentHub and cached for local re-use.

Mapping between the three data objects is performed via variables located in their metadata columns: mcols() for the CytoImageList objects and colData() for the SingleCellExperiment object. Mapping at the image level can be performed with the ImageNb variable. Mapping between cell segmentation masks and single cell data is performed with the CellNb variable, the values of which correspond to the pixel intensity values of the JacksonFischer2020_masks object. For practical examples, please refer to the "Accessing IMC datasets" vignette.
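The correspondence between CellNb and the mask intensity values can be illustrated with a minimal sketch in base R. The matrix and data frame below are toy stand-ins made up for illustration, not the real masks or colData(); only the variable names ImageNb, CellNb and metacluster come from this page.

```r
# Toy stand-in (NOT the real dataset): a 4 x 4 segmentation mask in which
# 0 is background and each positive integer labels the pixels of one cell.
mask <- matrix(c(0, 1, 1, 0,
                 0, 1, 2, 2,
                 3, 3, 2, 2,
                 3, 0, 0, 0),
               nrow = 4, byrow = TRUE)

# Toy per-cell table playing the role of colData(sce) for a single image.
cells <- data.frame(ImageNb = 1L,
                    CellNb = 1:3,
                    metacluster = c("A", "B", "A"))

# Cell labels present in the mask (background excluded).
mask_ids <- sort(setdiff(unique(as.vector(mask)), 0))

# Every mask label should have a matching CellNb entry, and vice versa.
all_matched <- setequal(mask_ids, cells$CellNb)

# Number of mask pixels per cell, ordered by CellNb.
cells$n_pixels <- as.vector(table(factor(mask[mask > 0],
                                         levels = cells$CellNb)))
print(cells)
```

With the real objects, the same idea would presumably apply per image, after first matching images, masks and cells on ImageNb.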

This dataset is a subset of the complete Jackson, Fischer et al. (2020) dataset comprising the data from tumour tissue from 100 patients with breast cancer (one image per patient).

The assays slot of the SingleCellExperiment object contains three assays:

  • counts contains mean ion counts per cell.

  • exprs contains arsinh-transformed counts, with cofactor 1.

  • quant_norm contains quantile-normalized counts (0 to 1, 99th percentile).
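As a rough sketch of how such transformed assays relate to the raw counts, the base R code below applies both transformations to a toy matrix. The data are invented for illustration. The arsinh step follows the cofactor stated above; the quantile step is only one assumed reading of "0 to 1, 99th percentile" (scaling each channel by its 99th percentile and capping at 1) and may differ from how the dataset was actually processed.

```r
# Toy mean ion counts (NOT real data): 2 channels (rows) x 5 cells (columns),
# following the features-by-cells layout of a SingleCellExperiment assay.
counts <- matrix(c(0, 1, 2, 5, 100,
                   0, 2, 4, 6, 8),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("ch1", "ch2"), paste0("cell", 1:5)))

# Inverse hyperbolic sine (arsinh) transform with cofactor 1.
cofactor <- 1
exprs <- asinh(counts / cofactor)

# ASSUMED reading of 'quant_norm': divide each channel by its 99th
# percentile and cap the result at 1, so that values lie between 0 and 1.
q99 <- apply(counts, 1, quantile, probs = 0.99)
quant_norm <- pmin(counts / q99, 1)

print(round(exprs, 3))
print(round(quant_norm, 3))
```

With cofactor 1 the arsinh transform is simply asinh() applied to the counts; the cofactor is kept as a variable only to make the general form visible.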

The marker-associated metadata, including antibody information and metal tags, are stored in the rowData of the SingleCellExperiment object.

The cell-associated metadata are stored in the colData of the SingleCellExperiment object. These metadata include clusters (in colData(sce)$PhenoGraphBasel) and metaclusters (in colData(sce)$metacluster), as well as spatial information (e.g., cell areas are stored in colData(sce)$Area).

The patient-associated clinical data are also stored in the colData of the SingleCellExperiment object. For instance, the tumour grades can be retrieved with colData(sce)$grade.

File sizes:

  • images: size in memory = 17.8 GB, size on disk = 1.99 GB.

  • masks: size in memory = 433 MB, size on disk = 10.2 MB.

  • sce: size in memory = 517 MB, size on disk = 272 MB.

When images are stored on disk, they must first be read fully into memory before being written to disk, so downloading the data is slower than keeping the images in memory directly. In return, downstream analysis avoids the memory overhead of holding all images in memory.

Original source: Jackson, Fischer et al. (2020): https://doi.org/10.1038/s41586-019-1876-x

Original link to raw data, containing the entire dataset: https://doi.org/10.5281/zenodo.3518284

Value

A SingleCellExperiment object with single cell data, a CytoImageList object containing multichannel images, or a CytoImageList object containing cell masks.

Author(s)

Jana Fischer

References

Jackson, Fischer et al. (2020). The single-cell pathology landscape of breast cancer. Nature 578(7796), 615-620.

Examples

# Load single cell data
sce <- JacksonFischer2020Data(data_type = "sce")
print(sce)

# Display metadata
JacksonFischer2020Data(data_type = "sce", metadata = TRUE)

# Load masks on disk
library(HDF5Array)
masks <- JacksonFischer2020Data(data_type = "masks", on_disk = TRUE,
                                h5FilesPath = getHDF5DumpDir())
print(head(masks))



BodenmillerGroup/imcdatasets documentation built on July 5, 2022, 4:34 p.m.