HochSchulz_2022_Melanoma: Obtain the HochSchulz_2022_Melanoma dataset

Description

Obtain the HochSchulz_2022_Melanoma dataset, which is composed of two panels (rna and protein) acquired on consecutive sections. Each dataset (panel) is composed of three data objects: single cell data, multichannel images and cell segmentation masks. The data were obtained by imaging mass cytometry (IMC) of a tissue microarray (TMA) containing multiple cores of formalin-fixed paraffin-embedded (FFPE) tissue from 69 patients with metastatic melanoma.

Usage

HochSchulz_2022_Melanoma(
  data_type = c("sce", "spe", "images", "masks"),
  panel = "rna",
  full_dataset = FALSE,
  version = "latest",
  metadata = FALSE,
  on_disk = FALSE,
  h5FilesPath = NULL,
  force = FALSE
)

Arguments

data_type

type of object to load, 'images' for multichannel images or 'masks' for cell segmentation masks. Single cell data are retrieved using either 'sce' for the SingleCellExperiment format or 'spe' for the SpatialExperiment format.

panel

which panel should be returned? Can be set to "rna" (default) or "protein".

full_dataset

if FALSE (default), a subset corresponding to the 50 images containing the most B cells is returned. If TRUE, the full dataset (corresponding to 166 images) is returned. Due to memory limitations, this option is only available for single cell data and masks, not for data_type = "images".

version

dataset version. By default, the latest version is returned.

metadata

if FALSE (default), the data object selected in data_type is returned. If TRUE, only the metadata associated with this object are returned.

on_disk

logical indicating whether images, in the form of HDF5Array objects (as .h5 files), should be stored on disk rather than in memory. This setting only applies when downloading images and masks.

h5FilesPath

path to the directory where the .h5 files for the on-disk representation are stored. This path must be defined when on_disk = TRUE. If the files should only be stored on disk temporarily, set h5FilesPath = getHDF5DumpDir().

force

logical indicating whether images should be overwritten when files with the same name already exist on disk.
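The argument constraints above can be summarised in a small validation sketch. `check_hochschulz_args()` is a hypothetical helper written for illustration only; it is not part of imcdatasets. It checks only the documented restrictions (no full dataset for images, `h5FilesPath` required when `on_disk = TRUE`), and it assumes the common R idiom, suggested but not confirmed by the `data_type` default in Usage, of resolving a character-vector default with `match.arg()`. Giving `panel` a vector of choices is likewise an assumption.

```r
# Hypothetical helper (NOT part of imcdatasets): validates argument
# combinations described in the Arguments section before a large download.
check_hochschulz_args <- function(data_type = c("sce", "spe", "images", "masks"),
                                  panel = c("rna", "protein"),  # assumed choices
                                  full_dataset = FALSE,
                                  on_disk = FALSE,
                                  h5FilesPath = NULL) {
  # match.arg() returns the first choice when the argument is left at its default
  data_type <- match.arg(data_type)
  panel <- match.arg(panel)
  # The full dataset is not provided for multichannel images
  if (full_dataset && data_type == "images") {
    stop("full_dataset = TRUE is not available for data_type = \"images\"")
  }
  # On-disk storage needs a target directory for the .h5 files
  if (on_disk && is.null(h5FilesPath)) {
    stop("h5FilesPath must be set when on_disk = TRUE")
  }
  list(data_type = data_type, panel = panel)
}

check_hochschulz_args()$data_type                                      # returns "sce"
check_hochschulz_args("masks", "protein", full_dataset = TRUE)$panel   # returns "protein"
```

Failing early on these combinations avoids starting a download that the function would reject anyway.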

Details

This is an Imaging Mass Cytometry (IMC) dataset from Hoch, Schulz et al. (2022):

  • images contains fifty 38-channel images in the form of a CytoImageList class object.

  • masks contains the cell segmentation masks associated with the images, in the form of a CytoImageList class object.

  • sce contains the single cell data extracted from the multichannel images using the cell segmentation masks, as well as the associated metadata, in the form of a SingleCellExperiment object.

  • spe contains the same single cell data as sce, but in the SpatialExperiment format.

All data are downloaded from ExperimentHub and cached for local re-use.

Mapping between the three data objects is performed via variables located in their metadata columns: mcols() for the CytoImageList objects and colData() for the SingleCellExperiment and SpatialExperiment objects. Mapping at the image level can be performed with the image_name or image_number variables. Mapping between cell segmentation masks and single cell data is performed with the cell_number variable, whose values correspond to the pixel intensity values of the masks object. For practical examples, please refer to the "Accessing IMC datasets" vignette.
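The cell_number mapping can be illustrated with toy data in base R. The `mask` matrix and the `cells` table (including "image_01" and the cell-type labels) are hypothetical stand-ins for one image of the masks CytoImageList and the matching rows of colData(sce); with the real objects, both would first be restricted to the same image_name.

```r
# Toy segmentation mask for one image: 0 = background, k > 0 = pixels of cell k
mask <- matrix(c(0, 1, 1, 0,
                 0, 1, 2, 2,
                 3, 3, 2, 2), nrow = 3, byrow = TRUE)

# Hypothetical stand-in for colData(): one row per cell, keyed by cell_number
cells <- data.frame(image_name = "image_01",
                    cell_number = c(1L, 2L, 3L),
                    cell_type = c("Tumor", "B cell", "T cell"),
                    stringsAsFactors = FALSE)

# Count the pixels carrying each cell_number (cell area in pixels)
px <- table(mask[mask > 0])
area <- data.frame(cell_number = as.integer(names(px)),
                   area_px = as.integer(px))

# Join single-cell rows to mask-derived values via cell_number
merged <- merge(cells, area, by = "cell_number")
merged
```

Because cell_number is the shared key, the same join can go in either direction: from mask pixels to cell annotations, or from a set of cells back to their pixels.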

The assay slots of the SingleCellExperiment and SpatialExperiment objects contain four assays:

  • counts contains raw mean ion counts per cell.

  • exprs contains asinh-transformed counts, with cofactor 1.

  • scaled_counts contains scaled counts.

  • scaled_exprs contains scaled asinh-transformed counts.
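As a minimal sketch of the exprs assay, assuming it is computed as asinh(counts / cofactor) with cofactor 1 as stated above, the transform can be reproduced on a toy counts matrix (the marker and cell names are hypothetical). The scaled assays are not reproduced because their scaling method is not specified here.

```r
# Toy raw mean ion counts: 2 markers (rows) x 3 cells (columns), hypothetical names
counts <- matrix(c(0,   1.5, 10,
                   0.2, 3,   50), nrow = 2, byrow = TRUE,
                 dimnames = list(c("marker_A", "marker_B"), paste0("cell_", 1:3)))

cofactor <- 1
# Inverse hyperbolic sine transform, applied element-wise: asinh(x / cofactor)
exprs <- asinh(counts / cofactor)
round(exprs, 3)
```

With cofactor 1, asinh(x) stays close to x for small counts and grows like log(2x) for large ones, which compresses the dynamic range of highly expressed markers while keeping zero counts at zero.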

The marker-associated metadata, including antibody information and metal tags, are stored in the rowData of the SingleCellExperiment and SpatialExperiment objects.

The cell-associated metadata are stored in the colData of the SingleCellExperiment and SpatialExperiment objects. These metadata include information about cells, milieus, samples, and patients. For instance, cell types can be retrieved with colData(sce)$cell_type and cell clusters with colData(sce)$cell_cluster.
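Once cell types are available in colData, per-image cell-type composition is a common first summary. The sketch below uses a hypothetical data.frame in place of colData(sce); with a real object, colData(sce)$image_name and colData(sce)$cell_type would supply the two columns. The image names and cell-type labels shown are invented.

```r
# Hypothetical stand-in for colData(sce)
cd <- data.frame(image_name = c("img_1", "img_1", "img_1", "img_2", "img_2"),
                 cell_type  = c("Tumor", "B cell", "Tumor", "T cell", "Tumor"),
                 stringsAsFactors = FALSE)

# Number of cells per image (rows) and cell type (columns)
counts_tab <- table(cd$image_name, cd$cell_type)

# Row-wise proportions: fraction of each cell type within each image
props <- prop.table(counts_tab, margin = 1)
props
```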

Neighborhood information, i.e. which cells are located next to each other, is stored as a SelfHits object in the colPairs slot of the SingleCellExperiment and SpatialExperiment objects.
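The neighborhood pairs stored in colPairs are directed index pairs between cells of the same object, which a SelfHits object exposes as from and to indices. The sketch below builds such pairs from toy cell centroids using a fixed Euclidean distance threshold. This threshold rule and its value are assumptions chosen for illustration; they are not necessarily how the neighborhoods in this dataset were defined.

```r
# Toy cell centroids (in pixels) for a single image
xy <- data.frame(x = c(10, 14, 40, 43),
                 y = c(10, 12, 40, 38))
threshold <- 8  # hypothetical neighborhood distance, in pixels

d <- as.matrix(dist(xy))   # pairwise Euclidean distances between cells
diag(d) <- Inf             # a cell is not its own neighbor
idx <- which(d <= threshold, arr.ind = TRUE)

# Directed (from, to) index pairs, one row per neighbor relationship
pairs <- data.frame(from = unname(idx[, 1]), to = unname(idx[, 2]))
pairs <- pairs[order(pairs$from, pairs$to), ]
rownames(pairs) <- NULL
pairs
```

Each neighbor relationship appears twice (i to j and j to i), which mirrors a symmetric SelfHits representation.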

For more information, please refer to the Hoch, Schulz, et al. publication.

Dataset versions: a version argument can be passed to the function to specify which dataset version should be retrieved.

  • `v1`: first published version

File sizes:

  • `images_rna`: size in memory = 13.9 GB, size on disk = 954 MB.

  • `masks_rna`: size in memory = 347 MB, size on disk = 11 MB.

  • `sce_rna`: size in memory = 774 MB, size on disk = 401 MB.

  • `masks_full_rna`: size in memory = 1.1 GB, size on disk = 30 MB.

  • `sce_full_rna`: size in memory = 2.0 GB, size on disk = 1.1 GB.

  • `images_protein`: size in memory = 16.8 GB, size on disk = 1.2 GB.

  • `masks_protein`: size in memory = 374 MB, size on disk = 12 MB.

  • `sce_protein`: size in memory = 856 MB, size on disk = 531 MB.

  • `masks_full_protein`: size in memory = 1.2 GB, size on disk = 35 MB.

  • `sce_full_protein`: size in memory = 2.2 GB, size on disk = 1.4 GB.

When images are stored on disk, they first need to be fully read into memory before being written to disk, so downloading the data takes longer than keeping them in memory. However, storing images on disk reduces the memory footprint of downstream analysis.
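The file sizes listed above can help decide which objects to request with on_disk = TRUE. This sketch transcribes those sizes, assumes 1 GB = 1000 MB, and uses a hypothetical memory budget; remember that on_disk only applies to images and masks, and that images still pass through memory once while being written to disk.

```r
# Sizes transcribed from the list above; 1 GB taken as 1000 MB (assumption)
sizes <- data.frame(
  object = c("images_rna", "masks_rna", "sce_rna", "masks_full_rna", "sce_full_rna",
             "images_protein", "masks_protein", "sce_protein",
             "masks_full_protein", "sce_full_protein"),
  memory_mb = c(13900, 347, 774, 1100, 2000, 16800, 374, 856, 1200, 2200),
  disk_mb   = c(954, 11, 401, 30, 1100, 1200, 12, 531, 35, 1400),
  stringsAsFactors = FALSE
)

# How many times larger each object becomes once loaded into memory
sizes$expansion <- round(sizes$memory_mb / sizes$disk_mb, 1)

budget_mb <- 8000  # hypothetical RAM available for a single object
# Objects exceeding the budget: candidates for on_disk = TRUE
sizes$object[sizes$memory_mb > budget_mb]  # returns "images_rna" "images_protein"
```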

Original source: Hoch, Schulz et al. (2022): https://doi.org/10.1126/sciimmunol.abk1692

Original link to raw data: https://doi.org/10.5281/zenodo.5994136.

Value

A SingleCellExperiment object with single cell data, a SpatialExperiment object with single cell data, a CytoImageList object containing multichannel images, or a CytoImageList object containing cell segmentation masks.

Author(s)

Nicolas Damond

References

Hoch, Schulz et al. (2022). Multiplexed imaging mass cytometry of the chemokine milieus in melanoma characterizes features of the response to immunotherapy. Sci Immunol 7(70):eabk1692.

Examples

# Load the package providing the dataset accessor
library(imcdatasets)

# Load single cell data
sce <- HochSchulz_2022_Melanoma(data_type = "sce")
print(sce)

# Display metadata
HochSchulz_2022_Melanoma(data_type = "sce", metadata = TRUE)

# Load masks on disk (getHDF5DumpDir() is provided by HDF5Array)
library(HDF5Array)
masks <- HochSchulz_2022_Melanoma(data_type = "masks", on_disk = TRUE,
                                  h5FilesPath = getHDF5DumpDir())
print(head(masks))


BodenmillerGroup/imcdatasets documentation built on March 20, 2024, 9:24 a.m.