ZanotelliSpheroids2020Data: Obtain the zanotelli-spheroids-2020 dataset

ZanotelliSpheroids2020Data    R Documentation

Obtain the zanotelli-spheroids-2020 dataset

Description

Obtain the zanotelli-spheroids-2020 dataset, which consists of three data objects: single cell data, multichannel images and cell segmentation masks. The data were obtained by imaging mass cytometry of sections of 3D spheroids generated from different cell lines.

Usage

ZanotelliSpheroids2020Data(
  data_type = c("sce", "images", "masks"),
  metadata = FALSE,
  on_disk = FALSE,
  h5FilesPath = NULL,
  force = FALSE
)

Arguments

data_type

type of data to load, should be sce for single cell data, images for multichannel images or masks for cell segmentation masks.

metadata

if FALSE (default), the data object selected in data_type is returned. If TRUE, only the metadata associated with this object is returned.

on_disk

logical indicating whether images should be stored on disk as HDF5Array objects (.h5 files) rather than held in memory. This setting applies when downloading images and masks.

h5FilesPath

path to where the .h5 files for on-disk representation are stored. This path must be defined when on_disk = TRUE. To store the files on disk only temporarily, set h5FilesPath = getHDF5DumpDir().

force

logical indicating if images should be overwritten when files with the same name already exist on disk.

Details

This is an Imaging Mass Cytometry (IMC) dataset from Zanotelli et al. (2020), consisting of three data objects:

  • images contains 517 multichannel images, each containing 51 channels, in the form of a CytoImageList class object.

  • masks contains the cell segmentation masks associated with the images, in the form of a CytoImageList class object.

  • sce contains the single cell data extracted from the multichannel images using the cell segmentation masks, as well as the associated metadata, in the form of a SingleCellExperiment. This represents a total of 229,047 cells x 51 channels.

All data are downloaded from ExperimentHub and cached for local re-use.

Mapping between the three data objects is performed via variables located in their metadata columns: mcols() for the CytoImageList objects and colData() for the SingleCellExperiment object. Mapping at the image level can be performed with the ImageName or ImageNumber variables. Mapping between cell segmentation masks and single cell data is performed with the CellNumber variable, the values of which correspond to the intensity values of the ZanotelliSpheroids2020_masks object. For practical examples, please refer to the "Accessing IMC datasets" vignette.
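The mapping logic described above can be illustrated with a minimal base R sketch. It does not download the dataset: the mask matrix and the cell table below are toy stand-ins with made-up values, and only the column names ImageNumber, CellNumber and Area are taken from this page. With the real objects, the cell table would presumably come from colData(sce) and the mask from one element of the masks object.

```r
# Toy stand-ins for the real objects (all values below are made up).
# A 3 x 4 segmentation mask: 0 = background, other integers = cell labels.
mask <- matrix(c(0, 1, 1, 0,
                 2, 2, 0, 3,
                 2, 0, 3, 3),
               nrow = 3, byrow = TRUE)

# Per-cell metadata for two hypothetical images.
cells <- data.frame(
    ImageNumber = c(1, 1, 1, 2),
    CellNumber  = c(1, 2, 3, 1),
    Area        = c(2, 3, 3, 5),
    stringsAsFactors = FALSE
)

# Image-level mapping: keep the cells that belong to image 1.
cells_img1 <- cells[cells$ImageNumber == 1, ]

# Cell-level mapping: the pixel values of the mask are CellNumber values.
pixel_counts <- table(mask[mask > 0])
cells_img1$n_pixels <- as.integer(
    pixel_counts[as.character(cells_img1$CellNumber)]
)

# Paint a per-cell value (here Area) back onto the mask; background is NA.
area_map <- mask
area_map[mask == 0] <- NA
area_map[mask > 0] <- cells_img1$Area[
    match(mask[mask > 0], cells_img1$CellNumber)
]
```

The same match() pattern works for any per-cell variable, as long as the mask and the cell table have first been restricted to the same image.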

This dataset was obtained as follows (the names of the experimental variables, located in the colData of the SingleCellExperiment object, are indicated in parentheses): i) Cells from four different cell lines (cellline) were seeded at three different densities (concentration, relative densities) and grown for either 72 or 96 hours (time_point, duration in hours). In the appropriate experimental conditions (see the paper for details), the cells aggregate into 3D spheroids. ii) Cells were harvested and pooled into 60-well barcoding plates. iii) A pellet of each spheroid pool was generated and cut into several 6 um-thick sections. iv) A subset of these sections (site_id) was stained with an IMC panel and acquired as one or more acquisitions (acquisition_id), each containing multiple spheres. v) Spheres in these acquisitions were identified by computer vision and cropped into individual images (ImageNumber).

Other relevant cell metadata include:

  • condition_name: experimental conditions in the format: "Cell line name"_c"seeding density"_tp"time point".

  • Center_X/Y: object centroid position in image.

  • Area: area of the cell (um^2).

  • dist.rim: estimated distance to spheroid border.

  • dist.sphere: distance to spheroid section border.

  • dist.other: distance to the closest of the other spheroid sections in the same image (if there is any).

  • dist.bg: distance to background pixels.

  • counts_neighb: mean ion counts of the neighboring cells.

  • exprs_neighb: arsinh-transformed counts (cofactor 1) of the neighboring cells.
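As an illustration of the condition_name format given in the list above, the following base R sketch splits such a string into its three components. The example strings and the cell line names "LineA" and "LineB" are hypothetical, and the sketch assumes that the seeding density and the time point are written as plain numbers; the actual values in the dataset may differ.

```r
# Hypothetical condition names following the documented pattern
# "Cell line name"_c"seeding density"_tp"time point".
condition_name <- c("LineA_c0.25_tp72", "LineB_c1_tp96")

# Capture groups: cell line, seeding density, time point.
pattern <- "^(.+)_c([^_]+)_tp([^_]+)$"
parts <- do.call(
    rbind,
    regmatches(condition_name, regexec(pattern, condition_name))
)

# Column 1 of 'parts' is the full match; columns 2 to 4 are the groups.
parsed <- data.frame(
    condition_name = parts[, 1],
    cellline       = parts[, 2],
    concentration  = as.numeric(parts[, 3]),
    time_point     = as.numeric(parts[, 4]),
    stringsAsFactors = FALSE
)
print(parsed)
```

In practice this parsing is rarely needed, since cellline, concentration and time_point are already available as separate colData variables.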

For a full description of the other experimental variables, please refer to the publication (https://doi.org/10.15252/msb.20209798) and to the original dataset repository (https://doi.org/10.5281/zenodo.4271910).

The marker-associated metadata, including antibody information and metal tags, are stored in the rowData of the SingleCellExperiment object. The channels with names starting with "BC_" are the channels used for barcoding. Post-translational modifications of the protein targets are indicated in brackets.

The assay slot of the SingleCellExperiment object contains four assays:

  • counts: mean ion counts per cell.

  • exprs: arsinh-transformed counts per cell, with cofactor 1.

  • counts_neighb: mean ion counts of the neighboring cells.

  • exprs_neighb: arsinh-transformed counts (cofactor 1) of the neighboring cells.
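The relationship between the raw and transformed assays can be sketched in base R. This assumes the conventional definition of the transformation, asinh(counts / cofactor); the toy matrix and its marker and cell names are made up, and the sketch is not taken from the package code.

```r
# Toy counts matrix (markers in rows, cells in columns); values are made up.
counts <- matrix(
    c(0,   1, 10,
      0.5, 2, 100),
    nrow = 2, byrow = TRUE,
    dimnames = list(c("marker_A", "marker_B"),
                    c("cell_1", "cell_2", "cell_3"))
)

# Inverse hyperbolic sine (arsinh) transformation with cofactor 1.
cofactor <- 1
exprs <- asinh(counts / cofactor)

# The transformation is invertible, so counts can be recovered.
counts_back <- sinh(exprs) * cofactor
stopifnot(isTRUE(all.equal(counts, counts_back)))
```

With a cofactor of 1 the division is a no-op, but keeping it explicit makes it easy to compare against other cofactor choices.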

The metadata slot of the SingleCellExperiment object contains a graph of cell neighbors, generated with the igraph::graph_from_data_frame function.
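To show what a graph built with igraph::graph_from_data_frame looks like, here is a small sketch on a made-up edge list. The cell identifiers and the choice of an undirected graph are assumptions for illustration only; how the graph is named inside the metadata slot, and whether it is directed, is not stated here.

```r
library(igraph)

# Hypothetical neighbor pairs; each row is one edge between two cells.
edges <- data.frame(
    from = c("1_1", "1_1", "1_2"),
    to   = c("1_2", "1_3", "1_3"),
    stringsAsFactors = FALSE
)

# Build the graph; vertices are inferred from the edge list.
g <- graph_from_data_frame(edges, directed = FALSE)

# Query the neighbors of one cell and the number of neighbors per cell.
nb <- as_ids(neighbors(g, "1_1"))
n_neighbors <- degree(g)
```

On the real object, the same neighbors() and degree() calls should apply once the graph has been extracted from metadata(sce).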

File sizes:

  • images: size in memory = 21.2 Gb, size on disk = 881 Mb.

  • masks: size in memory = 426 Mb, size on disk = 11.6 Mb.

  • sce: size in memory = 584 Mb, size on disk = 340 Mb.

When images are stored on disk, they must first be read fully into memory before being written to disk, so downloading the data takes longer than keeping them in memory. In return, downstream analysis avoids the memory overhead of holding all images in memory.

Original source: Zanotelli et al. (2020): https://doi.org/10.15252/msb.20209798

Original link to raw data, also containing the entire dataset: https://doi.org/10.5281/zenodo.4271910

Value

A SingleCellExperiment object with single cell data, a CytoImageList object containing multichannel images, or a CytoImageList object containing cell segmentation masks.

Author(s)

Nicolas Damond

References

Zanotelli VRT et al. (2020). A quantitative analysis of the interplay of environment, neighborhood, and cell state in 3D spheroids. Mol Syst Biol 16(12), e9798.

Examples

# Load single cell data
sce <- ZanotelliSpheroids2020Data(data_type = "sce")
print(sce)

# Display metadata
ZanotelliSpheroids2020Data(data_type = "sce", metadata = TRUE)

# Load masks on disk (getHDF5DumpDir() is provided by HDF5Array)
library(HDF5Array)
masks <- ZanotelliSpheroids2020Data(
    data_type = "masks",
    on_disk = TRUE,
    h5FilesPath = getHDF5DumpDir()
)
print(head(masks))


BodenmillerGroup/imcdatasets documentation built on July 5, 2022, 4:34 p.m.