Tutorial 2: Using `metadata_ondisc_matrix` and `multimodal_ondisc_matrix`

Tutorial 1 covered ondisc_matrix, the core class implemented by ondisc. This tutorial covers metadata_ondisc_matrix and multimodal_ondisc_matrix, two additional classes provided by the package. metadata_ondisc_matrix stores cell-specific and feature-specific covariate matrices alongside the expression matrix, and multimodal_ondisc_matrix stores multiple metadata_ondisc_matrices representing different cellular modalities. Together, metadata_ondisc_matrix and multimodal_ondisc_matrix facilitate feature selection, quality control, and other common single-cell data preprocessing tasks.

We begin by loading the package.

library(ondisc)


The metadata_ondisc_matrix class

A metadata_ondisc_matrix object consists of three components: (i) an ondisc_matrix representing the expression data, (ii) a data frame storing the cell-specific covariates, and (iii) a data frame storing the feature-specific covariates. The easiest way to initialize a metadata_ondisc_matrix is by calling create_ondisc_matrix_from_mtx on an mtx file and associated metadata files, setting the optional parameter return_metadata_ondisc_matrix to TRUE. Below, we reproduce the example from Tutorial 1, this time returning a metadata_ondisc_matrix instead of a list.

# Set paths to the .mtx and .tsv files
raw_data_dir <- system.file("extdata", package = "ondisc")
mtx_fp <- paste0(raw_data_dir, "/gene_expression.mtx")
barcodes_fp <- paste0(raw_data_dir, "/cell_barcodes.tsv")
features_fp <- paste0(raw_data_dir, "/genes.tsv")

# Specify directory in which to store the .h5 file
temp_dir <- tempdir()

# Initialize metadata_ondisc_matrix
expressions <- create_ondisc_matrix_from_mtx(mtx_fp = mtx_fp,
                                              barcodes_fp = barcodes_fp,
                                              features_fp = features_fp,
                                              on_disk_dir = temp_dir,
                                              return_metadata_ondisc_matrix = TRUE)

The variable expressions is an object of class metadata_ondisc_matrix; expressions contains the fields ondisc_matrix, cell_covariates, and feature_covariates.

# Print the variable
expressions

Alternatively, we can initialize a metadata_ondisc_matrix by calling the constructor function of the metadata_ondisc_matrix class; see the documentation (via ?metadata_ondisc_matrix) for details.
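
A minimal sketch of the constructor route, continuing the session above. It rests on two assumptions that should be checked against ?metadata_ondisc_matrix: that the three fields of expressions are S4 slots accessible with `@`, and that the constructor's arguments are named after those fields. The name expressions_rebuilt is introduced here only for illustration.

```r
library(ondisc)

# Assumption: the fields of a metadata_ondisc_matrix are S4 slots,
# so they can be extracted with `@`.
odm <- expressions@ondisc_matrix
cell_covs <- expressions@cell_covariates
feature_covs <- expressions@feature_covariates

# Assumption: the constructor takes one argument per field, named as below.
expressions_rebuilt <- metadata_ondisc_matrix(ondisc_matrix = odm,
                                              cell_covariates = cell_covs,
                                              feature_covariates = feature_covs)
```

Building the object from its parts is mainly useful when the covariate data frames have been computed or edited separately from the expression matrix.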

The multimodal_ondisc_matrix class

The multimodal_ondisc_matrix class is used to represent multimodal data. multimodal_ondisc_matrix objects have two fields: (i) a named list of metadata_ondisc_matrices representing different modalities, and (ii) a global (i.e., cross-modality) cell-specific covariate matrix. The ondisc package ships with example CRISPR perturbation data, which we use to initialize a new perturbation modality via a call to create_ondisc_matrix_from_mtx.

# Set paths to the perturbation .mtx and .tsv files
mtx_fp <- paste0(raw_data_dir, "/perturbation.mtx")
barcodes_fp <- paste0(raw_data_dir, "/cell_barcodes.tsv")
features_fp <- paste0(raw_data_dir, "/guides.tsv")

# Initialize metadata_ondisc_matrix
perturbations <- create_ondisc_matrix_from_mtx(mtx_fp = mtx_fp,
                                               barcodes_fp = barcodes_fp,
                                               features_fp = features_fp,
                                               on_disk_dir = temp_dir,
                                               return_metadata_ondisc_matrix = TRUE)

Like expressions, the variable perturbations is an object of class metadata_ondisc_matrix. However, because perturbations represents logical perturbation data instead of integer gene expression data, the cell-specific and feature-specific covariates of perturbations differ from those of expressions.

# These matrices have different columns
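
One way to see the difference is to compare the column names of the two cell-specific covariate data frames. The sketch below continues the session above and assumes that the cell_covariates field is an S4 slot accessible with `@`; expr_cols and pert_cols are names introduced here for illustration.

```r
# Assumption: cell_covariates is an S4 slot holding a data frame.
expr_cols <- colnames(expressions@cell_covariates)
pert_cols <- colnames(perturbations@cell_covariates)

# Covariates computed for one modality but not the other
setdiff(expr_cols, pert_cols)
setdiff(pert_cols, expr_cols)
```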

The expressions and perturbations data are multimodal -- they are collected from the same set of cells. We can create a multimodal_ondisc_matrix by passing a named list of metadata_ondisc_matrix objects -- in this case, expressions and perturbations -- to the constructor function of the multimodal_ondisc_matrix class.

modality_list <- list(expressions = expressions, perturbations = perturbations)
crispr_experiment <- multimodal_ondisc_matrix(modality_list)

The variable crispr_experiment is an object of class multimodal_ondisc_matrix. The column names of the global covariate matrix are derived from the names of the modalities.

# Print the variable
crispr_experiment

# show the global covariate matrix

The figure below summarizes the relationship between ondisc_matrix, metadata_ondisc_matrix, and multimodal_ondisc_matrix.


Querying basic information

We can use the functions get_feature_ids, get_feature_names, and get_cell_barcodes to obtain the feature IDs, feature names (if applicable), and cell barcodes, respectively, of a metadata_ondisc_matrix or a multimodal_ondisc_matrix. get_feature_ids and get_feature_names return a list when called on a multimodal_ondisc_matrix, as the different modalities contain different features.

# metadata_ondisc_matrix
cell_barcodes <- get_cell_barcodes(expressions)
feature_ids <- get_feature_ids(expressions)
feature_names <- get_feature_names(expressions)

# multimodal_ondisc_matrix
cell_barcodes <- get_cell_barcodes(crispr_experiment)
feature_ids <- get_feature_ids(crispr_experiment)

We likewise can use dim, nrow, and ncol to query the dimension, number of rows, and number of columns of a metadata_ondisc_matrix or multimodal_ondisc_matrix. dim and nrow again return lists when called on a multimodal_ondisc_matrix.

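# metadata_ondisc_matrix
dim(expressions)
nrow(expressions)
ncol(expressions)

# multimodal_ondisc_matrix
dim(crispr_experiment)
nrow(crispr_experiment)
ncol(crispr_experiment)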


Similar to ondisc_matrices, metadata_ondisc_matrices and multimodal_ondisc_matrices can be subset using the [ operator. metadata_ondisc_matrices can be subset either by feature or cell, while multimodal_ondisc_matrices can be subset by cell only.

# metadata_ondisc_matrix
# keep cells 100 - 150
expressions_sub <- expressions[,100:150]
# keep genes ENSG00000188305, ENSG00000257284, ENSG00000251655
expressions_sub <- expressions[c("ENSG00000188305", "ENSG00000257284", "ENSG00000251655"),]

# multimodal_ondisc_matrix
# keep all cells except 1 - 100
crispr_experiment_sub <- crispr_experiment[,-c(1:100)]

As with ondisc_matrices, the original objects remain unchanged.
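
A quick check of this behavior, continuing the session above, is to record the number of cells before subsetting and compare afterwards. The variable names n_cells_before and n_cells_multimodal are introduced here for illustration; the sketch assumes the example data contain at least 150 cells, as the subsetting calls above already require.

```r
n_cells_before <- ncol(expressions)
n_cells_multimodal <- ncol(crispr_experiment)

# Cells 100 through 150 inclusive: 51 cells
expressions_sub <- expressions[, 100:150]
ncol(expressions_sub)

# Negative indices drop cells, so 100 fewer cells remain
crispr_experiment_sub <- crispr_experiment[, -c(1:100)]
ncol(crispr_experiment_sub)

# The original objects still report their original cell counts
ncol(expressions) == n_cells_before
ncol(crispr_experiment) == n_cells_multimodal
```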


Notes and tips

