Tutorial 2: Using `metadata_ondisc_matrix` and `multimodal_ondisc_matrix`


Tutorial 1 covered ondisc_matrix, the core class implemented by ondisc. This tutorial covers metadata_ondisc_matrix and multimodal_ondisc_matrix, two additional classes provided by the package. metadata_ondisc_matrix stores cell-specific and feature-specific covariate matrices alongside the expression matrix, and multimodal_ondisc_matrix stores multiple metadata_ondisc_matrices representing different cellular modalities. Together, metadata_ondisc_matrix and multimodal_ondisc_matrix facilitate feature selection, quality control, and other common single-cell data preprocessing tasks.

We begin by loading the package.

library(ondisc)


The metadata_ondisc_matrix class

A metadata_ondisc_matrix object consists of three components: (i) an ondisc_matrix representing the expression data, (ii) a data frame storing the cell-specific covariates, and (iii) a data frame storing the feature-specific covariates. The easiest way to initialize a metadata_ondisc_matrix is by calling create_ondisc_matrix_from_mtx on an mtx file and associated metadata files, setting the optional parameter return_metadata_ondisc_matrix to TRUE. Below, we reproduce the example from Tutorial 1, this time returning a metadata_ondisc_matrix instead of a list.

# Set paths to the .mtx and .tsv files
raw_data_dir <- system.file("extdata", package = "ondisc")
mtx_fp <- file.path(raw_data_dir, "gene_expression.mtx")
barcodes_fp <- file.path(raw_data_dir, "cell_barcodes.tsv")
features_fp <- file.path(raw_data_dir, "genes.tsv")

# Specify directory in which to store the .h5 file
temp_dir <- tempdir()

# Initialize metadata_ondisc_matrix
expressions <- create_ondisc_matrix_from_mtx(mtx_fp = mtx_fp,
                                              barcodes_fp = barcodes_fp,
                                              features_fp = features_fp,
                                              on_disk_dir = temp_dir,
                                              return_metadata_ondisc_matrix = TRUE)

The variable expressions is an object of class metadata_ondisc_matrix; expressions contains the fields ondisc_matrix, cell_covariates, and feature_covariates.

# Print the variable
expressions

Alternatively, we can initialize a metadata_ondisc_matrix by calling the constructor function of the metadata_ondisc_matrix class; see the documentation (via ?metadata_ondisc_matrix) for details.
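The three-part layout described above can be pictured with a small in-memory analogue in base R. This is only a sketch under stated assumptions: the list `toy`, its element names, and the covariate columns (`n_nonzero`, `n_umis`, `mean_expression`) are hypothetical, and ondisc keeps the expression data on disk rather than in an ordinary matrix.

```r
# Toy in-memory analogue of the three components; NOT ondisc's implementation.
set.seed(1)
counts <- matrix(rpois(5 * 4, lambda = 1), nrow = 5, ncol = 4,
                 dimnames = list(paste0("gene_", 1:5), paste0("cell_", 1:4)))

toy <- list(
  # (i) expression data: features in rows, cells in columns
  expression_matrix = counts,
  # (ii) one row per cell (hypothetical covariate columns)
  cell_covariates = data.frame(n_nonzero = colSums(counts > 0),
                               n_umis = colSums(counts),
                               row.names = colnames(counts)),
  # (iii) one row per feature (hypothetical covariate columns)
  feature_covariates = data.frame(mean_expression = rowMeans(counts),
                                  n_nonzero = rowSums(counts > 0),
                                  row.names = rownames(counts))
)

# The covariate tables line up with the matrix dimensions
nrow(toy$cell_covariates) == ncol(toy$expression_matrix)
nrow(toy$feature_covariates) == nrow(toy$expression_matrix)
```

Keeping the covariates next to the matrix in this way is what lets quality-control filters be expressed as simple conditions on the covariate tables.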

The multimodal_ondisc_matrix class

The multimodal_ondisc_matrix class is used to represent multimodal data. multimodal_ondisc_matrix objects have two fields: (i) a named list of metadata_ondisc_matrices representing different modalities, and (ii) a global (i.e., cross-modality) cell-specific covariate matrix. The ondisc package ships with example CRISPR perturbation data, which we use to initialize a new perturbation modality via a call to create_ondisc_matrix_from_mtx.

# Set paths to the perturbation .mtx and .tsv files
mtx_fp <- file.path(raw_data_dir, "perturbation.mtx")
barcodes_fp <- file.path(raw_data_dir, "cell_barcodes.tsv")
features_fp <- file.path(raw_data_dir, "guides.tsv")

# Initialize metadata_ondisc_matrix
perturbations <- create_ondisc_matrix_from_mtx(mtx_fp = mtx_fp,
                                               barcodes_fp = barcodes_fp,
                                               features_fp = features_fp,
                                               on_disk_dir = temp_dir,
                                               return_metadata_ondisc_matrix = TRUE)

Like expressions, the variable perturbations is an object of class metadata_ondisc_matrix. However, because perturbations represents logical perturbation data instead of integer gene expression data, the cell-specific and feature-specific covariates of perturbations differ from those of expressions.

# These matrices have different columns
head(expressions@cell_covariates)
head(perturbations@cell_covariates)

The expressions and perturbations data are multimodal -- they are collected from the same set of cells. We can create a multimodal_ondisc_matrix by passing a named list of metadata_ondisc_matrix objects -- in this case, expressions and perturbations -- to the constructor function of the multimodal_ondisc_matrix class.

modality_list <- list(expressions = expressions, perturbations = perturbations)
crispr_experiment <- multimodal_ondisc_matrix(modality_list)

The variable crispr_experiment is an object of class multimodal_ondisc_matrix. The column names of the global covariate matrix are derived from the names of the modalities.

# print variable
crispr_experiment

# show the global covariate matrix

The figure below summarizes the relationship between ondisc_matrix, metadata_ondisc_matrix, and multimodal_ondisc_matrix.


Querying basic information

We can use the functions get_feature_ids, get_feature_names, and get_cell_barcodes to obtain the feature IDs, feature names (if applicable), and cell barcodes, respectively, of a metadata_ondisc_matrix or a multimodal_ondisc_matrix. get_feature_ids and get_feature_names return a list when called on a multimodal_ondisc_matrix, as the different modalities contain different features.

# metadata_ondisc_matrix
cell_barcodes <- get_cell_barcodes(expressions)
feature_ids <- get_feature_ids(expressions)
feature_names <- get_feature_names(expressions)

# multimodal_ondisc_matrix
cell_barcodes <- get_cell_barcodes(crispr_experiment)
feature_ids <- get_feature_ids(crispr_experiment)

We likewise can use dim, nrow, and ncol to query the dimension, number of rows, and number of columns of a metadata_ondisc_matrix or multimodal_ondisc_matrix. dim and nrow again return lists when called on a multimodal_ondisc_matrix.

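# metadata_ondisc_matrix
dim(expressions)
nrow(expressions)
ncol(expressions)

# multimodal_ondisc_matrix
dim(crispr_experiment)
nrow(crispr_experiment)
ncol(crispr_experiment)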
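As a quick sanity check, the query functions described above can be compared against one another. The sketch below rebuilds the expression object from the package's example files so that it runs on its own; the names `demo_dir` and `expressions_demo` are hypothetical, and it assumes (as the subsetting syntax in this tutorial suggests) that features are in rows and cells in columns.

```r
library(ondisc)

# Example files shipped with the package (same files as earlier in this tutorial)
raw_data_dir <- system.file("extdata", package = "ondisc")

# Fresh scratch directory for the .h5 file (hypothetical name)
demo_dir <- tempfile("ondisc_query_demo_")
dir.create(demo_dir)

expressions_demo <- create_ondisc_matrix_from_mtx(
  mtx_fp = file.path(raw_data_dir, "gene_expression.mtx"),
  barcodes_fp = file.path(raw_data_dir, "cell_barcodes.tsv"),
  features_fp = file.path(raw_data_dir, "genes.tsv"),
  on_disk_dir = demo_dir,
  return_metadata_ondisc_matrix = TRUE)

# One feature ID per row and one barcode per column are expected
n_features_match <- length(get_feature_ids(expressions_demo)) == nrow(expressions_demo)
n_cells_match <- length(get_cell_barcodes(expressions_demo)) == ncol(expressions_demo)

n_features_match
n_cells_match
```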


Subsetting

Similar to ondisc_matrices, metadata_ondisc_matrices and multimodal_ondisc_matrices can be subset using the [ operator. metadata_ondisc_matrices can be subset either by feature or cell, while multimodal_ondisc_matrices can be subset by cell only.

# metadata_ondisc_matrix
# keep cells 100 - 150
expressions_sub <- expressions[,100:150]
# keep genes ENSG00000188305, ENSG00000257284, ENSG00000251655
expressions_sub <- expressions[c("ENSG00000188305", "ENSG00000257284", "ENSG00000251655"),]

# multimodal_ondisc_matrix
# keep all cells except 1 - 100
crispr_experiment_sub <- crispr_experiment[,-c(1:100)]

As with ondisc_matrices, the original objects remain unchanged.
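One way to confirm that subsetting leaves the original untouched is to record its dimensions before and after. The following is a minimal sketch that rebuilds the expression object from the package's example files so that it runs on its own; `demo_dir`, `expressions_demo`, `cells_sub`, and `genes_sub` are hypothetical names introduced for illustration.

```r
library(ondisc)

# Example files shipped with the package (same files as earlier in this tutorial)
raw_data_dir <- system.file("extdata", package = "ondisc")

# Fresh scratch directory for the .h5 file (hypothetical name)
demo_dir <- tempfile("ondisc_subset_demo_")
dir.create(demo_dir)

expressions_demo <- create_ondisc_matrix_from_mtx(
  mtx_fp = file.path(raw_data_dir, "gene_expression.mtx"),
  barcodes_fp = file.path(raw_data_dir, "cell_barcodes.tsv"),
  features_fp = file.path(raw_data_dir, "genes.tsv"),
  on_disk_dir = demo_dir,
  return_metadata_ondisc_matrix = TRUE)

dim_before <- dim(expressions_demo)

# Subset by cell (the index range 100:150 spans 51 positions) and by feature ID
cells_sub <- expressions_demo[, 100:150]
genes_sub <- expressions_demo[c("ENSG00000188305", "ENSG00000257284", "ENSG00000251655"), ]

dim_after <- dim(expressions_demo)

# The original object should report the same dimensions as before
identical(dim_before, dim_after)
ncol(cells_sub)
nrow(genes_sub)
```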


Notes and tips


ondisc documentation built on March 5, 2021, 5:07 p.m.