knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
Tutorial 1 covered ondisc_matrix
, the core class implemented by ondisc
. Thus tutorial covers metadata_ondisc_matrix
and multimodal_ondisc_matrix
, two additional classes provided by the package. metadata_ondisc_matrix
stores cell-specific and feature-specific covariate matrices alongside the expression matrix, and multimodal_ondisc_matrix
stores multiple metadata_ondisc_matrices
representing different cellular modalities. Together, metadata_ondisc_matrix
and multimodal_ondisc_matrix
facilitate feature selection, quality control, and other common single-cell data preprocessing tasks.
We begin by loading the package.
library(ondisc)
metadata_ondisc_matrix
classA metadata_ondisc_matrix
object consists of three components: (i) an ondisc_matrix
representing the expression data, (ii) a data frame storing the cell-specific covariates, and (iii) a data frame storing the feature-specific covariates. The easiest way to initialize a metadata_ondisc_matrix
is by calling create_ondisc_matrix_from_mtx
on an mtx file and associated metadata files, setting the optional parameter return_metadata_ondisc_matrix
to TRUE
. Below, we reproduce the example from Tutorial 1, this time returning a metadata_ondisc_matrix
instead of a list.
# Set paths to the .mtx and .tsv files raw_data_dir <- system.file("extdata", package = "ondisc") mtx_fp <- paste0(raw_data_dir, "/gene_expression.mtx") barcodes_fp <- paste0(raw_data_dir, "/cell_barcodes.tsv") features_fp <- paste0(raw_data_dir, "/genes.tsv") # Specify directory in which to store the .h5 file temp_dir <- tempdir() # Initialize metadata_ondisc_matrix expressions <- create_ondisc_matrix_from_mtx(mtx_fp = mtx_fp, barcodes_fp = barcodes_fp, features_fp = features_fp, on_disk_dir = temp_dir, return_metadata_ondisc_matrix = TRUE)
The variable expressions
is an object of class metadata_ondisc_matrix
; expressions
contains the fields ondisc_matrix
, cell_covariates
, and feature_covariates
.
# Print the variable
expressions
We alternately can initialize a metadata_ondisc_matrix
by calling the constructor function of the metadata_ondisc_matrix
class; see documentation (via ?metadata_ondisc_matrix) for details.
multimodal_ondisc_matrix
classThe multimodal_ondisc_matrix
class is used to represent multimodal data. multimodal_ondisc_matrix
objects have two fields: (i) a named list of metadata_ondisc_matrices
representing different modalities, and (ii) a global (i.e., cross-modality) cell-specific covariate matrix. The ondisc
package ships with example CRISPR perturbation data, which we use to initialize a new perturbation modality via a call to create_ondisc_matrix_from_mtx
.
# Set paths to the perturbation .mtx and .tsv files mtx_fp <- paste0(raw_data_dir, "/perturbation.mtx") barcodes_fp <- paste0(raw_data_dir, "/cell_barcodes.tsv") features_fp <- paste0(raw_data_dir, "/guides.tsv") # Initialize metadata_ondisc_matrix perturbations <- create_ondisc_matrix_from_mtx(mtx_fp = mtx_fp, barcodes_fp = barcodes_fp, features_fp = features_fp, on_disk_dir = temp_dir, return_metadata_ondisc_matrix = TRUE)
Like expressions
, the variable perturbations
is an object of class metadata_ondisc_matrix
. However, because perturbations
represents logical perturabtion data instead of integer gene expression data, the cell-specific and feature-specific covariates of perturbations
differ from those of expressions
.
# These matrices have different columns head(expressions@cell_covariates) head(perturbations@cell_covariates)
The expressions
and perturbations
data are multimodal -- they are collected from the same set of cells. We can create a multimodal_ondisc_matrix
by passing a named list of metadata_ondisc_matrix
objects -- in this case, expressions
and perturbations
-- to the constructor function of the multimodal_ondisc_matrix
class.
modality_list <- list(expressions = expressions, perturbations = perturbations) crispr_experiment <- multimodal_ondisc_matrix(modality_list)
The variable crispr_experiment
is an object of class multimodal_ondisc_matrix
. The column names of the global covariate matrix are derived from the names of the modalities.
# print variable crispr_experiment # show the global covariate matrix head(crispr_experiment@global_cell_covariates)
The figure below summarizes the relationship between ondisc_matrix
, metadata_ondisc_matrix
, and multimodal_ondisc_matrix
.
knitr::include_graphics("classes_cropped.jpg")
We can use the functions get_feature_ids
, get_feature_names
, and get_cell_barcodes
to obtain the feature IDs, feature names (if applicable), and cell barcodes, respectively, of a metadata_ondisc_matrix
or a multimodal_ondisc_matrix
. get_feature_ids
and get_feature_names
return a list when called on a multimodal_ondisc_matrix
, as the different modalities contain different features.
# metadata_ondisc_matrix cell_barcodes <- get_cell_barcodes(expressions) feature_ids <- get_feature_ids(expressions) feature_names <- get_feature_names(expressions) # multimodal_ondisc_matrix cell_barcodes <- get_cell_barcodes(crispr_experiment) feature_ids <- get_feature_ids(crispr_experiment)
We likewise can use dim
, nrow
, and ncol
to query the dimension, number of rows, and number of
columns of a metadata_ondisc_matrix
or multimodal_ondisc_matrix
. dim
and nrow
again return lists when called on a multimodal_ondisc_matrix
.
# metadata_ondisc_matrix dim(expressions) nrow(expressions) ncol(expressions) # multimodal_ondisc_matrix dim(crispr_experiment) nrow(crispr_experiment) ncol(crispr_experiment)
Similar to ondisc_matrices
, metadata_ondisc_matrices
and multimodal_ondisc_matrices
can be subset using the [
operator. metadata_ondisc_matrices
can be subset either by feature or cell, while multimodal_ondisc_matrices
can be subset by cell only.
# metadata_ondisc_matrix # keep cells 100 - 150 expressions_sub <- expressions[,100:150] # keep genes ENSG00000188305, ENSG00000257284, ENSG00000251655 expressions_sub <- expressions[c("ENSG00000188305", "ENSG00000257284", "ENSG00000251655"),] # multimodal_ondisc_matrix # keep all cells except 1 - 100 crispr_experiment_sub <- crispr_experiment[,-c(1:100)]
As with ondisc_matrices
, the original objects remain unchanged.
expressions crispr_experiment
To save and load a metadata_ondisc_matrix
or multimodal_ondisc_matrix
, simply use the functions saveRDS
and readRDS
, respectively.
It is not possible to apply the double-bracket "pull submatrix into memory" operator to a metadata_ondisc_matrix
or multimodal_ondisc_matrix
. To apply the double-bracket operator to an ondisc_matrix
stored inside a metadata_ondisc_matrix
or multimodal_ondisc_matrix
, simply access the desired ondisc_matrix
first.
The metadata_ondisc_matrix
and multimodal_ondisc_matrix
classes are inspired by analogous classes in the SingleCellExperiment
and Seurat
packages. However, unlike SingleCellExperiment
and Seurat
, ondisc
does not (currently) support storing the dense matrix of normalized UMI counts. We view this as a feature: storing the normalized expression matrix is almost never necessary and is hugely expensive (in space and time) for large-scale data.
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.