knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
options(max.print = 500)
library(tiledbsc)
library(fs)
library(tiledb)
library(SeuratObject)
data_dir <- file.path(tempdir(), "pbmc_small")
dir.create(data_dir, showWarnings = FALSE)
This vignette covers the creation of a SOMACollection from a Seurat object.
Load the subsetted 10x Genomics PBMC dataset provided by SeuratObject.
data("pbmc_small", package = "SeuratObject")
pbmc_small
Convert a Seurat object to a TileDB-backed SOMACollection
The SOMACollection class provides a method for converting an entire Seurat object to a SOMACollection. This is the recommended way to perform the conversion, since it can handle multiple Assay objects and will (eventually) convert all of the standard slots that comprise a Seurat object.
The first step is to create a new SOMACollection object and provide a URI where the dataset should be created:
soco <- SOMACollection$new(uri = file.path(tempdir(), "soco"))
Next, we'll pass the entire pbmc_small object directly to from_seurat(), and one SOMA will be created for each Assay object:
soco$from_seurat(object = pbmc_small)
Examining the directory structure, you can see that the top-level SOMACollection directory now contains a single soma_RNA sub-directory, corresponding to pbmc_small's only assay, "RNA":
fs::dir_tree(soco$uri, recurse = 1)
Internally, the SOMACollection class converts each Seurat Assay object to a SOMA, which creates and populates the various sub-components, including:
- The counts, data, and scale.data matrices, each stored in a separate attribute of the X array
- The data.frame containing feature-level metadata, which is ingested into the var array
Separately, any dimensional reductions are extracted and stored in the corresponding obsm/varm arrays.
To close the loop, we can then read the on-disk SOMACollection back into memory as a Seurat object.
soco$to_seurat(project = "SOCO Example")
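A quick sanity check after a round trip is to compare the restored object with the original. The sketch below repeats the conversion in a separate, hypothetical directory (soco_roundtrip) so that it runs on its own, and it assumes that to_seurat() returns the Seurat object and that the conversion preserves cell identifiers; treat it as an illustration rather than a guaranteed property of the library.

```r
library(tiledbsc)
library(SeuratObject)

data("pbmc_small", package = "SeuratObject")

# Hypothetical location, separate from the "soco" directory used above
rt_uri <- file.path(tempdir(), "soco_roundtrip")

# Write the Seurat object to disk, then read it back
soco_rt <- SOMACollection$new(uri = rt_uri)
soco_rt$from_seurat(object = pbmc_small)
pbmc_rt <- soco_rt$to_seurat(project = "SOCO Example")

# Compare features x cells before and after the round trip
dim(pbmc_small)
dim(pbmc_rt)

# Check that the same cells are present (ordering is not assumed)
setequal(colnames(pbmc_small), colnames(pbmc_rt))
```

If the dimensions or cell names differ, that points to components the current conversion does not yet carry over.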
Convert a Seurat Assay to a TileDB-backed SOMA
Conversions can happen at multiple levels of the API. For example, we can operate directly on a Seurat Assay using SOMA. The workflow is largely the same:
soma <- SOMA$new(uri = file.path(tempdir(), "soma"))
fs::dir_tree(soma$uri)
Then we'll pass the RNA assay from pbmc_small to the from_seurat_assay() method of the SOMA class.
Note: Because cell-level metadata is stored in the parent Seurat object, we need to provide this data separately.
soma$from_seurat_assay( object = pbmc_small[["RNA"]], obs = pbmc_small[[]] )
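To see what is being passed as obs, it can help to inspect the metadata on its own. The sketch below relies only on SeuratObject: calling `[[` on a Seurat object with no arguments returns the cell-level metadata as a data.frame with one row per cell. The variable name obs_df is just an illustrative choice.

```r
library(SeuratObject)

data("pbmc_small", package = "SeuratObject")

# `[[` with no arguments returns the cell-level metadata
obs_df <- pbmc_small[[]]

# A data.frame with one row per cell
class(obs_df)
nrow(obs_df) == ncol(pbmc_small)

# Row names are the cell identifiers
head(rownames(obs_df))

# Metadata columns, such as nCount_RNA and nFeature_RNA
colnames(obs_df)
```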
Examining the directory structure of the SOMA, we can see that the X, var, and obs arrays have all been created.
fs::dir_tree(soma$uri, recurse = FALSE)
Any of the underlying TileDB arrays can be accessed directly from a SOMACollection object by navigating its internal classes.
As an example, let's access the cell-level metadata. Recall from the SOMA spec that cell-level metadata is stored in the obs array of a SOMA. Therefore, we must first access a specific SOMA within the SOMACollection's somas slot. Let's generate a list of the available SOMAs:
names(soco$somas)
Easy choice. "RNA" can then be used to index the corresponding SOMA:
soco$members$RNA
We can see that we have access to a variety of fields and methods, but obs is the one we're after.
soco$members$RNA$obs
This is an AnnotationDataframe object, which includes a method for reading the data into R as a data.frame:
head(soco$members$RNA$obs$to_dataframe())
All of the array-based classes include a number of helper functions for interacting with the underlying arrays.
Print the schema of an array:
soma_obs <- soco$members$RNA$obs
soma_obs$schema()
List the names of the array's dimensions (i.e., indexed columns):
soma_obs$dimnames()
and attributes (i.e., non-indexed columns):
soma_obs$attrnames()
You can also use the tiledb_array() method to directly access the underlying arrays using the standard TileDB API, providing the full functionality of the tiledb package. For example, let's query the obs array and retrieve a subset of cells that match our QC criteria:
obs_array <- soma_obs$tiledb_array(
  return_as = "tibble",
  attrs = c("nCount_RNA", "nFeature_RNA"),
  query_condition = parse_query_condition(nFeature_RNA < 2500)
)
obs_array[]
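For intuition, the query condition plays the same role as a row filter on an in-memory data.frame, combined with a column selection for attrs. The sketch below uses a small made-up table (the cell names and values are hypothetical, not taken from pbmc_small) and base R only. The practical difference is that with query_condition the filter is evaluated by TileDB when the array is read, so only the matching cells are returned to R.

```r
# Hypothetical cell-level metadata; values are made up for illustration
obs_df <- data.frame(
  nCount_RNA   = c(70, 85, 4200, 310),
  nFeature_RNA = c(47, 52, 2650, 180),
  orig.ident   = "SeuratProject",
  row.names    = c("cell_1", "cell_2", "cell_3", "cell_4")
)

# Rows: keep cells below the threshold, mirroring nFeature_RNA < 2500
# Columns: keep only the two selected attributes, mirroring attrs
qc_pass <- obs_df[
  obs_df$nFeature_RNA < 2500,
  c("nCount_RNA", "nFeature_RNA")
]

qc_pass
```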
Session Info
sessionInfo()
unlink(data_dir, recursive = TRUE)