knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

Setup

library(tiledbsc)
library(SeuratObject)

Overview

While tiledbsc facilitates the storage and retrieval of complete Seurat/Bioconductor datasets, it's also possible to selectively update specific components of an existing SOMA.

Create the initial dataset

As in the Introduction, we'll use the pbmc_small dataset from the SeuratObject package.

data("pbmc_small", package = "SeuratObject")
pbmc_small

We'll start by creating a Seurat Assay object that contains only raw RNA counts.

pbmc_small_rna <- CreateAssayObject(
  counts = GetAssayData(pbmc_small[["RNA"]], "counts")
)

Key(pbmc_small_rna) <- Key(pbmc_small[["RNA"]])

Then we ingest this assay into a new SOMA on disk:

soma <- SOMA$new(uri = file.path(tempdir(), "pbmc_small_rna"))
soma$from_seurat_assay(pbmc_small_rna)
soma

This performed the following operations:

Printing out the X group, we can see it contains only a single member, counts.

soma$X

Note: A Seurat Assay instantiated with data passed to the counts argument will store a reference to the same matrix in both the counts and data slots (i.e., GetAssayData(pbmc_small_rna, "counts") and GetAssayData(pbmc_small_rna, "data")). To avoid redundantly storing data on disk, tiledbsc ingests the data slot only when it is not identical to counts.
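To see this for yourself, the following sketch compares the two slots directly. It uses only functions already shown in this vignette plus base R's identical(); the object name rna_counts_only is ours, chosen for illustration. For an assay built from counts alone we would expect the comparison to be TRUE, but treat that as something to check rather than a guarantee across SeuratObject versions.

```r
library(SeuratObject)

data("pbmc_small", package = "SeuratObject")

# Build an assay from raw counts only, as in the section above.
# `rna_counts_only` is an illustrative name, not part of the original vignette.
rna_counts_only <- CreateAssayObject(
  counts = GetAssayData(pbmc_small[["RNA"]], "counts")
)

# Compare the contents of the counts and data slots.
same_matrix <- identical(
  GetAssayData(rna_counts_only, "counts"),
  GetAssayData(rna_counts_only, "data")
)
same_matrix
```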

Add new layers for normalized and scaled data

Now let's populate pbmc_small_rna's data slot with normalized counts.

pbmc_small_rna <- SetAssayData(
  pbmc_small_rna,
  slot = "data",
  new.data = GetAssayData(pbmc_small[["RNA"]], "data")
)

pbmc_small_rna

At this point, counts and data contain different transformations of the same matrix, but only the counts data has been written to disk.

We could simply rerun soma$from_seurat_assay(pbmc_small_rna) to update the SOMA and ingest the normalized counts matrix into X/data. However, this would perform another full conversion of the entire Assay object: data from the counts slot would be re-ingested into X/counts, and feature-level metadata would be re-ingested into var.

To avoid these redundant ingestions, use the layers argument to specify which of the assay slots (counts, data, or scale.data) should be ingested. We also set var = FALSE to prevent re-ingestion of the feature-level metadata, since it hasn't changed.

soma$from_seurat_assay(pbmc_small_rna, layers = "data", var = FALSE)

Printing out the X group again reveals that it now contains 2 members: counts and data.

soma$X
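If you keep a copy of the assay as it was when last ingested, you can work out which slots need to go into layers instead of tracking them by hand. The sketch below is one way to do that. changed_layers() is a hypothetical helper written for this example and is not part of tiledbsc or SeuratObject, and the before and after objects are illustrative stand-ins for two versions of the same assay.

```r
library(SeuratObject)

# Hypothetical helper (not part of tiledbsc): return the names of the
# assay slots whose contents differ between two versions of an Assay.
changed_layers <- function(old, new,
                           slots = c("counts", "data", "scale.data")) {
  differs <- vapply(
    slots,
    function(s) !identical(GetAssayData(old, s), GetAssayData(new, s)),
    FUN.VALUE = logical(1)
  )
  slots[differs]
}

data("pbmc_small", package = "SeuratObject")

# `before`: an assay holding raw counts only.
before <- CreateAssayObject(
  counts = GetAssayData(pbmc_small[["RNA"]], "counts")
)

# `after`: the same assay once normalized values are placed in the data slot.
after <- SetAssayData(
  before,
  slot = "data",
  new.data = GetAssayData(pbmc_small[["RNA"]], "data")
)

layers_to_update <- changed_layers(before, after)
layers_to_update
```

The resulting character vector could then be passed as the layers argument of soma$from_seurat_assay(), assuming it is non-empty.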

Let's repeat this process for the scaled/centered version of the data that's stored in the scale.data slot.

pbmc_small_rna <- SetAssayData(
  pbmc_small_rna,
  slot = "scale.data",
  new.data = GetAssayData(pbmc_small[["RNA"]], "scale.data")
)

soma$from_seurat_assay(pbmc_small_rna, layers = "scale.data", var = FALSE)
soma$X

The X group now contains 3 arrays, each of which has been written to only once. We can verify this by counting the number of fragments contained within any of the individual arrays:

soma$X$members$counts$fragment_count()

TileDB creates a new fragment each time a write is performed, so 1 fragment confirms the array has been written to only once.
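The same number can be read from the array through the tiledb R package, which can be handy when you only have an array URI to hand. This is a minimal sketch that continues the session above. It assumes that each member of soma$X exposes its location through a uri field in the same way soma itself does, and that the tiledb package is installed alongside tiledbsc.

```r
library(tiledb)

# Assumption: members of soma$X expose a `uri` field, like `soma$uri`.
counts_uri <- soma$X$members[["counts"]]$uri

# Ask TileDB for the fragment metadata of the array stored at that URI,
# then count the fragments it contains.
fragment_info <- tiledb_fragment_info(uri = counts_uri)
n_fragments <- tiledb_fragment_info_get_num(fragment_info)
n_fragments
```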

Update an existing layer

If the data in your Seurat Assay has changed since it was ingested into TileDB, you can follow the same basic process to update the array.

Here, we'll artificially shift all of the values in the scale.data slot by 1 and then re-ingest the updated data into the scale.data layer on disk:

pbmc_small_rna <- SetAssayData(
  pbmc_small_rna,
  slot = "scale.data",
  new.data = GetAssayData(pbmc_small[["RNA"]], "scale.data") + 1
)

soma$from_seurat_assay(pbmc_small_rna, layers = "scale.data", var = FALSE)

Now let's check the number of fragments in each of X's layers:

sapply(
  names(soma$X$members),
  function(x) soma$X$members[[x]]$fragment_count()
)
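Because each write adds a fragment, any layer reporting more than one fragment has been written more than once. The sketch below continues the session above and picks out those layers. Given the steps in this vignette we would expect only scale.data to be listed, though the exact counts depend on how many writes each ingestion performs. The variable names are ours.

```r
# Collect the fragment count of every layer in X as a named numeric vector.
fragment_counts <- vapply(
  names(soma$X$members),
  function(x) soma$X$members[[x]]$fragment_count(),
  FUN.VALUE = numeric(1)
)

# Names of the layers that have been written to more than once.
rewritten_layers <- names(fragment_counts)[fragment_counts > 1]
rewritten_layers
```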

Session

Session Info

sessionInfo()

unlink(soma$uri, recursive = TRUE)


TileDB-Inc/tiledbsc documentation built on Aug. 26, 2023, 2:32 p.m.