knitr::opts_chunk$set(
    collapse = TRUE,
    comment = "#>",
    crop = NULL ## Related to https://stat.ethz.ch/pipermail/bioc-devel/2020-April/016656.html
)
## Track time spent on making the vignette
startTime <- Sys.time()

## Bib setup
library("RefManageR")

## Write bibliography information
bib <- c(
    R = citation(),
    BiocFileCache = citation("BiocFileCache")[1],
    BiocStyle = citation("BiocStyle")[1],
    knitr = citation("knitr")[1],
    lobstr = citation("lobstr")[1],
    RefManageR = citation("RefManageR")[1],
    rmarkdown = citation("rmarkdown")[1],
    rtracklayer = citation("rtracklayer")[1],
    sessioninfo = citation("sessioninfo")[1],
    testthat = citation("testthat")[1],
    SpatialExperiment = citation("SpatialExperiment")[1],
    spatialLIBD = citation("spatialLIBD")[1],
    spatialLIBDpaper = citation("spatialLIBD")[2]
)

Basics

Install spatialLIBD

R is an open-source statistical environment which can be easily modified to enhance its functionality via packages. r Biocpkg("spatialLIBD") r Citep(bib[['spatialLIBD']]) is an R package available via the Bioconductor repository for packages. R can be installed on any operating system from CRAN after which you can install r Biocpkg("spatialLIBD") by using the following commands in your R session:

if (!requireNamespace("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
}

BiocManager::install("spatialLIBD")

## Check that you have a valid Bioconductor installation
BiocManager::valid()

To run all the code in this vignette, you might need to install other R/Bioconductor packages, which you can do with:

BiocManager::install("spatialLIBD", dependencies = TRUE, force = TRUE)

If you want to use the development version of spatialLIBD, you will need to use the R version corresponding to the current Bioconductor-devel branch as described in more detail on the Bioconductor website. Then you can install spatialLIBD from GitHub using the following command.

BiocManager::install("LieberInstitute/spatialLIBD")

Required knowledge

Please first check the Introduction to spatialLIBD vignette available through GitHub or Bioconductor.

Citing spatialLIBD

We hope that r Biocpkg("spatialLIBD") will be useful for your research. Please use the following information to cite the package and the overall approach. Thank you!

## Citation info
citation("spatialLIBD")

Download data from 10x Genomics

In this vignette we'll show you how you can use r Biocpkg("spatialLIBD") r Citep(bib[['spatialLIBD']]) for exploring spatially resolved transcriptomics data from the Visium platform by 10x Genomics. That is, you will learn how to use r Biocpkg("spatialLIBD") for data beyond the one it was initially developed for r Citep(bib[['spatialLIBDpaper']]). To illustrate these steps, we will use data that 10x Genomics made publicly available at https://support.10xgenomics.com/spatial-gene-expression/datasets. We will use files from the human lymph node example publicly available at https://support.10xgenomics.com/spatial-gene-expression/datasets/1.1.0/V1_Human_Lymph_Node.

Load packages

To get started, lets load the different packages we'll need for this vignette. Here's a brief summary of why we need these packages:

## Load packages in the order we'll use them next
library("BiocFileCache")
library("SpatialExperiment")
library("rtracklayer")
library("lobstr")
library("spatialLIBD")

Download spaceranger output files

Next we download data from 10x Genomics available from the human lymph node example, available at https://support.10xgenomics.com/spatial-gene-expression/datasets/1.1.0/V1_Human_Lymph_Node. We don't need to download all the files listed there since r Biocpkg("SpatialExperiment") doesn't need all of them for importing the data into R. These files are part of the output that gets generated by spaceranger which is the processing pipeline provided by 10x Genomics for Visium data.

We'll use r Biocpkg("BiocFileCache") to keep the data in a local cache in case we want to run this example again and don't want to re-download the data from the web.

## Download and save a local cache of the data provided by 10x Genomics
bfc <- BiocFileCache::BiocFileCache()
lymph.url <-
    paste0(
        "https://cf.10xgenomics.com/samples/spatial-exp/",
        "1.1.0/V1_Human_Lymph_Node/",
        c(
            "V1_Human_Lymph_Node_filtered_feature_bc_matrix.tar.gz",
            "V1_Human_Lymph_Node_spatial.tar.gz",
            "V1_Human_Lymph_Node_analysis.tar.gz"
        )
    )
lymph.data <- sapply(lymph.url, BiocFileCache::bfcrpath, x = bfc)

10x Genomics provides the files in compressed tarballs (.tar.gz file extension). Which is why we'll need to use utils::untar() to decompress the files. This will create new directories and we will use list.files() to see what files these directories contain.

## Extract the files to a temporary location
## (they'll be deleted once you close your R session)
xx <- sapply(lymph.data, utils::untar, exdir = file.path(tempdir(), "outs"))
## The names are the URLs, which are long and thus too wide to be shown here,
## so we shorten them to only show the file name prior to displaying the
## utils::untar() output status
names(xx) <- basename(names(xx))
xx

## List the files we downloaded and extracted
## These files are typically SpaceRanger outputs
lymph.dirs <- file.path(
    tempdir(), "outs",
    c("filtered_feature_bc_matrix", "spatial", "raw_feature_bc_matrix", "analysis")
)
list.files(lymph.dirs)

Now that we have the files that we need, we can import the data into R using read10xVisium() from r Biocpkg("SpatialExperiment"). We'll import the low resolution histology images produced by spaceranger using the images = "lowres" and load = TRUE arguments. We'll also load the filtered gene expression data using the data = "filtered" argument. The count matrix can still be quite large, which is why we'll use the type = "sparse" argument to load the data into an R object that is memory-efficient for sparse data.

## Import the data as a SpatialExperiment object
spe <- SpatialExperiment::read10xVisium(
    samples = tempdir(),
    sample_id = "lymph",
    type = "sparse", data = "filtered",
    images = "lowres", load = TRUE
)
## Inspect the R object we just created: class, memory, and how it looks in
## general
class(spe)
lobstr::obj_size(spe) / 1024^2 ## Convert to MB
spe

## The counts are saved in a sparse matrix R object
class(counts(spe))

Modify spe for spatialLIBD

Now that we have an SpatialExperiment R object (spe) with the data from 10x Genomics for the human lymph node example, we need to add a few features to the R object in order to create the interactive website using spatialLIBD::run_app(). These additional elements power features in the interactive website that you might be interested in.

First we start with adding a few variables to the sample information table (colData()) of our spe object. We add:

## Add some information used by spatialLIBD
spe <- add_key(spe)
spe$sum_umi <- colSums(counts(spe))
spe$sum_gene <- colSums(counts(spe) > 0)

Add gene annotation information

The files SpatialExperiment::read10xVisium() uses to read in the spaceranger outputs into R do not include much information about the genes, such as their chromosomes, coordinates, and other gene annotation information. We thus recommend that you read in this information from a gene annotation file: typically a gtf file. For a real case scenario, you'll mostly likely have access to the GTF file provided by 10x Genomics. However, we cannot download that file without downloading other files for this example. Thus we'll show you the code you would use if you had access to the GTF file from 10x Genomics and also show a second approach that works for this vignette.

## Initially we don't have much information about the genes
rowRanges(spe)

From 10x

Depending on the version of spaceranger you used, you might have used different GTF files 10x Genomics has made available at https://support.10xgenomics.com/single-cell-gene-expression/software/downloads/latest and described at https://support.10xgenomics.com/single-cell-gene-expression/software/release-notes/build. These files are too big though and we won't download them in this example. For instance, References - 2020-A (July 7, 2020) for Human reference (GRCh38) is 11 GB in size and contains files we do not need for this vignette. If you did have the file locally, you could use the following code to read in the GTF file prepared by 10x Genomics and add the information into your spe object that SpatialExperiment::read10xVisium() does not include.

For example, in our computing cluster this GTF file is located at the following path and is 1.4 GB in size:

```{bash, eval = FALSE} $ cd /dcs04/lieber/lcolladotor/annotationFiles_LIBD001/10x/refdata-gex-GRCh38-2020-A $ du -sh --apparent-size genes/genes.gtf 1.4G genes/genes.gtf

If you have the GTF file from 10x Genomics, we show next how you can read the information into R, match it appropriately with the information in the `spe` object and add it back into the `spe` object.

```r
## You could:
## * download the 11 GB file from
## https://cf.10xgenomics.com/supp/cell-exp/refdata-gex-GRCh38-2020-A.tar.gz
## * decompress it

## Read in the gene information from the annotation GTF file provided by 10x
gtf <-
    rtracklayer::import(
        "/path/to/refdata-gex-GRCh38-2020-A/genes/genes.gtf"
    )

## Subject to genes only
gtf <- gtf[gtf$type == "gene"]

## Set the names to be the gene IDs
names(gtf) <- gtf$gene_id

## Match the genes
match_genes <- match(rownames(spe), gtf$gene_id)

## They should all be present if you are using the correct GTF file from 10x
stopifnot(all(!is.na(match_genes)))

## Keep only some columns from the gtf (you could keep all of them if you want)
mcols(gtf) <-
    mcols(gtf)[, c(
        "source",
        "type",
        "gene_id",
        "gene_version",
        "gene_name",
        "gene_type"
    )]

## Add the gene info to our SPE object
rowRanges(spe) <- gtf[match_genes]

## Inspect the gene annotation data we added
rowRanges(spe)

From Gencode

In this vignette, we'll use the GTF file from Gencode v32. That's because the build notes from References - 2020-A (July 7, 2020) and Human reference, GRCh38 (GENCODE v32/Ensembl 98) at https://support.10xgenomics.com/single-cell-gene-expression/software/release-notes/build#GRCh38_2020A show that 10x Genomics used Gencode v32. They also used Ensembl version 98 which is why a few genes we have in our object are going to be missing. We show next how you can read the information into R, match it appropriately with the information in the spe object and add it back into the spe object.

## Download the Gencode v32 GTF file and cache it
gtf_cache <- BiocFileCache::bfcrpath(
    bfc,
    paste0(
        "ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/",
        "release_32/gencode.v32.annotation.gtf.gz"
    )
)

## Show the GTF cache location
gtf_cache

## Import into R (takes ~1 min)
gtf <- rtracklayer::import(gtf_cache)

## Subset to genes only
gtf <- gtf[gtf$type == "gene"]

## Remove the .x part of the gene IDs
gtf$gene_id <- gsub("\\..*", "", gtf$gene_id)

## Set the names to be the gene IDs
names(gtf) <- gtf$gene_id

## Match the genes
match_genes <- match(rownames(spe), gtf$gene_id)
table(is.na(match_genes))

## Drop the few genes for which we don't have information
spe <- spe[!is.na(match_genes), ]
match_genes <- match_genes[!is.na(match_genes)]

## Keep only some columns from the gtf
mcols(gtf) <- mcols(gtf)[, c("source", "type", "gene_id", "gene_name", "gene_type")]

## Add the gene info to our SPE object
rowRanges(spe) <- gtf[match_genes]

## Inspect the gene annotation data we added
rowRanges(spe)

Enable a friendlier gene search

Regardless of which method you used to obtain the gene annotation information, we can now proceed by adding the gene symbol and gene ID information that helps users search for genes in the shiny app produced by spatialLIBD. This will enable users to search genes by gene symbol or gene ID. If you didn't do this, users would only be able to search genes by gene ID which makes the web application harder to use.

We also compute the total expression for the mitochondrial chromosome (chrM) as well as the ratio of chrM expression. Both of these continuous variables are interesting to explore and in some situations could be useful for biological interpretations. For instance, in our pilot data r Citep(bib[['spatialLIBDpaper']]), we noticed that the expr_chrM_ratio was associated to DLPFC layers. That is, spots with high expr_chrM_ratio were not randomly located in our Visium slides.

## Add information used by spatialLIBD
rowData(spe)$gene_search <- paste0(
    rowData(spe)$gene_name, "; ", rowData(spe)$gene_id
)

## Compute chrM expression and chrM expression ratio
is_mito <- which(seqnames(spe) == "chrM")
spe$expr_chrM <- colSums(counts(spe)[is_mito, , drop = FALSE])
spe$expr_chrM_ratio <- spe$expr_chrM / spe$sum_umi

Filter the spe object

We can now continue with some filtering steps since this can help reduce the object size in memory as well as make it ready to use for downstream processing tools such as those from the r Biocpkg("scran") and r Biocpkg("scuttle") packages. Though these steps are not absolutely necessary.

## Remove genes with no data
no_expr <- which(rowSums(counts(spe)) == 0)

## Number of genes with no counts
length(no_expr)

## Compute the percent of genes with no counts
length(no_expr) / nrow(spe) * 100
spe <- spe[-no_expr, , drop = FALSE]

## Remove spots without counts
summary(spe$sum_umi)

## If we had spots with no counts, we would remove them
if (any(spe$sum_umi == 0)) {
    spots_no_counts <- which(spe$sum_umi == 0)
    ## Number of spots with no counts
    print(length(spots_no_counts))
    ## Percent of spots with no counts
    print(length(spots_no_counts) / ncol(spe) * 100)
    spe <- spe[, -spots_no_counts, drop = FALSE]
}

Check object

Next, we add the ManualAnnotation variable to the sample information table (colData()) with "NA". That variable is used by the interactive website to store any manual annotations.

## Add a variable for saving the manual annotations
spe$ManualAnnotation <- "NA"

Finally, we can now check the final object using spatialLIBD::check_spe(). This is a helper function that will warn us if some important element is missing in spe that we use later for the interactive website. If it all goes well, it will return the original spe object.

## Check the final dimensions and object size
dim(spe)
lobstr::obj_size(spe) / 1024^2 ## Convert to MB

## Run check_spe() function
check_spe(spe)

Explore the data

With our complete spe object, we can now use r Biocpkg("spatialLIBD") for visualizing our data. We can do so using functions such as vis_gene() and vis_clus() that are described in more detail at the Introduction to spatialLIBD vignette available through GitHub or Bioconductor.

## Example visualizations. Let's start with a continuous variable.
spatialLIBD::vis_gene(
    spe = spe,
    sampleid = "lymph",
    geneid = "sum_umi",
    assayname = "counts"
)

## We next create a random cluster label to visualize
set.seed(20210428)
spe$random_cluster <- sample(1:7, ncol(spe), replace = TRUE)

## Next we visualize that random cluster
spatialLIBD::vis_clus(
    spe = spe,
    sampleid = "lymph",
    clustervar = "random_cluster"
)

Run the interactive website

We are now ready to create our interactive website for the human lymph node data. The interactive website is a r CRANpkg("shiny") web application that uses r CRANpkg("plotly") to power several of the interactive features. We can create the interactive website using the spatialLIBD::run_app() function. The default arguments of that function are customized for the data from our initial study r Citep(bib[['spatialLIBDpaper']]), so we will need to make some adjustments:

## Run our shiny app
if (interactive()) {
    run_app(
        spe,
        sce_layer = NULL,
        modeling_results = NULL,
        sig_genes = NULL,
        title = "spatialLIBD: human lymph node by 10x Genomics",
        spe_discrete_vars = c("random_cluster", "ManualAnnotation"),
        spe_continuous_vars = c("sum_umi", "sum_gene", "expr_chrM", "expr_chrM_ratio"),
        default_cluster = "random_cluster"
    )
}

We also recommend creating custom website documentation files as described in the documentation of spatialLIBD::run_app(). Those documentation files will help you describe your project to your users in a more personalized way. The easiest way to start is to copy our documentation files to a new location and adapt them. You can locate them at the following path.

## Locate our documentation files
docs_path <- system.file("app", "www", package = "spatialLIBD")
docs_path
list.files(docs_path)

Wrapper functions

To facilitate reading in the data and preparing it to visualize it interactively using spatialLIBD::run_app(), we implemented read10xVisiumWrapper() which expands SpatialExperiment::read10xVisium() and performs the steps described in this vignette. In this example, we'll load all four images created by SpaceRanger: lowres, hires, detected, and aligned. That way we can toggle between them on the web application.

## Import the data as a SpatialExperiment object
spe_wrapper <- read10xVisiumWrapper(
    samples = file.path(tempdir(), "outs"),
    sample_id = "lymph",
    type = "sparse", data = "filtered",
    images = c("lowres", "hires", "detected", "aligned"), load = TRUE,
    reference_gtf = gtf_cache
)
## Run our shiny app
if (interactive()) {
    vars <- colnames(colData(spe_wrapper))

    run_app(
        spe_wrapper,
        sce_layer = NULL,
        modeling_results = NULL,
        sig_genes = NULL,
        title = "spatialLIBD: human lymph node by 10x Genomics (made with wrapper)",
        spe_discrete_vars = c(vars[grep("10x_", vars)], "ManualAnnotation"),
        spe_continuous_vars = c("sum_umi", "sum_gene", "expr_chrM", "expr_chrM_ratio"),
        default_cluster = "10x_graphclust"
    )
}

Publishing your web application

Now we have have our spe_wrapper object and have verified that the web application works, we can share it with others through shinyapps.io by RStudio if our data is small enough. To do, you will need to:

Below we illustrate these steps

## Directory we created to host the data for the web application
## Use a directory of your preference instead of copy-pasting this code
app_dir <- here::here("inst", "spe_wrapper_app")
dir.create(app_dir, showWarnings = FALSE)

## Code we used to save the data
saveRDS(spe_wrapper, file = file.path(app_dir, "spe_wrapper.rds"))

## Copy the contents of system.file("app", "www", package = "spatialLIBD")
file.copy(system.file("app", "www", package = "spatialLIBD"), app_dir, recursive = TRUE)
## Manually edit them to your liking.

Next, create an app_dir/app.R script with these contents.

cat(paste0(readLines(system.file("spe_wrapper_app", "app.R", package = "spatialLIBD")), "\n"))

Next create an app_dir/deploy.R script with these contents. You will need to login to your shinyapps.io account to get the exact authentication details for your access token.

cat(paste0(readLines(system.file("spe_wrapper_app", "deploy.R", package = "spatialLIBD")), "\n"))

Next, you should test that app_dir/app.R works locally before running app_dir/deploy.R.

You'll end up with a website just like this one. In our case, we further configured our website through the shinyapps.io dashboard. We selected the following options:

Limitations

spatialLIBD::run_app() has limitations that are inherent to the methods used to implement it, such as:

  1. the memory per user required by a server for hosting the web application,
  2. response speeds for the interactive views due to the number of spots,
  3. the resolution of the images displayed limiting the usefulness to magnify specific spots,
  4. and customization of the web application by the end user.

Memory

Regarding the memory limitation, you can estimate how much memory you need per user by considering the memory required for the spe and sce_layer objects.

lobstr::obj_size(spe) / 1024^2 ## Convert to MB

In our pilot data r Citep(bib[['spatialLIBDpaper']]) our object uses about 2.1 GB of RAM since it contains the data for 12 Visium slides and we considered using about 3 GB of RAM per user. You could filter the genes more aggressively to drop lowly expressed genes or if you have many Visium slides, you could consider making multiple websites for different sets of slides. You could also have multiple mirrors to support several users, though in that case, we recommend linking users to a stable website instead of one that might not be available if you have too many users: for us our stable website is http://research.libd.org/spatialLIBD/ which includes the links to all the mirrors.

Given these memory limitations, we chose to deploy our main web application at http://spatial.libd.org/spatialLIBD/ using an Amazon EC2 instance: an 'r5.4xlarge' EC2 instance with 16 vCPUs, 128 GB DRAM, 10 Gb max network, 1.008 USD/Hour. We also have deployed mirrors at https://www.shinyapps.io/ using the "3X-Large (8 GB)" instances they provide.

Response speeds

This limitation is mostly due to the number of spots shown under the "clusters (interactive)" section of the interactive website powered by r CRANpkg("plotly"). Each spot is shown four times which is about 16 thousand spots for one Visium slide (depending on any filter steps you applied). The response time will depend on your own computer RAM memory, that is, the client side. This limitation might be more noticeable if you have a computer with 8GB of RAM or lower, as well as if you have other high-memory software open. Furthermore, if you are running web application locally through spatialLIBD::run_app() then you also need to consider the required memory for the R objects. That is, the server side memory use.

Thanks to Jesús Vélez Santiago, the app is more responsive as of version 1.3.15 by using plotly::toWebGL().

Image resolution

When you construct the SpatialExperiment spe object with r Biocpkg("SpatialExperiment"), you can read in higher resolution images. However, the benefit of loading the raw histology images (500 MB to 20 GB per image) is likely non-existent in this web application. The memory required would likely become prohibitive. Other solutions load these raw histology images in chunks and display the chunks necessary for a given visualization area. We thus recommend using other software if you want to zoom in at the spot and/or cell resolution.

Customization

While the documentation, title, icon and HTML footer are all customizable at spatialLIBD::run_app(), ultimately the panels shown are not unless you fork and adapt the internal code of this package. Thus, the interactive web applications powered by r Biocpkg("spatialLIBD") are not as easy to customize as say r Biocpkg("iSEE") web applications are. We think of our web application as a good enough prototype that can be useful for initial explorations of 10x Genomics Visium data. We welcome additions to our code, though we recognize that you might want to build your own production-level solution.

Reproducibility

The r Biocpkg("spatialLIBD") package r Citep(bib[["spatialLIBD"]]) was made possible thanks to:

This package was developed using r BiocStyle::Biocpkg("biocthis").

Code for creating the vignette

## Create the vignette
library("rmarkdown")
system.time(render("TenX_data_download.Rmd", "BiocStyle::html_document"))

## Extract the R code
library("knitr")
knit("TenX_data_download.Rmd", tangle = TRUE)

Date the vignette was generated.

## Date the vignette was generated
Sys.time()

Wallclock time spent generating the vignette.

## Processing time in seconds
totalTime <- diff(c(startTime, Sys.time()))
round(totalTime, digits = 3)

R session information.

## Session info
library("sessioninfo")
options(width = 120)
session_info()

Bibliography

This vignette was generated using r Biocpkg("BiocStyle") r Citep(bib[["BiocStyle"]]) with r CRANpkg("knitr") r Citep(bib[["knitr"]]) and r CRANpkg("rmarkdown") r Citep(bib[["rmarkdown"]]) running behind the scenes.

Citations made with r CRANpkg("RefManageR") r Citep(bib[["RefManageR"]]).

## Print bibliography
PrintBibliography(bib, .opts = list(hyperlink = "to.doc", style = "html"))


LieberInstitute/spatialLIBD documentation built on Nov. 4, 2024, 11:57 a.m.