knitr::opts_chunk$set( collapse = TRUE, comment = "#>", crop = NULL ## Related to https://stat.ethz.ch/pipermail/bioc-devel/2020-April/016656.html )
## Track time spent on making the vignette startTime <- Sys.time() ## Bib setup library("RefManageR") ## Write bibliography information bib <- c( R = citation(), BiocFileCache = citation("BiocFileCache")[1], BiocStyle = citation("BiocStyle")[1], knitr = citation("knitr")[1], lobstr = citation("lobstr")[1], RefManageR = citation("RefManageR")[1], rmarkdown = citation("rmarkdown")[1], rtracklayer = citation("rtracklayer")[1], sessioninfo = citation("sessioninfo")[1], testthat = citation("testthat")[1], SpatialExperiment = citation("SpatialExperiment")[1], spatialLIBD = citation("spatialLIBD")[1], spatialLIBDpaper = citation("spatialLIBD")[2] )
spatialLIBD
R
is an open-source statistical environment which can be easily modified to enhance its functionality via packages. r Biocpkg("spatialLIBD")
r Citep(bib[['spatialLIBD']])
is an R
package available via the Bioconductor repository for packages. R
can be installed on any operating system from CRAN after which you can install r Biocpkg("spatialLIBD")
by using the following commands in your R
session:
if (!requireNamespace("BiocManager", quietly = TRUE)) { install.packages("BiocManager") } BiocManager::install("spatialLIBD") ## Check that you have a valid Bioconductor installation BiocManager::valid()
To run all the code in this vignette, you might need to install other R/Bioconductor packages, which you can do with:
BiocManager::install("spatialLIBD", dependencies = TRUE, force = TRUE)
If you want to use the development version of spatialLIBD
, you will need to use the R version corresponding to the current Bioconductor-devel branch as described in more detail on the Bioconductor website. Then you can install spatialLIBD
from GitHub using the following command.
BiocManager::install("LieberInstitute/spatialLIBD")
Please first check the Introduction to spatialLIBD vignette available through GitHub or Bioconductor.
spatialLIBD
We hope that r Biocpkg("spatialLIBD")
will be useful for your research. Please use the following information to cite the package and the overall approach. Thank you!
## Citation info citation("spatialLIBD")
In this vignette we'll show you how you can use r Biocpkg("spatialLIBD")
r Citep(bib[['spatialLIBD']])
for exploring spatially resolved transcriptomics data from the Visium platform by 10x Genomics. That is, you will learn how to use r Biocpkg("spatialLIBD")
for data beyond the one it was initially developed for r Citep(bib[['spatialLIBDpaper']])
. To illustrate these steps, we will use data that 10x Genomics made publicly available at https://support.10xgenomics.com/spatial-gene-expression/datasets. We will use files from the human lymph node example publicly available at https://support.10xgenomics.com/spatial-gene-expression/datasets/1.1.0/V1_Human_Lymph_Node.
To get started, lets load the different packages we'll need for this vignette. Here's a brief summary of why we need these packages:
r Biocpkg("BiocFileCache")
: for downloading and saving a local cache of the datar Biocpkg("SpatialExperiment")
: for reading the spaceranger
files provided by 10x Genomicsr Biocpkg("rtracklayer")
: for importing a gene annotation GTF filer CRANpkg("lobstr")
: for checking how much memory our object is usingr Biocpkg("spatialLIBD")
: for launching an interactive website to explore the data## Load packages in the order we'll use them next library("BiocFileCache") library("SpatialExperiment") library("rtracklayer") library("lobstr") library("spatialLIBD")
Next we download data from 10x Genomics available from the human lymph node example, available at https://support.10xgenomics.com/spatial-gene-expression/datasets/1.1.0/V1_Human_Lymph_Node. We don't need to download all the files listed there since r Biocpkg("SpatialExperiment")
doesn't need all of them for importing the data into R. These files are part of the output that gets generated by spaceranger
which is the processing pipeline provided by 10x Genomics for Visium data.
We'll use r Biocpkg("BiocFileCache")
to keep the data in a local cache in case we want to run this example again and don't want to re-download the data from the web.
## Download and save a local cache of the data provided by 10x Genomics bfc <- BiocFileCache::BiocFileCache() lymph.url <- paste0( "https://cf.10xgenomics.com/samples/spatial-exp/", "1.1.0/V1_Human_Lymph_Node/", c( "V1_Human_Lymph_Node_filtered_feature_bc_matrix.tar.gz", "V1_Human_Lymph_Node_spatial.tar.gz", "V1_Human_Lymph_Node_analysis.tar.gz" ) ) lymph.data <- sapply(lymph.url, BiocFileCache::bfcrpath, x = bfc)
10x Genomics provides the files in compressed tarballs (.tar.gz
file extension). Which is why we'll need to use utils::untar()
to decompress the files. This will create new directories and we will use list.files()
to see what files these directories contain.
## Extract the files to a temporary location ## (they'll be deleted once you close your R session) xx <- sapply(lymph.data, utils::untar, exdir = file.path(tempdir(), "outs")) ## The names are the URLs, which are long and thus too wide to be shown here, ## so we shorten them to only show the file name prior to displaying the ## utils::untar() output status names(xx) <- basename(names(xx)) xx ## List the files we downloaded and extracted ## These files are typically SpaceRanger outputs lymph.dirs <- file.path( tempdir(), "outs", c("filtered_feature_bc_matrix", "spatial", "raw_feature_bc_matrix", "analysis") ) list.files(lymph.dirs)
Now that we have the files that we need, we can import the data into R using read10xVisium()
from r Biocpkg("SpatialExperiment")
. We'll import the low resolution histology images produced by spaceranger
using the images = "lowres"
and load = TRUE
arguments. We'll also load the filtered gene expression data using the data = "filtered"
argument. The count matrix can still be quite large, which is why we'll use the type = "sparse"
argument to load the data into an R object that is memory-efficient for sparse data.
## Import the data as a SpatialExperiment object spe <- SpatialExperiment::read10xVisium( samples = tempdir(), sample_id = "lymph", type = "sparse", data = "filtered", images = "lowres", load = TRUE ) ## Inspect the R object we just created: class, memory, and how it looks in ## general class(spe) lobstr::obj_size(spe) / 1024^2 ## Convert to MB spe ## The counts are saved in a sparse matrix R object class(counts(spe))
Now that we have an SpatialExperiment
R object (spe
) with the data from 10x Genomics for the human lymph node example, we need to add a few features to the R object in order to create the interactive website using spatialLIBD::run_app()
. These additional elements power features in the interactive website that you might be interested in.
First we start with adding a few variables to the sample information table (colData()
) of our spe
object. We add:
key
: this labels each spot with a unique identifier. We combine the sample ID with the spot barcode ID to create this unique identifier.sum_umi
: this continuous variable contains the total number of counts for each sample prior to filtering any genes.sum_gene
: this continuous variable contains the number of genes that have at least 1 count.## Add some information used by spatialLIBD spe <- add_key(spe) spe$sum_umi <- colSums(counts(spe)) spe$sum_gene <- colSums(counts(spe) > 0)
The files SpatialExperiment::read10xVisium()
uses to read in the spaceranger
outputs into R do not include much information about the genes, such as their chromosomes, coordinates, and other gene annotation information. We thus recommend that you read in this information from a gene annotation file: typically a gtf
file. For a real case scenario, you'll mostly likely have access to the GTF file provided by 10x Genomics. However, we cannot download that file without downloading other files for this example. Thus we'll show you the code you would use if you had access to the GTF file from 10x Genomics and also show a second approach that works for this vignette.
## Initially we don't have much information about the genes rowRanges(spe)
Depending on the version of spaceranger
you used, you might have used different GTF files 10x Genomics has made available at https://support.10xgenomics.com/single-cell-gene-expression/software/downloads/latest and described at https://support.10xgenomics.com/single-cell-gene-expression/software/release-notes/build. These files are too big though and we won't download them in this example. For instance, References - 2020-A (July 7, 2020) for Human reference (GRCh38) is 11 GB in size and contains files we do not need for this vignette. If you did have the file locally, you could use the following code to read in the GTF file prepared by 10x Genomics and add the information into your spe
object that SpatialExperiment::read10xVisium()
does not include.
For example, in our computing cluster this GTF file is located at the following path and is 1.4 GB in size:
```{bash, eval = FALSE} $ cd /dcs04/lieber/lcolladotor/annotationFiles_LIBD001/10x/refdata-gex-GRCh38-2020-A $ du -sh --apparent-size genes/genes.gtf 1.4G genes/genes.gtf
If you have the GTF file from 10x Genomics, we show next how you can read the information into R, match it appropriately with the information in the `spe` object and add it back into the `spe` object. ```r ## You could: ## * download the 11 GB file from ## https://cf.10xgenomics.com/supp/cell-exp/refdata-gex-GRCh38-2020-A.tar.gz ## * decompress it ## Read in the gene information from the annotation GTF file provided by 10x gtf <- rtracklayer::import( "/path/to/refdata-gex-GRCh38-2020-A/genes/genes.gtf" ) ## Subject to genes only gtf <- gtf[gtf$type == "gene"] ## Set the names to be the gene IDs names(gtf) <- gtf$gene_id ## Match the genes match_genes <- match(rownames(spe), gtf$gene_id) ## They should all be present if you are using the correct GTF file from 10x stopifnot(all(!is.na(match_genes))) ## Keep only some columns from the gtf (you could keep all of them if you want) mcols(gtf) <- mcols(gtf)[, c( "source", "type", "gene_id", "gene_version", "gene_name", "gene_type" )] ## Add the gene info to our SPE object rowRanges(spe) <- gtf[match_genes] ## Inspect the gene annotation data we added rowRanges(spe)
In this vignette, we'll use the GTF file from Gencode v32. That's because the build notes from References - 2020-A (July 7, 2020) and Human reference, GRCh38 (GENCODE v32/Ensembl 98) at https://support.10xgenomics.com/single-cell-gene-expression/software/release-notes/build#GRCh38_2020A show that 10x Genomics used Gencode v32. They also used Ensembl version 98 which is why a few genes we have in our object are going to be missing. We show next how you can read the information into R, match it appropriately with the information in the spe
object and add it back into the spe
object.
## Download the Gencode v32 GTF file and cache it gtf_cache <- BiocFileCache::bfcrpath( bfc, paste0( "ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/", "release_32/gencode.v32.annotation.gtf.gz" ) ) ## Show the GTF cache location gtf_cache ## Import into R (takes ~1 min) gtf <- rtracklayer::import(gtf_cache) ## Subset to genes only gtf <- gtf[gtf$type == "gene"] ## Remove the .x part of the gene IDs gtf$gene_id <- gsub("\\..*", "", gtf$gene_id) ## Set the names to be the gene IDs names(gtf) <- gtf$gene_id ## Match the genes match_genes <- match(rownames(spe), gtf$gene_id) table(is.na(match_genes)) ## Drop the few genes for which we don't have information spe <- spe[!is.na(match_genes), ] match_genes <- match_genes[!is.na(match_genes)] ## Keep only some columns from the gtf mcols(gtf) <- mcols(gtf)[, c("source", "type", "gene_id", "gene_name", "gene_type")] ## Add the gene info to our SPE object rowRanges(spe) <- gtf[match_genes] ## Inspect the gene annotation data we added rowRanges(spe)
Regardless of which method you used to obtain the gene annotation information, we can now proceed by adding the gene symbol and gene ID information that helps users search for genes in the shiny app produced by spatialLIBD
. This will enable users to search genes by gene symbol or gene ID. If you didn't do this, users would only be able to search genes by gene ID which makes the web application harder to use.
We also compute the total expression for the mitochondrial chromosome (chrM) as well as the ratio of chrM expression. Both of these continuous variables are interesting to explore and in some situations could be useful for biological interpretations. For instance, in our pilot data r Citep(bib[['spatialLIBDpaper']])
, we noticed that the expr_chrM_ratio
was associated to DLPFC layers. That is, spots with high expr_chrM_ratio
were not randomly located in our Visium slides.
## Add information used by spatialLIBD rowData(spe)$gene_search <- paste0( rowData(spe)$gene_name, "; ", rowData(spe)$gene_id ) ## Compute chrM expression and chrM expression ratio is_mito <- which(seqnames(spe) == "chrM") spe$expr_chrM <- colSums(counts(spe)[is_mito, , drop = FALSE]) spe$expr_chrM_ratio <- spe$expr_chrM / spe$sum_umi
We can now continue with some filtering steps since this can help reduce the object size in memory as well as make it ready to use for downstream processing tools such as those from the r Biocpkg("scran")
and r Biocpkg("scuttle")
packages. Though these steps are not absolutely necessary.
## Remove genes with no data no_expr <- which(rowSums(counts(spe)) == 0) ## Number of genes with no counts length(no_expr) ## Compute the percent of genes with no counts length(no_expr) / nrow(spe) * 100 spe <- spe[-no_expr, , drop = FALSE] ## Remove spots without counts summary(spe$sum_umi) ## If we had spots with no counts, we would remove them if (any(spe$sum_umi == 0)) { spots_no_counts <- which(spe$sum_umi == 0) ## Number of spots with no counts print(length(spots_no_counts)) ## Percent of spots with no counts print(length(spots_no_counts) / ncol(spe) * 100) spe <- spe[, -spots_no_counts, drop = FALSE] }
Next, we add the ManualAnnotation
variable to the sample information table (colData()
) with "NA"
. That variable is used by the interactive website to store any manual annotations.
## Add a variable for saving the manual annotations spe$ManualAnnotation <- "NA"
Finally, we can now check the final object using spatialLIBD::check_spe()
. This is a helper function that will warn us if some important element is missing in spe
that we use later for the interactive website. If it all goes well, it will return the original spe
object.
## Check the final dimensions and object size dim(spe) lobstr::obj_size(spe) / 1024^2 ## Convert to MB ## Run check_spe() function check_spe(spe)
With our complete spe
object, we can now use r Biocpkg("spatialLIBD")
for visualizing our data. We can do so using functions such as vis_gene()
and vis_clus()
that are described in more detail at the Introduction to spatialLIBD vignette available through GitHub or Bioconductor.
## Example visualizations. Let's start with a continuous variable. spatialLIBD::vis_gene( spe = spe, sampleid = "lymph", geneid = "sum_umi", assayname = "counts" ) ## We next create a random cluster label to visualize set.seed(20210428) spe$random_cluster <- sample(1:7, ncol(spe), replace = TRUE) ## Next we visualize that random cluster spatialLIBD::vis_clus( spe = spe, sampleid = "lymph", clustervar = "random_cluster" )
We are now ready to create our interactive website for the human lymph node data. The interactive website is a r CRANpkg("shiny")
web application that uses r CRANpkg("plotly")
to power several of the interactive features. We can create the interactive website using the spatialLIBD::run_app()
function. The default arguments of that function are customized for the data from our initial study r Citep(bib[['spatialLIBDpaper']])
, so we will need to make some adjustments:
sce_layer
, modeling_results
and sig_genes
will be set to NULL
since do not have any pseudo-bulk results for this example data.title
: we will use a custom title that reflect our dataspe_discreate_vars
: we don't have really any discrete variables to show beyond ManualAnnotation
which is used for the manual annotations and random_cluster
that we created in the previous section.spe_continous_vars
: we have computed several continuous variables while adapting our spe
object for r Biocpkg("spatialLIBD")
, so we'll list these variables below in order to visually inspect them.default_cluster
: this is used for indicating the default discrete variable and for now we'll set it to our random_cluster
.## Run our shiny app if (interactive()) { run_app( spe, sce_layer = NULL, modeling_results = NULL, sig_genes = NULL, title = "spatialLIBD: human lymph node by 10x Genomics", spe_discrete_vars = c("random_cluster", "ManualAnnotation"), spe_continuous_vars = c("sum_umi", "sum_gene", "expr_chrM", "expr_chrM_ratio"), default_cluster = "random_cluster" ) }
We also recommend creating custom website documentation files as described in the documentation of spatialLIBD::run_app()
. Those documentation files will help you describe your project to your users in a more personalized way. The easiest way to start is to copy our documentation files to a new location and adapt them. You can locate them at the following path.
## Locate our documentation files docs_path <- system.file("app", "www", package = "spatialLIBD") docs_path list.files(docs_path)
To facilitate reading in the data and preparing it to visualize it interactively using spatialLIBD::run_app()
, we implemented read10xVisiumWrapper()
which expands SpatialExperiment::read10xVisium()
and performs the steps described in this vignette. In this example, we'll load all four images created by SpaceRanger: lowres, hires, detected, and aligned. That way we can toggle between them on the web application.
## Import the data as a SpatialExperiment object spe_wrapper <- read10xVisiumWrapper( samples = file.path(tempdir(), "outs"), sample_id = "lymph", type = "sparse", data = "filtered", images = c("lowres", "hires", "detected", "aligned"), load = TRUE, reference_gtf = gtf_cache )
## Run our shiny app if (interactive()) { vars <- colnames(colData(spe_wrapper)) run_app( spe_wrapper, sce_layer = NULL, modeling_results = NULL, sig_genes = NULL, title = "spatialLIBD: human lymph node by 10x Genomics (made with wrapper)", spe_discrete_vars = c(vars[grep("10x_", vars)], "ManualAnnotation"), spe_continuous_vars = c("sum_umi", "sum_gene", "expr_chrM", "expr_chrM_ratio"), default_cluster = "10x_graphclust" ) }
Now we have have our spe_wrapper
object and have verified that the web application works, we can share it with others through shinyapps.io by RStudio if our data is small enough. To do, you will need to:
app_dir
which will contain the script for loading the data and running the web applicationspe_wrapper
object into that directory such as app_dir/spe_wrapper.rds
app.R
script and save it at app_dir/app.R
system.file("app", "www", package = "spatialLIBD")
into app_dir/www
app_dir/deploy.R
scriptBelow we illustrate these steps
## Directory we created to host the data for the web application ## Use a directory of your preference instead of copy-pasting this code app_dir <- here::here("inst", "spe_wrapper_app") dir.create(app_dir, showWarnings = FALSE) ## Code we used to save the data saveRDS(spe_wrapper, file = file.path(app_dir, "spe_wrapper.rds")) ## Copy the contents of system.file("app", "www", package = "spatialLIBD") file.copy(system.file("app", "www", package = "spatialLIBD"), app_dir, recursive = TRUE) ## Manually edit them to your liking.
Next, create an app_dir/app.R
script with these contents.
cat(paste0(readLines(system.file("spe_wrapper_app", "app.R", package = "spatialLIBD")), "\n"))
Next create an app_dir/deploy.R
script with these contents. You will need to login to your shinyapps.io account to get the exact authentication details for your access token.
cat(paste0(readLines(system.file("spe_wrapper_app", "deploy.R", package = "spatialLIBD")), "\n"))
Next, you should test that app_dir/app.R
works locally before running app_dir/deploy.R
.
You'll end up with a website just like this one. In our case, we further configured our website through the shinyapps.io dashboard. We selected the following options:
Instance Size
: 3X-Large (8GB)Max Worker Processes
: 1. Setting this to one helps reduce the memory burden.Max Connections
: 15. You don't want Max Worker Processes
multiplied by Max Connections
and the object memory to exceed the Instance Size
. So this this value to 8 / (lobstr::obj_size(spe_wrapper) / 1024^3)
or lower.Startup Timeout
: 300. This one is important otherwise your app might error out before even loading since the default is 5 seconds, which is typically too short.Idle Timeout
: 60. This is useful since you might want to give users as much time for their manual annotations as possible.spatialLIBD::run_app()
has limitations that are inherent to the methods used to implement it, such as:
Regarding the memory limitation, you can estimate how much memory you need per user by considering the memory required for the spe
and sce_layer
objects.
lobstr::obj_size(spe) / 1024^2 ## Convert to MB
In our pilot data r Citep(bib[['spatialLIBDpaper']])
our object uses about 2.1 GB of RAM since it contains the data for 12 Visium slides and we considered using about 3 GB of RAM per user. You could filter the genes more aggressively to drop lowly expressed genes or if you have many Visium slides, you could consider making multiple websites for different sets of slides. You could also have multiple mirrors to support several users, though in that case, we recommend linking users to a stable website instead of one that might not be available if you have too many users: for us our stable website is http://research.libd.org/spatialLIBD/ which includes the links to all the mirrors.
Given these memory limitations, we chose to deploy our main web application at http://spatial.libd.org/spatialLIBD/ using an Amazon EC2 instance: an 'r5.4xlarge' EC2 instance with 16 vCPUs, 128 GB DRAM, 10 Gb max network, 1.008 USD/Hour. We also have deployed mirrors at https://www.shinyapps.io/ using the "3X-Large (8 GB)" instances they provide.
This limitation is mostly due to the number of spots shown under the "clusters (interactive)" section of the interactive website powered by r CRANpkg("plotly")
. Each spot is shown four times which is about 16 thousand spots for one Visium slide (depending on any filter steps you applied). The response time will depend on your own computer RAM memory, that is, the client side. This limitation might be more noticeable if you have a computer with 8GB of RAM or lower, as well as if you have other high-memory software open. Furthermore, if you are running web application locally through spatialLIBD::run_app()
then you also need to consider the required memory for the R objects. That is, the server side memory use.
Thanks to Jesús Vélez Santiago, the app is more responsive as of version 1.3.15 by using plotly::toWebGL()
.
When you construct the SpatialExperiment spe
object with r Biocpkg("SpatialExperiment")
, you can read in higher resolution images. However, the benefit of loading the raw histology images (500 MB to 20 GB per image) is likely non-existent in this web application. The memory required would likely become prohibitive. Other solutions load these raw histology images in chunks and display the chunks necessary for a given visualization area. We thus recommend using other software if you want to zoom in at the spot and/or cell resolution.
While the documentation, title, icon and HTML footer are all customizable at spatialLIBD::run_app()
, ultimately the panels shown are not unless you fork and adapt the internal code of this package. Thus, the interactive web applications powered by r Biocpkg("spatialLIBD")
are not as easy to customize as say r Biocpkg("iSEE")
web applications are. We think of our web application as a good enough prototype that can be useful for initial explorations of 10x Genomics Visium data. We welcome additions to our code, though we recognize that you might want to build your own production-level solution.
The r Biocpkg("spatialLIBD")
package r Citep(bib[["spatialLIBD"]])
was made possible thanks to:
r Citep(bib[["R"]])
r Biocpkg("BiocFileCache")
r Citep(bib[["BiocFileCache"]])
r Biocpkg("BiocStyle")
r Citep(bib[["BiocStyle"]])
r CRANpkg("knitr")
r Citep(bib[["knitr"]])
r CRANpkg("lobstr")
r Citep(bib[["lobstr"]])
r CRANpkg("RefManageR")
r Citep(bib[["RefManageR"]])
r CRANpkg("rmarkdown")
r Citep(bib[["rmarkdown"]])
r Biocpkg("rtracklayer")
r Citep(bib[["rtracklayer"]])
r CRANpkg("sessioninfo")
r Citep(bib[["sessioninfo"]])
r Biocpkg("SpatialExperiment")
r Citep(bib[["SpatialExperiment"]])
r CRANpkg("testthat")
r Citep(bib[["testthat"]])
This package was developed using r BiocStyle::Biocpkg("biocthis")
.
Code for creating the vignette
## Create the vignette library("rmarkdown") system.time(render("TenX_data_download.Rmd", "BiocStyle::html_document")) ## Extract the R code library("knitr") knit("TenX_data_download.Rmd", tangle = TRUE)
Date the vignette was generated.
## Date the vignette was generated Sys.time()
Wallclock time spent generating the vignette.
## Processing time in seconds totalTime <- diff(c(startTime, Sys.time())) round(totalTime, digits = 3)
R
session information.
## Session info library("sessioninfo") options(width = 120) session_info()
This vignette was generated using r Biocpkg("BiocStyle")
r Citep(bib[["BiocStyle"]])
with r CRANpkg("knitr")
r Citep(bib[["knitr"]])
and r CRANpkg("rmarkdown")
r Citep(bib[["rmarkdown"]])
running behind the scenes.
Citations made with r CRANpkg("RefManageR")
r Citep(bib[["RefManageR"]])
.
## Print bibliography PrintBibliography(bib, .opts = list(hyperlink = "to.doc", style = "html"))
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.