nnSVG | R Documentation |
Function to run 'nnSVG' method to identify spatially variable genes (SVGs) in spatially-resolved transcriptomics data.
nnSVG(
input,
spatial_coords = NULL,
X = NULL,
assay_name = "logcounts",
n_neighbors = 10,
order = "AMMD",
n_threads = 1,
BPPARAM = NULL,
verbose = FALSE
)
input |
|
spatial_coords |
|
X |
|
assay_name |
|
n_neighbors |
|
order |
|
n_threads |
|
BPPARAM |
|
verbose |
|
Function to run 'nnSVG' method to identify spatially variable genes (SVGs) in spatially-resolved transcriptomics data.
The 'nnSVG' method is based on nearest-neighbor Gaussian processes (Datta et al. 2016) and uses the BRISC algorithm (Saha and Datta 2018) for model fitting and parameter estimation. The method scales linearly with the number of spatial locations, and can be applied to datasets containing thousands or more spatial locations. For more details on the method, see our paper.
This function runs 'nnSVG' for a full dataset. The function fits a separate model for each gene, using parallelization with BiocParallel for faster runtime. The parameter estimates from BRISC (sigma.sq, tau.sq, phi) for each gene are stored in 'Theta' in the BRISC output.
Note that the method and this function are designed for a single tissue section. For an example of how to run nnSVG in a dataset consisting of multiple tissue sections, see the tutorial in the nnSVG package vignette.
'nnSVG' performs inference on the spatial variance parameter estimates (sigma.sq) using a likelihood ratio (LR) test against a simpler linear model without spatial terms (i.e. without tau.sq or phi). The estimated LR statistics can then be used to rank SVGs. P-values are calculated from the LR statistics using the asymptotic chi-squared distribution with 2 degrees of freedom, and multiple testing adjusted p-values are calculated using the Benjamini-Hochberg method. We also calculate an effect size, defined as the proportion of spatial variance, 'prop_sv = sigma.sq / (sigma.sq + tau.sq)'.
The function assumes the input is provided either as a
SpatialExperiment
object or a numeric
matrix of values. If the
input is a SpatialExperiment
object, it is assumed to contain an
assay
slot containing either log-transformed normalized counts (also
known as logcounts, e.g. from the scran
package) or deviance residuals
(e.g. from the scry
package), which have been preprocessed, quality
controlled, and filtered to remove low-quality spatial locations. If the
input is a numeric
matrix of values, these values are assumed to
already be normalized and transformed (e.g. logcounts).
If the input was provided as a SpatialExperiment
object, the
output values are returned as additional columns in the rowData
slot
of the input object. If the input was provided as a numeric
matrix
of values, the output is returned as a numeric
matrix. The output
values include spatial variance parameter estimates, likelihood ratio (LR)
statistics, effect sizes (proportion of spatial variance), p-values, and
multiple testing adjusted p-values.
library(SpatialExperiment)
library(STexampleData)
library(scran)
### Example 1
### for more details see extended example in vignette
# load example dataset from STexampleData package
spe1 <- Visium_humanDLPFC()
# preprocessing steps
# keep only spots over tissue
spe1 <- spe1[, colData(spe1)$in_tissue == 1]
# skip spot-level quality control (already performed in this dataset)
# filter low-expressed and mitochondrial genes
spe1 <- filter_genes(spe1)
# calculate logcounts using library size factors
spe1 <- computeLibraryFactors(spe1)
spe1 <- logNormCounts(spe1)
# select small number of genes for fast runtime in this example
set.seed(123)
ix <- c(
which(rowData(spe1)$gene_name %in% c("PCP4", "NPY")),
sample(seq_len(nrow(spe1)), 2)
)
spe1 <- spe1[ix, ]
# run nnSVG
set.seed(123)
spe1 <- nnSVG(spe1)
# show results
rowData(spe1)
### Example 2: With covariates
### for more details see extended example in vignette
# load example dataset from STexampleData package
spe2 <- SlideSeqV2_mouseHPC()
# preprocessing steps
# remove spots with NA cell type labels
spe2 <- spe2[, !is.na(colData(spe2)$celltype)]
# skip spot-level quality control (already performed in this dataset)
# filter low-expressed and mitochondrial genes
spe2 <- filter_genes(
spe2, filter_genes_ncounts = 1, filter_genes_pcspots = 1,
filter_mito = TRUE
)
# calculate logcounts using library size normalization
spe2 <- computeLibraryFactors(spe2)
spe2 <- logNormCounts(spe2)
# select small number of genes for fast runtime in this example
set.seed(123)
ix <- c(
which(rowData(spe2)$gene_name %in% c("Cpne9", "Rgs14")),
sample(seq_len(nrow(spe2)), 2)
)
spe2 <- spe2[ix, ]
# create model matrix for cell type labels
X <- model.matrix(~ colData(spe2)$celltype)
# run nnSVG with covariates
set.seed(123)
spe2 <- nnSVG(spe2, X = X)
# show results
rowData(spe2)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.