library(spacexr) library(Matrix) library(doParallel) library(ggplot2) datadir <- system.file("extdata",'SpatialRNA/Vignette',package = 'spacexr')# directory for sample Slide-seq dataset datadir2 <- system.file("extdata",'SpatialRNA/Vignette_rep2',package = 'spacexr') datadir3 <- system.file("extdata",'SpatialRNA/Vignette_rep3',package = 'spacexr') datadir4 <- system.file("extdata",'SpatialRNA/Vignette_rep4',package = 'spacexr') if(!dir.exists(datadir)) dir.create(datadir) savedir <- 'RCTD_results' if(!dir.exists(savedir)) dir.create(savedir)
knitr::opts_chunk$set( collapse = TRUE, comment = "#>", cache = TRUE, out.width = "100%" )
Cell type-Specific Inference of Differential Expression, or CSIDE, is part of the spacexr R package for learning cell type-specific differential expression from spatial transcriptomics data. In this Vignette, we will demonstrate running RCTD and CSIDE on multiple replicates (slices) of cerebellum Slide-seq toy data.
The ultimate goal will be to test for differential expression across two regions in the cerbellum. First, we will first use RCTD to assign cell types to a cerebellum Slide-seq dataset. We will define cell type profiles using an annotated single nucleus RNA-sequencing (snRNA-seq) cerebellum dataset. We will test for differential expression between the nodulus and anterior lobe regions.
First, we run RCTD on the data to annotated cell types (doublet mode). Please refer to the spatial transcriptomics vignette for more explanation on the RCTD algorithm, such as precise details for loading in the SpatialRNA
object for this dataset. Since we need to load in the SpatialRNA
object for four replicates, we will apply the function read.SpatialRNA
to the directories of each of the four datasets to generate puck1
, puck2
, puck3
, and puck4
. Please note that in general, we recommend applying to each replicate a custom function that can load in a single replicate.
### Load in/preprocess your data, this might vary based on your file type puck1 <- read.SpatialRNA(datadir, count_file = "MappedDGEForR.csv", coords_file = 'BeadLocationsForR.csv') puck2 <- read.SpatialRNA(datadir2, count_file = "MappedDGEForR.csv", coords_file = 'BeadLocationsForR.csv') puck3 <- read.SpatialRNA(datadir3, count_file = "MappedDGEForR.csv", coords_file = 'BeadLocationsForR.csv') puck4 <- read.SpatialRNA(datadir4, count_file = "MappedDGEForR.csv", coords_file = 'BeadLocationsForR.csv')
In order to run CSIDE, we will for each replicate create a explanatory variable (i.e. covariate) used for predicting differential expression in CSIDE. In general one should set the explanatory variable to biologically relevant predictors of gene expression such as spatial position. In this case, we will use the explanatory variable two divide each cerebellum replicate into two regions (left and right). In general, one can also create a custom function that will compute the explanatory variable on each SpatialRNA
replicate. We will pass in a list of the explanatory variables to the run.CSIDE.replicates
function.
Here, we also artifically upregulate the expression of the Kcnip4 gene in each dataset (in regions of high explanatory variable) to see whether CSIDE can detect this differentially expressed gene.
explanatory.variable1 <- as.integer(puck1@coords$x > 3000) names(explanatory.variable1) <- rownames(puck1@coords) explanatory.variable2 <- as.integer(puck2@coords$x > 3000) names(explanatory.variable2) <- rownames(puck2@coords) explanatory.variable3 <- as.integer(puck3@coords$x > 3000) names(explanatory.variable3) <- rownames(puck3@coords) explanatory.variable4 <- as.integer(puck4@coords$x > 3000) names(explanatory.variable4) <- rownames(puck4@coords) exvar_list <- list(explanatory.variable1, explanatory.variable2, explanatory.variable3, explanatory.variable4) # Differentially upregulate one gene change_gene <- 'Kcnip4' high_barc1 <- names(explanatory.variable1[explanatory.variable1 > 0.5]) puck1@counts[change_gene, high_barc1] <- puck1@counts[change_gene, high_barc1] * 2 high_barc2 <- names(explanatory.variable2[explanatory.variable2 > 0.5]) puck2@counts[change_gene, high_barc2] <- puck1@counts[change_gene, high_barc2] * 2 high_barc3 <- names(explanatory.variable3[explanatory.variable3 > 0.5]) puck3@counts[change_gene, high_barc3] <- puck3@counts[change_gene, high_barc3] * 2 high_barc4 <- names(explanatory.variable4[explanatory.variable4 > 0.5]) puck4@counts[change_gene, high_barc4] <- puck4@counts[change_gene, high_barc4] * 2
We also load in the Reference
object, as follows.
refdir <- system.file("extdata",'Reference/Vignette',package = 'spacexr') # directory for the reference counts <- read.csv(file.path(refdir,"dge.csv")) # load in counts matrix rownames(counts) <- counts[,1]; counts[,1] <- NULL # Move first column to rownames meta_data <- read.csv(file.path(refdir,"meta_data.csv")) # load in meta_data (barcodes, clusters, and nUMI) cell_types <- meta_data$cluster; names(cell_types) <- meta_data$barcode # create cell_types named list cell_types <- as.factor(cell_types) # convert to factor data type nUMI <- meta_data$nUMI; names(nUMI) <- meta_data$barcode # create nUMI named list reference <- Reference(counts, cell_types, nUMI)
Now, we are in a position to create a RCTD.replicates
object, an object that can store multiple SpatialRNA
objects allowing for joint processing with RCTD and CSIDE. We pass in spatial.replicates
, a list of the four SpatialRNA
objects, replicate_names
, a list of names for each replicate, and group_ids
, a list of the group number for each replicate. In this case, all replicates will come from the same group. However, we have the option to assign different replicates to different groups, which will allow for extra variation to be modeled across groups.
spatial.replicates <- list(puck1, puck2, puck3, puck4) replicate_names <- c('1','2','3','4') group_ids <- c(1,1,1,1) myRCTD.reps <- create.RCTD.replicates(spatial.replicates, reference, replicate_names, group_ids = group_ids, max_cores = 2)
We are now ready to run RCTD with the run.RCTD.replicates
function to assign cell types to these replicates. This function runs RCTD on each replicate in the RCTD.replicate
object.
myRCTD.reps <- run.RCTD.replicates(myRCTD.reps)
After creating the explanatory variables and running RCTD, we are now ready to run CSIDE using the run.CSIDE.replicates
function. Here, we pass in exvar_list
, the list of explanatory variables for each replicate. We will use cell types 10 and 18 and a false discovery rate of 0.25. Next, we will set a gene threshold (i.e. minimum gene expression) of 0.01, and we will set a cell_type_threshold (minimum instances per cell type) of 3.
Warning: On this toy dataset, we have made several choices of parameters that are not recommended for regular use. On real datasets, we recommend first consulting the CSIDE default parameters. This includes gene_threshold
(default 5e-5), cell_type_threshold
(default 125), and fdr
(default 0.01). Please see ?run.CSIDE.replicates
for more information on these parameters.
#de cell_types <- c('10','18', '6') myRCTD.reps <- run.CSIDE.replicates(myRCTD.reps, cell_types, exvar_list, gene_threshold = .01, cell_type_threshold = 3, fdr = 0.25, weight_threshold = 0.8) saveRDS(myRCTD.reps,file.path(savedir,'myRCTDde_reps.rds'))
Note that the RCTD and CSIDE results on individual replicates are stored in myRCTD.reps@RCTD.reps
, which is a list of RCTD
objects containing individual CSIDE results as discussed in other vignettes.
After running CSIDE on each replicate, the next step is to compute population-level differential expression across all samples using CSIDE.population.inference
. There are two ways to run this function, and we will explore both.
The default mode for CSIDE.population.inference
is to aggregate differential expression estimates across samples to form a population "average" of differential expression:
myRCTD.reps <- CSIDE.population.inference(myRCTD.reps, fdr = 0.25)
We will focus on cell type 10. Furthermore, we will examine the original Kcnip4 gene, which was detected to be significantly differentially expressed in cell type 10. Other genes, including nonsignificant genes, can be observed in myRCTD.reps@population_de_results
. As shown below, DE results on significant genes are stored in population_sig_gene_df
. In particular, notice the columns Z_est
(Z-score), log_fc_est
(estimated DE coefficient), sd_est
(standard error of DE coefficient), p
(p-value), and q_val
(q-value). Other columns of interest include tau
, the estimated variance across samples, and mean i
and sd i
, which represent the DE point estimate and standard error on each of the four replicates.
#print results for cell type '18' cell_type <- '10' sig_gene <- change_gene print(paste("following results hold for", sig_gene)) print(myRCTD.reps@population_sig_gene_df[[cell_type]])
Finally, we will save population CSIDE results in the savedir
directory:
if(!dir.exists(file.path(savedir,'population'))) dir.create(file.path(savedir,'population')) save.CSIDE.replicates(myRCTD.reps, file.path(savedir,'population'))
The second way to run CSIDE population inference is meta regression, which is more flexible than aggregating across samples. For example, one may control for covariates such as age and gender, as well as test across covariates such as timepoints or case vs. control. In this example, we will assume that the four samples came from two cases and two controls. Meta regression works by including a design matrix, meta.design.matrix
, of named covariates across samples. In CSIDE.population.inference
, we set the option meta = TRUE
, and meta.test_var = 'treat'
, the variable we would like to test.
meta.design.matrix <- data.frame('treat' = c(0,1,0,1)) print(meta.design.matrix) myRCTD.reps_meta <- CSIDE.population.inference(myRCTD.reps, fdr = 0.25, meta = TRUE, meta.design.matrix = meta.design.matrix, meta.test_var = 'treat') head(myRCTD.reps_meta@population_de_results$`10`)
Compared to the default analysis, this population-level analysis is measuring something different, the effect of treatment on differential expression. For example, the lowest p-value gene (yet not < 0.05), Snap25, has an estimated effect of treatment of 0.56. Notice that Kcnip4 is no longer estimated to have a large effect, because the question of difference between cases and controls is fundamentally different from estimating the average across samples.
Finally, we will save population CSIDE results in the savedir
directory:
if(!dir.exists(file.path(savedir,'population_meta'))) dir.create(file.path(savedir,'population_meta')) save.CSIDE.replicates(myRCTD.reps_meta, file.path(savedir,'population_meta'))
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.