knitr::opts_chunk$set(echo = TRUE)
Clean, normalize and cluster the Socrates
object that is automatically loaded into the namespace with the package. In this example, we will use tfidf
for a bit of a speed-up over other available options.
# load Socrates library(Socrates) # filter, normalize, and reduce dimensions obj <- cleanData(obj, min.c=1000, min.t=0.001, max.t=0.005) obj <- tfidf(obj) obj <- reduceDims(obj, n.pcs=50, cor.max=0.7) obj <- projectUMAP(obj) # cluster obj <- callClusters(obj, res=0.6)
To find co-accessible ACRs, Socrates
needs a path to the reference genome. A chromosome lengths file for hg19 is provided in Socrates
.
# get genome annotation genome <- system.file("extdata", "hg19.txt", package="Socrates")
Socrates
relies on the R package cicero
, developed by the Trapenll and Shendure labs to identify co-accessible ACRs. Please cite their work here if using Socrates
to identify co-accessible regions. For more information and documentation for cicero
, visit the software website site here.
In the example below, we will identify co-accessible ACRs using all cells by aggregating k=30 nearest neighbors from the UMAP embedding. Users can adjust various parameters that are provided by cicero
, in addition to added functionalities explored later in this tutorial.
# find co-accessible ACRs in the entire data set (not subset by cluster) obj <- coAccess(obj, genome=genome)
In addition to standard co-accessible ACR calling across all cells, Socrates
facilitate cluster/cell type-level resolution for co-accessible ACR identification. Essentially, Socrates
splits cells into their cognate groups, and runs cicero
on each of these groups, independtly. Running Socrates
on each cluster/cell type is simple as specifying an additional parameter in the function coAccess
. Co-accessible ACRs are identified for each group in the groupID
column of the meta data table obj$Clusters
slot. The default groupID
is "LouvainClusters", which is the default column name for the function callClusters
. See ?callClusters
and ?coAccess
for more details. An example command is illustrated below.
# find co-accessible ACRs by cluster obj <- coAccess(obj, genome=genome, byGroup=T)
Removing false positive links based solely on the degree of co-accessibility is a challenge because different clusters/cell types are afflicted by varying degrees of technical variation, such as read depth. A simple approach to estimate the false discovery rate is to shuffle the ACR x barcode matrix, keeping the number of accessible ACRs per cell and the number of accessible cells per ACR identical to the original matrix. Co-accessible ACR identification is repeated and heuristic thresholds removing 95% (or other FDRs of the users choice) of co-accessible ACRs from the shuffled matrix (for positive and negatively associated ACRs) are identified. These thresholds are then applied to the original, non-shuffled matrix, resembling FDR controlled co-accessible ACRs. Users can apply the eFDR method to co-accessible ACRs identification on a per cluster/cell type basis by specifying the byGroup
and groupID
arguments, as described above. An example of co-accessible ACR calling with eFDR filtering is illustrated below.
# find co-accessible ACRs with eFDR obj <- coAccess(obj, genome=genome, eFDR=TRUE, fdr_thresh=0.05, verbose=T) # co-accessible ACRs are stored in the 'obj$coACR' slot. head(obj$coACRs)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.