The Signac package is an extension of Seurat designed for the analysis of genomic single-cell assays. This includes any assay that generates signal mapped to genomic coordinates, such as scATAC-seq, scCUT&Tag, scACT-seq, and other methods.
As the analysis of these single-cell chromatin datasets presents some unique
challenges in comparison to the analysis of scRNA-seq data, we have created an
extended Assay
class to store the additional information needed, including:
A major advantage of the Signac design is its interoperability with existing functions in the Seurat package, and other packages that are able to use the Seurat object. This enables straightforward analysis of multimodal single-cell data through the addition of different assays to the Seurat object.
Here we outline the design of each class defined in the Signac package, and demonstrate methods that can be run on each class.
ChromatinAssay
ClassThe ChromatinAssay
class extends the standard Seurat Assay
class and adds
several additional slots for data useful for the analysis of single-cell
chromatin datasets. The class includes all the slots present in a standard
Seurat Assay,
with the following additional slots:
ranges
: A GRanges
object containing the genomic coordinates of each feature in the data
matrix.motifs
: A Motif
objectfragments
: A list of Fragment
objectsseqinfo
: A Seqinfo
object containing information about the genome that the data was mapped toannotation
: A GRanges
object containing gene annotationsbias
: A vector containing Tn5 integration bias information (the frequency of
Tn5 integration at different hexamers)positionEnrichment
: A named list of matrices containing positional
enrichment scores for Tn5 integration (for example, enrichment at the TSS or at
different TF motifs)links
: A GRanges
object describing linked genomic positions, such as co-accessible sites or
enhancer-gene regulatory relationships.library(Seurat) library(Signac)
ChromatinAssay
A ChromatinAssay
object can be constructed using the
CreateChromatinAssay()
function.
# get some data to use in the following examples counts <- LayerData(atac_small, layer = "counts")
Note that older versions of Seurat (<5) use the GetAssayData()
function
instead of LayerData()
.
# create a standalone ChromatinAssay object chromatinassay <- CreateChromatinAssay(counts = counts, genome = "hg19")
Here the genome
parameter can be used to set the seqinfo
slot. We can pass
the name of a genome present in UCSC (e.g., "hg19" or "mm10"), or we can pass
a Seqinfo
-class object.
To create a Seurat object that contains a ChromatinAssay
rather than a
standard Assay
, we can initialize the object using the ChromatinAssay
rather
than a count matrix. Note that this feature was added in Seurat 3.2.
# create a Seurat object containing a ChromatinAssay object <- CreateSeuratObject(counts = chromatinassay)
ChromatinAssay
to a Seurat
objectTo add a new ChromatinAssay
object to an existing Seurat object, we can use
the standard assignment operation used for adding standard Assay
objects and
other data types to the Seurat object.
# create a chromatin assay and add it to an existing Seurat object object[["peaks"]] <- CreateChromatinAssay(counts = counts, genome = "hg19")
ChromatinAssay
dataWe can get/set data for the ChromatinAssay
in much the same way we do for a
standard Assay
object: using the LayerData
function
defined in Seurat
. For example:
## Getting # access the data slot, found in standard Assays and ChromatinAssays data <- LayerData(atac_small, layer = "data") # access the bias slot, unique to the ChromatinAssay bias <- LayerData(atac_small, layer = "bias") ## Setting # set the data slot LayerData(atac_small, layer = "data") <- data # atac_small <- SetAssayData(atac_small, slot = "data", new.data = data) # set the bias slot bias <- rep(1, 100) # create a dummy bias vector LayerData(atac_small, layer = "bias") <- bias # atac_small <- SetAssayData(atac_small, slot = "bias", new.data = bias)
We also have a variety of convenience functions defined for getting/setting
data in specific slots. This includes the Fragments()
, Motifs()
, Links()
,
and Annotation()
functions. For example, to get or set gene annotation data we
can use the Annotation()
getter and Annotation<-
setter functions:
# first get some gene annotations for hg19 library(EnsDb.Hsapiens.v75) # convert EnsDb to GRanges gene.ranges <- GetGRangesFromEnsDb(ensdb = EnsDb.Hsapiens.v75) # convert to UCSC style seqlevels(gene.ranges) <- paste0('chr', seqlevels(gene.ranges)) genome(gene.ranges) <- "hg19" # set gene annotations Annotation(atac_small) <- gene.ranges # get gene annotation information Annotation(atac_small)
The Fragments()
, Motifs()
, and Links()
functions are demonstrated in other sections
below.
ChromatinAssay
methodsAs the ChromatinAssay
object uses Bioconductor objects like GRanges
and Seqinfo
, we can also call standard Bioconductor functions defined in the
IRanges
, GenomicRanges
, and GenomeInfoDb
packages on the ChromatinAssay
object (or a Seurat object with a ChromatinAssay
as the default assay).
The following methods use the genomic ranges stored in a ChromatinAssay
object.
# extract the genomic ranges associated with each feature in the data matrix granges(atac_small) # find the nearest range nearest(atac_small, subject = Annotation(atac_small)) # distance to the nearest range distanceToNearest(atac_small, subject = Annotation(atac_small)) # find overlaps with another set of genomic ranges findOverlaps(atac_small, subject = Annotation(atac_small))
Many other methods are defined, see the documentation for nearest-methods
,
findOverlaps-methods
, inter-range-methods
, and coverage
in Signac for a
full list.
The following methods use the seqinfo
data stored in a ChromatinAssay
object.
# get the full seqinfo information seqinfo(atac_small) # get the genome information genome(atac_small) # find length of each chromosome seqlengths(atac_small) # find name of each chromosome seqnames(atac_small) # assign a new genome genome(atac_small) <- "hg19"
Again, several other methods are available that are not listed here. See the
documentation for seqinfo-methods
in Signac for a full list.
For a full list of methods for the ChromatinAssay
class run:
methods(class = 'ChromatinAssay')
ChromatinAssay
We can use the standard subset()
function or the [
operator to subset Seurat
object containing ChromatinAssay
s. This works the same way as for standard
Assay
objects.
# subset using the subset() function # this is meant for interactive use subset.obj <- subset(atac_small, subset = nCount_peaks > 100) # subset using the [ extract operator # this can be used programmatically subset.obj <- atac_small[, atac_small$nCount_peaks > 100]
Assay
and ChromatinAssay
To convert from a ChromatinAssay
to a standard Assay
use the as()
function
# convert a ChromatinAssay to an Assay assay <- as(object = atac_small[["peaks"]], Class = "Assay") assay
To convert from a standard Assay
to a ChromatinAssay
we use the
as.ChromatinAssay()
function. This takes a standard assay object, as well as
information to fill the additional slots in the ChromatinAssay
class.
# convert an Assay to a ChromatinAssay chromatinassay <- as.ChromatinAssay(assay, seqinfo = "hg19") chromatinassay
Fragment
ClassThe Fragment
class is designed for storing and interacting with a
fragment file
commonly used for single-cell chromatin data. It contains the path to an indexed
fragment file on disk, a MD5 hash for the fragment file and the fragment file
index, and a vector of cell names contained in the fragment file. Importantly,
this is a named vector where the elements of the vector are the cell names as
they appear in the fragment file, and the name of each element is the cell
name as it appears in the ChromatinAssay
object storing the Fragment
object. This allows a mapping of cell names on disk to cell names in R, and
avoids the need to alter fragment files on disk. This path can also be a remote
file accessible by http
or ftp
.
Fragment
classA Fragment
object can be constructed using the CreateFragmentObject()
function.
frag.path <- system.file("extdata", "fragments.tsv.gz", package="Signac") fragments <- CreateFragmentObject( path = frag.path, cells = colnames(atac_small), validate.fragments = TRUE )
The validate.fragments
parameter controls whether the file is inspected to
check whether the expected cell names are present. This can help avoid assigning
the wrong fragment file to the object. If you're sure that the file is correct,
you can set this value to FALSE
to skip this step and save some time. This
check is typically only run once when the Fragment
object is created, and is
not normally run on existing Fragment
files.
To extract the first few lines of a fragment file on-disk, we can use the
head()
method defined for Fragment
objects. This is useful for quickly
checking the chromosome naming style in our fragment file, or checking how the
cell barcodes are named:
head(fragments)
Fragment
object to the ChromatinAssay
A ChromatinAssay
object can contain a list of Fragment
objects. This avoids
the need to merge fragment files on disk and simplifies processes of merging
or integrating different Seurat objects containing ChromatinAssay
s. To add
a new Fragment
object to a ChromatinAssay
, or a Seurat object containing a
ChromatinAssay
, we can use the Fragments<-
assignment function. This will
do a few things:
Fragment
object was created.Fragment
object being added
are already contained in another Fragment
object stored in the
ChromatinAssay
. All fragments from a cell must be present in only one fragment
file.Fragment
object to the list of Fragment
objects stored in the
ChromatinAssay
.Fragments(atac_small) <- fragments
The show()
method for Fragment
-class objects prints the number of cells that
the Fragment
object contains data for.
fragments
Alternatively, we can initialize the ChromatinAssay
with a Fragment
object
in a couple of ways. We can either pass a vector of Fragment
objects to the
fragments
parameter in CreateChromatinAssay()
, or pass the path to a
single fragment file. If we pass the path to a fragment file we assume that the
file contains fragments for all cells in the ChromatinAssay
and that the cell
names are the same in the fragment file on disk and in the ChromatinAssay
.
For example:
chrom_assay <- CreateChromatinAssay( counts = counts, genome = "hg19", fragments = frag.path ) object <- CreateSeuratObject( counts = chrom_assay, assay = "peaks" )
This will create a Seurat object containing a ChromatinAssay
, with a single
Fragment
object.
Fragment
object from the ChromatinAssay
All the Fragment
objects associated with a ChromatinAssay
can be removed
by assigning NULL
using the Fragment<-
assignment function. For example:
Fragments(chrom_assay) <- NULL Fragments(chrom_assay)
To remove a subset of Fragment
object from the list of Fragment
objects
stored in the ChromatinAssay
, you will need to extract the list of Fragment
objects using the Fragments()
function, subset the list of objects, then
assign the subsetted list to the assay using the LayerData()
function. For example:
LayerData(chrom_assay, layer = "fragments") <- fragments # chrom_assay <- SetAssayData(chrom_assay, slot = "fragments", new.data = fragments) Fragments(chrom_assay)
Fragment
objectThe path to the fragment file can be updated using the UpdatePath()
function.
This can be useful if you move the fragment file to a new directory, or if you
copy a stored Seurat object containing a ChromatinAssay
to a different server.
fragments <- UpdatePath(fragments, new.path = frag.path)
To change the path to fragment files in an object, you will need to remove the fragment objects, update the paths, and then add the fragment objects back to the object. For example:
frags <- Fragments(object) # get list of fragment objects Fragments(object) <- NULL # remove fragment information from assay # create a vector with all the new paths, in the correct order for your list of fragment objects # In this case we only have 1 new.paths <- list(frag.path) for (i in seq_along(frags)) { frags[[i]] <- UpdatePath(frags[[i]], new.path = new.paths[[i]]) # update path } Fragments(object) <- frags # assign updated list back to the object Fragments(object)
Fragment files hosted on remote servers accessible via http or ftp can also be
added to the ChromatinAssay
in the same way as for locally-hosted fragment
files. This can enable the exploration of large single-cell datasets without
the need for downloading large files. For example, we can create a Fragment
object using a file hosted on the 10x Genomics website:
fragments <- CreateFragmentObject( path = "http://cf.10xgenomics.com/samples/cell-atac/1.1.0/atac_v1_pbmc_10k/atac_v1_pbmc_10k_fragments.tsv.gz" ) fragments
When files are hosted remotely, the checks described in the section above (MD5 hash and expected cells) are not performed.
Fragment
dataTo access the cell names stored in a Fragment
object, we can use the Cells()
function. Importantly, this returns the cell names as they appear in the
ChromatinAssay
, rather than as they appear in the fragment file itself.
fragments <- CreateFragmentObject( path = frag.path, cells = colnames(atac_small), validate.fragments = TRUE ) cells <- Cells(fragments) head(cells)
Similarly, we can set the cell name information in a Fragment
object using the
Cells<-
assignment function. This will set the named vector of cells stored in
the Fragment
object. Here we must supply a named vector.
names(cells) <- cells Cells(fragments) <- cells
To extract any of the data stored in a Fragment
object we can also use the
GetFragmentData()
function. For example, we can find the path to the fragment
file on disk:
GetFragmentData(object = fragments, slot = "path")
For a full list of methods for the Fragment
class run:
methods(class = 'Fragment')
Motif
ClassThe Motif
class stores information needed for DNA sequence motif analysis, and
has the following slots:
data
: a sparse feature by motif matrix, where entries are 1 if the feature
contains the motif, and 0 otherwisepwm
: A named list of position weight or position frequency matricesmotif.names
: a list of motif IDs and their common namespositions
: A GRangesList
object containing the exact positions of each
motifmeta.data
: Additional information about the motifs Many of these slots are optional and do not need to be filled, but are only
required when running certain functions. For example, the positions
slot
will be needed if running TF footprinting.
Motif
classA Motif
object can be constructed using the CreateMotifObject()
function.
Much of the data needed for constructing a Motif
object can be generated using
functions from the TFBSTools
and motifmatchr
packages. Position frequency matrices for motifs can be loaded using the
JASPAR packages on
Bioconductor
or the chromVARmotifs package.
For example:
if (!requireNamespace("JASPAR2020", quietly = TRUE)) BiocManager::install("JASPAR2020") if (!requireNamespace("TFBSTools", quietly = TRUE)) BiocManager::install("TFBSTools") if (!requireNamespace("motifmatchr", quietly = TRUE)) BiocManager::install("motifmatchr") if (!requireNamespace("BSgenome.Hsapiens.UCSC.hg19", quietly = TRUE)) BiocManager::install("BSgenome.Hsapiens.UCSC.hg19")
library(JASPAR2020) library(TFBSTools) library(motifmatchr) # Get a list of motif position frequency matrices from the JASPAR database pfm <- getMatrixSet( x = JASPAR2020, opts = list(species = 9606) # 9606 is the species code for human ) # Scan the DNA sequence of each peak for the presence of each motif motif.matrix <- CreateMotifMatrix( features = granges(atac_small), pwm = pfm, genome = 'hg19' ) # Create a new Mofif object to store the results motif <- CreateMotifObject( data = motif.matrix, pwm = pfm )
The show()
method for the Motif
class prints the total number of motifs and
regions included in the object:
motif
Motif
object to the ChromatinAssay
We can add a Motif
object to the ChromatinAssay
, or a Seurat object
containing a ChromatinAssay
using the Motifs<-
assignment operator.
Motifs(atac_small) <- motif
Motif
dataData stored in a Motif
object can be accessed using the GetMotifData()
and
SetMotifData()
functions.
# extract data from the Motif object pfm <- GetMotifData(object = motif, slot = "pwm") # set data in the Motif object motif <- SetMotifData(object = motif, slot = "pwm", new.data = pfm)
We can access the set of motifs and set of features used in the Motif
object
using the colnames()
and rownames()
functions:
# look at the motifs included in the Motif object head(colnames(motif))
# look at the features included in the Motif object head(rownames(motif))
To quickly convert between motif IDs (like MA0497.1
) and motif common names
(like MEF2C), we can use the ConvertMotifID()
function. For example:
# convert ID to common name ids <- c("MA0025.1","MA0030.1","MA0031.1","MA0051.1","MA0056.1","MA0057.1") names <- ConvertMotifID(object = motif, id = ids) names
# convert names to IDs ConvertMotifID(object = motif, name = names)
For a full list of methods for the Motif
class run:
methods(class = 'Motif')
Session Info
sessionInfo()
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.