The Signac package is an extension of Seurat designed for the analysis of genomic single-cell assays. This includes any assay that generates signal mapped to genomic coordinates, such as scATAC-seq, scCUT&Tag, scACT-seq, and other methods.

As the analysis of these single-cell chromatin datasets presents some unique challenges in comparison to the analysis of scRNA-seq data, we have created an extended Assay class to store the additional information needed, including:

A major advantage of the Signac design is its interoperability with existing functions in the Seurat package, and other packages that are able to use the Seurat object. This enables straightforward analysis of multimodal single-cell data through the addition of different assays to the Seurat object.

Here we outline the design of each class defined in the Signac package, and demonstrate methods that can be run on each class.

The ChromatinAssay Class

The ChromatinAssay class extends the standard Seurat Assay class and adds several additional slots for data useful for the analysis of single-cell chromatin datasets. The class includes all the slots present in a standard Seurat Assay, with the following additional slots:

library(Seurat)
library(Signac)

Constructing the ChromatinAssay

A ChromatinAssay object can be constructed using the CreateChromatinAssay() function.

# get some data to use in the following examples
counts <- LayerData(atac_small, layer = "counts")

Note that older versions of Seurat (<5) use the GetAssayData() function instead of LayerData().

# create a standalone ChromatinAssay object
chromatinassay <- CreateChromatinAssay(counts = counts, genome = "hg19")

Here the genome parameter can be used to set the seqinfo slot. We can pass the name of a genome present in UCSC (e.g., "hg19" or "mm10"), or we can pass a Seqinfo-class object.

To create a Seurat object that contains a ChromatinAssay rather than a standard Assay, we can initialize the object using the ChromatinAssay rather than a count matrix. Note that this feature was added in Seurat 3.2.

# create a Seurat object containing a ChromatinAssay
object <- CreateSeuratObject(counts = chromatinassay)

Adding a ChromatinAssay to a Seurat object

To add a new ChromatinAssay object to an existing Seurat object, we can use the standard assignment operation used for adding standard Assay objects and other data types to the Seurat object.

# create a chromatin assay and add it to an existing Seurat object
object[["peaks"]] <- CreateChromatinAssay(counts = counts, genome = "hg19")

Getting and setting ChromatinAssay data

We can get/set data for the ChromatinAssay in much the same way we do for a standard Assay object: using the LayerData function defined in Seurat. For example:

## Getting

# access the data slot, found in standard Assays and ChromatinAssays
data <- LayerData(atac_small, layer = "data")

# access the bias slot, unique to the ChromatinAssay
bias <- LayerData(atac_small, layer = "bias")

## Setting

# set the data slot
LayerData(atac_small, layer = "data") <- data
# atac_small <- SetAssayData(atac_small, slot = "data", new.data = data)

# set the bias slot
bias <- rep(1, 100)  # create a dummy bias vector
LayerData(atac_small, layer = "bias") <- bias
# atac_small <- SetAssayData(atac_small, slot = "bias", new.data = bias)

We also have a variety of convenience functions defined for getting/setting data in specific slots. This includes the Fragments(), Motifs(), Links(), and Annotation() functions. For example, to get or set gene annotation data we can use the Annotation() getter and Annotation<- setter functions:

# first get some gene annotations for hg19
library(EnsDb.Hsapiens.v75)

# convert EnsDb to GRanges
gene.ranges <- GetGRangesFromEnsDb(ensdb = EnsDb.Hsapiens.v75)

# convert to UCSC style
seqlevels(gene.ranges) <- paste0('chr', seqlevels(gene.ranges))
genome(gene.ranges) <- "hg19"

# set gene annotations
Annotation(atac_small) <- gene.ranges

# get gene annotation information
Annotation(atac_small)

The Fragments(), Motifs(), and Links() functions are demonstrated in other sections below.

Other ChromatinAssay methods

As the ChromatinAssay object uses Bioconductor objects like GRanges and Seqinfo , we can also call standard Bioconductor functions defined in the IRanges, GenomicRanges, and GenomeInfoDb packages on the ChromatinAssay object (or a Seurat object with a ChromatinAssay as the default assay).

The following methods use the genomic ranges stored in a ChromatinAssay object.

# extract the genomic ranges associated with each feature in the data matrix
granges(atac_small)

# find the nearest range
nearest(atac_small, subject = Annotation(atac_small))

# distance to the nearest range
distanceToNearest(atac_small, subject = Annotation(atac_small))

# find overlaps with another set of genomic ranges
findOverlaps(atac_small, subject = Annotation(atac_small))

Many other methods are defined, see the documentation for nearest-methods, findOverlaps-methods, inter-range-methods, and coverage in Signac for a full list.

The following methods use the seqinfo data stored in a ChromatinAssay object.

# get the full seqinfo information
seqinfo(atac_small)

# get the genome information
genome(atac_small)

# find length of each chromosome
seqlengths(atac_small)

# find name of each chromosome
seqnames(atac_small)

# assign a new genome
genome(atac_small) <- "hg19"

Again, several other methods are available that are not listed here. See the documentation for seqinfo-methods in Signac for a full list.

For a full list of methods for the ChromatinAssay class run:

methods(class = 'ChromatinAssay')

Subsetting a ChromatinAssay

We can use the standard subset() function or the [ operator to subset Seurat object containing ChromatinAssays. This works the same way as for standard Assay objects.

# subset using the subset() function
# this is meant for interactive use
subset.obj <- subset(atac_small, subset = nCount_peaks > 100)

# subset using the [ extract operator
# this can be used programmatically
subset.obj <- atac_small[, atac_small$nCount_peaks > 100]

Converting between Assay and ChromatinAssay

To convert from a ChromatinAssay to a standard Assay use the as() function

# convert a ChromatinAssay to an Assay
assay <- as(object = atac_small[["peaks"]], Class = "Assay")
assay

To convert from a standard Assay to a ChromatinAssay we use the as.ChromatinAssay() function. This takes a standard assay object, as well as information to fill the additional slots in the ChromatinAssay class.

# convert an Assay to a ChromatinAssay
chromatinassay <- as.ChromatinAssay(assay, seqinfo = "hg19")
chromatinassay

The Fragment Class

The Fragment class is designed for storing and interacting with a fragment file commonly used for single-cell chromatin data. It contains the path to an indexed fragment file on disk, a MD5 hash for the fragment file and the fragment file index, and a vector of cell names contained in the fragment file. Importantly, this is a named vector where the elements of the vector are the cell names as they appear in the fragment file, and the name of each element is the cell name as it appears in the ChromatinAssay object storing the Fragment object. This allows a mapping of cell names on disk to cell names in R, and avoids the need to alter fragment files on disk. This path can also be a remote file accessible by http or ftp.

Constructing the Fragment class

A Fragment object can be constructed using the CreateFragmentObject() function.

frag.path <- system.file("extdata", "fragments.tsv.gz", package="Signac")
fragments <- CreateFragmentObject(
  path = frag.path,
  cells = colnames(atac_small), 
  validate.fragments = TRUE
)

The validate.fragments parameter controls whether the file is inspected to check whether the expected cell names are present. This can help avoid assigning the wrong fragment file to the object. If you're sure that the file is correct, you can set this value to FALSE to skip this step and save some time. This check is typically only run once when the Fragment object is created, and is not normally run on existing Fragment files.

Inspecting the fragment file

To extract the first few lines of a fragment file on-disk, we can use the head() method defined for Fragment objects. This is useful for quickly checking the chromosome naming style in our fragment file, or checking how the cell barcodes are named:

head(fragments)

Adding a Fragment object to the ChromatinAssay

A ChromatinAssay object can contain a list of Fragment objects. This avoids the need to merge fragment files on disk and simplifies processes of merging or integrating different Seurat objects containing ChromatinAssays. To add a new Fragment object to a ChromatinAssay, or a Seurat object containing a ChromatinAssay, we can use the Fragments<- assignment function. This will do a few things:

  1. Re-compute the MD5 hash for the fragment file and index and verify that it matches the hash computed when the Fragment object was created.
  2. Check that none of the cells contained in the Fragment object being added are already contained in another Fragment object stored in the ChromatinAssay. All fragments from a cell must be present in only one fragment file.
  3. Append the Fragment object to the list of Fragment objects stored in the ChromatinAssay.
Fragments(atac_small) <- fragments

The show() method for Fragment-class objects prints the number of cells that the Fragment object contains data for.

fragments

Alternatively, we can initialize the ChromatinAssay with a Fragment object in a couple of ways. We can either pass a vector of Fragment objects to the fragments parameter in CreateChromatinAssay(), or pass the path to a single fragment file. If we pass the path to a fragment file we assume that the file contains fragments for all cells in the ChromatinAssay and that the cell names are the same in the fragment file on disk and in the ChromatinAssay. For example:

chrom_assay <- CreateChromatinAssay(
  counts = counts,
  genome = "hg19",
  fragments = frag.path
)
object <- CreateSeuratObject(
  counts = chrom_assay,
  assay = "peaks"
)

This will create a Seurat object containing a ChromatinAssay, with a single Fragment object.

Removing a Fragment object from the ChromatinAssay

All the Fragment objects associated with a ChromatinAssay can be removed by assigning NULL using the Fragment<- assignment function. For example:

Fragments(chrom_assay) <- NULL
Fragments(chrom_assay)

To remove a subset of Fragment object from the list of Fragment objects stored in the ChromatinAssay, you will need to extract the list of Fragment objects using the Fragments() function, subset the list of objects, then assign the subsetted list to the assay using the LayerData() function. For example:

LayerData(chrom_assay, layer = "fragments") <- fragments
# chrom_assay <- SetAssayData(chrom_assay, slot = "fragments", new.data = fragments)
Fragments(chrom_assay)

Changing the fragment file path in an existing Fragment object

The path to the fragment file can be updated using the UpdatePath() function. This can be useful if you move the fragment file to a new directory, or if you copy a stored Seurat object containing a ChromatinAssay to a different server.

fragments <- UpdatePath(fragments, new.path = frag.path)

To change the path to fragment files in an object, you will need to remove the fragment objects, update the paths, and then add the fragment objects back to the object. For example:

frags <- Fragments(object)  # get list of fragment objects
Fragments(object) <- NULL  # remove fragment information from assay

# create a vector with all the new paths, in the correct order for your list of fragment objects
# In this case we only have 1
new.paths <- list(frag.path)
for (i in seq_along(frags)) {
  frags[[i]] <- UpdatePath(frags[[i]], new.path = new.paths[[i]]) # update path
}

Fragments(object) <- frags # assign updated list back to the object
Fragments(object)

Using remote fragment files

Fragment files hosted on remote servers accessible via http or ftp can also be added to the ChromatinAssay in the same way as for locally-hosted fragment files. This can enable the exploration of large single-cell datasets without the need for downloading large files. For example, we can create a Fragment object using a file hosted on the 10x Genomics website:

fragments <- CreateFragmentObject(
  path = "http://cf.10xgenomics.com/samples/cell-atac/1.1.0/atac_v1_pbmc_10k/atac_v1_pbmc_10k_fragments.tsv.gz"
)
fragments

When files are hosted remotely, the checks described in the section above (MD5 hash and expected cells) are not performed.

Getting and setting Fragment data

To access the cell names stored in a Fragment object, we can use the Cells() function. Importantly, this returns the cell names as they appear in the ChromatinAssay, rather than as they appear in the fragment file itself.

fragments <- CreateFragmentObject(
  path = frag.path,
  cells = colnames(atac_small), 
  validate.fragments = TRUE
)

cells <- Cells(fragments)
head(cells)

Similarly, we can set the cell name information in a Fragment object using the Cells<- assignment function. This will set the named vector of cells stored in the Fragment object. Here we must supply a named vector.

names(cells) <- cells
Cells(fragments) <- cells

To extract any of the data stored in a Fragment object we can also use the GetFragmentData() function. For example, we can find the path to the fragment file on disk:

GetFragmentData(object = fragments, slot = "path")

For a full list of methods for the Fragment class run:

methods(class = 'Fragment')

The Motif Class

The Motif class stores information needed for DNA sequence motif analysis, and has the following slots:

Many of these slots are optional and do not need to be filled, but are only required when running certain functions. For example, the positions slot will be needed if running TF footprinting.

Constructing the Motif class

A Motif object can be constructed using the CreateMotifObject() function. Much of the data needed for constructing a Motif object can be generated using functions from the TFBSTools and motifmatchr packages. Position frequency matrices for motifs can be loaded using the JASPAR packages on Bioconductor or the chromVARmotifs package. For example:

if (!requireNamespace("JASPAR2020", quietly = TRUE))
    BiocManager::install("JASPAR2020")
if (!requireNamespace("TFBSTools", quietly = TRUE))
    BiocManager::install("TFBSTools")
if (!requireNamespace("motifmatchr", quietly = TRUE))
    BiocManager::install("motifmatchr")
if (!requireNamespace("BSgenome.Hsapiens.UCSC.hg19", quietly = TRUE))
    BiocManager::install("BSgenome.Hsapiens.UCSC.hg19")
library(JASPAR2020)
library(TFBSTools)
library(motifmatchr)

# Get a list of motif position frequency matrices from the JASPAR database
pfm <- getMatrixSet(
  x = JASPAR2020,
  opts = list(species = 9606) # 9606 is the species code for human
)

# Scan the DNA sequence of each peak for the presence of each motif
motif.matrix <- CreateMotifMatrix(
  features = granges(atac_small),
  pwm = pfm,
  genome = 'hg19'
)

# Create a new Mofif object to store the results
motif <- CreateMotifObject(
  data = motif.matrix,
  pwm = pfm
)

The show() method for the Motif class prints the total number of motifs and regions included in the object:

motif

Adding a Motif object to the ChromatinAssay

We can add a Motif object to the ChromatinAssay, or a Seurat object containing a ChromatinAssay using the Motifs<- assignment operator.

Motifs(atac_small) <- motif

Getting and setting Motif data

Data stored in a Motif object can be accessed using the GetMotifData() and SetMotifData() functions.

# extract data from the Motif object
pfm <- GetMotifData(object = motif, slot = "pwm")

# set data in the Motif object
motif <- SetMotifData(object = motif, slot = "pwm", new.data = pfm)

We can access the set of motifs and set of features used in the Motif object using the colnames() and rownames() functions:

# look at the motifs included in the Motif object
head(colnames(motif))
# look at the features included in the Motif object
head(rownames(motif))

To quickly convert between motif IDs (like MA0497.1) and motif common names (like MEF2C), we can use the ConvertMotifID() function. For example:

# convert ID to common name
ids <- c("MA0025.1","MA0030.1","MA0031.1","MA0051.1","MA0056.1","MA0057.1")
names <- ConvertMotifID(object = motif, id = ids)
names
# convert names to IDs
ConvertMotifID(object = motif, name = names)

For a full list of methods for the Motif class run:

methods(class = 'Motif')

Session Info

sessionInfo()



timoast/signac documentation built on Aug. 23, 2024, 1:48 a.m.