In demuellae/ChrAccR: Analyzing chromatin accessibility data in R

knitr::opts_chunk$set(
  collapse = FALSE,
  comment = ">"
)

htmltools::img(src = knitr::image_uri(system.file(file.path("extdata", "chraccr_logo.png"), package="ChrAccR")), 
               alt = 'logo', 
               style = 'position:absolute; top:0; right:0; padding:10px; height:96px;')

Introduction

ChrAccR is an R package that provides tools for the comprehensive analysis chromatin accessibility data. The package implements methods for data quality control, exploratory analyses (including unsupervised methods for dimension reduction, clustering and quantifying transcription factor activities) and the identification and characterization of differentially accessible regions. It can be used for the analysis of large bulk datasets comprising hundreds of samples as well as for single-cell datasets with 10s to 100s of thousands of cells.

Requiring only a limited set of R commands, ChrAccR generates analysis reports that can be interactively explored, facilitate a comprehensive analysis overview of a given dataset and are easily shared with collaborators. The package is therefore particularly useful for users with limited bioinformatic expertise, researchers new to chromatin analysis or institutional core facilities providing ATAC-seq as a service. Additionally, the package provides numerous utility functions for custom R scripting that allow more in-depth analyses of chromatin accessibility datasets.

Overview of ATAC-seq data analysis in ChrAccR

This vignette focuses on the analysis of ATAC-seq data. This data can be directly imported from aligned sequencing reads into a DsATAC object, the main data structure for working with ATAC-seq data. Using a small example dataset of chromatin accessibility in human immune cells, this vignette covers filtering, normalizing and aggregating count data across genomic regions of interest and introduces convenient utility functions for quality control (QC), data exploration and differential analysis. Additionally, ChrAccR facilitates the automatic generation of analysis reports using several preconfigured workflows.

We first provide instructions for running the entire analysis in a ''vanilla'' workflow and then dive a little deeper into the underlying data structures and custom analysis methods that ChrAccR provides for highly individualized analyses.

Installation

To install ChrAccR and its dependencies, use the devtools installation routine:

# install devtools if not previously installed
if (!is.element('devtools', installed.packages()[,"Package"])) install.packages('devtools')

# install dependencies
devtools::install_github("demuellae/muLogR")
devtools::install_github("demuellae/muRtools")
devtools::install_github("demuellae/muReportR")

# install ChrAccR
devtools::install_github("GreenleafLab/ChrAccR")

We also provide complementary packages that contain genome-specific annotations. We highly recommend installing these annotation packages in order to speed up certain steps of the analysis:

# hg38 annotation package
install.packages("https://muellerf.s3.amazonaws.com/data/ChrAccR/data/annotation/ChrAccRAnnotationHg38_0.0.1.tar.gz")

Preparing the workspace

Loading ChrAccR works just like loading any other R package

library(ChrAccR)

Example data

The ChrAccRex data package contains a small example ATAC-seq datasets for ChrAccR. If you haven't already installed ChrAccRex use

devtools::install_github("GreenleafLab/ChrAccRex")

Note that many of the plots created in this vignette use the excellent ggplot2 package. The following code loads the package and sets a different plotting theme:

library(ggplot2)
# use a grid-less theme
theme_set(muRtools::theme_nogrid())

Vanilla analysis

We start by running the entire analysis pipeline on an example dataset. After annotating the samples to be used in the analysis and specifying a couple of analysis parameters, this can essentially be done using just one command (run_atac(...)).

Example dataset

We provide an example dataset of ATAC-seq profiles of a subset of 33 human T cell samples taken from [@Calderon:2019cb]. This dataset contains BAM files as well as a sample annotation table. If you have not already done so, please download and unzip the dataset:

# download and unzip the dataset
datasetUrl <- "https://s3.amazonaws.com/muellerf/data/ChrAccR/data/tutorial/tcells.zip"
downFn <- "tcells.zip"
download.file(datasetUrl, downFn)
unzip(downFn, exdir=".")

Configuring the analysis

A properly curated sample annotation table is essential for the entire analysis. Such an annotation at the minimum should contain sample identifiers, but ideally contains all sorts of available information for each sample, such as phenotypical and technical metadata. This especially includes information relevant for sample gouping and differential analysis. In the example dataset, we provide such an annotation table:

# prepare the sample annotation table
sampleAnnotFn <- file.path("tcells", "samples.tsv")
bamDir <- file.path("tcells", "bam")
sampleAnnot <- read.table(sampleAnnotFn, sep="\t", header=TRUE, stringsAsFactors=FALSE)
# add a column that ChrAccR can use to find the correct bam file for each sample
sampleAnnot[,"bamFilenameFull"] <- file.path(bamDir, sampleAnnot[,"bamFilename"])

Let us now also set a couple of analysis parameters and reporting preferences. These options can be set using the setConfigElement function and can be queried using getConfigElement. A list of all available options is contained in the documentation of setConfigElement (e.g. use ?setConfigElement). Here, we tell ChrAccR which annotation columns to focus on, which color schemes to use for some of these annotation, set filtering thresholds for low-coverage regions, the normalization method to be used and specify what comparisons to use for differential analysis:

setConfigElement("annotationColumns", c("cellType", "stimulus", "donor"))
setConfigElement("colorSchemes", c(
  getConfigElement("colorSchemes"), 
  list(
    "stimulus"=c("U"="royalblue", "S"="tomato"),
    "cellType"=c("TCD8naive"="purple3", "TCD8EM"="purple4", "TeffNaive"="seagreen3", "TeffMem"="seagreen4")
  )
))
setConfigElement("filteringCovgCount", 1L)
setConfigElement("filteringCovgReqSamples", 0.25)
setConfigElement("filteringSexChroms", TRUE)
setConfigElement("normalizationMethod", "quantile")
diffCompNames <- c(
  "TeffNaive_U vs TeffMem_U [sampleGroup]",
  "TCD8naive_U vs TCD8EM_U [sampleGroup]"
)
setConfigElement("differentialCompNames", diffCompNames)

ChrAccR uses sets of predefined genomic regions to summarize count data. By default, this includes 200bp-tiling windows. The following code generates a list of custom region sets that we can optionally use in the analysis. Here, we use 500-bp tiling regions a consensus set of ATAC-seq peaks that were identified in a large cancer dataset [@Corces:2018cp].

# Download an ATAC peak set from a Pan-cancer dataset [doi:10.1126/science.aav1898]
cancerPeaks <- read.table("https://api.gdc.cancer.gov/data/116ebba2-d284-485b-9121-faf73ce0a4ec", sep="\t", header=TRUE, comment.char="", quote="", stringsAsFactors=FALSE)
cancerPeaks <- GenomicRanges::GRanges(cancerPeaks[,"seqnames"], IRanges::IRanges(start=cancerPeaks[,"start"], end=cancerPeaks[,"end"], names=cancerPeaks[,"name"]))
cancerPeaks <- sort(cancerPeaks) # sort
# a list of custom region sets
regionSetList <- list(
  tiling500bp = muRtools::getTilingRegions("hg38", width=500L, onlyMainChrs=TRUE),
  cancerPeaks = cancerPeaks
)

Running the analysis

After these preparatory steps, we can run through the default ChrAccR workflow with a single command using the run_atac function. Supplied with the sample annotation table and a reference to BAM files (or BED files containing fragments), it will automatically run the entire pipeline including importing the dataset, quality control analysis, filtering, count normalization, exploratory analysis and (optionally) differential accessibility analysis.

run_atac("ChrAccR_analysis", "bamFilenameFull", sampleAnnot, genome="hg38", sampleIdCol="sampleId", regionSets=regionSetList)

Overview of `ChrAccR` data structures

ChrAccR implements data structures for the convenient handling of chromatin accessibility data. Its main class is the DsAcc which represents the prototype for more specialized S4 classes for different types of accessibility protocols. Here, we focus on ATAC-seq data and the corresponding data structure is called DsATAC. The example data loaded above is such a DsATAC object.

Each DsATAC object contains

A sample annotation table
Genomic coordinates for one ore more region sets, such as promoters, enhancers, acessibility peaks or genomic tiling windows
Aggregate counts for each genomic region as quantification of accessibility
Optionally, the coordinates of sequencing fragments/reads

For playing around with the ChrAccR datastructures, we use a reduced version of the [@Calderon:2019cb] dataset which focuses on only a subset of the human genome (approximately 1%). It is provided in the ChrAccRex package and can be loaded using:

dsa <- ChrAccRex::loadExample("dsAtac_ia_example")

To get an overview of what information is stored in the example dataset, you can just type the object name:

dsa

Accessing information in `DsATAC` datasets

Here are some utility functions that summarize the samples contained in the dataset:

# get the number of samples
length(dsa)
# get the sample names
head(getSamples(dsa))
# sample annotation table
str(getSampleAnnot(dsa))

To obtain information on the region sets that are contained in the dataset, use:

# the names of the different region types
getRegionTypes(dsa)
# number of regions for a particular region type
getNRegions(dsa, "promoters_gc_protein_coding")
# genomic coordinates of a particular region type
getCoord(dsa, "promoters_gc_protein_coding")

You can obtain a region-by-sample matrix containing Tn5 insertion counts using the getCounts function:

cm <- getCounts(dsa, "promoters_gc_protein_coding")
str(cm)

If the dataset contains fragment information, you can obtain the coordinates of sequencing fragments for each sample:

# number of fragments for the first 10 samples
getFragmentNum(dsa, sampleIds=getSamples(dsa)[1:10])
# start and end coordinates of all fragments for a given sample
getFragmentGr(dsa, "TCD8EM_U_1002")
# all Tn5 insertion sites (i.e. the concatenation of fragment start and end coordinates)
# for a set of samples
getInsertionSites(dsa, getSamples(dsa)[1:3])

Manipulating `DsATAC` datasets

ChrAccR provides a number of functions for filtering a dataset and for adding additional data.

Filtering for samples and genomic regions

For instance, if you want to work with only a subset of the data, you can remove samples and genomic regions from the dataset. The example contains data for resting (unstimulated) and stimulated T cells. The following code removes the stimulated cells and retains only the resting state cells:

isStim <- getSampleAnnot(dsa)[,"stimulus"] == "S"
dsa_resting <- removeSamples(dsa, isStim)
dsa_resting

Sometimes you will want to focus on certain regions. For instance the following code filters the dataset, such that only the most variably accessible peaks are retained

peakCounts <- getCounts(dsa, "IA_prog_peaks")
# identify the 1000 most variable peaks
isVar <- rank(-matrixStats::rowVars(peakCounts)) <= 1000 
# remove the less-variable peaks from the dataset
dsa_varPeaks <- removeRegions(dsa, !isVar, "IA_prog_peaks")
getNRegions(dsa_varPeaks, "IA_prog_peaks")

There are also shortcuts for filtering certain regions from the dataset, for example:

# remove those regions that do not have at least 10 insertions in at least 75% of the samples
dsa_filtered <- filterLowCovg(dsa, thresh=10L, reqSamples=0.75)
# remove regions on chromosome 21
dsa_filtered <- filterChroms(dsa, exclChrom=c("chr21"))

To remove a region type entirely, use removeRegionType:

dsa_not10k <- removeRegionType(dsa, "t10k")
getRegionTypes(dsa_not10k)

Normalizing count data

ChrAccR provides several methods for normalizing and adjusting the region count data. These methods can be accessed through the transformCounts function:

# quantile normalization for all region types
dsa_qnorm <- transformCounts(dsa, method="quantile")
# RPKM normalization of peak counts
dsa_rpkm <- transformCounts(dsa, method="RPKM", regionTypes=c("IA_prog_peaks"))
# log2(RPKM)
dsa_l2rpkm <- transformCounts(dsa_rpkm, method="log2", regionTypes=c("IA_prog_peaks"))
# DESeq2 variance stabilizing transformation (vst)
dsa_vst <- transformCounts(dsa, method="vst", regionTypes=c("IA_prog_peaks"))
# correct for potential batch effects based on the annotated donor
dsa_vst <- transformCounts(dsa_vst, method="batchCorrect", batch=getSampleAnnot(dsa_qnorm)[,"donor"], regionTypes=c("IA_prog_peaks"))

Adding a region type for count data aggregation

Sometimes you want to aggregate count data over a region set after the DsATAC object has been created. Maybe you came up with a better peak set or your collaborator provided you with a highly interesting list of putative enhancer elements. These types of region sets can be easily added to DsATAC objects using the regionAggregation function. It is most useful, if the dataset contains insertion/fragment data. The following example downloads a set of putative regulatory elements defined by the Ensembl Regulatory Build [@Zerbino:2015bx] of BLUEPRINT epigenome data (http://www.blueprint-epigenome.eu/) and aggregates the ATAC-seq insertions within these regions.

# download the file and prepare a GRanges object
regionFileUrl <- "http://medical-epigenomics.org/software/rnbeads/materials//data/regiondb/annotation_hg38_ensembleRegBuildBPall.RData"
downFn <- tempfile(pattern="regions", fileext="RData")
download.file(regionFileUrl, downFn)
regionEnv <- new.env()
load(downFn, envir=regionEnv)
regionGr <- unlist(regionEnv$regions)

# aggregate insertions across the new region type
dsa_erb <- regionAggregation(dsa, regionGr, "ERB", signal="insertions")
dsa_erb

Merging samples

You can also merge samples, e.g. from different technical or biological replicates. In the following example, we merge all biological replicates (originating from different donors) for the same cell type and stimulation conditions. This grouping information is contained in the sampleGroup annotation column. By default, counts will be added up for each aggregated region, but you can specify other aggregation methods such as the mean or median in the countAggrFun argument.

dsa_merged <- mergeSamples(dsa, "sampleGroup", countAggrFun="sum")
getSamples(dsa_merged)

Saving and loading datasets

In many analysis workflow, you want to save an R dataset to disk in order to use it in downstream analysis using custom R scripts. Note that R's workflow of saveRDS readRDS is generally not applicable for saving and loading DsATAC datasets, because ChrAccR optionally uses disk-backed data structures in order to be more memory efficient in the analysis. Instead, please use ChrAccR's functions:

dest <- file.path(tempdir(), "myDsAtacDataset")
saveDsAcc(dsa, dest)

dsa_reloaded <- loadDsAcc(dest)
dsa_reloaded

If you ran the vanilla analysis using ` as described above, you automatically generated and savedDsATACdatasets. They are stored in thedatasubdirectory of your analysis directory (ChrAccR_analysis`):

dsa_vanilla <- loadDsAcc(file.path("ChrAccR_analysis", "data", "dsATAC_processed"))
dsa_vanilla

Importing ATAC-seq data

Importing data from `bam` files

ATAC-seq data can be read from BAM files that contain aligned reads. These reads originate from fragments whose endpoints resemble Tn5 insertion sites. All ChrAccR needs to read the input files is a sample annotation table and pointers to the BAM files. The following code reads the sample annotation table from the supplied text file and adds a column that contains the BAM file locations on your disk for each sample. Additionally, we prepare a list of region sets (genome-wide tiling regions with window sizes of 500bp and 5kb) that we want to aggregate the insertion counts over. Using the dataset downloaded and prepared in the 'Vanilla analysis' section of this vignette, we can easily create a DsATAC dataset using the DsATAC.bam function. Supplied with the sample annotation table and a reference to the BAM files, it will automatically determine Tn5 insertion sites from the BAM files and aggregate counts over the optionally supplied list of region sets (if no such list is provided, 200bp-tiling windows will be used as the default):

dsa_fromBam <- DsATAC.bam(sampleAnnot, "bamFilenameFull", "hg38", regionSets=regionSetList, sampleIdCol="sampleId")

Analysis reports

ChrAccR provides a way to bundle certain types of analyses into workflows that automatically create analysis reports. These HTML-based analysis reports can be viewed in any web browser and provide a convenient overview of the analysis results that can easily shared with collaborators. ChrAccR provides the following workflows:

summary: dataset characterization and quality control
exploratory: exploratory analyses such as dimension reduction, clustering, TF motif enrichment, etc.
differential: differential accessibility analysis between groups of samples

Provided with a DsATAC dataset, the workflows can be started using the createReport_X() functions (where X is the name of the workflow). Workflow options can be specified using the setConfigElement() function for setting ChrAccR's options. Here are some option examples:

# exclude the promoter and 't10k' (10kb tiling windows) region types from the analysis
setConfigElement("regionTypes", setdiff(getRegionTypes(dsa), c("promoters_gc_protein_coding", "t10k")))
# see the current option setting
getConfigElement("regionTypes")
# use the following sample annotation columns for exploratory analyses
setConfigElement("annotationColumns", c("cellType", "donor", "stimulus"))
# add a custom color scheme for the 'stimulus' and 'cellType' annotation columns
setConfigElement("colorSchemes", c(
    getConfigElement("colorSchemes"), 
    list(
        "stimulus"=c("U"="#7fbc41", "S"="#de77ae"),
        "cellType"=c("TCD8naive"="#9ecae1", "TCD8EM"="#08519c", "TeffNaive"="#016c59", "TeffMem"="#014636")
    )
))

To see which options are available use the help function for setConfigElement

?setConfigElement

The createReport_X() take a DsATAC dataset as input and write their results to a report directory. Not that this directory can only contain one report for each analysis workflow and the function will result in an error if the directory already contains a report of that type.

Summary analysis

To get an overview of an ATAC-seq dataset use createReport_summary(). The resulting report will contain a summary of the sample annotation, genomic coverage by sequencing reads and QC metrics such as a summary of the fragment size distribution and enrichment of Tn5 insertions at transcription start sites (TSS enrichment).

reportDir <- file.path(".", "ChrAccR_reports")
# create the report (takes ~10 min on the example dataset)
createReport_summary(dsa, reportDir)

Exploratory analysis

The exploratory analysis reports comprises unsupervised analysis results, such as dimension reduction plots and sample clustering. Additionally, TF motif activity for peak region sets (those region sets that contain 'peak' in their name) is assessed using the chromVAR package [@Schep:2017je].

# create the report (takes ~1 min on the example dataset)
createReport_exploratory(dsa_qnorm, reportDir)

Differential analysis

The differential analysis workflow focuses on the pairwise comparison between groups of samples. The resulting report contains summary tables and plot for each comparison (e.g. volcano plots). Differentially accessible regions are identified using DESeq2 and multiple p-value and rank cutoffs.

# differential analysis option settings
setConfigElement("differentialColumns", c("stimulus", "cellType"))
# adjust for the donor annotation in the differential test
setConfigElement("differentialAdjColumns", c("donor"))
# create the report (takes ~18 min on the example dataset)
createReport_differential(dsa, reportDir)

Utility functions

Principal component plot

To quickly obtain a dimension reduction plot based on a specific set of regions in a DsATAC dataset, you can use the getDimRedCoords.X functions from the muRtools package:

cm <- getCounts(dsa_qnorm, "IA_prog_peaks")
pcaCoord <- muRtools::getDimRedCoords.pca(t(cm))
muRtools::getDimRedPlot(pcaCoord, annot=getSampleAnnot(dsa_qnorm), colorCol="cellType", shapeCol="stimulus")

Plot the fragment size distribution

The fragment size distribution of an ATAC-seq can provide valuable quality control information. Typically, we expect a large number of fragments with sub-nucleosomal length (length < 200bp) and further peaks corresponding to 1,2, ... nucleosomes (roughly multiples of 200bp in length). Here is how you can plot a corresponding density estimation of the fragment length distribution for a specific sample in the dataset.

plotInsertSizeDistribution(dsa, "TCD8EM_U_1002")

TSS enrichment plots

The enrichment of the number of fragments at transcription start sites (TSS) over genomic background is a valuable indicator for the signal-to-noise ratio in ATAC-seq data. In ChrAccR, the TSS enrichment can be computed for each sample in a dataset using the getTssEnrichment function. You will need the coordinates of TSSs in the genome as input and the function will return an object comprising a ggplot object as well as a numeric score that quantifies the enrichment at the promoter (default definition: TSS +/- 2kb) vs a background (default: 100bp outside the promoter window). Absolute and smoothed counts will be used for the computation.

# prepare a GRanges object of TSS coordinates
tssGr <- muRtools::getAnnotGrl.gencode("gencode.v27")[["gene"]]
tssGr <- tssGr[elementMetadata(tssGr)[,"gene_type"]=="protein_coding"]
tssGr <- promoters(tssGr, upstream=0, downstream=1)
tssGr

# compute TSS enrichment
tsse <- getTssEnrichment(dsa, "TCD8EM_U_1002", tssGr)
# enrichment score: number of insertions at the TSS
# over number of insertion in background regions
tsse$tssEnrichment
# plot
tsse$plot

Differential accessibility

daTab <- getDiffAcc(dsa, "IA_prog_peaks", "stimulus", grp1Name="S", grp2Name="U", adjustCols=c("cellType", "donor"))

Transcription factor motif enrichment using `ChromVAR`

Genome-wide aggregates across potential binding sites for transcription factors can be extremely useful in the interpretation of chromatin accessibility. The chromVAR package is designed to quantify the overall TF motif accessibility for each sample in a dataset [@Schep:2017je]. Using the getChromVarDev function, we can obtain these deviation scores from DsATAC datasets:

cvRes <- getChromVarDev(dsa, "IA_prog_peaks", motifs="jaspar")
devZ <- chromVAR::deviations(cvRes)
str(devZ)
# plot a clustered heatmap of the first 20 motifs
pheatmap::pheatmap(devZ[1:20,])

TF motif footprinting

To get an overview of the activity of certain transcription factors, you can generate footprint plots which are aggreagtions of insertion counts in windows surrounding all occurrences of a motif genome-wide. ChrAccR normalizes these counts using kmer frequencies and also plots the background distribution using all insertion sites (not just the ones occurring near motifs). You can compute the data used to generate these plots using the getMotifFootprints function. Note that currently the computation of the kmer biases is quite compute intense. We therefor recommend to limit the analysis to a few samples and motifs.

motifNames <- c("MA1419.1_IRF4", "MA0139.1_CTCF", "MA0037.3_GATA3")
# motifNames <- grep("(IRF4|CTCF|GATA3)$", names(prepareMotifmatchr("hg38", "jaspar")$motifs), value=TRUE) # alternative by searching for patterns
samples <- c("TeffNaive_U_1001", "TeffNaive_U_1002", "TeffMem_U_1001", "TeffMem_U_1002")
fps <- getMotifFootprints(dsa, motifNames, samples)
# the results of the footprinting as data.frame for later plotting
str(fps[["MA0139.1_CTCF"]]$footprintDf)
# the result also contains a plot object, which can directly be drawn
fps[["MA0139.1_CTCF"]]$plot

Exporting to high-level Bioconductor data structures

ChrAccR offers utility functions to convert accessibility data to other commonly used data structures for Bioconductor-based workflows. For instance, to export count data aggregated over a certain region type as a SummarizedExperiments object you can use:

se <- getCountsSE(dsa, "IA_prog_peaks")
se

To export data as a DESeq2 dataset for differential accessibility analysis:

dds <- getDESeq2Dataset(dsa, "IA_prog_peaks", designCols=c("donor", "stimulus", "cellType"))
dds

Session Info

Sys.Date()

sessionInfo()

References

link-citations: yes references: - id: Zerbino:2015bx title: The Ensembl Regulatory Build author: - family: Zerbino given: Daniel R - family: Wilder given: Steven P - family: Johnson given: Nathan - family: Juettemann given: Thomas - family: Flicek given: Paul R container-title: Genome Biology DOI: 10.1186/s13059-015-0621-5 volume: 16 number: 1 page: 56 issued: year: 2015 type: article-journal - id: Schep:2017je title: "chromVAR: inferring transcription-factor-associated accessibility from single-cell epigenomic data" author: - family: Schep given: Alicia N - family: Wu given: Beijing - family: Buenrostro given: Jason D - family: Greenleaf given: William J container-title: Nature methods DOI: 10.1038/nmeth.4401 volume: 14 number: 10 page: 975-978 issued: year: 2017 type: article-journal - id: Calderon:2019cb title: Landscape of stimulation-responsive chromatin across diverse human immune cells author: - family: Calderon - family: Nguyen - family: Mezger - family: Kathiria - family: Müller - family: Nguyen - family: Lescano - family: Wu - family: Trombetta - family: Ribado - family: Knowles - family: Gao - family: Blaeschke - family: Parent - family: Burt - family: Anderson - family: Criswell - family: Greenleaf - family: Marson - family: Pritchard container-title: Nature Genetics DOI: 10.1038/s41588-019-0505-9 volume: 51 number: 10 page: 1494-1505 issued: year: 2019 type: article-journal - id: Corces:2018cp title: The chromatin accessibility landscape of primary human cancers author: - family: Corces - family: Granja - family: Shams - family: Louie - family: Seoane - family: Zhou - family: Silva - family: Groeneveld - family: Wong - family: Cho - family: Satpathy - family: Mumbach - family: Hoadley - family: Robertson - family: Sheffield - family: Felau - family: Castro - family: Berman - family: Staudt - family: Zenklusen - family: Laird - family: Curtis - family: Cancer Genome Atlas Network - family: Greenleaf - family: Chang container-title: Science DOI: 10.1126/science.aav1898 volume: 362 number: 6413 page: eaav1898 issued: year: 2018 type: article-journal

demuellae/ChrAccR documentation built on Jan. 20, 2020, 3:03 a.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

demuellae/ChrAccR
Analyzing chromatin accessibility data in R

In demuellae/ChrAccR: Analyzing chromatin accessibility data in R

Introduction

Installation

Preparing the workspace

Example data

Vanilla analysis

Example dataset

Configuring the analysis

Running the analysis

Overview of `ChrAccR` data structures

Accessing information in `DsATAC` datasets

Manipulating `DsATAC` datasets

Filtering for samples and genomic regions

Normalizing count data

Adding a region type for count data aggregation

Merging samples

Saving and loading datasets

Importing ATAC-seq data

Importing data from `bam` files

Analysis reports

Summary analysis

Exploratory analysis

Differential analysis

Utility functions

Principal component plot

Plot the fragment size distribution

TSS enrichment plots

Differential accessibility

Transcription factor motif enrichment using `ChromVAR`

TF motif footprinting

Exporting to high-level Bioconductor data structures

Session Info

References

R Package Documentation

Browse R Packages

We want your feedback!

demuellae/ChrAccR Analyzing chromatin accessibility data in R

In demuellae/ChrAccR: Analyzing chromatin accessibility data in R

Introduction

Installation

Preparing the workspace

Example data

Vanilla analysis

Example dataset

Configuring the analysis

Running the analysis

Overview of ChrAccR data structures

Accessing information in DsATAC datasets

Manipulating DsATAC datasets

Filtering for samples and genomic regions

Normalizing count data

Adding a region type for count data aggregation

Merging samples

Saving and loading datasets

Importing ATAC-seq data

Importing data from bam files

Analysis reports

Summary analysis

Exploratory analysis

Differential analysis

Utility functions

Principal component plot

Plot the fragment size distribution

TSS enrichment plots

Differential accessibility

Transcription factor motif enrichment using ChromVAR

TF motif footprinting

Exporting to high-level Bioconductor data structures

Session Info

References

R Package Documentation

Browse R Packages

We want your feedback!

demuellae/ChrAccR
Analyzing chromatin accessibility data in R

Overview of `ChrAccR` data structures

Accessing information in `DsATAC` datasets

Manipulating `DsATAC` datasets

Importing data from `bam` files

Transcription factor motif enrichment using `ChromVAR`