build_tornado: Acquire tornado plot data


View source: R/build_tornado.R

Description

Aggregates coverage data around genomic features into an array. Coverage can be read from BigWigFile objects or computed from a GRanges object or a TabixFile.

Usage

build_tornado(
  features,
  data,
  width = 4000,
  binwidth = 25,
  nbin = NULL,
  ...,
  pad_value = 0
)

Arguments

features

A GRanges or GRangesList object containing genomic loci of interest.

data

The data source from which coverage is obtained. Either a GRanges, GRangesList, TabixFile, BigWigFile or BigWigFileList object, or character paths to .bw files.

width, binwidth, nbin

Each an integer(1). width is the common feature width, in basepairs, that features are centred in; binwidth is the size, in basepairs, of the bins that coverage is summarised in; nbin is the number of bins that coverage is summarised in. Only two of these need to be defined.

...

Arguments used for TabixFile input (see the 'Single cell data' subsection in Details):

barcode_column

An integer(1) giving the column of the TabixFile that contains barcodes, which typically describe cells in single cell assays.

barcode_groups

A (named) list of barcodes wherein every element is a character vector of barcodes that belong to the same group, typically a cluster of cells.

pad_value

A numeric(1) to use for padding when seqlengths(features) are unknown or features exceed known sequence lengths.
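
The relation between the width, binwidth and nbin arguments can be sketched in base R. This is a minimal sketch that assumes width = binwidth * nbin, which is implied by only two of the three needing to be defined. The helper resolve_bins() is hypothetical and not part of the package.

```r
# Hypothetical helper, not part of tornadoplot: derive the missing one of
# width, binwidth and nbin, assuming width = binwidth * nbin.
resolve_bins <- function(width = NULL, binwidth = NULL, nbin = NULL) {
  n_missing <- is.null(width) + is.null(binwidth) + is.null(nbin)
  if (n_missing > 1) {
    stop("At least two of width, binwidth and nbin must be defined.")
  }
  if (is.null(width))    width    <- binwidth * nbin
  if (is.null(binwidth)) binwidth <- width / nbin
  if (is.null(nbin))     nbin     <- width / binwidth
  list(width = width, binwidth = binwidth, nbin = nbin)
}

# The defaults from the Usage section: width = 4000 and binwidth = 25
bins <- resolve_bins(width = 4000, binwidth = 25)
bins$nbin  # 160
```

Under this assumption, the default call summarises each feature in 160 bins of 25 basepairs.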

Details

Features

The genomic ranges provided as the features argument are resized so that all have widths equal to the (computed) width argument, with fixed centres. Features on the negative strand have their coverage flipped in the output, such that stranded features yield 5' -> 3' coverage data. If this is undesired, the features can be unstranded to prevent this flipping. When the features are provided as a GRangesList, every list element is interpreted as a different feature set. Internally these get unlisted and their set membership is tracked as the feature_set column in the rowData slot.
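
The resizing and strand flipping described above can be illustrated in base R. This is only a sketch on toy data: the data.frame layout and column names are assumptions for illustration, and the rounding conventions of the package's own resizing may differ.

```r
# Toy features; the column names are illustrative only
feats_df <- data.frame(
  start  = c(10000, 50000),
  end    = c(10200, 50100),
  strand = c("+", "-")
)
width <- 4000

# Resize around a fixed centre so every feature has the same width
centre <- (feats_df$start + feats_df$end) %/% 2
feats_df$new_start <- centre - width %/% 2
feats_df$new_end   <- feats_df$new_start + width - 1

# Toy per-bin coverage: one row per feature, one column per bin
cov_bins <- rbind(c(1, 2, 3, 4), c(1, 2, 3, 4))

# Reverse the bins of negative-strand features to get 5' -> 3' order
minus <- feats_df$strand == "-"
cov_bins[minus, ] <- cov_bins[minus, ncol(cov_bins):1, drop = FALSE]
```

After this step the second row of cov_bins is reversed while the first is unchanged, mirroring how stranded features end up in a common orientation.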

Data

Bulk data

If the data is a (set of) bigwig file(s), the output is constructed through the summary() method for bigwig files. For other types of data, coverage() is computed at the locations of the features. This basepair-resolution coverage is subsequently binned and averaged.
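
Binning and averaging basepair-resolution coverage can be sketched in base R as follows. The toy vector and the small width are assumptions chosen for readability; this shows the general idea, not the package's internal code.

```r
binwidth <- 25
width    <- 100  # small width to keep the example readable

# Toy basepair-resolution coverage for a single feature
cov_bp <- rep(c(0, 2, 4, 6), each = 25)

# Fill a matrix column by column so each column holds one bin,
# then average every column
binned <- colMeans(matrix(cov_bp, nrow = binwidth))
binned  # 0 2 4 6
```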

Single cell data

A popular format for storing single cell chromatin data is as a TabixFile. For example, the 10x Genomics 'cellranger' pipeline for single cell ATAC-seq produces a fragments tabix file, wherein the 4th column indicates the barcode of a cell. When the data argument is a tabix file, the barcode_column argument instructs this function where to look for barcodes. The list elements of the barcode_groups argument can then be matched against that column in the tabix file to extract the data for every group of cells. Next, the coverage for every group of cells is calculated, binned and averaged in the same way bulk data is.
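
A barcode_groups list of the shape described above could be built from per-cell cluster labels with split(). The barcodes and cluster labels below are made up, and the commented call uses a placeholder file path; the column index 4 follows the cellranger fragments layout mentioned above.

```r
# Toy cell metadata; barcodes and cluster labels are made up
barcodes <- c("AAAC-1", "AAAG-1", "AACT-1", "AAGA-1")
clusters <- c("cluster_A", "cluster_B", "cluster_A", "cluster_B")

# split() returns a named list with one character vector per cluster
barcode_groups <- split(barcodes, clusters)

# With a fragments file, the call could then look like this
# (the path is a placeholder, feats is a GRanges of features):
# tor <- build_tornado(
#   feats, Rsamtools::TabixFile("fragments.tsv.gz"),
#   barcode_column = 4, barcode_groups = barcode_groups
# )
```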

Value

A TornadoExperiment object with the following populated slots:

assays

Has an n features × m samples × o bins 3D array with coverage data.

colData

Information that could be derived from the data argument.

rowRanges

Flattened and resized GRanges derived from the features argument.

rowData

Has a feature_set column. See the 'Features' details subsection.

binData

A DataFrame with a bin_id and range column.
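
The features × samples × bins layout of the assay array can be illustrated with a plain base R array. The dimensions and dimnames below are made up, and the sketch does not use the TornadoExperiment class itself.

```r
# Toy array: 3 features x 2 samples x 4 bins (dimnames are made up)
arr <- array(
  seq_len(3 * 2 * 4),
  dim = c(3, 2, 4),
  dimnames = list(
    paste0("feature", 1:3),
    c("sampleA", "sampleB"),
    paste0("bin", 1:4)
  )
)

# Heatmap-like matrix for one sample: features in rows, bins in columns
mat_a <- arr[, "sampleA", ]

# Average profile per sample: mean over features for every bin
profiles <- apply(arr, c(2, 3), mean)
dim(profiles)  # 2 4
```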

Examples

library(tornadoplot)
library(ggplot2)  # provides the autoplot() generic

# Some very small features and data that work
feats <- dummy_features()
dat   <- dummy_granges_data()

# Make a tornado
tor   <- build_tornado(feats, dat, width = 2000)

# Plotting the tornado
autoplot(tor)

teunbrand/tornadoplot documentation built on Dec. 23, 2021, 8:48 a.m.