View source: R/signal_counting.R
getCountsByRegions | R Documentation |
Get the sum of the signal in dataset.gr that overlaps each range in regions.gr. If expand_ranges = FALSE, getCountsByRegions is written to calculate readcounts overlapping each region, while expand_ranges = TRUE will calculate "coverage signal" (see details below).
getCountsByRegions(
  dataset.gr,
  regions.gr,
  field = "score",
  NF = NULL,
  blacklist = NULL,
  melt = FALSE,
  region_names = NULL,
  expand_ranges = FALSE,
  ncores = getOption("mc.cores", 2L)
)
dataset.gr |
A GRanges object in which signal is contained in metadata (typically in the "score" field), or a named list of such GRanges objects. If a list is given, a dataframe is returned containing the counts in each region for each dataset. |
regions.gr |
A GRanges object containing regions of interest. |
field |
The metadata field of |
NF |
An optional normalization factor by which to multiply the counts.
If given, |
blacklist |
An optional GRanges object containing regions that should be excluded from signal counting. |
melt |
If |
region_names |
If |
expand_ranges |
Logical indicating if ranges in |
ncores |
Multiple cores will only be used if |
An atomic vector the same length as regions.gr containing the sum of the signal overlapping each range of regions.gr. If dataset.gr is a list of multiple GRanges, or if length(field) > 1, a dataframe is returned. If melt = FALSE (the default), dataframes have a column for each dataset and a row for each region. If melt = TRUE, dataframes contain one column to indicate regions (either by their indices, or by region_names, if given), another column to indicate signal, and a third column containing the sample name (unless dataset.gr is a single GRanges object).
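The two dataframe layouts described above can be pictured with a small base-R sketch. It does not call getCountsByRegions; the counts, the dataset names (sampleA, sampleB), and the long-format column names (region, signal, sample) are all hypothetical and chosen only to mirror the description.

```r
# Hypothetical counts for three regions in two named datasets.
# This mirrors the melt = FALSE layout: one column per dataset,
# one row per region.
wide <- data.frame(
  sampleA = c(5, 0, 12),
  sampleB = c(7, 1, 9)
)

# A long layout analogous to melt = TRUE: one column for the region
# (here its index), one for the signal, and one for the sample name.
# The column names are assumptions made for this illustration.
long <- data.frame(
  region = rep(seq_len(nrow(wide)), times = ncol(wide)),
  signal = unlist(wide, use.names = FALSE),
  sample = rep(names(wide), each = nrow(wide))
)

print(wide)
print(long)
```

The wide layout has one row per region, while the long layout has one row per region and dataset combination, which is often more convenient for plotting.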
expand_ranges = FALSE
In this configuration, getCountsByRegions is designed to work with data in which each range represents one type of molecule, whether it's a single base (e.g. the 5' ends, 3' ends, or centers of reads) or entire reads (i.e. paired 5' and 3' ends of reads).
This is in contrast to a standard run-length compressed GRanges object, as imported using rtracklayer::import.bw, in which a single range can represent multiple contiguous positions that share the same signal information.
As an example, a range covering 10 bp with a score of 2 is treated as 2 reads (each spanning the same 10 bases), not 20 reads.
expand_ranges = TRUE
In this configuration, this function assumes that ranges in dataset.gr that cover multiple bases are compressed representations of multiple adjacent positions that contain the same signal. This type of representation is typical of "coverage" objects, including bedGraphs and bigWigs generated by many command line utilities, but not bigWigs as they are imported by BRGenomics::import_bigWig.
As an example, a range covering 10 bp with a score of 2 is treated as representing 20 signal counts, i.e. there are 10 adjacent positions that each contain a signal of 2.
If the data truly represent basepair-resolution coverage, the "coverage signal" is equivalent to readcounts. However, users should consider how they interpret results from whole-read coverage, as the "coverage signal" is determined by both the read counts and the read lengths.
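The arithmetic behind the two interpretations can be sketched in base R. This is only an illustration of the reasoning above, not the package's implementation: the toy widths and scores are hypothetical, and every range is assumed to lie entirely within a single region.

```r
# Hypothetical toy data: three ranges, all fully inside one region.
ranges <- data.frame(
  width = c(10, 1, 5),  # number of bases covered by each range
  score = c(2, 3, 1)    # signal stored in the "score" field
)

# expand_ranges = FALSE interpretation: the score counts reads,
# regardless of how many bases the range spans.
readcounts <- sum(ranges$score)

# expand_ranges = TRUE interpretation: the score applies at every
# covered base, so each range contributes score * width.
coverage_signal <- sum(ranges$score * ranges$width)

# Whole-read coverage depends on read length: two reads of 10 bp and
# two reads of 50 bp are both 2 reads, but give different coverage.
coverage_short <- 2 * 10
coverage_long <- 2 * 50

print(readcounts)
print(coverage_signal)
print(c(coverage_short, coverage_long))
```

With these toy values, the 10 bp range with a score of 2 contributes 2 to the readcount total but 20 to the coverage total, matching the examples given in the text.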
Mike DeBerardine
getCountsByPositions
data("PROseq") # load included PROseq data
data("txs_dm6_chr4") # load included transcripts
counts <- getCountsByRegions(PROseq, txs_dm6_chr4)
length(txs_dm6_chr4)
length(counts)
head(counts)
# Assign as metadata to the transcript GRanges
txs_dm6_chr4$PROseq <- counts
txs_dm6_chr4[1:6]