getPausingIndices: Calculate pausing indices from user-supplied promoters &...
In BRGenomics: Tools for the Efficient Analysis of High-Resolution Genomics Data

Description Usage Arguments Value Author(s) See Also Examples

Pausing index (PI) is calculated for each gene (within matched promoters.gr and genebodies.gr) as promoter-proximal (or pause region) signal counts divided by genebody signal counts. If length.normalize = TRUE (recommended), the signal counts within each range in promoters.gr and genebodies.gr are divided by their respective range widths (region lengths) before pausing indices are calculated.

getPausingIndices(
  dataset.gr,
  promoters.gr,
  genebodies.gr,
  field = "score",
  length.normalize = TRUE,
  remove.empty = FALSE,
  blacklist = NULL,
  melt = FALSE,
  region_names = NULL,
  expand_ranges = FALSE,
  ncores = getOption("mc.cores", 2L)
)

`dataset.gr`	A GRanges object in which signal is contained in metadata (typically in the "score" field), or a named list of such GRanges objects.
`promoters.gr`	A GRanges object containing promoter-proximal regions of interest.
`genebodies.gr`	A GRanges object containing genebody regions of interest.
`field`	The metadata field of `dataset.gr` to be counted. If `length(field) > 1`, a dataframe is returned containing the pausing indices for each region in each field. If `field` not found in `names(mcols(dataset.gr))`, will default to using all fields found in `dataset.gr`. If `dataset.gr` is a list, a single `field` should be given, or `length(field)` should be the equal to the number of datasets in `dataset.gr`.
`length.normalize`	A logical indicating if signal counts within regions of interest should be length normalized. The default is `TRUE`, which is recommended, especially if input regions don't all have the same width.
`remove.empty`	A logical indicating if genes without any signal in `promoters.gr` should be removed. No genes are filtered by default. If `dataset.gr` is a list of datasets, or if `length(field) > 1`, regions are filtered unless they have promoter signal in all datasets.
`blacklist`	An optional GRanges object containing regions that should be excluded from signal counting. If `length.normalize = TRUE`, blacklisted positions will be excluded from length calculations. Users should take care to note if regions of interest substantially overlap blacklisted positions.
`melt`	If `melt = TRUE`, a dataframe is returned containing a column for regions and another column for pausing indices. If multiple datasets are given (if `dataset.gr` is a list or if `length(field) > 1`), the output dataframe is melted to contain a third column indicating the sample names. (See section on return values below).
`region_names`	If `melt = TRUE`, an optional vector of names for the regions in `regions.gr`. If left as `NULL`, indices of `regions.gr` are used instead.
`expand_ranges`	Logical indicating if ranges in `dataset.gr` should be treated as descriptions of single molecules (`FALSE`), or if ranges should be treated as representing multiple adjacent positions with the same signal (`TRUE`). See `getCountsByRegions`.
`ncores`	Multiple cores will only be used if `dataset.gr` is a list of multiple datasets, or if `length(field) > 1`.

A vector parallel to the input genelist, unless remove.empty = TRUE, in which case the vector may be shorter. If dataset.gr is a list, or if length(field) > 1, a dataframe is returned, containing a column for each field. However, if melt = TRUE, dataframes contain one column to indicate regions (either by their indices, or by region_names, if given), another column to indicate signal, and a third column containing the sample name (unless dataset.gr is a single GRanges object).

Mike DeBerardine

getCountsByRegions

data("PROseq") # load included PROseq data
data("txs_dm6_chr4") # load included transcripts

#--------------------------------------------------#
# Get promoter-proximal and genebody regions
#--------------------------------------------------#

# genebodies from +300 to 300 bp before the poly-A site
gb <- genebodies(txs_dm6_chr4, 300, -300, min.window = 400)

# get the transcripts that are large enough (>1kb in size)
txs <- subset(txs_dm6_chr4, tx_name %in% gb$tx_name)

# for the same transcripts, promoter-proximal region from 0 to +100
pr <- promoters(txs, 0, 100)

#--------------------------------------------------#
# Calculate pausing indices
#--------------------------------------------------#

pidx <- getPausingIndices(PROseq, pr, gb)

length(txs)
length(pidx)
head(pidx)

#--------------------------------------------------#
# Without length normalization
#--------------------------------------------------#

head( getPausingIndices(PROseq, pr, gb, length.normalize = FALSE) )

#--------------------------------------------------#
# Removing empty means the values no longer match the genelist
#--------------------------------------------------#

pidx_signal <- getPausingIndices(PROseq, pr, gb, remove.empty = TRUE)

length(pidx_signal)