findRegions: Find non-zero regions in a Rle
In lcolladotor/derfinder: Annotation-agnostic differential expression analysis of RNA-seq data at base-pair resolution via the DER Finder approach

findRegions

R Documentation

Find non-zero regions in a Rle

Description

Find genomic regions for which a numeric vector is above (or below) predefined thresholds. In other words, this function finds the candidate Differentially Expressed Regions (candidate DERs). This is similar to regionFinder and is a helper function for calculatePvalues.

Usage

findRegions(
  position = NULL,
  fstats,
  chr,
  oneTable = TRUE,
  maxClusterGap = 300L,
  cutoff = quantile(fstats, 0.99, na.rm = TRUE),
  segmentIR = NULL,
  smooth = FALSE,
  weights = NULL,
  smoothFunction = bumphunter::locfitByCluster,
  ...
)

Arguments

`position`	A logical Rle of genomic positions. This is generated in loadCoverage. Note that it gets updated in preprocessCoverage if `colsubset` is not `NULL`.
`fstats`	A numeric Rle with the F-statistics. Usually obtained using calculateStats.
`chr`	A single element character vector specifying the chromosome name.
`oneTable`	If `TRUE` only one GRanges is returned. Otherwise, a GRangesList with two components is returned: one for the regions with positive values and one for the negative values.
`maxClusterGap`	This determines the maximum gap between candidate DERs. It should be greater than `maxRegionGap` (0 by default).
`cutoff`	Threshold applied to the `fstats` used to determine the regions.
`segmentIR`	An IRanges object with the genomic positions that are potentials DERs. This is used in calculatePvalues to speed up permutation calculations.
`smooth`	Whether to smooth the F-statistics (`fstats`) or not. This is by default `FALSE`. For RNA-seq data we recommend using `FALSE`.
`weights`	Weights used by the smoother as described in smoother.
`smoothFunction`	A function to be used for smoothing the F-statistics. Two functions are provided by the `bumphunter` package: loessByCluster and runmedByCluster. If you are using your own custom function, it has to return a named list with an element called `⁠$fitted⁠` that contains the smoothed F-statistics and an element claled `⁠$smoothed⁠` that is a logical vector indicating whether the F-statistics were smoothed or not. If they are not smoothed, the original values will be used.
`...`	Arguments passed to other methods and/or advanced arguments. Advanced arguments: verbose If `TRUE` basic status updates will be printed along the way. basic If `TRUE` a DataFrame is returned that has only basic information on the candidate DERs. This is used in calculatePvalues to speed up permutation calculations. Default: `FALSE`. maxRegionGap This determines the maximum number of gaps between two genomic positions to be considered part of the same candidate region. The default is 0L. Passed to extendedMapSeqlevels and the internal function `.getSegmentsRle` that has by default `verbose = FALSE`. When `smooth = TRUE`, `...` is passed to the internal function `.smootherFstats`. This internal function has the advanced argument `maxClusterGap` (same as above) and passes `...` to define_cluster and the formal arguments of `smoothFun`.

Details

regionFinder adapted to Rle world.

Value

Either a GRanges or a GRangesList as determined by oneTable. Each of them has the following metadata variables.

value: The mean of the values of y for the given region.
area: The absolute value of the sum of the values of y for the given region.
indexStart: The start position of the region in terms of the index for y.
indexEnd: The end position of the region in terms of the index for y.
cluster: The cluser ID.
clusterL: The total length of the cluster.

Author(s)

Leonardo Collado-Torres

References

Rafael A. Irizarry, Martin Aryee, Hector Corrada Bravo, Kasper D. Hansen and Harris A. Jaffee. bumphunter: Bump Hunter. R package version 1.1.10.

Examples

## Preprocess the data
prep <- preprocessCoverage(genomeData,
    cutoff = 0, scalefac = 32, chunksize = 1e3,
    colsubset = NULL
)

## Get the F statistics
fstats <- genomeFstats

## Find the regions
regs <- findRegions(prep$position, fstats, "chr21", verbose = TRUE)
regs
## Not run: 
## Once you have the regions you can proceed to annotate them
library("bumphunter")
genes <- annotateTranscripts(TxDb.Hsapiens.UCSC.hg19.knownGene::TxDb.Hsapiens.UCSC.hg19.knownGene)
annotation <- matchGenes(regs, genes)
annotation

## End(Not run)

# Find regions with smoothing the F-statistics by bumphunter::runmedByCluster
regs_smooth <- findRegions(prep$position, fstats, "chr21",
    verbose = TRUE,
    smoothFunction = bumphunter::runmedByCluster
)
## Compare against the object regs obtained earlier
regs_smooth

lcolladotor/derfinder documentation built on Dec. 17, 2024, 4:53 p.m.