coverageToExon: Extract coverage information for exons
In lcolladotor/derfinder: Annotation-agnostic differential expression analysis of RNA-seq data at base-pair resolution via the DER Finder approach

coverageToExon

R Documentation

Extract coverage information for exons

Description

This function extracts the coverage information calculated by fullCoverage for a set of exons determined by makeGenomicState. The underlying code is similar to getRegionCoverage with additional tweaks for calculating RPKM values.

Usage

coverageToExon(
  fullCov = NULL,
  genomicState,
  L = NULL,
  returnType = "raw",
  files = NULL,
  ...
)

Arguments

`fullCov`	A list where each element is the result from loadCoverage used with `returnCoverage = TRUE`. Can be generated using fullCoverage. Alternatively, specify `files` to extract the coverage information from the regions of interest. This can be helpful if you do not wish to store `fullCov` for memory reasons.
`genomicState`	A GRanges object created with makeGenomicState. It can be either the `genomicState$fullGenome` or `genomicState$codingGenome` component.
`L`	The width of the reads used. Either a vector of length 1 or length equal to the number of samples.
`returnType`	If `raw`, then the raw coverage information per exon is returned. If `rpkm`, RPKM values are calculated for each exon.
`files`	A character vector with the full path to the sample BAM files (or BigWig files). The names are used for the column names of the DataFrame. Check rawFiles for constructing `files`. `files` can also be a `BamFileList` object created with BamFileList or a `BigWigFileList` object created with BigWigFileList.
`...`	Arguments passed to other methods and/or advanced arguments. Advanced arguments: verbose If `TRUE` basic status updates will be printed along the way. BPPARAM.strandStep A BPPARAM object to use for the strand step. If not specified, then `strandCores` specifies the number of cores to use for the strand step. The actual number of cores used is the minimum of `strandCores`, `mc.cores` and the number of strands in the data. BPPARAM.chrStep A BPPRAM object to use for the chr step. If not specified, then `mc.cores` specifies the number of cores to use for the chr step. The actual number of cores used is the minimum of `mc.cores` and the number of samples. Passed to extendedMapSeqlevels and define_cluster.

Details

Parallelization is used twice. First, it is used by strand. Second, for processing the exons by chromosome. So there is no gain in using mc.cores greater than the maximum of the number of strands and number of chromosomes.

If fullCov is NULL and files is specified, this function will attempt to read the coverage from the files. Note that if you used 'totalMapped' and 'targetSize' before, you will have to specify them again to get the same results.

Value

A matrix (nrow = number of exons in genomicState corresponding to the chromosomes in fullCov, ncol = number of samples) with the number of reads (or RPKM) per exon. The row names correspond to the row indexes of genomicState$fullGenome (if fullOrCoding='full') or genomicState$codingGenome (if fullOrCoding='coding').

Author(s)

Andrew Jaffe, Leonardo Collado-Torres

Examples

## Obtain fullCov object
fullCov <- list("21" = genomeDataRaw$coverage)

## Use only the first two exons
smallGenomicState <- genomicState
smallGenomicState$fullGenome <- smallGenomicState$fullGenome[
    which(smallGenomicState$fullGenome$theRegion == "exon")[1:2]
]

## Finally, get the coverage information for each exon
exonCov <- coverageToExon(
    fullCov = fullCov,
    genomicState = smallGenomicState$fullGenome, L = 36
)

lcolladotor/derfinder documentation built on Dec. 17, 2024, 4:53 p.m.