featureCoverage: Genome read coverage by transcript models
In systemPipeR: systemPipeR: NGS workflow and report generation environment

Description Usage Arguments Value Author(s) See Also Examples

Computes read coverage along single and multi component features based on genomic alignments. The coverage segments of component features are spliced to continuous ranges, such as exons to transcripts or CDSs to ORFs. The results can be obtained with single nucleotide resolution (e.g. around start and stop codons) or as mean coverage of relative bin sizes, such as 100 bins for each feature. The latter allows comparisons of coverage trends among transcripts of variable length. The results can be obtained for single or many features (e.g. any number of transcritpts) at once. Visualization of the coverage results is facilitated by a downstream plotfeatureCoverage function.

1
2
3

featureCoverage(bfl, grl, resizereads = NULL, readlengthrange = NULL, Nbins = 20, 
                method = mean, fixedmatrix, resizefeatures, upstream, downstream, 
                outfile, overwrite = FALSE)

`bfl`	Paths to BAM files provided as `BamFileList` object. The name slot of the BAM files will be used for naming samples in the results.
`grl`	Genomic ranges provided as `GRangesList` typically generated form `txdb` instances with operations like: `cdsBy(txdb, "tx")` or `exonsBy(txdb, "tx")`. Single component features will be processed the same way as multi component features.
`resizereads`	Positive integer defining the length alignments should be resized to prior to the coverage calculation. `NULL` will omit the resizing step.
`readlengthrange`	Positive integer of length 2 determining the read length range to use for the coverage calculation. Reads falling outside of the specified length range will be excluded from the coverage calculation. For instance, `readlengthrange=c(30:40)` will base the coverage calculation on reads between 30 to 40 bps. Assigning `NULL` will skip this filtering step.
`Nbins`	Single positive integer defining the number of segments the coverage of each feature should be binned into in order to obtain coverage summaries of constant length, e.g. for plotting purposes.
`method`	Defines the summary statistics to use for binning. The default is `method=mean`.
`fixedmatrix`	If set to `TRUE`, a coverage matrix with single nucleotide resolution will be returned for any number of transcripts centered around precise anchor points in a genome annotation, such a stop/start codons or transcription start sites. For instance, a matrix with coverage information 20bps upstream and downstream of the stop/start codons can be obtained with `fixedmatrix=TRUE, upstream=20, downstream=20` along with a `grl` instance containing the CDS exon ranges required for this operation, e.g. generated with `cdsBy(txdb, "tx")`.
`resizefeatures`	Needs to be set to `TRUE` when `fixedmatrix=TRUE`. Internally, this will use the `systemPipeR::.resizeFeature` function to extend single and multi component features at their most left and most right end coordinates. The corresponding extension values are specified under the `upstream` and `downstream` arguments.
`upstream`	Single positive integer specifying the upstream extension length relative to the orientation of each feature in the genome. More details are given above.
`downstream`	Single positive integer specifying the downstream extension length relative to the orientation of each feature in the genome. More details are given above.
`outfile`	Default `NULL` omits writing of the results to a file. If a file name is specified then the results are written to a tabular file. If `bfl` contains the paths to several BAM files then the results will be appended to the same file where the first column specifies the sample labels. Redirecting the results to file is particularly useful when processing large files of many sample where computation times can be significant.
`overwrite`	If set to `TRUE` any existing file assigned to `outfile` will be overwritten.

The function allows to return the following four distinct outputs. The settings to return these instances are illustrated below in the example section.

`(A)`	`data.frame` containing binned coverage where rows are features and columns coverage bins. The first four columns contain (i) the sample names, (ii) the number of total aligned reads in the corresponding BAM files (useful for normalization), (iii) the feature IDs, (iv) strand of the coverage. All following columns are numeric and contain the actual coverage data for the sense and antisense strand of each feature.
`(B)`	`data.frame` containing coverage with single nucleotide resolution around anchor points such as start and stop codons. The two matrix components are appended column-wise. To clearly distinguish the two data components, they are separated by a specialty column containing pipe characters. The first four columns are the same as described under (A). The column title for the anchor point is 0. For instance, if the features are CDSs then the first 0 corresponds to the first nucleotide of the start codon and the second 0 to the last nucleotide of the stop codon. Upstream and downstream positions are indicated by negative and positive column numbers, respectively.
`(C)`	`data.frame` containing combined results of (A) and (B) where the first set of columns contains to the coverage around the start codons, the second one the binned coverage of the CDSs and the third one the coverage around the stop codons separated by the same pipe columns mentioned under (B).
`(D)`	`Rle` list containing the nucleotide level coverage of each feature

Thomas Girke

plotfeatureCoverage

## Construct SYSargs object from param and targets files 
param <- system.file("extdata", "tophat.param", package="systemPipeR")
targets <- system.file("extdata", "targets.txt", package="systemPipeR")
args <- systemArgs(sysma=param, mytargets=targets)

## Not run: 
## Features from sample data of systemPipeRdata package
library(GenomicFeatures)
file <- system.file("extdata/annotation", "tair10.gff", package="systemPipeRdata")
txdb <- makeTxDbFromGFF(file=file, format="gff3", organism="Arabidopsis")

## (A) Generate binned coverage for two BAM files and 4 transcripts
grl <- cdsBy(txdb, "tx", use.names=TRUE)
fcov <- featureCoverage(bfl=BamFileList(outpaths(args)[1:2]), grl=grl[1:4], resizereads=NULL, 
                    readlengthrange=NULL, Nbins=20, method=mean, fixedmatrix=FALSE, 
                    resizefeatures=TRUE, upstream=20, downstream=20)
plotfeatureCoverage(covMA=fcov, method=mean, scales="fixed", scale_count_val=10^6)

## (B) Coverage matrix upstream and downstream of start/stop codons
fcov <- featureCoverage(bfl=BamFileList(outpaths(args)[1:2]), grl=grl[1:4], resizereads=NULL, 
                    readlengthrange=NULL, Nbins=NULL, method=mean, fixedmatrix=TRUE, 
                    resizefeatures=TRUE, upstream=20, downstream=20)
plotfeatureCoverage(covMA=fcov, method=mean, scales="fixed", scale_count_val=10^6)

## (C) Combined matrix for both binned and start/stop codon 
fcov <- featureCoverage(bfl=BamFileList(outpaths(args)[1:2]), grl=grl[1:4], resizereads=NULL, 
                    readlengthrange=NULL, Nbins=20, method=mean, fixedmatrix=TRUE, 
                    resizefeatures=TRUE, upstream=20, downstream=20)
plotfeatureCoverage(covMA=fcov, method=mean, scales="fixed", scale_count_val=10^6)

## (D) Rle coverage objects one for each query feature
fcov <- featureCoverage(bfl=BamFileList(outpaths(args)[1:2]), grl=grl[1:4], resizereads=NULL, 
                    readlengthrange=NULL, Nbins=NULL, method=mean, fixedmatrix=FALSE, 
                    resizefeatures=TRUE, upstream=20, downstream=20)


## End(Not run)