reportBadRegionsDetailed: Gives a detailed report on the coverage quality


View source: R/reports.R

Description

The function reportBadRegionsDetailed creates a detailed, basewise report of all regions of interest. For each position it lists the coverage of each sample, an indicator of whether the base was originally targeted, the coverage quality, and the corresponding gene (name and geneID).

Usage

reportBadRegionsDetailed(threshold1, threshold2, percentage1, percentage2, 
                         coverage_indicators, mart, samples, output)

Arguments

threshold1

Integer. Minimum number of reads that must be registered for a sample for its coverage to be classified as acceptable.

threshold2

Integer. Minimum number of reads that must be registered for a sample for its coverage to be classified as good.

percentage1

Float. Proportion of samples (e.g. 0.80 for 80%) that must have a coverage of at least threshold1 for the position to be classified as acceptably covered.

percentage2

Float. Proportion of samples (e.g. 0.90 for 90%) that must have a coverage of at least threshold2 for the position to be classified as well covered.

coverage_indicators

List object, return value of function determineCoverageQuality or determineRegionsOfInterest.

mart

A mart object as defined in the manual for the package 'biomaRt'. If the human genome (hg19) is to be used, an empty string may be provided and the mart is generated automatically.

samples

Data frame object containing the names of the samples to be analyzed (in one column).

output

The folder into which the output files are written. If output is an empty string, no output file is written.
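
How the two thresholds and the two percentages interact can be illustrated with a minimal sketch in base R. The helper classify_position is hypothetical and not part of BadRegionFinder. It assumes that the percentages are given as proportions between 0 and 1 (as with the values 0.80 and 0.90 in the Examples section), that all comparisons are inclusive, and that the "good" criterion is checked before the "acceptable" one. The package's actual implementation may differ.

```r
# Hypothetical helper (not part of BadRegionFinder): classify one position
# from the per-sample coverage values observed at that position.
classify_position <- function(coverage, threshold1, threshold2,
                              percentage1, percentage2) {
  # Proportion of samples that reach each read-count threshold
  frac_acceptable <- mean(coverage >= threshold1)
  frac_good <- mean(coverage >= threshold2)

  # Assumption: "good" takes precedence over "acceptable"
  if (frac_good >= percentage2) {
    "good"
  } else if (frac_acceptable >= percentage1) {
    "acceptable"
  } else {
    "bad"
  }
}

# Made-up coverage values for four samples at three different positions
classify_position(c(150, 120, 110, 30), 20, 100, 0.80, 0.90)
classify_position(c(150, 120, 110, 130), 20, 100, 0.80, 0.90)
classify_position(c(5, 10, 200, 15), 20, 100, 0.80, 0.90)
```

In the first call all four samples reach threshold1 but only three of four reach threshold2, so the position would count as acceptably covered but not well covered under these assumptions.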

Details

For more detailed information on the coverage quality, reportBadRegionsDetailed can create one file for every chromosome analyzed. The function accepts one of three inputs: information on the whole genome (output of determineCoverage with TRonly=FALSE, processed with determineCoverageQuality), information on the target regions (output of determineCoverage with TRonly=TRUE, processed with determineCoverageQuality), or information on a selection of regions of interest (output of determineRegionsOfInterest).

Unlike the summarized variant reportBadRegionsSummary, this function reports information on every single base of interest (except for completely uncovered and untargeted regions, which are summarized). For every base, the report contains its position, the coverage of each sample, whether the base was originally targeted (value 1) or not (value 0), the coverage quality, and the most likely gene (name and geneID) targeted by the original experiment. The gene names and geneIDs are obtained from biomaRt. If no gene can be found at a position, "NA" is reported for the corresponding base.

The output files are saved as: "BadCoverageChromosome<chromosomename>;threshold1;percentage1;threshold2;percentage2.txt". The output file may be visualized using plotDetailed.
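
Because the file names encode the parameter values, it can be convenient to locate the reports by pattern rather than by reconstructing each name. The following base R sketch relies only on the "BadCoverageChromosome" prefix and the ".txt" extension of the naming scheme above. The helper find_detailed_reports is hypothetical, and the demo file name (chromosome and parameter values) is made up for illustration.

```r
# Hypothetical helper (not part of BadRegionFinder): list all detailed
# report files found in an output folder.
find_detailed_reports <- function(output_dir) {
  list.files(output_dir,
             pattern = "^BadCoverageChromosome.*\\.txt$",
             full.names = TRUE)
}

# Demo in a temporary folder with one made-up report name and one
# unrelated file
demo_dir <- tempfile("brf_demo_")
dir.create(demo_dir)
file.create(file.path(demo_dir, "BadCoverageChromosome2;20;0.8;100;0.9.txt"))
file.create(file.path(demo_dir, "unrelated.txt"))

# Only the file following the naming scheme is returned
basename(find_detailed_reports(demo_dir))
```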

Value

A list is returned. Each component contains the coverage information of one chromosome as a GRanges object. The metadata columns contain the coverage of each sample at the corresponding position. The column 'TargetBases' indicates whether the region or position contains target bases (value 1) or not (value 0). The column 'indicator' encodes the coverage quality of the corresponding region/position:

0: bad region off target
1: bad region on target
2: acceptable region off target
3: acceptable region on target
4: good region off target
5: good region on target

In addition, the name and the geneID of the gene located at the corresponding position are saved.

If a chromosome is neither covered nor targeted, the corresponding component is "NA".
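
The numeric codes of the 'indicator' column can be translated into readable labels with a small lookup. The sketch below uses base R only. decode_indicator is a hypothetical helper, not a function of the package, and its labels are copied from the codes listed above. It also shows that, under this coding, the integer quotient of a code by 2 gives the quality class and the remainder gives the on-target flag.

```r
# Labels for the codes 0..5, in order (taken from the Value section)
indicator_labels <- c(
  "bad region off target",
  "bad region on target",
  "acceptable region off target",
  "acceptable region on target",
  "good region off target",
  "good region on target"
)

# Hypothetical helper (not part of BadRegionFinder): map codes to labels.
# Codes start at 0, R vectors at 1, hence the offset.
decode_indicator <- function(indicator) {
  indicator_labels[indicator + 1L]
}

codes <- c(0, 3, 5, 4)   # made-up example values
decode_indicator(codes)

codes %/% 2   # quality class: 0 = bad, 1 = acceptable, 2 = good
codes %% 2    # target flag:   0 = off target, 1 = on target
```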

Author(s)

Sarah Sandmann <sarah.sandmann@uni-muenster.de>

References

More information on the R/Bioconductor package 'biomaRt' may be found at:

http://www.bioconductor.org/packages/release/bioc/html/biomaRt.html

See Also

BadRegionFinder, determineCoverage, determineCoverageQuality, determineRegionsOfInterest, reportBadRegionsSummary, reportBadRegionsGenes, plotSummary, plotDetailed, plotSummaryGenes, determineQuantiles

Examples

library("BadRegionFinder")
library("BSgenome.Hsapiens.UCSC.hg19")

threshold1 <- 20
threshold2 <- 100
percentage1 <- 0.80
percentage2 <- 0.90
sample_file <- system.file("extdata", "SampleNames.txt", 
                           package = "BadRegionFinder")
samples <- read.table(sample_file)
bam_input <- system.file("extdata", package = "BadRegionFinder")
output <- system.file("extdata", package = "BadRegionFinder")
target_regions <- system.file("extdata", "targetRegions.bed",
                              package = "BadRegionFinder")
targetRegions <- read.table(target_regions, header = FALSE,
                            stringsAsFactors = FALSE)

coverage_summary <- determineCoverage(samples, bam_input, targetRegions, output,
                                      TRonly = TRUE)
coverage_indicators <- determineCoverageQuality(threshold1, threshold2,
                                                percentage1, percentage2,
                                                coverage_summary)
coverage_indicators_temp <- reportBadRegionsDetailed(threshold1, threshold2,
                                                     percentage1, percentage2,
                                                     coverage_indicators, "",
                                                     samples, output)
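
Since the returned list may contain "NA" components for chromosomes that are neither covered nor targeted, it can help to drop them before further processing. The following base R sketch is an illustration under assumptions: is_placeholder is a hypothetical helper, it accepts both a logical NA and the string "NA" because the exact representation is not specified here, and the mock list uses data frames as stand-ins for the GRanges components.

```r
# Mock result (assumption): data frames stand in for GRanges objects
mock_result <- list(
  data.frame(position = 1:3, indicator = c(1, 3, 5)),
  NA,
  data.frame(position = 10:11, indicator = c(0, 4))
)

# Hypothetical helper: TRUE for a single NA or the string "NA"
is_placeholder <- function(x) {
  is.atomic(x) && length(x) == 1L && (is.na(x) || identical(x, "NA"))
}

# Keep only the components that carry coverage information
covered <- Filter(Negate(is_placeholder), mock_result)
length(covered)
```

With the real return value, the same filter could be applied to coverage_indicators_temp in place of mock_result.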

BadRegionFinder documentation built on Nov. 8, 2020, 5:24 p.m.