GetTranscriptomeCoverage: Calculate a coverage score based on read depth and also gene...

Description Usage Arguments Value Author(s) Examples

Description

Calculate a coverage score based on read depth and also gene length (i.e. number of coding nucleotides). See the Value section of this help page to see how the coverage score is calculated.

Usage

1
GetTranscriptomeCoverage(BedGraphRefSeqOverlapsDF, TranscriptLengths)

Arguments

BedGraphRefSeqOverlapsDF

list (1 data frame per file) of overlap scores for a bedgraph coverage file (*.bedgraph.gz) compared to RefSeq defined gene regions.

TranscriptLengths

lengths (i.e. number of coding nucleotides) of each gene defined in RefSeq.

Value

A data.frame containing a coverage score for each gene. The coverage score is calculated as follows:

Coverage Score = sum(width * score) / TranscriptLengths;

where:

score = overlap score per exon as determined by GenomicRanges findOverlaps in GetBedGraphRefSeqOverlaps;

width = width of exon (i.e. number of nucleotides);

TranscriptLengths = lengths (i.e. number of coding nucleotides) of each gene defined in RefSeq;

Author(s)

Nathaniel J. Madrid, Jason Byars

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
refseq <- import.gff("RefSeq_hg19_exons_nodups_021114g1k.gff")
refseq$gene <- sub("_exon_[0-9]+", "", refseq$gene_id)
refseq$width <- width(refseq)
txlength <- tapply(refseq$width, refseq$gene, sum)

# Get file paths of aligned bedgraph files
BedGraphFiles <- dir(path="/home/ubuntu/Projects/RNAseqPackage/bedgraphs", full.names=T)

# coverage calculated w/ bedtools genomecov -bga -split
BedgraphList <- lapply(BedGraphFiles[1:3], function(bg) import.bedGraph(bg))
BedGraphRefSeqOverlapsList <- lapply(BedgraphList, function(bg) GetBedGraphRefSeqOverlaps(bg, refseq))
BedGraphRefSeqOverlapsDFList <- lapply(BedGraphRefSeqOverlapsList, function(bg) as.data.frame(bg@elementMetadata@listData))
for (i in 1:length(BedGraphRefSeqOverlapsDFList)) { BedGraphRefSeqOverlapsDFList[[i]]$width <- BedGraphRefSeqOverlapsList[[i]]@ranges@width }
CoverageDFList <- lapply(BedGraphRefSeqOverlapsDFList, function(bg) GetTranscriptomeCoverage(bg, txlength))
avgReadDepthPerGene <- unlist(lapply(CoverageDFList, function(df) mean(df$score)))
mmdOverTranscriptome <- unlist(lapply(CoverageDFList, function(df) sum(df$TotalCoverage) / sum(df$TranscriptLengths)))

njmadrid/RNAseqQuality documentation built on May 20, 2019, 3:32 p.m.