Predict coverage for a single-isoform gene

Share:

Description

Predict coverage for a single-isoform gene given fitted bias parameters in a set of models, and compare to the observed fragment coverage.

Usage

1
predictCoverage(gene, bam.files, fitpar, genome, model.names)

Arguments

gene

a GRangesList with the exons of different genes

bam.files

a character string pointing to indexed BAM files

fitpar

the output of running fitBiasModels

genome

a BSgenome object

model.names

a character vector listing the models, see same argument in estimateAbundance

Details

Note that if the range between minsize and maxsize does not cover most of the fragment length distribution, the predicted coverage will underestimate the observed coverage.

Value

a list with elements frag.cov, the observed fragment coverage from the bam.files and pred.cov, a list with the predicted fragment coverage for each of the models.

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
# these next lines just write out a BAM file from R
# typically you would already have a BAM file
library(alpineData)
library(GenomicAlignments)
library(rtracklayer)
gap <- ERR188088()
dir <- system.file(package="alpineData", "extdata")
bam.file <- c("ERR188088" = file.path(dir,"ERR188088.bam"))
export(gap, con=bam.file)

data(preprocessedData)
library(BSgenome.Hsapiens.NCBI.GRCh38)

model.names <- c("fraglen","fraglen.vlmm","GC","all")

pred.cov <- predictCoverage(gene=ebt.fit[["ENST00000379660"]],
                            bam.files=bam.file,
                            fitpar=fitpar.small,
                            genome=Hsapiens,
                            model.names=model.names)

# plot the coverage:
# note that, because [125,175] bp range specified in fitpar.small
# does not cover the fragment width distribution, the predicted curves
# will underestimate the observed. we correct here post-hoc

frag.cov <- pred.cov[["ERR188088"]][["frag.cov"]]
plot(frag.cov, type="l", lwd=3, ylim=c(0,max(frag.cov)*1.5))
for (i in seq_along(model.names)) {
  m <- model.names[i]
  pred <- pred.cov[["ERR188088"]][["pred.cov"]][[m]]
  lines(pred/mean(pred)*mean(frag.cov), col=i+1, lwd=3)
}
legend("topright", legend=c("observed",model.names),
       col=seq_len(length(model.names)+1), lwd=3)