fitBiasModels: Fit bias models over single-isoform genes
In mikelove/alpine: alpine

fitBiasModels

R Documentation

Fit bias models over single-isoform genes

Description

This function estimates parameters for one or more bias models for a single sample over a set of single-isoform genes. ~100 medium to highly expressed genes should be sufficient to estimate the parameters robustly.

Usage

fitBiasModels(genes, bam.file, fragtypes, genome, models, readlength, minsize,
  maxsize, speedglm = TRUE, gc.knots = seq(from = 0.4, to = 0.6, length =
  3), gc.bk = c(0, 1), relpos.knots = seq(from = 0.25, to = 0.75, length =
  3), relpos.bk = c(0, 1))

Arguments

`genes`	a GRangesList with the exons of different single-isoform genes
`bam.file`	a character string pointing to an indexed BAM file
`fragtypes`	the output of buildFragtypes. must contain the potential fragment types for the genes named in `genes`
`genome`	a BSgenome object
`models`	a list of lists: the outer list describes multiple models. each element of the inner list has two elements: `formula` and `offset`. `formula` should be a character strings of an R formula describing the bias models, e.g. `"count ~ ns(gc) + gene"`. The end of the string is required to be `"+ gene"`. `offset` should be a character vector listing possible bias offsets to be used (`"fraglen"` or `"vlmm"`). Either `offset` or `formula` can be NULL for a model. See vignette for recommendations and details.
`readlength`	the read length
`minsize`	the minimum fragment length to model
`maxsize`	the maximum fragment length to model
`speedglm`	logical, whether to use speedglm to estimate the coefficients. Default is TRUE.
`gc.knots`	knots for the GC splines
`gc.bk`	boundary knots for the GC splines
`relpos.knots`	knots for the relative position splines
`relpos.bk`	boundary knots for the relative position splines

Value

a list with elements: coefs, summary, models, model.params, and optional offets: fraglen.density, vlmm.fivep, and vlmm.threep.

coefs gives the estimated coefficients for the different models that specified formula.
summary gives the tables with coefficients, standard errors and p-values,
models stores the incoming models list,
model.params stores parameters for the models, such as knot locations
fraglen.density is a estimated density object for the fragment length distribution,
vlmm.fivep and vlmm.threep store the observed and expected tabulations for the different orders of the VLMM for read start bias.

References

The complete bias model including fragment sequence bias is described in detail in the Supplemental Note of the following publication:

Love, M.I., Hogenesch, J.B., and Irizarry, R.A., Modeling of RNA-seq fragment sequence bias reduces systematic errors in transcript abundance estimation. Nature Biotechnologyh (2016) doi: 10.1038/nbt.3682

The read start variable length Markov model (VLMM) for addressing bias introduced by random hexamer priming was introduced in the following publication (the sequence bias model used in Cufflinks):

Roberts, A., Trapnell, C., Donaghey, J., Rinn, J.L., and Pachter, L., Improving RNA-Seq expression estimates by correcting for fragment bias. Genome Biology (2011) doi: 10.1186/gb-2011-12-3-r22

Examples


# see vignette for a more realistic example

# these next lines just write out a BAM file from R
# typically you would already have a BAM file
library(alpineData)
library(GenomicAlignments)
library(rtracklayer)
gap <- ERR188088()
dir <- system.file(package="alpineData", "extdata")
bam.file <- c("ERR188088" = file.path(dir,"ERR188088.bam"))
export(gap, con=bam.file)

library(GenomicRanges)
library(BSgenome.Hsapiens.NCBI.GRCh38)
data(preprocessedData)

readlength <- 75
minsize <- 125 # see vignette how to choose
maxsize <- 175 # see vignette how to choose

# here a very small subset, should be ~100 genes
gene.names <- names(ebt.fit)[6:8] 
names(gene.names) <- gene.names
fragtypes <- lapply(gene.names, function(gene.name) {
                      buildFragtypes(ebt.fit[[gene.name]],
                                     Hsapiens, readlength,
                                     minsize, maxsize)
})
models <- list(
  "GC" = list(formula = "count ~ ns(gc,knots=gc.knots, Boundary.knots=gc.bk) + gene",
              offset=c("fraglen","vlmm"))
)

fitpar <- fitBiasModels(genes=ebt.fit[gene.names],
                        bam.file=bam.file,
                        fragtypes=fragtypes,
                        genome=Hsapiens,
                        models=models,
                        readlength=readlength,
                        minsize=minsize,
                        maxsize=maxsize)

mikelove/alpine documentation built on June 9, 2024, 11:37 a.m.