polymeraseWave: Given GRO-seq data, identifies the location of the polymerase...

Description Usage Arguments Details Value Author(s) Examples

Description

The model is a three state hidden Markov model (HMM). States represent: (1) the 5' end of genes upstream of the transcription start site, (2) up-regulated sequence, and (3) the 3' end of the gene through the polyadenylation site.

Usage

1
2
3
4
polymeraseWave(reads1, reads2, genes, approxDist, size = 50,
  upstreamDist = 10000, TSmooth = NA, emissionDistAssumption = "gamma",
  filterWindowSize = 10000, progress = TRUE, within = TRUE,
  BPPARAM = BiocParallel::bpparam())

Arguments

reads1

Mapped reads in time point 1.

reads2

Mapped reads in time point 2.

genes

A set of genes in which to search for the wave.

approxDist

The approximate position of the wave. Suggest using 2000 [bp/ min] * time [min], for mammalian data.

size

The size of the moving window. Suggest using 50 for direct ligation data, and 200 for circular ligation data. Default: 50.

upstreamDist

The amount of upstream sequence to include Default: 10 kb.

TSmooth

Optionally, outlying windows are set a maximum value over the inter-quantile interval, specified by TSmooth. Reasonable value: 20; Default: NA (for no smoothing). Users are encouraged to use this parameter ONLY in combination with the normal distribution assumptions.

emissionDistAssumption

Takes values "norm", "normExp", and "gamma". Specifies the functional form of the 'emission' distribution for states I and II (i.e. 5' of the gene, and inside of the wave). In our experience, "gamma" works best for highly-variable 'spiky' data, and "norm" works for smooth data. As a general rule of thumb, "gamma" is used for libraries made using the direct ligation method, and "norm" for circular ligation data. Default: "gamma".

filterWindowSize

Method returns 'quality' information for each gene to which a wave was fit. Included in these metrics are several that define a moving window. The moving window size is specified by filterWindowSize. Default: 10 kb.

progress

Whether to show progress bar. Default: TRUE

within

Whether the wave must be within the gene. Default: TRUE

BPPARAM

Registered backend for BiocParallel. Default: BiocParallel::bpparam()

Details

The model computes differences in read counts between the two conditions. Differences are assumed fit a functional form which can be specified by the user (using the emissionDistAssumption argument). Currently supported functional forms include a normal distribution (good for GRO-seq data prepared using the circular ligation protocol), a gamma distribution (good for 'spiky' ligation based GRO-seq data), and a long-tailed normal+exponential distribution was implemented, but never deployed.

Initial parameter estimates are based on initial assumptions of transcription rates taken from the literature. Subsequently all parameters are fit using Baum-Welch expectation maximization.

Reference: Danko CG, Hah N, Luo X, Martins AL, Core L, Lis JT, Siepel A, Kraus WL. Signaling Pathways Differentially Affect RNA Polymerase II Initiation, Pausing, and Elongation Rate in Cells. Mol Cell. 2013 Mar 19. doi:pii: S1097-2765(13)00171-8. 10.1016/j.molcel.2013.02.015.

Value

Returns list of GRanges with Pol II wave positions and any BiocParallel errors caught.

Author(s)

Charles G. Danko

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
library(GenomicAlignments)
library(BiocParallel)
genes <- GRanges("chr7", IRanges(2394474,2420377), strand="+",
                 SYMBOL="CYP2W1", ID="54905")
reads1 <- as(readGAlignments(system.file("extdata", "S0mR1.bam",
                             package="groHMM")), "GRanges")
reads2 <- as(readGAlignments(system.file("extdata", "S40mR1.bam",
                             package="groHMM")), "GRanges")
approxDist <- 2000*10
## Often, HMMs fails to converge or distributions fail to fit. Therefore
## don't stop on error.  SerialParam is being used here for building, but
## MulticoreParam or SnowParam are of course more desirable in practice.
bpparam <- SerialParam(stop.on.error = FALSE)
bpresult <- polymeraseWave(reads1, reads2, genes, approxDist, BPPARAM=bpparam)
## Collect successful fits.
gr <- unlist(List(bpresult[bpok(bpresult)]))
gr

omsai/groHMM documentation built on May 24, 2019, 2:18 p.m.