callPeaksUnivariate: Fit a Hidden Markov Model to a ChIP-seq sample.

Description Usage Arguments Details Value Author(s) See Also Examples

View source: R/callPeaksUnivariate.R

Description

Fit a HMM to a ChIP-seq sample to determine the modification state of genomic regions, e.g. call peaks in the sample.

Usage

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
callPeaksUnivariate(
  binned.data,
  control.data = NULL,
  prefit.on.chr = NULL,
  short = TRUE,
  eps = 0.1,
  init = "standard",
  max.time = NULL,
  max.iter = 5000,
  num.trials = 1,
  eps.try = NULL,
  num.threads = 1,
  read.cutoff = TRUE,
  read.cutoff.quantile = 1,
  read.cutoff.absolute = 500,
  max.mean = Inf,
  post.cutoff = 0.5,
  control = FALSE,
  keep.posteriors = FALSE,
  keep.densities = FALSE,
  verbosity = 1
)

Arguments

binned.data

A GRanges-class object with binned read counts or a file that contains such an object.

control.data

Input control for the experiment. A GRanges-class object with binned read counts or a file that contains such an object.

prefit.on.chr

A chromosome that is used to pre-fit the Hidden Markov Model. Set to NULL if you don't want to prefit but use the whole genome instead.

short

If TRUE, the second fitting step is only done with one iteration.

eps

Convergence threshold for the Baum-Welch algorithm.

init

One of the following initialization procedures:

standard

The negative binomial of state 'unmodified' will be initialized with mean=mean(counts), var=var(counts) and the negative binomial of state 'modified' with mean=mean(counts)+1, var=var(counts). This procedure usually gives the fastest convergence.

random

Mean and variance of the negative binomials will be initialized with random values (in certain boundaries, see source code). Try this if the 'standard' procedure fails to produce a good fit.

empiric

Yet another way to initialize the Baum-Welch. Try this if the other two methods fail to produce a good fit.

max.time

The maximum running time in seconds for the Baum-Welch algorithm. If this time is reached, the Baum-Welch will terminate after the current iteration finishes. The default NULL is no limit.

max.iter

The maximum number of iterations for the Baum-Welch algorithm. The default NULL is no limit.

num.trials

The number of trials to run the HMM. Each time, the HMM is seeded with different random initial values. The HMM with the best likelihood is given as output.

eps.try

If code num.trials is set to greater than 1, eps.try is used for the trial runs. If unset, eps is used.

num.threads

Number of threads to use. Setting this to >1 may give increased performance.

read.cutoff

The default (TRUE) enables filtering of high read counts. Set read.cutoff=FALSE to disable this filtering.

read.cutoff.quantile

A quantile between 0 and 1. Should be near 1. Read counts above this quantile will be set to the read count specified by this quantile. Filtering very high read counts increases the performance of the Baum-Welch fitting procedure. However, if your data contains very few peaks they might be filtered out. If option read.cutoff.absolute is also specified, the minimum of the resulting cutoff values will be used. Set read.cutoff=FALSE to disable this filtering.

read.cutoff.absolute

Read counts above this value will be set to the read count specified by this value. Filtering very high read counts increases the performance of the Baum-Welch fitting procedure. However, if your data contains very few peaks they might be filtered out. If option read.cutoff.quantile is also specified, the minimum of the resulting cutoff values will be used. Set read.cutoff=FALSE to disable this filtering.

max.mean

If mean(counts)>max.mean, bins with low read counts will be set to 0. This is a workaround to obtain good fits in the case of large bin sizes.

post.cutoff

False discovery rate. codeNULL means that the state with maximum posterior probability will be chosen, irrespective of its absolute probability (default=codeNULL).

control

If set to TRUE, the binned data will be treated as control experiment. That means only state 'zero-inflation' and 'unmodified' will be used in the HMM.

keep.posteriors

If set to TRUE (default=FALSE), posteriors will be available in the output. This is useful to change the post.cutoff later, but increases the necessary disk space to store the result.

keep.densities

If set to TRUE (default=FALSE), densities will be available in the output. This should only be needed debugging.

verbosity

Verbosity level for the fitting procedure. 0 - No output, 1 - Iterations are printed.

Details

This function is similar to callPeaksUnivariateAllChr but allows to pre-fit on a single chromosome instead of the whole genome. This gives a significant performance increase and can help to converge into a better fit in case of unsteady quality for some chromosomes.

Value

A uniHMM object.

Author(s)

Aaron Taudt, Maria Colome Tatche

See Also

uniHMM, callPeaksMultivariate

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
## Get an example BAM file with ChIP-seq reads
file <- system.file("extdata", "euratrans",
                      "lv-H3K27me3-BN-male-bio2-tech1.bam",
                       package="chromstaRData")
## Bin the BED file into bin size 1000bp
data(rn4_chrominfo)
data(experiment_table)
binned <- binReads(file, experiment.table=experiment_table,
                  assembly=rn4_chrominfo, binsizes=1000,
                  stepsizes=500, chromosomes='chr12')
## Fit the univariate Hidden Markov Model
hmm <- callPeaksUnivariate(binned, max.time=60, eps=1)
## Check if the fit is ok
plotHistogram(hmm)

ataudt/chromstaR documentation built on Dec. 26, 2021, 12:07 a.m.