Fit a Hidden Markov Model to a ChIP-seq sample.


Description

Fit a HMM to a ChIP-seq sample to determine the modification state of genomic regions, e.g. call peaks in the sample.

Usage

callPeaksUnivariateAllChr(binned.data, input.data = NULL, eps = 0.01,
  init = "standard", max.time = NULL, max.iter = NULL, num.trials = 1,
  eps.try = NULL, num.threads = 1, read.cutoff = TRUE,
  read.cutoff.quantile = 1, read.cutoff.absolute = 500, max.mean = Inf,
  post.cutoff = 0.5, control = FALSE, keep.posteriors = FALSE,
  keep.densities = FALSE, verbosity = 1)

Arguments

binned.data

A GRanges object with binned read counts or a file that contains such an object.

input.data

Input control for the experiment. A GRanges object with binned read counts or a file that contains such an object.

eps

Convergence threshold for the Baum-Welch algorithm.

init

One of the following initialization procedures:

standard

The negative binomial of state 'unmodified' will be initialized with mean=mean(counts), var=var(counts) and the negative binomial of state 'modified' with mean=mean(counts)+1, var=var(counts). This procedure usually gives the fastest convergence.

random

Mean and variance of the negative binomials will be initialized with random values (in certain boundaries, see source code). Try this if the 'standard' procedure fails to produce a good fit.

empiric

Yet another way to initialize the Baum-Welch. Try this if the other two methods fail to produce a good fit.
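The 'standard' procedure is stated in terms of a mean and a variance, whereas R's negative binomial functions are parameterized by size and prob (or mu). The following is a minimal sketch of that conversion in base R, assuming overdispersed counts (variance greater than mean). The helper name `nb_params` and the simulated counts are illustrative and not part of the package.

```r
# Hypothetical helper: convert mean/variance to dnbinom's size/prob parameters.
# Valid only for overdispersed counts (variance > mean).
nb_params <- function(mean, variance) {
  stopifnot(mean > 0, variance > mean)
  list(size = mean^2 / (variance - mean), prob = mean / variance)
}

set.seed(1)
counts <- rnbinom(10000, size = 2, mu = 5)  # simulated binned read counts

# 'standard' initialization as described above
unmodified <- nb_params(mean(counts), var(counts))
modified   <- nb_params(mean(counts) + 1, var(counts))
```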

max.time

The maximum running time in seconds for the Baum-Welch algorithm. If this time is reached, the Baum-Welch will terminate after the current iteration finishes. The default NULL is no limit.

max.iter

The maximum number of iterations for the Baum-Welch algorithm. The default NULL is no limit.
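The arguments eps, max.time and max.iter together determine when fitting stops. The sketch below shows one way such stopping logic can be arranged in base R. It assumes that convergence is measured as the change in log-likelihood between iterations, which this page does not state, and `step` is a hypothetical stand-in for a single Baum-Welch iteration.

```r
# Generic stopping logic; `step` is a hypothetical stand-in for one
# Baum-Welch iteration that returns the current log-likelihood.
run_until_converged <- function(step, eps = 0.01, max.time = NULL, max.iter = NULL) {
  start <- Sys.time()
  loglik.old <- -Inf
  iter <- 0
  repeat {
    iter <- iter + 1
    loglik <- step(iter)
    # Assumption: convergence is the change in log-likelihood falling below eps.
    if (abs(loglik - loglik.old) < eps) {
      return(list(iterations = iter, reason = "converged"))
    }
    if (!is.null(max.iter) && iter >= max.iter) {
      return(list(iterations = iter, reason = "max.iter"))
    }
    # The time limit is checked only after an iteration has finished.
    elapsed <- as.numeric(difftime(Sys.time(), start, units = "secs"))
    if (!is.null(max.time) && elapsed >= max.time) {
      return(list(iterations = iter, reason = "max.time"))
    }
    loglik.old <- loglik
  }
}

# Toy log-likelihood sequence that levels off near -100
toy_step <- function(i) -100 - 1 / i
res <- run_until_converged(toy_step, eps = 0.01, max.iter = 1000)
res.capped <- run_until_converged(toy_step, eps = 0.01, max.iter = 5)
```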

num.trials

The number of trials to run the HMM. Each time, the HMM is seeded with different random initial values. The HMM with the best likelihood is given as output.

eps.try

If num.trials is set to greater than 1, eps.try is used as the convergence threshold for the trial runs. If unset, eps is used.

num.threads

Number of threads to use. Setting this to >1 may give increased performance.

read.cutoff

The default (TRUE) enables filtering of high read counts. Set read.cutoff=FALSE to disable this filtering.

read.cutoff.quantile

A quantile between 0 and 1. Should be near 1. Read counts above this quantile will be set to the read count specified by this quantile. Filtering very high read counts increases the performance of the Baum-Welch fitting procedure. However, if your data contains very few peaks they might be filtered out. If option read.cutoff.absolute is also specified, the minimum of the resulting cutoff values will be used. Set read.cutoff=FALSE to disable this filtering.

read.cutoff.absolute

Read counts above this value will be set to the read count specified by this value. Filtering very high read counts increases the performance of the Baum-Welch fitting procedure. However, if your data contains very few peaks they might be filtered out. If option read.cutoff.quantile is also specified, the minimum of the resulting cutoff values will be used. Set read.cutoff=FALSE to disable this filtering.
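The interaction between read.cutoff.quantile and read.cutoff.absolute can be sketched in base R as follows. The helper `cap_counts` is hypothetical and only mirrors the description above (take the minimum of the two cutoffs, then cap the counts). The package may round or otherwise treat the quantile differently.

```r
# Hypothetical helper illustrating the capping of high read counts.
cap_counts <- function(counts, read.cutoff.quantile = 1, read.cutoff.absolute = 500) {
  cutoff.q <- as.numeric(quantile(counts, read.cutoff.quantile))
  cutoff <- min(cutoff.q, read.cutoff.absolute)  # the smaller cutoff wins
  pmin(counts, cutoff)                           # counts above it are set to it
}

counts <- c(0, 1, 2, 3, 5, 8, 40, 1200)
capped   <- cap_counts(counts)                              # absolute cutoff applies
capped.q <- cap_counts(counts, read.cutoff.quantile = 0.5)  # quantile cutoff applies
```

With the defaults, the quantile cutoff is the maximum of the data, so only the absolute cutoff of 500 has an effect.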

max.mean

If mean(counts)>max.mean, bins with low read counts will be set to 0. This is a workaround to obtain good fits in the case of large bin sizes.

post.cutoff

False discovery rate. NULL means that the state with maximum posterior probability will be chosen, irrespective of its absolute probability. The default given in the Usage section is 0.5.
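The difference between a NULL and a numeric post.cutoff can be illustrated with a small base R sketch. It assumes that a bin is called 'modified' when the posterior of the 'modified' state reaches the cutoff, which is an interpretation and not confirmed by this page. The helper `call_states` and the posterior values are made up for illustration.

```r
# Hypothetical helper: assign states from a matrix of posterior probabilities.
call_states <- function(posteriors, post.cutoff = NULL) {
  if (is.null(post.cutoff)) {
    # Maximum posterior, irrespective of its absolute value
    return(colnames(posteriors)[max.col(posteriors, ties.method = "first")])
  }
  # Assumption: a bin is 'modified' when its posterior reaches the cutoff;
  # otherwise the more probable of the remaining two states is used.
  rest <- posteriors[, c("zero-inflation", "unmodified"), drop = FALSE]
  states <- colnames(rest)[max.col(rest, ties.method = "first")]
  states[posteriors[, "modified"] >= post.cutoff] <- "modified"
  states
}

posteriors <- rbind(c(0.7, 0.20, 0.10),
                    c(0.0, 0.45, 0.55),
                    c(0.0, 0.10, 0.90))
colnames(posteriors) <- c("zero-inflation", "unmodified", "modified")

states.max    <- call_states(posteriors)
states.strict <- call_states(posteriors, post.cutoff = 0.8)
```

In this toy example the second bin is 'modified' under the maximum-posterior rule but 'unmodified' under the stricter cutoff of 0.8.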

control

If set to TRUE, the binned data will be treated as a control experiment. That means only the states 'zero-inflation' and 'unmodified' will be used in the HMM.

keep.posteriors

If set to TRUE (default=FALSE), posteriors will be available in the output. This is useful to change the post.cutoff later, but increases the necessary disk space to store the result.

keep.densities

If set to TRUE (default=FALSE), densities will be available in the output. This should only be needed for debugging.

verbosity

Verbosity level for the fitting procedure. 0 - No output, 1 - Iterations are printed.

Details

The Hidden Markov Model used to classify the bins has 3 states: state 'zero-inflation' with a delta function as emission density (only zero read counts), and states 'unmodified' and 'modified' with negative binomials as emission densities. The Baum-Welch algorithm is employed to estimate the parameters of the distributions. Please refer to our manuscript at http://dx.doi.org/10.1101/038612 for a detailed description of the method.
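The three emission densities can be written down directly in base R. The sketch below evaluates them for a few read counts. The function name `emission_density` and all parameter values are illustrative assumptions, not fitted values or package internals.

```r
# Emission densities for the three states; parameter values are illustrative.
emission_density <- function(counts, size.u, mu.u, size.m, mu.m) {
  cbind(
    "zero-inflation" = as.numeric(counts == 0),                    # delta function at zero
    "unmodified"     = dnbinom(counts, size = size.u, mu = mu.u),  # low-count background
    "modified"       = dnbinom(counts, size = size.m, mu = mu.m)   # enriched signal
  )
}

dens <- emission_density(c(0, 1, 5, 30),
                         size.u = 2, mu.u = 2,
                         size.m = 2, mu.m = 20)
```

Each row of `dens` gives the density of one read count under the three states. A count of 30 is far more likely under the 'modified' distribution here, while a count of 1 favors 'unmodified'.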

Value

A uniHMM object.

Author(s)

Aaron Taudt, Maria Colomé Tatché

See Also

uniHMM, callPeaksMultivariate
