Fit a multivariate Hidden Markov Model to multiple ChIP-seq replicates

Share:

Description

Fit an HMM to multiple ChIP-seq replicates and derive correlation measures. Input is a list of uniHMMs generated by callPeaksUnivariate.

Usage

1
2
3
callPeaksReplicates(hmm.list, max.states = 32, force.equal = FALSE,
  eps = 0.01, max.iter = NULL, max.time = NULL, keep.posteriors = TRUE,
  num.threads = 1, max.distance = 0.2, per.chrom = TRUE)

Arguments

hmm.list

A list of uniHMMs generated by callPeaksUnivariate, e.g. list(hmm1,hmm2,...) or c("file1","file2",...). Alternatively, this parameter also accepts a multiHMM and will check if the distance between replicates is greater than max.distance.

max.states

The maximum number of combinatorial states to consider. The default (32) is sufficient to treat up to 5 replicates exactly and more than 5 replicates approximately.

force.equal

The default (FALSE) allows replicates to differ in their peak-calls, although the majority will usually be identical. If force.equal=TRUE, all peaks will be identical among all replicates.

eps

Convergence threshold for the Baum-Welch algorithm.

max.iter

The maximum number of iterations for the Baum-Welch algorithm. The default NULL is no limit.

max.time

The maximum running time in seconds for the Baum-Welch algorithm. If this time is reached, the Baum-Welch will terminate after the current iteration finishes. The default NULL is no limit.

keep.posteriors

If set to TRUE, posteriors will be available in the output. This is useful to change the post.cutoff later, but increases the necessary disk space to store the result immense.

num.threads

Number of threads to use. Setting this to >1 may give increased performance.

max.distance

This number is used as a cutoff to group replicates based on their distance matrix. The lower this number, the more similar replicates have to be to be grouped together.

per.chrom

If per.chrom=TRUE chromosomes will be treated separately. This tremendously speeds up the calculation but results might be noisier as compared to per.chrom=FALSE, where all chromosomes are concatenated for the HMM.

Value

Output is a multiHMM object with additional entry replicateInfo. If only one uniHMM was given as input, a simple list() with the replicateInfo is returned.

Author(s)

Aaron Taudt

See Also

multiHMM, callPeaksUnivariate, callPeaksMultivariate

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
# Let's get some example data with 3 replicates
file.path <- system.file("extdata","euratrans", package='chromstaRData')
files <- list.files(file.path, pattern="H3K27me3.*SHR.*bam$", full.names=TRUE)[1:3]
# Obtain chromosome lengths. This is only necessary for BED files. BAM files are
# handled automatically.
data(rn4_chrominfo)
# Define experiment structure
exp <- data.frame(file=files, mark='H3K27me3', condition='SHR', replicate=1:3,
                 pairedEndReads=FALSE, controlFiles=NA)
# We use bin size 1000bp and chromosome 12 to keep the example quick
binned.data <- list()
for (file in files) {
 binned.data[[basename(file)]] <- binReads(file, binsizes=1000, experiment.table=exp,
                                              assembly=rn4_chrominfo, chromosomes='chr12')
}
# The univariate fit is obtained for each replicate
models <- list()
for (i1 in 1:length(binned.data)) {
 models[[i1]] <- callPeaksUnivariate(binned.data[[i1]], max.time=60, eps=1)
}
# Obtain peak calls considering information from all replicates
multi.model <- callPeaksReplicates(models, force.equal=TRUE, max.time=60, eps=1)

Want to suggest features or report bugs for rdrr.io? Use the GitHub issue tracker.