Wrapper function for the chromstaR package

Description

This function performs binning, univariate peak calling and multivariate peak calling from a list of input files.

Usage

Chromstar(inputfolder, experiment.table, outputfolder, configfile = NULL,
  numCPU = 1, binsize = 1000, assembly = NULL, chromosomes = NULL,
  remove.duplicate.reads = TRUE, min.mapq = 10, prefit.on.chr = NULL,
  eps.univariate = 0.1, max.time = NULL, max.iter = 5000,
  read.cutoff.absolute = 500, keep.posteriors = TRUE,
  mode = "differential", max.states = 128, per.chrom = TRUE,
  eps.multivariate = 0.01, exclusive.table = NULL)

Arguments

inputfolder

Folder with either BAM or BED-6 (see readBedFileAsGRanges) files.

experiment.table

A data.frame or tab-separated text file with the structure of the experiment. See experiment.table for an example.
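As a rough illustration, an experiment table could be built and written to a tab-separated file as sketched below. The column names (file, mark, condition, replicate, pairedEndReads, controlFiles) and all file names are assumptions made for this sketch; consult the experiment.table help page for the authoritative structure.

```r
# Hypothetical experiment table; column names are an assumption,
# see ?experiment.table for the authoritative layout.
exp_table <- data.frame(
  file = c("H3K27me3-ctrl-rep1.bam",   # made-up file names
           "H3K27me3-ctrl-rep2.bam",
           "H3K4me3-ctrl-rep1.bam"),
  mark = c("H3K27me3", "H3K27me3", "H3K4me3"),
  condition = c("ctrl", "ctrl", "ctrl"),
  replicate = c(1, 2, 1),
  pairedEndReads = FALSE,              # recycled to every row
  controlFiles = NA,                   # no control/input files in this sketch
  stringsAsFactors = FALSE
)

# Write it as a tab-separated text file, the alternative accepted form.
exp_file <- file.path(tempdir(), "experiment_table.tsv")
write.table(exp_table, file = exp_file, sep = "\t",
            quote = FALSE, row.names = FALSE)
```

Either the data.frame or the path to the written file could then be passed as experiment.table.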

outputfolder

Folder where the results and intermediate files will be written to.

configfile

A file specifying the parameters of this function (without inputfolder, outputfolder and configfile). Having the parameters in a file can be handy if many samples with the same parameter settings are to be run. If a configfile is specified, it will take priority over the command line parameters.

numCPU

Number of threads to use for the analysis. Beware that more CPUs also means more memory is needed. If you experience crashes of R with higher numbers of this parameter, leave it at numCPU=1.

binsize

An integer specifying the bin size that is used for the analysis.

assembly

A data.frame or tab-separated file with columns 'chromosome' and 'length'. Alternatively a character specifying the assembly, see fetchExtendedChromInfoFromUCSC for available assemblies. Specifying an assembly is only necessary when importing BED files. BAM files are handled automatically.
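A minimal sketch of the data.frame form of this argument follows. The chromosome names and lengths are invented for illustration and do not correspond to any real assembly.

```r
# Made-up chromosome lengths, for illustration only.
assembly_df <- data.frame(
  chromosome = c("chr1", "chr2", "chrX"),
  length = c(1000000, 800000, 500000),
  stringsAsFactors = FALSE
)

# The same information as a tab-separated file.
assembly_file <- file.path(tempdir(), "assembly.tsv")
write.table(assembly_df, file = assembly_file, sep = "\t",
            quote = FALSE, row.names = FALSE)

# Read it back to confirm the expected two columns are present.
roundtrip <- read.table(assembly_file, header = TRUE, sep = "\t",
                        stringsAsFactors = FALSE)
```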

chromosomes

If only a subset of the chromosomes should be imported, specify them here.

remove.duplicate.reads

A logical indicating whether or not duplicate reads should be removed.

min.mapq

Minimum mapping quality when importing from BAM files. Set min.mapq=0 to keep all reads.

prefit.on.chr

A chromosome that is used to pre-fit the Hidden Markov Model. Set to NULL if you don't want to prefit but use the whole genome instead.

eps.univariate

Convergence threshold for the univariate Baum-Welch algorithm.

max.time

The maximum running time in seconds for the Baum-Welch algorithm. If this time is reached, the Baum-Welch will terminate after the current iteration finishes. The default NULL is no limit.

max.iter

The maximum number of iterations for the Baum-Welch algorithm. Set to NULL for no limit.

read.cutoff.absolute

Read counts above this value will be set to the read count specified by this value. Filtering very high read counts increases the performance of the Baum-Welch fitting procedure. However, if your data contains very few peaks they might be filtered out. If option read.cutoff.quantile is also specified, the minimum of the resulting cutoff values will be used. Set read.cutoff=FALSE to disable this filtering.
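The effect of an absolute cutoff can be pictured with plain base R. This is only a sketch of the capping described above on invented bin counts, not the package's internal code.

```r
# Invented read counts per bin.
counts <- c(0, 3, 12, 480, 501, 2500)

# Counts above the cutoff are set to the cutoff value itself.
read_cutoff <- 500
capped <- pmin(counts, read_cutoff)
```

Bins at or below the cutoff are unchanged; only the two bins above 500 are reduced to 500.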

keep.posteriors

If set to TRUE (the default), posteriors will be available in the output. This is useful for changing the post.cutoff later, but increases the disk space needed to store the result.

mode

One of c('differential','combinatorial','full','separate'). The modes determine how the multivariate part is run. Here is some advice on which mode to use:

combinatorial

Each condition is analyzed separately with all marks combined. Choose this mode if you have more than ~7 conditions or you want to have a high sensitivity for detecting combinatorial states. Differences between conditions will be more noisy (more false positives) than in mode 'differential' but combinatorial states are more precise.

differential

Each mark is analyzed separately with all conditions combined. Choose this mode if you are interested in accurate differences. Combinatorial states will be more noisy (more false positives) than in mode 'combinatorial' but differences are more precise.

full

Full analysis of all marks and conditions combined. Best of both, but choose this mode only if (number of conditions * number of marks ≤ 8), otherwise it might be too slow or crash due to memory limitations.

separate

Only replicates are analyzed multivariately. Combinatorial states are constructed by a simple post-hoc combination of peak calls.
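The rule of thumb for mode 'full' (number of conditions times number of marks at most 8) can be checked from the experiment design before starting a run. The sketch below uses a hypothetical design table with invented marks and conditions; the variable names are not part of the package.

```r
# Hypothetical design: one row per mark/condition combination.
tracks <- data.frame(
  mark = c("H3K4me3", "H3K27me3", "H3K4me3", "H3K27me3"),
  condition = c("ctrl", "ctrl", "treated", "treated"),
  stringsAsFactors = FALSE
)

n_marks <- length(unique(tracks$mark))
n_conditions <- length(unique(tracks$condition))

# TRUE if the design is small enough for mode = 'full' by the rule of thumb.
full_is_feasible <- n_marks * n_conditions <= 8
```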

max.states

The maximum number of states to use in the multivariate part. If set to NULL, the maximum number of theoretically possible states is used. CAUTION: This can be very slow or crash if you have too many states. chromstaR has a built-in mechanism to select the best states in case fewer states than theoretically possible are specified.
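To get a feeling for how quickly the state space grows, assume each of N jointly analyzed tracks is either enriched or not, giving 2^N theoretical combinations. That binary assumption is made for this sketch only and is not stated on this page.

```r
# Number of jointly analyzed tracks (hypothetical range).
n_tracks <- 1:10

# Theoretical number of combinatorial states under the binary assumption.
theoretical_states <- 2^n_tracks

# Track counts for which the default max.states = 128 would be exceeded.
exceeds_default <- n_tracks[theoretical_states > 128]
```

Under this assumption the default of 128 corresponds to exactly 7 tracks, and any larger joint analysis would rely on the state selection described above.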

per.chrom

If set to TRUE chromosomes will be treated separately in the multivariate part. This tremendously speeds up the calculation but results might be noisier as compared to per.chrom=FALSE, where all chromosomes are concatenated for the HMM.

eps.multivariate

Convergence threshold for the multivariate Baum-Welch algorithm.

exclusive.table

A data.frame or tab-separated file with columns 'mark' and 'group'. Histone marks with the same group will be treated as mutually exclusive.
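A sketch of such a table is shown below. The marks and the group labels are invented for illustration, and whether groups should be given as numbers or strings is an assumption here.

```r
# Hypothetical grouping: the two marks in group 1 would be treated as
# mutually exclusive; H3K4me3 is alone in its group.
exclusive_df <- data.frame(
  mark = c("H3K27me3", "H3K9me3", "H3K4me3"),
  group = c(1, 1, 2),
  stringsAsFactors = FALSE
)

# Marks that share a group with at least one other mark.
shared <- exclusive_df$mark[exclusive_df$group %in%
                            exclusive_df$group[duplicated(exclusive_df$group)]]
```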

Value

NULL

Examples

## Prepare the file paths. Exchange this with your input and output directories.
inputfolder <- system.file("extdata","euratrans", package="chromstaRData")
outputfolder <- file.path(tempdir(), 'SHR-example')
## Define experiment structure
data(experiment_table_SHR)
## Define assembly
# This is only necessary if you have BED files, BAM files are handled automatically.
# For common assemblies you can also specify them as 'hg19' for example.
data(rn4_chrominfo)
## Run ChromstaR
Chromstar(inputfolder, experiment.table=experiment_table_SHR,
         outputfolder=outputfolder, numCPU=4, binsize=1000, assembly=rn4_chrominfo,
         prefit.on.chr='chr12', chromosomes='chr12', mode='combinatorial', eps.univariate=1,
         eps.multivariate=1)
