simChIP: Simulate ChIP-seq experiments
In ChIPsim: Simulation of ChIP-seq experiments

This function acts as a driver for the simulation. It takes all required arguments and passes them on to the functions for the different stages of the simulation. The current defaults simulate a nucleosome positioning experiment.

simChIP(nreads, genome, file, functions = defaultFunctions(), control = defaultControl(), verbose = TRUE, load = FALSE)

`nreads`	Number of reads to generate.
`genome`	An object of class 'DNAStringSet' or the name of a fasta file containing the genome sequence.
`file`	Base of output file names (see Details).
`functions`	Named list of functions to use for the various stages of the simulation. Expected names are: ‘features’, ‘bindDens’, ‘readDens’, ‘sampleReads’, ‘readNames’, ‘readSequence’.
`control`	Named list of arguments to be passed to simulation functions (one list per function).
`verbose`	Logical indicating whether progress messages should be printed.
`load`	Logical indicating whether an attempt should be made to load intermediate results from a previous run.

The simulation consists of six stages:

1. generate feature sequence (for each chromosome): chromosome length -> feature sequence (list)
2. compute binding site density: feature sequence -> binding site density (vector)
3. compute read density: binding site density -> read density (two column matrix, one column for each strand)
4. sample read start sites: read density -> read positions (list)
5. create read names: number of reads -> unique names
6. obtain read sequence and quality: read positions, genome sequence, [qualities] -> output file
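
The data flow through the six stages can be sketched with a toy example in base R. This is only an illustration of the input and output types listed above; the rules used at each stage (fixed feature positions, identical density on both strands, no reverse complement, no qualities) are simplifying assumptions and are not the algorithms implemented in ChIPsim.

```r
set.seed(1)
chrLen  <- 1000L   # toy chromosome length (assumption)
readLen <- 36L     # toy read length (assumption)
nreads  <- 10L

# Toy genome: a single random chromosome held as one character string
genome <- paste(sample(c("A", "C", "G", "T"), chrLen, replace = TRUE), collapse = "")

# Stage 1: chromosome length -> feature sequence (list); positions are hard-coded here
features <- list(list(start = 200L, length = 147L),
                 list(start = 600L, length = 147L))

# Stage 2: feature sequence -> binding site density (vector)
bindDens <- numeric(chrLen)
for (f in features) {
  idx <- seq(f$start, length.out = f$length)
  bindDens[idx] <- bindDens[idx] + 1
}

# Stage 3: binding site density -> read density (two-column matrix, one column per strand)
strandDens <- bindDens / sum(bindDens)
readDens <- cbind(fwd = strandDens, rev = strandDens)

# Stage 4: read density -> read positions (list, one element per strand)
readPos <- list(
  fwd = sample(chrLen, nreads / 2, replace = TRUE, prob = readDens[, "fwd"]),
  rev = sample(chrLen, nreads / 2, replace = TRUE, prob = readDens[, "rev"])
)

# Stage 5: number of reads -> unique names
readNames <- paste0("read_", seq_len(nreads))

# Stage 6: read positions + genome sequence -> read sequences
# (kept in memory here instead of being written to an output file)
starts <- pmin(unlist(readPos, use.names = FALSE), chrLen - readLen + 1L)
reads  <- substring(genome, starts, starts + readLen - 1L)
names(reads) <- readNames
```

Each object is the input of the next stage, which is why the intermediate results of the early stages can be stored and reused.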

After each of the first three stages, the results of that stage are written to a file and can be reused later. File names are created by appending ‘_features.rdata’, ‘_bindDensity.rdata’ and ‘_readDensity.rdata’, respectively, to the value of file. Previous results will be loaded for reuse if load is TRUE and files with matching names are found. This is useful to sample repeatedly from the same read density or to recover partial results from an interrupted run.
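
As a small sketch of the naming rule just described, the intermediate file names for a given base name can be reconstructed in base R and checked for existence before deciding whether load = TRUE could pick anything up. The labels on the vector are only for readability, and the code mirrors the documented naming scheme rather than the package's internal implementation.

```r
file <- "output/sim_10M"   # base name as passed to simChIP

# Suffixes as documented for the first three stages
suffixes <- c(features    = "_features.rdata",
              bindDensity = "_bindDensity.rdata",
              readDensity = "_readDensity.rdata")

intermediate <- paste0(file, suffixes)
names(intermediate) <- names(suffixes)

# TRUE for every intermediate result that a later run could reuse
available <- file.exists(intermediate)
```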

The creation of files can be prevented by setting file = "". In this case all results will be returned in a list at the end. Note that this will require more memory since all intermediate results have to be held until the end.

The behaviour of the simulation is mainly controlled through the functions and control arguments. They are expected to be lists of the same length with matching names. The names indicate the stage of the simulation for which the function should be used; elements of control will be used as arguments for the corresponding functions.
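
The pairing of functions and control by name can be illustrated with a minimal base R sketch. The two stage functions below are hypothetical stand-ins, not the ChIPsim defaults, and dispatching through do.call is an assumption about how such a pairing might be applied, not a description of the internals of simChIP.

```r
# Hypothetical stand-ins for two stages (not the ChIPsim defaults)
functions <- list(
  features  = function(length, spacing) seq(1, length, by = spacing),
  readNames = function(n, prefix) paste0(prefix, "_", seq_len(n))
)

# One list of extra arguments per function, matched by name
control <- list(
  features  = list(spacing = 200),
  readNames = list(prefix = "read")
)

# Both lists are expected to have the same length and matching names
stopifnot(identical(names(functions), names(control)))

# Call the function for a stage with its stage-specific control arguments
runStage <- function(stage, ...) {
  do.call(functions[[stage]], c(list(...), control[[stage]]))
}

feat <- runStage("features", length = 1000)
nm   <- runStage("readNames", n = 3)
```

Replacing a single stage then amounts to swapping one element of functions and supplying the matching element of control.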

A list. The components are typically either lists (with one component per chromosome) or file names, but note that this may depend on the return values of the functions listed in functions. The components of the returned list are:

`features`	Either a list of generated features or the name of a file containing that list;
`bindDensity`	Either a list with binding site densities or the name of a file containing that list;
`readDensity`	Either a list of read densities or the name of a file containing that list;
`readPosition`	Either a list of read start sites or the name of a file containing that list;
`readSequence`	The return value of the function listed as ‘`readSequence`’. The default for this is the name of the fastq file containing the read sequences;
`readNames`	Either a list of read names or the name of a file containing that list.

Peter Humburg

defaultFunctions, defaultControl

## Not run: 
## To run the default nucleosome positioning simulation 
## we can simply run something like the line below.
## This will result in 10 million reads sampled from the genome.
## Of course the file names have to be changed as appropriate. 
simChIP(1e7, genome = "reference.fasta", file = "output/sim_10M")

## End(Not run)