simChIP: Simulate ChIP-seq experiments


View source: R/nucSim.R

Description

This function acts as the driver for the simulation. It takes all required arguments and passes them on to the functions that implement the individual stages of the simulation. The current defaults simulate a nucleosome positioning experiment.

Usage

simChIP(nreads, genome, file, functions = defaultFunctions(),
        control = defaultControl(), verbose = TRUE, load = FALSE)

Arguments

nreads

Number of reads to generate.

genome

An object of class 'DNAStringSet' or the name of a fasta file containing the genome sequence.

file

Base of output file names (see Details).

functions

Named list of functions to use for the various stages of the simulation. Expected names are: ‘features’, ‘bindDens’, ‘readDens’, ‘sampleReads’, ‘readNames’, ‘readSequence’.

control

Named list of arguments to be passed to simulation functions (one list per function).

verbose

Logical indicating whether progress messages should be printed.

load

Logical indicating whether an attempt should be made to load intermediate results from a previous run.

Details

The simulation consists of six stages:

  1. generate feature sequence (for each chromosome): chromosome length -> feature sequence (list)

  2. compute binding site density: feature sequence -> binding site density (vector)

  3. compute read density: binding site density -> read density (two column matrix, one column for each strand)

  4. sample read start sites: read density -> read positions (list)

  5. create read names: number of reads -> unique names

  6. obtain read sequence and quality: read positions, genome sequence, [qualities] -> output file
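The flow of data through the stages listed above can be sketched in base R. This is only a toy illustration under simplifying assumptions: every object and computation here (`features`, `bindDens`, `readDens`, `readPos`, the fixed 147 bp feature length, the uniform density) is a hypothetical stand-in and not what the ChIPsim stage functions actually compute. Stage 6 is left out because it needs a genome sequence and writes a file.

```r
## Toy stand-ins for stages 1 to 5; none of these are ChIPsim functions.
set.seed(1)
chrLen <- 1000L
nreads <- 20L

## 1. chromosome length -> feature sequence (list)
##    Assumption: two fixed features of 147 bp each.
features <- list(list(start = 101L, length = 147L),
                 list(start = 501L, length = 147L))

## 2. feature sequence -> binding site density (vector)
bindDens <- numeric(chrLen)
for (f in features) {
  idx <- seq(f$start, length.out = f$length)
  bindDens[idx] <- bindDens[idx] + 1
}

## 3. binding site density -> read density
##    (two-column matrix, one column per strand; here simply copied)
readDens <- cbind(forward = bindDens, reverse = bindDens)

## 4. read density -> read positions (list, one element per strand)
readPos <- lapply(c(forward = "forward", reverse = "reverse"), function(s) {
  sort(sample.int(chrLen, nreads / 2, replace = TRUE, prob = readDens[, s]))
})

## 5. number of reads -> unique names
readNames <- sprintf("read_%03d", seq_len(nreads))

str(readPos)
```

Each object is the input to the next step, which is why the first three intermediate results are natural checkpoints to save and reuse.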

After each of the first three stages the results of the stage are written to a file and can be reused later. File names are created by appending ‘_features.rdata’, ‘_bindDensity.rdata’ and ‘_readDensity.rdata’ to file respectively. Previous results will be loaded for reuse if load is TRUE and files with matching names are found. This is useful to sample repeatedly from the same read density or to recover partial results from an interrupted run.
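As a small sketch of the naming scheme described above, the intermediate file names for a given `file` base can be reconstructed in base R to check which results would be available for reuse. The base name `output/sim_10M` is only an example value, and `stageFiles` and `reusable` are illustrative names, not part of the package.

```r
## Base of the output file names, as passed to the 'file' argument.
file <- "output/sim_10M"

## Suffixes appended after each of the first three stages.
suffixes <- c(features    = "_features.rdata",
              bindDensity = "_bindDensity.rdata",
              readDensity = "_readDensity.rdata")

## Expected intermediate file names.
stageFiles <- setNames(paste0(file, suffixes), names(suffixes))

## Which of them already exist and could be picked up with load = TRUE?
reusable <- file.exists(stageFiles)
print(data.frame(file = stageFiles, exists = reusable))
```

If all three files exist, a later call with the same `file` and `load = TRUE` can resume from the stored read density instead of recomputing it.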

The creation of files can be prevented by setting file = "". In this case all results will be returned in a list at the end. Note that this will require more memory since all intermediate results have to be held until the end.

The behaviour of the simulation is mainly controlled through the functions and control arguments. They are expected to be lists of the same length with matching names. The names indicate the stage of the simulation for which the function should be used; elements of control will be used as arguments for the corresponding functions.
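The pairing of the two lists by name can be illustrated with a minimal base R sketch. The stage functions, their arguments (`spacing`, `prefix`) and the helper `runStage` below are hypothetical and exist only for this illustration; they are not the functions returned by `defaultFunctions()`, and the dispatch shown is an assumption about the general pattern rather than the package's actual implementation.

```r
## Hypothetical stage functions, keyed by stage name.
toyFunctions <- list(
  features  = function(chrLen, spacing) seq(1, chrLen, by = spacing),
  readNames = function(n, prefix) paste0(prefix, seq_len(n))
)

## One list of extra arguments per function, with matching names.
toyControl <- list(
  features  = list(spacing = 200),
  readNames = list(prefix = "read_")
)

## Both lists must describe the same set of stages.
stopifnot(setequal(names(toyFunctions), names(toyControl)))

## Call the function for a stage, adding its control entries as arguments.
runStage <- function(stage, ...) {
  do.call(toyFunctions[[stage]], c(list(...), toyControl[[stage]]))
}

feat <- runStage("features", chrLen = 1000)
ids  <- runStage("readNames", n = 3)
```

Replacing one stage then amounts to swapping the list entry for that name and supplying a control entry whose elements match the new function's arguments.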

Value

A list. The components are typically either lists (with one component per chromosome) or file names but note that this may depend on the return value of functions listed in functions. The components of the returned list are:

features

Either a list of generated features or the name of a file containing that list;

bindDensity

Either a list with binding site densities or the name of a file containing that list;

readDensity

Either a list of read densities or the name of a file containing that list;

readPosition

Either a list of read start sites or the name of a file containing that list;

readSequence

The return value of the function listed as ‘readSequence’. By default this is the name of the fastq file containing the read sequences;

readNames

Either a list of read names or the name of a file containing that list.
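Because a component may be either an in-memory object or a file name, downstream code may want to treat both cases uniformly. The helper below is a hypothetical sketch, not part of ChIPsim. It assumes that files ending in ‘.rdata’ were written with `save()` and, since the names of the stored objects are not documented here, it returns everything found in the file as a named list. Other file names, such as the fastq file returned for ‘readSequence’, are passed through unchanged.

```r
## Hypothetical helper: return the objects stored in an '.rdata' file,
## or the input itself if it is not the name of such a file.
resolveComponent <- function(x) {
  isRdata <- is.character(x) && length(x) == 1 &&
    grepl("\\.rdata$", x, ignore.case = TRUE) && file.exists(x)
  if (!isRdata) return(x)
  env <- new.env()
  load(x, envir = env)
  mget(ls(env), envir = env)
}

## Demonstration with a temporary file standing in for a saved stage result.
tmp <- tempfile(fileext = ".rdata")
features <- list(chr1 = c(1, 201, 401))
save(features, file = tmp)

fromFile <- resolveComponent(tmp)       # list containing the stored object
inMemory <- resolveComponent(features)  # returned unchanged
unlink(tmp)
```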

Author(s)

Peter Humburg

See Also

defaultFunctions, defaultControl

Examples

## Not run: 
## To run the default nucleosome positioning simulation 
## we can simply run something like the line below.
## This will result in 10 million reads sampled from the genome.
## Of course the file names have to be changed as appropriate. 
simChIP(1e7, genome = "reference.fasta", file = "output/sim_10M")

## End(Not run)

ChIPsim documentation built on Nov. 8, 2020, 8:09 p.m.