syntheticNucReadsFromDist: Generate a synthetic nucleosome map containing forward and...

View source: R/nucleoSimFunctions.R

syntheticNucReadsFromDist    R Documentation

Generate a synthetic nucleosome map containing forward and reverse reads (paired-end reads)

Description

Generate a synthetic nucleosome map, that is, a map with forward and reverse reads (paired-end reads) covering the nucleosome regions, using the distribution selected by the user. The distribution is used to assign the start positions of the forward reads associated with the nucleosomes. The user can choose among three distributions: Normal, Student and Uniform. The final map is composed of paired-end reads.

The synthetic nucleosome map creation is separated into three steps:

1. Adding well-positioned nucleosomes following specified parameters. The nucleosomes are all equally spaced. The starting positions of the forward reads are assigned using the specified distribution and parameters. The distance between the starting positions of paired-end reads is assigned using a normal distribution and the specified variance.

2. Deleting some well-positioned nucleosomes following specified parameters. Each nucleosome has an equal probability of being selected.

3. Adding fuzzy nucleosomes following a uniform distribution and specified parameters. The starting positions of the forward reads are assigned using the specified distribution and parameters. The distance between the starting positions of paired-end reads is assigned using a normal distribution and the specified variance.
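The three steps above can be illustrated with a minimal base R sketch. It is not the nucleoSim implementation: the helper name sketchNucMap, the reads.per.nuc parameter and the spacing rule (nuc.len + lin.len between consecutive nucleosomes) are assumptions made for illustration. It always uses a Normal distribution, draws reads only for the nucleosomes that survive deletion, and does not model the paired-end distance or the reverse reads.

```r
# Hypothetical helper, NOT part of nucleoSim: simplified sketch of the
# three map-building steps, limited to forward-read start positions.
sketchNucMap <- function(wp.num, wp.del, wp.var, fuz.num, fuz.var,
                         nuc.len = 147, lin.len = 20, reads.per.nuc = 10,
                         rnd.seed = NULL) {
    if (!is.null(rnd.seed)) set.seed(rnd.seed)

    # Step 1: equally spaced centers of well-positioned nucleosomes
    # (spacing rule is an assumption: nucleosome length + linker length)
    wp.centers <- (seq_len(wp.num) - 1) * (nuc.len + lin.len) +
        round(nuc.len / 2)

    # Step 2: delete wp.del nucleosomes, each equally likely to be chosen
    # (assumes wp.del <= wp.num)
    if (wp.del > 0) {
        wp.centers <- wp.centers[-sample.int(wp.num, wp.del)]
    }

    # Step 3: fuzzy nucleosome centers drawn uniformly over the region
    region.end <- wp.num * (nuc.len + lin.len)
    fuz.centers <- round(runif(fuz.num, min = 0, max = region.end))

    # Forward-read starts scattered around the nucleosome left edge,
    # here with a Normal distribution of the requested variance
    drawStarts <- function(centers, variance) {
        as.numeric(unlist(lapply(centers, function(ctr) {
            round(rnorm(reads.per.nuc, mean = ctr - nuc.len / 2,
                        sd = sqrt(variance)))
        })))
    }

    list(wp = wp.centers, fuz = fuz.centers,
         wp.starts = drawStarts(wp.centers, wp.var),
         fuz.starts = drawStarts(fuz.centers, fuz.var))
}

map <- sketchNucMap(wp.num = 20, wp.del = 5, wp.var = 30,
                    fuz.num = 10, fuz.var = 40, rnd.seed = 15)
length(map$wp)   # 15 well-positioned nucleosomes remain after deletion
```

Keeping the well-positioned and fuzzy nucleosomes in separate list elements mirrors the wp and fuz elements of the value returned by the real function, although the real elements are data.frames with more information.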

This function has been largely inspired by the "Generating synthetic maps" section of the nucleR package (Flores and Orozco, 2011).

Usage

syntheticNucReadsFromDist(
  wp.num,
  wp.del,
  wp.var,
  fuz.num,
  fuz.var,
  max.cover = 100,
  nuc.len = 147,
  len.var = 10,
  lin.len = 20,
  read.len = 40,
  rnd.seed = NULL,
  distr = c("Uniform", "Normal", "Student"),
  offset
)

Arguments

wp.num

a non-negative integer, the number of well-positioned (non-overlapping) nucleosomes.

wp.del

a non-negative integer, the number of well-positioned nucleosomes to remove to create uncovered regions.

wp.var

a non-negative integer, the variance associated with the distribution used to assign the start position to the forward reads of the well-positioned nucleosomes. This parameter introduces some variation in the starting positions.

fuz.num

a non-negative numeric, the number of fuzzy nucleosomes. Those nucleosomes are distributed according to a uniform distribution over the whole region. Those nucleosomes can overlap other well-positioned or fuzzy nucleosomes.

fuz.var

a non-negative numeric, the maximum variance of the fuzzy nucleosomes. This variance can be different from the one used for the well-positioned nucleosome reads.

max.cover

a positive numeric, the maximum coverage for one nucleosome. The final coverage can be higher than max.cover since reads from different nucleosomes can overlap. Default = 100.

nuc.len

a positive integer, the nucleosome length. Default = 147.

len.var

a positive numeric, the variance of the distance between a forward read and its paired reverse read. Default = 10.

lin.len

a non-negative integer, the length of the DNA linker. Default = 20.

read.len

a positive integer, the length of each of the paired-end reads. Default = 40.

rnd.seed

a single value, interpreted as an integer, or NULL. If an integer is given, the value is used to set the seed of the random number generator. By fixing the seed, the generated results can be reproduced. Default = NULL.

distr

the name of the distribution used to generate the nucleosome map. The choices are: "Uniform", "Normal" and "Student". Default = "Uniform".

offset

a non-negative integer, the number of bases used to offset all nucleosomes and reads. This is done to ensure that all nucleosome positions and read alignments have positive values.
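As an illustration of how the distr and rnd.seed arguments behave, here is a minimal base R sketch. It assumes the distribution name is resolved with match.arg(), which would explain why the first choice, "Uniform", is the default. The function drawOffsets, the way the variance is mapped onto each distribution and the choice of 4 degrees of freedom for the Student distribution are all hypothetical and are not taken from the nucleoSim source.

```r
# Hypothetical function, NOT part of nucleoSim: draws n position offsets
# with a given variance from the distribution named by distr.
drawOffsets <- function(n, variance, rnd.seed = NULL,
                        distr = c("Uniform", "Normal", "Student")) {
    # With no value supplied, match.arg() returns the first choice,
    # "Uniform"; an unknown name raises an error.
    distr <- match.arg(distr)

    # Fixing the seed makes the random draws reproducible
    if (!is.null(rnd.seed)) set.seed(rnd.seed)

    sdv <- sqrt(variance)
    switch(distr,
        # Uniform(-a, a) has variance a^2 / 3, hence a = sqrt(3) * sd
        Uniform = runif(n, min = -sqrt(3) * sdv, max = sqrt(3) * sdv),
        Normal  = rnorm(n, mean = 0, sd = sdv),
        # t with 4 df has variance 2, so rescale to the target variance
        # (df = 4 is an arbitrary choice for this sketch)
        Student = rt(n, df = 4) * sdv / sqrt(2))
}

a <- drawOffsets(5, variance = 30, rnd.seed = 15)
b <- drawOffsets(5, variance = 30, rnd.seed = 15)
identical(a, b)   # TRUE: same seed, same draws
```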

Value

a list of class "syntheticNucReads" containing the following elements:

  • call the matched call.

  • dataIP a data.frame with the chromosome name, the starting and ending positions and the direction of all forward and reverse reads for all well-positioned and fuzzy nucleosomes. Paired-end reads are identified with a unique id.

  • wp a data.frame with the positions of all the well-positioned nucleosomes, as well as the number of paired reads associated with each one.

  • fuz a data.frame with the positions of all the fuzzy nucleosomes, as well as the number of paired reads associated with each one.

  • paired a data.frame with the starting and ending positions of the reads used to generate the paired-end reads. Paired-end reads are identified with a unique id.

Author(s)

Pascal Belleau, Rawane Samb, Astrid Deschenes

Examples


library(nucleoSim)

## Generate a synthetic map with 20 well-positioned + 10 fuzzy nucleosomes
## using a Normal distribution with a variance of 30 for the well-positioned
## nucleosomes, a variance of 40 for the fuzzy nucleosomes and a seed of 15.
## Because of the fixed seed, the results are the same each time the
## example is run.
res <- syntheticNucReadsFromDist(wp.num = 20, wp.del = 0, wp.var = 30,
    fuz.num = 10, fuz.var = 40, rnd.seed = 15, distr = "Normal",
    offset = 1000)


ArnaudDroitLab/nucleoSim documentation built on March 17, 2022, 11 p.m.