add_platform_error: Simulate sequencing error using an empirical error model

Source: R/add_platform_error.R

Description

Given a set of sequencing reads and a sequencing platform, add sequencing errors to the reads according to the known error profile of that platform.

Usage

add_platform_error(tFrags, platform, paired, path = NULL)

Arguments

tFrags

DNAStringSet containing error-free sequencing reads. If simulating a paired-end experiment, mate pairs should appear next to each other in tFrags.

platform

Sequencing platform whose empirical error model should be used. Currently supports 'illumina4', 'illumina5', 'roche454', and 'custom'.

paired

Logical. TRUE if tFrags contains paired-end reads with mate pairs next to each other; FALSE if it contains single-end reads.

path

If platform is 'custom', the path to the error model. After processing the error model with build_error_models.py, you will have either two files (ending in _mate1 and _mate2, if your model was for paired-end reads) or one file (ending in _single, if your model was for single-end reads). Set path to the common prefix of these files, up to but not including _mate1/_mate2/_single.
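The paired and path arguments both rely on ordering and naming conventions. The base-R sketch below is a minimal illustration, not Polyester's code: split_mates and custom_error_model_files are hypothetical helpers, and the sketch assumes that consecutive reads (1 and 2, 3 and 4, and so on) are mates and that the model files carry no extension beyond the _mate1/_mate2/_single suffix.

```r
# Hypothetical helpers for illustration only; neither is part of polyester.

# Split an interleaved vector of reads into mate 1 and mate 2,
# ASSUMING read 1 pairs with read 2, read 3 with read 4, and so on.
split_mates <- function(reads) {
  stopifnot(length(reads) %% 2 == 0)  # paired data must have an even count
  list(mate1 = reads[seq(1, length(reads), by = 2)],
       mate2 = reads[seq(2, length(reads), by = 2)])
}

# Expand a custom-model prefix into the file names described for the
# build_error_models.py output, ASSUMING no extra file extension.
custom_error_model_files <- function(path, paired) {
  suffixes <- if (paired) c("_mate1", "_mate2") else "_single"
  paste0(path, suffixes)
}

# Placeholder labels stand in for read sequences.
reads <- c("pair1_mate1", "pair1_mate2", "pair2_mate1", "pair2_mate2")
mates <- split_mates(reads)
mates$mate1  # "pair1_mate1" "pair2_mate1"

prefix <- file.path(tempdir(), "my_model")  # hypothetical prefix
model_files <- custom_error_model_files(prefix, paired = TRUE)
all(file.exists(model_files))  # FALSE unless the model files exist
```

With such a prefix, the documented call would be add_platform_error(tFrags, 'custom', paired = TRUE, path = prefix).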

Details

This function adds sequencing error to a set of reads based on the position in the read and the true nucleotide at that position. Position-specific probabilities of each possible sequencing error (for example, reading a T when the true base is A, or a G when the true base is T) were calculated for each of the three supported platforms using the empirical error models distributed with the GemSIM software (see References). Users can also estimate an error model from their own data using GemSIM and use that model with Polyester, as described in the vignette. (To process the error model, you will need to run the Python script build_error_models.py, available in the Polyester GitHub repository.)
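To make position- and nucleotide-specific substitution probabilities concrete, here is a minimal base-R sketch. It is not Polyester's implementation or its internal model format: the 4 x 5 matrices (true base by observed base, including N), the toy error rates, and the names make_position_matrix and add_position_errors are all hypothetical, and only substitutions are modeled.

```r
# Toy position-specific substitution model (hypothetical numbers, not from GemSIM).
bases <- c("A", "C", "G", "T")
observed <- c(bases, "N")

# 4 x 5 matrix for one read position: rows = true base, columns = observed base.
# err_rate is the total miscall probability, split evenly over the three
# wrong bases and N, so every row sums to 1.
make_position_matrix <- function(err_rate) {
  m <- matrix(err_rate / 4, nrow = 4, ncol = 5,
              dimnames = list(bases, observed))
  for (b in bases) m[b, b] <- 1 - err_rate
  m
}

# One matrix per position; the toy error rate rises along a 35 bp read.
read_len <- 35
error_model <- lapply(seq(0.001, 0.02, length.out = read_len),
                      make_position_matrix)

# Replace each base by a draw from the row for (position, true base).
# The read must contain only A, C, G, T and be no longer than the model.
add_position_errors <- function(read, model) {
  chars <- strsplit(read, "")[[1]]
  stopifnot(length(chars) <= length(model))
  out <- vapply(seq_along(chars), function(i) {
    probs <- model[[i]][chars[i], ]
    sample(observed, 1, prob = probs)
  }, character(1))
  paste(out, collapse = "")
}

set.seed(1)
read <- paste(sample(bases, read_len, replace = TRUE), collapse = "")
noisy <- add_position_errors(read, error_model)
```

A position-independent model would use the same matrix at every position; the empirical models differ in letting the probabilities change along the read and with the true base.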

Value

DNAStringSet object identical to tFrags except that sequencing error has been added.

References

McElroy KE, Luciani F and Thomas T (2012): GemSIM: general, error-model based simulator of next-generation sequencing data. BMC Genomics 13(1), 74.

See Also

add_error, which adds uniform (position-independent) sequencing error

Examples

  library(Biostrings)
  library(polyester)
  # pretend the srPhiX174 DNAStringSet represents 35-bp single-end
  # sequencing reads:
  data(srPhiX174)
  set.seed(718)
  data_with_errors <- add_platform_error(srPhiX174, 'illumina4', paired = FALSE)

  # the 17th read in this set has an error at position 20:
  data_with_errors[17][[1]][20] # N
  srPhiX174[17][[1]][20] # T

  # 101 reads total have at least one sequencing error:
  sum(data_with_errors != srPhiX174)

alyssafrazee/polyester documentation built on Sept. 17, 2021, 8:54 a.m.