The function randomly generates a DNA sequence of a given length and with a fixed GC content, to simulate a DNA sequence representing (a proportion of) the genome of a non-model species for which no reference genome sequence is available.
sim.DNAseq(size = 10000, GCfreq = 0.46)
size
DNA sequence length to generate, in bp. This can be the known or estimated genome size of the species or, more conveniently, a fraction of it large enough to be representative of the full genome, which limits memory usage and speeds up computation.
GCfreq
GC content, expressed as the proportion of G and C bases in the genome.
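When size is set to a fraction of the genome, results obtained on the simulated sequence have to be related back to the full genome. The following is a minimal sketch of that arithmetic, assuming counts scale roughly linearly with sequence length; every number and variable name is hypothetical and chosen only for illustration.

```r
# All values below are hypothetical, for illustration only.
genome_size <- 1.2e9   # estimated full genome size, in bp
sim_size    <- 1e7     # value that would be passed as 'size' to sim.DNAseq
loci_in_sim <- 850     # number of loci counted in the simulated sequence

# Assumption: the number of loci grows linearly with sequence length.
scale_factor <- genome_size / sim_size
loci_genome  <- loci_in_sim * scale_factor

scale_factor  # 120
loci_genome   # 102000
```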
If neither a reference genome sequence nor reference contigs are available for a species, knowledge of its GC content and genome size is needed to generate a random DNA sequence representative of that species. If a reference genome sequence is available (even a draft or unassembled contigs), a random sequence can be simulated following the GC content and length of an imported proportion of the reference sequence for comparison purposes (see example below).
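To illustrate the idea, here is a minimal base-R sketch of one way a random sequence with a target GC content could be drawn. It is not taken from the SimRAD source: sim_dna_sketch and gc_content are hypothetical helpers, and the sketch assumes each base is sampled independently, with G and C sharing the GC proportion equally and A and T sharing the remainder. Under that assumption the realised GC content only approximates the target.

```r
# Hypothetical helper, not part of SimRAD: draws each base independently.
sim_dna_sketch <- function(size, gc_freq) {
  stopifnot(size >= 1, gc_freq >= 0, gc_freq <= 1)
  # G and C share the GC proportion equally; A and T share the remainder.
  probs <- c(A = (1 - gc_freq) / 2, C = gc_freq / 2,
             G = gc_freq / 2, T = (1 - gc_freq) / 2)
  bases <- sample(names(probs), size = size, replace = TRUE, prob = probs)
  # Collapse the sampled bases into a single character string.
  paste(bases, collapse = "")
}

# Hypothetical helper: proportion of G and C bases in a sequence string.
gc_content <- function(dna) {
  bases <- strsplit(dna, "", fixed = TRUE)[[1]]
  mean(bases %in% c("G", "C"))
}

set.seed(1)
sq <- sim_dna_sketch(size = 100000, gc_freq = 0.444)
nchar(sq)        # 100000
gc_content(sq)   # close to 0.444, but not exactly equal
```

The longer the simulated sequence, the closer the realised GC content tends to be to the requested value.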
A single continuous DNA sequence (character string).
Olivier Lepais
Lepais O & Weir JT. 2014. SimRAD: an R package for simulation-based prediction of the number of loci expected in RADseq and similar genotyping by sequencing approaches. Molecular Ecology Resources, 14, 1314-1321. DOI: 10.1111/1755-0998.12273.
## example 1: simulation of random sequence for non-model species without reference genome:
# generating 1Mb of DNA sequence with 44.4% GC content:
smsq <- sim.DNAseq(size=1000000, GCfreq=0.444)
# length:
width(smsq)
# GC content:
require(seqinr)
GC(s2c(smsq))
## example 2: simulating a random sequence with parameters taken from a known reference genome sequence:
# generating a Fasta file for the example:
sq<-c()
for(i in 1:10){
sq <- c(sq, sim.DNAseq(size=rpois(1, 1000), GCfreq=0.444))
}
sq <- DNAStringSet(sq)
writeFasta(sq, file="SimRAD-exampleRefSeq-Fasta.fa", mode="w")
# importing the Fasta file and sub-selecting 25% of the contigs
rfsq <- ref.DNAseq("SimRAD-exampleRefSeq-Fasta.fa", subselect.contigs = TRUE, prop.contigs = 0.25)
# length of the reference sequence:
width(rfsq)
# computing GC content:
require(seqinr)
GC(s2c(rfsq))
# simulating a randomly generated DNA sequence with the same length and GC content as
# the sub-selected reference genome, for comparison purposes:
smsq <- sim.DNAseq(size=width(rfsq), GCfreq=GC(s2c(rfsq)))