exclude.seqsite: Function to exclude fragments containing a specified...



Description

Given a vector of sequences representing DNA fragments digested by a restriction enzyme, the function returns the DNA fragments that do not contain a specified restriction site. This is typically used to reduce the number of loci in the RESTseq method. The function can be used repeatedly to exclude fragments containing several restriction sites.

Usage

exclude.seqsite(sequences, site, verbose=TRUE)

Arguments

sequences

a vector of DNA sequences representing DNA fragments after digestion by restriction enzyme(s), typically the output of another function such as adapt.select, insilico.digest or exclude.seqsite.

site

restriction site used to identify the DNA fragments to exclude. This typically corresponds to the recognition site of a frequent-cutter restriction enzyme.

verbose

If TRUE (the default), reports the number of fragments excluded and kept. FALSE makes the function silent, which is useful when it is called in a loop.
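When the function is run silently, the same counts can be recovered from the lengths of the input and output vectors. The following is a minimal base R sketch of that idea. The helper `exclude_site_quiet` is a hypothetical stand-in written for this illustration, not the package's implementation, and it assumes fragments are plain character strings and the site is a literal sequence.

```r
# Hypothetical stand-in for exclude.seqsite(..., verbose = FALSE):
# silently returns the fragments that do not contain `site`.
exclude_site_quiet <- function(sequences, site) {
  sequences[!grepl(site, sequences, fixed = TRUE)]
}

# Toy fragments (made up for illustration).
frags <- c("ACGTTAACGT", "ACGTACGT", "TTAAGGCC", "GGGCCCAT")

kept <- exclude_site_quiet(frags, "TTAA")

# Counts that a verbose run would report, recovered from vector lengths.
n_kept <- length(kept)
n_excluded <- length(frags) - n_kept
cat(n_excluded, "fragments excluded,", n_kept, "kept\n")
```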

Details

Frequent-cutter restriction enzymes can easily be used to further reduce the number of fragments, as demonstrated by the RESTseq method. This approach is of interest in species with complex genomes, as it removes parts of the genome containing highly repetitive GC- and/or AT-rich sequences.

This function can be used directly after a single-enzyme digestion with insilico.digest to remove fragments containing the restriction site of a second enzyme. An equivalent alternative is to simulate a double digestion using insilico.digest followed by adapt.select with type = "AA", which removes fragments containing the restriction site of enzyme 2 (see example below).

Any number of exclusion steps using different restriction enzymes can be simulated by running the function on the output of a previous call to the function (see example below).
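The chaining described above can be sketched in base R without SimRAD installed. The helper `exclude_site_sketch` is a hypothetical stand-in written for this illustration (it is not the package's implementation), and it assumes fragments are plain character strings and each site is a literal, unambiguous sequence.

```r
# Hypothetical stand-in for exclude.seqsite: keep fragments lacking `site`.
# Assumes `sequences` is a character vector and `site` a literal string.
exclude_site_sketch <- function(sequences, site) {
  sequences[!grepl(site, sequences, fixed = TRUE)]
}

# Toy fragments (made up for illustration).
frags <- c("CTGCAGGATCCA", "ACGTTAACGT", "GGAATTCC", "ACGGCCGT", "ACGTACGT")

# Sites to exclude, applied one after another: each step takes the
# output of the previous step as its input.
sites <- c("TTAA", "AATT", "GGCC")
kept <- Reduce(exclude_site_sketch, sites, frags)
print(kept)
```

Reduce passes the running result as the first argument and the next site as the second, so the order of the sites does not change the final set of fragments, only the intermediate counts.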

Value

A vector of DNA fragment sequences.

Author(s)

Olivier Lepais

References

Lepais O & Weir JT. 2014. SimRAD: an R package for simulation-based prediction of the number of loci expected in RADseq and similar genotyping by sequencing approaches. Molecular Ecology Resources, 14, 1314-1321. DOI: 10.1111/1755-0998.12273.

Stolle & Moritz. 2013. RESTseq - Efficient benchtop population genomics with RESTriction fragment SEQuencing. PLoS ONE, 8, e63960. DOI: 10.1371/journal.pone.0063960.

See Also

adapt.select, size.select.

Examples

### Example 1:
# simulating some sequence:
simseq <-  sim.DNAseq(size=1000000, GCfreq=0.433)

#Restriction Enzyme 1
#PstI
cs_5p1 <- "CTGCA"
cs_3p1 <- "G"

#Restriction Enzyme 2
#MseI
cs_5p2 <- "T"
cs_3p2 <- "TAA"
# hence, recognition site: "TTAA"

# single digestion:
simseq.dig <- insilico.digest(simseq, cs_5p1, cs_3p1, cs_5p1, cs_3p1, verbose=TRUE)
# excluding fragments containing the restriction site of enzyme 2
simseq.exc <- exclude.seqsite(simseq.dig, "TTAA")

## which is equivalent to:
simseq.dig2 <- insilico.digest(simseq, cs_5p1, cs_3p1, cs_5p2, cs_3p2, verbose=TRUE)
simseq.selectAA <- adapt.select(simseq.dig2, type="AA", cs_5p1, cs_3p1, cs_5p2, cs_3p2)
length(simseq.selectAA)


### Example 2:
simseq <-  sim.DNAseq(size=1000000, GCfreq=0.51)

#Restriction Enzyme 1
#TaqI
cs_5p1 <- "T"
cs_3p1 <- "CGA"

simseq.dig <- insilico.digest(simseq, cs_5p1, cs_3p1, cs_5p1, cs_3p1, verbose=TRUE)

# removing fragments containing restriction sites of MseI ("TTAA"), MluCI ("AATT"),
#         HaeIII ("GGCC"), MspI ("CCGG") and HinP1I ("GCGC"):
excl1 <- exclude.seqsite(simseq.dig, "TTAA")
excl2 <- exclude.seqsite(excl1, "AATT")
excl3 <- exclude.seqsite(excl2, "GGCC")
excl4 <- exclude.seqsite(excl3, "CCGG")
excl5 <- exclude.seqsite(excl4, "GCGC")
# which can be followed by a size selection step.
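
What a size selection step does to the remaining fragments can be sketched in base R as follows. This is only an illustration under stated assumptions: the fragments and the size window are made up, and the filter is written by hand with nchar rather than with the package's size.select function.

```r
# Toy fragments left after the exclusion steps (made up for illustration).
kept <- c("ACGT", "ACGTACGTAC", "ACGTACGTACGTACGTACGT", "ACGTACG")

# Hypothetical size window in base pairs; real windows are usually
# hundreds of base pairs wide.
min_size <- 6
max_size <- 12

# Keep only fragments whose length falls inside the window (inclusive).
frag_len <- nchar(kept)
size_selected <- kept[frag_len >= min_size & frag_len <= max_size]
print(size_selected)
```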

SimRAD documentation built on May 1, 2019, 10:16 p.m.