DigestDNA: Simulate Restriction Digestion of DNA

Description

Restriction enzymes cut double-stranded DNA into fragments at specific cut sites. DigestDNA performs an in silico restriction digest of the input DNA sequence(s) given one or more restriction sites.

Usage

DigestDNA(sites,
          myDNAStringSet,
          type = "fragments",
          strand = "both")

Arguments

sites

A character vector of DNA recognition sequences and their enzymes' corresponding cut site(s).

myDNAStringSet

A DNAStringSet object or character vector with one or more sequences in 5' to 3' orientation.

type

Character string indicating the type of results desired. This should be (an abbreviation of) either "fragments" or "positions".

strand

Character string indicating the strand(s) to cut. This should be (an abbreviation of) one of "both", "top", or "bottom". The top strand is defined as the input DNAStringSet sequence, and the bottom strand is its reverse complement.

Details

In the context of a restriction digest experiment with a known DNA sequence, it can be useful to predict the expected DNA fragments in silico. Restriction enzymes cut double-stranded DNA at specific positions near their recognition site. The recognition site may contain ambiguity codes, as represented by the IUPAC_CODE_MAP. Cuts that occur at different positions on the top and bottom strands result in sticky ends, whereas cuts that occur at the same position on both strands result in blunt ends. Multiple restriction sites can be supplied to digest the DNA simultaneously. In this case, sites for the different restriction enzymes may overlap, which could result in multiple close-proximity cuts that would not occur experimentally. Note also that recognition sites are not matched to characters in myDNAStringSet other than the DNA_BASES (A, C, G, and T).
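To make the idea of an ambiguous recognition site concrete, the following base R sketch expands IUPAC ambiguity codes into a regular expression and reports where a site matches a sequence. It is only an illustration under stated assumptions, not DigestDNA's implementation: the helper find_sites and the local iupac vector are hypothetical names introduced here (the vector mirrors the contents of Biostrings' IUPAC_CODE_MAP), and the example site GTYRAC and input sequence are chosen purely for demonstration.

```r
# Local copy of the IUPAC nucleotide ambiguity codes (assumption: mirrors
# Biostrings' IUPAC_CODE_MAP; defined here so the sketch needs no packages).
iupac <- c(A = "A", C = "C", G = "G", T = "T",
           M = "AC", R = "AG", W = "AT", S = "CG", Y = "CT", K = "GT",
           V = "ACG", H = "ACT", D = "AGT", B = "CGT", N = "ACGT")

# Hypothetical helper: start positions of every match of an (ambiguous)
# recognition site in a sequence, including overlapping matches.
find_sites <- function(site, seq) {
  codes <- strsplit(site, "")[[1]]
  # one character class per site position, e.g. Y becomes [CT]
  regex <- paste0("[", iupac[codes], "]", collapse = "")
  # a zero-width lookahead lets overlapping occurrences be reported
  hits <- gregexpr(paste0("(?=", regex, ")"), seq, perl = TRUE)[[1]]
  as.integer(hits[hits > 0])
}

# GTYRAC matches both GTTAAC (position 3) and GTCGAC (position 11)
find_sites("GTYRAC", "AAGTTAACAAGTCGACTT")
```

Because each character class contains only A, C, G, and T, an ambiguity code such as N in the input sequence never matches in this sketch, which is in the same spirit as the note above about non-DNA_BASES.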

Value

DigestDNA can return two types of results: cut positions or the resulting DNA fragments corresponding to the top, bottom, or both strands. If type is "positions" then the output is a list with the cut location(s) in each sequence in myDNAStringSet. The cut location is defined as the position after the cut relative to the 5'-end. For example, a cut at 6 would occur between positions 5 and 6, where the respective strand's 5' nucleotide is defined as position 1.
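The position convention can be checked with a few lines of base R. This is a minimal sketch, assuming cut positions are given as 1-based integers in the sense defined above; split_at_cuts is a hypothetical helper written for illustration and is not part of DECIPHER.

```r
# Hypothetical helper: split a sequence at cut positions, where a cut at
# position p falls between nucleotides p - 1 and p (counted from the 5'-end).
split_at_cuts <- function(seq, cuts) {
  cuts <- sort(unique(cuts))
  starts <- c(1, cuts)               # each fragment starts at a cut position
  ends <- c(cuts - 1, nchar(seq))    # and ends just before the next cut
  substring(seq, starts, ends)
}

# A cut at 4 separates positions 1-3 from positions 4-10
split_at_cuts("AAGGATCCAA", 4)

# With no cuts, the original sequence is returned unchanged
split_at_cuts("GGGATCAT", numeric(0))
```

With a single cut at position 4, the first call yields the fragments AAG and GATCCAA, which agrees with the top-strand fragments shown for hyp1 in the example output.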

If type is "fragments" (the default), then the result is a DNAStringSetList. Each element of the list contains the top and/or bottom strand fragments after digestion of myDNAStringSet, or the original sequence if no cuts were made. Sequences are named by whether they originated from the top or bottom strand, and list elements are named based on the input DNA sequences. The top strand is defined by myDNAStringSet as it is input, whereas the bottom strand is its reverse complement.
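The relationship between the top and bottom strands can be sketched in base R as follows. The helper rev_comp is a hypothetical name used only for illustration (Biostrings provides reverseComplement for DNAStringSet objects), and it assumes the input contains only the unambiguous bases A, C, G, and T. The cut position of 4 on each strand is likewise an assumption made for this example.

```r
# Hypothetical helper: reverse complement of an unambiguous DNA string
rev_comp <- function(seq) {
  comp <- chartr("ACGT", "TGCA", seq)                 # complement each base
  paste(rev(strsplit(comp, "")[[1]]), collapse = "")  # then reverse the order
}

top <- "AAGGATCCAA"
bottom <- rev_comp(top)   # the bottom strand, read 5' to 3'
bottom

# Assuming a cut at position 4 on each strand, the fragments on each
# strand are read from that strand's own 5'-end
substring(top, c(1, 4), c(3, 10))
substring(bottom, c(1, 4), c(3, 10))
```

Here the bottom strand is TTGGATCCTT, and the two substring calls give AAG, GATCCAA and TTG, GATCCTT, which is consistent with the hyp1 fragments listed in the example output.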

Author(s)

Erik Wright eswright@pitt.edu

See Also

DesignSignatures, RESTRICTION_ENZYMES

Examples

# load DECIPHER, which provides DigestDNA and RESTRICTION_ENZYMES
library(DECIPHER)

# digest hypothetical DNA sequences with BamHI
data(RESTRICTION_ENZYMES)
site <- RESTRICTION_ENZYMES[c("BamHI")]
dna <- DNAStringSet(c("AAGGATCCAA", "GGGATCAT"))
dna # top strand
reverseComplement(dna) # bottom strand
names(dna) <- c("hyp1", "hyp2")
d <- DigestDNA(site, dna)
d # fragments in a DNAStringSetList
unlist(d) # all fragments as one DNAStringSet

# Restriction digest of Yeast Chr. 1 with EcoRI and EcoRV
data(yeastSEQCHR1)
sites <- RESTRICTION_ENZYMES[c("EcoRI", "EcoRV")]
seqs <- DigestDNA(sites, yeastSEQCHR1)
seqs[[1]]

pos <- DigestDNA(sites, yeastSEQCHR1, type="positions")
str(pos)

Example output

  A DNAStringSet instance of length 2
    width seq
[1]    10 AAGGATCCAA
[2]     8 GGGATCAT
  A DNAStringSet instance of length 2
    width seq
[1]    10 TTGGATCCTT
[2]     8 ATGATCCC
DNAStringSetList of length 2
[["hyp1"]] top=AAG top=GATCCAA bottom=TTG bottom=GATCCTT
[["hyp2"]] top=GGGATCAT bottom=ATGATCCC
  A DNAStringSet instance of length 6
    width seq                                               names               
[1]     3 AAG                                               hyp1.top
[2]     7 GATCCAA                                           hyp1.top
[3]     3 TTG                                               hyp1.bottom
[4]     7 GATCCTT                                           hyp1.bottom
[5]     8 GGGATCAT                                          hyp2.top
[6]     8 ATGATCCC                                          hyp2.bottom
  A DNAStringSet instance of length 314
      width seq                                             names               
  [1]   612 CCACACCACACCCACACACCCA...CCATCATTATCCACATTTTGAT top
  [2]  1431 ATCTATATCTCATTCGGCGGTC...ATTGGGCTAAGTGAGCTCTGAT top
  [3]   568 ATCAGAGACGTAGACACCCAAT...AGAAGCTTATTGTCTAAGCCTG top
  [4]    51 AATTCAGTCTGCTTTAAACGGC...GAGGAAATATTTCCATCTCTTG top
  [5]   136 AATTCGTACAACATTAAACGTG...GATGGTAATGAGACAAGTTGAT top
  ...   ... ...
[310]   132 ATCAACTTGTCTCATTACCATC...AACACACGTTTAATGTTGTACG bottom
[311]    51 AATTCAAGAGATGGAAATATTT...GGAAGCCGTTTAAAGCAGACTG bottom
[312]   572 AATTCAGGCTTAGACAATAAGC...ATTGGGTGTCTACGTCTCTGAT bottom
[313]  1431 ATCAGAGCTCACTTAGCCCAAT...GACCGCCGAATGAGATATAGAT bottom
[314]   612 ATCAAAATGTGGATAATGATGG...TGGGTGTGTGGGTGTGGTGTGG bottom
List of 1
 $ 1:List of 2
  ..$ top   : num [1:156] 613 2044 2612 2663 2799 ...
  ..$ bottom: num [1:156] 974 5491 6024 12303 15707 ...

DECIPHER documentation built on Nov. 8, 2020, 8:30 p.m.