findVDVjunctions: Find the vector-insert junctions at the beginning and end of...


View source: R/findVDVjunctions.R

Description

The function does the following:

  1. At both ends of the read, search for the restriction site at the position where the vector-insert junction is expected to be found

  2. If the restriction site is not found, then search for the vector sequence immediately adjacent to the restriction site in a slightly wider window

  3. (Optional) At these junctions, replace the observed vector sequence with the full (true) vector sequence
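
The first two steps above can be sketched in base R as follows. This is only an illustration of the "site first, adjacent sequence second" search order, not the package's implementation: the helper name `find_junction_sketch`, its arguments, and the way the window is widened (`widen`) are all assumptions made for this example, and plain character strings with exact matching stand in for the `DNAString` objects and Blast-derived positions that the real function works with.

```r
# Hypothetical helper (not part of NanoBAC): look for `site` inside
# read[from:to]; if it is absent, look for `side_seq` (the vector sequence
# adjacent to the site) in a window widened by `widen` bases on each side.
find_junction_sketch <- function(read, site, side_seq, from, to, widen = 10L) {
  n <- nchar(read)
  # Exact search restricted to a window; returns the 1-based start
  # position in the full read, or NA when there is no match.
  search_window <- function(pattern, from, to) {
    from <- max(1L, from)
    to <- min(n, to)
    hit <- regexpr(pattern, substr(read, from, to), fixed = TRUE)
    if (hit[1] == -1L) NA_integer_ else as.integer(hit[1] + from - 1L)
  }
  pos <- search_window(site, from, to)
  if (!is.na(pos)) return(list(method = "site", start = pos))
  pos <- search_window(side_seq, from - widen, to + widen)
  if (!is.na(pos)) return(list(method = "side", start = pos))
  list(method = "none", start = NA_integer_)
}

# Dummy reads taken from the Examples section (HindIII site "AAGCTT").
clean <- paste0("ACCGTAACTGCGATATTTTAGGCGTGTTACAAGCTT",
                "GCTAGATCGCGCGATATGTG",
                "AAGCTTTATTAAGACACCCGGTATGCTTCAGGATCGTTCGGACTAA")
noisy <- paste0("ACCGTAACTGCGTTTTTTTAGGCGTGTTACAAGCTT",
                "GCTAAATCGCGCGCTATGTG",
                "GGGCTTTATTAAGACACCCGGTATGCTTTCAGGATCGTTCGGACTAA")

# The windows below are chosen by hand around the expected junctions.
left  <- find_junction_sketch(clean, "AAGCTT", "TATTAAGACA", 29, 38)
right <- find_junction_sketch(noisy, "AAGCTT", "TATTAAGACA", 55, 64)
```

In this sketch the clean read yields a direct hit on the restriction site, while the second junction of the noisy read (where the site reads `GGGCTT`) is only located through the 10 bp of vector sequence next to it.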

Usage

findVDVjunctions(
  ReadName = NULL,
  ReadDNA,
  ReadVecAlign,
  RestrictionSite = "G^AATTC",
  VectorSequence = NULL,
  UnalignedVectorLength = 1000L,
  SideSeqSearch = 10L,
  replaceVectorSequence = TRUE
)

Arguments

ReadName

character string. Name of the read

ReadDNA

A DNAString or DNAStringSet with the read sequence

ReadVecAlign

Table with the Blast results from aligning the vector on the read

RestrictionSite

Character string in the form "G^AATTC", indicating the recognition sequence and the cut site (marked by "^")

VectorSequence

A DNAString or DNAStringSet of length 1 with the vector sequence starting and ending with the full sequence of the restriction site used for cloning

UnalignedVectorLength

Integer. If more than UnalignedVectorLength bp of expected vector sequence is not correctly aligned at the vector-insert junction, the function will return a message

SideSeqSearch

Integer. If the expected restriction site is not found at a vector-insert junction, the algorithm will search for the SideSeqSearch bp of vector sequence adjacent to the restriction site (defaults to 10 bp)

replaceVectorSequence

Logical. If TRUE, the function will replace the vector sequence in the read with the true vector sequence, using the junctions that have been identified. If no junction is identified, then no sequence is replaced
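
To make the RestrictionSite notation concrete, here is a minimal base R sketch that splits a string such as "G^AATTC" into its recognition sequence and cut position. The helper `parse_site_sketch` and the names of the returned elements are hypothetical and are not taken from the package, which may parse the argument differently.

```r
# Hypothetical helper (not part of NanoBAC): split "G^AATTC" into the
# recognition sequence and the number of bases located before the cut.
parse_site_sketch <- function(site) {
  stopifnot(is.character(site), length(site) == 1L)
  caret <- regexpr("^", site, fixed = TRUE)   # position of the cut marker
  if (caret[1] == -1L) stop("no '^' cut marker found in: ", site)
  list(
    sequence  = sub("^", "", site, fixed = TRUE),  # site without the marker
    cut_after = as.integer(caret[1] - 1L)          # bases before the cut
  )
}

ecoRI   <- parse_site_sketch("G^AATTC")   # the default RestrictionSite
hindIII <- parse_site_sketch("A^AGCTT")   # the site used in the Examples
```

With this convention, "G^AATTC" corresponds to the sequence `GAATTC` cut after its first base.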

Value

a list with the following elements:

Examples

library(NanoBAC)  # provides findVDVjunctions() and readBlast()
# Some dummy sequences (use paste0 for clarity / comparison of sequences):
## Vector sequence starts and ends with the HindIII site used for cloning ("A^AGCTT")
vector <- Biostrings::DNAString(paste0(
            "AAGCTTTATTAAGACACCCGGTATGCTTCAGGATCGTTCGGACTAA",
            "ACCGTAACTGCGATATTTTAGGCGTGTTACAAGCTT"))
read <- Biostrings::DNAString(paste0(
            "ACCGTAACTGCGATATTTTAGGCGTGTTACAAGCTT",
            "GCTAGATCGCGCGATATGTG",
            "AAGCTTTATTAAGACACCCGGTATGCTTCAGGATCGTTCGGACTAA"))
noisyread <- Biostrings::DNAString(paste0(
            "ACCGTAACTGCGTTTTTTTAGGCGTGTTACAAGCTT",
            "GCTAAATCGCGCGCTATGTG",
            "GGGCTTTATTAAGACACCCGGTATGCTTTCAGGATCGTTCGGACTAA"))
# Import the blastn results for these sequences (alignment of vector on the reads):
readaln <- readBlast(system.file("extdata",
                                 "juncEx_vec_read.res",
                                 package = "NanoBAC"))
noisyreadaln <- readBlast(system.file("extdata",
                                      "juncEx_vec_noisyread.res",
                                      package = "NanoBAC"))
# Get the coordinates of the insert sequence:
findVDVjunctions("read", read, readaln, "A^AGCTT", vector,
                 replaceVectorSequence = FALSE)
# With the noisy read, the restriction site is not found at
#   the end of the read but an adjacent sequence is:
findVDVjunctions("noisyread", noisyread, noisyreadaln, "A^AGCTT", vector,
                 replaceVectorSequence = FALSE)
# Get the read sequence after replacing the vector sequence
#   by the full vector sequence on both sides
findVDVjunctions("noisyread", noisyread, noisyreadaln, "A^AGCTT", vector,
                 replaceVectorSequence = TRUE)$correctedRead
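
As a rough, independent check of the dummy data (without calling the package), the position of the insert in the clean read can be located with base R string matching. The sketch below assumes that the insert is the stretch strictly between the two HindIII recognition sites; findVDVjunctions may define its coordinates differently, for instance relative to the cut position.

```r
# Clean dummy read from the examples above, as a plain character string.
read_chr <- paste0("ACCGTAACTGCGATATTTTAGGCGTGTTACAAGCTT",
                   "GCTAGATCGCGCGATATGTG",
                   "AAGCTTTATTAAGACACCCGGTATGCTTCAGGATCGTTCGGACTAA")

# Start positions of every exact occurrence of the HindIII site.
site_starts <- as.integer(gregexpr("AAGCTT", read_chr, fixed = TRUE)[[1]])

# Assumption: the insert lies strictly between the two sites.
insert_start <- site_starts[1] + nchar("AAGCTT")
insert_end   <- site_starts[2] - 1L
insert_seq   <- substr(read_chr, insert_start, insert_end)
```

Here the two sites start at positions 31 and 57, so the 20 bp between them (positions 37 to 56) are `GCTAGATCGCGCGATATGTG`, the middle fragment used to build the read.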

pgpmartin/NanoBAC documentation built on Dec. 11, 2020, 9:51 a.m.