trimLRPatterns: Trim Flanking Patterns from Sequences
In Biostrings: Efficient manipulation of biological strings

The trimLRPatterns function trims left and/or right flanking patterns from sequences.

trimLRPatterns(Lpattern = "", Rpattern = "", subject,
               max.Lmismatch = 0, max.Rmismatch = 0,
               with.Lindels = FALSE, with.Rindels = FALSE,
               Lfixed = TRUE, Rfixed = TRUE, ranges = FALSE)

`Lpattern`	The left pattern.
`Rpattern`	The right pattern.
`subject`	An XString object, XStringSet object, or character vector containing the target sequence(s).
`max.Lmismatch`	Either an integer vector of length `nLp = nchar(Lpattern)` representing an absolute number of mismatches (or edit distance if `with.Lindels` is `TRUE`) or a single numeric value in the interval `[0, 1)` representing a mismatch rate when aligning terminal substrings (suffixes) of `Lpattern` with the beginning (prefix) of `subject` following the conventions set by `neditStartingAt`, `isMatchingStartingAt`, etc. When `max.Lmismatch` is `0L` or a numeric value in the interval `[0, 1)`, it is taken as a "rate" and is converted to `as.integer(1:nLp * max.Lmismatch)`, analogous to agrep (which, however, employs `ceiling`). Otherwise, `max.Lmismatch` is treated as an integer vector where negative numbers are used to prevent trimming at the `i`-th location. When an input integer vector is shorter than `nLp`, it is augmented with enough `-1`s at the beginning to bring its length up to `nLp`. Elements of `max.Lmismatch` beyond the first `nLp` are ignored. Once the integer vector is constructed using the rules given above, when `with.Lindels` is `FALSE`, `max.Lmismatch[i]` is the number of acceptable mismatches (errors) between the suffix `substring(Lpattern, nLp - i + 1, nLp)` of `Lpattern` and the first `i` letters of `subject`. When `with.Lindels` is `TRUE`, `max.Lmismatch[i]` represents the allowed "edit distance" between that suffix of `Lpattern` and `subject`, starting at position `1` of `subject` (as in `matchPattern` and `isMatchingStartingAt`). For a given element `s` of the `subject`, the initial segment (prefix) `substring(s, 1, j)` of `s` is trimmed if `j` is the largest `i` for which there is an acceptable match, if any.
`max.Rmismatch`	Same as `max.Lmismatch` but with `Rpattern`, along with `with.Rindels` (below), and its initial segments (prefixes) `substring(Rpattern, 1, i)`. For a given element `s` of the subject, with `nS = nchar(s)`, the terminal segment (suffix) `substring(s, nS - j + 1, nS)` of `s` is trimmed if `j` is the largest `i` for which there is an acceptable match, if any.
`with.Lindels`	If `TRUE`, indels are allowed in the alignments of the suffixes of `Lpattern` with the subject, at its beginning. See the `with.indels` arguments of the `matchPattern` and `neditStartingAt` functions for detailed information.
`with.Rindels`	Same as `with.Lindels` but for alignments of the prefixes of `Rpattern` with the subject, at its end. See the `with.indels` arguments of the `matchPattern` and `neditEndingAt` functions for detailed information.
`Lfixed, Rfixed`	Whether IUPAC extended letters in the left or right pattern should be matched literally (`TRUE`, the default) or interpreted as ambiguities (`FALSE`). See ?`lowlevel-matching` for the details.
`ranges`	If `TRUE`, then return the ranges to use to trim `subject`. If `FALSE`, then return the trimmed `subject`.
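
The rules for `max.Lmismatch` above can be sketched in base R. This is only an illustration of the documented conversion, not the code Biostrings runs internally; the helper name `normalizeMaxMismatch` and its argument `nP` are hypothetical.

```r
# Hypothetical helper (not part of Biostrings): expand a max.Lmismatch value
# into one allowed-mismatch count per overlap length 1..nP, following the
# rules described for the max.Lmismatch argument.
normalizeMaxMismatch <- function(max.mismatch, nP) {
  if (length(max.mismatch) == 1L && max.mismatch >= 0 && max.mismatch < 1) {
    # A single value in [0, 1) is a rate: an overlap of length i may contain
    # as.integer(i * rate) mismatches (truncation, not ceiling).
    return(as.integer(seq_len(nP) * max.mismatch))
  }
  v <- as.integer(max.mismatch)
  # Elements beyond the first nP are ignored.
  v <- v[seq_len(min(length(v), nP))]
  # A shorter vector is padded with -1 at the beginning; a negative entry
  # prevents trimming at that overlap length.
  c(rep(-1L, nP - length(v)), v)
}

nLp <- nchar("TTCTGCTTG")
normalizeMaxMismatch(0.2, nLp)            # 0 0 0 0 1 1 1 1 1
normalizeMaxMismatch(c(0L, 1L, 2L), nLp)  # -1 -1 -1 -1 -1 -1 0 1 2
```

With a rate of `0.2` and a 9-letter pattern, overlaps of fewer than 5 letters must match exactly, and longer overlaps tolerate one mismatch.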

A new XString object, XStringSet object, or character vector with the "longest" flanking matches removed, as described above. If `ranges = TRUE`, the ranges to use to trim `subject` are returned instead.
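
To make the "longest flanking match" rule concrete, here is a minimal base R sketch of left trimming on a single character string. It is a simplification under stated assumptions: no indels, literal letter comparison, an already expanded integer vector of allowed mismatches, and overlaps longer than the subject are simply skipped. The function `naiveTrimLeft` is hypothetical and is not how Biostrings implements trimming.

```r
# Hypothetical illustration (not part of Biostrings): trim the longest
# acceptable match between a suffix of Lpattern and a prefix of s.
naiveTrimLeft <- function(Lpattern, s, max.Lmismatch) {
  nLp <- nchar(Lpattern)
  stopifnot(length(max.Lmismatch) == nLp)
  best <- 0L
  for (i in seq_len(min(nLp, nchar(s)))) {
    if (max.Lmismatch[i] < 0) next  # negative entry: never trim i letters
    # Suffix of Lpattern of length i versus the first i letters of s.
    suffix <- strsplit(substring(Lpattern, nLp - i + 1L, nLp), "")[[1]]
    prefix <- strsplit(substring(s, 1L, i), "")[[1]]
    # Keep the largest i whose mismatch count is within the allowance.
    if (sum(suffix != prefix) <= max.Lmismatch[i]) best <- i
  }
  substring(s, best + 1L)
}

naiveTrimLeft("TTCTGCTTG", "TGCTTGACGGCAGATCGG", rep(0L, 9))
# "ACGGCAGATCGG"
```

Both the 2-letter suffix `TG` and the 6-letter suffix `TGCTTG` match the start of the subject exactly; the longer one wins, so six letters are removed.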

P. Aboyoun and H. Jaffee

matchPattern, matchLRPatterns, lowlevel-matching, XString-class, XStringSet-class

  Lpattern <- "TTCTGCTTG"
  Rpattern <- "GATCGGAAG"
  subject <- DNAString("TTCTGCTTGACGTGATCGGA")
  subjectSet <- DNAStringSet(c("TGCTTGACGGCAGATCGG", "TTCTGCTTGGATCGGAAG"))

  ## Only allow for perfect matches on the flanks
  trimLRPatterns(Lpattern = Lpattern, subject = subject)
  trimLRPatterns(Rpattern = Rpattern, subject = subject)
  trimLRPatterns(Lpattern = Lpattern, Rpattern = Rpattern, subject = subjectSet)

  ## Allow for perfect matches on the flanking overlaps
  trimLRPatterns(Lpattern = Lpattern, Rpattern = Rpattern, subject = subjectSet,
                 max.Lmismatch = 0, max.Rmismatch = 0)

  ## Allow for mismatches on the flanks
  trimLRPatterns(Lpattern = Lpattern, Rpattern = Rpattern, subject = subject,
                 max.Lmismatch = 0.2, max.Rmismatch = 0.2)
  maxMismatches <- as.integer(0.2 * 1:9)
  maxMismatches
  trimLRPatterns(Lpattern = Lpattern, Rpattern = Rpattern, subject = subjectSet,
                 max.Lmismatch = maxMismatches, max.Rmismatch = maxMismatches)

  ## Produce ranges that can be an input into other functions
  trimLRPatterns(Lpattern = Lpattern, Rpattern = Rpattern, subject = subjectSet,
                 max.Lmismatch = 0, max.Rmismatch = 0, ranges = TRUE)
  trimLRPatterns(Lpattern = Lpattern, Rpattern = Rpattern, subject = subject,
                 max.Lmismatch = 0.2, max.Rmismatch = 0.2, ranges = TRUE)

  11-letter "DNAString" instance
seq: ACGTGATCGGA
  13-letter "DNAString" instance
seq: TTCTGCTTGACGT
  A DNAStringSet instance of length 2
    width seq
[1]     6 ACGGCA
[2]     0 
  A DNAStringSet instance of length 2
    width seq
[1]     6 ACGGCA
[2]     0 
  4-letter "DNAString" instance
seq: ACGT
[1] 0 0 0 0 1 1 1 1 1
  A DNAStringSet instance of length 2
    width seq
[1]     6 ACGGCA
[2]     0 
IRanges object with 2 ranges and 0 metadata columns:
          start       end     width
      <integer> <integer> <integer>
  [1]         7        12         6
  [2]        10         9         0
IRanges object with 1 range and 0 metadata columns:
          start       end     width
      <integer> <integer> <integer>
  [1]        10        13         4