findPalindromes: Searching a sequence for palindromes

findPalindromes                R Documentation

Searching a sequence for palindromes

Description

The findPalindromes function can be used to find palindromic regions in a sequence.

palindromeArmLength, palindromeLeftArm, and palindromeRightArm are utility functions for operating on palindromic sequences. They should typically be used on the output of findPalindromes.

Usage

findPalindromes(subject, min.armlength=4,
                max.looplength=1, min.looplength=0, max.mismatch=0,
                allow.wobble=FALSE)

palindromeArmLength(x, max.mismatch=0, allow.wobble=FALSE)
palindromeLeftArm(x, max.mismatch=0, allow.wobble=FALSE)
palindromeRightArm(x, max.mismatch=0, allow.wobble=FALSE)

Arguments

subject

An XString object containing the subject string, or an XStringViews object.

min.armlength

An integer giving the minimum length of the arms of the palindromes to search for.

max.looplength

An integer giving the maximum length of "the loop" (i.e. the sequence separating the 2 arms) of the palindromes to search for. Note that by default (max.looplength=1), findPalindromes will search for strict palindromes only.

min.looplength

An integer giving the minimum length of "the loop" of the palindromes to search for.

max.mismatch

The maximum number of mismatching letters allowed between the 2 arms of the palindromes to search for.

allow.wobble

Logical indicating whether wobble base pairs (G/U or G/T base pairings) should be treated as mismatches (the default) or matches.

x

An XString object containing a 2-arm palindrome, or an XStringViews object containing a set of 2-arm palindromes.
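The note on max.looplength above can be illustrated with a small base R sketch. It is not Biostrings code, and the helper names (reverse_string, is_strict_palindrome, implied_loop_length) are hypothetical. It assumes a plain (non-nucleotide) string and reads an even-length strict palindrome as having a loop of length 0 and an odd-length one as having a loop of length 1 (the middle letter), which is why a maximum loop length of 1 is enough to cover strict palindromes.

```r
# Base R sketch, independent of Biostrings: helper names are illustrative only.

# Reverse the characters of a single string.
reverse_string <- function(s) {
  paste(rev(strsplit(s, "")[[1]]), collapse = "")
}

# A strict palindrome reads the same in both directions.
is_strict_palindrome <- function(s) {
  identical(s, reverse_string(s))
}

# Loop length implied by a strict palindrome: 0 if its length is even,
# 1 if its length is odd (the unpaired middle letter).
implied_loop_length <- function(s) {
  stopifnot(is_strict_palindrome(s))
  nchar(s) %% 2L
}

implied_loop_length("abba")   # even length, no middle letter
implied_loop_length("abcba")  # odd length, "c" sits between the arms
```

For nucleotide sequences the arms are compared as reverse complements rather than as mirrored letters, so this sketch only applies to the non-nucleotide case.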

Details

The findPalindromes function finds palindromic substrings in a subject string. The palindromes that can be searched for are either strict palindromes or 2-arm palindromes, i.e. palindromes where the 2 arms are separated by an arbitrary sequence called "the loop". Strict palindromes are a particular case of 2-arm palindromes.

If the subject string is a nucleotide sequence (i.e. DNA or RNA), the 2 arms must contain sequences that are reverse complements of each other. Otherwise, they must contain sequences that are the same.
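As a rough illustration of the nucleotide rule, the base R sketch below counts the mismatches between two DNA arms when the right arm is read in reverse and each letter must pair with its complement. It is a simplified model, not the Biostrings implementation: the names complement, pairs_with and arm_mismatches are hypothetical, only the letters A, C, G and T are handled (any other letter counts as a mismatch), and the wobble option is modeled as accepting G/T pairs in either orientation.

```r
# Base R sketch, independent of Biostrings: names are illustrative only.

# Watson-Crick partner of each DNA letter (ambiguity codes are not handled).
complement <- c(A = "T", C = "G", G = "C", T = "A")

# TRUE where left[i] can pair with right[i].
# With allow_wobble = TRUE, G/T pairs also count as matches.
pairs_with <- function(left, right, allow_wobble = FALSE) {
  ok <- unname(complement[left]) == right
  ok[is.na(ok)] <- FALSE  # letters outside A/C/G/T never pair in this sketch
  if (allow_wobble) {
    ok <- ok | (left == "G" & right == "T") | (left == "T" & right == "G")
  }
  ok
}

# Number of positions where the left arm fails to pair with the
# reversed right arm.
arm_mismatches <- function(left_arm, right_arm, allow_wobble = FALSE) {
  stopifnot(nchar(left_arm) == nchar(right_arm))
  l <- strsplit(left_arm, "")[[1]]
  r <- rev(strsplit(right_arm, "")[[1]])
  sum(!pairs_with(l, r, allow_wobble))
}

arm_mismatches("GAA", "TTC")                              # arms of GAATTC
arm_mismatches("ATGTCT", "AGGCGT")                        # two G/T pairs
arm_mismatches("ATGTCT", "AGGCGT", allow_wobble = TRUE)   # G/T accepted
```

In this model, two arms qualify when the mismatch count does not exceed the allowed maximum, which is how the sketch relates to the max.mismatch and allow.wobble arguments.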

Value

findPalindromes returns an XStringViews object containing all palindromes found in subject (one view per palindromic substring found).

palindromeArmLength returns the arm length (integer) of the 2-arm palindrome x. It will raise an error if x has no arms. Note that any sequence could be considered a 2-arm palindrome if arms of length 0 were allowed, but they are not: x must have arms of length greater than or equal to 1 in order to be considered a 2-arm palindrome. When applied to an XStringViews object x, palindromeArmLength behaves in a vectorized fashion by returning an integer vector of the same length as x.

palindromeLeftArm returns an object of the same class as the original object x and containing the left arm of x.

palindromeRightArm does the same as palindromeLeftArm but on the right arm of x.

Like palindromeArmLength, both palindromeLeftArm and palindromeRightArm will raise an error if x has no arms. Also, when applied to an XStringViews object x, both behave in a vectorized fashion by returning an XStringViews object of the same length as x.
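The behavior described above can be mimicked for plain character strings with a short base R sketch. It is only an approximation under stated assumptions, not the Biostrings code: the function names (arm_length_one, arm_length, left_arm, right_arm) are hypothetical, the comparison is strict (no mismatches), and the arm length is taken to be the number of leading characters that mirror the trailing characters, capped at half the string length.

```r
# Base R sketch, independent of Biostrings: names are illustrative only.

# Strict arm length of one plain string; errors if there are no arms.
arm_length_one <- function(s) {
  ch <- strsplit(s, "")[[1]]
  half <- length(ch) %/% 2L
  if (half == 0L) stop("string has no arms")
  # Compare the first 'half' characters with the last 'half', mirrored.
  same <- ch[seq_len(half)] == rev(ch)[seq_len(half)]
  len <- if (all(same)) half else which(!same)[1L] - 1L
  if (len < 1L) stop("string has no arms")
  len
}

# Vectorized wrappers: one result per input string.
arm_length <- function(x) {
  vapply(x, arm_length_one, integer(1), USE.NAMES = FALSE)
}
left_arm <- function(x) substr(x, 1L, arm_length(x))
right_arm <- function(x) {
  n <- nchar(x)
  substr(x, n - arm_length(x) + 1L, n)
}

x <- c("Delia saw I was aileD", "was it a car or a cat I saw")
arm_length(x)
left_arm(x)
right_arm(x)
```

Because the comparison is character by character, spaces and letter case both count: in the second string the arms stop where a lowercase "i" faces an uppercase "I".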

Author(s)

H. Pagès, with contributions from Erik Wright and Thomas McCarthy

See Also

maskMotif, matchPattern, matchLRPatterns, matchProbePair, XStringViews-class, DNAString-class

Examples

library(Biostrings)

x0 <- BString("abbbaabbcbbaccacabbbccbcaabbabacca")

pals0a <- findPalindromes(x0, min.armlength=3, max.looplength=5)
pals0a
palindromeArmLength(pals0a)
palindromeLeftArm(pals0a)
palindromeRightArm(pals0a)

pals0b <- findPalindromes(x0, min.armlength=9, max.looplength=5,
                          max.mismatch=3)
pals0b
palindromeArmLength(pals0b, max.mismatch=3)
palindromeLeftArm(pals0b, max.mismatch=3)
palindromeRightArm(pals0b, max.mismatch=3)

## Whitespace matters:
x1 <- BString("Delia saw I was aileD")
palindromeArmLength(x1)
palindromeLeftArm(x1)
palindromeRightArm(x1)

x2 <- BString("was it a car or a cat I saw")
palindromeArmLength(x2)
palindromeLeftArm(x2)
palindromeRightArm(x2)

## On a DNA or RNA sequence:
x3 <- DNAString("CCGAAAACCATGATGGTTGCCAG")
findPalindromes(x3)
findPalindromes(RNAString(x3))

## Note that palindromes can be nested:
x4 <- DNAString("ACGTTNAACGTCCAAAATTTTCCACGTTNAACGT")
findPalindromes(x4, max.looplength=19)

## Treat wobble base pairings as matches:
x5 <- RNAString("AUGUCUNNNNAGGCGU")
findPalindromes(x5, max.looplength=4, min.looplength=4)
findPalindromes(x5, max.looplength=4, min.looplength=4, max.mismatch=2)
findPalindromes(x5, max.looplength=4, min.looplength=4, allow.wobble=TRUE)

## A real use case:
library(BSgenome.Dmelanogaster.UCSC.dm3)
chrX <- Dmelanogaster$chrX
chrX_pals0 <- findPalindromes(chrX, min.armlength=40, max.looplength=80)
chrX_pals0
palindromeArmLength(chrX_pals0)  # 251 70 262

## Allowing up to 2 mismatches between the 2 arms:
chrX_pals2 <- findPalindromes(chrX, min.armlength=40, max.looplength=80,
                              max.mismatch=2)
chrX_pals2
palindromeArmLength(chrX_pals2, max.mismatch=2)  # 254 77 44 48 40 264

Bioconductor/Biostrings documentation built on Dec. 16, 2024, 8:46 a.m.