identifyPotentialPackElements: Pack Element Filtering
In packFinder: de novo Annotation of Pack-TYPE Transposable Elements

Description Usage Arguments Details Value Author(s) See Also Examples

View source: R/identifyPotentialPackElements.R

Primary filtering stage for the packSearch algorithm. Identifies potential Pack-TYPE transposable elements based on proximity of matching inverted repeats and equality of TSD sequences.

identifyPotentialPackElements(
  forwardMatches,
  reverseMatches,
  Genome,
  elementLength,
  tsdMismatch = 0
)

`forwardMatches`	A dataframe containing genomic ranges and names referring to forwards-facing TIR sequences and their respective TSD sequences.
`reverseMatches`	A dataframe containing genomic ranges and names referring to reverse-facing TIR sequences and their respective TSD sequences.
`Genome`	A DNAStringSet object containing the matches referred to in `forwardMatches` and `reverseMatches`
`elementLength`	A vector of two integers containing the minimum and maximum transposable element length.
`tsdMismatch`	An integer referring to the allowable mismatch (substitutions or indels) between a transposon's TSD sequences. `matchPattern` from Biostrings is used for pattern matching.

Used by packSearch as a primariy filtering stage. Identifies matches likely to be transposons based on their TIR region, from identifyTirMatches, and their TSD region, from getTsds. It is recommended to use the general pipeline function packSearch for identification of potential pack elements, however each stage may be called individually. Note that only exact TSD matches are considered, so supplying long sequences for TSD elements may lead to false-negative results.

A dataframe, packMatches, containing the locations of potential Pack-TYPE transposable elements in Genome.

Jack Gisby

packSearch

data(arabidopsisThalianaRefseq)

forwardMatches <- identifyTirMatches(
    Biostrings::DNAString("CACTACAA"),
    arabidopsisThalianaRefseq,
    tsdLength = 3,
    strand = "+"
)

reverseMatches <- identifyTirMatches(
    Biostrings::reverseComplement(Biostrings::DNAString("CACTACAA")),
    arabidopsisThalianaRefseq,
    tsdLength = 3,
    strand = "-"
)

packMatches <- identifyPotentialPackElements(
    forwardMatches, 
    reverseMatches,
    arabidopsisThalianaRefseq, 
    c(300, 3500)
)