identifyPotentialPackElements: Pack Element Filtering

Description Usage Arguments Details Value Author(s) See Also Examples

View source: R/identifyPotentialPackElements.R

Description

Primary filtering stage for the packSearch algorithm. Identifies potential Pack-TYPE transposable elements based on proximity of matching inverted repeats and equality of TSD sequences.

Usage

1
2
3
4
5
6
7
identifyPotentialPackElements(
  forwardMatches,
  reverseMatches,
  Genome,
  elementLength,
  tsdMismatch = 0
)

Arguments

forwardMatches

A dataframe containing genomic ranges and names referring to forwards-facing TIR sequences and their respective TSD sequences.

reverseMatches

A dataframe containing genomic ranges and names referring to reverse-facing TIR sequences and their respective TSD sequences.

Genome

A DNAStringSet object containing the matches referred to in forwardMatches and reverseMatches

elementLength

A vector of two integers containing the minimum and maximum transposable element length.

tsdMismatch

An integer referring to the allowable mismatch (substitutions or indels) between a transposon's TSD sequences. matchPattern from Biostrings is used for pattern matching.

Details

Used by packSearch as a primariy filtering stage. Identifies matches likely to be transposons based on their TIR region, from identifyTirMatches, and their TSD region, from getTsds. It is recommended to use the general pipeline function packSearch for identification of potential pack elements, however each stage may be called individually. Note that only exact TSD matches are considered, so supplying long sequences for TSD elements may lead to false-negative results.

Value

A dataframe, packMatches, containing the locations of potential Pack-TYPE transposable elements in Genome.

Author(s)

Jack Gisby

See Also

packSearch

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
data(arabidopsisThalianaRefseq)

forwardMatches <- identifyTirMatches(
    Biostrings::DNAString("CACTACAA"),
    arabidopsisThalianaRefseq,
    tsdLength = 3,
    strand = "+"
)

reverseMatches <- identifyTirMatches(
    Biostrings::reverseComplement(Biostrings::DNAString("CACTACAA")),
    arabidopsisThalianaRefseq,
    tsdLength = 3,
    strand = "-"
)

packMatches <- identifyPotentialPackElements(
    forwardMatches, 
    reverseMatches,
    arabidopsisThalianaRefseq, 
    c(300, 3500)
)

packFinder documentation built on Nov. 8, 2020, 5:24 p.m.