General use pipeline function for the Pack-TYPE transposon finding algorithm.
packSearch(
  tirSeq,
  Genome,
  mismatch = 0,
  elementLength,
  tsdLength,
  tsdMismatch = 0,
  fixed = TRUE
)
The maximum edit distance to be considered for TIR
matches (indels + substitutions). See
The maximum element length to be considered, as a vector
of two integers. E.g.
Integer referring to the length of the flanking TSD region.
An integer referring to the allowable mismatch
(substitutions or indels) between a transposon's TSD
Logical that will be passed to the 'fixed' argument of
Finds potential pack-TYPE elements based on:
Similarity of TIR sequence to
Proximity of potential TIR sequences
Directionality of TIR sequences
Similarity of TSD sequences
The algorithm finds potential forward and reverse TIR
their associated TSD sequence via
The main filtering stage,
matches to obtain a dataframe of potential PACK elements.
Note that this pipeline does not consider the
possibility of discovered elements being autonomous
elements, so it is recommended to cluster and/or BLAST
elements for further analysis. Furthermore, only exact TSD
matches are considered, so supplying long sequences for
TSD elements may lead to false-negative results.
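The stages described above can be sketched in base R. This is a minimal illustration under stated assumptions, not packFinder's implementation: it works on a single character string, matches TIRs exactly (no edit distance), and compares the bases directly flanking each candidate as its TSDs. The helper names revComp, findExact and sketchPackSearch are hypothetical.

```r
# Hypothetical helpers for illustration; none of these are packFinder functions.

# Reverse complement of a DNA string (upper-case A/C/G/T only).
revComp <- function(s) {
  chartr("ACGT", "TGCA", paste(rev(strsplit(s, "")[[1]]), collapse = ""))
}

# Start positions of every exact occurrence of `pattern` in `subject`.
findExact <- function(pattern, subject) {
  n <- nchar(subject)
  k <- nchar(pattern)
  if (n < k) return(integer(0))
  starts <- seq_len(n - k + 1)
  starts[substring(subject, starts, starts + k - 1) == pattern]
}

# Pair forward and reverse TIR matches, then keep pairs that satisfy the
# element length range and have identical flanking TSDs.
sketchPackSearch <- function(tir, genome, elementLength, tsdLength) {
  tirLen <- nchar(tir)
  fwdStarts <- findExact(tir, genome)           # forward TIR matches
  revStarts <- findExact(revComp(tir), genome)  # reverse TIR matches
  hits <- list()
  for (f in fwdStarts) {
    for (r in revStarts) {
      end <- r + tirLen - 1L
      width <- end - f + 1L
      # Proximity and orientation: reverse TIR must close the element in range.
      if (width < elementLength[1] || width > elementLength[2]) next
      # Skip candidates whose flanks would run off the sequence.
      if (f - tsdLength < 1 || end + tsdLength > nchar(genome)) next
      tsd5 <- substr(genome, f - tsdLength, f - 1)
      tsd3 <- substr(genome, end + 1, end + tsdLength)
      if (tsd5 == tsd3) {
        hits[[length(hits) + 1]] <- data.frame(start = f, end = end, width = width)
      }
    }
  }
  if (length(hits) == 0) {
    return(data.frame(start = integer(0), end = integer(0), width = integer(0)))
  }
  do.call(rbind, hits)
}

# Toy sequence: TSD "TAA", forward TIR, filler, reverse TIR, TSD "TAA".
genome <- paste0("GG", "TAA", "CACTACAA", "ACGTACGTAC", "TTGTAGTG", "TAA", "CC")
candidates <- sketchPackSearch("CACTACAA", genome,
                               elementLength = c(20, 40), tsdLength = 3)
candidates  # one candidate: start 6, end 31, width 26
```

The nested loop is quadratic in the number of TIR matches, which is acceptable for a toy sequence but would need a range-based approach on a real genome.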
A dataframe containing the elements identified by the algorithm. These may be autonomous or pack-TYPE elements. The dataframe contains the following features:
start - the predicted element's start base sequence position.
end - the predicted element's end base sequence position.
seqnames - character string referring to the
sequence name in
Genome to which
end refer to.
width - the width of the predicted element.
strand - the strand direction of the
transposable element. This will be set to "*" as the
packSearch function does not consider
transposons to have a direction - only TIR sequences.
packMatches dataframe to
packClust will assign a direction to
each predicted Pack-TYPE element.
This dataframe is in the format produced by
object to a dataframe:
functions, such as
packClust, use this
dataframe to manipulate predicted transposable elements.
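As a rough picture of the dataframe layout described above, the following base R sketch builds a small table with the same columns and filters it by width. The values are invented for illustration and are not real packSearch output; only the column names come from the description above.

```r
# Illustrative values only; this is not output from packSearch.
packMatches <- data.frame(
  seqnames = c("Chr1", "Chr1", "Chr3"),
  start = c(1000L, 52000L, 7300L),
  end = c(1999L, 54499L, 7899L),
  strand = "*",  # packSearch does not assign a direction to elements
  stringsAsFactors = FALSE
)

# Width is the inclusive span between start and end.
packMatches$width <- packMatches$end - packMatches$start + 1L

# Ordinary dataframe subsetting applies, e.g. keep elements up to 1000 bases.
shortElements <- packMatches[packMatches$width <= 1000, ]
```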
This algorithm does not consider:
Autonomous elements - autonomous elements will
be predicted by this algorithm as there is no BLAST
step. It is recommended that, after clustering
packClust, the user
analyses each group to determine which predicted
elements are autonomous and which are likely
Pack-TYPE elements. Alternatively, databases such as
supply annotations for autonomous transposable
elements that can be used to filter autonomous matches.
TSD Mismatches - if the target site duplications flanking two TIRs do not match exactly, the pair is ignored. Supplying longer TSD sequences will likely lower the false-positive rate, but may also increase the rate of false-negative results.
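The trade-off between TSD length and false negatives can be illustrated with a small base R sketch. It compares two flanking sequences using edit distance from utils::adist; the function name tsdMatches and its maxMismatch argument are hypothetical and only mimic the idea of an allowable mismatch, they are not how packFinder compares TSDs.

```r
# Hypothetical helper: TRUE when two TSD strings are within `maxMismatch`
# edits (substitutions or indels) of each other.
tsdMatches <- function(left, right, maxMismatch = 0) {
  drop(utils::adist(left, right)) <= maxMismatch
}

# A short TSD matches exactly.
tsdMatches("TAA", "TAA")

# Extending the same flanks to 5 bases exposes one substitution, so an
# exact comparison now rejects the pair (a potential false negative).
tsdMatches("TAACG", "TAACT")

# Allowing one edit accepts it again.
tsdMatches("TAACG", "TAACT", maxMismatch = 1)
```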
Pattern matching is done via
data(arabidopsisThalianaRefseq)
packMatches <- packSearch(
  Biostrings::DNAString("CACTACAA"),
  arabidopsisThalianaRefseq,
  elementLength = c(300, 3500),
  tsdLength = 3
)