packSearch: packFinder Algorithm Pipeline



Description

General use pipeline function for the Pack-TYPE transposon finding algorithm.


Usage

packSearch(
  tirSeq,
  Genome,
  mismatch = 0,
  elementLength,
  tsdLength,
  tsdMismatch = 0,
  fixed = TRUE
)



Arguments

  • tirSeq - A DNAString object containing the TIR sequence to be searched for.

  • Genome - A DNAStringSet object to be searched.

  • mismatch - The maximum edit distance to be considered for TIR matches (indels + substitutions). See matchPattern for details.

  • elementLength - The minimum and maximum element length to be considered, as a vector of two integers, e.g. c(300, 3500).

  • tsdLength - Integer referring to the length of the flanking TSD region.

  • tsdMismatch - An integer referring to the allowable mismatch (substitutions or indels) between a transposon's TSD sequences. matchPattern from Biostrings is used for pattern matching.

  • fixed - Logical that will be passed to the 'fixed' argument of matchPattern. Determines the behaviour of IUPAC ambiguity codes when searching for TIR sequences.
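To give a feel for what a maximum edit distance means for TIR matches, the sketch below scores a few made-up candidate sequences against a TIR using base R's adist (Levenshtein distance). This is only an illustration of the distance measure: packSearch itself relies on matchPattern from Biostrings, and the candidate sequences and the threshold of 1 are assumptions chosen for the example.

```r
# Illustration only: base R's adist() stands in for the edit-distance idea.
# packSearch uses Biostrings::matchPattern, not adist().
tir <- "CACTACAA"

# Hypothetical candidate sequences (made up for this example).
candidates <- c(
  exact        = "CACTACAA",  # identical to the TIR
  substitution = "CACTTCAA",  # one base substituted
  deletion     = "CACTCAA",   # one base deleted
  twoEdits     = "CAGTTCAA"   # two bases substituted
)

# Edit distance (substitutions + indels) from the TIR to each candidate.
editDistance <- drop(utils::adist(tir, candidates))

# Candidates that a maximum edit distance of 1 would accept.
maxMismatch <- 1
accepted <- editDistance <= maxMismatch

print(data.frame(candidate = candidates, editDistance, accepted))
```

Raising the threshold accepts more divergent TIRs at the cost of more spurious matches, which is why the default for mismatch is 0.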


Details

Finds potential Pack-TYPE elements based on:

  • Similarity of TIR sequence to tirSeq

  • Proximity of potential TIR sequences

  • Directionality of TIR sequences

  • Similarity of TSD sequences

The algorithm finds potential forward and reverse TIR sequences using identifyTirMatches and retrieves their associated TSD sequences via getTsds. The main filtering stage, identifyPotentialPackElements, then filters these matches to obtain a dataframe of potential Pack-TYPE elements. Note that this pipeline cannot distinguish Pack-TYPE elements from autonomous elements, so it is recommended to cluster and/or BLAST the predicted elements for further analysis. Furthermore, with the default tsdMismatch = 0 only exact TSD matches are considered, so a large tsdLength may lead to false-negative results.
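The filtering criteria above can be sketched in a few lines of base R. This is a simplified illustration on plain character strings, not the package's implementation: it assumes exact TIR matches only, a single sequence, and TSDs taken as the tsdLength bases immediately flanking the element. The helper names revComp, findAll and sketchPackSearch are hypothetical.

```r
# Reverse complement of an unambiguous DNA string (A/C/G/T only).
revComp <- function(x) {
  bases <- strsplit(chartr("ACGT", "TGCA", x), "")[[1]]
  paste(rev(bases), collapse = "")
}

# Start positions of every exact occurrence of `pattern` in `subject`.
findAll <- function(pattern, subject) {
  n <- nchar(subject)
  k <- nchar(pattern)
  if (n < k) return(integer(0))
  starts <- seq_len(n - k + 1)
  starts[substring(subject, starts, starts + k - 1) == pattern]
}

# Hypothetical, simplified stand-in for the pipeline's filtering logic.
sketchPackSearch <- function(tirSeq, genome, elementLength, tsdLength) {
  k <- nchar(tirSeq)
  fwdStarts <- findAll(tirSeq, genome)           # forward TIR matches
  revStarts <- findAll(revComp(tirSeq), genome)  # reverse TIR matches
  hits <- data.frame(start = integer(0), end = integer(0), width = integer(0))
  for (f in fwdStarts) {
    for (r in revStarts) {
      elEnd <- r + k - 1L
      width <- elEnd - f + 1L
      # Proximity and directionality: reverse TIR must close the element
      # downstream of the forward TIR, within the allowed length range.
      if (width < elementLength[1] || width > elementLength[2]) next
      # Skip pairs whose flanking TSD region would run off the sequence.
      if (f <= tsdLength || elEnd + tsdLength > nchar(genome)) next
      tsd5 <- substr(genome, f - tsdLength, f - 1L)
      tsd3 <- substr(genome, elEnd + 1L, elEnd + tsdLength)
      # TSD similarity: exact match required in this sketch.
      if (tsd5 == tsd3) {
        hits <- rbind(hits, data.frame(start = f, end = elEnd, width = width))
      }
    }
  }
  hits
}

# Toy sequence: TSD + TIR + filler + reverse-complemented TIR + TSD.
tir <- "CACTACAA"
genome <- paste0("TTT", "ACG", tir, strrep("G", 10), revComp(tir), "ACG", "TTT")
hits <- sketchPackSearch(tir, genome, elementLength = c(20, 40), tsdLength = 3)
print(hits)
```

The nested loop is quadratic in the number of TIR matches, which is acceptable for a toy sequence; it is used here only to keep each criterion visible.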


Value

A dataframe containing the elements identified by the algorithm. These may be autonomous or Pack-TYPE elements. It contains the following columns:

  • start - the predicted element's start base sequence position.

  • end - the predicted element's end base sequence position.

  • seqnames - character string giving the name of the sequence in Genome to which start and end refer.

  • width - the width of the predicted element.

  • strand - the strand direction of the transposable element. This will be set to "*" as the packSearch function does not consider transposons to have a direction - only TIR sequences. Passing the packMatches dataframe to packClust will assign a direction to each predicted Pack-TYPE element.

This dataframe is in the format produced by coercing a GRanges object (from the GenomicRanges package) to a dataframe: data.frame(GRanges). Downstream functions, such as packClust, use this dataframe to manipulate predicted transposable elements.
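As a rough picture of the returned shape, the base R sketch below builds a dataframe with the columns listed above and filters it by width. All values, including the sequence names, are made up for illustration; the real output comes from packSearch. The width calculation follows the GRanges convention that start and end are both inclusive.

```r
# Hypothetical rows in the shape described above (all values are made up).
packMatchesSketch <- data.frame(
  seqnames = c("Chr1", "Chr1", "Chr2"),
  start    = c(1001L, 52000L, 730L),
  end      = c(1800L, 55200L, 4400L),
  strand   = "*",  # packSearch does not assign a direction
  stringsAsFactors = FALSE
)

# Ranges are inclusive of both end points, so width = end - start + 1.
packMatchesSketch$width <- packMatchesSketch$end - packMatchesSketch$start + 1L

# Reorder columns to match a coerced GRanges object.
packMatchesSketch <- packMatchesSketch[, c("seqnames", "start", "end", "width", "strand")]

# Example downstream step: keep elements between 300 and 3500 bases wide.
inRange <- subset(packMatchesSketch, width >= 300 & width <= 3500)
print(inRange)
```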


Note

This algorithm does not consider:

  • Autonomous elements - autonomous elements will be predicted by this algorithm as there is no BLAST step. It is recommended that, after clustering elements using packClust, the user analyses each group to determine which predicted elements are autonomous and which are likely Pack-TYPE elements. Alternatively, databases such as Repbase supply annotations for autonomous transposable elements that can be used to filter autonomous matches.

  • TSD mismatches - with the default tsdMismatch = 0, two TIRs whose target site duplications (TSDs) do not match exactly are ignored. A larger tsdLength will likely lower the false-positive rate, but may also increase the rate of false-negative results.
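The effect of TSD length on false positives can be estimated with a back-of-the-envelope calculation. Under the simplifying assumption that bases are independent and equally frequent (real genomes are not), two unrelated flanking sequences of length L match exactly with probability (1/4)^L. The sketch below tabulates this for a few lengths; it says nothing about the false-negative side of the trade-off.

```r
# Assumption: independent, uniformly distributed bases (A, C, G, T).
tsdLength <- 2:6

# Probability that two unrelated flanks of this length match exactly.
pChanceMatch <- 0.25^tsdLength

print(data.frame(tsdLength, pChanceMatch))
```

Each additional base cuts the chance of a coincidental exact match by a factor of four, while giving a genuine TSD one more position at which a mutation can break the match.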

Pattern matching is done via matchPattern.


Author(s)

Jack Gisby

See Also

identifyTirMatches, getTsds, identifyPotentialPackElements, packClust, packMatches, DNAStringSet, DNAString, matchPattern



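Examples

# tirSeq (a DNAString) and Genome (a DNAStringSet) must be defined first.
packMatches <- packSearch(
    tirSeq = tirSeq,
    Genome = Genome,
    elementLength = c(300, 3500),
    tsdLength = 3
)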
jackgisby/packFinder documentation built on July 19, 2022, 2:25 a.m.