packSearch: packFinder Algorithm Pipeline

View source: R/packSearch.R

Description

General use pipeline function for the Pack-TYPE transposon finding algorithm.

Usage

packSearch(
  tirSeq,
  Genome,
  mismatch = 0,
  elementLength,
  tsdLength,
  tsdMismatch = 0
)

Arguments

tirSeq

A DNAString object containing the TIR sequence to be searched for.

Genome

A DNAStringSet object to be searched.

mismatch

The maximum edit distance to be considered for TIR matches (indels + substitutions). See matchPattern for details.

elementLength

The minimum and maximum element lengths to be considered, as a vector of two integers, e.g. c(300, 3500).

tsdLength

Integer referring to the length of the flanking TSD region.

tsdMismatch

An integer referring to the allowable mismatch (substitutions or indels) between a transposon's TSD sequences. matchPattern from Biostrings is used for pattern matching.
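As a rough illustration of what an edit-distance threshold such as mismatch means, the sketch below uses base R's utils::adist rather than matchPattern, so it compares whole strings instead of scanning a genome. The candidate sequences and the maxMismatch variable are made up for this example.

```r
# Illustration only: utils::adist computes the generalised Levenshtein
# (edit) distance, counting substitutions, insertions and deletions.
# It is NOT what packSearch calls internally (that is matchPattern).
tir <- "CACTACAA"

# Hypothetical candidate sequences, chosen for this example.
candidates <- c(
  exact        = "CACTACAA",  # identical to the TIR
  substitution = "CACTTCAA",  # one base substituted
  deletion     = "CACTCAA",   # one base deleted
  unrelated    = "GGGTTTCC"   # many edits away
)

editDistance <- drop(utils::adist(tir, candidates))
names(editDistance) <- names(candidates)

# maxMismatch plays the role of the mismatch argument (assumption).
maxMismatch <- 1
accepted <- names(editDistance)[editDistance <= maxMismatch]

print(editDistance)
print(accepted)
```

With a threshold of 0 only the exact copy would be accepted; raising it to 1 also admits the single substitution and the single deletion.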

Details

Finds potential pack-TYPE elements based on TIR matches to tirSeq, matching flanking TSD sequences, and the permitted element length range.

The algorithm finds potential forward and reverse TIR sequences using identifyTirMatches and their associated TSD sequences via getTsds. The main filtering stage, identifyPotentialPackElements, filters these matches to obtain a dataframe of potential PACK elements. Note that this pipeline does not check whether discovered elements are autonomous, so it is recommended to cluster and/or BLAST elements for further analysis. Furthermore, when tsdMismatch is 0 (the default), only exact TSD matches are considered, so a large tsdLength may lead to false-negative results.
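The stages described above can be sketched in base R. This is a toy illustration under stated assumptions, not packFinder's implementation: revComp, findAll and findPackLike are hypothetical helpers, matching is exact only (as if mismatch = 0 and tsdMismatch = 0), and the genome string is made up.

```r
# Reverse complement of a DNA string (hypothetical helper).
revComp <- function(s) {
  paste(rev(strsplit(chartr("ACGT", "TGCA", s), "")[[1]]), collapse = "")
}

# Start positions of all non-overlapping exact matches (hypothetical helper).
findAll <- function(pattern, subject) {
  hits <- gregexpr(pattern, subject, fixed = TRUE)[[1]]
  as.integer(hits[hits > 0])
}

# Pair forward and reverse TIRs, then filter on length and TSD identity.
findPackLike <- function(genome, tir, elementLength, tsdLength) {
  fwdStart <- findAll(tir, genome)
  revEnd   <- findAll(revComp(tir), genome) + nchar(tir) - 1L
  pairs <- expand.grid(start = fwdStart, end = revEnd)
  pairs$width <- pairs$end - pairs$start + 1L

  inRange   <- pairs$width >= elementLength[1] & pairs$width <= elementLength[2]
  hasFlanks <- pairs$start - tsdLength >= 1L &
    pairs$end + tsdLength <= nchar(genome)
  pairs <- pairs[inRange & hasFlanks, , drop = FALSE]

  # TSDs sit immediately outside the element on both sides.
  leftTsd  <- substring(genome, pairs$start - tsdLength, pairs$start - 1L)
  rightTsd <- substring(genome, pairs$end + 1L, pairs$end + tsdLength)
  pairs$TSD <- leftTsd
  pairs <- pairs[leftTsd == rightTsd, , drop = FALSE]
  rownames(pairs) <- NULL
  pairs
}

# Made-up genome: flank, TSD, TIR, filler, reverse-complement TIR, TSD, flank.
tir <- "CACTACAA"
genome <- paste0("GGGG", "ATT", tir, strrep("GC", 10), revComp(tir), "ATT", "CCCC")

hits <- findPackLike(genome, tir, elementLength = c(30, 50), tsdLength = 3)
print(hits)
```

Requiring the two flanking TSDs to be identical is what makes longer TSD lengths stricter: every extra base is another position at which the flanks must agree.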

Value

A dataframe containing the elements identified by the algorithm. These may be autonomous or pack-TYPE elements.

This dataframe is in the format produced by coercing a GRanges object to a dataframe: data.frame(GRanges). It therefore includes the columns that this coercion produces, such as seqnames, start, end, width and strand. Downstream functions, such as packClust, use this dataframe to manipulate predicted transposable elements.
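Because the result is an ordinary dataframe, it can be inspected with base R before being passed downstream. The sketch below uses a toy stand-in whose column names follow the GRanges coercion layout; packMatchesToy and every value in it are made up for illustration and are not real packSearch output.

```r
# Toy stand-in for packSearch output; all values are invented.
starts <- c(1200L, 48000L, 7300L)
ends   <- c(1799L, 50999L, 7899L)

packMatchesToy <- data.frame(
  seqnames = factor(c("Chr1", "Chr1", "Chr3")),
  start    = starts,
  end      = ends,
  width    = ends - starts + 1L,
  strand   = factor(c("+", "-", "+"), levels = c("+", "-", "*"))
)

# Keep only elements up to 1000 bp long.
shortElements <- packMatchesToy[packMatchesToy$width <= 1000L, ]

# Count predicted elements per sequence.
perSequence <- table(packMatchesToy$seqnames)

print(shortElements)
print(perSequence)
```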

Note

This algorithm does not consider whether the elements it reports are autonomous (see Details).

Pattern matching is done via matchPattern.

Author(s)

Jack Gisby

See Also

identifyTirMatches, getTsds, identifyPotentialPackElements, packClust, packMatches, DNAStringSet, DNAString, matchPattern

Examples

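library(packFinder)

# Load the example genome data shipped with packFinder.
data(arabidopsisThalianaRefseq)

# Search for elements flanked by the given TIR, between 300 and 3500 bp
# long, with 3 bp TSDs; mismatch and tsdMismatch keep their defaults of 0.
packMatches <- packSearch(
    Biostrings::DNAString("CACTACAA"),
    arabidopsisThalianaRefseq,
    elementLength = c(300, 3500),
    tsdLength = 3
)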

packFinder documentation built on Nov. 8, 2020, 5:24 p.m.