tirClust: Analyse TIR Sequences of Pre-clustered Transposable Elements

tirClust    R Documentation

Description

Takes transposable elements that have been clustered by VSEARCH (via packClust) and produces consensus sequences for the terminal inverted repeats (TIRs) of each cluster. Allows TIR similarities between clusters to be visualised for both the forward and reverse strands.

Usage

tirClust(
  packMatches,
  Genome,
  tirLength = 25,
  plot = TRUE,
  plotSavePath = NULL,
  k = 5,
  output = "consensus"
)

Arguments

packMatches

A dataframe containing genomic ranges and names referring to sequences to be extracted. This dataframe is in the format produced by coercing a GRanges object to a dataframe: data.frame(GRanges).

Must contain the following features:

  • start - the predicted element's start base sequence position.

  • end - the predicted element's end base sequence position.

  • seqnames - character string giving the name of the sequence in Genome to which start and end refer.
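
As a rough illustration of the layout described above, the base R sketch below builds a small dataframe with the three required columns and checks that they are present. The coordinates, the sequence names, and the helper hasRequiredColumns are invented for this example and are not part of packFinder.

```r
# Hypothetical ranges; the values are made up for illustration only.
examplePackMatches <- data.frame(
  seqnames = c("Chr1", "Chr1", "Chr3"),
  start = c(1000L, 52000L, 7300L),
  end = c(1850L, 53100L, 8420L),
  stringsAsFactors = FALSE
)

# Hypothetical helper: TRUE only if every required column is present.
hasRequiredColumns <- function(df, required = c("start", "end", "seqnames")) {
  all(required %in% colnames(df))
}

hasRequiredColumns(examplePackMatches)
```

A dataframe produced by coercing a real GRanges object will normally carry further columns; this check only looks for the three listed here.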

Genome

A DNAStringSet object containing the sequences referred to in packMatches (the object originally used to predict the transposons with packSearch).

tirLength

The TIR size to be considered. Consensus sequences will be generated based on the first and last tirLength bases of a transposon.
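
To make "the first and last tirLength bases" concrete, here is a minimal base R sketch that slices both ends from a plain character string. The helper getTirEnds and the 20-base sequence are hypothetical; tirClust itself works on a DNAStringSet, and how it treats the reverse strand is not shown here.

```r
# Hypothetical helper: return the first and last tirLength bases of a sequence.
getTirEnds <- function(sequence, tirLength = 25) {
  n <- nchar(sequence)
  # The requested TIR cannot be longer than the element itself.
  stopifnot(tirLength >= 1, tirLength <= n)
  list(
    first = substr(sequence, 1, tirLength),
    last = substr(sequence, n - tirLength + 1, n)
  )
}

# Made-up 20-base element, sliced with tirLength = 5.
ends <- getTirEnds("ACGTACGGTTTTTTCCGTAC", tirLength = 5)
ends$first
ends$last
```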

plot

Logical specifying whether the TIR consensus sequences should be plotted as a dendrogram.

plotSavePath

File path for the dendrogram plot. If unspecified, the dendrogram plot is not saved.

k

The k-mer size to be used for calculating a distance matrix between TIR consensus sequences. See kdistance. Larger word sizes are not suitable for longer TIR sequences because of the processing time required. Additionally, k must be smaller than the TIR sequence length (tirLength).
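
For readers unfamiliar with k-mers, the base R sketch below counts the overlapping words of size k in a single sequence. It is only an illustration of the concept, not the method used by kdistance; the helper countKmers and the 25-base sequence are made up for this example.

```r
# Hypothetical helper: count the overlapping k-mers in one sequence.
countKmers <- function(sequence, k = 5) {
  n <- nchar(sequence)
  # A word of size k can only be drawn from a sequence of at least k bases.
  stopifnot(k >= 1, k <= n)
  starts <- seq_len(n - k + 1)
  table(substring(sequence, starts, starts + k - 1))
}

# A made-up 25-base sequence with k = 5 contains 25 - 5 + 1 = 21 k-mers.
kmers <- countKmers("CACTACAAGAAATATCGTCACAAAT", k = 5)
sum(kmers)
```

There are 4^k possible DNA words of size k (1024 for k = 5), so the table of possible k-mers grows quickly as k increases.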

output

Controls the output of tirClust. If output is specified as "consensus", the consensus sequences of each TIR cluster will be returned; else, if output is specified as "dendrogram", a dendrogram object will be returned for creation of customisable plots.

Value

If output is specified as "consensus" (default), returns a list of consensus sequences for each cluster specified in packMatches as a DNAStringSet. Else if output is specified as "dendrogram", returns a dendrogram object used to create hierarchical clustering diagrams.
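
The sketch below shows the kind of customisation a dendrogram object allows, using base R only. The distance values and cluster labels are invented, and hclust is used here simply to obtain an object of class "dendrogram"; the clustering method applied inside tirClust is not stated above, so treat this as an assumption-laden stand-in for the object tirClust returns.

```r
# Made-up pairwise distances between three hypothetical TIR consensus sequences.
toyDistances <- as.dist(matrix(
  c(0.0, 0.2, 0.7,
    0.2, 0.0, 0.6,
    0.7, 0.6, 0.0),
  nrow = 3,
  dimnames = list(
    c("cluster1", "cluster2", "cluster3"),
    c("cluster1", "cluster2", "cluster3")
  )
))

# Build a dendrogram object from the toy distances.
dend <- as.dendrogram(hclust(toyDistances, method = "average"))

# Objects of class "dendrogram" accept the usual base plotting arguments.
plot(dend, horiz = TRUE, main = "Toy TIR consensus dendrogram")
```

With a real result, the same plot() call could be applied to the object returned when output is specified as "dendrogram".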

Author(s)

Jack Gisby

See Also

packClust, packAlign, kdistance, DNAStringSet, as.alignment, packSearch

Examples

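library(packFinder)

# Load the example genome and the example clustered matches
data(arabidopsisThalianaRefseq)
data(packMatches)

# Generate TIR consensus sequences using the default arguments
tirClust(packMatches, arabidopsisThalianaRefseq)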


jackgisby/packFinder documentation built on July 19, 2022, 2:25 a.m.