collapseSeqs: Collapse Overlapping Sequences

collapseSeqs    R Documentation

Description

The sequences predicted by packSearch often overlap, which may be due to closely interspersed elements or to false TIR identification. In such cases, the ranges are converted to a GRanges object (from the GenomicRanges package) so that overlapping elements can be collapsed into single ranges, which prevents over-estimation of transposon numbers. The function also removes duplicate elements that arise when the results of multiple searches are combined.
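The idea of collapsing overlapping ranges can be illustrated with a minimal base R sketch. This is not the packFinder implementation, which relies on GRanges; the helper collapseRanges and the toy data frame toyMatches are hypothetical names used only for illustration, and the sketch keeps only the seqnames, start and end columns.

```r
# Hypothetical helper, not part of packFinder: merge overlapping ranges
# within each sequence using base R only.
collapseRanges <- function(ranges) {
  # Handle each sequence name separately
  pieces <- lapply(split(ranges, as.character(ranges$seqnames)), function(df) {
    df <- df[order(df$start, df$end), ]
    # Largest end position seen so far, row by row
    runningEnd <- cummax(df$end)
    # A new group begins when a range starts after all earlier ranges have ended
    newGroup <- c(TRUE, df$start[-1] > runningEnd[-nrow(df)])
    group <- cumsum(newGroup)
    data.frame(
      seqnames = df$seqnames[1],
      start = as.integer(tapply(df$start, group, min)),
      end = as.integer(tapply(df$end, group, max)),
      stringsAsFactors = FALSE
    )
  })
  result <- do.call(rbind, pieces)
  rownames(result) <- NULL
  result
}

# Toy input: two overlapping ranges and one separate range on Chr1,
# plus an exact duplicate on Chr2
toyMatches <- data.frame(
  seqnames = c("Chr1", "Chr1", "Chr1", "Chr2", "Chr2"),
  start = c(100, 150, 400, 10, 10),
  end = c(200, 300, 500, 50, 50),
  stringsAsFactors = FALSE
)

collapsed <- collapseRanges(toyMatches)
print(collapsed)
```

In this toy case the five input ranges reduce to three: the two overlapping Chr1 ranges merge into one spanning 100 to 300, and the duplicated Chr2 range is kept once.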

Usage

collapseSeqs(packMatches, Genome)

Arguments

packMatches

A data frame containing genomic ranges and names referring to sequences to be extracted. This data frame is in the format produced by coercing a GRanges object (from the GenomicRanges package) to a data frame: data.frame(GRanges).

Must contain the following features:

  • start - the base position at which the predicted element starts.

  • end - the base position at which the predicted element ends.

  • seqnames - character string giving the name of the sequence in Genome to which start and end refer.
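The requirements above can be checked before calling collapseSeqs. The following is a minimal sketch in base R; checkPackMatches and its genomeNames argument are hypothetical and not part of packFinder. In practice, genomeNames would presumably be names(Genome).

```r
# Hypothetical validator, not part of packFinder
checkPackMatches <- function(packMatches, genomeNames) {
  # All three required columns must be present
  required <- c("seqnames", "start", "end")
  missingCols <- setdiff(required, colnames(packMatches))
  if (length(missingCols) > 0) {
    stop("missing columns: ", paste(missingCols, collapse = ", "))
  }
  # A range cannot end before it starts
  if (any(packMatches$start > packMatches$end)) {
    stop("every start must be less than or equal to its end")
  }
  # Every seqname must refer to a sequence in the genome
  unknown <- setdiff(as.character(packMatches$seqnames), genomeNames)
  if (length(unknown) > 0) {
    stop("seqnames not found in Genome: ", paste(unknown, collapse = ", "))
  }
  invisible(TRUE)
}

# Toy input with illustrative sequence names
toyMatches <- data.frame(
  seqnames = c("Chr1", "Chr2"),
  start = c(1, 20),
  end = c(10, 45),
  stringsAsFactors = FALSE
)

valid <- checkPackMatches(toyMatches, genomeNames = c("Chr1", "Chr2"))
```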

Genome

A DNAStringSet object containing the sequences referred to in packMatches (the object originally used to predict the transposons with packSearch).

Value

A set of non-overlapping transposon sequences in the same format as the input data frame.

Author(s)

Jack Gisby

See Also

packSearch, GRanges (GenomicRanges package)

Examples

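library(packFinder)

# Load the example matches and genome shipped with the package
data(packMatches)
data(arabidopsisThalianaRefseq)

# Set every range to positions 1 to 10, so that ranges on the same sequence overlap
packMatches$start <- 1
packMatches$end <- 10

# Collapse the overlapping ranges
collapseSeqs(packMatches, arabidopsisThalianaRefseq)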


jackgisby/packFinder documentation built on July 19, 2022, 2:25 a.m.