packClust: Cluster Transposons with VSEARCH

packClust R Documentation

Description

Cluster potential Pack-TYPE elements by sequence similarity. The resulting groups may be aligned with packAlign, or the clusters may be analysed with tirClust.

Usage

packClust(
  packMatches,
  Genome,
  identity = 0.6,
  threads = 1,
  identityDefinition = 2,
  maxWildcards = 0.05,
  strand = "both",
  saveFolder = NULL,
  vSearchPath = "vsearch"
)

Arguments

packMatches

A dataframe of potential Pack-TYPE transposable elements, in the format given by packSearch. This is the format produced by coercing a GRanges object to a dataframe: data.frame(GRanges). The sequences of these elements will be saved as a FASTA file for VSEARCH.
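As a rough illustration of the layout described above, the following base R sketch builds a small dataframe with the five columns that coercing a GRanges object to a dataframe produces (seqnames, start, end, width, strand). The object name packMatchesSketch and all values are hypothetical, and the real output of packSearch may carry further columns.

```r
# Hypothetical values; only the column layout mirrors as.data.frame(GRanges).
packMatchesSketch <- data.frame(
  seqnames = factor(c("Chr1", "Chr1", "Chr3")),
  start    = c(1000L, 52000L, 7300L),
  end      = c(2499L, 53999L, 9299L),
  width    = c(1500L, 2000L, 2000L),
  strand   = factor(c("+", "-", "*"), levels = c("+", "-", "*"))
)

# GRanges-style coordinates are inclusive, so width equals end - start + 1.
widthsConsistent <- all(
  packMatchesSketch$width ==
    packMatchesSketch$end - packMatchesSketch$start + 1L
)
```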

Genome

A DNAStringSet object containing the sequences referred to in packMatches (the object originally used to predict the transposons with packSearch).

identity

The minimum sequence identity between two transposable elements in packMatches required for them to be grouped into a cluster (defaults to 0.6).

threads

The number of threads to be used by VSEARCH.

identityDefinition

The pairwise identity definition used by VSEARCH. Defaults to 2, the standard VSEARCH definition.

maxWildcards

The maximal allowable proportion of wildcards in the sequence of each match (defaults to 0.05).
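To make the proportion concrete, here is a minimal base R sketch that computes the fraction of wildcard characters in a single sequence. It assumes that "N" is the wildcard character; the helper wildcardProportion is hypothetical and is not part of the package, which may count wildcards differently.

```r
# Assumption: "N" is treated as the wildcard character.
wildcardProportion <- function(sequence, wildcard = "N") {
  # Split the sequence into single characters and count the wildcards.
  bases <- strsplit(toupper(sequence), "", fixed = TRUE)[[1]]
  sum(bases == wildcard) / length(bases)
}

cleanSeq <- "ACGTACGTAC"   # no wildcards
noisySeq <- "ACGTNACGTN"   # two wildcards in ten bases

# Compare each proportion against a maxWildcards-style threshold.
maxWildcards <- 0.05
passesFilter <- c(
  clean = wildcardProportion(cleanSeq) <= maxWildcards,
  noisy = wildcardProportion(noisySeq) <= maxWildcards
)
```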

strand

The strand direction (+, - or *) to be clustered.

saveFolder

The folder in which to save output files (uc, blast6out, FASTA).

vSearchPath

When the package is run on Windows systems, the location of the VSEARCH executable file must be given; this should be left as default on Linux/macOS systems.

Value

Saves cluster information, including a uc and blast6out file, to the specified location. Returns the given packMatches dataframe with an additional column, cluster, containing cluster IDs.
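Once the cluster column is present, ordinary dataframe operations can summarise it. The sketch below uses a hypothetical stand-in for the returned dataframe (the object clustered and its values are invented, and the actual type of the cluster IDs may differ) to count the members of each cluster and keep only elements from clusters with at least two members.

```r
# Hypothetical stand-in for the dataframe returned by packClust.
clustered <- data.frame(
  seqnames = c("Chr1", "Chr1", "Chr3", "Chr5"),
  start    = c(1000L, 52000L, 7300L, 400L),
  end      = c(2499L, 53999L, 9299L, 1899L),
  cluster  = c(1L, 2L, 1L, 1L)
)

# Number of elements in each cluster.
clusterSizes <- table(clustered$cluster)

# Keep elements that belong to a cluster with at least two members.
largeClusters <- names(clusterSizes)[clusterSizes >= 2]
multiMember <- clustered[clustered$cluster %in% largeClusters, ]
```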

Note

In order to cluster sequences using VSEARCH, the executable file must first be installed.
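One way to check for an installation before calling packClust is base R's Sys.which, which looks up an executable on the system PATH. This is only a sketch: it assumes the executable is named "vsearch", and it will not find an installation that lives outside the PATH (in which case the full location would be passed as vSearchPath).

```r
# Sys.which() returns "" when the executable is not found on the PATH.
vsearchLocation <- Sys.which("vsearch")

if (nzchar(vsearchLocation)) {
  message("VSEARCH found at: ", vsearchLocation)
} else {
  message("VSEARCH not found on the PATH; install it or supply vSearchPath.")
}
```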

Author(s)

Jack Gisby

References

VSEARCH may be downloaded from https://github.com/torognes/vsearch. See https://www.ncbi.nlm.nih.gov/pubmed/27781170 for further information.

See Also

tirClust, packAlign, readBlast, readUc, filterWildcards, packSearch

Examples

data(arabidopsisThalianaRefseq)
data(packMatches)

# packClust run on a Linux/MacOS system
## Not run: 
    packClust(packMatches, arabidopsisThalianaRefseq)

## End(Not run)

# packClust run on a Windows system
## Not run: 
    packClust(packMatches, arabidopsisThalianaRefseq,
              vSearchPath = "path/to/vsearch/vsearch.exe")

## End(Not run)


jackgisby/packFinder documentation built on July 19, 2022, 2:25 a.m.