packBlast: Pipeline for BLAST/Classification of PackTYPE Elements

View source: R/packBlast.R

packBlast R Documentation


Description

Runs BLAST+ against user-specified databases of non-transposon and transposon-related proteins. Can be used to classify transposons based on their internal sequences.

Usage

packBlast(
  packMatches,
  Genome,
  blastPath,
  protDb,
  autoDb,
  minE = 0.001,
  blastTask = "blastn-short",
  maxHits = 100,
  threads = 1,
  saveFolder = NULL,
  tirCutoff = 100,
  autoCutoff = 1e-05,
  autoLength = 150,
  autoIdentity = 70,
  autoScope = NULL,
  protCutoff = 1e-05,
  protLength = 250,
  protIdentity = 70,
  protScope = 0.3
)

Arguments

packMatches

A dataframe of potential Pack-TYPE transposable elements, in the format returned by packSearch, i.e. the format produced by coercing a GRanges object to a dataframe with data.frame(GRanges). Will be saved as a FASTA file for use as the BLAST+ query.

Genome

A DNAStringSet object containing the sequences referred to in packMatches (the object originally passed to packSearch to predict the transposons).

blastPath

Path to the BLAST+ executable, or name of the BLAST+ application for Linux/MacOS users.

protDb

For assigning Pack-TYPE elements. Path to the BLAST database containing nucleotide or protein sequences to be matched against internal transposon sequences. Can be generated using BLAST+ or with makeBlastDb.

autoDb

For assigning autonomous elements. Path to the BLAST database containing nucleotide or protein sequences to be matched against internal transposon sequences. Can be generated using BLAST+ or with makeBlastDb.

minE

BLAST results with e-values greater than the specified cutoff will be ignored. This value is passed to BLASTN and applied to both transposon and non-transposon matches.

blastTask

Type of BLAST+ task, defaults to "blastn-short".

maxHits

Maximum hits returned by BLAST+ per query.

threads

Maximum number of threads that BLAST+ may use.
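The minE, blastTask, maxHits and threads arguments correspond to standard blastn command-line options (-evalue, -task, -max_target_seqs and -num_threads). The sketch below assembles such an argument vector in base R without running BLAST+. The helper buildBlastArgs, the query and output file names, and the use of tabular output (-outfmt 6) are assumptions for illustration, not a description of how packBlast builds its own call.

```r
# Hypothetical helper: map packBlast-style arguments onto standard blastn options.
buildBlastArgs <- function(query, db, out, minE = 0.001,
                           blastTask = "blastn-short", maxHits = 100,
                           threads = 1) {
  c("-query", query,
    "-db", db,
    "-out", out,
    "-task", blastTask,
    "-evalue", format(minE, scientific = FALSE),
    "-max_target_seqs", as.character(maxHits),
    "-num_threads", as.character(threads),
    "-outfmt", "6")  # tabular output (assumed)
}

args <- buildBlastArgs("internal.fasta", "C:/data/TAIR10_CDS", "hits.tsv")
print(args)

# Running it requires a local BLAST+ installation, e.g.:
# system2("C:/blast/bin/blastn.exe", args)
```

Building the arguments as a character vector and passing it to system2 avoids hand-quoting a single command string.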

saveFolder

Directory to save BLAST+ results in; defaults to the working directory.

tirCutoff

Number of bases to ignore at each terminal end of the transposons, to prevent hits to terminal inverted repeat (TIR) sequences.
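A minimal sketch of how tirCutoff could define each element's internal sequence, assuming a packMatches-style dataframe with start and end columns and that tirCutoff bases are dropped from both ends. The helper internalCoords and the toy coordinates are hypothetical.

```r
# Hypothetical helper: coordinates of each element's internal sequence after
# ignoring tirCutoff bases at both terminal ends. Elements shorter than
# 2 * tirCutoff would end up with end < start, i.e. no internal sequence.
internalCoords <- function(matches, tirCutoff = 100) {
  internal <- data.frame(
    seqnames = matches$seqnames,
    start = matches$start + tirCutoff,
    end = matches$end - tirCutoff
  )
  internal$width <- internal$end - internal$start + 1
  internal
}

# Toy coordinates in the data.frame(GRanges) layout (seqnames, start, end, strand)
toyMatches <- data.frame(
  seqnames = c("Chr1", "Chr3"),
  start = c(1000, 5000),
  end = c(2999, 5600),
  strand = c("+", "-")
)
internal <- internalCoords(toyMatches, tirCutoff = 100)
print(internal)
```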

autoCutoff

BLAST hits to transposon-related sequences with e-values above the specified cutoff will be ignored.

autoLength

BLAST hits to transposon-related sequences with alignment lengths lower than this value will be ignored.

autoIdentity

BLAST hits to transposon-related sequences with percentage sequence identities lower than this value will be ignored.
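The three filters above can be sketched as a base-R subset over BLAST+ tabular output (-outfmt 6), whose standard columns include pident, length and evalue. The helper filterHits and the toy hits are hypothetical; the genic/other filters (protCutoff, protLength, protIdentity) would follow the same pattern with their own thresholds.

```r
# Standard column names of BLAST+ tabular output (-outfmt 6)
blastCols <- c("qseqid", "sseqid", "pident", "length", "mismatch",
               "gapopen", "qstart", "qend", "sstart", "send",
               "evalue", "bitscore")

# Hypothetical helper: keep hits passing the e-value, length and identity thresholds
filterHits <- function(hits, cutoff = 1e-05, minLength = 150, minIdentity = 70) {
  hits[hits$evalue <= cutoff &
         hits$length >= minLength &
         hits$pident >= minIdentity, , drop = FALSE]
}

# Toy hits: only TE_A passes all three filters
hits <- read.table(text = "
el1 TE_A 85.0 300 10 1 101 400 5 304 1e-40 400
el1 TE_B 60.0 400 90 3 101 500 1 400 1e-20 200
el2 TE_C 95.0 100  5 0 101 200 1 100 1e-10 150
el3 TE_D 90.0 200 20 0 101 300 1 200 1e-03 120
", col.names = blastCols)

autoHits <- filterHits(hits, cutoff = 1e-05, minLength = 150, minIdentity = 70)
print(autoHits)
```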

autoScope

If specified, BLAST hits to transposon-related sequences with a scope below this value will be ignored. Scope is the proportion of the transposon's internal sequence occupied by the BLAST hit; note that the dataframe of transposon matches must also be supplied to calculate it.

protCutoff

BLAST hits to genic/other sequences with e-values above the specified cutoff will be ignored.

protLength

BLAST hits to genic/other sequences with alignment lengths lower than this value will be ignored.

protIdentity

BLAST hits to genic/other sequences with percentage sequence identities lower than this value will be ignored.

protScope

If specified, BLAST hits to genic/other sequences with a scope below this value will be ignored. Scope is the proportion of the transposon's internal sequence occupied by the BLAST hit; note that the dataframe of transposon matches must also be supplied to calculate it.
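As a worked example of scope (used by autoScope and protScope): for an element of width W with tirCutoff bases ignored at each end, the internal sequence spans W - 2 * tirCutoff bases, so a hit covering L query bases has scope L / (W - 2 * tirCutoff). The sketch assumes the BLAST query is the trimmed internal sequence and that scope is measured on query coordinates; the helper hitScope is hypothetical and may differ from packBlast's own calculation.

```r
# Hypothetical helper: proportion of an element's internal sequence covered
# by a hit, measured on query coordinates (qstart, qend).
hitScope <- function(qstart, qend, elementWidth, tirCutoff = 100) {
  internalWidth <- elementWidth - 2 * tirCutoff
  (abs(qend - qstart) + 1) / internalWidth
}

# A 2000 bp element has an 1800 bp internal sequence when tirCutoff = 100,
# so a 720 bp hit covers 720 / 1800 = 0.4 of it.
scope <- hitScope(qstart = 1, qend = 720, elementWidth = 2000, tirCutoff = 100)
print(scope)

# With protScope = 0.3, hits with a scope below 0.3 are ignored; this one is kept.
keep <- scope >= 0.3
```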

Value

Returns the original packMatches dataframe, with the addition of a "classification" column containing one of the following values:

  • auto - elements that match known transposases or transposon-related proteins are classified as autonomous elements

  • pack - elements that match other proteins or genic sequences may be classified as Pack-TYPE elements

  • other - elements that generate no significant hits
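The three labels can be sketched as a lookup over the filtered hit tables. This assumes that an element with a transposon-related hit is labelled auto even if it also has genic hits; that precedence, the helper classifyElements and the toy element and gene IDs are hypothetical rather than taken from packBlast's source.

```r
# Hypothetical helper: assign auto / pack / other from filtered hit tables.
classifyElements <- function(ids, autoHits, protHits) {
  classification <- rep("other", length(ids))          # no significant hits
  classification[ids %in% protHits$qseqid] <- "pack"   # genic/other matches
  classification[ids %in% autoHits$qseqid] <- "auto"   # assumed to take precedence
  classification
}

toyMatches <- data.frame(id = c("el1", "el2", "el3"))
autoHits <- data.frame(qseqid = "el1", sseqid = "TE_A")
protHits <- data.frame(qseqid = c("el1", "el2"), sseqid = c("gene_1", "gene_2"))

# Append the labels as a "classification" column, as described above
toyMatches$classification <- classifyElements(toyMatches$id, autoHits, protHits)
print(toyMatches)
```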

Author(s)

Jack Gisby

References

For further information, see the NCBI BLAST+ application documentation and help pages (https://www.ncbi.nlm.nih.gov/pubmed/20003500?dopt=Citation).

See Also

blastAnalysis, packSearch, readBlast, blastAnnotate

Examples

## Not run: 
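library(packFinder)

# data() loads each dataset into the workspace; it does not return the object
data(packMatches)
data(arabidopsisThalianaRefseq)

packMatches <- packBlast(packMatches, arabidopsisThalianaRefseq,
    protDb = "C:/data/TAIR10_CDS",
    autoDb = "C:/data/TAIR10_transposons",
    blastPath = "C:/blast/bin/blastn.exe")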

## End(Not run)


jackgisby/packFinder documentation built on July 19, 2022, 2:25 a.m.