readBlast: Convert NCBI BLAST+ Files to Dataframe

View source: R/readBlast.R

readBlast R Documentation

Convert NCBI BLAST+ Files to Dataframe

Description

Reads .blast6out files (NCBI Blast Format) generated by the VSEARCH clustering and alignment algorithms.

Usage

readBlast(
  file,
  minE = 1,
  length = 0,
  identity = 0,
  removeExactMatches = FALSE,
  scope = NULL,
  packMatches = NULL
)

Arguments

file

The file path of the blast file.

minE

Blast results with E-values greater than the specified cutoff will be ignored.

length

Blast results with alignment lengths below this value will be ignored.

identity

Blast results with target sequence identities below this value will be ignored.

removeExactMatches

If true, matches with 100% sequence identity will be ignored to prevent self-hits.

scope

If specified, blast results with a scope below the specified value will be ignored. Scope is the proportion of the transposon's internal sequence occupied by the BLAST hit. Note that the dataframe of transposon matches (packMatches) must also be supplied to calculate scope.

packMatches

A dataframe containing genomic ranges and names referring to sequences to be extracted. Can be obtained from packSearch or generated from a GRanges object, after conversion to a dataframe. Must contain the following features:

  • start - the predicted element's start base sequence position.

  • end - the predicted element's end base sequence position.

  • seqnames - character string giving the name of the sequence in Genome to which start and end refer.
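
The filtering arguments can be pictured as row filters applied to the parsed table. The following is a minimal base R sketch under stated assumptions, not the package's implementation: the toy data, the column names (query_id, target_id, identity, alignment_length, evalue) and the helper filterHits are hypothetical, and scope is approximated here as alignment length divided by element width (end - start + 1), which may differ from how packFinder defines the internal sequence.

```r
# Toy BLAST hits; column names are illustrative and may differ from readBlast's output.
hits <- data.frame(
    query_id = c("el1", "el1", "el2", "el2"),
    target_id = c("el1", "el2", "el1", "el3"),
    identity = c(100, 92.5, 92.5, 70.0),
    alignment_length = c(400, 350, 350, 40),
    evalue = c(0, 1e-50, 1e-50, 2.5),
    stringsAsFactors = FALSE
)

# Toy element table with the three features packMatches must contain.
packMatches <- data.frame(
    seqnames = c("Chr1", "Chr1"),
    start = c(1001, 5001),
    end = c(1400, 5500),
    row.names = c("el1", "el2"),
    stringsAsFactors = FALSE
)

# Hypothetical helper mirroring the documented arguments.
filterHits <- function(hits, minE = 1, length = 0, identity = 0,
                       removeExactMatches = FALSE,
                       scope = NULL, packMatches = NULL) {
    # Keep hits that pass the E-value, alignment length and identity cutoffs.
    keep <- hits$evalue <= minE &
        hits$alignment_length >= length &
        hits$identity >= identity
    if (removeExactMatches) {
        # Drop 100% identity hits, which are typically self-hits.
        keep <- keep & hits$identity < 100
    }
    if (!is.null(scope)) {
        if (is.null(packMatches)) {
            stop("packMatches must be supplied to calculate scope")
        }
        # Assumption: scope = alignment length / width of the query element.
        width <- packMatches[hits$query_id, "end"] -
            packMatches[hits$query_id, "start"] + 1
        keep <- keep & (hits$alignment_length / width) >= scope
    }
    hits[keep, , drop = FALSE]
}

filtered <- filterHits(hits, minE = 1e-10, length = 100, identity = 90,
                       removeExactMatches = TRUE,
                       scope = 0.8, packMatches = packMatches)
```

In this toy data only the el1 to el2 hit survives: the self-hit is removed as an exact match, the el2 to el1 hit covers too little of el2 to meet the scope cutoff, and the el2 to el3 hit fails the E-value cutoff.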

Details

A blast6out file is a tab-separated text file compatible with the NCBI BLAST m8 and NCBI BLAST+ outfmt 6 formats. Each line describes one cluster/alignment.

Value

A dataframe containing the converted .blast6out file. The dataframe contains the following features:

  • Query sequence ID

  • Target sequence ID

  • Percent sequence identity

  • Alignment length

  • Number of mismatches

  • Number of gaps

  • Base position of alignment start in query sequence

  • Base position of alignment end in query sequence

  • Base position of alignment start in target sequence

  • Base position of alignment end in target sequence

  • E-value

  • Bit score
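
For orientation, the twelve fields above can be read with base R alone. This is a sketch rather than the package's code: the column labels and the two sample alignment lines are invented for illustration, and the dataframe returned by readBlast may name its columns differently.

```r
# Column labels chosen for this sketch, in the order listed above.
blastCols <- c(
    "query_id", "target_id", "identity", "alignment_length",
    "mismatches", "gaps", "query_start", "query_end",
    "target_start", "target_end", "evalue", "bit_score"
)

# Write two made-up alignment lines to a temporary tab-separated file.
blastFile <- tempfile(fileext = ".blast6out")
writeLines(c(
    paste("el1", "el2", "92.5", "350", "26", "0",
          "1", "350", "11", "360", "1e-50", "540", sep = "\t"),
    paste("el2", "el3", "70.0", "40", "12", "0",
          "5", "44", "2", "41", "2.5", "30.1", sep = "\t")
), blastFile)

# One row per alignment; the format has no header line.
blast <- read.table(
    blastFile,
    sep = "\t",
    header = FALSE,
    col.names = blastCols,
    stringsAsFactors = FALSE
)

# Remove the temporary file once it has been read.
unlink(blastFile)
```

Naming the columns at read time keeps later filters readable, for example blast[blast$evalue <= 1e-10, ].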

Author(s)

Jack Gisby

References

For further information, see the NCBI BLAST+ application documentation and help pages (https://www.ncbi.nlm.nih.gov/pubmed/20003500?dopt=Citation). VSEARCH may be downloaded from https://github.com/torognes/vsearch; see https://www.ncbi.nlm.nih.gov/pubmed/27781170 for further information.

See Also

blastAnalysis, blastAnnotate, packAlign, readUc, packClust

Examples

readBlast(system.file(
    "extdata", 
    "packMatches.blast6out", 
    package = "packFinder"
))


jackgisby/packFinder documentation built on July 19, 2022, 2:25 a.m.