blast_summary: Write a summary for a tabular output from blastn (BLAST+)

Description

The BLAST file must originate from blastn with the following output format option:

-outfmt "6 qseqid sseqid sacc stitle sscinames staxids sskingdoms sblastnames pident slen length mismatch gapopen qstart qend sstart send evalue bitscore"

It is very important that the columns are in this precise order and no column is missing.

For the formatting options see:

What the function does:

  1. Group all GenBank accessions

  2. Obtain taxonomy from GenBank (note the GenBank taxonomy is now in the PR2 database after downloading from ftp://ftp.ncbi.nlm.nih.gov/pub/taxonomy/new_taxdump/)

  3. Merge back into the BLAST file

  4. Compute a summary with the best hit

The summary file contains several sets of columns:

  1. The top hit (columns with prefix hit_top_)
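
Step 4 can be pictured with the small base R sketch below. It assumes that the top hit is the row with the highest bit_score for each query_id, which the description does not state explicitly. The toy data frame, the helper name top_hits and the exact way the hit_top_ prefix is applied are all hypothetical; this is not the package's own code.

```r
# Toy BLAST table with made-up values (only a few of the 19 columns)
blast <- data.frame(
  query_id     = c("q1", "q1", "q2"),
  hit_acc      = c("A1", "A2", "B1"),
  pct_identity = c(98.0, 99.5, 97.2),
  bit_score    = c(650, 720, 500),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep one row per query and prefix the hit columns
top_hits <- function(blast) {
  # Sort by query, then by decreasing bit score
  ordered <- blast[order(blast$query_id, -blast$bit_score), ]
  # Keep the first (best-scoring) row of each query
  top <- ordered[!duplicated(ordered$query_id), ]
  # Assumed naming: hit_acc becomes hit_top_acc, bit_score becomes hit_top_bit_score
  hit_cols <- names(top) != "query_id"
  names(top)[hit_cols] <- paste0("hit_top_", sub("^hit_", "", names(top)[hit_cols]))
  rownames(top) <- NULL
  top
}

summary_top <- top_hits(blast)
```

Here summary_top holds one row per query, with the columns query_id, hit_top_acc, hit_top_pct_identity and hit_top_bit_score.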

Usage

blast_summary(file_name)

Arguments

file_name

The name of the BLAST file with full path

Value

TRUE if the function has been successful.

The summary table is saved under the name of the input file, with the extension replaced by _summary.tsv.
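
The naming rule can be sketched in base R as follows. The helper name summary_file_name is hypothetical, and the code the package actually uses may differ.

```r
# Hypothetical helper: replace the file extension by "_summary.tsv"
summary_file_name <- function(file_name) {
  paste0(tools::file_path_sans_ext(file_name), "_summary.tsv")
}

out_name <- summary_file_name("C:/BLAST_output.txt")
# out_name is "C:/BLAST_output_summary.tsv"
```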

Column names

The columns of the BLAST file are named as follows. For the summary, a prefix is added.

  query_id, hit_id, hit_acc, hit_title, hit_sci_names,
  hit_tax_ids, hit_super_kingdoms, hit_blast_names,
  pct_identity, hit_length, alignment_length, mismatches,
  gap_opens, query_start, query_end, hit_start, hit_end,
  evalue, bit_score
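
To illustrate how these names map onto the 19 columns requested with -outfmt, the sketch below reads a BLAST tabular file with base R and checks the column count. It is not the package's own code: the helper name read_blast_tab is hypothetical and the one-line demo file contains made-up values.

```r
# Column names in the documented order (19 columns)
blast_cols <- c(
  "query_id", "hit_id", "hit_acc", "hit_title", "hit_sci_names",
  "hit_tax_ids", "hit_super_kingdoms", "hit_blast_names",
  "pct_identity", "hit_length", "alignment_length", "mismatches",
  "gap_opens", "query_start", "query_end", "hit_start", "hit_end",
  "evalue", "bit_score"
)

# Hypothetical helper: read a tab-delimited BLAST file and name its columns
read_blast_tab <- function(file_name) {
  blast <- read.delim(file_name, header = FALSE, sep = "\t", quote = "",
                      comment.char = "", stringsAsFactors = FALSE)
  # Fail early if a column is missing or extra
  stopifnot(ncol(blast) == length(blast_cols))
  names(blast) <- blast_cols
  blast
}

# Demo on a one-line file with made-up values
demo_file <- tempfile(fileext = ".txt")
writeLines(paste(c("q1", "gi|1|gb|AB000001.1|", "AB000001", "Some title",
                   "Genus species", "12345", "Eukaryota", "eukaryotes",
                   "99.5", "1800", "400", "2", "0", "1", "400", "10", "409",
                   "1e-100", "720"), collapse = "\t"), demo_file)
blast <- read_blast_tab(demo_file)
```

The column-count check reflects the requirement that no column is missing; a file produced with a different -outfmt string stops with an error instead of being silently mislabelled.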

Programming notes

The following functions must be called with the library qualifier dplyr:: because they also exist in the plyr library

Uses the local version of the PR2 database for faster access (much faster !!)

Examples

blast_summary("C:/BLAST_output.txt")

vaulot/dvutils documentation built on Nov. 20, 2021, 11:01 a.m.