The BLAST file must originate from blastn with the following output format option:
-outfmt "6 qseqid sseqid sacc stitle sscinames staxids sskingdoms sblastnames pident slen length mismatch gapopen qstart qend sstart send evalue bitscore"
It is very important that the columns are in this precise order and no column is missing.
For the formatting options see:
What the function does:
0. Remove any self hit
1. Group all GenBank accessions
2. Obtain taxonomy from GenBank (note: the GenBank taxonomy is now in the PR2 database after downloading from ftp://ftp.ncbi.nlm.nih.gov/pub/taxonomy/new_taxdump/)
3. Check if the sequence is in PR2 and get the PR2 taxonomy (this is done with the pr2database package)
4. Detect whether the subject sequence is uncultured or not
5. Merge back into the BLAST file
6. Compute a summary with best hit, best hit to PR2, best hit to cultured, taxonomic consensus (identity>96%), contradiction at division level (identity>90%)
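As a rough illustration of two of these operations (self-hit removal and uncultured detection), here is a minimal base R sketch. It is not the package's actual code: the toy data are made up, and both rules (a self hit is a row where query_id equals hit_acc; a hit is uncultured when its title contains "uncultured") are assumptions made for the example.

```r
# Toy BLAST table (made-up rows); column names follow this page's naming scheme.
blast <- data.frame(
  query_id  = c("seq1", "seq1", "seq1"),
  hit_acc   = c("seq1", "ACC0001", "ACC0002"),
  hit_title = c("seq1 18S rRNA gene",
                "Uncultured eukaryote clone X 18S rRNA gene",
                "Genus species strain ABC 18S rRNA gene"),
  stringsAsFactors = FALSE
)

# Self-hit removal. Assumption: a self hit has identical query and hit accession.
blast <- blast[blast$query_id != blast$hit_acc, ]

# Uncultured detection. Assumption: the word "uncultured" appears in the hit title.
blast$uncultured <- grepl("uncultured", blast$hit_title, ignore.case = TRUE)

print(blast[, c("hit_acc", "uncultured")])
```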
The modified BLAST output file includes additional columns:
kingdom -> species : PR2 taxonomy for those accession numbers that are present in PR2
hit_rank : the rank of the hit based on decreasing % identity and decreasing bit scores
uncultured : TRUE if the hit corresponds to an uncultured item
hit_lineage : GenBank taxonomy of the hit
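The hit_rank column described above can be approximated as follows. This is a sketch in base R on made-up data, assuming that hits are ranked separately for each query and that bit score only breaks ties in % identity; the package may compute it differently.

```r
# Toy table with two queries (made-up values).
blast <- data.frame(
  query_id     = c("q1", "q1", "q1", "q2", "q2"),
  hit_acc      = c("A", "B", "C", "D", "E"),
  pct_identity = c(99.0, 99.5, 99.5, 97.0, 98.0),
  bit_score    = c(900, 850, 880, 700, 720),
  stringsAsFactors = FALSE
)

# Sort by query, then decreasing % identity, then decreasing bit score.
ord <- order(blast$query_id, -blast$pct_identity, -blast$bit_score)
blast <- blast[ord, ]

# Number the hits 1, 2, 3, ... within each query.
blast$hit_rank <- ave(seq_len(nrow(blast)), blast$query_id, FUN = seq_along)

print(blast)
```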
The summary file contains several sets of columns:
The top hit (columns with prefix hit_top_)
The top hit for which a PR2 sequence is available (columns starting with hit_pr2_)
The top hit corresponding to a culture or an isolate (columns starting with hit_cult_)
A "consensus" taxonomy based on all the hits with more than 98% identity
Contradiction between hits with >90% identity
blast_18S_reformat(file_name)

file_name: The name of the BLAST file with full path
TRUE if the function has been successful.
The modified table is saved under the name of the input file, with the extension replaced by _pr2.tsv.
The summary table is saved under the name of the input file, with the extension replaced by _summary.tsv.
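For reference, the two output names can be derived from the input name as sketched below. This only illustrates the naming rule; the variable names are hypothetical and the package may build the paths in another way.

```r
# Example input path, taken from the Examples section of this page.
file_name <- "C:/BLAST_output.tsv"

# Strip the extension, then append the two documented suffixes.
base_name    <- tools::file_path_sans_ext(file_name)
pr2_file     <- paste0(base_name, "_pr2.tsv")
summary_file <- paste0(base_name, "_summary.tsv")

print(c(pr2_file, summary_file))
```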
The columns of the BLAST table are named as follows. For the summary, a prefix is added:
hit_top_ / hit_pr2_ / hit_cult_
query_id, hit_id, hit_acc, hit_title, hit_sci_names,
hit_tax_ids, hit_super_kingdoms, hit_blast_names,
pct_identity, hit_length, alignment_length, mismatches,
gap_opens, query_start, query_end, hit_start, hit_end,
evalue, bit_score
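Because the function relies on column order, it can help to read a BLAST file yourself with these 19 names and check the column count first. The sketch below does this in base R on a temporary one-row file with made-up values; it is an independent check, not part of the package.

```r
# The 19 column names listed on this page, in blastn -outfmt order.
blast_cols <- c(
  "query_id", "hit_id", "hit_acc", "hit_title", "hit_sci_names",
  "hit_tax_ids", "hit_super_kingdoms", "hit_blast_names",
  "pct_identity", "hit_length", "alignment_length", "mismatches",
  "gap_opens", "query_start", "query_end", "hit_start", "hit_end",
  "evalue", "bit_score"
)

# Write a toy one-row, tab-separated file (made-up values) for the demonstration.
tmp <- tempfile(fileext = ".tsv")
toy_row <- c("q1", "ref1", "ACC1", "Some title", "Some species", "1234",
             "Eukaryota", "eukaryotes", "99.5", "1800", "500", "2",
             "0", "1", "500", "10", "509", "0", "900")
writeLines(paste(toy_row, collapse = "\t"), tmp)

# blastn tabular output has no header line; quoting is disabled because
# hit titles may contain quote characters.
blast <- read.delim(tmp, header = FALSE, col.names = blast_cols,
                    quote = "", stringsAsFactors = FALSE)

# Stop early if a column is missing or extra.
stopifnot(ncol(blast) == length(blast_cols))
```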
The following functions must be called with the library qualifier dplyr:: because they also exist in the plyr library:
ungroup
desc
rename
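A short sketch of what qualified calls look like, on a made-up data frame. It assumes the dplyr package is installed; writing dplyr:: explicitly means the result does not depend on the order in which plyr and dplyr were attached.

```r
# Toy data frame (made-up values).
df <- data.frame(grp = c("a", "a", "b"), a = c(1, 3, 2))

# Qualified calls always resolve to the dplyr versions.
df <- dplyr::rename(df, value = a)            # dplyr syntax: new_name = old_name
df <- dplyr::arrange(df, dplyr::desc(value))  # sort by decreasing value
df <- dplyr::ungroup(dplyr::group_by(df, grp))

print(df)
```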
Uses the pr2database package for faster access (much faster !!)
blast_18S_reformat("C:/BLAST_output.tsv")