Reads a complete set of result files from a BLAST search and computes distance between all sequences based on the BLAST bit-score.
A text vector of filenames.
A threshold E-value to immediately discard (very) poor BLAST alignments. Default is 1.0.
A logical indicating if textual output should be given to monitor the progress.
Each input file must be a BLAST result file where all proteins of one genome have been
queried against a database of all proteins from another genome. The result files must all have
12 columns of results, i.e. have been produced by the option -outfmt 6 in the BLAST+ software.
The filenames must have the format GID111_vs_GID222.txt and are typically produced by
Setting a small e.value threshold can speed up the computation and reduce the size of the output, but you may loose some alignments that could produce smallish distances for short sequences.
The distance computed is a relative score. If an alignment of query A against hit B has a bit-score of S(A;B), we compute an intermediate distance D(A;B)=1-S(A;B)/S(A;A) where S(A;A) is the bit-score of aligning A against itself. Reversing the search, we also get D(B;A)=1-S(B;A)/S(B;B), where B has been used as query and A is the hit. The final distance is D(A,B)=(D(A;B)+D(B;A))/2. A distance of 0.0 means A and B are identical. The maximum possible distance is 1.0, meaning there is no BLAST hit found either way.
This distance should not be interpreted as lack of identity. A distance of 0.0 means 100% identity, but a distance of 0.25 does not mean 75% identity. It has some resemblance to an evolutinary (raw) distance, but since it is based on protein alignments, the type of mutations plays a significant role, not only the number of mutations.
The function returns a data.frame with columns Sequence.A, Sequence.B and Distance. Each row corresponds to a pair of sequence having at least one BLAST hit between them. All pairs not listed in the output have distance 1.0 between them.
Lars Snipen and Kristian Hovde Liland.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
# Using BLAST result files in this package... prefix <- c("GID1_vs_GID1.txt", "GID1_vs_GID2.txt", "GID1_vs_GID3.txt", "GID2_vs_GID1.txt", "GID2_vs_GID2.txt", "GID2_vs_GID3.txt", "GID3_vs_GID1.txt", "GID3_vs_GID2.txt", "GID3_vs_GID3.txt") xpth <- file.path(path.package("micropan"),"extdata") blast.files <- file.path(xpth,paste(prefix,".xz",sep="")) # We need to uncompress them first... tf <- tempfile(pattern=prefix,fileext=".xz") s <- file.copy(blast.files,tf) tf <- unlist(lapply(tf,xzuncompress)) # Computing pairwise distances blast.distances <- bDist(tf) # ...and cleaning... s <- file.remove(tf) # See also example for blastAllAll
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.