simMatrix: Similarity matrix for BLAST data.


View source: R/simMatrix.R

Description

Function to compute similarity matrix for all-vs-all BLAST results of rDNA sequences generated with standalone BLAST from NCBI or local BLAST implemented in BioEdit.

Usage

simMatrix(x, sequence.range = FALSE, Min, Max)

Arguments

x

data.frame with BLAST data; see BLASTdata.

sequence.range

logical: if TRUE, restrict the computation to the sequence length range given by Min and Max.

Min

minimum sequence length.

Max

maximum sequence length.

Details

The BLAST data are used to compute a similarity matrix with the following algorithm. First, the length of each sequence (LS) contained in the input data is extracted. If a sequence has more than one comparison covering different parts of it, the comparison with the maximum base length is chosen. Next, the number of matching bases (mB) is calculated from two variables in the BLAST output: the identity between the sequences (in %) is multiplied by the number of nucleotides and divided by 100. The resulting value is rounded to an integer. The similarity is then calculated by dividing mB by LS. Finally, the similarity matrix including all sequences is built. If the similarity of a combination is not shown in the BLAST report file (because the similarity was lower than 70%), this comparison is included in the similarity matrix with a value of zero.
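The steps above can be illustrated with a minimal sketch on a toy table. This is not the package's implementation: the values are invented, the column names are assumed to mirror those of BLASTdata, LS is assumed to be the alignment length of each sequence's comparison with itself, and each similarity is assumed to be divided by the length of the query sequence.

```r
# Toy all-vs-all BLAST table (hypothetical values; column names are assumed
# to mirror those of BLASTdata).
blast <- data.frame(
  query.id         = c("A", "A", "A", "B", "B", "C"),
  subject.id       = c("A", "B", "B", "B", "A", "C"),
  identity         = c(100, 90, 95, 100, 90, 100),
  alignment.length = c(200, 150, 40, 180, 150, 120),
  stringsAsFactors = FALSE
)

# Step 1: sequence length LS, assumed here to be the alignment length of the
# comparison of each sequence with itself.
self <- blast[blast$query.id == blast$subject.id, ]
LS <- vapply(split(self$alignment.length, self$query.id), max, numeric(1))

# Step 2: if a pair occurs more than once, keep the longest alignment.
ord <- order(blast$query.id, blast$subject.id, -blast$alignment.length)
blast <- blast[ord, ]
blast <- blast[!duplicated(blast[, c("query.id", "subject.id")]), ]

# Step 3: number of matching bases, rounded to an integer.
mB <- round(blast$identity * blast$alignment.length / 100)

# Step 4: similarity = mB / LS (assumption: LS of the query sequence).
sim <- unname(mB / LS[blast$query.id])

# Step 5: build the matrix; pairs missing from the report stay at zero.
ids <- names(LS)
res <- matrix(0, nrow = length(ids), ncol = length(ids),
              dimnames = list(ids, ids))
res[cbind(blast$query.id, blast$subject.id)] <- sim
res
```

In this toy table the pair A/C never appears, so its entry stays at zero, which corresponds to a comparison that is absent from the BLAST report.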

Value

Similarity matrix.

Author(s)

Fabienne Flessa Fabienne.Flessa@uni-bayreuth.de,
Alexandra Kehl Alexandra.Kehl@uni-tuebingen.de,
Matthias Kohl Matthias.Kohl@stamats.de

References

Standalone Blast download: https://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/

Blast News: https://blast.ncbi.nlm.nih.gov/Blast.cgi?CMD=Web&PAGE_TYPE=BlastNews

BioEdit: https://bioedit.software.informer.com/

Persoh, D., Melcher, M., Flessa, F., Rambold, G.: First fungal community analyses of endophytic ascomycetes associated with Viscum album ssp. austriacum and its host Pinus sylvestris. Fungal Biology 2010 Jul;114(7):585-96.

Flessa, F., Kehl, A., Kohl, M. Analysing diversity and community structures using PCR-RFLP: a new software application. Molecular Ecology Resources 2013 Jul; 13(4):726-33.

See Also

BLASTdata, sim2dist

Examples

data(BLASTdata)

## without sequence range
## code takes some time
## Not run: 
res <- simMatrix(BLASTdata)

## End(Not run)

## with sequence range
range(BLASTdata$alignment.length)
res1 <- simMatrix(BLASTdata, sequence.range = TRUE, Min = 100, Max = 450)
res2 <- simMatrix(BLASTdata, sequence.range = TRUE, Min = 500)

Example output

Loading required package: RColorBrewer
[1]  11 996

RFLPtools documentation built on Feb. 8, 2022, 5:06 p.m.