dist.dna: Pairwise Distances from DNA Sequences


Source file: R/DNA.R


Description

This function computes a matrix of pairwise distances from DNA sequences using a model of DNA evolution. Eleven substitution models are currently available, in addition to the raw distance and simple counts of differences, transitions, transversions, and indels.
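As a rough illustration of the output structure, the sketch below computes the raw distance (proportion of differing sites) for every pair of sequences and returns the result as a dist object. It is a hypothetical base-R sketch, not the ape implementation: the helpers p_distance() and pairwise_dist() are made up for this example, and the sequences are stored as a character matrix rather than in the DNAbin format.

```r
# Hypothetical helper (not part of ape): proportion of sites at which two
# aligned sequences, stored as character vectors, differ (the raw distance)
p_distance <- function(a, b) mean(a != b)

# Hypothetical helper: apply a pairwise distance function to every pair of
# rows of a character matrix (rows = sequences) and return a "dist" object
pairwise_dist <- function(seqs, FUN = p_distance) {
  n <- nrow(seqs)
  d <- matrix(0, n, n, dimnames = list(rownames(seqs), rownames(seqs)))
  for (i in seq_len(n - 1)) {
    for (j in (i + 1):n) {
      d[i, j] <- d[j, i] <- FUN(seqs[i, ], seqs[j, ])
    }
  }
  as.dist(d)
}

seqs <- rbind(s1 = strsplit("acgtacgtac", "")[[1]],
              s2 = strsplit("acgtacgtaa", "")[[1]],
              s3 = strsplit("ccgtacttaa", "")[[1]])
d <- pairwise_dist(seqs)
as.matrix(d)  # full square matrix, analogous to as.matrix = TRUE
```

Here s1 and s2 differ at 1 of 10 sites (0.1), s1 and s3 at 3 sites (0.3), and s2 and s3 at 2 sites (0.2).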


Usage

dist.dna(x, model = "K80", variance = FALSE,
         gamma = FALSE, pairwise.deletion = FALSE,
         base.freq = NULL, as.matrix = FALSE)



Arguments

x: a matrix or a list containing the DNA sequences; this must be of class "DNAbin" (use as.DNAbin if they are stored as character).


model: a character string specifying the evolutionary model to be used; must be one of "raw", "N", "TS", "TV", "JC69", "K80" (the default), "F81", "K81", "F84", "BH87", "T92", "TN93", "GG95", "logdet", "paralin", "indel", or "indelblock".


variance: a logical indicating whether to compute the variances of the distances; defaults to FALSE, in which case the variances are not computed.


gamma: a value for the shape parameter of the gamma distribution, used to correct the distances for rate variation among sites with the models that support this correction; by default (FALSE), no correction is applied.


pairwise.deletion: a logical indicating whether sites with missing data are deleted pairwise, that is, separately for each pair of sequences. By default (FALSE), any site with missing data in at least one sequence is deleted from all sequences. This option is ignored if model = "indel" or "indelblock".


base.freq: the base frequencies to be used in the computations (if applicable). By default, the base frequencies are computed from the whole set of sequences.


as.matrix: a logical indicating whether to return the results as a matrix. The default is to return an object of class dist.
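The difference between the two deletion strategies described for pairwise.deletion can be sketched with the raw distance. This is a hypothetical base-R illustration, not ape's code: it assumes missing data are coded as "n" in a character matrix, whereas DNAbin objects use their own encoding and recognise more missing or ambiguous states. The names raw_global() and raw_pairwise() are made up for this example.

```r
# Hypothetical toy data: rows are sequences, "n" marks missing data
seqs <- rbind(s1 = strsplit("acgtnacgta", "")[[1]],
              s2 = strsplit("acgaaacgnt", "")[[1]],
              s3 = strsplit("tcgtaacgta", "")[[1]])
is_missing <- seqs == "n"

# Default (pairwise.deletion = FALSE): drop every site that is missing
# in at least one sequence, for all comparisons
complete <- colSums(is_missing) == 0
raw_global <- function(i, j) mean(seqs[i, complete] != seqs[j, complete])

# pairwise.deletion = TRUE: drop only the sites missing in either
# of the two sequences being compared
raw_pairwise <- function(i, j) {
  keep <- !is_missing[i, ] & !is_missing[j, ]
  mean(seqs[i, keep] != seqs[j, keep])
}

c(global = raw_global("s1", "s3"), pairwise = raw_pairwise("s1", "s3"))
```

Site 9 is missing only in s2, yet the default strategy also removes it from the s1/s3 comparison, so that pair is compared over 8 sites (1 difference, 0.125) instead of 9 (1 difference, about 0.111).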


Details

The molecular evolutionary models available through the option model have been extensively described in the literature; more details can be found in the references.
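For orientation, the sketch below codes the closed-form distances of three of the simplest models as they are usually given in the literature: JC69 (with an optional gamma correction of shape alpha), K80 (from the proportions P of transitions and Q of transversions), and F81 (from the base frequencies). These are hypothetical base-R functions (jc69(), ts_tv(), k80(), f81() are made-up names) that assume only unambiguous bases a, c, g, t; they are not the ape implementation, which also covers the other models, missing data, and variances.

```r
# JC69: p is the raw distance; alpha (optional) is the gamma shape parameter
jc69 <- function(p, alpha = NULL) {
  w <- 1 - 4 * p / 3
  if (is.null(alpha)) -0.75 * log(w)
  else 0.75 * alpha * (w^(-1 / alpha) - 1)
}

# Hypothetical helper: proportions of transitions (P) and transversions (Q)
# between two aligned sequences stored as character vectors of a, c, g, t
ts_tv <- function(a, b) {
  purine <- c("a", "g")
  diff <- a != b
  # a transition keeps the base class (purine <-> purine, pyrimidine <-> pyrimidine)
  same_class <- (a %in% purine) == (b %in% purine)
  c(P = mean(diff & same_class), Q = mean(diff & !same_class))
}

# K80 (Kimura 1980)
k80 <- function(P, Q) -0.5 * log(1 - 2 * P - Q) - 0.25 * log(1 - 2 * Q)

# F81: freq holds the four base frequencies
f81 <- function(p, freq) {
  B <- 1 - sum(freq^2)
  -B * log(1 - p / B)
}

x <- strsplit("acgtacgt", "")[[1]]
y <- strsplit("gcatccgt", "")[[1]]
pq <- ts_tv(x, y)            # P = 2/8 (two transitions), Q = 1/8 (one transversion)
k80(pq[["P"]], pq[["Q"]])
jc69(sum(pq))                # JC69 ignores the transition/transversion split
jc69(sum(pq), alpha = 0.5)   # the gamma correction increases the distance
f81(sum(pq), rep(0.25, 4))   # equal base frequencies: same value as JC69
```

With equal base frequencies, F81 reduces to JC69, and K80 also reduces to JC69 when P = Q/2; a very large alpha makes the gamma-corrected JC69 converge to the uncorrected value.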


Value

An object of class dist (by default), or a numeric matrix if as.matrix = TRUE. If model = "BH87", a numeric matrix is returned because the Barry–Hartigan distance is not symmetric.

If variance = TRUE, an attribute called "variance" is given to the returned object.
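To illustrate the kind of object returned when variance = TRUE, the sketch below attaches a "variance" attribute to JC69 distances, using the standard large-sample approximation Var(d) = p(1 - p) / (L (1 - 4p/3)^2), where p is the raw distance and L the number of sites compared. The function jc69_with_variance() is hypothetical, and whether dist.dna() uses exactly this expression for JC69 is an assumption here.

```r
# Hypothetical helper (not part of ape): JC69 distances from raw distances p
# over L sites, with their approximate variances stored as an attribute
jc69_with_variance <- function(p, L) {
  w <- 1 - 4 * p / 3
  d <- -0.75 * log(w)
  # delta method: Var(p) = p(1 - p)/L, multiplied by (dd/dp)^2 = 1/w^2
  attr(d, "variance") <- p * (1 - p) / (L * w^2)
  d
}

d <- jc69_with_variance(c(0.1, 0.2, 0.3), L = 100)
attr(d, "variance")  # grows with p as the distance approaches saturation
```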


Note

If the sequences are very different, most evolutionary distances are undefined and a non-finite value (Inf or NaN) is returned. You can run dist.dna(x, model = "raw") to check whether some raw distances are higher than 0.75.
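The threshold of 0.75 is where the JC69 distance breaks down: d = -(3/4) ln(1 - 4p/3) is infinite at p = 0.75 and undefined beyond it (0.75 is also the expected proportion of differences between two unrelated sequences with equal base frequencies). A minimal base-R sketch, assuming the raw distances are already available as a numeric vector; other models have their own, different limits.

```r
jc69 <- function(p) -0.75 * log(1 - 4 * p / 3)

p <- c(0.5, 0.75, 0.8)          # raw distances, e.g. from model = "raw"
d <- suppressWarnings(jc69(p))  # log() of a negative number returns NaN with a warning
data.frame(p, d, defined = is.finite(d))
p >= 0.75                       # pairs at or beyond the JC69 saturation level
```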


Author(s)

Emmanuel Paradis


References

Barry, D. and Hartigan, J. A. (1987) Asynchronous distance between homologous DNA sequences. Biometrics, 43, 261–276.

Felsenstein, J. (1981) Evolutionary trees from DNA sequences: a maximum likelihood approach. Journal of Molecular Evolution, 17, 368–376.

Felsenstein, J. and Churchill, G. A. (1996) A Hidden Markov model approach to variation among sites in rate of evolution. Molecular Biology and Evolution, 13, 93–104.

Galtier, N. and Gouy, M. (1995) Inferring phylogenies from DNA sequences of unequal base compositions. Proceedings of the National Academy of Sciences USA, 92, 11317–11321.

Gu, X. and Li, W.-H. (1996) Bias-corrected paralinear and LogDet distances and tests of molecular clocks and phylogenies under nonstationary nucleotide frequencies. Molecular Biology and Evolution, 13, 1375–1383.

Jin, L. and Nei, M. (1990) Limitations of the evolutionary parsimony method of phylogenetic analysis. Molecular Biology and Evolution, 7, 82–102.

Jukes, T. H. and Cantor, C. R. (1969) Evolution of protein molecules. In Mammalian Protein Metabolism, ed. Munro, H. N., pp. 21–132, New York: Academic Press.

Kimura, M. (1980) A simple method for estimating evolutionary rates of base substitutions through comparative studies of nucleotide sequences. Journal of Molecular Evolution, 16, 111–120.

Kimura, M. (1981) Estimation of evolutionary distances between homologous nucleotide sequences. Proceedings of the National Academy of Sciences USA, 78, 454–458.

Lake, J. A. (1994) Reconstructing evolutionary trees from DNA and protein sequences: paralinear distances. Proceedings of the National Academy of Sciences USA, 91, 1455–1459.

Lockhart, P. J., Steel, M. A., Hendy, M. D. and Penny, D. (1994) Recovering evolutionary trees under a more realistic model of sequence evolution. Molecular Biology and Evolution, 11, 605–612.

McGuire, G., Prentice, M. J. and Wright, F. (1999) Improved error bounds for genetic distances from DNA sequences. Biometrics, 55, 1064–1070.

Tamura, K. (1992) Estimation of the number of nucleotide substitutions when there are strong transition-transversion and G + C-content biases. Molecular Biology and Evolution, 9, 678–687.

Tamura, K. and Nei, M. (1993) Estimation of the number of nucleotide substitutions in the control region of mitochondrial DNA in humans and chimpanzees. Molecular Biology and Evolution, 10, 512–526.

See Also

read.GenBank, read.dna, write.dna, DNAbin, dist.gene, cophenetic.phylo, dist

ape documentation built on April 25, 2021, 9:06 a.m.