TKF91: Evolutionary distance estimation with TKF91 model


Description

This function implements the TKF91 model to estimate pairwise evolutionary distances between protein sequences.

Usage

TKF91(fasta, mu=NULL, expectedLength=362, 
      substModel, substModelBF)
TKF91Pair(seq1, seq2, mu=NULL, distance=NULL,
          expectedLength=362, substModel, substModelBF)

Arguments

fasta

A named list of sequences, each stored as a character vector. read.fasta from the seqinr package returns this format when reading a FASTA file.

mu

A numeric value or NULL: the death rate per normal link in the TKF91 model. When it is NULL, mu and the distance are estimated jointly. When it is given, only the distance is estimated.

distance

A numeric value: the PAM distance between two protein sequences. When it is given, TKF91Pair only calculates the negative log-likelihood.

expectedLength

A numeric value: the expected length of the input protein sequences. The default, 362, is the average sequence length in the OMA browser.

substModel

A numeric matrix: the mutation probabilities from one AA to another at PAM distance 1. The order of AAs in the matrix should be identical to AACharacterSet.

substModelBF

A numeric vector: the background frequencies of the AAs. The order of AAs in the vector should also be identical to AACharacterSet.

seq1, seq2

Character vectors: the sequences of the two proteins to compare.
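
To make the expected input shapes concrete, here is a minimal sketch that builds a toy named list by hand instead of reading a file. The names protA and protB and both sequences are invented for illustration; real input would normally come from read.fasta as in the Examples section.

```r
# Toy input in the shape described for the fasta argument: a named list in
# which each element is a character vector with one residue per element.
# The names and sequences below are made up for illustration.
raw <- c(protA = "MKTAYIAKQR", protB = "MKTAYLAKQK")
fasta <- lapply(raw, function(s) strsplit(s, "")[[1]])

# Inspect the structure: a list of two character vectors.
str(fasta)

# The seq1 and seq2 arguments take the individual character vectors.
seq1 <- fasta[[1]]
seq2 <- fasta[[2]]
```

The list elements can then be passed as fasta to TKF91, or individually as seq1 and seq2 to TKF91Pair.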

Details

Currently, this implementation supports only the 20 standard AAs. Missing or ambiguous characters are not supported.
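
Because unsupported characters are not handled, it can help to screen sequences before estimation. The sketch below is one possible pre-check and is not part of the package: standardAA is a local stand-in for AACharacterSet (whose ordering may differ), unsupportedResidues is a hypothetical helper, and the comparison assumes upper-case one-letter codes.

```r
# The 20 standard amino acids in one-letter code. This local vector is a
# stand-in for the package's AACharacterSet, whose ordering may differ.
standardAA <- c("A", "R", "N", "D", "C", "Q", "E", "G", "H", "I",
                "L", "K", "M", "F", "P", "S", "T", "W", "Y", "V")

# Hypothetical helper: return the distinct characters in a sequence
# (one residue per element) that are not standard amino acids.
unsupportedResidues <- function(seq) {
  setdiff(unique(seq), standardAA)
}

unsupportedResidues(c("M", "K", "T", "A"))       # character(0)
unsupportedResidues(c("M", "X", "T", "-", "B"))  # "X" "-" "B"
```

An empty result means the sequence contains only the 20 standard residues; anything else lists the characters that would need to be removed or resolved first.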

Value

A list of matrices is returned: the matrix of estimated distances, the matrix of estimated distance variances, and the matrix of negative log-likelihoods between the sequences.

Author(s)

Ge Tan

References

Thorne, J.L., Kishino, H., and Felsenstein, J. (1991). An evolutionary model for maximum likelihood alignment of DNA sequences. J. Mol. Evol. 33, 114-124.

Gonnet, G.H., Cohen, M.A., and Benner, S.A. (1992). Exhaustive matching of the entire protein sequence database. Science 256, 1443-1445.

See Also

AACharacterSet, GONNET, GONNETBF

Examples

  ## This example is not tested due to running time > 5s
  data(GONNET)
  data(GONNETBF)
  library(seqinr)
  fasta <- read.fasta(file.path(system.file("extdata", package="TKF"),
                      "pair1.fasta"),
                      seqtype="AA", set.attributes=FALSE)
  ## 1D estimation: only distance
  TKF91(fasta, mu=5.920655e-04, 
        substModel=GONNET, substModelBF=GONNETBF)
  ## 2D estimation: joint estimation of distance and mu
  TKF91(fasta, substModel=GONNET, substModelBF=GONNETBF)
  ## only apply to a pair of sequences
  seq1 <- fasta[[1]]
  seq2 <- fasta[[2]]
  TKF91Pair(seq1, seq2, mu=5.920655e-04, 
            substModel=GONNET, substModelBF=GONNETBF)
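
The distance argument of TKF91Pair is not exercised above. The following is a sketch of how it might be used, based only on the Usage and Arguments sections: the PAM distance of 100 is an arbitrary illustrative value, passing mu together with distance is an assumption, and the structure of the returned object is not shown because it is not documented here.

```r
# Fixed PAM distance at which to evaluate the likelihood (arbitrary value).
pamDistance <- 100

# Run only when both packages are installed.
if (requireNamespace("TKF", quietly = TRUE) &&
    requireNamespace("seqinr", quietly = TRUE)) {
  library(TKF)
  data(GONNET)
  data(GONNETBF)
  fasta <- seqinr::read.fasta(
    file.path(system.file("extdata", package = "TKF"), "pair1.fasta"),
    seqtype = "AA", set.attributes = FALSE)
  # With distance supplied, no distance estimation is performed; per the
  # argument description, only the negative log-likelihood is calculated.
  result <- TKF91Pair(fasta[[1]], fasta[[2]], mu = 5.920655e-04,
                      distance = pamDistance,
                      substModel = GONNET, substModelBF = GONNETBF)
  print(result)
}
```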
  

TKF documentation built on May 2, 2019, 7:59 a.m.
