Description
This function implements the TKF92 model to estimate pairwise distances between protein sequences.
Arguments
fasta: A named list of sequences, each stored as a character vector.
mu: A numeric value between 0 and 1, or NULL. It is the death rate per normal link in the TKF92 model. When it is NULL, a joint estimation of mu, r and distance is performed.
r: A numeric value between 0 and 1, or NULL. It is the success probability of the geometric distribution that models the fragment length in the TKF92 model. When it is NULL, a joint estimation of mu, r and distance is performed.
distance: A numeric value: the PAM distance between two protein sequences. When it is given, TKF92Pair only calculates the negative log-likelihood.
method: When mu, r and distance are co-estimated, the optimisation method can be either "NM" or "constrOptim". With "NM", the "nmsimplex2" implementation from gsl is used. With "constrOptim", the constrained Nelder-Mead implementation from R stats (constrOptim) is used. This argument is ignored when only distance is estimated.
expectedLength: A numeric value: the expected length of the input protein sequences. By default, 362, the average sequence length in the OMA browser, is used.
substModel: A numeric matrix: the mutation probabilities from one amino acid to another at PAM distance 1. The order of amino acids in the matrix must be identical to AACharacterSet.
substModelBF: A numeric vector: the background frequencies of the amino acids. The order of amino acids in the vector must also be identical to AACharacterSet.
seq1, seq2: Character vectors: the sequences of the two proteins to compare.
skipFailure: If TRUE, a pair of sequences whose optimisation fails is skipped and the next pair is processed. If FALSE, an error is raised.
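Because substModel gives mutation probabilities at PAM distance 1 while distance is measured in PAM units, probabilities at a larger distance d are conventionally obtained as the d-th power of the PAM-1 matrix. The sketch below assumes that convention (it is not taken from the TKF source) and computes M^d for non-integer d through an eigendecomposition. pam_power and the 3-state matrix are hypothetical stand-ins for the package internals and for GONNET.

```r
## Sketch (assumption): P(d) = M^d, where M is the PAM-1 mutation matrix.
## Non-integer d is handled through an eigendecomposition of M; this assumes
## positive eigenvalues, as for PAM-1 matrices close to the identity.
pam_power <- function(M, d) {
  e <- eigen(M)
  V <- e$vectors
  P <- V %*% diag(e$values^d, nrow = length(e$values)) %*% solve(V)
  Re(P)  # discard negligible imaginary parts, if eigen() returned any
}

## Hypothetical 3-state PAM-1 matrix (rows sum to 1); the real GONNET
## matrix is 20 x 20 and ordered as AACharacterSet.
M <- matrix(c(0.98, 0.01, 0.01,
              0.01, 0.98, 0.01,
              0.01, 0.01, 0.98), nrow = 3, byrow = TRUE)

P2   <- pam_power(M, 2)      # should equal M %*% M
P150 <- pam_power(M, 150.5)  # a fractional PAM distance
round(P150, 3)
```

The eigendecomposition is used only so that fractional distances such as 150.5 PAM are well defined; for integer d, repeated matrix multiplication gives the same result.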
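When method = "constrOptim", mu and r have to stay inside (0, 1) and distance has to stay positive. The sketch below shows one way to express such bounds with stats::constrOptim, which expects linear constraints of the form ui %*% theta - ci >= 0 and a strictly feasible starting point. toy_nll is a hypothetical quadratic standing in for the TKF92 negative log-likelihood, and the exact constraints used inside the package are an assumption.

```r
## Hypothetical objective standing in for the TKF92 negative log-likelihood;
## its minimum is placed at distance = 100, mu = 0.2, r = 0.7.
toy_nll <- function(theta) {
  d <- theta[1]; mu <- theta[2]; r <- theta[3]
  (d - 100)^2 / 1e4 + (mu - 0.2)^2 + (r - 0.7)^2
}

## Linear constraints ui %*% theta - ci >= 0 encoding
## distance > 0, 0 < mu < 1 and 0 < r < 1.
ui <- rbind(c( 1,  0,  0),   # distance > 0
            c( 0,  1,  0),   # mu > 0
            c( 0, -1,  0),   # mu < 1
            c( 0,  0,  1),   # r > 0
            c( 0,  0, -1))   # r < 1
ci <- c(0, 0, -1, 0, -1)

## The starting point must lie strictly inside the feasible region.
start <- c(50, 0.5, 0.5)

## grad = NULL makes constrOptim use Nelder-Mead for the inner optimisation;
## parscale puts distance on a scale comparable to mu and r.
fit <- constrOptim(start, toy_nll, grad = NULL, ui = ui, ci = ci,
                   control = list(parscale = c(100, 1, 1)))
round(fit$par, 3)
```

Note that constrOptim has its own argument named mu (the barrier parameter), which is unrelated to the TKF92 death rate mu.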
Details
Currently, this implementation supports only the 20 standard amino acids. Missing or ambiguous characters are not supported.
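Since missing or ambiguous characters are not supported, it can help to screen the input before calling TKF92. The sketch below hard-codes the 20 standard one-letter codes as an assumption (the package's AACharacterSet is the authoritative set and order) and also shows the documented input format: a named list of character vectors. has_only_standard_aa is a hypothetical helper.

```r
## The 20 standard amino acids (hard-coded here; AACharacterSet in the
## package is the authoritative set).
standard_aa <- c("A", "R", "N", "D", "C", "Q", "E", "G", "H", "I",
                 "L", "K", "M", "F", "P", "S", "T", "W", "Y", "V")

## Returns a named logical vector: TRUE if every residue of a sequence is
## one of the standard amino acids. The comparison is case-sensitive.
has_only_standard_aa <- function(fasta) {
  vapply(fasta, function(s) all(s %in% standard_aa), logical(1))
}

## Hypothetical input in the documented format: one residue per element.
fasta <- list(seqA = strsplit("MKVLAAG", "")[[1]],
              seqB = strsplit("MKXLAAG", "")[[1]])  # "X" is ambiguous
ok <- has_only_standard_aa(fasta)
ok
fasta[ok]  # keep only the sequences TKF92 can handle
```

Dropping whole sequences is only one option; removing or replacing the offending residues is another, but either choice changes the data and should be made deliberately.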
The default multidimensional optimisation is "nmsimplex2" from the GSL library (http://www.gnu.org/software/gsl/).
The one-dimensional optimisation uses the "brent" minimiser from the GSL library.
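When only distance is estimated, the problem is a one-dimensional minimisation. The sketch below uses stats::optimize, which, like the gsl "brent" minimiser, implements Brent's method (golden-section search combined with parabolic interpolation). The likelihood is a toy Jukes-Cantor-style model for 20 equally frequent amino acids: it ignores insertions, deletions and the Gonnet matrix, so it illustrates only the optimisation step and not the TKF92 likelihood. The site counts are hypothetical.

```r
## Toy data: aligned positions and identical positions (hypothetical).
n_sites <- 300
n_same  <- 200

## Probability that a site is unchanged after d PAM units
## (d / 100 expected substitutions per site, 20 equally frequent states).
p_same <- function(d) 1/20 + (19/20) * exp(-(20/19) * d / 100)

## Negative log-likelihood of the distance d under the toy model.
nll <- function(d) {
  p <- p_same(d)
  -(n_same * log(p) + (n_sites - n_same) * log(1 - p))
}

## Brent's method on a bracketing interval of distances.
fit <- optimize(nll, interval = c(0.001, 500))

## Closed-form maximum-likelihood estimate of the toy model, for comparison.
p_hat <- n_same / n_sites
d_closed <- -100 * (19/20) * log((p_hat - 1/20) / (19/20))

c(brent = fit$minimum, closed_form = d_closed)
```

The closed form exists only because the toy model is so simple; for TKF92 no such formula is available, which is why a numerical minimiser is needed.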
Value
A list of matrices is returned: the matrix of estimated distances, the matrix of estimated distance variances, and the matrix of negative log-likelihoods between the sequences.
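Each returned matrix holds one entry per pair of sequences. The sketch below shows, under assumptions, how per-pair results could be assembled into symmetric matrices, and how a skipFailure-style flag might leave failed pairs as NA instead of stopping. pairwise_matrices, toy_pair and the list element names are hypothetical; toy_pair is a scaled p-distance, not the TKF92Pair estimator.

```r
## Sketch: run a per-pair estimator over all pairs and fill symmetric
## matrices. With skip_failure = TRUE a failing pair stays NA; otherwise the
## error is re-raised. The diagonal is left as NA.
pairwise_matrices <- function(seqs, pair_fn, skip_failure = TRUE) {
  n <- length(seqs)
  ids <- names(seqs)
  empty <- matrix(NA_real_, n, n, dimnames = list(ids, ids))
  d_mat <- v_mat <- nll_mat <- empty
  for (i in seq_len(n - 1)) {
    for (j in seq(i + 1, n)) {
      res <- tryCatch(pair_fn(seqs[[i]], seqs[[j]]),
                      error = function(e) if (skip_failure) NULL else stop(e))
      if (is.null(res)) next
      d_mat[i, j]   <- d_mat[j, i]   <- res[["distance"]]
      v_mat[i, j]   <- v_mat[j, i]   <- res[["variance"]]
      nll_mat[i, j] <- nll_mat[j, i] <- res[["nll"]]
    }
  }
  list(distance = d_mat, variance = v_mat, negLogLikelihood = nll_mat)
}

## Hypothetical stand-in for TKF92Pair(): p-distance x 100 on equal-length
## sequences; it fails on unequal lengths to exercise the skip logic.
toy_pair <- function(s1, s2) {
  if (length(s1) != length(s2)) stop("toy estimator needs equal lengths")
  c(distance = 100 * mean(s1 != s2), variance = NA_real_, nll = NA_real_)
}

seqs <- list(a = c("M", "K", "V"), b = c("M", "K", "L"), c = c("M", "K"))
out <- pairwise_matrices(seqs, toy_pair)
out$distance
```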
Author(s)
Ge Tan
References
Thorne, J.L., Kishino, H., and Felsenstein, J. (1992). Inching toward reality: an improved likelihood model of sequence evolution. J. Mol. Evol. 34, 3-16.
Gonnet, G.H., Cohen, M.A., and Benner, S.A. (1992). Exhaustive matching of the entire protein sequence database. Science 256, 1443-1445.
See Also
AACharacterSet, GONNET, GONNETBF
Examples
## This example is not run automatically because its running time exceeds 5 s
library(TKF)
library(seqinr)
data(GONNET)
data(GONNETBF)
fasta <- read.fasta(file.path(system.file("extdata", package = "TKF"),
                              "pair1.fasta"),
                    seqtype = "AA", set.attributes = FALSE)
## 1D estimation: distance only (mu and r fixed)
TKF92(fasta, mu = 0.0006137344, r = 0.7016089061,
      substModel = GONNET, substModelBF = GONNETBF)
## 3D estimation: joint estimation of distance, mu and r
TKF92(fasta, substModel = GONNET, substModelBF = GONNETBF, method = "NM")
TKF92(fasta, substModel = GONNET, substModelBF = GONNETBF, method = "constrOptim")
## Apply to a single pair of sequences
seq1 <- fasta[[1]]
seq2 <- fasta[[2]]
TKF92Pair(seq1, seq2, mu = 0.0006137344, r = 0.7016089061,
          substModel = GONNET, substModelBF = GONNETBF)