simSeq: Simulate sequences.

View source: R/simSeq.R

simSeqR Documentation

Simulate sequences.

Description

Simulate sequences from a given evolutionary tree or an pml object.

Usage

simSeq(x, ...)

## S3 method for class 'phylo'
simSeq(x, l = 1000, Q = NULL, bf = NULL,
  rootseq = NULL, type = "DNA", model = NULL, levels = NULL,
  rate = 1, ancestral = FALSE, code = 1, ...)

## S3 method for class 'pml'
simSeq(x, ancestral = FALSE, ...)

Arguments

x

a phylogenetic tree tree with edge lengths, i.e. an object of class phylo or and object of class pml.

...

Further arguments passed to or from other methods.

l

The length of the sequence to simulate.

Q

Either a numeric matrix of size Nstates × Nstates, giving the transition rates between states or a vector representing the lower triangular of these matrix (see details).

bf

Base frequencies.

rootseq

A vector of length l containing the root sequence. If not provided, the root sequence is randomly generated.

type

Type of sequences ("DNA", "AA", "CODON" or "USER").

model

Amino acid model of evolution to employ, for example "WAG", "JTT", "Dayhoff" or "LG". For a full list of supported models, type phangorn:::.aamodels. Ignored if type is not equal to "AA".

levels

A character vector of the different character tokens. Ignored unless type = "USER".

rate

A numerical value greater than zero giving the mutation rate or scaler for edge lengths.

ancestral

Logical specifying whether to return ancestral sequences.

code

The ncbi genetic code number for translation (see details). By default the standard genetic code is used.

Details

simSeq is a generic function to simulate sequence alignments along a phylogeny. It is quite flexible and can generate DNA, RNA, amino acids, codon, morphological or binary sequences. simSeq can take as input a phylogenetic tree of class phylo, or a pml object; it will return an object of class phyDat. There is also a more low level version, which lacks rate variation, but one can combine different alignments with their own rates (see example). The rate parameter acts like a scaler for the edge lengths.

For codon models type="CODON", two additional arguments dnds for the dN/dS ratio and tstv for the transition transversion ratio can be supplied.

So far simSeq is limited to time reversible models. simSeq will normalize the rate matrix A composed from Q and bf so that every row of A sums to zero and expected rate is one, see formulas 13.14 and 13.15 on page 205 in Felsenstein (2004). The edge lengths are should represent the expected number of mutations per site.

Defaults:

If x is a tree of class phylo, then sequences will be generated with the default Jukes-Cantor DNA model ("JC").

If bf is not specified, then all states will be treated as equally probable.

If Q is not specified, then a uniform rate matrix will be employed.

Value

simSeq returns an object of class phyDat.

Author(s)

Klaus Schliep klaus.schliep@gmail.com

References

Felsenstein, J. (2004). Inferring Phylogenies. Sinauer Associates, Sunderland.

See Also

phyDat, pml, SOWH.test

Examples


## Not run: 
data(Laurasiatherian)
tree <- nj(dist.ml(Laurasiatherian))
fit <- pml(tree, Laurasiatherian, k=4)
fit <- optim.pml(fit, optNni=TRUE, model="GTR", optGamma=TRUE)
data <- simSeq(fit)

## End(Not run)


tree <- rtree(5)
plot(tree)
nodelabels()

# Example for simple DNA alignment
data <- simSeq(tree, l = 10, type="DNA", bf=c(.1,.2,.3,.4), Q=1:6,
               ancestral=TRUE)
as.character(data)


# Example to simulate discrete Gamma rate variation
rates <- discrete.gamma(1,4)
data1 <- simSeq(tree, l = 100, type="AA", model="WAG", rate=rates[1])
data2 <- simSeq(tree, l = 100, type="AA", model="WAG", rate=rates[2])
data3 <- simSeq(tree, l = 100, type="AA", model="WAG", rate=rates[3])
data4 <- simSeq(tree, l = 100, type="AA", model="WAG", rate=rates[4])
data <- c(data1,data2, data3, data4)

write.phyDat(data, file="temp.dat", format="sequential", nbcol = -1,
  colsep = "")
unlink("temp.dat")


KlausVigo/phangorn documentation built on Feb. 14, 2025, 8:24 a.m.