phylo_data: Export phylogenetic data
In opm: Analysing Phenotype Microarray and Growth Curve Data


Create an entire character matrix (including header and footer) in a file format suitable for exporting phylogenetic data, and either return it or write it to a file. This function can also produce HTML tables and text paragraphs suitable for displaying PM data in taxonomic journals such as IJSEM.

## S4 method for signature 'OPMD_Listing'
phylo_data(object, html.args = html_args(), run.tidy = FALSE)
## S4 method for signature 'OPMS_Listing'
phylo_data(object, html.args = html_args(), run.tidy = FALSE)
## S4 method for signature 'XOPMX'
phylo_data(object, as.labels,
    subset = param_names("disc.name"), sep = " ", extract.args = list(),
    join = TRUE, discrete.args = list(range = TRUE, gap = TRUE), ...)
## S4 method for signature 'data.frame'
phylo_data(object, as.labels = NULL,
    subset = "numeric", sep = " ", ...)
## S4 method for signature 'matrix'
phylo_data(object,
    format = opm_opt("phylo.fmt"), outfile = "", enclose = TRUE, indent = 3L,
    paup.block = FALSE, delete = c("none", "uninf", "constant", "ambig"),
    join = FALSE, cutoff = 0, digits = opm_opt("digits"),
    comments = comment(object), html.args = html_args(),
    prefer.char = format == "html", run.tidy = FALSE, ...)

`object`	Data frame, numeric matrix or `OPMS` or `MOPMX` object (with aggregated values). Currently only ‘integer’, ‘logical’, ‘double’ and ‘character’ matrix content is supported. The data-frame and `OPMS` methods first call `extract` and then the matrix method. The methods for `OPMD_Listing` and `OPMS_Listing` objects can be applied to the results of `listing`.
`format`	Character scalar determining the output format, either `epf` (Extended PHYLIP Format), `nexus`, `phylip`, `hennig` or `html`. If NEXUS or ‘Hennig’ format is chosen, a non-empty `comment` attribute will be output together with the data (and appropriately escaped). In case of HTML format, a non-empty `comment` yields the title of the HTML document. EPF or ‘extended PHYLIP’ is sometimes called ‘relaxed PHYLIP’. The main difference between EPF and PHYLIP is that the former can use labels with more than ten characters, but its labels must not contain whitespace. (These adaptations are done automatically with `safe_labels`.)
`outfile`	Character scalar. If a non-empty character scalar, resulting lines are directly written to this file. Otherwise, they are returned.
`enclose`	Logical scalar. Shall labels be enclosed in single quotes? Ignored unless `format` is ‘nexus’.
`indent`	Integer scalar. Indentation of commands in NEXUS format. Ignored unless `format` is ‘nexus’ (and a matter of taste anyway).
`paup.block`	Logical scalar. Append a PAUP* block with selected (recommended) default values? Has no effect unless ‘nexus’ is selected as ‘format’.
`delete`	Character scalar with one of the following values:
    uninf: Columns are removed that are either constant (in the strict sense) or contain polymorphisms in some fields while every pair of fields shares at least one character state.
    ambig: Columns with ambiguities (multiple states in at least one single field) are removed.
    constant: Columns that are constant in the strict sense are removed.
    `delete` is currently ignored for formats other than HTML. Note that columns become rows in the final HTML output.
`join`	Logical scalar, vector or factor. Unless `FALSE`, rows of `object` are joined together, either according to the row names (if `join` is `TRUE`), or directly according to `join`. This can be used to deal with repeated measurements for the same organism or treatment.
`cutoff`	Numeric scalar. If joining results in multiple-state characters, they can be filtered by removing all entries with a relative frequency less than ‘cutoff’. This makes little sense for non-integer numeric data.
`digits`	Numeric scalar. Used for rounding, and thus ignored unless `object` is of mode ‘numeric’.
`comments`	Character vector. Comments to be added to the output (as title if HTML is chosen). Ignored if the output format does not allow for comments. If empty, a default comment is chosen.
`html.args`	List of arguments used to modify the generated HTML. See `html_args` for the supported list elements and their meaning.
`prefer.char`	Logical scalar indicating whether or not to use `NA` as intermediary character state. Has only an effect for ‘logical’ and ‘integer’ characters. A warning is issued if integers are not within the necessary range, i.e. either `0` or `1`.
`run.tidy`	Logical scalar. Filter the resulting HTML through the Tidy program? Ignored unless `format` is `html`. Otherwise, if `TRUE`, it is an error if the Tidy executable is not found.
`as.labels`	Vector of data-frame indexes or `OPMS` metadata entries. See `extract`.
`sep`	Character scalar. See `extract`.
`subset`	Character scalar. For the `OPMS` method, passed to the `OPMS` method of `extract`. For the data-frame method, a selection of column classes to extract.
`extract.args`	Optional list of arguments passed to the `OPMS` method of `extract`.
`discrete.args`	Optional list of arguments passed from the `OPMS` method to `discrete`. If set to `NULL`, discretisation is turned off. Ignored if stored discretised values are chosen by setting `subset` to `param_names("disc.name")`.
`...`	Optional arguments passed between the methods (i.e., from the other methods to the matrix method) or to `hwrite` from the hwriter package. Note that ‘table.summary’ is set via `html.args` and that ‘page’, ‘x’ and ‘div’ cannot be used.
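
The interplay of `join` and `cutoff` described above can be illustrated with a small base-R sketch. It is not the code used by opm: the helper names `join_states` and `join_rows` are hypothetical, and the sketch assumes that relative frequencies are computed per group of rows and per character.

```r
# Toy matrix: two replicates of organism "A", three of organism "B".
# matrix() fills column-wise, so the first five values form character c1.
m <- matrix(c(0L, 1L, 1L, 1L, 0L,
              1L, 1L, 0L, 0L, 0L),
  ncol = 2, dimnames = list(c("A", "A", "B", "B", "B"), c("c1", "c2")))

# Hypothetical helper: the states of one character within one group whose
# relative frequency is not less than the cutoff.
join_states <- function(x, cutoff = 0) {
  freq <- table(x) / length(x)
  names(freq)[as.vector(freq >= cutoff)]
}

# Hypothetical helper: join rows with identical row names, yielding for each
# organism a named list with one set of states per character.
join_rows <- function(m, cutoff = 0) {
  groups <- split(seq_len(nrow(m)), rownames(m))
  lapply(groups, function(i) {
    sapply(colnames(m), function(j) join_states(m[i, j], cutoff),
      simplify = FALSE)
  })
}

joined <- join_rows(m)                  # keeps every observed state
filtered <- join_rows(m, cutoff = 0.4)  # drops states rarer than 40 %

joined$B$c1    # both states occur among the replicates of "B"
filtered$B$c1  # state "0" (1 of 3 replicates) has been filtered out
```

In this toy data set, joining turns `c1` into a multiple-state character for both organisms, and the cutoff removes the minority state of organism "B" only.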

Exporting PM data in such formats allows one to either infer trees from the data under the maximum-likelihood and/or the maximum-parsimony criterion, or to reconstruct the evolution of PM characters on given phylogenetic trees, or to nicely display the data in HTML format.

For exporting NEXUS format, the matrix should normally be converted beforehand by applying discrete. Exporting HTML is optimised for data discretised with gap set to TRUE; for other data, the character.states argument should be modified (see html_args). The hennig (Hennig86) format is the one used by TNT; it allows continuous characters to be analysed as such. Regarding the meaning of ‘character’ as used here, see the ‘Details’ section of discrete.

The generated HTML is guaranteed to produce neither errors nor warnings if checked using the Tidy program. It deliberately contains no formatting instructions but a rich annotation with ‘class’ attributes which allows for CSS-based formatting. This annotation includes the naming of all sections and all kinds of textual content. Whether the characters show differences between at least one organism and the others is also indicated. For the CSS files that come with the package, see the examples below and opm_files.
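
How a character might be sorted into the categories that the HTML annotation distinguishes can be sketched in base R as follows. This is only one reading of the description of the `delete` argument, not the code used by opm; the function name `classify_character` is hypothetical, and the input is assumed to be a list holding one set of character states per organism (at least two organisms).

```r
# Hypothetical helper: classify one character from the state sets of all
# organisms. Assumption: 'states' is a list of character vectors with at
# least two elements, one per organism.
classify_character <- function(states) {
  # constant in the strict sense: a single state, identical everywhere
  is_constant <- all(lengths(states) == 1L) &&
    length(unique(unlist(states))) == 1L
  if (is_constant)
    return("constant")
  # check every pair of organisms for at least one shared state
  pairs <- combn(length(states), 2L)
  all_overlap <- all(apply(pairs, 2L, function(p)
    length(intersect(states[[p[1L]]], states[[p[2L]]])) > 0L))
  if (all_overlap) "uninformative" else "informative"
}

classify_character(list(A = "1", B = "1"))
classify_character(list(A = c("0", "1"), B = "1"))
classify_character(list(A = "0", B = "1"))
```

Under these assumptions the three calls return "constant", "uninformative" and "informative", respectively.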

Character vector, each element representing a line in a potential output file, returned invisibly if outfile is given.
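
The two ways of obtaining the result can be contrasted in a short sketch. It reuses the dummy matrix from the examples and only runs if the opm package is installed; the variable names and the temporary file are assumptions made for illustration.

```r
# Only attempt the calls if opm is available on this system.
have_opm <- requireNamespace("opm", quietly = TRUE)

if (have_opm) {
  x <- matrix(c(0:9, letters[1:22]), nrow = 2,
    dimnames = list(c("Ahoernchen", "Behoernchen"), LETTERS[1:16]))

  # empty 'outfile' (the default): the lines are returned
  lines <- opm::phylo_data(x, format = "epf")
  print(lines)

  # non-empty 'outfile': the lines are written to that file, and the
  # return value is invisible
  out <- tempfile(fileext = ".txt")
  opm::phylo_data(x, format = "epf", outfile = out)
  print(readLines(out))
}
```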

Berger, S. A., Stamatakis, A. 2010 Accuracy of morphology-based phylogenetic fossil placement under maximum likelihood. 8th ACS/IEEE International Conference on Computer Systems and Applications (AICCSA-10). Hammamet, Tunisia [analysis of phenotypic data with RAxML].

Felsenstein, J. 2005 PHYLIP (Phylogeny Inference Package) version 3.6. Distributed by the author. Seattle: University of Washington, Department of Genome Sciences [the PHYLIP program].

Goloboff, P.A., Farris, J.S., Nixon, K.C. 2008 TNT, a free program for phylogenetic analysis. Cladistics 24, 774–786 [the TNT program].

Goloboff, P.A., Mattoni, C., Quinteros, S. 2006 Continuous characters analysed as such. Cladistics 22, 589–601.

Maddison, D. R., Swofford, D. L., Maddison, W. P. 1997 Nexus: An extensible file format for systematic information. Syst Biol 46, 590–621 [the NEXUS format].

Stamatakis, A. 2006 RAxML-VI-HPC: Maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models. Bioinformatics 22, 2688–2690 [the RAxML program].

Swofford, D. L. 2002 PAUP*: Phylogenetic Analysis Using Parsimony (*and Other Methods), Version 4.0 b10. Sunderland, Mass.: Sinauer Associates [the PAUP* program].

http://ijs.microbiologyresearch.org/ [IJSEM journal]

http://tidy.sourceforge.net/ [HTML Tidy]

base::comment, base::write, hwriter::hwrite

Other phylogeny-functions: html_args, safe_labels

# simple helper functions
echo <- function(x) write(substr(x, 1, 250), file = "")
# all() is needed because `&&` requires a single logical value
is_html <- function(x) is.character(x) &&
  all(c("<html>", "<head>", "<body>", "</html>", "</head>", "</body>") %in% x)
longer <- function(x, y) any(nchar(x) > nchar(y)) &&
  !any(nchar(x) < nchar(y))

## examples with a dummy data set
x <- matrix(c(0:9, letters[1:22]), nrow = 2)
colnames(x) <- LETTERS[1:16]
rownames(x) <- c("Ahoernchen", "Behoernchen") # Chip and Dale in German

# EPF is a comparatively restricted format
echo(y.epf <- phylo_data(x, format = "epf"))
stopifnot(is.character(y.epf), length(y.epf) == 3)
stopifnot(identical(y.epf, phylo_data(as.data.frame(x), subset = "factor",
  format = "epf")))

# PHYLIP is even more restricted (shorter labels!)
echo(y.phylip <- phylo_data(x, format = "phylip"))
stopifnot((y.epf == y.phylip) == c(TRUE, FALSE, FALSE))

# NEXUS allows for more content; note the comment and the character labels
echo(y.nexus <- phylo_data(x, format = "nexus"))
nexus.len.1 <- length(y.nexus)
stopifnot(is.character(y.nexus), nexus.len.1 > 10)

# adding a PAUP* block with (hopefully useful) default settings
echo(y.nexus <- phylo_data(x, format = "nexus", paup.block = TRUE))
stopifnot(is.character(y.nexus), length(y.nexus) > nexus.len.1)

# adding our own comment
comment(x) <- c("This is", "a test") # yields two lines
echo(y.nexus <- phylo_data(x, format = "nexus"))
stopifnot(identical(length(y.nexus), nexus.len.1 + 1L))

# Hennig86/TNT also includes the comment
echo(y.hennig <- phylo_data(x, format = "hennig"))
hennig.len.1 <- length(y.hennig)
stopifnot(is.character(y.hennig), hennig.len.1 > 10)

# without an explicit comment, the default one will be used
comment(x) <- NULL
echo(y.hennig <- phylo_data(x, format = "hennig"))
stopifnot(identical(length(y.hennig), hennig.len.1 - 1L))

## examples with real data and HTML

# setting the CSS file that comes with opm as default
opm_opt(css.file = opm_files("css")[[1]])

# see discrete() for the conversion and note the OPMS example below: one
# could also get the results directly from OPMS objects
x <- extract(vaas_4[, , 1:10], as.labels = list("Species", "Strain"),
  in.parens = FALSE)
x <- discrete(x, range = TRUE, gap = TRUE)
echo(y <- phylo_data(x, format = "html",
  html.args = html_args(organisms.start = "Strains: ")))
# this yields HTML with the usual tags, a table legend, and the table itself
# in a single line; the default 'organisms.start' could also be used
stopifnot(is_html(y))

# now with joining of the results per species (and changing the organism
# description accordingly)
x <- extract(vaas_4[, , 1:10], as.labels = list("Species"),
  in.parens = FALSE)
x <- discrete(x, range = TRUE, gap = TRUE)
echo(y <- phylo_data(x, format = "html", join = TRUE,
  html.args = html_args(organisms.start = "Species: ")))
stopifnot(is_html(y))
# Here and in the following examples note the highlighting of the variable
# (uninformative or informative) characters. The uninformative ones are those
# that are not constant but show overlap regarding the sets of character
# states between all organisms. The informative ones are those that are fully
# distinct between all organisms.

# 'OPMS' method, yielding the same results as above but directly
echo(yy <- phylo_data(vaas_4[, , 1:10], as.labels = "Species",
  format = "html", join = TRUE, extract.args = list(in.parens = FALSE),
  html.args = html_args(organisms.start = "Species: ")))
# the timestamps might differ, but otherwise the result is as above
stopifnot(length(y) == length(yy) && length(which(y != yy)) < 2)

# appending user-defined sections
echo(yy <- phylo_data(vaas_4[, , 1:10], as.labels = "Species",
  format = "html", join = TRUE, extract.args = list(in.parens = FALSE),
  html.args = html_args(organisms.start = "Species: ",
  append = list(section.1 = "additional text", section.2 = "more text"))))
stopifnot(length(y) < length(yy), length(which(!y %in% yy)) < 2)
# note the position -- there are also 'prepend' and 'insert' arguments

# effect of deletion
echo(y <- phylo_data(x, "html", delete = "none", join = FALSE))
echo(y.noambig <- phylo_data(x, "html", delete = "ambig", join = FALSE))
stopifnot(length(which(y != y.noambig)) < 2) # timestamps might differ
# ambiguities are created only by joining
echo(y <- phylo_data(x, "html", delete = "none", join = TRUE))
echo(y.noambig <- phylo_data(x, "html", delete = "ambig", join = TRUE))
stopifnot(longer(y, y.noambig))
echo(y.nouninf <- phylo_data(x, "html", delete = "uninf", join = TRUE))
stopifnot(longer(y, y.nouninf))
echo(y.noconst <- phylo_data(x, "html", delete = "const", join = TRUE))
stopifnot(longer(y.noconst, y.nouninf))

# getting real numbers, not discretised ones
echo(yy <- phylo_data(vaas_4[, , 1:10], as.labels = "Species",
  format = "html", join = TRUE, extract.args = list(in.parens = FALSE),
  subset = "A", discrete.args = NULL,
  html.args = html_args(organisms.start = "Species: ")))
stopifnot(is_html(yy), length(yy) == length(y) - 1) # no symbols list
# the highlighting is also used here, based on the following heuristic:
# if mean+/-2*sd does not overlap, the character is informative; else
# if mean+/-sd does not overlap, the character is uninformative; otherwise
# it is constant

# this can also be used for formats other than HTML (but not all make sense)
echo(yy <- phylo_data(vaas_4[, , 1:10], as.labels = "Species",
  format = "hennig", join = TRUE, extract.args = list(in.parens = FALSE),
  subset = "A", discrete.args = NULL))
stopifnot(is.character(yy), length(yy) > 10)

## 'OPMD_Listing' method
echo(x <- phylo_data(listing(vaas_1, NULL)))
stopifnot(is.character(x), length(x) == 1)
echo(x <- phylo_data(listing(vaas_1, NULL, html = TRUE)))
stopifnot(is.character(x), length(x) > 1)

## 'OPMS_Listing' method
echo(x <- phylo_data(listing(vaas_4, as.groups = "Species")))
stopifnot(is.character(x), length(x) == 2, !is.null(names(x)))
echo(x <- phylo_data(listing(vaas_4, as.groups = "Species", html = TRUE)))
stopifnot(is.character(x), length(x) > 2, is.null(names(x)))