GetPseudoIndependentPhyloFiles: Get pseudo-independent cladistic data sets
In graemetlloyd/metatree: Generating Meta-Analytical Phylogenies


View source: R/GetPseudoIndependentPhyloFile.R

Given a directory of metadata files, returns a list of pseudo-independent data sets.

GetPseudoIndependentPhyloFiles(xmlwd, exclude.list)

`xmlwd`	The working directory containing the XML files.
`exclude.list`	An optional list of data sets to exclude a priori (see details).

There are many applications for meta-analyses of cladistic data sets (see, for example, Wagner 2000, Liow 2007, Hughes et al. 2013, Wright et al. 2016), but a major consideration should always be to first establish an independent compilation of data sets. This function applies a set of criteria laid out in Wright et al. (2016) that primarily focus on non-independence of data sets due to their repeated reuse, and assumes this information is captured in XML files in the same format used by the Metatree function.

Specifically, the function uses parent-child and sibling-sibling relationships between data sets to first identify clusters of non-independent data sets. Then from each cluster it selects (in priority order) the data set with: (i) the most characters, (ii) the most taxa, (iii) the most recent publication date, or (iv) if two or more data sets tie on all three criteria, then simply the first data set alphabetically.
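The two steps described above can be sketched in base R. This is an illustration only and not the package's own implementation: the `datasets` table, its column names (`parent`, `sibling`, `n_characters`, `n_taxa`, `year`) and the helper functions `FindClusters` and `SelectRepresentatives` are hypothetical stand-ins for whatever is actually read from the XML files.

```r
# Hypothetical metadata table; in practice these values would come from the XML files.
datasets <- data.frame(
  name = c("Able_2001a", "Baker_2005a", "Chen_2010a", "Diaz_2012a", "Evans_2012a"),
  parent = c(NA, "Able_2001a", "Baker_2005a", NA, NA),
  sibling = c(NA, NA, NA, "Evans_2012a", NA),
  n_characters = c(100, 150, 150, 80, 80),
  n_taxa = c(20, 30, 30, 25, 25),
  year = c(2001, 2005, 2010, 2012, 2012),
  stringsAsFactors = FALSE
)

# Step 1: group data sets linked by parent-child or sibling-sibling
# relationships (connected components), by merging cluster labels along each link.
FindClusters <- function(datasets) {
  cluster <- seq_len(nrow(datasets))
  names(cluster) <- datasets$name
  links <- rbind(
    cbind(datasets$name, datasets$parent),
    cbind(datasets$name, datasets$sibling)
  )
  # Keep only links that point at a data set present in the table.
  links <- links[!is.na(links[, 2]) & links[, 2] %in% datasets$name, , drop = FALSE]
  for (i in seq_len(nrow(links))) {
    from <- cluster[[links[i, 1]]]
    to <- cluster[[links[i, 2]]]
    # Relabel the whole cluster so both ends of the link share one label.
    cluster[cluster == from] <- to
  }
  cluster
}

# Step 2: within each cluster keep the data set with the most characters,
# then the most taxa, then the most recent year, then the first name alphabetically.
SelectRepresentatives <- function(datasets) {
  cluster <- FindClusters(datasets)
  ranked <- datasets[order(-datasets$n_characters, -datasets$n_taxa,
                           -datasets$year, datasets$name), ]
  # After ranking, the first row seen for each cluster is its best data set.
  ranked_cluster <- cluster[ranked$name]
  sort(ranked$name[!duplicated(ranked_cluster)])
}

SelectRepresentatives(datasets)
# "Chen_2010a" "Diaz_2012a"
```

In this toy example the first three data sets form one cluster through parent-child links, and Chen_2010a wins on publication date after tying with Baker_2005a on characters and taxa. The last two form a second cluster through a sibling link and tie on all three criteria, so the alphabetically first name is kept.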

Note that in practice other pruning may be desired, e.g., to exclude data sets below a minimum or above a maximum size (taxa or characters), but this is not automated here. It is also recommended that some data sets never be considered, e.g., because they are molecular when morphology is desired, they concern trace or trackway taxa, they are ontogenetic not phylogenetic, or they concern supertree or metatree characters. These must be curated manually and can be excluded from analysis by using the `exclude.list` option.
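Manual curation of this kind might be prepared along the following lines. This is a hedged sketch: the `metadata` table, its columns, the size thresholds and the data set names are all invented for illustration, and it assumes that `exclude.list` accepts a character vector of data set names, which should be checked against the function itself.

```r
# Hypothetical metadata for illustration; names, sizes and types are invented.
metadata <- data.frame(
  name = c("Able_2001a", "Baker_2005a", "Chen_2010a", "Diaz_2012a"),
  n_taxa = c(3, 30, 45, 500),
  n_characters = c(12, 150, 9, 220),
  type = c("morphology", "morphology", "morphology", "molecular"),
  stringsAsFactors = FALSE
)

# Assumed size limits; choose values appropriate to the analysis.
min_taxa <- 4
max_taxa <- 200
min_characters <- 10

# Flag data sets that fall outside the chosen size limits.
outside_size_limits <- metadata$n_taxa < min_taxa |
  metadata$n_taxa > max_taxa |
  metadata$n_characters < min_characters

# Flag data sets judged unsuitable by hand (here, anything not morphological).
unwanted <- metadata$type != "morphology"

# Names that could then be supplied through the exclude.list argument.
exclude <- metadata$name[outside_size_limits | unwanted]
exclude
# "Able_2001a" "Chen_2010a" "Diaz_2012a"
```

The resulting vector would then be passed as `exclude.list` alongside `xmlwd` when calling GetPseudoIndependentPhyloFiles.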

Note also that the term pseudo-independent is applied here because the function only considers data set non-independence through inherited character lists and not taxonomic non-independence, which may also bias results if not accounted for.

A list of pseudo-independent file names.

Graeme T. Lloyd graemetlloyd@gmail.com

Hughes, M., Gerber, S. and Wills, M. A., 2013. Clades reach highest morphological disparity early in their evolution. Proceedings of the National Academy of Sciences U.S.A., 110, 13875–13879.

Liow, L. H., 2007. Lineages with long durations are old and morphologically average: an analysis using multiple datasets. Evolution, 61, 885–901.

Wagner, P. J., 2000. Exhaustion of morphologic character states among fossil taxa. Evolution, 54, 365–386.

Wright, A. M., Lloyd, G. T. and Hillis, D. M., 2016. Modeling character change heterogeneity in phylogenetic analyses of morphology through the use of priors. Systematic Biology, 65, 602–611.

For a more detailed description of how XML files should be formatted see the Metatree, ReadMetatreeXML or WriteMetatreeXML functions.