datelife_search: Get scientific, peer-reviewed information on time of lineage...

View source: R/datelife_search.R

datelife_search    R Documentation

Description

datelife_search is the core DateLife function. It finds and retrieves all openly available, peer-reviewed scientific information on time of lineage divergence for a set of input taxon names, given as a character vector, a newick character string, a phylo or multiPhylo object, or an already processed datelifeQuery object obtained with make_datelife_query().

Usage

datelife_search(
  input = c("Rhea americana", "Pterocnemia pennata", "Struthio camelus"),
  use_tnrs = FALSE,
  get_spp_from_taxon = FALSE,
  partial = TRUE,
  cache = "opentree_chronograms",
  summary_format = "phylo_all",
  na_rm = FALSE,
  summary_print = c("citations", "taxa"),
  taxon_summary = c("none", "summary", "matrix"),
  criterion = "taxa"
)

Arguments

input

One of the following:

A character vector

With taxon names as a single comma-separated string or concatenated with c().

A phylogenetic tree with taxon names as tip labels

As a phylo or multiPhylo object, OR as a newick character string.

A datelifeQuery object

An output from make_datelife_query().
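As a minimal sketch of the first two input forms, the snippet below writes the same three taxon names once with c() and once as a single comma-separated string, then checks in base R that both carry the same names. The splitting step is only an illustration and is an assumption, not necessarily how datelife parses the string. The call to datelife_search() is guarded so that it runs only in an interactive session with the package installed, since it needs access to the chronogram database.

```r
# Three taxon names concatenated with c()
taxa_vector <- c("Rhea americana", "Pterocnemia pennata", "Struthio camelus")

# The same names as a single comma-separated string
taxa_string <- "Rhea americana, Pterocnemia pennata, Struthio camelus"

# Illustration only (assumed parsing): split on commas and trim whitespace
taxa_from_string <- trimws(strsplit(taxa_string, ",", fixed = TRUE)[[1]])
same_taxa <- identical(taxa_vector, taxa_from_string)

# Either form can be given as `input`; skipped unless datelife is available
if (interactive() && requireNamespace("datelife", quietly = TRUE)) {
  refs <- datelife::datelife_search(
    input = taxa_string,
    summary_format = "citations"
  )
}
```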

use_tnrs

Whether to use Open Tree of Life's Taxonomic Name Resolution Service (TNRS) to process input taxon names. Defaults to FALSE, as shown in Usage. If TRUE, it corrects misspellings and taxonomic name variations with tnrs_match(), a wrapper of rotl::tnrs_match_names().

get_spp_from_taxon

Whether to search ages for all species belonging to a given taxon or not. Defaults to FALSE. If TRUE, it must have the same length as input. If input is a newick string with some clades, it will be converted to a phylo object, and the order of get_spp_from_taxon will match phy$tip.label.

partial

Whether to return or exclude partially matching source chronograms, i.e., those that match some but not all of the taxa given in datelife_query. Options are TRUE or FALSE. Defaults to TRUE: return all matching source chronograms, including partial matches.
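The difference between full and partial matches can be sketched with a toy example in base R. The study names and tip labels below are hypothetical, and the rule that a partial match needs at least two target taxa is an assumption made for illustration; this is not datelife's internal code.

```r
# Target taxa and the tip labels of three hypothetical source chronograms
target <- c("Rhea americana", "Pterocnemia pennata", "Struthio camelus")
source_tips <- list(
  study_A = c("Rhea americana", "Pterocnemia pennata",
              "Struthio camelus", "Gallus gallus"),
  study_B = c("Rhea americana", "Struthio camelus"),
  study_C = c("Mus musculus", "Homo sapiens")
)

# Number of target taxa found in each source chronogram
n_matched <- vapply(
  source_tips,
  function(tips) sum(target %in% tips),
  integer(1)
)

# Analogue of partial = FALSE: keep only chronograms with every target taxon
full_only <- names(n_matched)[n_matched == length(target)]

# Analogue of partial = TRUE: also keep chronograms with some target taxa
# (assumption: at least two, since a divergence time needs a pair of taxa)
with_partial <- names(n_matched)[n_matched >= 2]
```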

cache

A character vector of length one, with the name of the data object to cache. Defaults to "opentree_chronograms", a data object storing Open Tree of Life's database chronograms and other associated information.

summary_format

A character vector of length one, indicating the output format for results of the DateLife search. Available output formats are:

"citations"

A character vector of references where chronograms with some or all of the target taxa are published (source chronograms).

"mrca"

A named numeric vector of most recent common ancestor (mrca) ages of target taxa defined in input, obtained from the source chronograms. Names of mrca vector are equal to citations.

"newick_all"

A named character vector of newick strings corresponding to target chronograms derived from source chronograms. Names of newick_all vector are equal to citations.

"newick_sdm"

Only if multiple source chronograms are available. A character vector with a single newick string corresponding to a target chronogram obtained with SDM supertree method (Criscuolo et al. 2006).

"newick_median"

Only if multiple source chronograms are available. A character vector with a single newick string corresponding to a target chronogram from the median of all source chronograms.

"phylo_sdm"

Only if multiple source chronograms are available. A phylo object with a single target chronogram obtained with SDM supertree method (Criscuolo et al. 2006).

"phylo_median"

Only if multiple source chronograms are available. A phylo object with a single target chronogram obtained from source chronograms with median method.

"phylo_all"

A named list of phylo objects corresponding to each target chronogram obtained from available source chronograms. Names of phylo_all list correspond to citations.

"phylo_biggest"

The chronogram with the most taxa. In the case of a tie, the chronogram with clade age closest to the median age of the equally large trees is returned.

"html"

A character vector with an html string that can be saved and then opened in any web browser. It contains a 4 column table with data on target taxa: mrca, number of taxa, citations of source chronogram and newick target chronogram.

"data_frame"

A 4 column data.frame with data on target taxa: mrca, number of taxa, citations of source chronograms and newick string.
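The tie-breaking rule described for "phylo_biggest" can be illustrated with a small base R sketch. The table of chronograms below is hypothetical, and the code only mirrors the rule as stated above; it is not taken from the datelife source.

```r
# Hypothetical summary of four target chronograms
chronograms <- data.frame(
  name = c("study_A", "study_B", "study_C", "study_D"),
  n_taxa = c(4, 4, 4, 3),
  clade_age = c(90, 110, 130, 100), # million years
  stringsAsFactors = FALSE
)

# Keep the chronograms tied for the largest number of taxa
biggest <- chronograms[chronograms$n_taxa == max(chronograms$n_taxa), ]

# Break the tie: choose the clade age closest to the median age
# of the equally large chronograms
median_age <- stats::median(biggest$clade_age)
chosen <- biggest$name[which.min(abs(biggest$clade_age - median_age))]
```

Here three chronograms share the maximum of four taxa, so the one whose clade age is nearest to their median age is selected.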

na_rm

If TRUE, it drops rows containing NAs from the datelifeResult patristic matrix; if FALSE, it returns NA where there are missing entries.

summary_print

A character vector specifying the type of summary information to be printed to screen. Options are:

"citations"

Prints references of chronograms where target taxa are found.

"taxa"

Prints a summary of the number of chronograms where each target taxon is found.

"none"

Nothing is printed to screen.

Defaults to c("citations", "taxa"), which displays both.

taxon_summary

A character vector specifying whether data on target taxa missing in source chronograms should be added to the output as a "summary" or as a presence/absence "matrix". Defaults to "none": no information on missing taxa is added to the output.

criterion

Defaults to criterion = "taxa". Used for chronogram summarizing, i.e., obtaining a single summary chronogram from a group of input chronograms. For summarizing approaches that return a single summary tree from a group of phylogenetic trees, it is necessary that the latter form a grove, roughly, a sufficiently overlapping set of taxa between trees; see Ané et al. (2009), doi:10.1007/s00026-009-0017-x. In rare cases, a group of trees can have multiple groves. This argument indicates whether to get the grove with the most trees (criterion = "trees") or the most taxa (criterion = "taxa").
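To make the two criteria concrete, here is a toy sketch in base R. It assumes the groves have already been identified and represents each one as a list of tip-label vectors; the grove names, taxa, and the helper pick_grove() are all hypothetical and are not part of datelife.

```r
# Two hypothetical groves, each a list of tip-label vectors (one per tree)
groves <- list(
  grove_1 = list(c("A", "B", "C"), c("B", "C", "D"), c("A", "C", "D")),
  grove_2 = list(c("E", "F", "G", "H", "I"), c("F", "G", "H", "I", "J"))
)

# Number of trees and number of distinct taxa in each grove
n_trees <- vapply(groves, length, integer(1))
n_taxa <- vapply(groves, function(g) length(unique(unlist(g))), integer(1))

# Hypothetical helper: name of the grove that maximizes the chosen criterion
pick_grove <- function(criterion = c("taxa", "trees")) {
  criterion <- match.arg(criterion)
  score <- if (criterion == "trees") n_trees else n_taxa
  names(score)[which.max(score)]
}

by_trees <- pick_grove("trees")
by_taxa <- pick_grove("taxa")
```

In this toy case the two criteria disagree: grove_1 has more trees, while grove_2 covers more taxa.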

Details

If only one taxon name is given as input, get_spp_from_taxon is always set to TRUE.

Value

The output is determined by the argument summary_format:

If summary_format = "citations"

The function returns a character vector of references.

If summary_format = "mrca"

The function returns a named numeric vector of most recent common ancestor (mrca) ages.

If summary_format = "newick_[all, sdm, or median]"

The function returns output chronograms as newick strings.

If summary_format = "phylo_[all, sdm, median, or biggest]"

The function returns output chronograms as phylo or multiPhylo objects.

If summary_format = "html" or "data_frame"

The function returns a 4 column table with data on mrca ages, number of taxa, references, and output chronograms as newick strings.
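Because the return type depends on summary_format, downstream code may want to branch on the class of the result. The helper below is a hypothetical sketch in base R (describe_result() is not a datelife function); it only inspects the class of whatever object it is given.

```r
# Hypothetical helper: report what kind of result was returned
describe_result <- function(x) {
  if (inherits(x, "multiPhylo")) {
    "multiPhylo: several chronograms"
  } else if (inherits(x, "phylo")) {
    "phylo: a single chronogram"
  } else if (is.data.frame(x)) {
    "data.frame: table of mrca ages, taxa, references and newick strings"
  } else if (is.numeric(x)) {
    "numeric: mrca ages"
  } else if (is.character(x)) {
    "character: references, newick strings or html"
  } else if (is.list(x)) {
    "list: possibly a named list of phylo objects"
  } else {
    "unrecognized result"
  }
}

# Stand-in objects, so the helper can be tried without running a search
fake_phylo <- structure(list(), class = "phylo")
fake_mrca <- c("Some citation" = 105.2)
```

For example, describe_result(fake_phylo) reports a single chronogram, and describe_result(fake_mrca) reports mrca ages.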

Examples

## Not run: 

# This example writes its output to a temporary directory obtained with
# tempdir(); use your own working directory as needed:
tempwd <- tempdir()

# Obtain median ages from a set of source chronograms in newick format:
ages <- datelife_search(c(
  "Rhea americana", "Pterocnemia pennata", "Struthio camelus",
  "Mus musculus"
), summary_format = "newick_median")

# Save the tree in the temp working directory in newick format:
write(ages, file = file.path(tempwd, "some.bird.ages.txt"))

# Obtain median ages from a set of source chronograms in phylo format
# Will produce same tree as above but in "phylo" format:
ages.again <- datelife_search(c(
  "Rhea americana", "Pterocnemia pennata", "Struthio camelus",
  "Mus musculus"
), summary_format = "phylo_median")
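# Load ape before plotting: .PlotPhyloEnv and write.tree() come from ape
library(ape)
plot(ages.again)
ape::axisPhylo()
# Center the axis label using the coordinates stored by the last phylo plot
last_plot <- get("last_plot.phylo", envir = .PlotPhyloEnv)
mtext("Time (million years ago)", side = 1, line = 2, at = max(last_plot$xx) * 0.5)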

# Save "phylo" object in newick format
write.tree(ages.again, file = file.path(tempwd, "some.bird.tree.again.txt"))

# Obtain MRCA ages and target chronograms from all source chronograms
# Generate an html output readable in any web browser:
ages.html <- datelife_search(c(
  "Rhea americana", "Pterocnemia pennata", "Struthio camelus",
  "Mus musculus"
), summary_format = "html")
write(ages.html, file = file.path(tempwd, "some.bird.trees.html"))
# Open the file in the default browser (portable alternative to the
# macOS-only `open` command):
utils::browseURL(file.path(tempwd, "some.bird.trees.html"))

## End(Not run) # end dontrun

phylotastic/datelife documentation built on June 9, 2024, 6:50 p.m.