plot_orthotree: Create a phylogenetic tree of shared orthologs

View source: R/plot_orthotree.R

plot_orthotreeR Documentation

Create a phylogenetic tree of shared orthologs

Description

Automatically creates a phylogenetic tree plot annotated with metadata describing how many orthologous genes each species shares with the reference_species ("human" by default).

Usage

plot_orthotree(
  tree = NULL,
  orth_report = NULL,
  species = NULL,
  method = c("babelgene", "homologene", "gprofiler"),
  tree_source = "timetree",
  non121_strategy = "drop_both_species",
  reference_species = "human",
  clades = list(Primates = c("Homo sapiens", "Macaca mulatta"), Eutherians =
    c("Homo sapiens", "Mus musculus", "Bos taurus"), Mammals = c("Homo sapiens",
    "Mus musculus", "Bos taurus", "Ornithorhynchus anatinus", "Monodelphis domestica"),
    Tetrapods = c("Homo sapiens", "Mus musculus", "Gallus gallus", "Anolis carolinensis",
    "Xenopus tropicalis"), Vertebrates = c("Homo sapiens", "Mus musculus",
    "Gallus gallus", "Anolis carolinensis", "Xenopus tropicalis", "Danio rerio"),
    Invertebrates = c("Drosophila melanogaster", 
     "Caenorhabditis elegans")),
  clades_rotate = list(),
  scaling_factor = NULL,
  show_plot = TRUE,
  save_paths = c(tempfile(fileext = ".ggtree.pdf"), tempfile(fileext = ".ggtree.png")),
  width = 15,
  height = width,
  mc.cores = 1,
  verbose = TRUE
)

Arguments

tree

A phylogenetic tree of class phylo. If no tree is provided (NULL) a 100-way multiz tree will be imported from UCSC Genome Browser.

orth_report

An ortholog report from one or more species generated by report_orthologs.

species

Species to include in the final plot. If NULL, then all species from the given database (method) will be included (via map_species), so long as they also exist in the tree.

method

R package to use for gene mapping:

  • "gprofiler" : Slower but more species and genes.

  • "homologene" : Faster but fewer species and genes.

  • "babelgene" : Faster but fewer species and genes. Also gives consensus scores for each gene mapping based on a several different data sources.

tree_source

Can be one of the following:

  • "timetree2022":
    Import and prune the TimeTree >147k species phylogenetic tree. Can also simply type "timetree".

  • "timetree2015":
    Import and prune the TimeTree >50k species phylogenetic tree.

  • "OmaDB":
    Construct a tree from OMA (Orthologous Matrix browser) via the getTaxonomy function. NOTE: Does not contain branch lengths, and therefore may have limited utility.

  • "UCSC":
    Import and prune the UCSC 100-way alignment phylogenetic tree (hg38 version).

  • "<path>":
    Read a tree from a newick text file from a local or remote URL using read.tree.

non121_strategy

How to handle genes that don't have 1:1 mappings between input_species:output_species. Options include:

  • "drop_both_species" or "dbs" or 1 :
    Drop genes that have duplicate mappings in either the input_species or output_species
    (DEFAULT).

  • "drop_input_species" or "dis" or 2 :
    Only drop genes that have duplicate mappings in the input_species.

  • "drop_output_species" or "dos" or 3 :
    Only drop genes that have duplicate mappings in the output_species.

  • "keep_both_species" or "kbs" or 4 :
    Keep all genes regardless of whether they have duplicate mappings in either species.

  • "keep_popular" or "kp" or 5 :
    Return only the most "popular" interspecies ortholog mappings. This procedure tends to yield a greater number of returned genes but at the cost of many of them not being true biological 1:1 orthologs.

  • "sum","mean","median","min" or "max" :
    When gene_df is a matrix and gene_output="rownames", these options will aggregate many-to-one gene mappings (input_species-to-output_species) after dropping any duplicate genes in the output_species.

reference_species

Reference species.

clades

A named list of clades each containing a character vector of species used to define the respective clade using MRCA.

clades_rotate

A list of clades to rotate (via rotate), each containing a character vector of species used to define the respective clade using MRCA.

scaling_factor

How much to scale y-axis parameters (e.g. offset) by.

show_plot

Whether to print the final tree plot.

save_paths

Paths to save plot to.

width

Saved plot width.

height

Saved plot height.

mc.cores

Number of cores to parallelise different steps with.

verbose

Print messages.

Value

A list containing:

  • plot : Annotated ggtree object.

  • tree : The pruned, standardised phylogenetic tree used in the plot.

  • orth_report : Ortholog reports for each species against the reference_species.

  • metadata : Metadata used in the plot, including silhouette PNG ids from phylopic.

  • clades : Metadata used for highlighting clades.

  • method : method used.

  • reference_species : reference_species used.

  • save_paths : save_paths to plot.

Source

ggtree tutorial

Examples

orthotree <- plot_orthotree(species = c("human","monkey","mouse"))

neurogenomics/orthogene documentation built on Jan. 30, 2024, 4:44 a.m.