To perform a test of Physcraper functionality, we pruned the Gottlieb et al. tree down ~20%, excluding the outgroups, corresponding to 9 tips. We then performed a Physcraper run to test if we would recover the dropped tips. The input alignment will NOT be automatically trimmed by Physcraper

We performed the following steps:

  1. Pruning the tree
  2. Matching the names
  3. Running Physcraper
  4. Checking the results
  5. Analyzing the results

Pruning the Gottlieb et al. tree

Getting the tree from the Open Tree of Life database

tree6577_pg_2827 <- rotl::get_study_tree(study_id = "pg_2827", tree_id = "tree6577", tip_label = "original_label")
tree6577_pg_2827$tip.label

Identifying the outgroups

outgroup_index <- match(c("Helwingia_chinensis", "Helwingia_japonica"), tree6577_pg_2827$tip.label)

Getting the ingroup tip labels

ingroup_tip.label <- tree6577_pg_2827$tip.label[-outgroup_index]

Calculating the number of tips to keep

ingroup_ntip <- length(ingroup_tip.label)
ingroup_ntip* 0.8

Choosing at random the ingroup tips to drop

set.seed(19022021)
todrop <- sample(ingroup_tip.label, ingroup_ntip-37)
todrop

Dropping the tips from the tree

pruned_tree <- ape::drop.tip(phy = tree6577_pg_2827, tip = todrop)
pruned_tree

Saving the tree as newick and nexus

ape::write.tree(phy = pruned_tree, file = "../data-raw/ilex-test/tree6577_pg_2827_pruned.tre")
ape::write.nexus(pruned_tree, file = "../data-raw/ilex-test/tree6577_pg_2827_pruned.nex")

Matching the tree tip labels

First generate a simple text file containing one tip label per line

write(paste(pruned_tree$tip.label, collapse = "\n"), file = "../data-raw/ilex-test/pruned_tree_tip_labels.txt")

Go to OpenTree's TNRS matching bulk tool

Available at this link https://tree.opentreeoflife.org/curator/tnrs/

Pruning the alignment

Read the alignment

ali_T1281_M2478 <- ape::read.nexus.data(file = "../data-raw/alignments/T1281-M2478.nex")
names(ali_T1281_M2478)

Prune it

in_ali <- !names(ali_T1281_M2478) %in% todrop
ali_T1281_M2478_pruned <- ali_T1281_M2478[in_ali]

Save the pruned alignment

ape::write.nexus.data(x = ali_T1281_M2478_pruned, file = "../data-raw/ilex-test/alignment_T1281_M2478_pruned.nex", interleaved = FALSE)

# dendropy.dataio.nexusreader.InvalidCharacterStateSymbolError: Error parsing data source 'physcraperex/data-raw/ilex-test/alignment_T1281_M2478_pruned.nex' on line 7 at column 33: Unrecognized character state symbol for state alphabet 'DNA' (DnaStateAlphabet) : '/'

Modify the slash added when multistate characters are present in the matrix

I did this by hand, by looking for "/" symbol for example in "atcg/t-tct" and removing it and enclosing the adjacent bases within curly braces to indicate multistate character, "atc{gt}-tct" I found some cases with three different states, for example "atc/g/t-tct", would be "at{cgt}-tct"

Running the test

Locally

physcraper_run.py -tf physcraperex/data-raw/ilex-test/tree6577_pg_2827_pruned.tre -tfs newick -a physcraperex/data-raw/ilex-test/alignment_T1281_M2478_pruned.nex -as nexus -ti physcraperex/data-raw/ilex-test/pruned_tree_tip_labels_mapped/main.json -db /absolute/path/local_blast_db -nt 8 -o physcraperex/data/ilex-test-local-1

Trying it with the otu_info.csv file from a previous run:

physcraper_run.py -tf physcraperex/data-raw/ilex-test/tree6577_pg_2827_pruned.tre -tfs newick -a physcraperex/data-raw/alignments/T1281-M2478.nex -as nexus -ti physcraperex/data/ilex-local/run_T1281-M2478/otu_info_T1281-M2478.json -db /absolute/path/local_blast_db -nt 8 -o physcraperex/data/ilex-test-local

Remotely

physcraper_run.py -tf physcraperex/data-raw/ilex-test/tree6577_pg_2827_pruned.tre -tfs newick -a physcraperex/data-raw/alignments/T1281-M2478.nex -as nexus -ti physcraperex/data-raw/ilex-test/pruned_tree_tip_labels_mapped/main.json -o physcraperex/data/ilex-test-remote

Checking the results of the test

Getting information from the OTU info table

otu_info_T1281_M2478 <- readr::read_delim("../data/ilex-test-local-1/inputs_alignment_T1281_M2478_pruned/otu_info_alignment_T1281_M2478_pruned.csv", "\t", escape_double = FALSE, trim_ws = TRUE)
DT::datatable(otu_info_T1281_M2478)
# Verify that all tips in input tree are in alignment
sum(!otu_info_T1281_M2478$`^physcraper:in_current_aln`)

Looking at the output OTU info file

otu_info_alignment_T1281_M2478_pruned <- readr::read_delim("../data/ilex-test-local-1/outputs_alignment_T1281_M2478_pruned/otu_info_alignment_T1281_M2478_pruned.csv", "\t", escape_double = FALSE, trim_ws = TRUE)
DT::datatable(otu_info_alignment_T1281_M2478_pruned)

Getting the OTT ids of the dropped taxa

todrop_tnrs <- rotl::tnrs_match_names(names = todrop)

Looking for the dropped taxa in the OTU info table

# Using the OTT id:
todrop_tnrs$ott_id %in% otu_info_alignment_T1281_M2478_pruned$`^ot:ottId`

# Using the unique taxon name:
todrop_tnrs$unique_name %in% otu_info_alignment_T1281_M2478_pruned$`^ot:ottTaxonName`

# Getting the taxa in original tree that are not in the updated tree
notinupdated <- todrop_tnrs$unique_name[!todrop_tnrs$ott_id %in% otu_info_alignment_T1281_M2478_pruned$`^ot:ottId`]

The following taxa were in original tree but were not recovered with this Physcraper run:

r paste0("- _", notinupdated, "_", collapse = "\n ")

Why does this happen?

We looked for the NCBI GenBank accession numbers reported in the supplementary material of the original publication for the ITS sequences of each of the missed taxa, Ilex warburgii, I. dimorphophylla and I. percoriacea. The first too have updated the originally reported accession numbers from U92600/U92601 and U92592/U92593 to a single accession number AH007153 and AH007149, respectively. For I. percoriacea, the NCBI accession number reported in the priginal alignment is AH007156, and it remains the same (as of r format(Sys.time(), "%a %b %d, %Y")). Inspecting the sequences, all three contain a middle gap of 100 undetermined nucleotide bases (Ns). These Ns were not present in these sequences in the original alignment. Physcraper drops these sequence because they are unalignable without any manual curation.

Note: updated_taxonname.tree and labelled_tag.tre are exactly the same tree



McTavishLab/physcraperex documentation built on April 10, 2021, 12:02 a.m.