To perform a test of Physcraper functionality, we pruned the Gottlieb et al. tree down ~20%, excluding the outgroups, corresponding to 9 tips. We then performed a Physcraper run to test if we would recover the dropped tips. The input alignment will NOT be automatically trimmed by Physcraper
We performed the following steps:
tree6577_pg_2827 <- rotl::get_study_tree(study_id = "pg_2827", tree_id = "tree6577", tip_label = "original_label") tree6577_pg_2827$tip.label
outgroup_index <- match(c("Helwingia_chinensis", "Helwingia_japonica"), tree6577_pg_2827$tip.label)
ingroup_tip.label <- tree6577_pg_2827$tip.label[-outgroup_index]
ingroup_ntip <- length(ingroup_tip.label) ingroup_ntip* 0.8
set.seed(19022021) todrop <- sample(ingroup_tip.label, ingroup_ntip-37) todrop
pruned_tree <- ape::drop.tip(phy = tree6577_pg_2827, tip = todrop) pruned_tree
ape::write.tree(phy = pruned_tree, file = "../data-raw/ilex-test/tree6577_pg_2827_pruned.tre") ape::write.nexus(pruned_tree, file = "../data-raw/ilex-test/tree6577_pg_2827_pruned.nex")
write(paste(pruned_tree$tip.label, collapse = "\n"), file = "../data-raw/ilex-test/pruned_tree_tip_labels.txt")
Available at this link https://tree.opentreeoflife.org/curator/tnrs/
ali_T1281_M2478 <- ape::read.nexus.data(file = "../data-raw/alignments/T1281-M2478.nex") names(ali_T1281_M2478)
in_ali <- !names(ali_T1281_M2478) %in% todrop ali_T1281_M2478_pruned <- ali_T1281_M2478[in_ali]
ape::write.nexus.data(x = ali_T1281_M2478_pruned, file = "../data-raw/ilex-test/alignment_T1281_M2478_pruned.nex", interleaved = FALSE) # dendropy.dataio.nexusreader.InvalidCharacterStateSymbolError: Error parsing data source 'physcraperex/data-raw/ilex-test/alignment_T1281_M2478_pruned.nex' on line 7 at column 33: Unrecognized character state symbol for state alphabet 'DNA' (DnaStateAlphabet) : '/'
I did this by hand, by looking for "/" symbol for example in "atcg/t-tct" and removing it and enclosing the adjacent bases within curly braces to indicate multistate character, "atc{gt}-tct" I found some cases with three different states, for example "atc/g/t-tct", would be "at{cgt}-tct"
physcraper_run.py -tf physcraperex/data-raw/ilex-test/tree6577_pg_2827_pruned.tre -tfs newick -a physcraperex/data-raw/ilex-test/alignment_T1281_M2478_pruned.nex -as nexus -ti physcraperex/data-raw/ilex-test/pruned_tree_tip_labels_mapped/main.json -db /absolute/path/local_blast_db -nt 8 -o physcraperex/data/ilex-test-local-1
Trying it with the otu_info.csv file from a previous run:
physcraper_run.py -tf physcraperex/data-raw/ilex-test/tree6577_pg_2827_pruned.tre -tfs newick -a physcraperex/data-raw/alignments/T1281-M2478.nex -as nexus -ti physcraperex/data/ilex-local/run_T1281-M2478/otu_info_T1281-M2478.json -db /absolute/path/local_blast_db -nt 8 -o physcraperex/data/ilex-test-local
physcraper_run.py -tf physcraperex/data-raw/ilex-test/tree6577_pg_2827_pruned.tre -tfs newick -a physcraperex/data-raw/alignments/T1281-M2478.nex -as nexus -ti physcraperex/data-raw/ilex-test/pruned_tree_tip_labels_mapped/main.json -o physcraperex/data/ilex-test-remote
otu_info_T1281_M2478 <- readr::read_delim("../data/ilex-test-local-1/inputs_alignment_T1281_M2478_pruned/otu_info_alignment_T1281_M2478_pruned.csv", "\t", escape_double = FALSE, trim_ws = TRUE) DT::datatable(otu_info_T1281_M2478)
# Verify that all tips in input tree are in alignment sum(!otu_info_T1281_M2478$`^physcraper:in_current_aln`)
otu_info_alignment_T1281_M2478_pruned <- readr::read_delim("../data/ilex-test-local-1/outputs_alignment_T1281_M2478_pruned/otu_info_alignment_T1281_M2478_pruned.csv", "\t", escape_double = FALSE, trim_ws = TRUE) DT::datatable(otu_info_alignment_T1281_M2478_pruned)
todrop_tnrs <- rotl::tnrs_match_names(names = todrop)
# Using the OTT id: todrop_tnrs$ott_id %in% otu_info_alignment_T1281_M2478_pruned$`^ot:ottId` # Using the unique taxon name: todrop_tnrs$unique_name %in% otu_info_alignment_T1281_M2478_pruned$`^ot:ottTaxonName` # Getting the taxa in original tree that are not in the updated tree notinupdated <- todrop_tnrs$unique_name[!todrop_tnrs$ott_id %in% otu_info_alignment_T1281_M2478_pruned$`^ot:ottId`]
The following taxa were in original tree but were not recovered with this Physcraper run:
r paste0("- _", notinupdated, "_", collapse = "\n ")
We looked for the NCBI GenBank accession numbers reported in the supplementary material of the original publication for
the ITS sequences of each of the missed taxa,
Ilex warburgii, I. dimorphophylla and I. percoriacea. The first too have updated the originally reported accession numbers from U92600/U92601 and U92592/U92593 to a single accession number AH007153 and
AH007149, respectively.
For I. percoriacea, the NCBI accession number reported in the priginal alignment is AH007156, and it remains the same (as of r format(Sys.time(), "%a %b %d, %Y")
).
Inspecting the sequences, all three contain a middle gap of 100 undetermined nucleotide bases (Ns). These Ns were not present in these sequences in the original alignment. Physcraper drops these sequence because they are unalignable without any manual curation.
Note: updated_taxonname.tree and labelled_tag.tre are exactly the same tree
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.