The R package rGUIDANCE has many dependencies. Some are external of R (R Core Team 2017), some are internal.
rGUIDANCE uses third-party executables (e.g., RAxML, MAFFT). Following a list with URLs to where these programs can be downloaded. Within the following script only three are needed: MAFFT, RAxML and BLAST (dependency of another R package). For more information please refer to the guidance help page (just type ?guidance).
Functions which utilize any of these programs usually contain the argument "exec" (for executable). exec is the path to where the executable lies on the local computer. The functions in R then parse information towards the program and captures the results back into R.
If you have trouble where to find executables, please refer to guides such as: Windows: https://tiptopsecurity.com/how-to-find-the-executable-file-for-a-program/ Mac: https://www.alvinpoh.com/how-to-find-the-location-of-an-executable-program-in-the-mac-terminal/
rGUIDANCE depends on many R packages. The most important are: Rcpp, ape, ips and foreach. Rcpp allows to out-source heavy computations into C++ code via R (Eddelbuettel and Francois 2011). The score calculations involve many pairwise comparisons between alignments which is slow in R. ape is a basic R package for the "Analyses of Phylogenetics and Evolution" (Paradis, Claude, and Strimmer 2004; Paradis 2012) and was especially used for handling the alternative guide trees. ips hosts interfaces to other software, such as RAxML (ips = Interfaces to Phylogenetic Software). foreach allows easy parallelization of for loops (Microsoft and Weston 2017).
The below example involves downloading sequences from NCBI GenBank, which means that the sequences downloaded may have changed when you run the script. Therefore we put the sequences (cluster 0; cl0.fas) as datafile in the R package as an example file. We here sketched out the pipeline described in the rGUIDANCE paper. If you only want to see how guidance works on an alignment and downstream analysis then directly jump to point 2 after installing rGUIDANCE and other packages.
## Install rGUIDANCE install.packages("devtools") devtools::install_github("FranzKrah/rGUIDANCE") ## Load required R packages library(ape) library(ips) library(phytools) library(rGUIDANCE)
library(phylotaR) wd <- '[path-to-directory]' dir.create(wd) ncbi_dr <- "/usr/local/ncbi/blast/bin/" ## path to executable txid <- 44624 # Helvella ID setup(wd = wd, txid = txid, ncbi_dr = ncbi_dr, v = TRUE) parameters_reset(wd = wd, parameters = c('ncps'), values = c(1)) # here number of cores could be increased run(wd = wd) ##Cluster selection all_clusters <- read_phylota(wd) reduced <- drop_by_rank(phylota = all_clusters, rnk = 'species', n = 1) cids <- all_clusters@cids n_taxa <- get_ntaxa(phylota = all_clusters, cid = cids) # drop all the clusters with fewer than x taxa keep <- cids[n_taxa > 10] selected <- drop_clstrs(phylota = all_clusters, cid = keep) # Store first cluster as fasta file write_sqs( phylota = reduced, sid = names(get_txids( phylota = reduced, cid = "0", rnk = 'species')), sq_nm = get_tx_slot( phylota = reduced, txid = get_txids(phylota = reduced, cid = "0", rnk = 'species'), slt_nm = 'scnm'), outfile = paste(wd, "cl0.fas", sep = "/") ) # 28S rRNA
## Load sequences for the first sequence cluster ## This is a file produced with the above code, however, sequences might change. ## To keep the code reproducible, we here provide the option to use our pre-compiled file: fpath <- system.file("extdata", "cl0.fas", package="rGUIDANCE") # pre-downloaded file cl0 <- ape::read.FASTA(fpath) ## If you used the full pipeline: ## (Caution: might cause trouble downstream with this specific example because of updated GenBank): ## cl0 <- read.FASTA(paste(wd, "cl0.fas", sep ="/")) names(cl0) <- gsub(" ", "_", names(cl0)) ## Use GUIDANCE to calculate column score (CS) ## Note: You will need to change the path to where your local MAFFT is g <- guidance(cl0, ncore = 12, msa.exec = "/usr/local/bin/mafft") msa0 <- g@msa sc <- scores(g, "column_raxml", na.rm = FALSE)
## [4] Use alignment and CS for phylogeny inference ## Note: You will need to change the path to where your local RAxML is tr.w <- ips::raxml(msa0, m = "GTRGAMMA", f = "a", N = 100, p = 1234, x = 1234, exec = "/Applications/standard-RAxML-master/raxmlHPC-PTHREADS-AVX", threads = 12, weights = sc$column_raxml, outgroup = "Helvella_aestivalis") ## Note: the precompiled RAxML version did not work here tr <- ips::raxml(msa0, m = "GTRGAMMA", f = "a", N = 100, p = 1234, x = 1234, exec = "/Applications/standard-RAxML-master/raxmlHPC-PTHREADS-AVX", threads = 12, outgroup = "Helvella_aestivalis")
tr.w.ult <- ape::chronos(tr.w$bipartitions) tr.ult <- ape::chronos(tr$bipartitions)
library(phylogram) cutoff <- 70 # threshold of bootstrap value considered "significant" w <- tr.w.ult u <- tr.ult bs_w <- sum(as.numeric(w$node.label), na.rm = TRUE) bs_u <- sum(as.numeric(u$node.label), na.rm = TRUE) w <- ape::ladderize(ips::collapseUnsupportedEdges(w, cutoff = cutoff), right = FALSE) u <- ape::ladderize(ips::collapseUnsupportedEdges(u, cutoff = cutoff), right = FALSE) res_w <- round(Nnode(w)/Ntip(w) * 100) res_u <- round(Nnode(u)/Ntip(u) * 100) w$tip.label <- gsub("Helvella_", "", w$tip.label) u$tip.label <- gsub("Helvella_", "", u$tip.label) w <- phylogram::as.dendrogram(w) u <- phylogram::as.dendrogram(u) u <- dendextend::rotate(u, labels(w)) # ignore warning dndlist <- dendextend::dendlist(u, w) ## Tanglegram dendextend::tanglegram(dndlist, fast = TRUE, margin_inner = 8, lwd = 1, margin_top = 5, columns_width = c(5, 1, 5), cex_main = 1, cex_main_left = 1.5, cex_main_right = 1.5, main = "Helvella\n(saddle fungi)", main_left = paste("No CS \n Sum(BS) = ", bs_u, "\n", res_u, "% resolved"), main_right = paste("With CS\nSum(BS) = ", bs_w, "\n", res_w, "% resolved"))
Dirk Eddelbuettel and Romain Francois (2011). Rcpp: Seamless R and C++ Integration. Journal of Statistical Software, 40(8), 1-18. URL http://www.jstatsoft.org/v40/i08/.
Microsoft and Steve Weston (2017). foreach: Provides Foreach Looping Construct for R. R package version 1.4.4. https://CRAN.R-project.org/package=foreach
Paradis, Emmanuel. 2012. Analysis of Phylogenetics and Evolution with R. Second Edi. New York: Springer.
Paradis, Emmanuel, Julien Claude, and Korbinian Strimmer. 2004. “APE: analyses of phylogenetics and evolution in R language.” Bioinformatics 20: 289–90. doi:10.1093/bioinformatics/btg412.
R Core Team. 2017. “R: A Language and Environment for Statistical Computing.” Vienna, Austria: R Foundation for Statistical Computing. https://cran.r-project.org/.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.