phyloregion
with comparable packagesIn this vignette, we benchmark phyloregion
against other similar R
packages in analyses of standard alpha diversity metrics commonly
used in conservation, such as phylogenetic diversity and phylogenetic
endemism as well as metrics for analyzing compositional turnover (e.g.,
beta diversity and phylogenetic beta diversity). Specifically, we
compare phyloregion
's functions with available packages for efficiency
in memory allocation and computation speed in various biogeographic analyses.
First, load the packages for the benchmarking:
library(ape) library(Matrix) library(bench) library(ggplot2) # packages we benchmark library(phyloregion) library(betapart) library(picante) library(vegan) library(hilldiv) library(BAT) library(pez)
We will use a small data set which comes with phyloregion
. This dataset
consists of a dated phylogeny of the woody plant species of southern
Africa along with their geographical distributions. The dataset comes from a study that maps tree diversity hotspots in southern Africa [@Daru2015ddi].
data(africa) # subset matrix X_sparse <- africa$comm[1:30, ] X_sparse <- X_sparse[, colSums(X_sparse)>0] X_dense <- as.matrix(X_sparse) Xt_dense <- t(X_dense) object.size(X_sparse)
## 76504 bytes
object.size(X_dense)
## 134752 bytes
dim(X_sparse)
## [1] 30 401
To make results comparable, it is often desirable to make sure
that the taxa in different datasets match each other [@Kembel2010].
For example, the community matrix in the hilldiv
package [@hilldiv]
needs to be transposed. These transformations can influence the execution
times of the function, often only marginally.
Thus, to benchmark phyloregion
against other packages, we here use the
package bench
[@bench2020] because it returns execution times and
provides estimates of memory allocations for each computation.
phyloregion
for analysis of phylogenetic diversityFor analysis of alpha diversity commonly used in conservation such as phylogenetic
diversity - the sum of all phylogenetic branch lengths within an area [@Faith1992]
- phyloregion
is 31 to 284 times faster and 67 to 192 times
memory efficient, compared to other packages!
tree <- africa$phylo tree <- keep.tip(tree, colnames(X_sparse)) pd_picante <- function(x, tree){ res <- picante::pd(x, tree)[,1] names(res) <- row.names(x) res } pd_pez <- function(x, tree){ dat <- pez::comparative.comm(tree, x) res <- pez::.pd(dat)[,1] names(res) <- row.names(x) res } pd_hilldiv <- function(x, tree) hilldiv::index_div(x, tree, index="faith") pd_phyloregion <- function(x, tree) phyloregion::PD(x, tree) res1 <- bench::mark(picante=pd_picante(X_dense, tree), hilldiv=pd_hilldiv(Xt_dense,tree=tree), pez=pd_pez(X_dense, tree), phyloregion=pd_phyloregion(X_sparse, tree)) summary(res1)
## # A tibble: 4 × 6 ## expression min median `itr/sec` mem_alloc `gc/sec` ## <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl> ## 1 picante 97.85ms 112.45ms 8.22 59.6MB 9.87 ## 2 hilldiv 762.42ms 762.42ms 1.31 170.1MB 3.93 ## 3 pez 108.59ms 122.83ms 8.16 60.4MB 8.16 ## 4 phyloregion 2.82ms 3.23ms 268. 909.8KB 5.99
autoplot(res1)
phyloregion
for analysis of phylogenetic endemismAnother benchmark for phyloregion
is in the analysis of phylogenetic
endemism, the degree to which phylogenetic diversity is restricted to
any given area [@Rosauer2009]. Here, we found that phyloregion
is
160 times faster and 489 times efficient in memory allocation.
tree <- africa$phylo tree <- keep.tip(tree, colnames(X_sparse)) pe_pez <- function(x, tree){ dat <- pez::comparative.comm(tree, x) res <- pez::pez.endemism(dat)[,1] names(res) <- row.names(x) res } pe_phyloregion <- function(x, tree) phyloregion::phylo_endemism(x, tree) res2 <- bench::mark(pez=pe_pez(X_dense, tree), phyloregion=pe_phyloregion(X_sparse, tree)) summary(res2)
## # A tibble: 2 × 6 ## expression min median `itr/sec` mem_alloc `gc/sec` ## <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl> ## 1 pez 630.38ms 630.38ms 1.59 493.88MB 9.52 ## 2 phyloregion 2.94ms 3.19ms 280. 1.06MB 3.98
autoplot(res2)
phyloregion
for analysis of taxonomic beta diversityFor analysis of taxonomic beta diversity, which compares diversity between
communities [@Koleff2003], phyloregion
has marginal advantage
over other packages. Nonetheless, it is 1-39 times faster and allocates 2 to 110
times less memory than other packages.
chk_fun <- function(target, current) all.equal(target, current, check.attributes = FALSE) fun_phyloregion <- function(x) as.matrix(phyloregion::beta_diss(x)[[3]]) fun_betapart <- function(x) as.matrix(betapart::beta.pair(x)[[3]]) fun_vegan <- function(x) as.matrix(vegan::vegdist(x, binary=TRUE)) fun_BAT <- function(x) as.matrix(BAT::beta(x, func = "Soerensen")[[1]]) res3 <- bench::mark(phyloregion=fun_phyloregion(X_sparse), betapart=fun_betapart(X_dense), vegan=fun_vegan(X_dense), BAT=fun_BAT(X_dense), check=chk_fun) summary(res3)
## # A tibble: 4 × 6 ## expression min median `itr/sec` mem_alloc `gc/sec` ## <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl> ## 1 phyloregion 598.5µs 689µs 1210. 428.2KB 4.39 ## 2 betapart 849.4µs 909µs 1024. 594.1KB 10.3 ## 3 vegan 946.8µs 992µs 958. 1016.1KB 7.01 ## 4 BAT 44.5ms 47ms 20.9 31.8MB 8.95
autoplot(res3)
phyloregion
for analysis of phylogenetic beta diversityFor analysis of phylogenetic turnover (beta-diversity) among communities - the
proportion of shared phylogenetic branch lengths between communities [@Graham2008] - phyloregion
is 300-400 times faster and allocates 100-600 times less memory!
fun_phyloregion <- function(x, tree) phyloregion::phylobeta(x, tree)[[3]] fun_betapart <- function(x, tree) betapart::phylo.beta.pair(x, tree)[[3]] fun_picante <- function(x, tree) 1 - picante::phylosor(x, tree) fun_BAT <- function(x, tree) BAT::beta(x, tree, func = "Soerensen")[[1]] chk_fun <- function(target, current) all.equal(target, current, check.attributes = FALSE) res4 <- bench::mark(picante=fun_picante(X_dense, tree), betapart=fun_betapart(X_dense, tree), BAT=fun_BAT(X_dense, tree), phyloregion=fun_phyloregion(X_sparse, tree), check=chk_fun) summary(res4)
## # A tibble: 4 × 6 ## expression min median `itr/sec` mem_alloc `gc/sec` ## <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl> ## 1 picante 2.25s 2.25s 0.444 1.24GB 2.66 ## 2 betapart 2.05s 2.05s 0.489 1.24GB 2.93 ## 3 BAT 1.41s 1.41s 0.712 293.64MB 0.712 ## 4 phyloregion 4.21ms 4.49ms 175. 1.12MB 1.99
autoplot(res4)
Note that for this test, picante
returns a similarity matrix while
betapart
, and phyloregion
return a dissimilarity matrix.
sessionInfo()
## R version 4.2.1 (2022-06-23) ## Platform: x86_64-apple-darwin17.0 (64-bit) ## Running under: macOS Monterey 12.6 ## ## Matrix products: default ## LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib ## ## locale: ## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8 ## ## attached base packages: ## [1] stats graphics grDevices utils datasets methods base ## ## other attached packages: ## [1] pez_1.2-4 BAT_2.9.2 hilldiv_1.5.1 picante_1.8.2 ## [5] nlme_3.1-157 vegan_2.6-2 lattice_0.20-45 permute_0.9-7 ## [9] betapart_1.5.6 phyloregion_1.0.7 bench_1.1.2 Matrix_1.5-3 ## [13] ape_5.6-2 knitr_1.39 ggplot2_3.3.6 ## ## loaded via a namespace (and not attached): ## [1] utf8_1.2.2 proto_1.0.0 ks_1.13.5 ## [4] tidyselect_1.2.0 htmlwidgets_1.5.4 grid_4.2.1 ## [7] combinat_0.0-8 pROC_1.18.0 animation_2.7 ## [10] munsell_0.5.0 codetools_0.2-18 clustMixType_0.2-15 ## [13] interp_1.1-3 future_1.29.0 withr_2.5.0 ## [16] profmem_0.6.0 colorspace_2.0-3 highr_0.9 ## [19] rstudioapi_0.13 geometry_0.4.6.1 stats4_4.2.1 ## [22] ggsignif_0.6.4 listenv_0.8.0 slam_0.1-50 ## [25] FD_1.0-12.1 nls2_0.3-3 mnormt_2.1.0 ## [28] farver_2.1.1 coda_0.19-4 parallelly_1.32.1 ## [31] vctrs_0.5.0 generics_0.1.3 clusterGeneration_1.3.7 ## [34] ipred_0.9-13 xfun_0.31 itertools_0.1-3 ## [37] fastcluster_1.2.3 R6_2.5.1 doParallel_1.0.17 ## [40] ggbeeswarm_0.6.0 pdist_1.2.1 assertthat_0.2.1 ## [43] scales_1.2.0 nnet_7.3-17 beeswarm_0.4.0 ## [46] rgeos_0.5-9 gtable_0.3.0 globals_0.16.1 ## [49] caper_1.0.1 phangorn_2.9.0 MatrixModels_0.5-1 ## [52] timeDate_4021.106 rlang_1.0.6 FSA_0.9.3 ## [55] scatterplot3d_0.3-41 splines_4.2.1 rstatix_0.7.1 ## [58] rgdal_1.5-30 ModelMetrics_1.2.2.2 broom_1.0.1 ## [61] checkmate_2.1.0 reshape2_1.4.4 abind_1.4-5 ## [64] backports_1.4.1 Hmisc_4.7-1 caret_6.0-93 ## [67] tools_4.2.1 lava_1.7.0 psych_2.2.9 ## [70] lavaan_0.6-12 ellipsis_0.3.2 raster_3.5-21 ## [73] RColorBrewer_1.1-3 proxy_0.4-27 Rcpp_1.0.9 ## [76] plyr_1.8.7 base64enc_0.1-3 progress_1.2.2 ## [79] purrr_0.3.4 prettyunits_1.1.1 ggpubr_0.4.0 ## [82] rpart_4.1.16 deldir_1.0-6 pbapply_1.5-0 ## [85] deSolve_1.34 qgraph_1.9.2 cluster_2.1.3 ## [88] magrittr_2.0.3 data.table_1.14.2 SparseM_1.81 ## [91] mvtnorm_1.1-3 smoothr_0.2.2 hms_1.1.1 ## [94] evaluate_0.15 jpeg_0.1-9 mclust_6.0.0 ## [97] gridExtra_2.3 compiler_4.2.1 tibble_3.1.8 ## [100] maps_3.4.0 KernSmooth_2.23-20 crayon_1.5.1 ## [103] hypervolume_3.0.4 htmltools_0.5.3 mgcv_1.8-40 ## [106] corpcor_1.6.10 Formula_1.2-4 snow_0.4-4 ## [109] tidyr_1.2.0 expm_0.999-6 lubridate_1.8.0 ## [112] DBI_1.1.3 magic_1.6-0 subplex_1.8 ## [115] MASS_7.3-57 ade4_1.7-19 car_3.1-1 ## [118] cli_3.4.1 quadprog_1.5-8 parallel_4.2.1 ## [121] gower_1.0.0 igraph_1.3.4 pkgconfig_2.0.3 ## [124] numDeriv_2016.8-1.1 foreign_0.8-82 sp_1.5-0 ## [127] terra_1.5-21 recipes_1.0.3 foreach_1.5.2 ## [130] pbivnorm_0.6.0 predicts_0.1-3 vipor_0.4.5 ## [133] hardhat_1.2.0 prodlim_2019.11.13 stringr_1.4.0 ## [136] digest_0.6.29 pracma_2.4.2 phytools_1.0-3 ## [139] rcdd_1.5 fastmatch_1.1-3 htmlTable_2.4.1 ## [142] quantreg_5.94 gtools_3.9.3 geiger_2.0.10 ## [145] lifecycle_1.0.3 glasso_1.11 carData_3.0-5 ## [148] maptpx_1.9-7 fansi_1.0.3 pillar_1.8.0 ## [151] fastmap_1.1.0 plotrix_3.8-2 survival_3.3-1 ## [154] glue_1.6.2 fdrtool_1.2.17 png_0.1-7 ## [157] iterators_1.0.14 class_7.3-20 stringi_1.7.8 ## [160] palmerpenguins_0.1.1 doSNOW_1.0.20 latticeExtra_0.6-30 ## [163] dplyr_1.0.9 e1071_1.7-11 future.apply_1.10.0
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.