knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.align = "center", fig.height = 5, fig.width = 7 )
In this vignette we provide a set of best practices for sensitivity analyses that should be taken into consideration when conducting data formatting and model fitting in the DAISIE and DAISIEprep framework. Sensitivity analysis is the testing of how a model's output responds to variations in the model's input. A model is considered sensitive if its output changes substantially in response to relatively small perturbations in the input data. These small perturbations may be due to uncertainty in the data (e.g. a posterior distribution of possible branching times in a phylogeny), measurement error, or other factors. In the case we are interested in, these perturbations are variation in island colonisation and branching times, endemicity status on the island, and the number of colonisation events. Each of these can change when using multiple phylogenies from the posterior distribution of inferred phylogenies, or when using different extraction algorithms in DAISIEprep.
First we discuss the types of variation in the data and how each of these can impact parameter estimation. Another aspect of sensitivity analysis, which we will not explore in this vignette, is the sensitivity of model selection to input data variation. By this we mean that the best-fit model and the ranking of the models (by likelihood, AIC, BIC, or another metric) may change given differences in the data. This can be as important as the sensitivity of parameter estimates, and we recommend users check model selection as well as parameter estimates in their work.
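As a minimal sketch of what such a model selection check might look like, the base R code below ranks two candidate models by AIC (AIC = 2k - 2 log L) for two versions of a data set. The model names, log-likelihoods and parameter counts are hypothetical placeholders chosen for illustration; they are not results from DAISIE.

```r
# AIC = 2k - 2 * log-likelihood, where k is the number of free parameters
aic <- function(loglik, k) {
  2 * k - 2 * loglik
}

# Return the name of the model with the lowest AIC
best_model <- function(loglik, k) {
  names(which.min(aic(loglik = loglik, k = k)))
}

# Hypothetical values for illustration only (not DAISIE output)
k <- c(full = 5, reduced = 4)
loglik_data_a <- c(full = -100.0, reduced = -102.0)
loglik_data_b <- c(full = -101.0, reduced = -101.5)

best_a <- best_model(loglik = loglik_data_a, k = k)
best_b <- best_model(loglik = loglik_data_b, k = k)

# TRUE means the preferred model differs between the two data versions
selection_changed <- best_a != best_b
```

If selection_changed is TRUE, the choice of best-fit model depends on which version of the data is used, which is the kind of sensitivity described above.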
library(DAISIEprep)
The DAISIEprep R package provides the tools to extract phylogenetic community data from phylogenetic trees with the endemicity status of the species assigned to each tip in the phylogeny. However, there is no single correct method for extracting the data, and thus DAISIEprep implements several algorithms to account for variation in what would be considered appropriate assumptions for the island system of interest. The major division in how data are extracted is set by the extraction_method argument in extract_island_species() (and by extension extract_multi_island_species()), which can be either "min" for the minimum time of colonisation algorithm, or "asr" for the geographical ancestral state reconstruction algorithm. The "min" algorithm conforms to the assumptions of the DAISIE inference model (implemented in the DAISIE R package). These assumptions are:
All of these points are linked: because back-colonisation is not allowed, a species on the island cannot be endemic to the island (i.e. part of an endemic island radiation) and then migrate or expand its range away from the island. These three points mean that if the island system of interest has experienced back-colonisation, or a species in an endemic island radiation has expanded its range off the island and its island population has gone extinct (making it extant but not present on the island), the "min" algorithm will split clades into multiple colonisations. In the case that the island of interest is very remote, and species colonise and diversify but do not disperse off the island, this algorithm provides a simple model to extract the data.
However, it is clear that these common assumptions of the DAISIE model, and thus of the "min" algorithm, are violated in empirical data. Therefore the second algorithm, "asr", is implemented to remedy this. The "asr" algorithm uses the reconstructed states at each node in the phylogeny, inferring whether a species is not present on the island, non-endemic to the island, or endemic to the island. Using this information the algorithm can traverse the phylogeny back to the node where the island clade colonised the island. This algorithm overcomes the limitations of the "min" algorithm by allowing non-endemic species to be part of island clades (extracting them as endemic clades for the purposes of applying the DAISIE inference model), and additionally allowing species that are not present on the island to be included in the data when embedded within an endemic island clade (this feature is turned on or off with the include_not_present argument in extract_island_species()). Therefore, the "asr" algorithm has benefits when the focal island system has experienced some species movement from the island to other regions. However, it is not without limitations. Ancestral state reconstruction models should be interpreted with caution, and uncertainty in a species' geographic range deep in the past, near the root of the tree, is often high, leading to variability in the interpretation of whether a species was present on the island at the time. The formulation of the ancestral state reconstruction model is also important, with the transition matrix between states crucial to plausible results. By default we use a symmetrical transition structure where species go from not present on the island, to non-endemic, and then to endemic, without jumps from not present to endemic and vice versa. This is in line with the reasoning in the DAISIE model that species that colonise the island do not migrate their entire mainland population, instead going through a widespread range before becoming endemic via cladogenesis or anagenesis on the island, or extinction of the mainland population.
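The default transition structure described above can be pictured as a rate index matrix. The following base R sketch is illustrative only: the state labels, and the convention of using 0 for a disallowed transition and a shared positive index for each pair of reversible transitions, are assumptions made for this example and are not necessarily the encoding used internally by DAISIEprep.

```r
# Hypothetical labels for the three geographic states
states <- c("not_present", "nonendemic", "endemic")

# 0 = transition not allowed; a positive index labels a rate parameter
transition <- matrix(
  0,
  nrow = length(states),
  ncol = length(states),
  dimnames = list(from = states, to = states)
)

# Allow only steps between neighbouring states; the forward and backward
# step of each pair share one rate index, which makes the matrix symmetric
transition["not_present", "nonendemic"] <- 1
transition["nonendemic", "not_present"] <- 1
transition["nonendemic", "endemic"] <- 2
transition["endemic", "nonendemic"] <- 2

# Check the two properties named in the text: symmetry, and no direct
# jumps between not present and endemic
is_symmetric <- isSymmetric(unname(transition))
no_direct_jump <- transition["not_present", "endemic"] == 0 &&
  transition["endemic", "not_present"] == 0
```

Under this encoding a lineage that is not present on the island can only become endemic by first passing through the non-endemic state.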
In this vignette we demonstrate the sensitivity of the parameters estimated by the DAISIE maximum likelihood inference model to changes in the algorithm used to extract the data. We apply the "min" and "asr" algorithms, and within "asr" we apply two different models of ancestral state reconstruction: parsimony and a continuous-time Markov model (Mk model). Traditionally, these have been two of the most common methods for reconstructing ancestral states; for other methods to reconstruct ancestral ranges see the Extending_asr vignette in the DAISIEprep package.
The data we use for this example are the macro-phylogeny of mammals [@upham_inferring_2019] and the island endemicity data of Madagascar [@michielsen_macroevolutionary_2023]. The mammal phylogeny is a global phylogeny containing most mammal species and the Madagascar checklist is the most up-to-date catalogue of Madagascar's mammal fauna. The phylogeny is constructed from genetic sequences, which gives the DNA-only phylogeny. Species that are known but for which genetic data are unavailable are inserted into the tree using a polytomy resolving technique, which produces the complete phylogeny. We test the sensitivity of estimates for both the DNA-only and complete phylogenies.
The results presented in this vignette are not computed each time the vignette is rendered due to the large computation time required. Instead, the analyses were run on a cluster computer and saved in the package. The analysis script run to produce the results can be found in the DAISIEprepExtra package here.
The sensitivity analysis uses the sensitivity() function in the DAISIEprep package.
The sensitivity() function creates a table of all possible combinations of data extraction settings, given the input arguments, and this forms our parameter space for the sensitivity analysis. The phylogenetic trees and island endemicity data are provided to carry out the extraction and formatting. DAISIEprep uses the phylo4d class from the phylobase R package. However, for the sensitivity() function a phylo object can be provided and all the housekeeping is taken care of inside the function. The sensitivity() function loops through each parameter setting, extracts and formats the data, and fits the DAISIE maximum likelihood inference model (DAISIE::DAISIE_ML_CS()) to the data.
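A table of settings of this kind could be sketched in base R with expand.grid(). The column names and values below are hypothetical and simply mirror the three extraction settings compared in this vignette; they are not guaranteed to match the arguments or the internal table of sensitivity().

```r
# All combinations of the (hypothetical) extraction settings
settings <- expand.grid(
  extraction_method = c("min", "asr"),
  asr_method = c("parsimony", "mk"),
  stringsAsFactors = FALSE
)

# The reconstruction model only matters for the "asr" algorithm, so it is
# set to NA for "min" and the resulting duplicate rows are dropped
settings$asr_method[settings$extraction_method == "min"] <- NA
settings <- unique(settings)
rownames(settings) <- NULL

# Each remaining row is one point in the parameter space to loop over
n_settings <- nrow(settings)
```

Dropping the redundant rows keeps the "min" algorithm from being run once per reconstruction model it does not use.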
The output produces results for the DNA-only phylogeny and the complete phylogeny. The raw data of parameter estimates for the different parameter settings is tidied into a tibble containing the data we need for both the DNA and complete phylogeny.
sensitivity_data <- DAISIEprep:::read_sensitivity()
We can plot the distribution of parameter estimates for the DNA and complete data sets.
DAISIEprep:::plot_sensitivity( sensitivity_data = sensitivity_data$sensitivity_dna )
DAISIEprep:::plot_sensitivity( sensitivity_data = sensitivity_data$sensitivity_complete )
Most parameters are relatively insensitive to the different phylogenies across the posterior distribution of possible trees. The noticeable outlier is colonisation rate, where the choice of extraction algorithm heavily influences the inferred rate of island colonisation. The "min" algorithm shows the highest rate of colonisation, likely due to breaking up island clades that may have undergone some back-colonisation. The smallest estimated colonisation rate comes from the "asr" algorithm using parsimony to reconstruct the geographical states in the phylogeny. This can be explained by parsimony favouring the fewest state changes (i.e. range shifts from mainland to island), which translates into fewer colonisation events and likely lumps together clades which may have independently colonised the island.
An alternative plot is to look at the pairwise comparisons of each estimated rate from the DAISIE inference model across each posterior phylogeny.
DAISIEprep:::plot_sensitivity( sensitivity_data = sensitivity_data$sensitivity_dna, pairwise_diffs = TRUE )
DAISIEprep:::plot_sensitivity( sensitivity_data = sensitivity_data$sensitivity_complete, pairwise_diffs = TRUE )
The general pattern is the same as in the density plots shown above. The rates of cladogenesis, anagenesis and extinction are largely clustered together, with little clear separation of estimates by extraction method. The exception is again colonisation rate, which shows visible clustering of rate estimates based on which extraction method is chosen.
Here we have demonstrated the variability, or lack thereof, in parameter estimates from phylogenetic data on an island community when changing the data as well as the choice of extraction algorithm.