Palaeobiology demo: disparity-through-time and within groups

This demo aims to give quick overview of the dispRity package (v.r version_release) for palaeobiology analyses of disparity, including disparity through time analyses.

This demo showcases a typical disparity-through-time analysis: we are going to test whether the disparity changed through time in a subset of eutherian mammals from the last 100 million years using a dataset from @beckancient2014.

Before starting

The morphospace

In this example, we are going to use a subset of the data from @beckancient2014. See the example data description for more details. Briefly, this dataset contains an ordinated matrix of the Gower distance between 50 mammals based (BeckLee_mat50), another matrix of the same 50 mammals and the estimated discrete data characters of their descendants (thus 50 + 49 rows, BeckLee_mat99), a dataframe containing the ages of each taxon in the dataset (BeckLee_ages) and finally a phylogenetic tree with the relationships among the 50 mammals (BeckLee_tree). The ordinated matrix will represent our full morphospace, i.e. all the mammalian morphologies that ever existed through time (for this dataset).

## Loading demo and the package data
library(dispRity)

## Setting the random seed for repeatability
set.seed(123)

## Loading the ordinated matrix/morphospace:
data(BeckLee_mat50)
data(BeckLee_mat99)
head(BeckLee_mat50[,1:5])

dim(BeckLee_mat50)
## The morphospace contains 50 taxa and has 48 dimensions (or axes)

## Showing a list of first and last occurrences data for some fossils
data(BeckLee_ages)
head(BeckLee_ages)

## Plotting a phylogeny
data(BeckLee_tree)
plot(BeckLee_tree, cex = 0.7)
axisPhylo(root = 140)

You can have an even nicer looking tree if you use the strap package!

if(!require(strap)) install.packages("strap")
strap::geoscalePhylo(BeckLee_tree, cex.tip = 0.7, cex.ts = 0.6)

Setting up your own data

I greatly encourage you to follow along this tutorial with your very own data: it is more exciting and, ultimately, that's probably your objective.

What data can I use?

You can use any type of morphospace in any dataset form ("matrix", "data.frame"). Throughout this tutorial, you we assume you are using the (loose) morphospace definition from @Guillerme2020: any matrix were columns are traits and rows are observations (in a distance matrix, columns are still trait, i.e. "distance to species A", etc.). We won't cover it here but you can also use lists of matrices and list of trees.

How should I format my data for this tutorial?

To go through this tutorial you will need:

If you are missing any of these, fear not, here are a couple of functions to simulate the missing data, it will surely make your results look funky but it'll let you go through the tutorial.

WARNING: the data generated by the functions i.need.a.matrix, i.need.a.tree, i.need.node.data and i.need.FADLAD are used to SIMULATE data for this tutorial. This is not to be used for publications or analysing real data! If you need a data matrix, a phylogenetic tree or FADLAD data, (i.need.a.matrix, i.need.a.tree and i.need.FADLAD), you will actually need to collect data from the literature or the field! If you need node data, you will need to use ancestral states estimations (e.g. using estimate_ancestral_states from the Claddis package).

## Functions to get simulate a PCO looking like matrix from a tree
i.need.a.matrix <- function(tree) {
    matrix <- space.maker(elements = Ntip(tree), dimensions = Ntip(tree), distribution = rnorm,
                          scree = rev(cumsum(rep(1/Ntip(tree), Ntip(tree)))))
    rownames(matrix) <- tree$tip.label
    return(matrix)
}

## Function to simulate a tree
i.need.a.tree <- function(matrix) {
    tree <- rtree(nrow(matrix))
    tree$root.time <- max(tree.age(tree)$age)
    tree$tip.label <- rownames(matrix)
    tree$node.label <- paste0("n", 1:(nrow(matrix)-1))
    return(tree)
}

## Function to simulate some "node" data
i.need.node.data <- function(matrix, tree) {
    matrix_node <- space.maker(elements = Nnode(tree), dimensions = ncol(matrix),
                               distribution = rnorm, scree = apply(matrix, 2, var))
    if(!is.null(tree$node.label)) {
        rownames(matrix_node) <- tree$node.label
    } else {
        rownames(matrix_node) <- paste0("n", 1:(nrow(matrix)-1))
    }
    return(rbind(matrix, matrix_node))
}

## Function to simulate some "FADLAD" data
i.need.FADLAD <- function(tree) {
    tree_ages <- tree.age(tree)[1:Ntip(tree),]
    return(data.frame(FAD = tree_ages[,1], LAD = tree_ages[,1], row.names = tree_ages[,2]))
}

You can use these functions for the generating the data you need. For example

## Aaaaah I don't have FADLAD data!
my_FADLAD <- i.need.FADLAD(tree)
## Sorted.

In the end this is what your data should be named to facilitate the rest of this tutorial (fill in yours here):

## A matrix with tip data
my_matrix <- BeckLee_mat50

## A phylogenetic tree 
my_tree <- BeckLee_tree

## A matrix with tip and node data
my_tip_node_matrix <- BeckLee_mat99

## A table of first and last occurrences data (FADLAD)
my_fadlad <- BeckLee_ages

A disparity-through-time analysis

Splitting the morphospace through time

One of the crucial steps in disparity-through-time analysis is to split the full morphospace into smaller time subsets that contain the total number of morphologies at certain points in time (time-slicing) or during certain periods in time (time-binning). Basically, the full morphospace represents the total number of morphologies across all time and will be greater than any of the time subsets of the morphospace.

The dispRity package provides a chrono.subsets function that allows users to split the morphospace into time slices (using method = continuous) or into time bins (using method = discrete). In this example, we are going to split the morphospace into five equal time bins of 20 million years long from 100 million years ago to the present. We will also provide to the function a table containing the first and last occurrences dates for some fossils to take into account that some fossils might occur in several of our different time bins.

## Creating the vector of time bins ages
time_bins <- rev(seq(from = 0, to = 100, by = 20))

## Splitting the morphospace using the chrono.subsets function
binned_morphospace <- chrono.subsets(data = my_matrix, tree = my_tree,
    method = "discrete", time = time_bins, inc.nodes = FALSE,
    FADLAD = my_fadlad)

The output object is a dispRity object (see more about that here. In brief, dispRity objects are lists of different elements (i.e. disparity results, morphospace time subsets, morphospace attributes, etc.) that display only a summary of the object when calling the object to avoiding filling the R console with superfluous output. It also allows easy plotting/summarising/analysing for repeatability down the line but we will not go into this right now.

## Printing the class of the object
class(binned_morphospace)

## Printing the content of the object
str(binned_morphospace)
names(binned_morphospace)

## Printing the object as a dispRity class
binned_morphospace

These objects will gradually contain more information when completing the following steps in the disparity-through-time analysis.

Bootstrapping the data

Once we obtain our different time subsets, we can bootstrap and rarefy them (i.e. pseudo-replicating the data). The bootstrapping allows us to make each subset more robust to outliers and the rarefaction allows us to compare subsets with the same number of taxa to remove sampling biases (i.e. more taxa in one subset than the others). The boot.matrix function bootstraps the dispRity object and the rarefaction option within performs rarefaction.

## Getting the minimum number of rows (i.e. taxa) in the time subsets
minimum_size <- min(size.subsets(binned_morphospace))

## Bootstrapping each time subset 100 times and rarefying them 
rare_bin_morphospace <- boot.matrix(binned_morphospace, bootstraps = 100,
    rarefaction = minimum_size)

Note how information is adding up to the dispRity object.

Calculating disparity

We can now calculate the disparity within each time subsets along with some confidence intervals generated by the pseudoreplication step above (bootstraps/rarefaction). Disparity can be calculated in many ways and this package allows users to come up with their own disparity metrics. For more details, please refer to the dispRity metric section (or directly use moms).

In this example, we are going to look at how the spread of the data in the morphospace through time. For that we are going to use the sum of the variance from each dimension of the morphospace in the morphospace. We highly recommend using a metric that makes sense for your specific analysis and for your specific dataset and not just because everyone uses it [@moms, @Guillerme2020]!

How can I be sure that the metric is the most appropriate for my morphospace and question?

This is not a straightforward question but you can use the test.metric function to check your assumptions (more details here): basically what test.metric does is modifying your morphospace using a null process of interest (e.g. changes in size) and checks whether your metric does indeed pick up that change. For example here, let see if the sum of variances picks up changes in size but not random changes:

my_test <- test.metric(my_matrix, metric = c(sum, dispRity::variances), shifts = c("random", "size"))
summary(my_test)
plot(my_test)

We see that changes in the inner size (see @moms for more details) is actually picked up by the sum of variances but not random changes or outer changes. Which is a good thing!

As you've noted, the sum of variances is defined in test.metric as c(sum, variances). This is a core bit of the dispRity package were you can define your own metric as a function or a set of functions. You can find more info about this in the dispRity metric section but in brief, the dispRity package considers metrics by their "dimensions" level which corresponds to what they output. For example, the function sum is a dimension level 1 function because no matter the input it outputs a single value (the sum), variances on the other hand is a dimension level 2 function because it will output the variance of each column in a matrix (an example of a dimensions level 3 would be the function var that outputs a matrix). The dispRity package always automatically sorts the dimensions levels: it will always run dimensions level 3 > dimensions level 2 > and dimensions level 1. In this case both c(sum, variances) and c(variances, sum) will result in actually running sum(variances(matrix)).

Anyways, let's calculate the sum of variances on our bootstrapped and rarefied morphospaces:

## Calculating disparity for the bootstrapped and rarefied data
disparity <- dispRity(rare_bin_morphospace , metric = c(sum, dispRity::variances))

To display the actual calculated scores, we need to summarise the disparity object using the S3 method summary that is applied to a dispRity object (see ?summary.dispRity for more details). By the way, as for any R package, you can refer to the help files for each individual function for more details.

## Summarising the disparity results
summary(disparity)

The summary.dispRity function comes with many options on which values to calculate (central tendency and quantiles) and on how many digits to display. Refer to the function's manual for more details.

Plotting the results

It is sometimes easier to visualise the results in a plot than in a table. For that we can use the plot S3 function to plot the dispRity objects (see ?plot.dispRity for more details).

## Graphical options
quartz(width = 10, height = 5) ; par(mfrow = (c(1,2)), bty = "n")

## Plotting the bootstrapped and rarefied results
plot(disparity, type = "continuous", main = "bootstrapped results")
plot(disparity, type = "continuous", main = "rarefied results",
     rarefaction = minimum_size)

Nice. The curves look pretty similar.

Same as for the summary.dispRity function, check out the plot.dispRity manual for the many, many options available.

Testing differences

Finally, to draw some valid conclusions from these results, we can apply some statistical tests. We can test, for example, if mammalian disparity changed significantly through time over the last 100 million years. To do so, we can compare the means of each time-bin in a sequential manner to see whether the disparity in bin n is equal to the disparity in bin n+1, and whether this is in turn equal to the disparity in bin n+2, etc. Because our data is temporally autocorrelated (i.e. what happens in bin n+1 depends on what happened in bin n) and pseudoreplicated (i.e. each bootstrap draw creates non-independent time subsets because they are all based on the same time subsets), we apply a non-parametric mean comparison: the wilcox.test. Also, we need to apply a p-value correction (e.g. Bonferroni correction) to correct for multiple testing (see ?p.adjust for more details).

## Testing the differences between bins in the bootstrapped dataset.
test.dispRity(disparity, test = wilcox.test, comparison = "sequential",
    correction = "bonferroni")

## Testing the differences between bins in the rarefied dataset.
test.dispRity(disparity, test = wilcox.test, comparison = "sequential",
    correction = "bonferroni", rarefaction  = minimum_size)

Here our results show significant changes in disparity through time between all time bins (all p-values < 0.05). However, when looking at the rarefied results, there is no significant difference between the time bins in the Palaeogene (60-40 to 40-20 Mya), suggesting that the differences detected in the first test might just be due to the differences in number of taxa sampled (13 or 6 taxa) in each time bin.

Some more advanced stuff

The previous section detailed some of the basic functionalities in the dispRity package but of course, you can do some much more advanced analysis, here is just a list of some specific tutorials from this manual that you might be interested in:

You can even come up with your own ideas, implementations and modifications of the package: the dispRity package is a modular and collaborative package and I encourage you to contact me (guillert@tcd.e) for any ideas you have about adding new features to the package (whether you have them already implemented or not)!



TGuillerme/dispRity documentation built on Dec. 21, 2024, 4:05 a.m.