geographic_acf: Phylogenetic autocorrelation function of geographic...


geographic_acf    R Documentation

Phylogenetic autocorrelation function of geographic locations.

Description

Given a rooted phylogenetic tree and geographic coordinates (latitudes & longitudes) of each tip, calculate the phylogenetic autocorrelation function (ACF) of the geographic locations. The ACF is a function of phylogenetic distance x, i.e., ACF(x) is the autocorrelation between two tip locations conditioned on the tips having phylogenetic ("patristic") distance x.

Usage

geographic_acf( trees,
                tip_latitudes,
                tip_longitudes,
                Npairs              = 10000,
                Nbins               = NULL,
                min_phylodistance   = 0,
                max_phylodistance   = NULL,
                uniform_grid        = FALSE,
                phylodistance_grid  = NULL)

Arguments

trees

Either a single rooted tree of class "phylo", or a list of multiple such trees.

tip_latitudes

Either a numeric vector of size Ntips (if trees is a single tree), specifying the latitudes (decimal degrees) of the tree's tips, or a list of such numeric vectors (if trees contains multiple trees) specifying the latitudes of each tree's tips. Note that tip_latitudes[[k]][i] must correspond to the i-th tip in the k-th input tree, i.e. as listed in trees[[k]]$tip.label. By convention, positive latitudes correspond to the northern hemisphere.

tip_longitudes

Similar to tip_latitudes, but listing the longitudes (decimal degrees) of each tip in each input tree. By convention, positive longitudes correspond to the hemisphere East of the prime meridian.

Npairs

Maximum number of random tip pairs to draw from each tree. A greater number of tip pairs will improve the accuracy of the estimated ACF within each distance bin. If Npairs is lower than the number of tip pairs in a tree, tip pairs are drawn randomly with replacement. If Npairs=Inf, then every tip pair of every tree is included exactly once (this is recommended for small and moderately sized trees).

Nbins

Number of phylogenetic distance bins to consider. A greater number of bins will increase the resolution of the ACF as a function of phylogenetic distance, but will decrease the number of tip pairs falling within each bin (which reduces the accuracy of the estimated ACF). If NULL, then Nbins is chosen automatically based on the size of the input trees.

min_phylodistance

Numeric, minimum phylogenetic distance to consider. Only relevant if phylodistance_grid is NULL.

max_phylodistance

Numeric, optional maximum phylogenetic distance to consider. If NULL, this is automatically set to the maximum phylodistance between any two tips.

uniform_grid

Logical, specifying whether the phylodistance grid should be uniform, i.e., with equally sized phylodistance bins. If FALSE, then the grid is chosen non-uniformly (i.e., bins generally differ in size) such that each bin contains roughly the same number of tip pairs. Only relevant if phylodistance_grid is NULL. It is generally recommended to keep uniform_grid=FALSE, to avoid uneven estimation errors across bins.

phylodistance_grid

Numeric vector, optional explicitly specified phylodistance bins (left boundaries thereof) on which to evaluate the ACF. Must contain non-negative numbers in strictly ascending order. Hence, the first bin will range from phylodistance_grid[1] to phylodistance_grid[2], while the last bin will range from tail(phylodistance_grid,1) to max_phylodistance. Can be used as an alternative to Nbins. If non-NULL, then Nbins, min_phylodistance and uniform_grid are irrelevant.
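
As an illustration of how an explicit grid of left boundaries maps onto bins, the following base R sketch derives the left boundary, right boundary and center of each bin from a phylodistance_grid and a max_phylodistance. The numeric values are hypothetical, and the code only mirrors the description above; it is not the package's internal implementation.

```r
# Hypothetical explicit grid of left bin boundaries
# (non-negative, strictly ascending), chosen only for illustration
phylodistance_grid <- c(0, 0.5, 1, 2, 4)
max_phylodistance  <- 8   # right boundary of the last bin

# Each bin runs from one left boundary to the next;
# the last bin ends at max_phylodistance
left_boundaries  <- phylodistance_grid
right_boundaries <- c(phylodistance_grid[-1], max_phylodistance)
bin_centers      <- 0.5 * (left_boundaries + right_boundaries)

bins <- data.frame(left   = left_boundaries,
                   right  = right_boundaries,
                   center = bin_centers)
print(bins)
```

With such a grid, the number of bins equals the length of phylodistance_grid, which is why Nbins becomes irrelevant.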

Details

The autocorrelation between random geographic locations is defined as the expectation of <X,Y>, where <> is the scalar product and X and Y are the unit vectors pointing towards the two random locations on the sphere. For comparison, for a spherical Brownian Motion model with constant diffusivity D and radius r the autocorrelation function is given by ACF(t)=e^{-2Dt/r^2} (see e.g. simulate_sbm). Note that this function assumes that Earth is a perfect sphere.
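
The definition above can be made concrete with a small base R sketch: coordinates are converted to unit vectors and their scalar product is taken for one pair of locations. The helper names (latlon_to_unit_vector, pair_scalar_product, sbm_acf) are hypothetical and not part of the package; the last function simply evaluates the spherical Brownian Motion formula quoted above.

```r
# Convert latitude/longitude (decimal degrees) to a unit vector on the sphere
latlon_to_unit_vector <- function(latitude, longitude) {
  lat <- latitude * pi / 180
  lon <- longitude * pi / 180
  c(cos(lat) * cos(lon), cos(lat) * sin(lon), sin(lat))
}

# Scalar product <X,Y> for a single pair of locations;
# the autocorrelation is the expectation of this quantity over pairs
pair_scalar_product <- function(lat1, lon1, lat2, lon2) {
  sum(latlon_to_unit_vector(lat1, lon1) * latlon_to_unit_vector(lat2, lon2))
}

# ACF expected under spherical Brownian Motion: exp(-2*D*t/r^2)
sbm_acf <- function(t, diffusivity, radius = 1) {
  exp(-2 * diffusivity * t / radius^2)
}

pair_scalar_product(0, 0, 0, 90)     # perpendicular directions: close to 0
pair_scalar_product(10, 20, 10, 20)  # identical locations: 1 up to rounding
pair_scalar_product(90, 0, -90, 0)   # antipodal locations: -1 up to rounding
sbm_acf(c(0, 1, 5), diffusivity = 0.1)
```

The scalar product ranges from -1 (antipodal locations) to 1 (identical locations), so an ACF near 1 indicates that tips at that phylogenetic distance tend to be geographically close.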

The phylogenetic autocorrelation function (ACF) of the geographic distribution of species can give insight into the dispersal processes shaping species distributions over global scales. An ACF that decays slowly with increasing phylogenetic distance indicates a strong phylogenetic conservatism of the location and thus slow dispersal, whereas a rapidly decaying ACF indicates weak phylogenetic conservatism and thus fast dispersal. Similarly, if the mean distance between two random tips increases with phylogenetic distance, this indicates a phylogenetic autocorrelation of species locations. Here, phylogenetic distance between tips refers to their patristic distance, i.e. the minimum cumulative edge length required to connect the two tips.

Since the phylogenetic distances between all possible tip pairs do not cover a continuum (as there is only a finite number of tips), this function randomly draws tip pairs from the tree, maps them onto a finite set of phylodistance bins and then estimates the ACF for the centroid of each bin based on tip pairs in that bin. In practice, as a next step one would usually plot the estimated ACF (returned vector autocorrelations) over the centroids of the phylodistance bins (returned vector phylodistances). When multiple trees are provided as input, then the ACF is first calculated separately for each tree, and then averaged across trees (weighted by the number of tip pairs included from each tree in each bin).
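
The binning step described above can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the package's actual implementation: the per-pair distances and scalar products are synthetic, and all variable names are chosen to echo the returned elements rather than taken from the source code.

```r
set.seed(1)

# Hypothetical inputs: for each sampled tip pair, its patristic distance and
# the scalar product of the two tips' unit vectors (synthetic values)
Npairs            <- 1000
pair_distances    <- runif(Npairs, min = 0, max = 10)
pair_dot_products <- exp(-0.2 * pair_distances) + rnorm(Npairs, sd = 0.05)

# Uniform grid of Nbins bins between 0 and the largest observed distance
Nbins      <- 5
boundaries <- seq(0, max(pair_distances), length.out = Nbins + 1)

# Assign each pair to a bin; rightmost.closed keeps the largest
# distance inside the last bin
pair_bins <- findInterval(pair_distances, boundaries, rightmost.closed = TRUE)
pair_bins <- factor(pair_bins, levels = seq_len(Nbins))

# Per-bin estimate: mean scalar product, plus the number of pairs per bin
autocorrelations    <- as.numeric(tapply(pair_dot_products, pair_bins, mean))
Npairs_per_distance <- as.integer(table(pair_bins))
phylodistances      <- 0.5 * (head(boundaries, -1) + tail(boundaries, -1))

# Non-uniform alternative: quantile-based boundaries give
# roughly the same number of pairs in every bin
equal_count_boundaries <- quantile(pair_distances,
                                   probs = seq(0, 1, length.out = Nbins + 1))
```

The quantile-based boundaries at the end correspond in spirit to a non-uniform grid, where bin widths adapt so that estimation error is comparable across bins.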

Phylogenetic distance bins can be specified in two alternative ways: Either a set of bins (phylodistance grid) is automatically calculated based on the provided Nbins, min_phylodistance, max_phylodistance and uniform_grid, or a phylodistance grid is explicitly provided via phylodistance_grid and max_phylodistance.

The trees may include multi-furcations (i.e. nodes with more than 2 children) as well as mono-furcations (i.e. nodes with only one child). If edge lengths are missing from the trees, then every edge is assumed to have length 1. The input trees must be rooted at some node for technical reasons (see function root_at_node), but the choice of the root node does not influence the result.

This function assumes that each tip is assigned exactly one geographic location. This might be problematic in situations where each tip covers multiple geographic locations, for example if tips are species and multiple individuals were sampled from each species. In that case, one might consider representing each individual as a separate tip in the tree, so that each tip has exactly one geographic location.

Value

A list with the following elements:

success

Logical, indicating whether the calculation was successful. If FALSE, an additional element error (character) is returned that provides a brief description of the error that occurred; in that case all other return values may be undefined.

phylodistances

Numeric vector of size Nbins, storing the center of each phylodistance bin in increasing order. This is equal to 0.5*(left_phylodistances+right_phylodistances). Typically, you will want to plot autocorrelations over phylodistances.

left_phylodistances

Numeric vector of size Nbins, storing the left boundary of each phylodistance bin in increasing order.

right_phylodistances

Numeric vector of size Nbins, storing the right boundary of each phylodistance bin in increasing order.

autocorrelations

Numeric vector of size Nbins, storing the estimated geographic autocorrelation for each phylodistance bin.

std_autocorrelations

Numeric vector of size Nbins, storing the standard deviation of geographic autocorrelations encountered in each phylodistance bin. Note that this is not the standard error of the estimated ACF; it is a measure for how different the geographic locations are between tip pairs within each phylodistance bin.

mean_geodistances

Numeric vector of size Nbins, storing the mean geographic distance between tip pairs in each distance bin, in units of sphere radii. If you want geographic distances in km, you need to multiply these by Earth's mean radius in km (about 6371). If multiple input trees were provided, this is the average across all trees, weighted by the number of tip pairs included from each tree in each bin.

std_geodistances

Numeric vector of size Nbins, storing the standard deviation of geographic distances between tip pairs in each distance bin, in units of sphere radii.

Npairs_per_distance

Integer vector of size Nbins, storing the number of random tip pairs associated with each distance bin.
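
A short sketch of how the returned list might be handled: check success, convert mean geographic distances from sphere radii to km, and keep only bins backed by enough tip pairs. The list below is a hand-made stand-in with made-up values, shaped like the elements described above, and the threshold of 300 pairs is an arbitrary assumption.

```r
# Hypothetical result, shaped like the list described above (values are made up)
ACF <- list(success             = TRUE,
            phylodistances      = c(0.5, 1.5, 2.5),
            autocorrelations    = c(0.9, 0.7, 0.5),
            mean_geodistances   = c(0.2, 0.6, 1.0),
            Npairs_per_distance = c(400L, 350L, 250L))

earth_radius_km <- 6371  # Earth's mean radius in km

# On failure, the element 'error' describes what went wrong
if (!ACF$success) {
  stop("geographic_acf failed: ", ACF$error)
}

# Convert mean geographic distances from sphere radii to km
mean_geodistances_km <- ACF$mean_geodistances * earth_radius_km

# Keep only bins backed by enough tip pairs (arbitrary threshold)
well_sampled  <- ACF$Npairs_per_distance >= 300
summary_table <- data.frame(
  phylodistance       = ACF$phylodistances[well_sampled],
  autocorrelation     = ACF$autocorrelations[well_sampled],
  mean_geodistance_km = mean_geodistances_km[well_sampled])
print(summary_table)
```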

Author(s)

Stilianos Louca

See Also

consentrait_depth, get_trait_acf

Examples

# load the castor package, which provides the functions used below
library(castor)

# generate a random tree
tree = generate_random_tree(list(birth_rate_intercept=1),max_tips=1000)$tree

# simulate spherical Brownian Motion on the tree
simul = simulate_sbm(tree, radius=1, diffusivity=0.1)
tip_latitudes  = simul$tip_latitudes
tip_longitudes = simul$tip_longitudes

# calculate geographical autocorrelation function
ACF = geographic_acf(tree, 
                     tip_latitudes, 
                     tip_longitudes,
                     Nbins        = 10,
                     uniform_grid = TRUE)

# plot ACF (autocorrelation vs phylogenetic distance)
plot(ACF$phylodistances, ACF$autocorrelations, type="l", xlab="distance", ylab="ACF")

castor documentation built on Aug. 18, 2023, 1:07 a.m.