find.cohorts: Identify macroevolutionary cohorts




Perform simulated annealing to identify macroevolutionary cohorts.


find.cohorts(x, phy, kmax, control = list())



x: A vector or matrix of tip statistics. If x is a matrix, the function assumes each column is a distribution of tip statistics.


phy: A phylogeny that inherits from class phylo and class extended-phylo.


kmax: The maximum number of cohorts to fit. Must be greater than 0.


control: An optional list containing parameters for the simulated annealing algorithm. Options are:

  • T0 The initial temperature for the chain. Default is 1.0.

  • alpha The annealing schedule. The temperature after n steps is T0 * alpha^n. Default is 0.9.

  • iter Iterations per temperature. Default is 1000.

  • gmfrac Fraction of moves that represent tree-wide jumps to new nodes as opposed to neighboring node interchanges. Default is 0.1.

  • savecohorts A boolean indicating whether or not to save and return the cohort memberships. Default is TRUE. One reason this may be FALSE is if x represents a set of simulated null distributions and only the resulting scores are of interest (see below).
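As a rough illustration of the options above, the following sketch builds a control list with the documented defaults and evaluates the cooling rule T0 * alpha^n for the first few temperature levels. The names ctrl, temperature and temps are hypothetical helpers introduced only for this example, and the commented-out call assumes that x and phy already exist.

```r
# Hypothetical control list; the element names follow the options listed above
# and the values are the documented defaults.
ctrl <- list(
  T0 = 1.0,           # initial temperature
  alpha = 0.9,        # cooling factor in T0 * alpha^n
  iter = 1000,        # iterations per temperature
  gmfrac = 0.1,       # fraction of tree-wide jumps
  savecohorts = TRUE  # keep cohort memberships in the result
)

# Temperature after n cooling steps, following the rule T0 * alpha^n.
# 'temperature' is an illustrative helper, not part of the package.
temperature <- function(n, T0 = ctrl$T0, alpha = ctrl$alpha) {
  T0 * alpha^n
}

temps <- temperature(0:3)
print(temps)

# The list would then be supplied as the control argument, for example:
# res <- find.cohorts(x, phy, kmax = 5, control = ctrl)
```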


The function attempts to identify groups of tips (cohorts) with similar properties based on a set of statistics computed for each tip. Typically these statistics will have some relationship to an evolutionary rate, although any statistic will work in principle. Every internal node in a phylogeny can define a cohort that may include all or some of the tips descended from it (cohorts can be nested, so only a subset of tips may be included). For each k from 1 to kmax, the function performs simulated annealing to determine the arrangement of the k cohorts that minimizes the total within-cohort sum of squares of the tip statistic(s) x. Cohorts are defined by a set of k-1 non-root internal nodes plus the root node. The annealing chain stops once the temperature falls below the hard-coded minimum of 0.0001. The number of steps in the chain is therefore: ceiling((log(0.0001)-log(T0))/log(alpha)) * iter
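The chain length formula and the objective can be checked with a short sketch. The helpers n_steps and within_ss are hypothetical names introduced here, and within_ss assumes that "sum of squares" means squared deviations from each cohort's mean, which the text above does not state explicitly. The toy vectors x_toy and cohort_toy are made-up values for illustration only.

```r
# Number of annealing steps implied by the formula above.
# Tmin is the hard-coded minimum temperature of 0.0001.
n_steps <- function(T0 = 1.0, alpha = 0.9, iter = 1000, Tmin = 1e-4) {
  ceiling((log(Tmin) - log(T0)) / log(alpha)) * iter
}

steps_default <- n_steps()
print(steps_default)  # 88000 with the default settings

# Total within-cohort sum of squares for a single vector of tip statistics.
# Assumption: deviations are measured from each cohort's mean.
within_ss <- function(x, cohort) {
  sum(tapply(x, cohort, function(v) sum((v - mean(v))^2)))
}

# Made-up tip statistics and a made-up assignment of tips to two cohorts.
x_toy <- c(1.0, 1.2, 0.8, 5.0, 5.4, 4.6)
cohort_toy <- c(1, 1, 1, 2, 2, 2)

ss_two <- within_ss(x_toy, cohort_toy)            # two cohorts
ss_one <- within_ss(x_toy, rep(1, length(x_toy))) # a single cohort (k = 1)
print(c(ss_one, ss_two))
```

With the defaults, the temperature needs 88 cooling levels to drop below 0.0001, so the chain runs for 88 * 1000 steps. In the toy data, splitting the tips into two cohorts reduces the objective sharply, which is the kind of arrangement the annealing search is looking for.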

Note that the function does not determine the optimal number of cohorts, just the optimal arrangement (based on minimizing the total within-cohort sum of squares) given a fixed number. However, x may represent a distribution of tip statistics derived from a null model, and this provides a means of determining the optimal number using other methods such as the gap statistic.
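One way the gap statistic might be applied to such scores is sketched below. Everything here is an assumption for illustration: w_obs and w_null are made-up numbers standing in for observed and null-model scores, the layout (one row per k, one column per null replicate) is hypothetical and may not match what the function returns, and the selection rule is the standard one from the gap statistic literature rather than anything provided by this package.

```r
set.seed(1)
kmax <- 4
n_null <- 50

# Made-up observed scores: total within-cohort sum of squares for k = 1..kmax.
w_obs <- c(24.4, 0.9, 0.7, 0.6)

# Made-up null scores with small multiplicative noise.
# Assumed layout: one row per k, one column per null replicate.
w_null <- matrix(
  rep(c(25, 14, 9, 6), times = n_null) * exp(rnorm(kmax * n_null, sd = 0.05)),
  nrow = kmax
)

# Gap statistic: mean log null score minus log observed score at each k.
gap <- rowMeans(log(w_null)) - log(w_obs)

# Spread of the null log scores (sample standard deviation),
# inflated to account for simulation error.
se <- apply(log(w_null), 1, sd) * sqrt(1 + 1 / n_null)

# Choose the smallest k such that gap[k] >= gap[k + 1] - se[k + 1].
k_hat <- which(gap[-kmax] >= gap[-1] - se[-1])[1]
print(k_hat)
```

In practice the null scores would presumably come from running find.cohorts on a matrix of simulated tip statistics with savecohorts = FALSE, as the control options suggest.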


If savecohorts=TRUE a list with two components:

Otherwise, a vector or matrix of the total within-group sum of squares implied by the cohort memberships at each k.

blueraleigh/pea.toolkits documentation built on Dec. 24, 2017, 3:20 p.m.