# Fit evolutionary models for continuous trait data

### Description

Takes a dataset of continuous trait values for a series of sister pairs (e.g. sister species) with known ages of divergence and finds the maximum likelihood fits under a series of evolutionary models.

### Usage

1 2 3 4 5 | ```
model.test.sisters(DIST, TIME, GRAD, GRAD2 = "NULL", meserr1 = 0,
meserr2 = 0, models = c("BM_null", "BM_linear", "BM_2rate",
"BM_linear_breakpoint", "BM_quadratic", "OU_null", "OU_linear_beta",
"OU_linear", "OU_2rate", "OU_linear_breakpoint"), starting=NULL,
Beta_starting = NULL, Alpha_starting = NULL)
``` |

### Arguments

`DIST` |
vector of Euclidean distances for sister pair dataset |

`TIME` |
vector of evolutionary ages (i.e. node ages ) for sister pair dataset |

`GRAD` |
vector of gradient values (i.e. any continuous variable) for sister pair dataset (see Details) |

`GRAD2` |
this is a vector of gradient values for a second continuous variable to be used for models that test for the effect of two gradients on rates of evolution. Not currently implemented. |

`meserr1` |
a list of measurement errors that correspond to the first of each species in a sister pair. Order of sister pairs is the same as for DIST. |

`meserr2` |
a list of measurement errors that correspond to the second of each species in a sister pair. Order of sister pairs is the same as for DIST. |

`models` |
A vector listing which models to test. By default all models are tested. Models with more than 4 parameters should be used with caution, especially with small datasets (i.e. less than 100 sister pairs) as the data may provide insufficient power to reject simpler models in favour of complex ones. |

`starting` |
List of starting values for each model. If starting=NULL, the built-in starting parameters are used. The method can be sensitive to starting parameters. For each model, the null method tests a large number of starting parameters, and chooses the set of starting parameters maximized the likelihood. However, the null starting parameters may not be optimized for all datasets. The user can customize starting parameters. Each element of the list that is not "NULL" is a matrix, with the number of columns equal to the number of model parameters and each row containing a different set of starting parameters. See Example 3 below. |

`Beta_starting` |
vector of Beta starting values to test for BM_2rate and OU_2rate models. NULL uses built-in starting parameters are used. The null values are c(0.001, 0.01, 0.1, 1, 10, 100, 1000) for BM_2rate and c(0.01, 0.1, 1, 10, 100) for OU_2rate |

`Alpha_starting` |
vector of Alpha starting values to test for OU_2rate model. NULL uses built-in starting parameters are used. The null values are c(0.01, 0.1, 1, 10) |

### Details

Evolutionary models include null models whereby a single set of model parameters are fit to all sister pairs and models whereby parameters are allowed to vary as a function of another continuous variable. The second continuous variable could be elevation, latitude, body mass or any other continuous variable of interest, over which rates of trait evolution might vary. This function uses the nlm optimizer to search for the maximum likelihood estimates under 10 evolutionary models. For details of the evolutionary models implemented see sisterContinuous. Running all models on a dataset of about 100 sisters should take less than 5 minutes. Excluding those models with more than 4 parameters will speed up the search considerably. For BM_2rate and OU_2rate models, Beta and alpha values before and after the breakpoint are each set to the values in Beta_starting and Alpha_starting, and all possible pairwise combinations of these are tested together with a variety of breakpoints. Thus keep the length of these vectors of starting values under 5, or expect to wait a long time for the function to execute.

### Value

Returns a table with log-likelihoods, Akaike information criterion, and parameter estimates. The final column returns the exit status of the nlm function (and results should not be trusted if this value is 3 or higher; see nlm documentation). parameters are output in the same order as specified in sisterContinuous.

**BM_null:**Beta = b1**OU_null:**Beta = b1, Alpha = a1**BM_linear:**Beta_C = b1; Beta_slope = b1_slope**OU_linear:**Beta_C = b1, Beta_slope = b1_slope, Alpha_C = a1, Alpha_slope = a1_slope**BM_linear_beta:**Beta_C = b1, Beta_slope = b1_slope**OU_linear_beta:**Beta_C = b1, Beta_slope = b1_slope, Alpha = a1**BM_2rate parameters:**Beta1 = b1, breakpoint = breakpoint, Beta2 = b2**OU_2rate parameters:**Beta1 = b1, breakpoint = breakpoint, Beta2 = b2, Alpha1 = a1, Alpha2 = a2**BM_linear_breakpoint:**Beta_C1 = b1, Beta_slope1=b1_slope, breakpoint=breakpoint, Beta_Slope2=b2_slope**OU_linear_breakpoint:**Beta_C1 = b1, Beta_slope1=b1_slope, breakpoint=breakpoint, Beta_Slope2=b2_slope, Alpha_C1 = a1, Alpha_slope1=a1_slope, and Alpha_slope2 = a2_slope**BM_quadratic:**Beta_c = Quadratic_c, Beta_b = Quadratic_b, Beta_a = Quadratic_a

### Author(s)

Jason T. Weir

### References

Weir JT, D Wheatcroft, & T Price. 2012. The role of ecological constraint in driving the evolution of avian song frequency across a latitudinal gradient. Evolution 66,2773-2783.

Weir JT, & D Wheatcroft. 2011. A latitudinal gradient in rates of evolution of avian syllable diversity and song length. Proceedings of the Royal Society of London, B 278,1713-1720.

### See Also

sisterContinuous

### Examples

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 | ```
## Not run:
##Example 1
###This example uses the four models used in Weir et al. 2012 to test for
###a latitudinal effect on Euclidean distances for bird song pitch on 87
###forest sister pairs.
data(bird.pitch)
attach(bird.pitch)
#STEP 1 Correct Euclidean distances for sampling and measurement bias
DIST_cor <- MScorrection(nA=bird.pitch$number_individuals_Species1,
nB=bird.pitch$number_individuals_Species2,
VarA=bird.pitch$Variance_PC1and2_Species1,
VarB=bird.pitch$Variance_PC1and2_Species2,
DIST_actual=bird.pitch$Uncorrected_Euclidean_Distance)
#STEP 2 Extract and test only forest species
DIST <- subset(DIST_cor, subset = (bird.pitch$Habitat == "forest"))
TIME <- subset(bird.pitch$TIME,subset = (bird.pitch$Habitat == "forest"))
GRAD <- subset(bird.pitch$GRAD,
subset = (bird.pitch$Habitat == "forest"))
models = c("BM_null", "BM_linear", "OU_null", "OU_linear")
#The following generally takes 1 to 2 minutes to run
FIT1 <- model.test.sisters(DIST=DIST, TIME=TIME, GRAD=GRAD, models=models)
###The best fit model for forest species is the OU_linear model in which
###rates of evolution increase with latitude (b1_slope is positive) and
###evolutionary constraint declines with increasing latitude (a1_slope is
###negative).High latitude species are evolving faster and in a less
###constrained fashion.
##Example 2
###This example tests to see if allopatric and sympatric species pairs
###have significantly different rates under the BM_null model
#STEP 1 Correct Euclidean distances for sampling and measurement bias
DIST_cor <- MScorrection(nA=bird.pitch$number_individuals_Species1,
nB=bird.pitch$number_individuals_Species2,
VarA=bird.pitch$Variance_PC1and2_Species1,
VarB=bird.pitch$Variance_PC1and2_Species2,
DIST_actual=bird.pitch$Uncorrected_Euclidean_Distance)
#STEP 2 First, fit BM_linear to the entire dataset
DIST <- DIST_cor
TIME <- bird.pitch$TIME
GRAD <- bird.pitch$GRAD
models = c("BM_null")
FIT2a <- model.test.sisters(DIST=DIST, TIME=TIME, GRAD=GRAD, models=models)
#STEP 3 Next, fit BM_linear to the allopatric subset
DIST <- subset(DIST_cor, subset = (bird.pitch$Patry == "allopatric"))
TIME <- subset(bird.pitch$TIME,
subset = (bird.pitch$Patry == "allopatric"))
GRAD <- subset(bird.pitch$GRAD,
subset = (bird.pitch$Patry == "allopatric"))
models = c("BM_null")
FIT2b <- model.test.sisters(DIST=DIST, TIME=TIME, GRAD=GRAD, models=models)
#STEP 4 Finally, fit BM_linear to the sympatric subset
DIST <- subset(DIST_cor, subset = (bird.pitch$Patry == "sympatric"))
TIME <- subset(bird.pitch$TIME,
subset = (bird.pitch$Patry == "sympatric"))
GRAD <- subset(bird.pitch$GRAD,
subset = (bird.pitch$Patry == "sympatric"))
models = c("BM_null")
FIT2c <- model.test.sisters(DIST=DIST, TIME=TIME, GRAD=GRAD, models=models)
#STEP 5 Compare the AIC of the model fit to the entire dataset to the model
#with separate rates for allopatric and sympatric subsets.
###To calculate AIC for the allopatric and sympatric model
###the loglikelihoods for the subsets are summed
logLikelihood <- as.numeric(FIT2b[1,]) + as.numeric(FIT2c[1,])
###The subsets model has 2 parameters (1 for each subset)
###thus AIC = 2*2 - 2*logLike
AIC_forest_nonforest <- 2*2 - 2*logLikelihood
###The AIC for the entire dataset is 319.86 and for the model with separate rates
###for allopatric and sympatric AIC is 320.13. The best fit model is the full dataset
###model without separate rates for different subsets, indicating a failure to reject
###the null hypothesis in favour of separate rates for allopatric and sympatric
###species pairs.
##Example 3
###using the same data as Example 1, this example demonstrates user control of
###starting parameters
#STEP 1 generate matrices of starting values for those models which the user
#wishes to use customized starting values
p_matrix <- c(0.0001, 0.001, 0.01, 0.1, 1,2,3,4,5,10,100,1000)
BM_null_starting <- matrix(p_matrix, length(p_matrix), 1, byrow=TRUE)
p_matrix <- c(10, -1, 10, 1, 0, -0.1, 0, 0.1)
BM_linear_starting <- matrix(p_matrix, length(p_matrix)/2, 2, byrow=TRUE)
#first, use only 2 models, each with customize starting parameters
models <- c("BM_null", "BM_linear")
FIT3a <- model.test.sisters(DIST=DIST, TIME=TIME, GRAD=GRAD, models=models,
starting = list(BM_null_starting, BM_linear_starting) )
#next use 4 models, but customize starting parameters for only the first two
models <- c("BM_null", "BM_linear", "OU_null", "OU_linear")
FIT3b <- model.test.sisters(DIST=DIST, TIME=TIME, GRAD=GRAD, models=models,
starting = list(BM_null_starting, BM_linear_starting, "NULL", "NULL") )
##EXAMPLE 4
###This example uses the syllable dataset for oscine songbirds Weir & Wheatcroft 2011
data(bird.syllables)
attach(bird.syllables)
#STEP 1 Correct Euclidean distances for sampling and measurement bias
DIST_cor <- MScorrection(nA=bird.syllables$number_individuals_Species1,
nB=bird.syllables$number_individuals_Species2,
VarA=bird.syllables$Species1_PC2_var,
VarB=bird.syllables$Species2_PC2_var,
DIST_actual=abs(bird.syllables$Species1_PC2_mean -
bird.syllables$Species2_PC2_mean))
#STEP 2 Test all models on oscines only (in which song has a strong
#culturally transmitted component)
DIST <- subset(DIST_cor, subset = (bird.syllables$Suboscine == "oscine"))
TIME <- subset(bird.syllables$TIME,subset = (bird.syllables$Suboscine == "oscine"))
GRAD <- subset(bird.syllables$GRAD,
subset = (bird.syllables$Suboscine == "oscine"))
FIT5 <- model.test.sisters(DIST=DIST, TIME=TIME, GRAD=GRAD, models=models)
#The best fit model in FIT5 is BM_linear in which tropical species have a
#much slower rate than temperate species.
## End(Not run)#end dontrun
``` |