fitContinuous: Model fitting for continuous comparative data

Description Usage Arguments Details Value Note Author(s) References Examples

View source: R/traits.R

Description

fitting macroevolutionary models to phylogenetic trees

Usage

1
2
3
4
fitContinuous(phy, dat, SE = 0,
    model = c("BM","OU","EB","rate_trend","lambda","kappa","delta","mean_trend","white"),
    bounds= list(), control = list(method = c("subplex","L-BFGS-B"),
    niter = 100, FAIL = 1e+200, hessian = FALSE, CI = 0.95), ncores=NULL, ...)

Arguments

phy

a phylogenetic tree of class phylo

dat

data vector for a single trait, with names matching tips in phy

SE

a single value or named vector of standard errors associated with values in dat; if any elements in the vector SE are NA, SE will be estimated

model

model to fit to comparative data (see Details)

bounds

range to constrain parameter estimates (see Details)

control

settings used for optimization of the model likelihood

ncores

Number of cores. If NULL then number of cores is detected

...

additional arguments to be passed to the internal likelihood function bm.lik

Details

This function fits various likelihood models for continuous character evolution. The function returns parameter estimates and the likelihood for univariate datasets.

The model likelihood is maximized using methods available in optim as well as subplex. Optimization methods to be used within optim can be specified through the control object (i.e., control$method).

A number of random starting points are used in optimization and are given through the niter element within the control object (e.g., control$niter). The FAIL value within the control object should be a large value that will be considerably far from -lnL of the maximum model likelihood. In most cases, the default setting for control$FAIL will be appropriate. The Hessian may be used to compute confidence intervals (CI) for the parameter estimates if the hessian element in control is TRUE.

Beware: difficulty in finding the optimal solution is determined by an interaction between the nature and complexity of the likelihood space (which is data- and model-dependent) and the numerical optimizer used to explore the space. There is never a guarantee that the optimal solution is found, but using many random starting points (control$niter) and many optimization methods (control$method) will increase these odds.

Bounds for the relevant parameters of the fitted model may be given through the bounds argument. Bounds may be necessary (particularly under the OU model) if the likelihood surface is characterized by a long, flat ridge which can be exceedingly difficult for optimization methods. Several bounds can be given at a time (e.g., bounds=list(SE=c(0,0.1),alpha=c(0,1)) would constrain measurement error as well as the 'constraint' parameter of the Ornstein-Uhlenbeck model). Default bounds under the different models are given below.

Possible models are as follows:

Value

fitContinuous returns a list with the following four elements:

lik

is the function used to compute the model likelihood. The returned function (lik) takes arguments that are necessary for the given model. For instance, if estimating a Brownian-motion model with unknown standard error, the arguments (pars) to the lik function would be sigsq and SE. By default, the function evaluates the likelihood of the model by assuming the maximum likelihood root state. This behavior can be changed in the call to lik with lik(pars, root=ROOT.GIVEN) where pars includes a value for the root state (z0). See Examples for a demonstration. The tree and data are stored internally within the lik function, which permits those elements to be efficiently reused when computing the likelihood under different parameter values

bnd

is a matrix of the used bounds for the relevant parameters estimated in the model. Warnings will be issued if any parameter estimates occur at the supplied (or default) parameter bounds

res

is a matrix of results from optimization. Rownames of the res matrix are the optimization methods (see optim and subplex). The columns in the res matrix are the estimated parameter values, the estimated model likelihood, and an indication of optimization convergence. Values of convergence not equal to zero are not to be trusted

opt

is a list of the primary results: estimates of the parameters, the maximum-likelihood estimate (lnL) of the model, the optimization method used to compute the MLE, the number of model parameters (k, including one parameter for the root state), the AIC (aic), sample-size corrected AIC (aicc). The number of observations for AIC computation is taken to be the number of trait values observed. If the Hessian is used, confidence intervals on the parameter estimates (CI) and the Hessian matrix (hessian) are also returned

Note

To speed the likelihood search, one may set an environment variable to make use of parallel processing, used by mclapply. To set the environment variable, use options(mc.cores=INTEGER), where INTEGER is the number of available cores. Alternatively, the mc.cores variable may be preset upon the initiation of an R session (see Startup for details).

Author(s)

LJ Harmon, W Challenger, and JM Eastman

References

Blomberg SP, T Garland, and AR Ives. 2003. Testing for phylogenetic signal in comparative data: behavioral traits are more labile. Evolution 57:717-745.

Butler MA and AA King, 2004. Phylogenetic comparative analysis: a modeling approach for adaptive evolution. American Naturalist 164:683-695.

Felsenstein J. 1973. Maximum likelihood estimation of evolutionary trees from continuous characters. American Journal of Human Genetics 25:471-492.

Harmon LJ et al. 2010. Early bursts of body size and shape evolution are rare in comparative data. Evolution 64:2385-2396.

Pagel M. 1999. Inferring the historical patterns of biological evolution. Nature 401:877-884

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
geo=get(data(geospiza))

tmp=treedata(geo$phy, geo$dat)
phy=tmp$phy
dat=tmp$data

#---- STORE RESULTS
brownFit <-  fitContinuous(phy, dat[,"wingL"], SE=NA, control=list(niter=50), ncores=2)

#---- PRINT RESULTS
print(names(brownFit))
print(brownFit)


#---- COMPUTE LIKELIHOOD
flik=brownFit$lik
print(argn(flik))

#---- CREATE a FUNCTION to COMPARE MODELS
fitGeospiza=function(trait=c("wingL","tarsusL","culmenL","beakD","gonysW")){

	trait=match.arg(trait, c("wingL","tarsusL","culmenL","beakD","gonysW"))

	# define set of models to compare
	models=c("BM", "OU", "EB", "white")
	summaries=c("diffusion", "Ornstein-Uhlenbeck", "early burst", "white noise")

	## ESTIMATING measurement error ##
	aic.se=numeric(length(models))
	lnl.se=numeric(length(models))

	for(m in 1:length(models)){
		cat("\n\n\n\n\t*** ", paste(toupper(summaries[m]),": fitting ", sep=""), models[m],
			" with SE *** \n", sep="")
		tmp=fitContinuous(phy,dat[,trait],SE=NA, model=models[m],
                                    bounds=list(SE=c(0,0.5)), ncores=2)
		print(tmp)
		aic.se[m]=tmp$opt$aicc
		lnl.se[m]=tmp$opt$lnL
	}


	## ASSUMING no measurement error ##
	aic=numeric(length(models))
	lnl=numeric(length(models))

	for(m in 1:length(models)){
		cat("\n\n\n\n\t*** ", paste(toupper(summaries[m]),": fitting ", sep=""), models[m],
			 " *** \n", sep="")
		tmp=fitContinuous(phy,dat[,trait],SE=0,model=models[m], ncores=2)
		print(tmp)
		aic[m]=tmp$opt$aicc
		lnl[m]=tmp$opt$lnL
	}

	## COMPARE AIC ##
	names(aic.se)<-names(lnl.se)<-names(aic)<-names(lnl)<-models
	delta_aic<-function(x) x-x[which(x==min(x))]

	# no measurement error
	daic=delta_aic(aic)
	cat("\n\n\n\t\t\t\t*** MODEL COMPARISON: ",trait," *** \n",sep="")
	cat("\tdelta-AIC values for models assuming no measurement error
    \t\t\t\t zero indicates the best model\n\n")
	print(daic, digits=2)

		# measurement error
	daic.se=delta_aic(aic.se)
	cat("\n\n\n\n\t\t\t\t*** MODEL COMPARISON: ",trait," ***\n",sep="")
	cat("\t\t   delta-AIC values for models estimating SE
    \t\t\t\t zero indicates the best model\n\n")
	print(daic.se, digits=2)
	cat("\n\n\n")

	res_aicc=rbind(aic, aic.se, daic, daic.se)
	rownames(res_aicc)=c("AICc","AICc_SE","dAICc", "dAICc_SE")

	return(res_aicc)
}

#---- COMPARE MODELS for WING LENGTH
res=fitGeospiza("wingL")
print(res)

geiger documentation built on July 8, 2020, 7:12 p.m.