overfitRR: Testing RRphylo overfit
In RRphylo: Phylogenetic Ridge Regression Methods for Comparative Studies

Description

Testing the robustness of RRphylo results to sampling effects and phylogenetic uncertainty.

Usage

overfitRR(RR, y, phylo.list, aces = NULL, x1 = NULL, aces.x1 = NULL, cov = NULL,
  rootV = NULL, clus = 0.5, s = NULL, swap.args = NULL, nsim = NULL,
  trend.args = NULL, shift.args = NULL, conv.args = NULL, pgls.args = NULL)

Arguments

`RR`	an object produced by `RRphylo`.
`y`	a named vector of phenotypes.
`phylo.list`	a list (or `multiPhylo`) of alternative topologies (i.e. having the same species as the original tree arranged differently) to be tested.
`aces`	if used to produce the `RR` object, the vector of those ancestral character values at nodes known in advance must be specified. Names correspond to the nodes in the tree.
`x1`	the additional predictor to be specified if the RR object has been created using an additional predictor (i.e. multiple version of `RRphylo`). `'x1'` vector must be as long as the number of nodes plus the number of tips of the tree, which can be obtained by running `RRphylo` on the predictor as well, and taking the vector of ancestral states and tip values to form the `x1`.
`aces.x1`	a named vector of ancestral character values at nodes for `x1`. It must be indicated if the RR object has been created using both `aces` and `x1`. Names correspond to the nodes in the tree.
`cov`	if used to produce the `RR` object, the covariate must be specified. As in `RRphylo`, the covariate vector must be as long as the number of nodes plus the number of tips of the tree, which can be obtained by running `RRphylo` on the covariate as well, and taking the vector of ancestral states and tip values to form the covariate.
`rootV`	if used to produce the `RR` object, the phenotypic value at the tree root must be specified.
`clus`	the proportion of clusters to be used in parallel computing. To run the single-threaded version of `overfitRR` set `clus` = 0.
`s`, `swap.args`, `nsim`	are deprecated. Check the function `resampleTree` to generate alternative phylogenies.
`trend.args`	is deprecated. Check the function `overfitST` to test `search.trend` robustness.
`shift.args`	is deprecated. Check the function `overfitSS` to test `search.shift` robustness.
`conv.args`	is deprecated. Check the function `overfitSC` to test `search.conv` robustness.
`pgls.args`	is deprecated. Check the function `overfitPGLS` to test `PGLS_fossil` robustness.
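
The length requirement on `x1` and `cov` can be checked before calling the function. The following is a minimal sketch in base R, assuming a fully dichotomous rooted tree (so that the number of internal nodes is the number of tips minus one) and the usual `ape` convention of numbering internal nodes after the tips. All values and names (`aces_x1`, `tips_x1`, `sp1`, ...) are hypothetical and do not come from the package.

```r
# Hypothetical sizes: a fully dichotomous rooted tree with 5 tips has 4 internal nodes.
n_tips  <- 5
n_nodes <- n_tips - 1

set.seed(1)
# Hypothetical ancestral estimates, named by node number (n_tips + 1, ..., 2 * n_tips - 1).
aces_x1 <- setNames(runif(n_nodes), (n_tips + 1):(n_tips + n_nodes))
# Hypothetical tip values, named by species.
tips_x1 <- setNames(runif(n_tips), paste0("sp", seq_len(n_tips)))

# Ancestral states first, tip values second, as in the Examples section.
x1 <- c(aces_x1, tips_x1)

# The vector must be as long as the number of nodes plus the number of tips.
stopifnot(length(x1) == n_nodes + n_tips)
```

In real use the ancestral values would come from the `$aces` element of an `RRphylo` run on the predictor, as shown in the Examples.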

Details

Methods using a large number of parameters risk being overfit. This usually translates into a poor fit with data and trees other than those originally used. With RRphylo methods this risk is usually very low. However, the user can assess how robust the results of RRphylo are by running resampleTree and overfitRR. The former subsamples the tree according to an s parameter (the proportion of tips to be removed from the tree) and alters tree topology by means of swapONE. The list of altered topologies is fed to overfitRR, which cross-references each tree with the phenotypic data and performs RRphylo on each of them. In this way, both the potential for overfit and phylogenetic uncertainty are accounted for at once.

Alternatively, a user-defined list of alternative phylogenies can be supplied to overfitRR directly. In this case no subsampling or swapping is applied, and robustness testing is performed on the alternative topologies as they are.
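
The subsampling and cross-referencing steps described above can be illustrated with a minimal base R sketch. It is not the package's implementation: a character vector stands in for the tip labels of a tree, and the names `tips`, `kept` and `y_sub` are hypothetical.

```r
set.seed(1)
# Hypothetical stand-ins: tip labels of the original tree and a named phenotype vector
# stored in a different order from the tips.
tips <- paste0("sp", 1:20)
y <- setNames(rnorm(20), sample(tips))

# s is the proportion of tips to be removed.
s <- 0.25
n_drop <- round(s * length(tips))

# Tips surviving one round of subsampling (setdiff keeps the original tip order).
kept <- setdiff(tips, sample(tips, n_drop))

# Cross-reference: keep only the phenotypes of the surviving tips, in tip order.
y_sub <- y[match(kept, names(y))]
```

With a real tree, `kept` would correspond to the tip labels of one element of the list returned by resampleTree, and the matching step is what aligns the phenotypic data with each altered tree.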

Value

The function returns an 'RRphyloList' object containing:

$RR.list an 'RRphyloList' including the results of each RRphylo performed within overfitRR.

$root.est the estimated root value per simulation.

$rootCI the 95% confidence interval around the root value.

$ace.regressions an 'RRphyloList' including the results of linear regression between ancestral state estimates before and after the subsampling.

The output always has an attribute "Call" which returns an unevaluated call to the function.
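
To make the meaning of the root interval and of the ancestral state regressions more concrete, the sketch below reproduces the two ideas in base R on simulated numbers. It rests on assumptions: the interval is taken as a simple percentile interval, and the regression is fitted with `lm` using the resampled estimates as response. The package may compute both differently, and none of the values are real RRphylo output.

```r
set.seed(42)
# Hypothetical root estimates from 10 resampled trees.
root_est <- rnorm(10, mean = 2, sd = 0.1)

# Assumption: a percentile-based 95% interval around the root value.
root_ci <- quantile(root_est, probs = c(0.025, 0.975))

# Hypothetical ancestral state estimates on the full tree and on one resampled tree.
aces_full <- rnorm(30)
aces_sub  <- aces_full + rnorm(30, sd = 0.05)

# Regression between estimates before and after resampling.
fit <- lm(aces_sub ~ aces_full)
slope <- unname(coef(fit)[2])
slope_ci <- confint(fit)["aces_full", ]
```

Under this reading, a slope close to one indicates that the ancestral estimates change little when tips are removed or swapped.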

Author(s)

Silvia Castiglione, Carmela Serio, Giorgia Girardi, Pasquale Raia

References

Castiglione, S., Tesone, G., Piccolo, M., Melchionna, M., Mondanaro, A., Serio, C., Di Febbraro, M., & Raia, P. (2018). A new method for testing evolutionary rate variation and shifts in phenotypic evolution. Methods in Ecology and Evolution, 9: 974-983. doi:10.1111/2041-210X.12954

Castiglione, S., Serio, C., Mondanaro, A., Di Febbraro, M., Profico, A., Girardi, G., & Raia, P. (2019a). Simultaneous detection of macroevolutionary patterns in phenotypic means and rate of change with and within phylogenetic trees including extinct species. PLoS ONE, 14: e0210101. https://doi.org/10.1371/journal.pone.0210101

Examples

## Not run: 
cc<- 2/parallel::detectCores()
library(ape)
library(RRphylo)

## overfitRR routine
# load the RRphylo example dataset including Ornithodirans tree and data
data("DataOrnithodirans")
DataOrnithodirans$treedino->treedino
DataOrnithodirans$massdino->massdino
DataOrnithodirans$statedino->statedino

# extract Pterosaurs tree and data
extract.clade(treedino,746)->treeptero
massdino[match(treeptero$tip.label,names(massdino))]->massptero

# perform RRphylo on body mass
RRphylo(tree=treeptero,y=log(massptero),clus=cc)->RRptero

# generate a list of subsampled and swapped phylogenies to test
treeptero.list<-resampleTree(RRptero$tree,s = 0.25,swap.si = 0.1,swap.si2 = 0.1,nsim=10)

# test the robustness of RRphylo
ofRRptero<-overfitRR(RR = RRptero,y=log(massptero),phylo.list=treeptero.list,clus=cc)


## overfitRR routine on multiple RRphylo
# load the RRphylo example dataset including Cetaceans tree and data
data("DataCetaceans")
DataCetaceans$treecet->treecet
DataCetaceans$masscet->masscet
DataCetaceans$brainmasscet->brainmasscet
DataCetaceans$aceMyst->aceMyst

# cross-reference the phylogenetic tree and body and brain mass data. Remove from
# both the tree and vector of body sizes the species whose brain size is missing
drop.tip(treecet,treecet$tip.label[-match(names(brainmasscet),treecet$tip.label)])->treecet.multi
masscet[match(treecet.multi$tip.label,names(masscet))]->masscet.multi

# perform RRphylo on the variable (body mass) to be used as additional predictor
RRphylo(tree=treecet.multi,y=masscet.multi,clus=cc)->RRmass.multi
RRmass.multi$aces[,1]->acemass.multi

# create the predictor vector: retrieve the ancestral character estimates
# of body size at internal nodes from the RR object ($aces) and concatenate
# them with the vector of species' body sizes
c(acemass.multi,masscet.multi)->x1.mass

# perform RRphylo on brain mass by using body mass as additional predictor
RRphylo(tree=treecet.multi,y=brainmasscet,x1=x1.mass,clus=cc)->RRmulti

# generate a list of subsampled and swapped phylogenies to test
treecet.list<-resampleTree(RRmulti$tree,s = 0.25,swap.si=0.1,swap.si2=0.1,nsim=10)

# test the robustness of multiple RRphylo
ofRRcet<-overfitRR(RR = RRmulti,y=brainmasscet,phylo.list=treecet.list,clus=cc,x1 =x1.mass)

## End(Not run)

RRphylo documentation built on April 3, 2025, 9:43 p.m.