overfitRR | R Documentation |
Testing the robustness of search.trend
(Castiglione et al. 2019a), search.shift
(Castiglione et al. 2018), search.conv
(Castiglione et al. 2019b), and PGLS_fossil
results to
sampling effects and phylogenetic uncertainty.
overfitRR(RR,y,phylo.list=NULL,s=0.25,swap.args=NULL,trend.args=NULL,shift.args=NULL,
conv.args=NULL, pgls.args=NULL,aces=NULL,x1=NULL,aces.x1=NULL,cov=NULL,
rootV=NULL,nsim=100,clus=0.5)
RR |
an object produced by |
y |
a named vector of phenotypes. |
phylo.list |
a list (or multiPhylo) of alternative phylogenies to be tested. |
s |
the percentage of tips to be cut off. It is set at 25% by default.
If |
swap.args |
a list of arguments to be passed to the function
|
trend.args |
a list of arguments specific to the function
|
shift.args |
a list of arguments specific to the function
|
conv.args |
a list of arguments specific to the function
|
pgls.args |
a list of arguments specific to the function
|
aces |
if used to produce the |
x1 |
the additional predictor to be specified if the RR object has been
created using an additional predictor (i.e. multiple version of
|
aces.x1 |
a named vector of ancestral character values at nodes for
|
cov |
if used to produce the |
rootV |
if used to produce the |
nsim |
number of simulations to be performed. It is set at 100 by default. |
clus |
the proportion of clusters to be used in parallel computing. To
run the single-threaded version of |
Methods using a large number of parameters risk being overfit. This
usually translates in poor fitting with data and trees other than the those
originally used. With RRphylo
methods this risk is usually very low.
However, the user can assess how robust the results got by applying
search.shift
, search.trend
, search.conv
or
PGLS_fossil
are by running overfitRR
. With the latter, the
original tree and data are subsampled by specifying a s
parameter,
that is the proportion of tips to be removed from the tree. In some cases,
though, removing as many tips as imposed by s
would delete too many
tips right in clades and/or states under testing. In these cases, the
function maintains no less than 5 species at least in each clade/state
under testing (or all species if there is less), reducing the sampling
parameter s
if necessary. Internally, overfitRR
further
shuffles the tree by using the function swapONE
. Thereby,
both the potential for overfit and phylogenetic uncertainty are accounted
for straight away.
Otherwise, a list of alternative phylogenies can be supplied to
overfitRR
. In this case subsampling and swapping arguments are
ignored, and robustness testing is performed on the alternative topologies
as they are. If a clade has to be tested either in search.shift
,
search.trend
, or search.conv
, the function scans each
alternative topology searching for the corresponding clade. If the species
within such clade on the alternative topology differ more than 10
species within the clade in the original tree, the identity of the clade is
considered disrupted and the test is not performed.
The function returns a 'RRphyloList' object containing:
$mean.sampling the mean proportion of species actually removed from the tree over the iterations.
$tree.list a 'multiPhylo' list including the trees generated
within overfitRR
$RR.list a 'RRphyloList' including the results of each
RRphylo
performed within overfitRR
$rootCI the 95% confidence interval around the root value.
$ace.regressions a 'RRphyloList' including the results of linear regression between ancestral state estimates before and after the subsampling.
$conv.results a list including results for
search.conv
performed under clade
and state
conditions. If a node pair is specified within conv.args
, the
$clade
object contains the percentage of simulations producing
significant p-values for convergence between the clades, and the proportion
of tested trees (i.e. where the clades identity was preserved; always 1 if
no phylo.list
is supplied). If a state vector is supplied within
conv.args
, the object $state
contains the percentage of
simulations producing significant p-values for convergence within (single
state) or between states (multiple states).
$shift.results a list including results for
search.shift
performed under clade
and sparse
conditions. If one or more nodes are specified within shift.args
,
the $clade
object contains for each node the percentage of
simulations producing significant p-value separated by shift sign, and the
same figures by considering all the specified nodes as evolving under a
single rate (all.clades). For each node the proportion of tested trees
(i.e. where the clade identity was preserved; always 1 if no
phylo.list
is supplied) is also indicated. If a state vector is
supplied within shift.args
, the object $sparse
contains the
percentage of simulations producing significant p-value separated by shift
sign ($p.states).
$trend.results a list including the percentage of
simulations showing significant p-values for phenotypes versus age and
absolute rates versus age regressions for the entire tree separated by
slope sign ($tree). If one or more nodes are specified within
trend.args
, the list also includes the same results at nodes ($node)
and the results for comparison between nodes ($comparison). For each node the proportion
of tested trees (i.e. where the clade identity was preserved; always 1 if
no phylo.list
is supplied) is also indicated.
$pgls.results two 'RRphyloList' objects including results of
PGLS_fossil
performed by using the phylogeny as it is ($tree
)
or rescaled according to the RRphylo
rates ($RR
).
Silvia Castiglione, Carmela Serio, Pasquale Raia
Castiglione, S., Tesone, G., Piccolo, M., Melchionna, M., Mondanaro, A., Serio, C., Di Febbraro, M., & Raia, P. (2018). A new method for testing evolutionary rate variation and shifts in phenotypic evolution. Methods in Ecology and Evolution, 9: 974-983.doi:10.1111/2041-210X.12954
Castiglione, S., Serio, C., Mondanaro, A., Di Febbraro, M., Profico, A., Girardi, G., & Raia, P. (2019a) Simultaneous detection of macroevolutionary patterns in phenotypic means and rate of change with and within phylogenetic trees including extinct species. PLoS ONE, 14: e0210101. https://doi.org/10.1371/journal.pone.0210101
Castiglione, S., Serio, C., Tamagnini, D., Melchionna, M., Mondanaro, A., Di Febbraro, M., Profico, A., Piras, P.,Barattolo, F., & Raia, P. (2019b). A new, fast method to search for morphological convergence with shape data. PLoS ONE, 14, e0226949. https://doi.org/10.1371/journal.pone.0226949
overfitRR
vignette ;
search.trend
vignette ;
search.shift
vignette ;
search.conv
vignette ;
## Not run:
data("DataOrnithodirans")
DataOrnithodirans$treedino->treedino
DataOrnithodirans$massdino->massdino
DataOrnithodirans$statedino->statedino
cc<- 2/parallel::detectCores()
# Extract Pterosaurs tree and data
library(ape)
extract.clade(treedino,746)->treeptero
massdino[match(treeptero$tip.label,names(massdino))]->massptero
massptero[match(treeptero$tip.label,names(massptero))]->massptero
RRphylo(tree=treedino,y=massdino,clus=cc)->dinoRates
RRphylo(tree=treeptero,y=log(massptero),clus=cc)->RRptero
# Case 1 search.shift under both "clade" and "sparse" condition
search.shift(RR=dinoRates, status.type= "clade")->SSnode
search.shift(RR=dinoRates, status.type= "sparse", state=statedino)->SSstate
overfitRR(RR=dinoRates,y=massdino,swap.args =list(si=0.2,si2=0.2),
shift.args = list(node=rownames(SSnode$single.clades),state=statedino),
nsim=10,clus=cc)->orr.ss
# Case 2 search.trend on the entire tree
search.trend(RR=RRptero, y=log(massptero),nsim=100,clus=cc,cov=NULL,node=NULL)->STtree
overfitRR(RR=RRptero,y=log(massptero),swap.args =list(si=0.2,si2=0.2),
trend.args = list(),nsim=10,clus=cc)->orr.st1
# Case 3 search.trend at specified nodescov=NULL,
search.trend(RR=RRptero, y=log(massptero),node=143,clus=cc)->STnode
overfitRR(RR=RRptero,y=log(massptero),
trend.args = list(node=143),nsim=10,clus=cc)->orr.st2
# Case 4 overfitRR on multiple RRphylo
data("DataCetaceans")
DataCetaceans$treecet->treecet
DataCetaceans$masscet->masscet
DataCetaceans$brainmasscet->brainmasscet
DataCetaceans$aceMyst->aceMyst
ape::drop.tip(treecet,treecet$tip.label[-match(names(brainmasscet),
treecet$tip.label)])->treecet.multi
masscet[match(treecet.multi$tip.label,names(masscet))]->masscet.multi
RRphylo(tree=treecet.multi,y=masscet.multi,clus=cc)->RRmass.multi
RRmass.multi$aces[,1]->acemass.multi
c(acemass.multi,masscet.multi)->x1.mass
RRphylo(tree=treecet.multi,y=brainmasscet,x1=x1.mass,clus=cc)->RRmulti
search.trend(RR=RRmulti, y=brainmasscet,x1=x1.mass,clus=cc)->STcet
overfitRR(RR=RRmulti,y=brainmasscet,trend.args = list(),
x1=x1.mass,nsim=10,clus=cc)->orr.st3
search.trend(RR=RRmulti, y=brainmasscet,x1=x1.mass,x1.residuals=TRUE,
clus=cc)->STcet.resi
overfitRR(RR=RRmulti,y=brainmasscet,trend.args = list(x1.residuals=TRUE),
x1=x1.mass,nsim=10,clus=cc)->orr.st4
# Case 5 searching convergence between clades and within a single state
data("DataFelids")
DataFelids$PCscoresfel->PCscoresfel
DataFelids$treefel->treefel
DataFelids$statefel->statefel
RRphylo(tree=treefel,y=PCscoresfel,clus=cc)->RRfel
search.conv(RR=RRfel, y=PCscoresfel, min.dim=5, min.dist="node9",clus=cc)->SC.clade
as.numeric(c(rownames(SC.clade[[1]])[1],as.numeric(as.character(SC.clade[[1]][1,1]))))->conv.nodes
overfitRR(RR=RRfel, y=PCscoresfel,conv.args =
list(node=conv.nodes,state=statefel,declust=TRUE),nsim=10,clus=cc)->orr.sc
# Case 6 overfitRR on PGLS_fossil
library(phytools)
rtree(100)->tree
fastBM(tree)->resp
fastBM(tree,nsim=3)->resp.multi
fastBM(tree)->pred1
fastBM(tree)->pred2
PGLS_fossil(modform=y1~x1+x2,data=list(y1=resp,x2=pred1,x1=pred2),tree=tree)->pgls_noRR
RRphylo(tree,resp,clus=cc)->RR
PGLS_fossil(modform=y1~x1+x2,data=list(y1=resp,x2=pred1,x1=pred2),tree=tree,RR=RR)->pgls_RR
overfitRR(RR=RR,y=resp,
pgls.args=list(modform=y1~x1+x2,data=list(y1=resp,x2=pred1,x1=pred2),
tree=TRUE,RR=TRUE),nsim=10,clus=cc)->orr.pgls1
PGLS_fossil(modform=y1~x1+x2,data=list(y1=resp.multi,x2=pred1,x1=pred2),tree=tree)->pgls2_noRR
RRphylo(tree,resp.multi,clus=cc)->RR
PGLS_fossil(modform=y1~x1+x2,data=list(y1=resp.multi,x2=pred1,x1=pred2),tree=tree,RR=RR)->pgls2_RR
overfitRR(RR=RR,y=resp.multi,
pgls.args=list(modform=y1~x1+x2,data=list(y1=resp.multi,x2=pred1,x1=pred2),
tree=TRUE,RR=TRUE),nsim=10,clus=cc)->orr.pgls2
## End(Not run)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.