overfitSC: Testing search.conv overfit
In RRphylo: Phylogenetic Ridge Regression Methods for Comparative Studies

Description

Testing the robustness of search.conv (Castiglione et al. 2019b) results to sampling effects and phylogenetic uncertainty.

Usage

overfitSC(RR,y.list,phylo.list,s=0.25,
  nodes=NULL,state=NULL,declust=FALSE,
  aces=NULL,x1=NULL,aces.x1=NULL,cov=NULL,rootV=NULL, clus=0.5)

Arguments

`RR`	an object produced by `RRphylo`.
`y.list`	a list of multivariate phenotypes, each matched to the corresponding phylogenetic tree in `phylo.list` (see Details).
`phylo.list`	a list of phylogenetic trees. The phylogenies in `phylo.list` can either be generated by `resampleTree` or provided by the user. In this latter case, the function is meant to test the robustness of results on alternative topologies, thus the phylogenies must have the same species arranged differently.
`s`	the proportion of tips to be cut off. It is set at 0.25 (i.e. 25% of the tips) by default. If `phylo.list` is provided, this argument is ignored.
`nodes`	the argument `nodes` as passed to `search.conv`. Note that the arguments `nodes` and `state` can be specified at the same time.
`state`	the argument `state` as passed to `search.conv`. Note that the arguments `nodes` and `state` can be specified at the same time.
`declust`	the argument `declust` as passed to `search.conv`.
`aces`	if used to produce the `RR` object, the vector of ancestral character values at nodes known in advance must be specified. Names correspond to the nodes in the tree.
`x1`	the additional predictor, to be specified if the `RR` object was created using an additional predictor (i.e. the multiple regression version of `RRphylo`). The `x1` vector must be as long as the number of nodes plus the number of tips of the tree. It can be obtained by running `RRphylo` on the predictor as well, and combining the resulting ancestral states with the tip values.
`aces.x1`	a named vector of ancestral character values at nodes for `x1`. It must be indicated if the RR object has been created using both `aces` and `x1`. Names correspond to the nodes in the tree.
`cov`	if used to produce the `RR` object, the covariate must be specified. As in `RRphylo`, the covariate vector must be as long as the number of nodes plus the number of tips of the tree, which can be obtained by running `RRphylo` on the covariate as well, and taking the vector of ancestral states and tip values to form the covariate.
`rootV`	if used to produce the `RR` object, the phenotypic value at the tree root must be specified.
`clus`	the proportion of clusters to be used in parallel computing. To run the single-threaded version of `overfitSC` set `clus` = 0.
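
The length requirement for `x1` and `cov` (one value per node plus one per tip) can be checked before calling `overfitSC`. The following is a minimal sketch using only `ape`; the helper `has_full_length` is hypothetical and not part of RRphylo, the tree is random, and the node values are placeholders standing in for the ancestral states that `RRphylo` would estimate.

```r
library(ape)

set.seed(1)
tree <- rtree(10)                 # random 10-tip tree, for illustration only

n_tips  <- Ntip(tree)             # number of tips
n_nodes <- Nnode(tree)            # number of internal nodes

# Hypothetical helper (not an RRphylo function): TRUE when `v` holds
# one value per internal node plus one value per tip.
has_full_length <- function(v, tree) {
  length(v) == Ntip(tree) + Nnode(tree)
}

# A vector of tip values alone is too short.
tip_vals <- setNames(rnorm(n_tips), tree$tip.label)
short_ok <- has_full_length(tip_vals, tree)

# Placeholder node values, named by ape node numbers (assumption: in real
# use these would be the ancestral states estimated by RRphylo).
node_vals <- setNames(rnorm(n_nodes), n_tips + seq_len(n_nodes))
full_vals <- c(node_vals, tip_vals)
full_ok <- has_full_length(full_vals, tree)
```

Here `short_ok` is `FALSE` and `full_ok` is `TRUE`, since the combined vector carries a value for every node and every tip.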

Details

Methods that use a large number of parameters risk being overfit. Overfitting usually translates into a poor fit with data and trees other than those originally used. With RRphylo methods this risk is usually very low. However, the user can assess how robust the results of search.conv are by running resampleTree and overfitSC. The former subsamples the tree according to the s parameter (the proportion of tips to be removed from the tree) and alters the tree topology by means of swapONE. Once a list of new phylogenetic trees (phylo.list) has been generated, if the shape data fed to search.conv were reduced (e.g. via SVD), the data reduction must be recomputed for each tree, thus obtaining a list of multivariate phenotypes matching the phylogenetic trees (y.list). Finally, overfitSC performs RRphylo and search.conv on each new pair of tree and data. In this way, both the potential for overfit and phylogenetic uncertainty are accounted for at once.
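
To make the role of s concrete, the sketch below removes a proportion s of the tips from a random tree and subsets the phenotype matrix to the surviving species. It uses only `ape` and is not how resampleTree is implemented: it skips the topology swapping done by swapONE, does not protect any clade or state, and the use of `round` to turn s into a number of tips is an assumption.

```r
library(ape)

set.seed(42)
tree <- rtree(20)                           # random 20-tip tree, illustration only

# Made-up multivariate phenotype: 3 columns, one row per species.
y <- matrix(rnorm(20 * 3), nrow = 20,
            dimnames = list(tree$tip.label, paste0("PC", 1:3)))

s <- 0.25                                   # proportion of tips to remove
n_drop <- round(s * Ntip(tree))             # assumption: rounded to a whole tip count

# Remove randomly chosen tips from the tree.
dropped  <- sample(tree$tip.label, n_drop)
sub_tree <- drop.tip(tree, dropped)

# Keep only the phenotype rows of the species still in the tree,
# ordered as the tip labels of the subsampled tree.
y_sub <- y[sub_tree$tip.label, , drop = FALSE]
```

With real shape data, the data reduction step would be repeated on the retained species rather than subsetting previously computed scores.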

Alternatively, a list of alternative phylogenies can be supplied to overfitSC. In this case subsampling and swapping arguments are ignored, and robustness testing is performed on the alternative topologies as they are. If a clade has to be tested in search.conv, the function scans each alternative topology searching for the corresponding clade. If the species within that clade on the alternative topology differ by more than 10% from the species within the clade in the original tree, the identity of the clade is considered disrupted and the test is not performed.
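
The 10% rule can be illustrated with plain set operations. This page does not state how the difference between clades is measured, so the sketch below makes an assumption: it counts the species present in only one of the two clades and divides by the size of the original clade. The helper `clade_mismatch` and the species names are hypothetical.

```r
# Hypothetical helper (not an RRphylo function): share of species that are
# found in only one of the two clades, relative to the original clade size.
clade_mismatch <- function(original, alternative) {
  differing <- union(setdiff(original, alternative),
                     setdiff(alternative, original))
  length(differing) / length(original)
}

orig_clade <- paste0("sp", 1:20)        # species in the clade on the original tree

# Alternative topology 1: one species has left the clade.
alt_preserved <- paste0("sp", 1:19)
# Alternative topology 2: three species lost and three unrelated ones gained.
alt_disrupted <- paste0("sp", 4:23)

mismatch_1 <- clade_mismatch(orig_clade, alt_preserved)   # 1/20
mismatch_2 <- clade_mismatch(orig_clade, alt_disrupted)   # 6/20

# Under the assumed measure, a clade is tested only if the mismatch is at most 10%.
tested_1 <- mismatch_1 <= 0.10
tested_2 <- mismatch_2 <= 0.10
```

Under this assumed measure the first alternative clade would be tested and the second would be treated as disrupted.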

Value

The function returns a 'RRphyloList' object containing:

$RR.list a 'RRphyloList' including the results of each RRphylo performed within overfitSC

$SCnode.list a 'RRphyloList' including the results of each search.conv - clade condition performed within overfitSC

$SCstate.list a 'RRphyloList' including the results of each search.conv - state condition performed within overfitSC

$conv.results a list including results for search.conv performed under clade and state conditions. If a node pair is specified via nodes, the $clade object contains the percentage of simulations producing significant p-values for convergence between the clades, and the proportion of tested trees (i.e. trees where clade identity was preserved; always 1 if no phylo.list is supplied). If a state vector is supplied via state, the $state object contains the percentage of simulations producing significant p-values for convergence within (single state) or between states (multiple states).

The output always has an attribute "Call" which returns an unevaluated call to the function.
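
The two summaries reported in $conv.results for a node pair can be pictured with a small base R sketch. The p-values below are invented for illustration and do not come from any analysis; the 0.05 significance threshold and the use of `NA` for trees where the clade was not tested are assumptions, not documented behaviour of overfitSC.

```r
# Invented p-values, one per alternative tree. NA marks a tree where the
# clade identity was disrupted and the test was skipped (assumed encoding).
p_vals <- c(0.01, 0.03, NA, 0.20, 0.04, 0.002, NA, 0.06, 0.01, 0.03)

tested <- !is.na(p_vals)

# Proportion of trees on which the test was actually performed.
prop_tested <- mean(tested)

# Percentage of performed tests that were significant at the assumed 0.05 level.
perc_significant <- 100 * mean(p_vals[tested] <= 0.05)
```

In this made-up case, 8 of the 10 trees are tested and 6 of those 8 tests are significant.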

Author(s)

Silvia Castiglione, Giorgia Girardi, Carmela Serio

References

Castiglione, S., Serio, C., Tamagnini, D., Melchionna, M., Mondanaro, A., Di Febbraro, M., Profico, A., Piras, P., Barattolo, F., & Raia, P. (2019b). A new, fast method to search for morphological convergence with shape data. PLoS ONE, 14, e0226949. https://doi.org/10.1371/journal.pone.0226949

Examples

## Not run: 
require(RRphylo)
require(phytools)
require(Morpho)
require(ape)

cc<- 2/parallel::detectCores()

DataFelids$treefel->treefel
DataFelids$statefel->statefel
DataFelids$landfel->feldata

# perform data reduction via Procrustes superimposition (in this case) and RRphylo
procSym(feldata)->pcafel
pcafel$PCscores->PCscoresfel

RRphylo(treefel,PCscoresfel,clus=cc)->RRfelids

# apply search.conv under nodes and state condition
search.conv(RR=RRfelids, y=PCscoresfel, min.dim=5, min.dist="time38", clus=cc)->sc.clade.time

search.conv(tree=treefel, y=PCscoresfel, state=statefel, declust=TRUE, clus=cc)->sc.state

# select converging clades returned in sc.clade.time
felnods<-rbind(c(85,155),c(85,145))

## overfitSC routine

# generate a list of subsampled and swapped phylogenies to test the
# robustness of search.conv. Use the phylogeny returned by RRphylo as the
# reference tree. Pass the nodes and the categories under testing to
# resampleTree so that at least 5 species are retained in each clade/state.
treefel.list<-resampleTree(RRfelids$tree,s=0.15,nodes=unique(c(felnods)),categories=statefel,
                        nsim=15,swap.si=0.1,swap.si2=0.1)

# match the original data with each subsampled-swapped phylogeny in treefel.list
#  and repeat data reduction
y.list<-lapply(treefel.list,function(k){
  treedataMatch(k,feldata)[[1]]->ynew
  procSym(ynew)$PCscores
})

# test for robustness of search.conv results by overfitSC
oSC<-overfitSC(RR=RRfelids,phylo.list=treefel.list,y.list=y.list,
               nodes = felnods,state=statefel,clus=cc)


## End(Not run)

RRphylo documentation built on April 3, 2025, 9:43 p.m.