View source: R/searchOptimalConfiguration.R
| searchOptimalConfiguration | R Documentation |
Greedy, stepwise search for evolutionary regime shifts on a phylogeny
using multivariate mvgls fits from mvMORPH. The routine:
builds one-shift candidate trees for all internal nodes meeting a tip-size threshold
(via generatePaintedTrees),
fits each candidate in parallel and ranks them by improvement in the chosen
information criterion (IC; GIC or BIC),
iteratively adds shifts that pass a user-defined acceptance threshold,
optionally revisits accepted shifts to prune overfitting using a small IC tolerance window,
optionally computes per-shift IC weights by refitting the model with each shift removed.
Models are fitted directly in multivariate trait space (no PCA), assuming a multi-rate
Brownian Motion with proportional VCV scaling across regimes. Extra arguments in ...
are forwarded to mvgls. In practice, method and
error are often the most important of these: the package vignettes use
method = "H&L" for intercept-only, high-dimensional response matrices and
method = "LL" for formula-based searches with predictors, while
error = TRUE asks mvgls() to estimate a nuisance measurement-error
(intraspecific-variance) term from the data.
searchOptimalConfiguration(
baseline_tree,
trait_data,
formula = "trait_data ~ 1",
min_descendant_tips,
num_cores = 2,
ic_uncertainty_threshold = 1,
shift_acceptance_threshold = 1,
uncertaintyweights = FALSE,
uncertaintyweights_par = FALSE,
plot = FALSE,
IC = "GIC",
store_model_fit_history = TRUE,
verbose = FALSE,
...
)
baseline_tree |
A rooted |
trait_data |
A |
formula |
Character formula passed to |
min_descendant_tips |
Integer ( |
num_cores |
Integer. Number of workers for parallel candidate scoring. Uses
|
ic_uncertainty_threshold |
Numeric ( |
shift_acceptance_threshold |
Numeric ( |
uncertaintyweights |
Logical. If |
uncertaintyweights_par |
Logical. As above, but compute per-shift IC weights in parallel
using future.apply. Exactly one of |
plot |
Logical. If |
IC |
Character. Which information criterion to use, one of |
store_model_fit_history |
Logical. If |
verbose |
Logical. If |
... |
Additional arguments passed to |
Input requirements.
Tree: baseline_tree should be a rooted phylo tree
with branch lengths interpreted in units of time. An ultrametric tree is not required.
The starting tree does not need to already be painted; searchOptimalConfiguration()
paints a single baseline regime internally before building shifted candidates.
Trait data alignment: rownames(trait_data) must match
baseline_tree$tip.label in both names and order; any tips without data should be
pruned beforehand.
Data type: trait_data is typically a numeric matrix of continuous traits;
high-dimensional settings (p \ge n) are supported via penalized-likelihood
mvgls() fits.
Search outline.
Baseline: Fit mvgls on the baseline tree (single regime) to obtain the baseline IC.
Candidates: Build one-shift trees for eligible internal nodes
(generatePaintedTrees); fit each with
fitMvglsAndExtractGIC.formula or fitMvglsAndExtractBIC.formula
(internal helpers; not exported) and rank by \DeltaIC.
Greedy add: Add the top candidate, refit, and accept if
\DeltaIC \ge shift_acceptance_threshold; continue down the ranked list.
Optional IC weights: If uncertaintyweights (or uncertaintyweights_par)
is TRUE, compute an IC weight for each accepted shift by refitting the final model with that
shift removed and comparing the two ICs via aicw.
Parallelization. Candidate sub-model fits are distributed with future + future.apply.
On Unix, multicore is used; on Windows, multisession. A sequential plan is restored afterward.
Plotting. If plot = TRUE, trees are rendered with
plotSimmap(); shift IDs are labeled with nodelabels().
Regime VCVs. The returned $VCVs are extracted from the fitted multi-regime model via
extractRegimeVCVs and reflect regime-specific covariance
estimates (when mvgls is fitted under a PL/ML method).
For high-dimensional trait datasets (p \ge n), penalized-likelihood settings in
mvgls() are often required for stable estimation. The package vignettes
distinguish two common workflows. For intercept-only searches on high-dimensional
response matrices (for example, GPA-aligned landmark data), the jaw-shape vignette
uses method = "H&L" with the default "RidgeArch" penalty; in
mvMORPH, this is a fast approximation to penalized LOOCV and is only available
for intercept-only models. For formula-based searches with predictors, the avian
skeleton vignette uses method = "LL" instead. When IC = "BIC",
method = "LL" should be used. Across empirical workflows, error = TRUE
is often a sensible default because it asks mvgls() to estimate a nuisance
measurement-error (intraspecific-variance) term from the data. Users should consult
the mvMORPH documentation for details on available methods and penalties and
tune these choices to the structure of their data.
A named list with (at minimum):
user_input: captured call (as a list) for reproducibility.
tree_no_uncertainty_transformed: SIMMAP tree from the optimal (no-uncertainty) model
on the transformed scale used internally by mvgls.
tree_no_uncertainty_untransformed: same topology with original edge lengths restored.
model_no_uncertainty: the final mvgls model object.
shift_nodes_no_uncertainty: integer vector of accepted shift nodes.
optimal_ic: final IC value; baseline_ic: baseline IC.
IC_used: "GIC" or "BIC"; num_candidates: count of candidate one-shift models evaluated.
model_fit_history: if store_model_fit_history = TRUE, a list of per-iteration fits
(loaded from temporary files written during the run) and an ic_acceptance_matrix
(IC value and acceptance flag per step).
VCVs: named list of regime-specific VCV matrices extracted from the final model
(penalized-likelihood estimates if PL was used).
Additional components appear conditionally:
ic_weights: a data.frame of per-shift IC weights and evidence ratios when
uncertaintyweights or uncertaintyweights_par is TRUE.
warnings: character vector of warnings/errors encountered during fitting (if any).
The search is greedy and may converge to a local optimum. Use a stricter
shift_acceptance_threshold to reduce overfitting, and re-run the search
with different min_descendant_tips and IC choices ("GIC" vs "BIC")
to assess stability of the inferred shifts. For a given run, the optional IC-weight
calculations (uncertaintyweights or uncertaintyweights_par) can be used
to quantify support for individual shifts. It is often helpful to repeat the analysis
under slightly different settings (e.g., thresholds or candidate-size constraints) and
compare the resulting sets of inferred shifts.
Internally, this routine coordinates multiple unexported helper functions:
generatePaintedTrees, fitMvglsAndExtractGIC.formula,
fitMvglsAndExtractBIC.formula, addShiftToModel,
removeShiftFromTree, and extractRegimeVCVs. Through these,
it may also invoke lower-level utilities such as paintSubTree_mod
and paintSubTree_removeShift. These helpers are internal
implementation details and are not part of the public API.
mvgls, GIC, BIC,
plot_ic_acceptance_matrix for visualizing IC trajectories and shift
acceptance decisions, and generateViridisColorScale for mapping
regime-specific rates or parameters to a viridis color scale when plotting trees;
packages: mvMORPH, future, future.apply, phytools, ape.
library(ape)
library(phytools)
library(mvMORPH)
set.seed(1)
# Simulate a tree
tr <- pbtree(n = 50, scale = 1)
# Define two regimes: "0" (baseline) and "1" (high-rate) on a subset of tips
states <- setNames(rep("0", Ntip(tr)), tr$tip.label)
high_clade_tips <- tr$tip.label[1:20]
states[high_clade_tips] <- "1"
# Make a SIMMAP tree for the BMM simulation
simmap <- phytools::make.simmap(tr, states, model = "ER", nsim = 1)
# Simulate traits under a BMM model with ~10x higher rate in regime "1"
sigma <- list(
"0" = diag(0.1, 2),
"1" = diag(1.0, 2)
)
theta <- c(0, 0)
sim <- mvMORPH::mvSIM(
tree = simmap,
nsim = 1,
model = "BMM",
param = list(
ntraits = 2,
sigma = sigma,
theta = theta
)
)
# mvSIM returns either a matrix or a list of matrices depending on mvMORPH version
X <- if (is.list(sim)) sim[[1]] else sim
rownames(X) <- simmap$tip.label
# Run the search on the unpainted tree (single baseline regime)
res <- searchOptimalConfiguration(
baseline_tree = as.phylo(simmap),
trait_data = X,
formula = "trait_data ~ 1",
min_descendant_tips = 10,
num_cores = 1, # keep it simple / CRAN-safe
shift_acceptance_threshold = 20, # conservative GIC threshold
IC = "GIC",
plot = FALSE,
store_model_fit_history = FALSE,
verbose = FALSE
)
res$shift_nodes_no_uncertainty
res$optimal_ic - res$baseline_ic
str(res$VCVs)
## Not run:
# Intercept-only empirical-style search:
# high-dimensional response matrix with H&L + measurement error
res_hl <- searchOptimalConfiguration(
baseline_tree = as.phylo(simmap),
trait_data = X,
formula = "trait_data ~ 1",
min_descendant_tips = 10,
num_cores = 2,
shift_acceptance_threshold = 20,
uncertaintyweights_par = TRUE,
IC = "GIC",
plot = FALSE,
method = "H&L",
error = TRUE,
store_model_fit_history = TRUE,
verbose = TRUE
)
# Formula-based search with a predictor:
# use LL when the model includes predictors
dat <- data.frame(
trait1 = X[, 1],
trait2 = X[, 2],
predictor = rnorm(nrow(X))
)
rownames(dat) <- simmap$tip.label
res_ll <- searchOptimalConfiguration(
baseline_tree = as.phylo(simmap),
trait_data = dat,
formula = "trait_data[, 1:2] ~ trait_data[, 3]",
min_descendant_tips = 10,
num_cores = 2,
shift_acceptance_threshold = 20,
IC = "GIC",
plot = FALSE,
method = "LL",
error = TRUE,
store_model_fit_history = TRUE,
verbose = TRUE
)
## End(Not run)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.