kFoldCrossValidation: Perform K-Fold Cross-Validation for corHMM Models

kFoldCrossValidation

R Documentation

Perform K-Fold Cross-Validation for corHMM Models

Description

This function performs k-fold cross-validation on a fitted corHMM model by dividing the data into k equally sized subsets. If a vector of lambda regularization values is provided, model performance is evaluated for each value. The function returns the cross-validation results and can optionally save the trained models for each fold.

Usage

kFoldCrossValidation(corhmm_obj, k, lambdas = NULL, return_model = TRUE, 
save_model_dir = NULL, model_name = NULL)

Arguments

`corhmm_obj`	A `corHMM` object that contains a fitted model.
`k`	An integer specifying the number of folds to divide the data into for cross-validation.
`lambdas`	A numeric vector of lambda regularization values to evaluate during cross-validation. If `NULL`, the lambda value from `corhmm_obj` will be used. Defaults to `NULL`.
`return_model`	A logical value indicating whether to return the trained models for each fold. Defaults to `TRUE`.
`save_model_dir`	A character string specifying the directory to save the trained models for each fold. If `NULL`, models will not be saved. Defaults to `NULL`.
`model_name`	A character string specifying the base name for saved model files. If `NULL`, a default name `"corhmm.obj"` is used. Defaults to `NULL`.

Details

The function splits the data into k folds and, for each fold, fits a separate corHMM model with that fold held out as the test set and the remaining folds used for training. Each model is evaluated on its test set with a divergence-based score (Jensen-Shannon divergence): the tips that were removed for that fold are estimated under the newly fitted model.
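The two ingredients described above, fold assignment and a Jensen-Shannon score, can be sketched in base R. This is an illustration under stated assumptions, not the package's implementation: the tip labels, the helper `js_divergence()`, and the example distributions are all hypothetical, and how corHMM actually assigns tips to folds or aggregates the divergence is not specified here.

```r
# Illustrative sketch only; not the corHMM implementation.
set.seed(1)                      # reproducible fold assignment
tip_labels <- paste0("t", 1:23)  # hypothetical tip names
k <- 5

# Shuffle fold labels so each tip lands in one of k near-equal folds.
fold_id <- sample(rep(seq_len(k), length.out = length(tip_labels)))
folds <- split(tip_labels, fold_id)

# Jensen-Shannon divergence between two discrete distributions (log base 2,
# so the value lies between 0 and 1).
js_divergence <- function(p, q) {
  p <- p / sum(p)
  q <- q / sum(q)
  m <- (p + q) / 2
  # Kullback-Leibler term; entries with zero mass contribute nothing.
  kl <- function(a, b) sum(ifelse(a > 0, a * log2(a / b), 0))
  (kl(p, m) + kl(q, m)) / 2
}

# A held-out tip observed in state 1 versus a hypothetical predicted
# distribution over two states.
observed <- c(1, 0)
predicted <- c(0.8, 0.2)
js_divergence(observed, predicted)
```

Under this sketch a score of 0 means the predicted distribution matches the observed state exactly, and larger values indicate a worse prediction.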

The function supports evaluating models across different lambda regularization values. If `lambdas` is provided, models are trained and evaluated for each lambda value. The results, including the cross-validation scores and the models (if `return_model = TRUE`), are returned as a list.
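The lambda grid can be pictured as a loop over lambda values and folds. The sketch below is a hedged illustration only: `score_fold()` is a hypothetical stand-in for "fit on the training folds, then score the held-out fold", its toy formula is invented so the example runs without a phylogeny, and it assumes that a lower divergence score is better.

```r
# Illustrative sketch only; score_fold() is a hypothetical placeholder,
# not a corHMM function.
score_fold <- function(lambda, fold) {
  # Toy deterministic score, smallest near lambda = 0.5.
  (lambda - 0.5)^2 + 0.01 * fold
}

lambdas <- c(0, 0.25, 0.5, 0.75, 1)
k <- 5

# One row per lambda value, one column per fold.
scores <- t(sapply(lambdas, function(l) {
  sapply(seq_len(k), function(f) score_fold(l, f))
}))
rownames(scores) <- lambdas

# Average over folds, then pick the lambda with the lowest mean score
# (assumption: lower divergence is better).
average_score <- rowMeans(scores)
best_lambda <- lambdas[which.min(average_score)]
```

With real fits, each call to the placeholder would be replaced by a model fit and a held-out evaluation, which is what makes the grid expensive: the number of fits is the number of lambda values times k.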

Value

A list of cross-validation results, including the following components:

`models`	A list of the trained models for each fold (if `return_model = TRUE`).
`scores`	A numeric vector of the cross-validation scores for each fold.
`averageScore`	The average cross-validation score across all folds.

Author(s)

James D. Boyko

Examples


# data(primates)
# phy <- multi2di(primates[[1]])
# data <- primates[[2]]
# dredge_fits <- corHMMDredge(phy = phy, data = data,
#   max.rate.cat = 1, pen.type = "l1",
#   root.p = "maddfitz", lambda = 1, nstarts = 10, n.cores = 10)
# model_table <- getModelTable(dredge_fits)
# dredge_model <- dredge_fits[[which.min(model_table$dAIC)]]
# k_fold_res <- kFoldCrossValidation(dredge_model,
#   k = 5, lambdas = c(0, 0.25, 0.5, 0.75, 1))
# cv_table <- getCVTable(k_fold_res)

thej022214/corHMM documentation built on April 13, 2025, 9:37 a.m.