xValidation: Run a BEDASSLE cross-validation analysis

View source: R/model.comparison.R

xValidation    R Documentation

Run a BEDASSLE cross-validation analysis

Description

xValidation runs a BEDASSLE cross-validation analysis.

Usage

xValidation(
  partsFile,
  nReplicates,
  nPartitions,
  genDist,
  geoDist = NULL,
  envDist = NULL,
  nLoci,
  prefix,
  nIter = 2000,
  parallel = FALSE,
  nNodes = 1,
  saveFiles = FALSE,
  ...
)

Arguments

partsFile

The filename (in quotes, with the full file path) of the data partitions object to be used in the k-fold cross-validation procedure. This object can be created using the makePartitions function in this package.

nReplicates

An integer giving the number of cross-validation replicates to be run. This should be the same as the length of the list specified in the partsFile.

nPartitions

An integer giving the number of data folds within each run. This should be the same as the length of the list specified for each replicate in the partsFile.

genDist

A matrix of pairwise pi measured between all pairs of samples.

geoDist

A matrix of pairwise geographic distances measured between all pairs of samples. A value of NULL runs a model without geographic distance as a predictor of genetic differentiation.

envDist

A matrix of pairwise environmental distances measured between all pairs of samples. If there are multiple environmental distance measures, this argument should be a list of distance matrices. A value of NULL runs a model without environmental distance as a predictor of genetic differentiation.
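As a minimal sketch of the input shapes that geoDist and envDist expect, the distance matrices can be built in base R. The coordinates and environmental values below are made-up placeholders, and Euclidean distance is an assumption; real analyses may call for great-circle distances or standardized environmental variables.

```r
# Hypothetical metadata for four samples (placeholder values only).
coords <- cbind(x = c(0, 1, 3, 6), y = c(0, 2, 2, 5))
temperature <- c(10.5, 12.0, 15.2, 9.8)
elevation <- c(200, 450, 1200, 300)

# Pairwise Euclidean distances, expanded to a full symmetric matrix.
geoDist <- as.matrix(dist(coords))

# With more than one environmental measure, supply a list of matrices,
# one pairwise distance matrix per measure.
envDist <- list(
  as.matrix(dist(temperature)),
  as.matrix(dist(elevation))
)
```

Each matrix has one row and one column per sample, in the same sample order as genDist.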

nLoci

The total number of independent loci used to calculate pairwise pi (genDist) in the dataset.

prefix

A character vector giving the prefix to be attached to all output files.

nIter

An integer giving the number of iterations each MCMC chain is run. Default is 2e3. If the number of iterations is greater than 500, the MCMC is thinned so that the number of retained iterations is 500 (before burn-in).
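The arithmetic behind this thinning rule can be sketched as follows. The exact interval used internally is not stated here, so the floor-based interval is an assumption, and thin_interval is a hypothetical helper rather than a function from this package.

```r
# Hypothetical helper: thinning interval that keeps about `target` draws.
# The floor-based rule is an assumption, not the package's documented rule.
thin_interval <- function(nIter, target = 500) {
  if (nIter > target) floor(nIter / target) else 1
}

nIter <- 2000                 # the default number of iterations
thin <- thin_interval(nIter)  # keep every 4th iteration
retained <- nIter / thin      # 500 retained iterations
```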

parallel

A logical value indicating whether or not to run the different cross-validation replicates in parallel. Default is FALSE. For more details on how to set up runs in parallel, see the model comparison vignette.

nNodes

Number of nodes to run parallel analyses on. Default is 1, as shown in the Usage section. Ignored if parallel is FALSE. For more details on how to set up runs in parallel, see the model comparison vignette.

saveFiles

A logical value indicating whether to automatically save the output files from each cross-validation replicate. Default is FALSE.

...

Further options to be passed to rstan::sampling (e.g., adapt_delta).

Details

This function initiates a k-fold cross-validation analysis to determine the statistical support for the specified model.
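A hedged sketch of a call, using only the arguments listed under Usage. The file path, prefix, and matrix values are hypothetical placeholders, and the call is guarded so that it runs only when the bedassle package is installed and the partitions file exists.

```r
# Placeholder inputs; in practice these come from your own data.
n <- 4
genDist <- matrix(0.02, n, n)           # made-up pairwise pi values
diag(genDist) <- 0.01
geoDist <- as.matrix(dist(seq_len(n)))  # made-up geographic distances

xval_args <- list(
  partsFile = "path/to/partitions.Robj",  # hypothetical path
  nReplicates = 10,
  nPartitions = 5,
  genDist = genDist,
  geoDist = geoDist,
  envDist = NULL,                         # no environmental predictor
  nLoci = 1000,                           # made-up locus count
  prefix = "example",                     # hypothetical output prefix
  nIter = 2000,
  parallel = FALSE,
  nNodes = 1,
  saveFiles = FALSE
)

# Run only if the package and the partitions file are both available.
if (requireNamespace("bedassle", quietly = TRUE) &&
    file.exists(xval_args$partsFile)) {
  xvals <- do.call(bedassle::xValidation, xval_args)
}
```

To compare models, the same call would be repeated with different combinations of geoDist and envDist while keeping the partitions file fixed.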

Value

This function returns a matrix with nReplicates columns and nPartitions rows giving the likelihood of each data partition (averaged over the posterior distribution of the MCMC) in each replicate analysis. The mean of these values gives an estimate of the predictive accuracy of the specified model given the data provided. The mean and standard error of the data partition likelihoods across replicates can be used for comparing models (e.g., with a t-test).

In addition, this function saves a text file ("..._xval_results.txt"), containing the returned likelihoods for each replicate and data partition.
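The model comparison described above might look like the following sketch. The two matrices are simulated stand-ins, not real xValidation output; the layout (one column per replicate, one row per partition) and the paired test (which presumes both models were evaluated on the same partitions) are assumptions.

```r
set.seed(1)
nReplicates <- 10
nPartitions <- 5

# Simulated stand-ins for the results of two models (not real output).
xval_A <- matrix(rnorm(nPartitions * nReplicates, mean = -100, sd = 2),
                 nrow = nPartitions, ncol = nReplicates)
xval_B <- matrix(rnorm(nPartitions * nReplicates, mean = -103, sd = 2),
                 nrow = nPartitions, ncol = nReplicates)

# Average the partition likelihoods within each replicate.
rep_means_A <- colMeans(xval_A)
rep_means_B <- colMeans(xval_B)

# Mean and standard error across replicates for each model.
std_err <- function(x) sd(x) / sqrt(length(x))
summary_tab <- data.frame(
  model = c("A", "B"),
  mean = c(mean(rep_means_A), mean(rep_means_B)),
  se = c(std_err(rep_means_A), std_err(rep_means_B))
)

# Paired t-test on the replicate means of the two models.
comparison <- t.test(rep_means_A, rep_means_B, paired = TRUE)
```

A higher mean likelihood indicates better predictive accuracy; comparison$p.value indicates whether the difference between the two models is distinguishable from replicate-to-replicate noise.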


gbradburd/bedassle documentation built on May 20, 2022, 1 p.m.