cross_validate_genomic_prediction: Run a single cross-validation with BGLR.
In hemstrow/snpR: Whole-Genome Analysis Tools for Use with Single Nucleotide Polymorphism Data

cross_validate_genomic_prediction

R Documentation

Run a single cross-validation with BGLR.

Description

Run genomic prediction for a single response variable (usually a phenotype) using the BGLR function. Unlike other snpR functions, this returns the resulting model directly rather than a snpRdata object, so overwrite with caution. A specified portion of the samples is left out when fitting the model, and those held-out samples are then used for cross-validation.

Usage

cross_validate_genomic_prediction(
  x,
  response,
  iterations = 10000,
  burn_in = 1000,
  thin = 100,
  cross_percentage = 0.9,
  model = "BayesB",
  cross_samples = NULL,
  plot = TRUE,
  interpolate = "bernoulli"
)

Arguments

`x`	snpRdata object
`response`	character. Name of the column containing the response variable of interest. Must match a column name in sample metadata.
`iterations`	numeric. Number of iterations to run the MCMC chain for.
`burn_in`	numeric. Number of burn-in iterations to run prior to the MCMC chain.
`thin`	numeric. Number of iterations to discard between each recorded data point.
`cross_percentage`	numeric, default 0.9. The proportion of samples to use to create the model. Must be greater than 0 and less than 1.
`model`	character, default "BayesB". Prediction model to use, see description for the ETA argument in `BGLR`.
`cross_samples`	numeric, default NULL. Optional vector of sample indices to use for cross-validation. If provided, the cross_percentage argument will be ignored.
`plot`	logical, default TRUE. If TRUE, will generate a ggplot of the cross-validation results.
`interpolate`	character, default "bernoulli". Interpolation method for missing data. Options: "bernoulli": binomial draws for the minor allele; "af": insertion of the average allele frequency.
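
To make the two interpolation options concrete, the base R sketch below fills missing genotypes in a toy matrix of minor allele counts (0, 1, or 2). It illustrates the idea only and is not the snpR implementation: the helper `impute_toy` and the toy matrix are hypothetical, and inserting the expected allele count (twice the frequency) for "af" is an assumption about scaling.

```r
# Toy genotype matrix: rows = samples, columns = SNPs, values = minor allele counts.
set.seed(42)
geno <- matrix(sample(0:2, 40, replace = TRUE), nrow = 8)
geno[sample(length(geno), 6)] <- NA  # knock out a few genotypes

# Hypothetical helper, for illustration only (not part of snpR).
impute_toy <- function(g, method = c("bernoulli", "af")) {
  method <- match.arg(method)
  for (j in seq_len(ncol(g))) {
    miss <- is.na(g[, j])
    if (!any(miss)) next
    # Minor allele frequency from the observed genotypes (two alleles per sample).
    p <- mean(g[!miss, j]) / 2
    g[miss, j] <- if (method == "bernoulli") {
      rbinom(sum(miss), size = 2, prob = p)  # random draw of 0, 1, or 2 minor alleles
    } else {
      2 * p  # assumed scaling: expected minor allele count at frequency p
    }
  }
  g
}

imputed_bernoulli <- impute_toy(geno, "bernoulli")
imputed_af <- impute_toy(geno, "af")
```

The "bernoulli" style keeps every genotype a whole allele count but adds random noise, while the "af" style is deterministic and can produce fractional values.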

Details

This function is provided as a wrapper to plug snpRdata objects into the BGLR function in order to easily run genomic prediction on a simple model where a single, sample-specific metadata variable is provided as the response variable. To do so, this function formats the data into a transposed "sn" format, as described in format_snps, interpolating missing genotypes with the method given by the interpolate argument (bernoulli by default). Several different prediction models are available; see the documentation for the ETA argument in BGLR for details. Defaults to the "BayesB" model, which assumes a "spike-slab" prior for allele effects on phenotype where most markers have a very small effect size and a few can have a much larger effect.
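
For orientation, a direct call to `BGLR::BGLR` on a simulated marker matrix might look like the sketch below. The simulated data and object names are hypothetical, and how this wrapper calls BGLR internally is not described here, so treat the block only as an illustration of the `ETA` argument with the "BayesB" model. It runs the fit only if the BGLR package is installed, and BGLR writes its chain files under the `saveAt` prefix.

```r
set.seed(1)
n <- 60   # hypothetical number of samples
p <- 200  # hypothetical number of SNPs

# Samples in rows, SNPs in columns, coded as minor allele counts.
X <- matrix(rbinom(n * p, size = 2, prob = 0.3), nrow = n)

# Simulated phenotype: a few markers with larger effects, the rest with none.
beta <- numeric(p)
beta[sample(p, 5)] <- rnorm(5)
y <- as.vector(X %*% beta + rnorm(n))

if (requireNamespace("BGLR", quietly = TRUE)) {
  fit <- BGLR::BGLR(y = y,
                    ETA = list(list(X = X, model = "BayesB")),
                    nIter = 1000, burnIn = 100, thin = 10,
                    saveAt = file.path(tempdir(), "bglr_sketch_"),
                    verbose = FALSE)
  head(fit$ETA[[1]]$b)  # posterior mean marker effects
  head(fit$yHat)        # fitted phenotypes
}
```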

Unlike most snpR functions, this function does not support facets, since each run can be very slow. Instead, an individual facet and facet level of interest should be selected with subset_snpR_data. See examples.

The portion of samples left out for cross-validation is defined by the cross_percentage argument. Specifically, the proportion given is used to create the model, and the remaining portion is left out. For example, if cross_percentage = 0.9, 90% of the samples will be used to create the model and the remaining 10% will be used for cross-validation. If cross_samples are provided, the specified samples (by column index) will be used for cross-validation instead. This may be useful for systematic leave-one-out cross-validation.
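
The split described above can be sketched in base R as follows. This is a minimal illustration, not the package's code: the sample count is hypothetical, and rounding the model set size with `round()` is an assumption.

```r
set.seed(123)
n_samples <- 50          # hypothetical number of samples
cross_percentage <- 0.9

# Random split: cross_percentage of the samples build the model, the rest are held out.
n_model <- round(n_samples * cross_percentage)  # rounding rule is an assumption
model_idx <- sort(sample(n_samples, n_model))
cross_idx <- setdiff(seq_len(n_samples), model_idx)

# Systematic leave-one-out: each sample is held out exactly once.
loo_sets <- lapply(seq_len(n_samples), function(i) {
  list(cross = i, model = setdiff(seq_len(n_samples), i))
})
```

In a leave-one-out run, each `cross` index would be passed in turn as the cross_samples argument.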

See documentation for BGLR for more details and for a full list of references.

Value

A list containing:

model: The results from run_genomic_prediction.
model.samples: Indices of the samples used to construct the model.
cross.samples: Indices of the samples used to cross-validate the model.
comparison: A data.frame containing the observed and predicted phenotypes/Breeding Values for the cross-validation samples.
rsq: The r^2 value for the observed and predicted phenotypes/Breeding Values for the cross-validation samples.
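
As a rough guide to reading these outputs, the sketch below computes an r^2 from a small table of observed and predicted values. It assumes that rsq is the squared Pearson correlation between the two columns; the data and the column names `observed` and `predicted` are hypothetical and may not match the names in the returned data.frame.

```r
# Hypothetical comparison table; the values and column names are illustrative only.
comparison <- data.frame(observed  = c(1.2, 0.4, -0.3, 2.1, 0.9),
                         predicted = c(1.0, 0.6, -0.1, 1.7, 1.1))

# Squared Pearson correlation between observed and predicted values.
rsq <- cor(comparison$observed, comparison$predicted)^2
```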

Author(s)

William Hemstrom

References

Pérez, P., and de los Campos, G. (2014). Genome-wide regression and prediction with the BGLR statistical package. Genetics, 198(2), 483-495.

Examples

# run and plot a basic prediction
library(snpR)
## add some dummy phenotypic data.
set.seed(1212) # arbitrary seed so the dummy phenotypes are reproducible
dat <- stickSNPs
sample.meta(dat) <- cbind(weight = rnorm(ncol(stickSNPs)), 
                          sample.meta(stickSNPs))
## run cross_validation with a short chain to keep the example fast
cross_validate_genomic_prediction(dat, response = "weight", 
                                  iterations = 1000, burn_in = 100, 
                                  thin = 10)

hemstrow/snpR documentation built on July 5, 2025, 4:38 a.m.