View source: R/association_functions.R
cross_validate_genomic_prediction | R Documentation |
Run genomic prediction given a single response variable (usually a phenotype)
using the BGLR function. Unlike other snpR functions, this returns the
resulting model directly, so overwrite with caution. This will leave a
specified portion of the samples out when running the model, then perform
cross-validation.
cross_validate_genomic_prediction(
  x,
  response,
  iterations = 10000,
  burn_in = 1000,
  thin = 100,
  cross_percentage = 0.9,
  model = "BayesB",
  cross_samples = NULL,
  plot = TRUE,
  interpolate = "bernoulli"
)
x |
snpRdata object |
response |
character. Name of the column containing the response variable of interest. Must match a column name in sample metadata. |
iterations |
numeric. Number of iterations to run the MCMC chain for. |
burn_in |
numeric. Number of burn in iterations to run prior to the MCMC chain. |
thin |
numeric. Number of iterations to discard between each recorded data point. |
cross_percentage |
numeric, default 0.9. The proportion of samples to use to create the model. Must be greater than 0 and less than 1. |
model |
character, default "BayesB". Prediction model to use; see the description of the ETA argument in BGLR. |
cross_samples |
numeric, default NULL. Optional vector of sample indices to use for cross-validation. If provided, the cross_percentage argument will be ignored. |
plot |
logical, default TRUE. If TRUE, will generate a ggplot of the cross-validation results. |
interpolate |
character, default "bernoulli". Interpolation method for missing data. Options:
. |
This function is provided as a wrapper to plug snpRdata objects into the
BGLR function in order to easily run genomic prediction on a simple model
where a single, sample-specific metadata variable is provided as the response
variable. To do so, this function formats the data into a transposed "sn"
format, as described in format_snps, using the method given by the
interpolate argument (the bernoulli method by default) to interpolate missing
genotypes. Several different prediction models are available; see the
documentation for the ETA argument in BGLR for details. Defaults to the
"BayesB" model, which assumes a "spike-slab" prior for allele effects on
phenotype where most markers have a very small effect size and a few can have
a much larger effect.
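The interpolation step can be sketched in base R. This is a minimal illustration, not the snpR implementation: it assumes that "bernoulli" interpolation means drawing each missing gene copy as a Bernoulli trial with probability equal to the observed allele frequency at that SNP, and interpolate_bernoulli is a hypothetical helper name.

```r
# Hypothetical helper, not part of snpR: fill missing genotypes (NA) in a
# samples x SNPs matrix of allele counts (0, 1, or 2 copies of one allele).
interpolate_bernoulli <- function(geno) {
  for (j in seq_len(ncol(geno))) {
    missing <- is.na(geno[, j])
    # Nothing to fill, or nothing to estimate a frequency from.
    if (!any(missing) || all(missing)) next
    # Frequency of the counted allele among the observed genotypes.
    p <- mean(geno[!missing, j]) / 2
    # Two Bernoulli trials per missing genotype, one per gene copy.
    geno[missing, j] <- rbinom(sum(missing), size = 2, prob = p)
  }
  geno
}

set.seed(42)
geno <- matrix(sample(0:2, 40, replace = TRUE), nrow = 8)
geno[cbind(c(1, 4, 6), c(2, 2, 5))] <- NA  # knock out three genotypes
filled <- interpolate_bernoulli(geno)
```

Observed genotypes are left untouched; only the NA entries are replaced with simulated allele counts.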
Unlike most snpR functions, this function does not support facets, since each
run can be very slow. Instead, an individual facet and facet level of
interest should be selected with subset_snpR_data. See examples.
The portion of samples left out for cross-validation is defined by the cross_percentage argument. Specifically, the proportion given will be used to create the model, and the remaining portion will be left out. For example, if cross_percentage = 0.9, 90% of the samples will be used to create the model and the remaining 10% will be used for cross-validation. If cross_samples are provided, the specified samples (by column index) will be used for cross-validation instead. This may be useful for a systematic leave-one-out cross-validation.
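The split described above can be sketched in base R. This is only an illustration of the logic, assuming a simple random draw of sample indices; split_samples is a hypothetical helper, not a snpR function, and the real function may select samples differently.

```r
# Hypothetical helper, not part of snpR: divide sample indices into a
# model-building set and a cross-validation set.
split_samples <- function(n_samples, cross_percentage = 0.9,
                          cross_samples = NULL) {
  if (is.null(cross_samples)) {
    stopifnot(cross_percentage > 0, cross_percentage < 1)
    # Randomly pick the requested proportion of samples for the model.
    n_model <- round(n_samples * cross_percentage)
    model_samples <- sort(sample(n_samples, n_model))
    cross_samples <- setdiff(seq_len(n_samples), model_samples)
  } else {
    # User-supplied indices take precedence over cross_percentage.
    model_samples <- setdiff(seq_len(n_samples), cross_samples)
  }
  list(model.samples = model_samples, cross.samples = cross_samples)
}

set.seed(1)
random_split <- split_samples(100, cross_percentage = 0.9)

# Leave-one-out: hold out only sample 7 and fit on the other 99.
loo_split <- split_samples(100, cross_samples = 7)
```

Looping the second call over every sample index would give a systematic leave-one-out scheme.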
See documentation for BGLR for more details and for a full list of
references.
A list containing:
model: The results from run_genomic_prediction.
model.samples: Indices of the samples used to construct the model.
cross.samples: Indices of the samples used to cross-validate the model.
comparison: A data.frame containing the observed and predicted phenotypes/Breeding Values for the cross-validation samples.
rsq: The r^2 value for the observed and predicted phenotypes/Breeding Values for the cross-validation samples.
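As a rough illustration of the rsq element, the sketch below computes r^2 for a toy comparison table. It assumes that rsq is the squared Pearson correlation between observed and predicted values, and the column names observed and predicted are hypothetical; check the returned comparison data.frame for the actual names.

```r
# Toy stand-in for the comparison data.frame (simulated values).
set.seed(7)
comparison <- data.frame(observed = rnorm(20))
comparison$predicted <- 0.6 * comparison$observed + rnorm(20, sd = 0.5)

# Squared Pearson correlation between observed and predicted values.
rsq <- cor(comparison$observed, comparison$predicted)^2

# Equivalent: R-squared from a simple linear regression with an intercept.
rsq_lm <- summary(lm(observed ~ predicted, data = comparison))$r.squared
```

Both calculations give the same value for a single predictor, so either can be used to check the reported rsq.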
William Hemstrom
Pérez, P., and de los Campos, G. (2014). Genetics.
# run and plot a basic prediction
## add some dummy phenotypic data.
dat <- stickSNPs
sample.meta(dat) <- cbind(weight = rnorm(ncol(stickSNPs)),
                          sample.meta(stickSNPs))
## run cross_validation
cross_validate_genomic_prediction(dat, response = "weight",
                                  iterations = 1000, burn_in = 100,
                                  thin = 10)