View source: R/cross_validation.R
x.val performs cross-validation (CV) to estimate the accuracy of genome-wide prediction (otherwise known as genomic selection) for a specific training population (TP), i.e. a set of individuals for which phenotypic and genotypic data are available. Cross-validation can be conducted via one of two methods within x.val; see Details for more information.
x.val(
G.in = NULL,
y.in = NULL,
min.maf = 0.01,
mkr.cutoff = 0.5,
entry.cutoff = 0.5,
remove.dups = T,
impute = "EM",
frac.train = 0.6,
nCV.iter = 100,
nFold = NULL,
nFold.reps = 1,
return.estimates = F,
CV.burnIn = 750,
CV.nIter = 1500,
models = c("rrBLUP", "BayesA", "BayesB", "BayesC", "BL", "BRR")
)
G.in
TIP - Set header=
y.in
min.maf
Optional
mkr.cutoff
Optional
entry.cutoff
Optional
remove.dups
Optional
impute
Options include
frac.train
Optional
nCV.iter
Optional
nFold
Optional
nFold.reps
Optional
return.estimates
Optional
CV.burnIn
Optional
CV.nIter
Optional
models
Optional
Two CV methods are available within PopVar:
CV method 1: During each iteration, a training (i.e. model training) set of size N*(frac.train), where N is the size of the TP, is randomly sampled from the TP, and the remainder of the TP is assigned to the validation set. The accuracy of each model is expressed as the average Pearson correlation coefficient (r) between the genomic estimated breeding values (GEBVs) and the observed phenotypic values in the validation set, taken across all nCV.iter iterations. Because it adapts readily to various TP sizes, CV method 1 is the default CV method in pop.predict.
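The resampling scheme of CV method 1 can be sketched in base R as follows. This is only an illustration under stated assumptions, not PopVar's implementation: the marker matrix and phenotypes are simulated, the helper `ridge_predict` is hypothetical, and plain ridge regression with a fixed penalty stands in for the prediction models listed in `models`.

```r
set.seed(1)

# Simulated stand-in data (assumption): 100 individuals, 50 markers coded -1/1
N <- 100
M <- 50
G <- matrix(sample(c(-1, 1), N * M, replace = TRUE), nrow = N)
u <- rnorm(M, sd = 0.3)                # simulated marker effects
y <- as.vector(G %*% u + rnorm(N))     # phenotype = genetic value + noise

# Hypothetical helper: ridge regression as a simple stand-in prediction model
ridge_predict <- function(G_train, y_train, G_valid, lambda = 10) {
  mu <- mean(y_train)
  beta <- solve(crossprod(G_train) + lambda * diag(ncol(G_train)),
                crossprod(G_train, y_train - mu))
  as.vector(mu + G_valid %*% beta)
}

frac.train <- 0.6
nCV.iter <- 20

# One correlation per iteration: sample the training set, predict the remainder
r_iter <- vapply(seq_len(nCV.iter), function(i) {
  train <- sample(N, size = round(N * frac.train))
  valid <- setdiff(seq_len(N), train)
  gebv <- ridge_predict(G[train, , drop = FALSE], y[train],
                        G[valid, , drop = FALSE])
  cor(gebv, y[valid])
}, numeric(1))

# Reported accuracy: mean Pearson r across all iterations
cv1_accuracy <- mean(r_iter)
```

Because the validation set is simply whatever the training sample leaves out, the same code works for any TP size, which is the property the text cites for making this method the default.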
CV method 2: nFold independent validation sets are sampled from the TP, and each is predicted by the remainder. For example, if nFold = 10 the TP will be split into 10 equal sets, each containing one tenth of the TP, and each set will be predicted by the remaining nine tenths of the TP. The accuracy of each model is expressed as the average r between the GEBVs and the observed phenotypic values in the validation set, taken across all nFold folds. The process can be repeated nFold.reps times, with nFold new independent sets sampled in each replication, in which case the reported prediction accuracies are averages across all folds and replications.
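The fold-based scheme of CV method 2 can be sketched in base R in the same spirit. Again this is an illustration under stated assumptions rather than PopVar's code: the data are simulated, `ridge_predict` is a hypothetical helper, and ridge regression with a fixed penalty replaces the models named in `models`.

```r
set.seed(2)

# Simulated stand-in data (assumption): 100 individuals, 50 markers coded -1/1
N <- 100
M <- 50
G <- matrix(sample(c(-1, 1), N * M, replace = TRUE), nrow = N)
u <- rnorm(M, sd = 0.3)                # simulated marker effects
y <- as.vector(G %*% u + rnorm(N))     # phenotype = genetic value + noise

# Hypothetical helper: ridge regression as a simple stand-in prediction model
ridge_predict <- function(G_train, y_train, G_valid, lambda = 10) {
  mu <- mean(y_train)
  beta <- solve(crossprod(G_train) + lambda * diag(ncol(G_train)),
                crossprod(G_train, y_train - mu))
  as.vector(mu + G_valid %*% beta)
}

nFold <- 10
nFold.reps <- 2

r_all <- unlist(lapply(seq_len(nFold.reps), function(rep_i) {
  # Draw a fresh random assignment of individuals to folds in each replication
  fold_id <- sample(rep(seq_len(nFold), length.out = N))
  vapply(seq_len(nFold), function(k) {
    valid <- which(fold_id == k)       # one fold is held out for validation
    train <- which(fold_id != k)       # the remaining folds train the model
    gebv <- ridge_predict(G[train, , drop = FALSE], y[train],
                          G[valid, , drop = FALSE])
    cor(gebv, y[valid])
  }, numeric(1))
}))

# Reported accuracy: mean r across all folds and replications
cv2_accuracy <- mean(r_all)
```

Every individual is predicted exactly once per replication, so `r_all` holds nFold * nFold.reps correlations before averaging.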
A list containing:
CVs
A data frame of CV results for each trait/model combination specified
If return.estimates is TRUE, the following additional items will be returned:
models.used
A list of the models chosen to estimate marker effects for each trait
mkr.effects
A vector of marker effect estimates for each trait generated by the respective prediction model used
betas
A list of beta values for each trait generated by the respective prediction model used