runCrossVal: Run k-fold cross-validation

View source: R/gmsFunctions.R

runCrossVal    R Documentation

Run k-fold cross-validation

Description

Assess the accuracy of predicting previously unobserved genotypes (individuals) based on the available training data. Runs k-fold cross-validation for potentially multiple traits and optionally computes prediction accuracy on a user-specified selection index. Three models are enabled: additive-only ("A"), additive-plus-dominance ("AD") and a directional-dominance model that incorporates a genome-wide homozygosity effect ("DirDom"). The union of all genotypes scored for all traits is split into k folds a user-specified number of times. Each train-test pair is then predicted for each trait and accuracies are computed.
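The repeated k-fold partitioning described above can be sketched in base R as follows. This is a minimal illustration, not the package's internal code: the helper make_repeat_folds() and the genotype IDs are hypothetical, and the way the package itself assigns genotypes to folds may differ.

```r
# Sketch of repeated k-fold partitioning (illustrative only;
# make_repeat_folds() is a hypothetical helper, not part of the package).
make_repeat_folds <- function(ids, nrepeats, nfolds, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  out <- list()
  for (r in seq_len(nrepeats)) {
    # Shuffle the IDs, then deal them into nfolds groups of near-equal size
    shuffled <- sample(ids)
    fold_id <- rep(seq_len(nfolds), length.out = length(shuffled))
    for (f in seq_len(nfolds)) {
      test <- shuffled[fold_id == f]
      out[[length(out) + 1]] <- list(
        rep = r, fold = f,
        train = setdiff(ids, test),  # everything not held out
        test = test
      )
    }
  }
  out
}

ids <- paste0("G", 1:10)  # hypothetical genotype IDs
folds <- make_repeat_folds(ids, nrepeats = 2, nfolds = 5, seed = 42)
length(folds)  # one entry per repeat-fold pair: nrepeats * nfolds
```

Each entry holds one train-test pair; within a repeat, every genotype appears in exactly one test set.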

Usage

runCrossVal(
  blups,
  modelType,
  selInd,
  SIwts = NULL,
  grms,
  dosages = NULL,
  nrepeats,
  nfolds,
  ncores = 1,
  nBLASthreads = NULL,
  gid = "GID",
  seed = NULL,
  ...
)

Arguments

blups

nested data.frame with list-column "TrainingData" containing BLUPs. Each element of the "TrainingData" list is a data.frame with de-regressed BLUPs, BLUPs and weights (WT) for training and test.

modelType

string, one of "A", "AD" or "DirDom".
modelType="A": additive-only model; yields GEBVs.
modelType="AD": the "classic" add-dom model; GEBVs + GEDDs = GETGVs.
modelType="DirDom": the "genotypic" add-dom model with proportion homozygous fit as a fixed effect, to estimate a genome-wide inbreeding effect. Obtains add-dom effects, computes allele substitution effects (α = a + d(q-p)) and incorporates them into the GEBVs and GETGVs. "DirDom" requires dosages.
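As a worked illustration of the allele substitution effect formula α = a + d(q-p), the sketch below uses made-up per-SNP effects. It assumes p is the frequency of the allele counted in the dosages and q = 1 - p; the numbers and variable names are hypothetical and do not come from the package.

```r
# Hypothetical per-SNP values (made-up numbers for illustration)
a <- c(0.50, -0.20,  0.10)  # additive effects
d <- c(0.10,  0.05, -0.30)  # dominance effects
p <- c(0.70,  0.50,  0.20)  # assumed: frequency of the counted allele
q <- 1 - p

# Allele substitution effect for each SNP
alpha <- a + d * (q - p)
alpha
```

At p = 0.5 the dominance term vanishes and α equals a; away from 0.5 the dominance effect shifts α up or down depending on the sign of d(q-p).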

selInd

logical, TRUE/FALSE. If TRUE, selection index accuracy estimates are computed; requires input weights via SIwts

SIwts

required if selInd=TRUE, named vector of selection index weights; names must match the "Trait" variable in blups
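A selection index is a weighted sum of trait values, so a named weight vector can be applied as in the sketch below. The trait names, weights and values are hypothetical, and taking accuracy as the Pearson correlation between predicted and observed index values is an assumption made for illustration, not a statement of how the package computes it.

```r
# Hypothetical weights; names play the role of the "Trait" values in blups
SIwts <- c(yield = 0.6, drymatter = 0.4)

# Made-up predicted and observed values for four test-set genotypes
pred <- data.frame(GID = paste0("G", 1:4),
                   yield = c(1.2, 0.4, -0.3, 0.9),
                   drymatter = c(-0.5, 0.8, 0.1, 0.3))
obs <- data.frame(GID = paste0("G", 1:4),
                  yield = c(1.0, 0.1, -0.6, 1.1),
                  drymatter = c(-0.2, 0.9, 0.0, 0.2))

# Index value = weighted sum of trait values; selecting columns by
# names(SIwts) keeps traits and weights in the same order
si_pred <- as.vector(as.matrix(pred[, names(SIwts)]) %*% SIwts)
si_obs  <- as.vector(as.matrix(obs[, names(SIwts)]) %*% SIwts)

# Assumed accuracy measure: correlation of predicted vs. observed index
si_accuracy <- cor(si_pred, si_obs)
```

Indexing by names(SIwts) is what makes the names on the weight vector matter: a trait without a matching name cannot be weighted.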

grms

list of GRMs where each element is named A, D, or AD. The matrices supplied must match those required by the A, AD and ADE models. For ADE, grms=list(A=A,D=D)

dosages

dosage matrix, required only for modelType="DirDom". Numeric matrix with Nind rows x Nsnp columns and SNPs coded 0, 1, 2; rownames give the individual IDs and colnames the SNP IDs
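The expected layout of the dosage matrix can be sketched with a toy example. The IDs and genotypes are made up, and the per-individual proportion of homozygous loci is shown only to illustrate the kind of covariate the "DirDom" model description refers to; how the package derives it internally is an assumption here.

```r
# Toy dosage matrix: 3 individuals (rows) x 4 SNPs (columns), coded 0/1/2
dosages <- matrix(c(0, 1, 2, 2,
                    1, 1, 0, 2,
                    2, 0, 0, 1),
                  nrow = 3, byrow = TRUE,
                  dimnames = list(paste0("G", 1:3), paste0("snp", 1:4)))

# Basic checks on type, coding and names
stopifnot(is.numeric(dosages),
          all(dosages %in% c(0, 1, 2)),
          !is.null(rownames(dosages)),
          !is.null(colnames(dosages)))

# Proportion of homozygous loci per individual (dosage 0 or 2)
prop_hom <- rowMeans(dosages != 1)
prop_hom
```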

nrepeats

number of repeats

nfolds

number of folds

ncores

number of cores, parallelizes across repeat-folds

nBLASthreads

number of cores for each worker to use for multi-thread BLAS

gid

string, variable name used for genotype IDs in e.g. blups (default="GID")

seed

numeric, seed used to obtain reproducible train-test folds.

...

Value

Returns tidy results in a tibble with accuracy estimates for each rep-fold in a list-column "accuracyEstOut".

See Also

Other CrossVal: runParentWiseCrossVal()


wolfemd/genomicMateSelectR documentation built on July 1, 2022, 10:42 p.m.