mpp_CV: MPP cross-validation


mpp_CV    R Documentation

MPP cross-validation

Description

Evaluation of MPP QTL detection procedure by cross-validation (CV).

Usage

mpp_CV(
  pop.name = "MPP_CV",
  trait.name = "trait1",
  mppData,
  trait = 1,
  her = 1,
  Rep = 10,
  k = 5,
  Q.eff = "cr",
  thre.cof = 3,
  win.cof = 50,
  N.cim = 1,
  window = 20,
  thre.QTL = 3,
  win.QTL = 20,
  backward = TRUE,
  alpha.bk = 0.05,
  n.cores = 1,
  verbose = TRUE,
  output.loc
)

Arguments

pop.name

Character name of the studied population. Default = "MPP_CV".

trait.name

Character name of the studied trait. Default = "trait1".

mppData

An object of class mppData.

trait

Numerical or character indicator to specify which trait of the mppData object should be used. Default = 1.

her

Numeric value between 0 and 1 representing the heritability of the trait. her can be a single value or a vector specifying the heritability within each cross. Default = 1.

Rep

Numeric value representing the number of repetitions of the k-fold procedure. Default = 10.

k

Numeric value representing the number of folds for the within cross partition of the population. Default = 5.

Q.eff

Character expression indicating the assumption concerning the QTL effects: 1) "cr" for cross-specific; 2) "par" for parental; 3) "anc" for ancestral; 4) "biall" for bi-allelic. For more details see mpp_SIM. Default = "cr".

thre.cof

Numeric value representing the -log10(p-value) threshold above which a position can be selected as a cofactor. Default = 3.

win.cof

Numeric value in centi-Morgan representing the minimum distance between two selected cofactors. Default = 50.

N.cim

Numeric value specifying the number of times the CIM analysis is repeated. Default = 1.

window

Numeric distance (cM) on the left and the right of a cofactor position within which the cofactor is not included in the model. Default = 20.

thre.QTL

Numeric value representing the -log10(p-value) threshold above which a position can be selected as QTL. Default = 3.

win.QTL

Numeric value in centi-Morgan representing the minimum distance between two selected QTLs. Default = 20.

backward

Logical value. If backward = TRUE, the function performs a backward elimination on the list of selected QTLs. Default = TRUE.

alpha.bk

Numeric value indicating the significance level for the backward elimination. Terms with p-values above this value will iteratively be removed. Default = 0.05.

n.cores

Numeric value specifying the number of cores to use. Default = 1.

verbose

Logical value indicating if the progress of the CV should be printed. Default = TRUE.

output.loc

Path where a folder will be created to save the results.

Details

For details on the MPP QTL detection models, see the mpp_SIM documentation. The CV scheme is adapted from Utz et al. (2000) to the MPP context. A single CV run works as follows:

  1. Generation of a k-fold partition of the data. The partition is done within crosses: each cross is divided into k subsets. Then, for the ith fold (i = 1, ..., k), the ith subset is used as the validation set and the rest goes into the training set.

  2. For the ith fold, use of the training set for cofactor selection and multi-QTL model determination (mpp_SIM and mpp_CIM). If backward = TRUE, the final list of QTLs is tested simultaneously using a backward elimination (mpp_back_elim).

  3. Use the list of QTLs detected in the training set to calculate the proportion of genetic variance that they explain in the training set (p.ts = R2.ts/h2), where R2.ts is the adjusted R squared and h2 is the average within cross heritability (her). By default, her = 1, which means that p.ts represents the proportion of phenotypic variance explained in the training set.

    For each single QTL, a partial (difference) R squared is also calculated. It is the difference between the R squared of a model with all QTLs and the R squared of a model without the ith position. For details about R squared computation and adjustment, see QTL_R2.

  4. Use the estimates of the QTL effects in the training set (B.ts) to predict the phenotypic values of the validation set: y.pred.vs = X.vs*B.ts. The predicted R squared in the validation set is computed as the squared Pearson correlation coefficient between the real values (y.vs) and the predicted values (y.pred.vs): R2.vs = cor(y.vs,y.pred.vs)^2. The predicted genetic variance in the validation set (p.vs) is then equal to p.vs = R2.vs/h2. For the heritability correction, the user can provide a single value for the average within cross heritability or a vector specifying the heritability within each cross. By default, her = 1, which means that the results represent the proportion of phenotypic variance explained (predicted) in the training (validation) sets.

    The predicted R squared is computed per cross and then averaged at the population level (p.vs). Both results are returned. Partial QTL predicted R squared values are also calculated as the difference between the predicted R squared using all QTLs and the predicted R squared without QTL i. The bias between p.ts and p.vs is calculated as bias = 1 - (p.vs/p.ts).
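The steps above can be illustrated with a minimal base R sketch of one CV run. This is not the code used by mpp_CV: the toy data, the variable names, and the use of a simple linear regression (lm) in place of the QTL detection procedure (mpp_SIM, mpp_CIM, mpp_back_elim) are all assumptions made for illustration. Only the within cross partition, the heritability correction, and the bias formula follow the description above.

```r
set.seed(1)

# Toy data (hypothetical): 2 crosses of 30 genotypes and one marker-like predictor
n.cr <- 30
d <- data.frame(cross = rep(c("cr1", "cr2"), each = n.cr),
                x = rnorm(2 * n.cr))
d$y <- 1.5 * d$x + rnorm(2 * n.cr)

k <- 3     # number of folds
her <- 1   # heritability used for the correction (default value of her)

# Step 1: k-fold partition done within each cross
d$fold <- NA_integer_
for (cr in unique(d$cross)) {
  idx <- which(d$cross == cr)
  d$fold[idx] <- sample(rep_len(seq_len(k), length(idx)))
}

res <- data.frame(fold = seq_len(k), p.ts = NA_real_, p.vs = NA_real_)

for (i in seq_len(k)) {
  ts <- d[d$fold != i, ]   # training set
  vs <- d[d$fold == i, ]   # validation set

  # Steps 2-3: a plain regression stands in for QTL detection here;
  # p.ts = adjusted R squared / h2
  fit <- lm(y ~ x, data = ts)
  res$p.ts[i] <- summary(fit)$adj.r.squared / her

  # Step 4: predict the validation set, compute the squared correlation
  # per cross, then average over crosses
  vs$y.pred <- predict(fit, newdata = vs)
  r2.cr <- sapply(split(vs, vs$cross), function(z) cor(z$y, z$y.pred)^2)
  res$p.vs[i] <- mean(r2.cr) / her
}

# Bias between the training and validation estimates
res$bias <- 1 - res$p.vs / res$p.ts
res
```

Because the folds are drawn separately inside each cross, every validation set contains individuals from all crosses, which is what allows the per-cross predicted R squared to be computed.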

Value

List containing the following results items:

CV_res

Data.frame containing for each CV run: 1) the number of detected QTLs; 2) the proportion of explained genetic variance in the TS (p.ts); 3) the proportion of predicted genetic variance in the VS (p.vs) at the population level (average of within cross prediction); 4) the bias between p.ts and p.vs (bias = 1-(p.vs/p.ts)).

p.vs.cr

Matrix containing the within cross p.vs for each CV run.

QTL

Data.frame containing: 1) the list of QTL positions detected at least once during the entire CV process; 2) the number of times the position has been detected; 3) the average partial p.ts of the QTL position; 4) the average partial p.vs of the QTL position; 5) the average partial bias of the QTL position.

QTL.profiles

Data.frame containing the -log10(p-value) QTL profiles of the different CV runs.

The result elements returned as R objects are also saved as text files at the specified output location (output.loc). A transparency plot of the CV results (plot.pdf) is also saved.
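As a small worked example of the bias formula given for CV_res, the sketch below applies bias = 1-(p.vs/p.ts) to a few made-up CV runs. The data frame, its values, and its column names are hypothetical and are not taken from an actual mpp_CV output.

```r
# Hypothetical per-run results; values and column names are illustrative only
cv.res <- data.frame(run  = 1:3,
                     p.ts = c(0.50, 0.40, 0.60),
                     p.vs = c(0.25, 0.30, 0.45))

# Relative loss of explained variance when moving from the TS to the VS
cv.res$bias <- 1 - cv.res$p.vs / cv.res$p.ts
cv.res$bias   # 0.50, 0.25, 0.25

# Averages over the CV runs
colMeans(cv.res[, c("p.ts", "p.vs", "bias")])
```

A bias close to 0 indicates that the variance explained in the training set is well reproduced in the validation set, while a bias close to 1 indicates that most of it is not.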

Author(s)

Vincent Garin

References

Utz, H. F., Melchinger, A. E., & Schön, C. C. (2000). Bias and sampling error of the estimated proportion of genotypic variance explained by quantitative trait loci determined from experimental data in maize using cross validation and validation with independent samples. Genetics, 154(4), 1839-1849.

See Also

mpp_back_elim, mpp_CIM, mpp_perm, mpp_SIM, QTL_R2

Examples


## Not run: 

data(mppData)

# Specify a location where your results will be saved
my.loc <- tempdir()

CV <- mpp_CV(pop.name = "USNAM", trait.name = "ULA", mppData = mppData,
             her = 0.4, Rep = 1, k = 3, verbose = FALSE, output.loc = my.loc)


## End(Not run)


vincentgarin/mppR documentation built on March 13, 2024, 7:30 p.m.