inferenceBGLR: Function to make inference on a cross validation analysis and...


Description

Function to make inferences from a cross-validation analysis of a multi-location trial data set using BGLR.

Usage

inferenceBGLR(P, id = "GERMPLASM", factor = "LOCATION", trait = "YIELD",
  CVscheme = NULL, modelName = c("model1", "model2", "model0"), G = G,
  outputDir = getwd(), verbose = TRUE, replications = 3, ...)

Arguments

P

a data frame holding the design information and phenotypes for the data to be modeled. Its columns must include those named by the id, factor, and trait arguments described below. For inference on the thesis data set, use the data set P, loaded by typing data(P) in the console.

id

character giving the name of the column that holds the names of the observations (entries). Default is GERMPLASM.

factor

character giving the name of the column that holds the factor describing the geographic location in the multi-location trial. Default is LOCATION.

trait

character giving the name of the column that holds the phenotype to be modeled. Default is YIELD.
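To illustrate the layout P is expected to have, the following builds a small hypothetical data frame using the default column names. The entry names, locations, and yield values are invented for illustration and are not part of the thesis data set.

```r
# Hypothetical toy version of P using the default column names
# (GERMPLASM, LOCATION, YIELD); all values are invented for illustration.
P_toy <- data.frame(
  GERMPLASM = rep(c("G1", "G2", "G3"), times = 2),
  LOCATION  = rep(c("L1", "L2"), each = 3),
  YIELD     = c(10.2, 9.8, 11.1, 8.7, 9.0, 10.4),
  stringsAsFactors = FALSE
)

# The id, factor, and trait arguments must name columns of P.
required <- c(id = "GERMPLASM", factor = "LOCATION", trait = "YIELD")
stopifnot(all(required %in% names(P_toy)))

# In this balanced toy example each entry is observed once per location.
print(table(P_toy$GERMPLASM, P_toy$LOCATION))
```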

CVscheme

data frame returned by the crossValidate function, encoding the user-chosen sampling strategy for the cross-validation. The default, NULL, would apply prediction to the full data set, but this is not yet implemented, so the argument must be specified.
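As a rough illustration of what a cross-validation scheme encodes, the sketch below assigns folds at random by entry, so that all records of an entry fall in the same fold and test entries are never seen during training. The helper assign_folds_by_id is hypothetical; it is not the crossValidate implementation and does not reproduce its output format.

```r
# Hypothetical random-by-ID k-fold split: every record of an entry lands in
# the same fold. This is NOT the output format of crossValidate.
assign_folds_by_id <- function(ids, k, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  unique_ids <- unique(ids)
  # Spread the unique entries as evenly as possible over k folds, then shuffle.
  fold_of_id <- sample(rep_len(seq_len(k), length(unique_ids)))
  names(fold_of_id) <- unique_ids
  unname(fold_of_id[as.character(ids)])
}

ids <- rep(c("G1", "G2", "G3", "G4", "G5"), times = 2)  # 5 entries x 2 locations
folds <- assign_folds_by_id(ids, k = 5, seed = 1)

# For fold f, records with folds == f form the test set, the rest the training set.
test_idx  <- which(folds == 1)
train_idx <- which(folds != 1)
```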

modelName

character naming the model to fit, one of:

modelG:

Model in which entries and locations are treated as random effects. The G-matrix accounts for the genetic relatedness between the entries. See reference 4 for details.

modelGE:

Model in which entries and locations are treated as random effects and a GxE interaction term is included. The G-matrix accounts for the genetic relatedness between the entries. The GxE interactions are modeled following Jarquin et al. (2014). See references 3 and 4 for details.

modelL:

Model in which entries and locations are treated as random effects and no information about the relatedness between the entries is included. This model is provided for didactic and testing purposes.
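The covariance structures behind these models can be sketched as follows, assuming the reaction-norm formulation of Jarquin et al. (2014) with locations entering as a plain factor and no environmental covariates. Under that assumption the entry effect has covariance Z_g G Z_g', the location effect Z_E Z_E', and the GxE term their element-wise (Hadamard) product. The helper build_kernels and the toy matrices are hypothetical. The exact specification used by inferenceBGLR is given in Appendix B of reference 4. Matrices like these can be supplied to BGLR as the K element of an ETA term with model = "RKHS".

```r
# Sketch, assuming locations enter as a plain factor (no environmental
# covariates); build_kernels is a hypothetical helper, not part of the package.
build_kernels <- function(P, G, id_col = "GERMPLASM", loc_col = "LOCATION") {
  entries   <- factor(P[[id_col]], levels = rownames(G))
  locations <- factor(P[[loc_col]])
  stopifnot(!anyNA(entries))           # every entry in P must appear in G
  Z_g <- model.matrix(~ entries - 1)   # record-by-entry incidence matrix
  Z_E <- model.matrix(~ locations - 1) # record-by-location incidence matrix
  K_G  <- Z_g %*% G %*% t(Z_g)         # entry main effect: Z_g G Z_g'
  K_E  <- Z_E %*% t(Z_E)               # location main effect: 1 if same location
  K_GE <- K_G * K_E                    # G x E: element-wise (Hadamard) product
  list(Z_E = Z_E, K_G = K_G, K_E = K_E, K_GE = K_GE)
}

# Toy example: 3 entries observed in 2 locations (invented values).
G_toy <- matrix(c(1.0, 0.2, 0.1,
                  0.2, 1.0, 0.3,
                  0.1, 0.3, 1.0), nrow = 3,
                dimnames = list(c("G1", "G2", "G3"), c("G1", "G2", "G3")))
P_toy <- data.frame(GERMPLASM = rep(c("G1", "G2", "G3"), times = 2),
                    LOCATION  = rep(c("L1", "L2"), each = 3))
kernels <- build_kernels(P_toy, G_toy)
```

In this sketch, two records share GxE covariance only when they come from the same location. Across locations, an entry is linked to itself only through the entry main effect K_G.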

G

the realized genomic relationship matrix (G-matrix) for the entries in the data set specified in P.
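This help page does not state how G was computed. One common construction of a realized G-matrix is method 1 of VanRaden (2008), sketched below as an assumption for illustration. The thesis may have used a different estimator. The marker coding (0/1/2 copies of one allele) and the helper name vanraden_G are hypothetical.

```r
# One common realized G-matrix (VanRaden, 2008, method 1), assumed here for
# illustration. M: entries x markers, coded 0/1/2 copies of one allele.
vanraden_G <- function(M) {
  p <- colMeans(M) / 2                        # allele frequency per marker
  Z <- sweep(M, 2, 2 * p)                     # center each marker column by 2p
  G <- tcrossprod(Z) / (2 * sum(p * (1 - p))) # Z Z' scaled by expected variance
  dimnames(G) <- list(rownames(M), rownames(M))
  G
}

# Toy marker matrix for 3 entries and 3 markers (invented values).
M <- matrix(c(0, 1, 2,
              2, 1, 0,
              1, 1, 0), nrow = 3,
            dimnames = list(c("G1", "G2", "G3"), NULL))
G_hat <- vanraden_G(M)
```

Because every marker column of Z is centered, each row of the resulting G sums to zero. This is a quick sanity check on the construction.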

outputDir

character giving the directory to which the files used in modeling and inference are written. Default is the working directory.

verbose

logical; whether to print information about the progress of the cross-validation. Default is TRUE.

replications

numeric defining the number of replications of the cross-validation. Default is 3.

...

additional arguments passed to the BGLR function, notably nIter (the number of MCMC iterations) and burnIn (the number of initial iterations discarded as burn-in).
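The sketch below shows how arguments such as nIter and burnIn could be forwarded through ... to BGLR when fitting one fold. It assumes the BGLR package and relies on BGLR's documented behavior of predicting records whose response is NA. The helpers mask_test_set and fit_fold, and the two-term linear predictor (locations via model = "BRR", entries via a G-based kernel with model = "RKHS"), are hypothetical simplifications. They are not the linear predictor inferenceBGLR actually uses.

```r
# Hypothetical helpers; the linear predictor is a simplification, not the
# specification from Appendix B of reference 4.
mask_test_set <- function(y, test_idx) {
  y[test_idx] <- NA   # BGLR returns predictions for records with NA response
  y
}

fit_fold <- function(y, test_idx, Z_E, K_G, saveAt = "", ...) {
  ETA <- list(
    list(X = Z_E, model = "BRR"),   # locations as a random effect
    list(K = K_G, model = "RKHS")   # entries with covariance based on G
  )
  # `...` carries nIter, burnIn, etc.; BGLR writes its MCMC output files
  # using the saveAt prefix.
  fit <- BGLR::BGLR(y = mask_test_set(y, test_idx), ETA = ETA,
                    saveAt = saveAt, verbose = FALSE, ...)
  fit$yHat[test_idx]   # predicted phenotypes for the test set
}

# Example call (requires BGLR; Z_E and K_G are record-level matrices):
# pred <- fit_fold(P$YIELD, test_idx, Z_E, K_G, nIter = 1500, burnIn = 250)
```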

Details

The function uses the cross-validation scheme (CVscheme argument) to split the data into training and test sets. For each replication (replications argument) and fold, the model specified in the modelName argument is fitted using the BGLR framework, following the specifications in Appendix B of reference 4. After each fit, a series of metrics supporting inference is calculated, as detailed in reference 4: the predictive ability, the mean squared prediction error (MSPE), and the bias. The bias is obtained from a linear model (lm function) regressing the observed phenotypes on the predicted phenotypes of the test set under evaluation. The returned information is detailed in the Value section. The files used for inference are stored in a folder named BGLR, a subdirectory of the directory specified in the outputDir argument.
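The three metrics described above can be sketched for a single test set as follows. Two assumptions are made: predictive ability is taken as the Pearson correlation between observed and predicted values, and the bias is reported as the slope of the regression of observed on predicted values, where a slope of 1 indicates no inflation or deflation. The exact definitions in reference 4 may differ. The helper cv_metrics is hypothetical.

```r
# Hypothetical helper: test-set metrics from observed and predicted phenotypes.
cv_metrics <- function(observed, predicted) {
  fit <- lm(observed ~ predicted)   # regression of observed on predicted
  c(
    predictive_ability = cor(observed, predicted),   # Pearson correlation
    MSPE = mean((observed - predicted)^2),           # mean squared prediction error
    bias = unname(coef(fit)["predicted"])            # slope; 1 means unbiased scale
  )
}

# Predictions that are perfectly correlated but shrunk by half:
m <- cv_metrics(observed = c(2, 4, 6, 8), predicted = c(1, 2, 3, 4))
```

In this toy case the ranking is perfect (predictive ability 1), yet the MSPE (7.5) and the slope (2) reveal that the predictions are on too narrow a scale. This is why several metrics are reported together.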

Value

list with the following slots, where TS stands for test set.

Author(s)

Ruud Derijcker

References

1:

Albrecht, T., et al. (2011). Genome-based prediction of testcross values in maize. Theor Appl Genet 123:339-350.

2:

De Los Campos, G., Perez, P. (2014). BGLR: Bayesian Generalized Linear Regression. Version 1.0.3. (http://CRAN.R-project.org/package=BGLR).

3:

Jarquin, D., et al. (2014). A reaction norm model for genomic selection using high-dimensional genomic and environmental data. Theor Appl Genet 127(3):595-607.

4:

Derijcker, R. (2015). Investigating incorporation of genotype x environment interaction (G x E) for genomic selection in a practical setting. Unpublished M.Sc. thesis. University of Ghent, Belgium.

Examples

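data(G)  # realized G-matrix for the entries in P
data(P)  # design information and phenotypes of the multi-location trial
# Build the cross-validation scheme: 5 folds, 2 replications
scheme <- crossValidate(x=P, id="GERMPLASM", factor="LOCATION", k=5,
                        replication=2, seed=NULL, exclusive=TRUE,
                        sampling="randomByID",verbose=TRUE)
# Fit modelG in every fold; nIter and burnIn are passed on to BGLR
output <- inferenceBGLR(P, CVscheme=scheme, modelName="modelG", id="GERMPLASM",
                       G=G, factor="LOCATION", trait="YIELD", nIter=1500, burnIn=250,
                       replications=2)
str(output)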

digiYozhik/msc_thesis documentation built on May 14, 2019, 5:16 p.m.