Function to make inference on a cross-validation analysis and a multi-location trial data set using BGLR.
inferenceBGLR(P, id = "GERMPLASM", factor = "LOCATION", trait = "YIELD",
  CVscheme = NULL, modelName = c("model1", "model2", "model0"), G = G,
  outputDir = getwd(), verbose = TRUE, replications = 3, ...)
P
a data frame that holds the design information and phenotypes for the data to be modeled. The data frame should hold the following features as columns, which are detailed below. For inference on the thesis data set we use the data set P obtained by typing data(P) in the console.
id
character describing the column name for the names of the observations. Default is GERMPLASM.
factor
character describing the column name for the factor that describes the geographic location in the multi-location trial. Default is LOCATION.
trait
character describing the column name for the phenotype to be modeled. Default is YIELD.
CVscheme
data frame output from the crossValidate function, based on a user-chosen sampling strategy for the cross-validation. Default is NULL, which would apply prediction on the full data set. That mode is not yet implemented, so specifying this argument is required.
modelName
character name describing the model used for modeling.
G
matrix containing the realized G-matrix obtained for the entries in the data set specified in P.
outputDir
character specifying the name of the directory in which to write the files used in the modeling and inference. Default is the working directory.
verbose
logical indicating whether to output information about the progress of the cross-validation. Default is TRUE.
replications
numeric defining the number of replications of the cross-validation. Default is 3.
...
additional arguments for the BGLR function. Of interest are nIter for the number of iterations and burnIn for the burn-in used in the MCMC analysis.
The function uses the cross-validation scheme information (CVscheme argument)
to split the data into training and test sets. While running through the
replications (replications argument) and folds, the model specified in the
modelName argument is fitted using the BGLR framework, following the
specifications in Appendix B of reference 4. After model fitting, a series of
metrics is calculated to support inference, as detailed further in reference 4.
These include the predictive ability, the mean squared prediction error (MSPE),
and the bias, which is calculated using a linear model (lm function) of the
observed phenotype values on the predicted phenotype values of the test set
under evaluation. The output is detailed in the Value section. The files used
for inference are stored in a folder named BGLR, which is a subdirectory of the
directory specified in the outputDir argument.
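The metrics named above can be illustrated for a single test set with base R. This is a minimal sketch on made-up numbers, not the package's internal code: the vectors y_obs and y_pred and all variable names are assumptions chosen for illustration.

```r
# Toy observed and predicted phenotypes for one test set (made-up values).
y_obs  <- c(5.1, 6.3, 4.8, 7.0, 5.9, 6.4)
y_pred <- c(5.4, 6.0, 5.0, 6.6, 5.7, 6.5)

# Predictive ability: Pearson correlation between observed and predicted values.
pred_abi <- cor(y_obs, y_pred)

# Spearman's rank correlation between observed and predicted values.
rank_cor <- cor(y_obs, y_pred, method = "spearman")

# Mean squared prediction error (MSPE).
mspe <- mean((y_obs - y_pred)^2)

# Bias: slope of a linear regression of observed on predicted values.
fit  <- lm(y_obs ~ y_pred)
bias <- unname(coef(fit)["y_pred"])

print(c(PredAbi = pred_abi, rankCor = rank_cor, mse = mspe, bias = bias))
```

A slope close to 1 indicates that the predictions are on the same scale as the observations; the Value section describes how values below or above 1 are interpreted.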
A list with the following slots, where TS stands for test set.
n.SNP
Number of SNPs used in the analysis.
Not relevant here, set to zero.
n.T
Matrix with number of entries in the test set for each fold
(rows) by replications (columns).
n.DS
Matrix with the number of observations in the total
data set for each fold (rows) by replications (columns).
id.TS
List of IDs of each test set within a list of each
replication.
bu
Estimated fixed and random effects of each fold within
each replication (see crossVal function)
y.TS
Predicted values of all test sets within each replication.
PredAbi
Predictive ability of each fold within each replication
calculated as correlation coefficient r(y_{TS},\hat y_{TS}).
rankCor
Spearman's rank correlation of each fold within each
replication calculated between y_{TS} and \hat y_{TS}.
bias
Regression coefficients of a regression of the observed
values on the predicted values in the TS. A regression coefficient < 1
implies inflation of predicted values, and a coefficient of > 1
deflation of predicted values.
k
Integer defining the number of folds.
Rep
Numeric defining the number of replications.
sampling
Character defining the sampling method.
Seed
Seed for set.seed()
rep.seed
vector with the values for the seeds used for each
replication
nr.ranEff
Number of random effects used (see crossVal function)
VC.est.method
Method for the variance components (committed or re-estimated with
ASReml/BRR/BL), see crossVal function. We recommend the default, BGLR.
m10
Mean of observed values for the 10% best predicted of
each replication. The k test sets are pooled within each replication.
mse
Mean squared error of each fold within each replication,
calculated between y_{TS} and \hat y_{TS}. Reference 4 refers to this as
the mean squared prediction error (MSPE).
topRecovery
Array of top-x recovery of entries across the
different locations. The array contains one matrix for every fold in the
cross-validation. Every matrix holds as many rows as there are replications.
The columns in the matrix hold values for the different top-x recoveries, where
x is an element of (10, 20, 30, 40, 50, 100, 200). The elements in the matrix
are calculated as the percentage of entries shared between the top x entries
in the raw and predicted test set under consideration.
residualErrors
Matrix of residual error variances for the different folds
in the cross-validation and the different replications. The variance is
taken from the varE component of the fitted BGLR object.
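The top-x recovery described under topRecovery can be sketched in base R as follows. This is only an illustration under stated assumptions: the helper name top_recovery is hypothetical, the toy data are made up, and ranking by the highest values is assumed; the package's own implementation may differ.

```r
# Hypothetical helper: percentage of entries shared between the top x entries
# ranked on observed values and the top x entries ranked on predicted values.
top_recovery <- function(ids, observed, predicted, x) {
  x <- min(x, length(ids))  # cannot select more entries than are available
  top_obs  <- ids[order(observed,  decreasing = TRUE)][seq_len(x)]
  top_pred <- ids[order(predicted, decreasing = TRUE)][seq_len(x)]
  100 * length(intersect(top_obs, top_pred)) / x
}

# Toy test set with six entries (made-up values).
ids       <- paste0("G", 1:6)
observed  <- c(5.1, 6.3, 4.8, 7.0, 5.9, 6.4)
predicted <- c(5.4, 6.0, 5.0, 6.6, 6.2, 6.5)

rec_top3 <- top_recovery(ids, observed, predicted, x = 3)
rec_all  <- top_recovery(ids, observed, predicted, x = 6)
print(c(top3 = rec_top3, all = rec_all))
```

When x equals the size of the test set, the recovery is 100 by construction, so the smaller values of x are the informative ones.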
Ruud Derijcker
1: Albrecht, T., et al. (2011). Genome-based prediction of testcross values in maize. Theor Appl Genet 123:339-350.
2: De Los Campos, G., Perez, P. (2014). BGLR: Bayesian Generalized Linear Regression. Version 1.0.3. (http://CRAN.R-project.org/package=BGLR).
3: Jarquin, D., et al. (2014). A reaction norm model for genomic selection using high-dimensional genomic and environmental data. Theor Appl Genet 127(3):595-607.
4: Derijcker, R. (2015). Investigating incorporation of genotype x environment interaction (G x E) for genomic selection in a practical setting. Unpublished M.Sc. thesis. University of Ghent, Belgium.
data(G)
data(P)
scheme <- crossValidate(x = P, id = "GERMPLASM", factor = "LOCATION", k = 5,
                        replication = 2, seed = NULL, exclusive = TRUE,
                        sampling = "randomByID", verbose = TRUE)
output <- inferenceBGLR(P, CVscheme = scheme, modelName = "modelG", id = "GERMPLASM",
                        G = G, factor = "LOCATION", trait = "YIELD", nIter = 1500, burnIn = 250,
                        replications = 2)
str(output)