apps/Hybrid_prediction/README.md

Hybrid prediction using GBLUP and BRR

This app fits GBLUP models to predict hybrid performance. The problem is to predict hybrid performance using genotypic information from the parents. The available data are as follows:

The desired output is the prediction of hybrids.

For more details, see Acosta-Pech et al. (2017) and references therein.

Briefly, the statistical model for predicting hybrid performance is as follows:

$$ \boldsymbol y = \boldsymbol Z_E \boldsymbol \beta_E + \boldsymbol Z_1 \boldsymbol g_1 + \boldsymbol Z_2 \boldsymbol g_2 + \boldsymbol Z_h \boldsymbol h + \boldsymbol e, $$

where

The model can be rewritten as:

$$ \boldsymbol y = \boldsymbol Z_E \boldsymbol \beta_E + \boldsymbol Z_1^\ast \boldsymbol g_1^\ast + \boldsymbol Z_2^\ast \boldsymbol g_2^\ast + \boldsymbol Z_h^\ast \boldsymbol h^\ast + \boldsymbol e, $$

where

The last model can easily be fitted in BGLR using "Bayesian Ridge Regression". After the model is fitted, the posterior means of the random effects are obtained as follows:

Computations are handled automatically by the app BRR.Hybrid_prediction; the user only needs to provide locations, ids for parents, ids for hybrids, genomic relationship matrices, and the response variable.
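The idea behind rewriting a GBLUP term as a ridge-regression term can be sketched in a few lines of base R. This is a minimal illustration on simulated data, not the code used by the app. It assumes the transformation is built from the eigen-decomposition of the relationship matrix, so that a term $\boldsymbol Z \boldsymbol g$ with $\boldsymbol g \sim N(\boldsymbol 0, \sigma^2 \boldsymbol G)$ has the same covariance as $\boldsymbol Z^\ast \boldsymbol g^\ast$ with $\boldsymbol g^\ast \sim N(\boldsymbol 0, \sigma^2 \boldsymbol I)$. All object names (`G_toy`, `Z_toy`, `L_toy`, and so on) are hypothetical.

```r
# Toy check (simulated data, hypothetical names): a GBLUP term and its
# ridge-regression reparametrization imply the same covariance structure.
set.seed(1)

# Simulated relationship matrix for 5 parents (assumption: any symmetric
# positive semi-definite matrix works here)
ids <- paste0("P", 1:5)
M_toy <- matrix(rnorm(5 * 50), nrow = 5, ncol = 50)
G_toy <- tcrossprod(M_toy) / ncol(M_toy)
dimnames(G_toy) <- list(ids, ids)

# Incidence matrix linking 12 records to the 5 parents
parent <- sample(ids, 12, replace = TRUE)
Z_toy <- diag(length(ids))[match(parent, ids), ]

# Eigen-decomposition G = V D V'; define L = V D^(1/2), so that G = L L'
eig <- eigen(G_toy, symmetric = TRUE)
L_toy <- eig$vectors %*% diag(sqrt(pmax(eig$values, 0)))

# Transformed incidence matrix: Zstar = Z L
Zstar_toy <- Z_toy %*% L_toy

# Z G Z' should equal Zstar Zstar' up to numerical error
cov_gblup <- Z_toy %*% G_toy %*% t(Z_toy)
cov_brr <- tcrossprod(Zstar_toy)
max_diff <- max(abs(cov_gblup - cov_brr))
max_diff
```

Under this parametrization, effects on the original scale would be recovered from the transformed ones as `L_toy %*% gstar`, where `gstar` stands for the estimated ridge-regression coefficients.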

Example

For illustrative purposes we consider the maize dataset described in Covarrubias-Pazaran (2016) and included in the sommer package. The dataset contains phenotypic data for plant height and grain yield for 100 out of 400 possible hybrids originating from 40 inbred lines belonging to two heterotic groups of 20 lines each. There are 1600 rows, corresponding to the 400 possible hybrids evaluated in 4 locations, but only 100 crosses have phenotypic information. The purpose is to predict the other 300 crosses.

Data preparation


#Load the library
library(BGLR)

#Load the data, you need to download the file "cornHybrid.RData" included in this app
#This loads a list, from where we extract the information
load('cornHybrid.RData')

#Extract the hybrid information, a data.frame with columns: 
#1)Location, 2)GCA1, ids for parent 1, 
#3)GCA2, ids for parent 2,
#4)SCA, ids for hybrids, 5)Yield and 6)PlantHeight
pheno<-cornHybrid$hybrid
head(pheno)

pheno$GCA1<-as.character(pheno$GCA1)
pheno$GCA2<-as.character(pheno$GCA2)
pheno$SCA<-as.character(pheno$SCA)

#Extract relationship matrix for both parents
G<-cornHybrid$K

#Genomic relationship matrix for parent 1
GCA1<-unique(pheno$GCA1)
selected<-rownames(G)%in%GCA1
G1<-G[selected,selected]
dim(G1)
rownames(G1)

#Genomic relationship matrix for parent 2
GCA2<-unique(pheno$GCA2)
selected<-rownames(G)%in%GCA2
G2<-G[selected,selected]
dim(G2)
rownames(G2)

#Generate H
#kronecker: make.dimnames=TRUE is necessary to identify the hybrids
#with the label Parent 1:Parent 2, the same convention used in the pheno data.frame
H<-kronecker(G1,G2,make.dimnames=TRUE)

#At this point we need to have 4 objects:
#1)pheno
#2)G1
#3)G2
#4)H
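
The labelling convention used by `kronecker(..., make.dimnames=TRUE)` can be checked on a small example. The sketch below uses two made-up 2 x 2 relationship matrices (all names and values are hypothetical) to show that the row and column names of the product take the form `Parent 1:Parent 2`, and how one might verify that every hybrid id is present in the resulting matrix before fitting.

```r
# Toy relationship matrices for two parental groups (made-up values)
G1_toy <- matrix(c(1.0, 0.2,
                   0.2, 1.0), nrow = 2,
                 dimnames = list(c("A1", "A2"), c("A1", "A2")))
G2_toy <- matrix(c(1.0, 0.5,
                   0.5, 1.0), nrow = 2,
                 dimnames = list(c("B1", "B2"), c("B1", "B2")))

# Kronecker product; dimnames are built by pasting the parent ids with ":"
H_toy <- kronecker(G1_toy, G2_toy, make.dimnames = TRUE)
rownames(H_toy)

# Each entry is the product of the two parental relationships, e.g.
# H["A1:B2", "A2:B1"] = G1["A1", "A2"] * G2["B2", "B1"]
H_toy["A1:B2", "A2:B1"]

# Sanity check: every hybrid id must match a row name of H
hybrid_ids <- c("A1:B2", "A2:B1")
ids_ok <- all(hybrid_ids %in% rownames(H_toy))
ids_ok
```

With the real data, the analogous check would be `all(pheno$SCA %in% rownames(H))`, assuming the hybrid ids in `pheno$SCA` follow the same `Parent 1:Parent 2` convention.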

Sourcing the app

 source('https://raw.githubusercontent.com/gdlc/BGLR-R/master/apps/Hybrid_prediction/Hybrid_prediction.R')

Fitting the model

 set.seed(456)
 fm<-BRR.Hybrid_prediction(y=pheno$Yield,
                            location=pheno$Location,
                            id1=pheno$GCA1,
                            id2=pheno$GCA2,
                            idH=pheno$SCA,
                            G1=G1,
                            G2=G2,
                            H=H,
                            nIter=10000,
                            burnIn=5000,
                            thin=10,
                            verbose=TRUE)

Extracting results

#Variance component for parent 1
fm$ETA[[2]]$varB

#Variance component for parent 2
fm$ETA[[3]]$varB

#Variance component for hybrids
fm$ETA[[4]]$varB

#Variance component for error
fm$varE

#predictions
predictions<-data.frame(Loc=pheno$Location, yObs=pheno$Yield,yPred=fm$yHat,hybrid=pheno$SCA)
head(predictions)

#Posterior means for random effects
#Parent1
fm$ETA[[2]]$u

#Parent2
fm$ETA[[3]]$u

#Hybrid
fm$ETA[[4]]$u
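
Since the purpose is to predict crosses without phenotypic information, it can help to separate the rows of the predictions table into observed and unobserved hybrids. The sketch below uses a small made-up data.frame with the same columns as `predictions`. It assumes that unobserved hybrids are coded as `NA` in the response, which should be checked against the actual data.

```r
# Made-up table mimicking the structure of `predictions`
predictions_toy <- data.frame(
  Loc    = c(1, 1, 2, 2, 3, 3),
  hybrid = c("A1:B1", "A1:B2", "A1:B1", "A1:B2", "A2:B1", "A2:B2"),
  yObs   = c(10.2, NA, 9.8, NA, 11.1, 10.5),
  yPred  = c(10.0, 9.5, 9.9, 10.3, 10.8, 10.6)
)

# Rows without phenotypic information (assumption: coded as NA)
is_missing <- is.na(predictions_toy$yObs)

# Predictions for the unobserved hybrids
unobserved <- predictions_toy[is_missing, c("Loc", "hybrid", "yPred")]
unobserved

# Correlation between observed and fitted values for the observed hybrids;
# this measures goodness of fit, not out-of-sample predictive ability
fit_cor <- cor(predictions_toy$yObs[!is_missing],
               predictions_toy$yPred[!is_missing])
fit_cor
```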



gdlc/BGLR-R documentation built on April 23, 2024, 11:01 p.m.