cSSBR: Single Step Bayesian Regression (package cpgen: Parallelized Genomic Prediction and GWAS)

Description

This function runs Single Step Bayesian Regression (SSBR) to predict breeding values in a unified model that incorporates both genotyped and non-genotyped individuals (Fernando et al., 2014).

Usage

cSSBR(data, M, M.id, X=NULL, par_random=NULL, scale_e=0, df_e=0, niter=5000, burnin=2500, seed=NULL, verbose=TRUE)

Arguments

data: data.frame with four columns: id, sire, dam, y
M: Marker matrix for the genotyped individuals
M.id: Vector of length nrow(M) giving the ids of the rows of M
X: Fixed-effects design matrix of class matrix or dgCMatrix. Must have as many rows as data. If omitted, a column vector of ones is used
par_random: as in clmm
scale_e: as in clmm
df_e: as in clmm
niter: as in clmm
burnin: as in clmm
seed: as in clmm
verbose: as in clmm

Details

The function sets up the following model using cSSBR.setup:

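\mathbf{y} = \mathbf{X}\mathbf{b} + \mathbf{M}\boldsymbol{α} + \mathbf{Z}\boldsymbol{ε} + \mathbf{e}

where \mathbf{b} is the vector of fixed effects, \boldsymbol{α} the vector of marker effects, \boldsymbol{ε} the vector of residual imputation errors of the non-genotyped individuals with design matrix \mathbf{Z}, and \mathbf{e} the residual.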
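To make the role of each term concrete, the base-R sketch below assembles the pieces of this model for a small hypothetical data set and computes the linear predictor. It is not the cSSBR.setup implementation. It assumes that every individual has a record, that the non-genotyped individuals are ordered first, and that the fixed effects consist of an intercept only. All numeric values (marker codes, effects, residuals) are made up.

```r
# Hypothetical toy setting, not taken from cpgen
set.seed(1)
n1 <- 3   # non-genotyped individuals (set 1), ordered first
n2 <- 3   # genotyped individuals (set 2)
p  <- 4   # number of markers
n  <- n1 + n2

X <- matrix(1, nrow = n, ncol = 1)                          # intercept only
M <- matrix(sample(0:2, n * p, replace = TRUE), nrow = n)   # stand-in combined marker matrix
# Z maps one residual imputation error per non-genotyped individual
# onto rows 1..n1; rows of genotyped individuals are zero.
Z <- rbind(diag(n1), matrix(0, nrow = n2, ncol = n1))

b     <- 2                      # made-up fixed effect
alpha <- rnorm(p, sd = 0.1)     # made-up marker effects
eps   <- rnorm(n1, sd = 0.2)    # made-up residual imputation errors
e     <- rnorm(n, sd = 0.5)     # made-up residuals

y <- drop(X %*% b + M %*% alpha + Z %*% eps + e)
```

Because \mathbf{Z} has zero rows for genotyped individuals, the term \mathbf{Z}\boldsymbol{ε} only adds the imputation error to the records of non-genotyped individuals.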

The matrix \mathbf{M} denotes a combined marker matrix consisting of observed marker covariates for the genotyped individuals and imputed marker covariates for the non-genotyped individuals. Best linear predictions of gene content (Gengler et al., 2007) for the non-genotyped individuals are obtained by solving \mathbf{A}^{11}\hat{\mathbf{M}}_1 = -\mathbf{A}^{12}\mathbf{M}_2 (Fernando et al., 2014), where \mathbf{A}^{11} and \mathbf{A}^{12} are submatrices of the inverse of the numerator relationship matrix, which can be computed directly from the pedigree (Henderson, 1976). The subscripts 1 and 2 denote non-genotyped and genotyped individuals, respectively. Because \mathbf{A}^{11} is very sparse, this system is solved with a sparse Cholesky solver from the Eigen library. The residual imputation error \boldsymbol{ε} has covariance matrix (\mathbf{A}^{11})^{-1}σ_{ε}^2.
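The imputation step can be illustrated in base R on the pedigree and genotypes from the example at the end of this page. The sketch builds \mathbf{A}^{-1} with Henderson's rules in their simple form that ignores inbreeding (valid here because no animal is inbred), then solves \mathbf{A}^{11}\hat{\mathbf{M}}_1 = -\mathbf{A}^{12}\mathbf{M}_2. The helper ainv_henderson is hypothetical, ids are assumed to equal row positions 1..n, and a dense solve() stands in for the sparse Cholesky solver that cpgen uses.

```r
# Pedigree from the example: animals 1-3 are founders (and genotyped),
# 4 and 5 are offspring of 1 x 2, 6 is an offspring of 1 x 3.
ped <- data.frame(id   = 1:6,
                  sire = c(NA, NA, NA, 1, 1, 1),
                  dam  = c(NA, NA, NA, 2, 2, 3))

# Hypothetical helper: Henderson's (1976) rules for A^{-1}, ignoring inbreeding.
ainv_henderson <- function(ped) {
  n <- nrow(ped)
  Ainv <- matrix(0, n, n)
  for (i in seq_len(n)) {
    pa <- c(ped$sire[i], ped$dam[i])
    pa <- pa[!is.na(pa)]                  # known parents
    # inverse Mendelian sampling variance: 1 (no parent), 4/3 (one), 2 (both)
    d <- c(1, 4 / 3, 2)[length(pa) + 1]
    k <- c(i, pa)
    w <- c(1, rep(-0.5, length(pa)))
    Ainv[k, k] <- Ainv[k, k] + d * outer(w, w)
  }
  Ainv
}

Ainv    <- ainv_henderson(ped)
nongeno <- 4:6   # set 1
geno    <- 1:3   # set 2
M2 <- rbind(c(1,2,1,1,0,0,1,2,1,0),
            c(2,1,1,1,2,0,1,1,1,1),
            c(0,1,0,0,2,1,2,1,1,1))

A11 <- Ainv[nongeno, nongeno, drop = FALSE]
A12 <- Ainv[nongeno, geno, drop = FALSE]

# Gene content of the non-genotyped animals: A^{11} M1hat = -A^{12} M2
M1hat <- solve(A11, -A12 %*% M2)

# (A^{11})^{-1}: covariance of the residual imputation error, up to sigma^2_epsilon
var_eps_factor <- solve(A11)
```

Since both parents of animals 4 to 6 are genotyped, each row of M1hat equals the average of the parental genotypes, and (\mathbf{A}^{11})^{-1} equals the conditional covariance \mathbf{A}_{11} - \mathbf{A}_{12}\mathbf{A}_{22}^{-1}\mathbf{A}_{21}, which is 0.5\mathbf{I} here.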

Value

A list with 4 + (number of random effects) elements as returned by clmm, plus the element:

SSBR: List of 7:
ids - ids used in the model (ordered as in the other model terms)
y - phenotype vector
X - design matrix for the fixed effects
Marker_Matrix - combined marker matrix including imputed and genotyped individuals
Z_residual - design matrix used to model the residual imputation error of the imputed individuals
ginverse_residual - submatrix of the inverse of the numerator relationship matrix, used to model the residual imputation error of the imputed individuals
Breeding_Values - predicted breeding values for all animals in data that have genotypes and/or phenotypes
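Following Fernando et al. (2014), the breeding value of a genotyped individual is \mathbf{M}_2\boldsymbol{α} and that of a non-genotyped individual is \hat{\mathbf{M}}_1\boldsymbol{α} + \boldsymbol{ε}. The sketch below computes both from hypothetical posterior means; the numbers are made up, and it is an assumption, not checked against the package source, that Breeding_Values is formed this way.

```r
# Made-up posterior means, for illustration only
alpha_hat <- c(0.10, -0.05, 0.02)     # marker effects
eps_hat   <- c(0.30, -0.20)           # residual imputation errors (non-genotyped)
M1hat <- rbind(c(1.5, 1.0, 0.5),      # imputed gene content (non-genotyped)
               c(0.5, 1.0, 1.5))
M2    <- rbind(c(2, 1, 0),            # observed genotypes
               c(1, 1, 1))

bv_nongeno <- drop(M1hat %*% alpha_hat) + eps_hat
bv_geno    <- drop(M2 %*% alpha_hat)
bv <- c(bv_nongeno, bv_geno)

# Same result via the combined marker matrix and Z: M alpha + Z epsilon
M_comb <- rbind(M1hat, M2)
Z      <- rbind(diag(2), matrix(0, nrow = 2, ncol = 2))
bv_alt <- drop(M_comb %*% alpha_hat + Z %*% eps_hat)
```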

Author(s)

Claas Heuer

References

Fernando, R.L., Dekkers, J.C., Garrick, D.J.: A class of Bayesian methods to combine large numbers of genotyped and non-genotyped animals for whole-genome analyses. Genetics Selection Evolution 46(1), 50 (2014)

Gengler, N., Mayeres, P., Szydlowski, M.: A simple method to approximate gene content in large pedigree populations: application to the myostatin gene in dual-purpose Belgian Blue cattle. animal 1(1), 21 (2007)

Henderson, C.R.: A simple method for computing the inverse of a numerator relationship matrix used in prediction of breeding values. Biometrics 32(1), 69-83 (1976)

See Also

cSSBR.setup, clmm
Examples

library(cpgen)

# example dataset: pedigree of 6 animals
id <- 1:6
sire <- c(rep(NA,3),rep(1,3))
dam <- c(rep(NA,3),2,2,3)

# phenotypes (animal 1 has no record)
y <- c(NA, 0.45, 0.87, 1.26, 1.03, 0.67)
dat <- data.frame(id=id,sire=sire,dam=dam,y=y)

# Marker genotypes for animals 1-3
M <- rbind(c(1,2,1,1,0,0,1,2,1,0),
           c(2,1,1,1,2,0,1,1,1,1),
           c(0,1,0,0,2,1,2,1,1,1))
M.id <- 1:3

# prior scales for the variance components
var_y <- var(y,na.rm=TRUE)
var_e <- (10*var_y / 21)
var_a <- var_e
var_m <- var_e / 10

# put emphasis on the prior
df <- 500

# first random effect: marker effects; second: residual imputation error
par_random <- list(list(method="ridge", scale=var_m, df=df),
                   list(method="ridge", scale=var_a, df=df))

set_num_threads(1)

mod <- cSSBR(data=dat, M=M, M.id=M.id, par_random=par_random,
             scale_e=var_e, df_e=df, niter=50000, burnin=30000)

# check marker effects
print(round(mod[[4]]$posterior$estimates_mean,digits=2))

# check breeding value prediction:
print(round(mod$SSBR$Breeding_Values,digits=2))