cSSBR: Single Step Bayesian Regression
In cpgen: Parallelized Genomic Prediction and GWAS


This function runs Single Step Bayesian Regression (SSBR) to predict breeding values in a unified model that incorporates both genotyped and non-genotyped individuals (Fernando et al., 2014).

cSSBR(data, M, M.id, X=NULL, par_random=NULL, scale_e=0, df_e=0, niter=5000, burnin=2500, seed=NULL, verbose=TRUE)

`data`	`data.frame` with four columns: `id, sire, dam, y`
`M`	Marker Matrix for genotyped individuals
`M.id`	Vector of length `nrow(M)` containing the ids of the genotyped individuals, in the row order of `M` (i.e. the row names of `M`)
`X`	Fixed effects design matrix of type `matrix` or `dgCMatrix`. If omitted, a column vector of ones (an intercept) is used. Must have as many rows as `data`
`par_random`	as in `clmm`
`niter`	as in `clmm`
`burnin`	as in `clmm`
`verbose`	as in `clmm`
`scale_e`	as in `clmm`
`df_e`	as in `clmm`
`seed`	as in `clmm`
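A minimal sketch of preparing the inputs, assuming data shaped like the Examples section and a hypothetical covariate `sex` added purely for illustration. It checks that `M.id` has one entry per row of `M`, attaches the ids as row names, and builds a sparse fixed-effects matrix with one row per record. Requiring every genotyped id to also appear in `data` is an assumption of this sketch, not a documented check of `cSSBR`; only base R and the Matrix package (which defines `dgCMatrix`) are used.

```r
library(Matrix)

# Pedigree and phenotypes mirroring the Examples section
dat <- data.frame(id   = 1:6,
                  sire = c(NA, NA, NA, 1, 1, 1),
                  dam  = c(NA, NA, NA, 2, 2, 3),
                  y    = c(NA, 0.45, 0.87, 1.26, 1.03, 0.67))

# Hypothetical covariate, for illustration only
dat$sex <- factor(c("m", "f", "f", "m", "f", "m"))

M <- rbind(c(1,2,1,1,0,0,1,2,1,0),
           c(2,1,1,1,2,0,1,1,1,1),
           c(0,1,0,0,2,1,2,1,1,1))
M.id <- 1:3

# One id per marker row; this sketch also assumes each id occurs in data
stopifnot(length(M.id) == nrow(M), all(M.id %in% dat$id))
rownames(M) <- M.id

# Fixed effects: intercept + sex, built directly as a sparse dgCMatrix
X <- sparse.model.matrix(~ sex, data = dat)
stopifnot(nrow(X) == nrow(dat))
```

The resulting `X` could then be passed as the `X` argument in place of the default intercept-only matrix.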

The function sets up the following model using cSSBR.setup:

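\mathbf{y} = \mathbf{X}\mathbf{b} + \mathbf{M}\boldsymbol{\alpha} + \mathbf{Z}\boldsymbol{\epsilon} + \mathbf{e}

where \mathbf{b} is the vector of fixed effects, \boldsymbol{\alpha} the vector of marker effects, \boldsymbol{\epsilon} the vector of imputation residuals of the non-genotyped individuals with design matrix \mathbf{Z}, and \mathbf{e} the vector of residuals.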
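To make the dimensions of the terms concrete, the following sketch simulates records from this model for six individuals, of which ids 4 to 6 are non-genotyped. All sizes, effect values and variances are hypothetical, and the imputed marker rows are random placeholders. It shows that `Z` is an incidence matrix that maps each non-genotyped individual's imputation residual to its own record, so rows of genotyped individuals are zero. The residual is drawn with covariance (A^{11})^{-1} times its variance, using A^{11} = 2I, which holds for the example pedigree (offspring of genotyped, non-inbred founders).

```r
set.seed(1)

ids     <- 1:6
nongeno <- 4:6                 # non-genotyped individuals (subscript 1)
n  <- length(ids)              # one record per individual in this sketch
p  <- 10                       # number of markers
n1 <- length(nongeno)

X <- matrix(1, n, 1)           # intercept only (the cSSBR default)
b <- 0.5

# Combined marker matrix; values are random placeholders here
M_all <- matrix(sample(0:2, n * p, replace = TRUE), n, p)
alpha <- rnorm(p, sd = 0.1)

# Z maps each non-genotyped individual's imputation residual to its record
Z <- matrix(0, n, n1)
Z[cbind(match(nongeno, ids), seq_len(n1))] <- 1

# Imputation residual with covariance (A^{11})^{-1} * sigma2_eps
A11 <- diag(2, n1)             # A^{11} for the example pedigree
sigma2_eps <- 0.3
eps <- drop(t(chol(solve(A11) * sigma2_eps)) %*% rnorm(n1))

e <- rnorm(n, sd = 0.5)
y <- drop(X %*% b + M_all %*% alpha + Z %*% eps + e)
```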

The matrix \mathbf{M} is a combined marker matrix containing observed marker covariates for genotyped individuals and imputed covariates for non-genotyped individuals. The subscripts 1 and 2 denote non-genotyped and genotyped individuals, respectively. Best linear predictions of gene content (Gengler et al., 2007) for the non-genotyped individuals are obtained by solving \mathbf{A}^{11}\hat{\mathbf{M}}_1 = -\mathbf{A}^{12}\mathbf{M}_2 (Fernando et al., 2014), where \mathbf{A}^{11} and \mathbf{A}^{12} are submatrices of the inverse numerator relationship matrix \mathbf{A}^{-1}, which can be computed directly from the pedigree (Henderson, 1976). Because this equation system is very sparse, it is solved with a sparse Cholesky solver from the Eigen C++ library. The imputation residual \boldsymbol{\epsilon} has covariance matrix (\mathbf{A}^{11})^{-1}\sigma_{\epsilon}^2.
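The imputation step can be reproduced on the example pedigree with the sketch below. `ainv_no_inbreeding` is a hypothetical helper that applies Henderson's (1976) rules for A^{-1} under the assumption that no individual is inbred, which holds for this pedigree; cpgen's own routine is not shown here. A dense `solve()` replaces the sparse Cholesky solver only because the example is tiny. Since every non-genotyped animal here has both parents genotyped, its predicted gene content is the parent average, and its imputation residual variance is half of the residual variance.

```r
# Hypothetical helper: A^{-1} by Henderson's (1976) rules, ASSUMING no
# inbreeding. Ids are 1..n; sire/dam hold parent ids or NA.
ainv_no_inbreeding <- function(sire, dam) {
  n <- length(sire)
  Ainv <- matrix(0, n, n)
  for (i in seq_len(n)) {
    parents <- c(sire[i], dam[i])
    parents <- parents[!is.na(parents)]
    # Reciprocal of the Mendelian sampling variance (0, 1 or 2 known parents)
    w <- c(1, 4 / 3, 2)[length(parents) + 1]
    Ainv[i, i] <- Ainv[i, i] + w
    for (p in parents) {
      Ainv[i, p] <- Ainv[i, p] - w / 2
      Ainv[p, i] <- Ainv[p, i] - w / 2
      for (q in parents) Ainv[p, q] <- Ainv[p, q] + w / 4
    }
  }
  Ainv
}

sire <- c(NA, NA, NA, 1, 1, 1)
dam  <- c(NA, NA, NA, 2, 2, 3)
Ainv <- ainv_no_inbreeding(sire, dam)

M2 <- rbind(c(1,2,1,1,0,0,1,2,1,0),   # genotypes of ids 1-3
            c(2,1,1,1,2,0,1,1,1,1),
            c(0,1,0,0,2,1,2,1,1,1))
geno    <- 1:3                        # genotyped (subscript 2)
nongeno <- 4:6                        # non-genotyped (subscript 1)

A11 <- Ainv[nongeno, nongeno]
A12 <- Ainv[nongeno, geno]

# Gene content: solve A^{11} M1_hat = -A^{12} M2
M1_hat <- solve(A11, -A12 %*% M2)

# Covariance of the imputation residual, in units of sigma^2_eps
V_eps <- solve(A11)
```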

A list as returned by clmm, with 4 elements plus one element per random effect, extended by the element:

SSBR

List of 7:

ids - ids used in the model, in the same order as the rows of the other model terms
y - phenotype vector
X - design matrix for fixed effects
Marker_Matrix - combined marker matrix with imputed (non-genotyped) and observed (genotyped) individuals
Z_residual - design matrix for the imputation residual of the non-genotyped individuals
ginverse_residual - submatrix \mathbf{A}^{11} of the inverse numerator relationship matrix, used as the inverse covariance structure of the imputation residual
Breeding_Values - predicted breeding values for all animals in data that have genotypes and/or phenotypes
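Under the model in Details, the breeding values of all individuals can be written as \mathbf{M}\boldsymbol{\alpha} + \mathbf{Z}\boldsymbol{\epsilon}: genotyped individuals receive \mathbf{M}_2\boldsymbol{\alpha}, non-genotyped individuals \hat{\mathbf{M}}_1\boldsymbol{\alpha} plus their imputation residual. The sketch below assembles this from matrices shaped like `Marker_Matrix` and `Z_residual` for the example pedigree. The effect estimates are hypothetical numbers, the row order (by id) is illustrative, and whether `cSSBR` computes `Breeding_Values` exactly this way is an assumption.

```r
# Combined marker matrix (rows: ids 1-6); rows 4-6 are the parent-average
# gene contents of the genotyped founders 1-3
M2 <- rbind(c(1,2,1,1,0,0,1,2,1,0),
            c(2,1,1,1,2,0,1,1,1,1),
            c(0,1,0,0,2,1,2,1,1,1))
M1_hat <- rbind((M2[1, ] + M2[2, ]) / 2,
                (M2[1, ] + M2[2, ]) / 2,
                (M2[1, ] + M2[3, ]) / 2)
Marker_Matrix <- rbind(M2, M1_hat)

# Incidence matrix: imputation residuals only enter ids 4-6
Z_residual <- rbind(matrix(0, 3, 3), diag(3))

# Hypothetical posterior means of marker effects and imputation residuals
alpha_hat <- (1:10 - 5.5) / 100
eps_hat   <- c(0.02, -0.01, 0.03)

bv <- drop(Marker_Matrix %*% alpha_hat + Z_residual %*% eps_hat)
print(round(bv, 3))
```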

Claas Heuer

Fernando, R.L., Dekkers, J.C., Garrick, D.J.: A class of Bayesian methods to combine large numbers of genotyped and non-genotyped animals for whole-genome analyses. Genetics Selection Evolution 46(1), 50 (2014)

Gengler, N., Mayeres, P., Szydlowski, M.: A simple method to approximate gene content in large pedigree populations: application to the myostatin gene in dual-purpose Belgian Blue cattle. Animal 1(1), 21 (2007)

Henderson, C.R.: A simple method for computing the inverse of a numerator relationship matrix used in prediction of breeding values. Biometrics 32(1), 69-83 (1976)

cSSBR.setup, clmm

library(cpgen)

# example dataset

id <- 1:6
sire <- c(rep(NA,3),rep(1,3))
dam <- c(rep(NA,3),2,2,3)

# phenotypes
y <- c(NA, 0.45, 0.87, 1.26, 1.03, 0.67)

dat <- data.frame(id=id,sire=sire,dam=dam,y=y)


# Marker genotypes
M <- rbind(c(1,2,1,1,0,0,1,2,1,0),
           c(2,1,1,1,2,0,1,1,1,1),
           c(0,1,0,0,2,1,2,1,1,1))

M.id <- 1:3

var_y <- var(y, na.rm = TRUE)
# prior variance components; var_e + var_a + var_m equals var_y
var_e <- 10 * var_y / 21
var_a <- var_e
var_m <- var_e / 10

# large degrees of freedom put strong emphasis on the prior scales
df <- 500

par_random <- list(
  list(method = "ridge", scale = var_m, df = df),  # marker effects
  list(method = "ridge", scale = var_a, df = df)   # imputation residual
)

set_num_threads(1)
mod<-cSSBR(data = dat,
           M=M,
           M.id=M.id,
           par_random=par_random,
           scale_e = var_e,
           df_e=df,
           niter=50000,
           burnin=30000)

# check marker effects (mod[[4]] holds the first random effect in par_random)
print(round(mod[[4]]$posterior$estimates_mean,digits=2))

# check breeding value prediction:
print(round(mod$SSBR$Breeding_Values,digits=2))