Single Step Bayesian Regression

Description

This function runs Single Step Bayesian Regression (SSBR) to predict breeding values in a unified model that incorporates both genotyped and non-genotyped individuals (Fernando et al., 2014).

Usage

cSSBR(data, M, M.id, X=NULL, par_random=NULL, scale_e=0, df_e=0, 
      niter=5000, burnin=2500, seed=NULL, verbose=TRUE)

Arguments

data

data.frame with four columns: id, sire, dam, y. Unknown parents and missing phenotypes are coded as NA (see Examples).

M

Marker matrix for the genotyped individuals, with one row per individual and one column per marker

M.id

Vector of length nrow(M) giving the id of the individual in each row of M (i.e. the row names of M)
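To illustrate how M.id ties the rows of M to the ids in data, the base-R sketch below reuses the toy pedigree and genotypes from the Examples section and splits the individuals into genotyped and non-genotyped groups. It only illustrates the bookkeeping; it is not how cSSBR matches ids internally, and the names genotyped and non_genotyped are hypothetical.

```r
# Toy pedigree and phenotypes from the Examples section
dat <- data.frame(id   = 1:6,
                  sire = c(NA, NA, NA, 1, 1, 1),
                  dam  = c(NA, NA, NA, 2, 2, 3),
                  y    = c(NA, 0.45, 0.87, 1.26, 1.03, 0.67))

# Genotypes for individuals 1-3 (one row per individual)
M <- rbind(c(1, 2, 1, 1, 0, 0, 1, 2, 1, 0),
           c(2, 1, 1, 1, 2, 0, 1, 1, 1, 1),
           c(0, 1, 0, 0, 2, 1, 2, 1, 1, 1))
M.id <- 1:3

# M.id needs one entry per row of M; it plays the role of rownames(M)
stopifnot(length(M.id) == nrow(M))
rownames(M) <- M.id

# Individuals in data with and without genotypes
genotyped     <- dat$id[dat$id %in% M.id]
non_genotyped <- dat$id[!(dat$id %in% M.id)]
print(non_genotyped)
```

In this example the founders 1 to 3 are genotyped and their offspring 4 to 6 are the individuals whose gene content has to be imputed.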

X

Fixed-effects design matrix of class matrix or dgCMatrix. If omitted, a column vector of ones (an intercept) is used. Must have as many rows as data.
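As a sketch of how X might be built, the code below creates the default intercept-only design and a design with an additional covariate sex. The sex covariate is hypothetical and not part of the original example. Base R's model.matrix returns a dense matrix; Matrix::sparse.model.matrix would give a dgCMatrix instead.

```r
# Hypothetical data with an extra fixed-effect covariate 'sex'
dat <- data.frame(id  = 1:6,
                  sex = factor(c("F", "F", "M", "M", "F", "M")))

# Equivalent of the default when X is omitted: one column of ones
X_default <- matrix(1, nrow = nrow(dat), ncol = 1)

# Intercept plus sex effect as a dense design matrix
X_sex <- model.matrix(~ sex, data = dat)

# X must have one row per row of data
stopifnot(nrow(X_default) == nrow(dat), nrow(X_sex) == nrow(dat))
print(colnames(X_sex))
```

Note that model.matrix drops rows with missing covariate values by default, which would break the row correspondence with data, so covariates should be complete.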

par_random

as in clmm

niter

as in clmm

burnin

as in clmm

verbose

as in clmm

scale_e

as in clmm

df_e

as in clmm

seed

as in clmm

Details

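The function sets up the following model using cSSBR.setup:

\mathbf{y} = \mathbf{X}\mathbf{b} + \mathbf{M}\boldsymbol{\alpha} + \mathbf{Z}\boldsymbol{\epsilon} + \mathbf{e}

where \mathbf{b} contains the fixed effects, \boldsymbol{\alpha} the marker effects, \boldsymbol{\epsilon} the imputation residuals of the non-genotyped individuals (with design matrix \mathbf{Z}), and \mathbf{e} the residuals.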
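To make the roles of the model terms concrete, the base-R sketch below simulates one data set from this model. All dimensions, effect sizes and variances are hypothetical, and the marker values are random placeholders rather than real imputed gene contents. Individuals are ordered with the non-genotyped ones first. The imputation residuals are drawn independently with variance σ_ε²/2; this equals (A^{11})^{-1}σ_ε² only under the assumption that every non-genotyped individual has two known, non-inbred parents and no offspring of its own, as in the Examples pedigree.

```r
set.seed(42)

n_imp <- 3    # non-genotyped individuals (imputed gene content)
n_gen <- 3    # genotyped individuals
n_mrk <- 10   # markers
n     <- n_imp + n_gen

# Fixed effects: intercept only
X <- matrix(1, n, 1)
b <- 1

# Combined marker matrix: random placeholders for the imputed rows
# 1..n_imp and the observed rows n_imp+1..n
M <- matrix(sample(0:2, n * n_mrk, replace = TRUE), n, n_mrk)
alpha <- rnorm(n_mrk, mean = 0, sd = 0.1)

# Z links each record to the imputation residual of its individual;
# rows for genotyped individuals are all zero
Z <- rbind(diag(n_imp), matrix(0, n_gen, n_imp))

# Assumes (A^11)^{-1} = I / 2 (see the assumption stated above)
sigma2_eps <- 0.2
eps <- rnorm(n_imp, mean = 0, sd = sqrt(sigma2_eps / 2))

sigma2_e <- 0.5
e <- rnorm(n, mean = 0, sd = sqrt(sigma2_e))

# y = Xb + M alpha + Z eps + e
y <- X %*% b + M %*% alpha + Z %*% eps + e
print(dim(y))
```

Only the records of non-genotyped individuals pick up an imputation residual, because their rows of M are predictions rather than observed genotypes.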

The matrix \mathbf{M} is a combined marker matrix consisting of observed and imputed marker covariates. Best linear predictions of gene content (Gengler et al., 2007) for the non-genotyped individuals are obtained by solving \mathbf{A}^{11}\hat{\mathbf{M}}_1 = -\mathbf{A}^{12}\mathbf{M}_2 (Fernando et al., 2014), where \mathbf{A}^{11} and \mathbf{A}^{12} are submatrices of the inverse of the numerator relationship matrix, which can be computed directly from the pedigree (Henderson, 1976). The subscripts 1 and 2 denote non-genotyped and genotyped individuals, respectively. This very sparse system of equations is solved with a sparse Cholesky solver from the Eigen C++ library. The imputation residual has variance (\mathbf{A}^{11})^{-1}\sigma_{\epsilon}^2.
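The imputation step can be sketched in base R for the pedigree from the Examples section. The helper ainv_simple is hypothetical and not part of cpgen: it applies Henderson's (1976) rules without the inbreeding adjustment, which yields the exact inverse only when no parent is inbred, and it assumes individuals are coded 1..n. The small system is solved densely with solve() rather than with the sparse Cholesky factorization used by cSSBR.

```r
# Hypothetical helper: A^{-1} by Henderson's (1976) rules, ignoring inbreeding.
# Assumes individuals are coded 1..n and unknown parents are NA.
ainv_simple <- function(sire, dam) {
  n <- length(sire)
  Ainv <- matrix(0, n, n)
  for (i in seq_len(n)) {
    parents <- c(sire[i], dam[i])
    parents <- parents[!is.na(parents)]
    # Weight for 0, 1 or 2 known parents: 1, 4/3 or 2
    w <- c(1, 4 / 3, 2)[length(parents) + 1]
    idx <- c(i, parents)
    # +1 for the individual, -1/2 for each known parent
    v <- c(1, rep(-0.5, length(parents)))
    Ainv[idx, idx] <- Ainv[idx, idx] + w * outer(v, v)
  }
  Ainv
}

sire <- c(NA, NA, NA, 1, 1, 1)
dam  <- c(NA, NA, NA, 2, 2, 3)
M2 <- rbind(c(1, 2, 1, 1, 0, 0, 1, 2, 1, 0),
            c(2, 1, 1, 1, 2, 0, 1, 1, 1, 1),
            c(0, 1, 0, 0, 2, 1, 2, 1, 1, 1))
geno    <- 1:3  # subscript 2: genotyped
nongeno <- 4:6  # subscript 1: non-genotyped

Ainv <- ainv_simple(sire, dam)
A11 <- Ainv[nongeno, nongeno]
A12 <- Ainv[nongeno, geno]

# Best linear prediction of gene content: A11 %*% M1_hat = -A12 %*% M2
M1_hat <- solve(A11, -A12 %*% M2)

# Covariance factor of the imputation residuals, (A^11)^{-1}
V_eps <- solve(A11)
print(round(M1_hat, 2))
print(diag(V_eps))
```

For this pedigree each row of M1_hat is the average of the parents' marker rows, and every diagonal element of V_eps is 0.5, the Mendelian sampling fraction for an offspring of two non-inbred parents.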

Value

A list with 4 + (number of random effects) elements, as returned by clmm, plus:

SSBR

List of 7:

  • ids - ids used in the model (ordered as in other model terms)

  • y - phenotype vector

  • X - Design matrix for fixed effects

  • Marker_Matrix - Combined marker matrix of imputed (non-genotyped) and genotyped individuals

  • Z_residual - Design matrix for the imputation residuals of the non-genotyped individuals

  • ginverse_residual - Submatrix \mathbf{A}^{11} of the inverse of the numerator relationship matrix, used to model the imputation residuals of the non-genotyped individuals

  • Breeding_Values - Predicted breeding values for all animals in data that have genotypes and/or phenotypes

Author(s)

Claas Heuer

References

Fernando, R.L., Dekkers, J.C., Garrick, D.J.: A class of Bayesian methods to combine large numbers of genotyped and non-genotyped animals for whole-genome analyses. Genetics Selection Evolution 46(1), 50 (2014)

Gengler, N., Mayeres, P., Szydlowski, M.: A simple method to approximate gene content in large pedigree populations: application to the myostatin gene in dual-purpose Belgian Blue cattle. Animal 1(1), 21 (2007)

Henderson, C.R.: A simple method for computing the inverse of a numerator relationship matrix used in prediction of breeding values. Biometrics 32(1), 69-83 (1976)

See Also

cSSBR.setup, clmm

Examples

# Load cpgen, which provides cSSBR, clmm and set_num_threads
library(cpgen)

# Example dataset: individuals 1-3 are founders, 4-6 their offspring
id <- 1:6
sire <- c(rep(NA, 3), rep(1, 3))
dam <- c(rep(NA, 3), 2, 2, 3)

# Phenotypes (individual 1 has no record)
y <- c(NA, 0.45, 0.87, 1.26, 1.03, 0.67)

dat <- data.frame(id = id, sire = sire, dam = dam, y = y)

# Marker genotypes for the genotyped individuals 1-3
M <- rbind(c(1, 2, 1, 1, 0, 0, 1, 2, 1, 0),
           c(2, 1, 1, 1, 2, 0, 1, 1, 1, 1),
           c(0, 1, 0, 0, 2, 1, 2, 1, 1, 1))

M.id <- 1:3

# Prior scales chosen so that var_e + var_a + var_m = var_y
var_y <- var(y, na.rm = TRUE)
var_e <- 10 * var_y / 21
var_a <- var_e
var_m <- var_e / 10

# Large prior degrees of freedom put emphasis on the prior
df <- 500

par_random <- list(list(method = "ridge", scale = var_m, df = df),
                   list(method = "ridge", scale = var_a, df = df))

# Run single-threaded
set_num_threads(1)
mod <- cSSBR(data = dat,
             M = M,
             M.id = M.id,
             par_random = par_random,
             scale_e = var_e,
             df_e = df,
             niter = 50000,
             burnin = 30000)

# Check marker effects
print(round(mod[[4]]$posterior$estimates_mean, digits = 2))

# Check breeding value prediction
print(round(mod$SSBR$Breeding_Values, digits = 2))