fitBLUP: fitBLUP function
In MarcooLopez/SFSI_data: Sparse Family and Selection Index

Solves the Linear Mixed Model and calculates the Best Linear Unbiased Predictor (BLUP) of the random effects

fitBLUP(y, X = NULL, Z = NULL, K = NULL, U = NULL, d = NULL,
          indexK = NULL, h2 = NULL, BLUP = TRUE, method = "ML",
          return.Hinv = FALSE, tol = 1E-5, maxIter = 1000,
          interval = c(1E-9,1E9), warn = TRUE)

`y`	Response variable
`X`	Design matrix for the fixed effects. When `X=NULL` (default), a vector of ones is used so that the model contains only an intercept
`Z`	Design matrix for the random effects. When `Z=NULL` (default), an identity matrix is assumed, so that G = K; otherwise G = Z K Z' is used
`K`	Kinship relationship matrix. This can be the name of a binary file where the matrix is stored
`U`	Matrix of eigenvectors from the spectral decomposition G = U D U'
`d`	Vector of eigenvalues from the spectral decomposition G = U D U'
`indexK`	Vector of integers indicating which rows and columns are read when `K` is the name of a binary file. The default `indexK=NULL` reads the whole matrix
`h2`	Heritability of the response variable. When `h2=NULL`, the heritability is calculated from variance components estimated using Maximum Likelihood (ML)
`BLUP`	`TRUE` or `FALSE`, whether to return the random effects estimates
`method`	Either 'ML' (Maximum Likelihood) or 'REML' (Restricted Maximum Likelihood). Only the 'ML' method is implemented in this version
`return.Hinv`	`TRUE` or `FALSE`, whether to return the inverse of the matrix H
`tol`	Maximum error between two consecutive solutions when solving the root
`maxIter`	Maximum number of iterations to run before convergence is reached
`interval`	Range of values in which the root is searched
`warn`	`TRUE` or `FALSE`, whether to show warnings
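
The `U` and `d` arguments take the eigenvectors and eigenvalues of G. A minimal base-R sketch of how they could be obtained is shown below. It uses a small simulated marker matrix; the names `M`, `eig` and `G_rec` are illustrative and not part of the package.

```r
# Illustrative sketch in base R; the simulated markers are an assumption
set.seed(1)
M <- scale(matrix(rnorm(20 * 50), nrow = 20))   # 20 individuals, 50 hypothetical markers
G <- tcrossprod(M) / ncol(M)                    # relationship matrix, built as in the Examples

eig <- eigen(G, symmetric = TRUE)               # spectral decomposition G = U D U'
U <- eig$vectors                                # eigenvectors (one per column)
d <- eig$values                                 # eigenvalues

# Check that the decomposition reconstructs G
G_rec <- U %*% diag(d) %*% t(U)
max_err <- max(abs(G - G_rec))

# U and d could then presumably be supplied instead of K,
# e.g. fitBLUP(y, U = U, d = d) (not run here)
```

Computing the decomposition once and reusing it is typically worthwhile when several responses are analyzed with the same G.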

The basic linear mixed model that relates phenotypes with genetic values is of the form

y = X b + Z g + e

where y is a vector with the response, b is the vector of fixed effects, g is the vector of the genetic values of the genotypes, e is the vector of environmental residuals, and X and Z are design matrices connecting the fixed and genetic effects with replicates. Genetic values are assumed to follow a Normal distribution, g ~ N(0, σ²_u K), and environmental terms are assumed to follow e ~ N(0, σ²_e I).

The resulting vector of genetic values u = Z g will therefore follow u ~ N(0, σ²_u G), where G = Z K Z'. In the un-replicated case, Z = I is an identity matrix, and hence u = g and G = K.
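
As a small illustration of G = Z K Z', the sketch below builds an incidence matrix for three genotypes observed twice each. The kinship values and the names `geno` and `G_unrep` are hypothetical.

```r
# Illustrative sketch: 3 genotypes, each replicated twice (hypothetical values)
K <- matrix(c(1.0, 0.5, 0.2,
              0.5, 1.0, 0.3,
              0.2, 0.3, 1.0), nrow = 3, byrow = TRUE)

geno <- factor(c(1, 1, 2, 2, 3, 3))   # genotype of each observation
Z <- model.matrix(~ geno - 1)         # 6 x 3 incidence matrix (one column per genotype)
G <- Z %*% K %*% t(Z)                 # 6 x 6 covariance structure of u = Z g

# Un-replicated case: Z = I, so G reduces to K
G_unrep <- diag(3) %*% K %*% diag(3)
```

Replicates of the same genotype share the corresponding diagonal entry of K, so here the first two observations have covariance structure K[1,1].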

The genetic values u_tst = (u_i), i = 1,2,...,n_tst, of a testing set are predicted from all available observations in a training set as

u_tst = H (y_trn - X_trn b)

where H is a matrix of weights given by

H = G_tst,trn (G_trn,trn + λ₀I)^-1

where G_tst,trn is the sub-matrix of G whose rows correspond to the testing set and columns to the training set, G_trn,trn is the sub-matrix corresponding to the training set, and λ₀ = (1 - h²)/h² is a shrinkage parameter expressed in terms of the heritability, h² = σ²_u/(σ²_u + σ²_e).
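
The prediction equations above can be sketched directly in base R. The example below is a minimal illustration with simulated data and a heritability assumed known (`h2 = 0.5`). The fixed effect (an intercept) is estimated by generalized least squares, which is a standard choice but not necessarily what `fitBLUP` does internally. All object names are illustrative.

```r
# Illustrative sketch in base R with simulated data; h2 is assumed known
set.seed(123)
n <- 100; p <- 200
M <- scale(matrix(rnorm(n * p), nrow = n))            # hypothetical markers
G <- tcrossprod(M) / p                                # relationship matrix
h2 <- 0.5
u <- as.vector(M %*% rnorm(p, sd = sqrt(h2 / p)))     # simulated genetic values
y <- 10 + u + rnorm(n, sd = sqrt(1 - h2))             # simulated response

tst <- 1:30                                           # testing set
trn <- 31:n                                           # training set

lambda0 <- (1 - h2) / h2                              # shrinkage parameter
Vinv <- solve(G[trn, trn] + lambda0 * diag(length(trn)))

# H = G[tst,trn] (G[trn,trn] + lambda0 I)^-1
H <- G[tst, trn] %*% Vinv

# GLS estimate of the intercept (assumption: intercept-only X)
X_trn <- matrix(1, nrow = length(trn), ncol = 1)
b_hat <- solve(t(X_trn) %*% Vinv %*% X_trn, t(X_trn) %*% Vinv %*% y[trn])

# u_tst = H (y_trn - X_trn b)
u_tst <- as.vector(H %*% (y[trn] - X_trn %*% b_hat))
```

Each row of `H` holds the weights that one testing individual places on the training observations. A larger `lambda0` (lower heritability) shrinks the predictions more strongly toward zero.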

Paulino Perez, Marco Lopez-Cruz (lopezcru@msu.edu) and Gustavo de los Campos

VanRaden PM (2008). Efficient methods to compute genomic predictions. Journal of Dairy Science, 91(11), 4414–4423.

Zhou X, Stephens M (2012). Genome-wide efficient mixed-model analysis for association studies. Nature Genetics, 44(7), 821–824.

  library(SFSI)               # Stops with an error if the package is missing
  data(wheatHTP)
  X = scale(X[1:300,])        # Subset and scale markers
  G = tcrossprod(X)/ncol(X)   # Genomic relationship matrix
  y = scale(Y[1:300,"YLD"])   # Subset response variable

  # Training and testing sets
  set.seed(123)               # Makes the random partition reproducible
  tst = sample(seq_along(y),ceiling(0.3*length(y)))
  trn = seq_along(y)[-tst]

  yNA <- y
  yNA[tst] <- NA
  fm = fitBLUP(yNA,K=G)
  plot(y[tst],fm$u[tst])      # Predicted vs observed values in testing set
  cor(y[tst],fm$u[tst])       # Prediction accuracy in testing set
  cor(y[trn],fm$u[trn])       # Prediction accuracy in training set
  fm$h2                       # Heritability