# EM3.general: Equation of mixed model for multi-kernel including using... In RAINBOWR: Genome-Wide Association Study with SNP-Set Methods

## Description

This function solves the following multi-kernel linear mixed effects model using MMEst function in 'MM4LMM' package, lmm.aireml or lmm.diago functions in 'gaston' package, or EM3.cpp function in 'RAINBOWR' package.

y = X β + ∑ _{l=1} ^ {L} Z _ {l} u _ {l} + ε

where Var[y] = ∑ _{l=1} ^ {L} Z _ {l} K _ {l} Z _ {l}' σ _ {l} ^ 2 + I σ _ {e} ^ {2}.

## Usage

  1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 EM3.general( y, X0 = NULL, ZETA, eigen.G = NULL, package = "gaston", tol = NULL, n.core = 1, optimizer = "nlminb", REML = TRUE, pred = TRUE, return.u.always = TRUE, return.u.each = TRUE, return.Hinv = TRUE, recheck.RAINBOWR = TRUE, var.ratio.range = c(1e-09, 1e+07) ) 

## Arguments

 y A n \times 1 vector. A vector of phenotypic values should be used. NA is allowed. X0 A n \times p matrix. You should assign mean vector (rep(1, n)) and covariates. NA is not allowed. ZETA A list of variance matrices and its design matrices of random effects. You can use more than one kernel matrix. For example, ZETA = list(A = list(Z = Z.A, K = K.A), D = list(Z = Z.D, K = K.D)) (A for additive, D for dominance) Please set names of lists "Z" and "K"! eigen.G A list with $valuesEigen values$vectorsEigen vectors The result of the eigen decompsition of G = ZKZ'. You can use "spectralG.cpp" function in RAINBOWR. If this argument is NULL, the eigen decomposition will be performed in this function. We recommend you assign the result of the eigen decomposition beforehand for time saving. package Package name to be used in this function. We only offer the following three packages: "RAINBOWR", "MM4LMM" and "gaston". Default package is 'gaston'. tol The tolerance for detecting linear dependencies in the columns of G = ZKZ'. Eigen vectors whose eigen values are less than "tol" argument will be omitted from results. If tol is NULL, top 'n' eigen values will be effective. n.core Setting n.core > 1 will enable parallel execution on a machine with multiple cores. (‘n.core' will be replaced by 1 for 'package = ’gaston'') optimizer The function used in the optimization process. We offer "optim", "optimx", and "nlminb" functions. This argument is only valid when ‘package = ’RAINBOWR''. REML You can choose which method you will use, "REML" or "ML". If REML = TRUE, you will perform "REML", and if REML = FALSE, you will perform "ML". pred If TRUE, the fitting values of y is returned. return.u.always When using the "gaston" package with missing values or using the "MM4LMM" package (with/without missings), computing BLUP will take some time in addition to solving the mixed-effects model. You can choose whether BLUP ('u'; u) will be returned or not. return.u.each If TRUE, the function also computes each BLUP corresponding to different kernels (when solving multi-kernel mixed-effects model). It takes additional time compared to the one with 'return.u.each = FALSE' when using packages other than 'RAINBOWR'. return.Hinv If TRUE, H ^ {-1} = (Var[y] / ∑ _{l=1} ^ {L} σ _ {l} ^ 2) ^ {-1} will be computed. It also returns V ^ {-1} = (Var[y]) ^ {-1}. It will take some time in addition to solving the mixed-effects model when using packages other than 'RAINBOWR'. recheck.RAINBOWR When you use the package other than 'RAINBOWR' and the ratio of variance components is out of the range of 'var.ratio.range', the function will solve the mixed-effects model again with 'RAINBOWR' package, if 'recheck.RAINBOWR = TRUE'. var.ratio.range The range of variance components to check that the results by the package other than RAINBOWR is correct or not when 'recheck.RAINBOWR = TRUE'.

## Value

$y.pred The fitting values of y y = Xβ + Zu$Vu

Estimator for σ^2_u, all of the genetic variance

$Ve Estimator for σ^2_e$beta

BLUE(β)

$u BLUP(Sum of Zu)$u.each

BLUP(Each u)

$weights The proportion of each genetic variance (corresponding to each kernel of ZETA) to Vu$LL

Maximized log-likelihood (full or restricted, depending on method)

$Vinv The inverse of V = Vu \times ZKZ' + Ve \times I$Hinv

The inverse of H = ZKZ' + λ I

## References

Kang, H.M. et al. (2008) Efficient Control of Population Structure in Model Organism Association Mapping. Genetics. 178(3): 1709-1723.

Zhou, X. and Stephens, M. (2012) Genome-wide efficient mixed-model analysis for association studies. Nat Genet. 44(7): 821-824.

Johnson, D. L., & Thompson, R. (1995). Restricted maximum likelihood estimation of variance components for univariate animal models using sparse matrix techniques and average information. Journal of dairy science, 78(2), 449-456.

Hunter, D. R., & Lange, K. (2004). A tutorial on MM algorithms. The American Statistician, 58(1), 30-37.

Zhou, H., Hu, L., Zhou, J., & Lange, K. (2015). MM algorithms for variance components models. arXiv preprint arXiv:1509.07426.

Gilmour, A. R., Thompson, R., & Cullis, B. R. (1995), Average information REML: an efficient algorithm for variance parameter estimation in linear mixed models, Biometrics, 1440-1450.

MMEst, lmm.aireml, lmm.diago
  1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98  ### Import RAINBOWR require(RAINBOWR) ### Load example datasets data("Rice_Zhao_etal") Rice_geno_score <- Rice_Zhao_etal$genoScore Rice_geno_map <- Rice_Zhao_etal$genoMap Rice_pheno <- Rice_Zhao_etal$pheno ### View each dataset See(Rice_geno_score) See(Rice_geno_map) See(Rice_pheno) ### Select one trait for example trait.name <- "Flowering.time.at.Arkansas" y <- as.matrix(Rice_pheno[, trait.name, drop = FALSE]) ### Remove SNPs whose MAF <= 0.05 x.0 <- t(Rice_geno_score) MAF.cut.res <- MAF.cut(x.0 = x.0, map.0 = Rice_geno_map) x <- MAF.cut.res$x map <- MAF.cut.res$map ### Estimate additive genomic relationship matrix (GRM) & epistatic relationship matrix K.A <- calcGRM(genoMat = x) K.AA <- K.A * K.A ### additive x additive epistatic effects ### Modify data Z <- design.Z(pheno.labels = rownames(y), geno.names = rownames(K.A)) ### design matrix for random effects pheno.mat <- y[rownames(Z), , drop = FALSE] ZETA <- list(A = list(Z = Z, K = K.A), AA = list(Z = Z, K = K.AA)) ### Solve multi-kernel linear mixed effects model using gaston package (2 random efects) EM3.gaston.res <- EM3.general(y = pheno.mat, X0 = NULL, ZETA = ZETA, package = "gaston", return.u.always = TRUE, pred = TRUE, return.u.each = TRUE, return.Hinv = TRUE) (Vu <- EM3.gaston.res$Vu) ### estimated genetic variance (Ve <- EM3.gaston.res$Ve) ### estimated residual variance (weights <- EM3.gaston.res$weights) ### estimated proportion of two genetic variances (herit <- Vu * weights / (Vu + Ve)) ### genomic heritability (additive, additive x additive) (beta <- EM3.gaston.res$beta) ### Here, this is an intercept. u.each <- EM3.gaston.res$u.each ### estimated genotypic values (additive, additive x additive) See(u.each) ### Perform genomic prediction with 10-fold cross validation using gaston package (multi-kernel) noNA <- !is.na(c(pheno.mat)) ### NA (missing) in the phenotype data phenoNoNA <- pheno.mat[noNA, , drop = FALSE] ### remove NA ZETANoNA <- ZETA ZETANoNA <- lapply(X = ZETANoNA, FUN = function (List) { List$Z <- List$Z[noNA, ] return(List) }) ### remove NA nFold <- 10 ### # of folds nLine <- nrow(phenoNoNA) idCV <- sample(1:nLine %% nFold) ### assign random ids for cross-validation idCV[idCV == 0] <- nFold yPred <- rep(NA, nLine) for (noCV in 1:nFold) { print(paste0("Fold: ", noCV)) yTrain <- phenoNoNA yTrain[idCV == noCV, ] <- NA ### prepare test data EM3.gaston.resCV <- EM3.general(y = yTrain, X0 = NULL, ZETA = ZETANoNA, package = "gaston", return.u.always = TRUE, pred = TRUE, return.u.each = TRUE, return.Hinv = TRUE) ### prediction yTest <- EM3.gaston.resCV\$y.pred ### predicted values yPred[idCV == noCV] <- yTest[idCV == noCV] } ### Plot the results plotRange <- range(phenoNoNA, yPred) plot(x = phenoNoNA, y = yPred,xlim = plotRange, ylim = plotRange, xlab = "Observed values", ylab = "Predicted values", main = "Results of Genomic Prediction (multi-kernel)", cex.lab = 1.5, cex.main = 1.5, cex.axis = 1.3) abline(a = 0, b = 1, col = 2, lwd = 2, lty = 2) R2 <- cor(x = phenoNoNA[, 1], y = yPred) ^ 2 text(x = plotRange - 10, y = plotRange + 10, paste0("R2 = ", round(R2, 3)), cex = 1.5)