EMM.cpp: Equation of mixed model for one kernel, a wrapper of two...

View source: R/EMM_functions_cpp.R

EMM.cppR Documentation

Equation of mixed model for one kernel, a wrapper of two methods

Description

This function estimates maximum-likelihood (ML/REML; resticted maximum likelihood) solutions for the following mixed model.

y = X \beta + Z u + \epsilon

where \beta is a vector of fixed effects and u is a vector of random effects with Var[u] = K \sigma^2_u. The residual variance is Var[\epsilon] = I \sigma^2_e.

Usage

EMM.cpp(
  y,
  X = NULL,
  ZETA,
  eigen.G = NULL,
  eigen.SGS = NULL,
  n.thres = 450,
  reestimation = FALSE,
  n.core = NA,
  lam.len = 4,
  init.range = c(1e-06, 100),
  init.one = 0.5,
  conv.param = 1e-06,
  count.max = 20,
  bounds = c(1e-06, 1e+06),
  tol = NULL,
  optimizer = "nlminb",
  traceInside = 0,
  REML = TRUE,
  silent = TRUE,
  plot.l = FALSE,
  SE = FALSE,
  return.Hinv = TRUE
)

Arguments

y

A n \times 1 vector. A vector of phenotypic values should be used. NA is allowed.

X

A n \times p matrix. You should assign mean vector (rep(1, n)) and covariates. NA is not allowed.

ZETA

A list of variance (relationship) matrix (K; m \times m) and its design matrix (Z; n \times m) of random effects. You can use only one kernel matrix. For example, ZETA = list(A = list(Z = Z, K = K)) Please set names of list "Z" and "K"!

eigen.G

A list with

$values

Eigen values

$vectors

Eigen vectors

The result of the eigen decompsition of G = ZKZ'. You can use "spectralG.cpp" function in RAINBOWR. If this argument is NULL, the eigen decomposition will be performed in this function. We recommend you assign the result of the eigen decomposition beforehand for time saving.

eigen.SGS

A list with

$values

Eigen values

$vectors

Eigen vectors

The result of the eigen decompsition of SGS, where S = I - X(X'X)^{-1}X', G = ZKZ'. You can use "spectralG.cpp" function in RAINBOWR. If this argument is NULL, the eigen decomposition will be performed in this function. We recommend you assign the result of the eigen decomposition beforehand for time saving.

n.thres

If n >= n.thres, perform EMM1.cpp. Else perform EMM2.cpp.

reestimation

If TRUE, EMM2.cpp is performed when the estimation by EMM1.cpp may not be accurate.

n.core

Setting n.core > 1 will enable parallel execution on a machine with multiple cores.

lam.len

The number of initial values you set. If this number is large, the estimation will be more accurate, but computational cost will be large. We recommend setting this value 3 <= lam.len <= 6.

init.range

The range of the initial parameters. For example, if lam.len = 5 and init.range = c(1e-06, 1e02), corresponding initial heritabilities will be calculated as seq(1e-06, 1 - 1e-02, length = 5), and then initial lambdas will be set.

init.one

The initial parameter if lam.len = 1.

conv.param

The convergence parameter. If the diffrence of log-likelihood by updating the parameter "lambda" is smaller than this conv.param, the iteration steps will be stopped.

count.max

Sometimes algorithms won't converge for some initial parameters. So if the iteration steps reache to this argument, you can stop the calculation even if algorithm doesn't converge.

bounds

Lower and Upper bounds of the parameter lambda. If the updated parameter goes out of this range, the parameter is reset to the value in this range.

tol

The tolerance for detecting linear dependencies in the columns of G = ZKZ'. Eigen vectors whose eigen values are less than "tol" argument will be omitted from results. If tol is NULL, top 'n' eigen values will be effective.

optimizer

The function used in the optimization process. We offer "optim", "optimx", and "nlminb" functions.

traceInside

Perform trace for the optimzation if traceInside >= 1, and this argument shows the frequency of reports.

REML

You can choose which method you will use, "REML" or "ML". If REML = TRUE, you will perform "REML", and if REML = FALSE, you will perform "ML".

silent

If this argument is TRUE, warning messages will be shown when estimation is not accurate.

plot.l

If you want to plot log-likelihood, please set plot.l = TRUE. We don't recommend plot.l = TRUE when lam.len >= 2.

SE

If TRUE, standard errors are calculated.

return.Hinv

If TRUE, the function returns the inverse of H = ZKZ' + \lambda I where \lambda = \sigma^2_e / \sigma^2_u. This is useful for GWAS.

Value

$Vu

Estimator for \sigma^2_u

$Ve

Estimator for \sigma^2_e

$beta

BLUE(\beta)

$u

BLUP(u)

$LL

Maximized log-likelihood (full or restricted, depending on method)

$beta.SE

Standard error for \beta (If SE = TRUE)

$u.SE

Standard error for u^*-u (If SE = TRUE)

$Hinv

The inverse of H = ZKZ' + \lambda I (If return.Hinv = TRUE)

$Hinv2

The inverse of H2 = ZKZ'/\lambda + I (If return.Hinv = TRUE)

$lambda

Estimators for \lambda = \sigma^2_e / \sigma^2_u (If n >= n.thres)

$lambdas

Lambdas for each initial values (If n >= n.thres)

$reest

If parameter estimation may not be accurate, reest = 1, else reest = 0 (If n >= n.thres)

$counts

The number of iterations until convergence for each initial values (If n >= n.thres)

References

Kang, H.M. et al. (2008) Efficient Control of Population Structure in Model Organism Association Mapping. Genetics. 178(3): 1709-1723.

Zhou, X. and Stephens, M. (2012) Genome-wide efficient mixed-model analysis for association studies. Nat Genet. 44(7): 821-824.

Examples



### Perform genomic prediction with 10-fold cross validation

  ### Import RAINBOWR
  require(RAINBOWR)

  ### Load example datasets
  data("Rice_Zhao_etal")
  Rice_geno_score <- Rice_Zhao_etal$genoScore
  Rice_geno_map <- Rice_Zhao_etal$genoMap
  Rice_pheno <- Rice_Zhao_etal$pheno

  ### View each dataset
  See(Rice_geno_score)
  See(Rice_geno_map)
  See(Rice_pheno)

  ### Select one trait for example
  trait.name <- "Flowering.time.at.Arkansas"
  y <- as.matrix(Rice_pheno[, trait.name, drop = FALSE])

  ### Remove SNPs whose MAF <= 0.05
  x.0 <- t(Rice_geno_score)
  MAF.cut.res <- MAF.cut(x.0 = x.0, map.0 = Rice_geno_map)
  x <- MAF.cut.res$x
  map <- MAF.cut.res$map

  ### Estimate genomic relationship matrix (GRM)
  K.A <- calcGRM(genoMat = x)

  ### Modify data
  modify.res <- modify.data(pheno.mat = y, geno.mat = x, return.ZETA = TRUE)
  pheno.mat <- modify.res$pheno.modi
  ZETA <- modify.res$ZETA


  ### Solve linear mixed effects model
  EMM.res <- EMM.cpp(y = pheno.mat, X = NULL, ZETA = ZETA)
  (Vu <- EMM.res$Vu)   ### estimated genetic variance
  (Ve <- EMM.res$Ve)   ### estimated residual variance
  (herit <- Vu / (Vu + Ve))   ### genomic heritability

  (beta <- EMM.res$beta)   ### Here, this is an intercept.
  u <- EMM.res$u   ### estimated genotypic values
  See(u)

  ### Estimate marker effects from estimated genotypic values
  x.modi <- modify.res$geno.modi
  WMat <- calcGRM(genoMat = x.modi, methodGRM = "addNOIA",
                  returnWMat = TRUE)
  K.A <- ZETA$A$K
  if (min(eigen(K.A)$values) < 1e-08) {
    diag(K.A) <- diag(K.A) + 1e-06
  }

  mrkEffectsForW <- crossprod(x = WMat,
                              y = solve(K.A)) %*% as.matrix(u)
  mrkEffects <- mrkEffectsForW / mean(scale(x.modi %*% mrkEffectsForW, scale = FALSE) / u)




  #### Cross-validation for genomic prediction
  noNA <- !is.na(c(pheno.mat))   ### NA (missing) in the phenotype data

  phenoNoNA <- pheno.mat[noNA, , drop = FALSE]   ### remove NA
  ZETANoNA <- ZETA
  ZETANoNA$A$Z <- ZETA$A$Z[noNA, ]   ### remove NA


  nFold <- 10    ### # of folds
  nLine <- nrow(phenoNoNA)
  idCV <- sample(1:nLine %% nFold)   ### assign random ids for cross-validation
  idCV[idCV == 0] <- nFold

  yPred <- rep(NA, nLine)

  for (noCV in 1:nFold) {
    yTrain <- phenoNoNA
    yTrain[idCV == noCV, ] <- NA   ### prepare test data

    EMM.resCV <- EMM.cpp(y = yTrain, X = NULL, ZETA = ZETANoNA)   ### prediction
    yTest <-  EMM.resCV$beta + EMM.resCV$u   ### predicted values

    yPred[idCV == noCV] <- (yTest[noNA])[idCV == noCV]
  }

  ### Plot the results
  plotRange <- range(phenoNoNA, yPred)
  plot(x = phenoNoNA, y = yPred,xlim = plotRange, ylim = plotRange,
       xlab = "Observed values", ylab = "Predicted values",
       main = "Results of Genomic Prediction",
       cex.lab = 1.5, cex.main = 1.5, cex.axis = 1.3)
  abline(a = 0, b = 1, col = 2, lwd = 2, lty = 2)
  R2 <- cor(x = phenoNoNA[, 1], y = yPred) ^ 2
  text(x = plotRange[2] - 10,
       y = plotRange[1] + 10,
       paste0("R2 = ", round(R2, 3)),
       cex = 1.5)


RAINBOWR documentation built on July 4, 2024, 1:11 a.m.