gmm: Genomic mixed model
In alenxav/NAM: Nested Association Mapping

Description Usage Arguments Details Value Author(s) Examples

View source: R/gmm.R

This function was developed to solve a mixel model for multi-environmental trials and/or replicated trials when genomic is available. The model includes a semi-parametric term to account for spatial variation when the field layout of the experiment is know. Covariates (fixed effects), genetics (random effect) and spatial term are fitted all in a single step.

1	gmm(y,gen,dta=NULL,it=75,bi=25,th=1,model="BRR",...)

`y`	Numeric vector of phenotypes of length obs (missing values are allowed). `NA` is allowed.
`gen`	Numeric matrix, with dimension n x p. Attention: Rows must be named with genotype IDs. Missing values are replaced by the SNP expectation.
`dta`	Data frame with obs number of rows containg the genotype ID, spatial information and any other set covariates. Make sure to add a column called "ID" (with capital letters) informing the genotype ID with some match in the object `gen`. For the spatial adjustment, it is necessary to add three numeric columns (not factors) to `dta`: "Block", "Row" and "Col" (case sensitive), where Block may refer to a field block or an enviroment without blocks Therefore, make sure that blocks from different environments are named differently. Row and Col provide the coordinates for the identification of neighbor plots.
`it`	Integer. Total numeric of MCMC iterations used to fit the model.
`bi`	Integer. Burn-in of MCMC iterations, i.e., number of iteration to be discarted prior to model convergence.
`th`	Integer. Thinning parameter: saves only 1 interation every `th`. Thinning is used to reduce the auto-correlation between Markov chains.
`model`	Prediction model: The options are: `BRR`, `BayesA`, `GBLUP` and `RKHS`.
`...`	Pass arguments to the function that builds the spatial splines `NNsrc`: rho and dist. By default, `rho=1` and `dist=3`. To check how it looks like in the field, type `NNsrc(rho=1,dist=3)`.

The general model is y=Xb+Zu+f(x)+e, where y is the response variable, Xb refers to the fixed effects, Zu regards the genetic effect, f(x) represents the field variation, and e is the vector of residuals. In this model u is a link function that represents the genetic component, which depends on the model specified.

For whole-genome regression models (BRR or BayesA), u = Ma, where M is the matrix of genotypes. For kernel models (RKHS and GBLUP), u=N(0,Kσ2a), where K is either a Gaussian kernel (RKHS) or a linear kernel (GBLUP). To avoid over-representation of genotypes, u is not weighted according to the number of observations of each genotype.

Unobserved genotypes not provided in dta but provided in gen are predicted in the output of the function. Genotypes without genotypic information are transfered to the fixed effect (eg. checks). Missing loci are imputed with the expectation. If dta is not provided, the function will work as a regular genomic prediction model, so the length of y must match the number of rows of gen.

In whole-genome regression models, the regularization of the genetic term is either based on chosen prior (t, Gaussian), Gaussian (from ridge regression) and t (from BayesA). Kernel models (GBLUP and RKHS) are regularized as Gaussian process, which is similar to a ridge regression of Eigenvectors where the regularization of Eigenpairs also relies on the Eigenvalues.

If there is a large number of trials and users acknowledge the necessity of sparse matrices, we recommend installing the Matrix package and run the following code that enables sparsity:

source(system.file("add","sparseGMM.R",package="NAM"))

The function gmm returns a list containing the fitted values (hat), observed values (obs), intercept (mu, incidence matrix of genotypes (Z) and the vector of breeding values (EBV). If fixed effects are provided, it also returns the design matrices and coefficients of fixed effects (X,b). If the model was kernel or regression, output will include the random effect coefficients (g), variance components of markers (Vg) and residuals (Ve). Kernel models regress the Eigenvectors of the kernel, weighted by the Eigenvalue. The coefficient (cxx) used in the BRR model to convert marker variance Vb into genetic variance Va. If spatial information is provided, the output includes the fitted spatial term (sp).

Alencar Xavier

## Not run: 

  # Checking heritability
data(tpod)
fit = gmm(y,gen,model = 'BRR')
fit$Vg*fit$cxx / (fit$Vg*fit$cxx+fit$Ve)
  
# For a demo with wulti-environmental trial type:
# demo(fittingMET)
  

## End(Not run)