# Kernel_Ridge_MM: Kernel ridge regression in the mixed model framework

## Description

Kernel_Ridge_MM solves kernel ridge regression for various kernels within the following mixed model framework: Y = X*Beta + Z*U + E, where X and Z are the design matrices of predictors with fixed and random effects, respectively.
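Spelled out with distributional assumptions, the model can be read as below. The covariance structure shown is the standard kernel-ridge mixed-model formulation and is stated here as a reasonable reading, not quoted from the package source:

```latex
Y = X\beta + Zu + \varepsilon, \qquad
u \sim \mathcal{N}\left(0,\; \sigma_K^2 K\right), \qquad
\varepsilon \sim \mathcal{N}\left(0,\; \sigma_E^2 I_n\right)
```

where K is the kernel matrix built from `Matrix_covariates_train`, and the variance components sigma_K^2 and sigma_E^2 are estimated by an EM-REML algorithm (see `init_sigma2K` and `init_sigma2E` below).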

## Usage

```r
Kernel_Ridge_MM(
  Y_train,
  X_train = as.vector(rep(1, length(Y_train))),
  Z_train = diag(1, length(Y_train)),
  Matrix_covariates_train,
  method = "RKHS",
  kernel = "Gaussian",
  rate_decay_kernel = 0.1,
  degree_poly = 2,
  scale_poly = 1,
  offset_poly = 1,
  degree_anova = 3,
  init_sigma2K = 2,
  init_sigma2E = 3,
  convergence_precision = 1e-8,
  nb_iter = 1000,
  display = "FALSE"
)
```

## Arguments

- `Y_train`: numeric vector; response vector for the training data.
- `X_train`: numeric matrix; design matrix of predictors with fixed effects for the training data (default is a vector of ones).
- `Z_train`: numeric matrix; design matrix of predictors with random effects for the training data (default is the identity matrix).
- `Matrix_covariates_train`: numeric matrix of entries used to build the kernel matrix.
- `method`: character string; one of "RKHS", "GBLUP" or "RR-BLUP".
- `kernel`: character string; one of "Gaussian", "Laplacian" or "ANOVA". These kernels apply to RKHS regression only; the linear kernel is built automatically for GBLUP and RR-BLUP, so no kernel is supplied for these methods. A sketch of the Gaussian kernel is given after this list.
- `rate_decay_kernel`: numeric scalar; hyperparameter of the Gaussian, Laplacian or ANOVA kernel (default is 0.1).
- `degree_poly`, `scale_poly`, `offset_poly`: numeric scalars; parameters of the polynomial kernel (defaults are 2, 1 and 1, respectively).
- `degree_anova`: numeric scalar; parameter of the ANOVA kernel (default is 3).
- `init_sigma2K`, `init_sigma2E`: numeric scalars; initial values of the mixed model variance parameters for the EM-REML algorithm (defaults are 2 and 3, respectively).
- `convergence_precision`, `nb_iter`: numeric scalars; convergence precision (i.e. tolerance) for the mixed model variance parameters in the EM-REML algorithm, and the maximum number of iterations allowed if convergence is not reached (defaults are 1e-8 and 1000, respectively).
- `display`: boolean ("TRUE" or "FALSE" character string); whether estimated components should be displayed at each iteration.
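For intuition, here is a minimal sketch of a Gaussian kernel matrix, assuming the common parameterisation K(x, x') = exp(-h * ||x - x'||^2) with h playing the role of `rate_decay_kernel`; the package's internal construction may differ in detail:

```r
# Sketch (assumption): Gaussian kernel matrix with decay rate h,
# K[i, j] = exp(-h * ||x_i - x_j||^2), where rows of X are observations.
gaussian_kernel <- function(X, h = 0.1) {
  D2 <- as.matrix(dist(X))^2   # squared Euclidean distances between rows
  exp(-h * D2)
}

# Example: kernel matrix for 5 observations with 3 covariates
K <- gaussian_kernel(matrix(runif(15), nrow = 5), h = 0.1)
```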

## Details

The matrix Matrix_covariates_train is mandatory: it is used to build the kernel matrix for model estimation, and it is reused at prediction time (see Predict_kernel_Ridge_MM).
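As a quick illustration of this train/predict pairing (a minimal sketch with simulated data, mirroring the Examples below):

```r
library(KRMM)

set.seed(1)
X_all <- matrix(runif(50 * 20), nrow = 50)               # 50 observations, 20 covariates
y_all <- as.vector(X_all %*% rnorm(20) + rnorm(50, sd = 0.5))

# Fit on the first 40 observations...
model <- Kernel_Ridge_MM(Y_train = y_all[1:40],
                         Matrix_covariates_train = X_all[1:40, ],
                         method = "RKHS", kernel = "Gaussian")

# ...then predict the remaining 10; the training covariates stored in the
# fitted model are combined with the target covariates to build the kernel.
f_hat <- Predict_kernel_Ridge_MM(model,
                                 Matrix_covariates_target = X_all[41:50, ])
```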

## Value

- `Beta_hat`: estimated fixed effect(s).
- `Sigma2K_hat`, `Sigma2E_hat`: estimated variance components.
- `Vect_alpha`: estimated dual variables.
- `Gamma_hat`: RR-BLUP of covariate effects (i.e. available for the RR-BLUP method only).
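For context, in the standard dual formulation of kernel ridge regression (stated here as an assumption about how `Vect_alpha` enters predictions, consistent with `Predict_kernel_Ridge_MM` requiring the target covariates), a prediction at a new point x_* combines the fixed effects with a kernel-weighted sum of the dual variables:

```latex
\hat{f}(x_*) = x_*^\top \hat{\beta} + \sum_{i=1}^{n} \hat{\alpha}_i \, k(x_*, x_i)
```

where the hat-alpha_i are the entries of `Vect_alpha` and the x_i are the rows of `Matrix_covariates_train`.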

## Author(s)

Laval Jacquin <[email protected]>

## References

Jacquin et al. (2016). A unified and comprehensible view of parametric and kernel methods for genomic prediction with application to rice (in peer review).

Robinson, G. K. (1991). That BLUP is a good thing: the estimation of random effects. Statistical Science 6(1), 15-32.

Foulley, J.-L. (2002). Algorithme EM: théorie et application au modèle mixte. Journal de la Société Française de Statistique 143, 57-109.

## Examples

```r
## Not run:
library(KRMM)

### SIMULATE DATA
set.seed(123)
p <- 200
N <- 100

beta <- rnorm(p, mean = 0, sd = 1.0)
X <- matrix(runif(p * N, min = 0, max = 1), ncol = p, byrow = TRUE)  # X: covariates (i.e. predictors)

f <- X %*% beta                    # f: data generating process (i.e. DGP)
E <- rnorm(N, mean = 0, sd = 0.5)

Y <- f + E                         # Y: observed response data

hist(f)
hist(beta)
Nb_train <- floor((2/3) * N)

###======================================================================###
### CREATE TRAINING AND TARGET SETS FOR RESPONSE AND PREDICTOR VARIABLES ###
###======================================================================###

Index_train <- sample(1:N, size = Nb_train, replace = FALSE)

### Covariates (i.e. predictors) for training and target sets
Predictors_train <- X[Index_train, ]
Response_train <- Y[Index_train]

Predictors_target <- X[-Index_train, ]
True_value_target <- f[-Index_train]   # True value (generated by DGP) we want to predict

###=================================================================================###
### PREDICTION WITH KERNEL RIDGE REGRESSION SOLVED WITHIN THE MIXED MODEL FRAMEWORK ###
###=================================================================================###

# Linear kernel
Linear_KRR_model_train <- Kernel_Ridge_MM(Y_train = Response_train,
                                          Matrix_covariates_train = Predictors_train,
                                          method = "RR-BLUP")

f_hat_target_Linear_KRR <- Predict_kernel_Ridge_MM(Linear_KRR_model_train,
                                                   Matrix_covariates_target = Predictors_target)

# Gaussian kernel
Gaussian_KRR_model_train <- Kernel_Ridge_MM(Y_train = Response_train,
                                            Matrix_covariates_train = Predictors_train,
                                            method = "RKHS", rate_decay_kernel = 5.0)

f_hat_target_Gaussian_KRR <- Predict_kernel_Ridge_MM(Gaussian_KRR_model_train,
                                                     Matrix_covariates_target = Predictors_target)

# Graphics for RR-BLUP
dev.new(width = 30, height = 20)
par(mfrow = c(3, 1))
plot(f_hat_target_Linear_KRR, True_value_target)
plot(Linear_KRR_model_train$Gamma_hat,
     xlab = "Feature (i.e. covariate) number",
     ylab = "Feature effect (i.e. Gamma_hat)",
     main = "BLUP of covariate effects based on training data")
hist(Linear_KRR_model_train$Gamma_hat,
     main = "Distribution of BLUP of covariate effects based on training data")

# Compare prediction based on linear (i.e. RR-BLUP) and Gaussian kernel
par(mfrow = c(1, 2))
plot(f_hat_target_Linear_KRR, True_value_target)
plot(f_hat_target_Gaussian_KRR, True_value_target)
mean((f_hat_target_Linear_KRR - True_value_target)^2)
mean((f_hat_target_Gaussian_KRR - True_value_target)^2)

## End(Not run)
```
