A simulation study of various methods for estimating variance components. We compare their predictive and computational performance.
We currently simulate a single response vector $y$ from two kernels:
$$ y = s(\eta) $$ $$ \eta \sim N(0, \Sigma) $$ $$ \Sigma = \epsilon I + \sigma^2 V $$ $$ V_{i,j} = k(G_i, G_j) $$
where the response $y$ was created by distorting the total effect $\eta$ via a function $s$, so that non-normality could be induced.
The total effect $\eta$ was drawn from a centered normal distribution with covariance matrix $\Sigma$;
The two variance components $\epsilon$ and $\sigma^2$ weight the contribution of white noise and genetics to the variation of $y$, respectively;
The genetic covariance matrix $V$ was calculated from the genome data $G$ via a kernel function $k$;
To induce heterogeneity between the training and testing data, a small perturbation drawn from $N(0, 0.1)$ was added to each of the two variance components.
Genome Data $G$:
N = 1200 for both training and testing data.
Genomic Kernels $V$:
Gaussian Kernel (standardized), $V_{i,j} = e^{-\frac{||g_i - g_j||_2^2}{2P}}$
Distortion $s$ on the total effect $\eta$:
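A sigmoid is one such distortion (applied in the scenarios below). The following minimal sketch pulls the data-generating process together; the genotype distribution, the marker count, and the variance-component values are our assumptions for illustration, not the original settings.

```r
## Minimal sketch of the data-generating process described above.
## Hypothetical settings: binomial genotypes, P = 2000 markers, and
## variance components (0.4, 0.6); only the model structure is from the text.
set.seed(1)
N <- 1200                                         # sample size (as stated)
P <- 2000                                         # marker count (assumed)
G <- scale(matrix(rbinom(N * P, 2, 0.5), N, P))   # standardized genome data

## Gaussian kernel V, as defined above
V <- exp(-as.matrix(dist(G))^2 / (2 * P))

## variance components, perturbed by N(0, 0.1) to induce
## train/test heterogeneity (0.1 read here as the standard deviation)
eps    <- 0.4 + rnorm(1, sd = 0.1)
sigma2 <- 0.6 + rnorm(1, sd = 0.1)
Sigma  <- eps * diag(N) + sigma2 * V

## total effect eta ~ N(0, Sigma); sigmoid distortion; centering
eta <- drop(crossprod(chol(Sigma), rnorm(N)))
y   <- 1 / (1 + exp(-eta))
y   <- y - mean(y)                                # methods see only y and V
```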
In anticipation of the later, variance-component-driven meta-analysis, all methods were only allowed to see the pre-computed genomic kernel matrix $V$ and the already centered response $y$.
The current development relies on MINQUE, formally developed in [Rao 1971], which estimates the variance components by solving a transformed linear system in closed form rather than via a gradient-guided iterative approach.
A modified version was recently implemented by one of the authors [xiaoxi 2018] to guarantee that the estimated variance components are never negative. A sketch of the core computation follows.
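As a concrete illustration, the sketch below solves the two-component case in closed form. It assumes a centered response with no fixed effects and uses the MINQUE(0) weighting; `minque0` is a hypothetical helper, not the authors' implementation, and the final truncation is a naive stand-in for the non-negativity guarantee of [xiaoxi 2018].

```r
## MINQUE(0) for two components: white noise (V1 = I) and genetics (V2 = V).
## Solves S theta = q, with S[i, j] = tr(V_i V_j) and q[i] = y' V_i y.
minque0 <- function(y, V) {
  N <- length(y)
  S <- matrix(c(N,            sum(diag(V)),
                sum(diag(V)), sum(V * V)),   # tr(VV) = sum(V^2), V symmetric
              nrow = 2, byrow = TRUE)
  q <- c(sum(y^2), drop(crossprod(y, V %*% y)))
  theta <- solve(S, q)                       # closed form, no iteration
  pmax(theta, 0)                             # naive non-negativity truncation
}
```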
Several variants are now implemented:
1. linear-kernel MINQUE on the whole sample;
2. "polynomial"-kernel MINQUE;
3. batched MINQUE, trained on sub-samples.
The combination of variants 2 and 3 (batched polynomial MINQUE) is expected to yield the best overall performance in both prediction accuracy and computation time, since we had previously demonstrated that "polynomial" MINQUE can outperform linear MINQUE and other competitors (e.g., REML and MLE) by a large margin.
At the cost of a tolerable loss of accuracy, we expect batched training to substantially reduce computation time, given the $O(N^3)$ complexity of the matrix inversion and eigendecomposition involved in MINQUE: with $N/M$ batches each costing $O(M^3)$, an epoch costs $O(NM^2)$, an $(N/M)^2$-fold reduction.
Batched MINQUE deviates from mini-batch stochastic gradient descent in that it involves no iterative update mechanism.
The main competitor is GCTA-REML [Yang 2012], which has undergone continuous improvement and become the leading tool for fast whole-genome analysis. However, GCTA itself supports only the linear kernel; we therefore wrap GCTA to accept any valid positive semi-definite (PSD) kernel as input, so that a fair comparison can be made. A sketch of this wrapper follows.
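The sketch below shows one way such a wrapper can work: serialize the kernel in GCTA's binary GRM format and hand it to `gcta64 --reml`. The layout assumed here (row-wise lower triangle including the diagonal, single-precision floats, plus the `.grm.id` and `.grm.N.bin` companions) reflects our reading of the GCTA documentation and should be verified against the version in use.

```r
## Write an arbitrary PSD kernel K as a GCTA GRM (layout as assumed above).
write_grm <- function(K, ids, prefix) {
  N <- nrow(K)
  lower <- unlist(lapply(seq_len(N), function(i) K[i, 1:i]))
  con <- file(paste0(prefix, ".grm.bin"), "wb")
  writeBin(lower, con, size = 4L)                    # 4-byte floats
  close(con)
  con <- file(paste0(prefix, ".grm.N.bin"), "wb")
  writeBin(rep(1, length(lower)), con, size = 4L)    # dummy marker counts
  close(con)
  write.table(data.frame(FID = ids, IID = ids), paste0(prefix, ".grm.id"),
              quote = FALSE, row.names = FALSE, col.names = FALSE)
}
## e.g., write_grm(V, seq_len(nrow(V)), "kn"), then:
##   gcta64 --reml --grm kn --pheno y.txt --out fit
```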
We use both kernels to generate data without distortion, and allow all methods to use the correct kernel for model development and prediction.
All methods were expected to perform well, as proof of the fair and correct use (or coding) of the competing methods (i.e., GCTA's REML, R's MLE, and the oracle covariance matrix).
We generate data using the Gaussian kernel, without distortion, but force all methods to develop models with a linear kernel.
This time we generate the total effect with the linear kernel and apply the two aforementioned types of distortion. We force all methods to build their models as if prior knowledge of the distortion were unavailable.
For all methods, model fitting is directly or indirectly driven by the likelihood of $y$. We therefore choose the mean squared error (MSE) of the predicted $\hat{y}$ as the benchmark of prediction performance, since MSE did not drive model development. All MSEs were gauged relative to the null model that predicts with the sample average $\bar{y}$; see the helper below.
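For clarity, the benchmark reduces to the following small helper (`relative_mse` is a name chosen here for illustration):

```r
## MSE of a prediction, relative to the null model predicting mean(y);
## values below 1 indicate an improvement over the null model.
relative_mse <- function(y, y_hat) {
  mean((y - y_hat)^2) / mean((y - mean(y))^2)
}
```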
For GCTA, we use its self-reported time, which excludes the creation of the kernel matrices since these are pre-computed. For the other methods, including MINQUE, the time spent estimating the variance components is counted. All times are gauged relative to the time consumed by GCTA.
In this section we show the prediction performance under the four aforementioned scenarios, followed by the computation time under one of them, since each method and variant spent roughly the same amount of time in every scenario.
In short: a linear kernel for response generation, no distortion, and the same type of kernel for model development.
The prediction performance is shown here:
As expected, the MSEs of GCTA-REML and linear MINQUE were very close to each other, and also very close to that of MLE (not plotted here), suggesting that the wrapping of GCTA was reliable.
Unexpectedly, for both the training and the testing data, models developed on smaller batches showed lower MSE than those developed on larger batches, and even lower than the one built on the entire sample.
We use the Gaussian kernel to generate the response without distortion, but use the linear kernel for model development.
In this scenario REML was inadequate, whereas both MINQUE variants were robust. The same unexpected inverse association between batch size and MSE appeared again.
We use the linear kernel to generate the total effect, which was then plugged into a sigmoid function to obtain the final response variable $y$.
Again, the results show that MINQUE was robust to a non-normal response.
We selectively plot the time performance of the unchallenged baseline test; the time spent was similar across the other scenarios.
Under challenge, the MSE on the testing data, in ascending order:
* Polynomial MINQUE, batched
* Polynomial MINQUE, whole sample
* Linear MINQUE, batched
* Linear MINQUE, whole sample
* R's MLE implementation (not plotted)
* GCTA's REML
The training time, in ascending order:
* Linear MINQUE, batched
* Polynomial MINQUE, batched
* Linear MINQUE, whole sample
* Polynomial MINQUE, whole sample
* GCTA's REML (the reference)
* R's MLE implementation (not plotted)
MINQUE is not an iterative procedure; the current batched training is done by averaging the solutions of all batches. An epoch is defined as the collection of results from N/M batches, where N is the sample size and M is the batch size. When an epoch is through, the N samples are shuffled and a new epoch begins; a sketch is given below.
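A sketch of this scheme, reusing the hypothetical `minque0` helper from the earlier sketch:

```r
## Batched MINQUE: one closed-form solve per batch, averaged over all
## batches and epochs; samples are reshuffled between epochs.
batched_minque <- function(y, V, M, epochs = 2) {
  N <- length(y)
  ests <- NULL
  for (e in seq_len(epochs)) {
    idx <- sample(N)                                # shuffle before the epoch
    for (b in split(idx, ceiling(seq_along(idx) / M)))
      ests <- rbind(ests, minque0(y[b], V[b, b]))
  }
  colMeans(ests)                                    # average of batch solutions
}
```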
Here we report the results of 2 epochs for batched MINQUE; the (unreported) performance of a single epoch is similar in terms of MSE but takes less time.
Batched MINQUE, i.e., the averaging of sub-sample models, may imply:
* logistically, that meta-analysis can be a better choice than mega-analysis when the participating populations are roughly homogeneous;
* a way to improve upon gradient-guided iterative procedures: average a number of models each built from one mini-batch, instead of going through all the mini-batches to build a single model.
New issues: the standard error of variance-component estimates under $H_0$ is now implemented for a single MINQUE problem; how do we obtain a pooled SE across a number of batches?