sim_gen | R Documentation |
Generate simulated GWAS results from the data given an effect size distribution
and distributions of the parameters for that distribution. Can accept ABC results
as produced by ABC_on_hyperparameters to generate a joint effect size distribution
using 2D kernel smoothing.
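As an illustration of drawing joint hyperparameter values from smoothed ABC output, the sketch below uses a smoothed bootstrap in base R: it picks an accepted row at random and adds Gaussian kernel noise, which is equivalent to sampling from a Gaussian product-kernel density estimate. This is a swapped-in technique for illustration, not the KernSur-based smoothing that sim_gen uses. The `accepted` data and the `draw_joint` helper are hypothetical.

```r
# Hypothetical accepted ABC draws: two hyperparameters per accepted simulation.
set.seed(1)
accepted <- data.frame(pi = rbeta(200, 25, 1),
                       scale = rbeta(200, 1, 3) * 100)

# Draw n joint values from a Gaussian product-kernel density estimate.
# Sampling from such an estimate is the same as picking an observed row at
# random and adding kernel noise (sd = bandwidth) to each coordinate.
draw_joint <- function(accepted, n) {
  bw <- vapply(accepted, stats::bw.nrd0, numeric(1)) # one bandwidth per column
  rows <- sample(nrow(accepted), n, replace = TRUE)  # resample rows jointly
  noise <- matrix(rnorm(n * ncol(accepted)), nrow = n)
  out <- as.matrix(accepted[rows, , drop = FALSE]) + sweep(noise, 2, bw, `*`)
  rownames(out) <- NULL
  as.data.frame(out)
}

joint_draws <- draw_joint(accepted, 1000)
```

Because rows are resampled jointly, any correlation between the two hyperparameters is preserved. Draws can fall slightly outside the support of the original values (for example pi above 1), so a real application would need to truncate or reject such values.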
sim_gen(
x,
meta,
iters,
center = TRUE,
scheme = "gwas",
effect_distribution = rbayesB,
parameter_distributions = list(pi = function(x) rbeta(x, 25, 1),
d.f = function(x) runif(x, 1, 100),
scale = function(x) rbeta(x, 1, 3) * 100),
h_dist = function(x) rep(0.5, x),
par = 1,
joint_res = NULL,
joint_acceptance = NULL,
joint_res_dist = "ks",
peak_delta = 0.5,
peak_pcut = 5e-04,
window_sigma = 50,
phased = FALSE,
maf = 0.05,
pass_windows = NULL,
pass_G = NULL,
GMMAT_infile = NULL,
reg_res = NULL,
find_similar_effects = FALSE,
real_effects = NULL,
save_effects = FALSE
)
x |
object coercible to a matrix or a |
meta |
data.frame. Metadata for the SNPs included in the GWAS, where the first column is chromosome/scaffold information and the second is position in base pairs. Note that if a subset of the SNPs is used for the GWAS via the pass_windows, pass_G, and GMMAT_infile options, this metadata should correspond to those SNPs, not those in x. |
iters |
numeric. The number of simulations to perform. |
center |
logical, default TRUE. Determines if the phenotypes should be centered prior to the GWAS. |
scheme |
character, default "gwas". The method to use for p-value/effect size estimation. Currently only "gwas" is supported. |
effect_distribution |
function, default rbayesB. |
parameter_distributions |
list containing named functions, default list(pi = function(x) rbeta(x, 25, 1), d.f = function(x) runif(x, 1, 100), scale = function(x) rbeta(x, 1, 3)*100). Named functions giving the distributions from which to draw the hyperparameters of the effect size distribution. |
save_effects |
character or FALSE, default FALSE. If a file path is provided, simulated effects will be saved to that path. Uses the provided |
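The default hyperparameter and heritability distributions can be tried on their own in base R. The sketch below copies the defaults from the usage and draws one value per simulation from each. It assumes each function takes the number of draws as its only argument, as the defaults suggest. The `draws` data frame is only for illustration and is not necessarily how sim_gen stores these values internally.

```r
# Default hyperparameter distributions, copied from the usage section.
parameter_distributions <- list(
  pi = function(x) rbeta(x, 25, 1),
  d.f = function(x) runif(x, 1, 100),
  scale = function(x) rbeta(x, 1, 3) * 100
)

# Default heritability distribution: every simulation uses 0.5.
h_dist <- function(x) rep(0.5, x)

# Draw one value per simulation from each distribution (illustration only).
iters <- 5
set.seed(42)
draws <- as.data.frame(lapply(parameter_distributions, function(f) f(iters)))
draws$h <- h_dist(iters)
draws
```

To use different priors, replace any list element with another function of the number of draws, for example `scale = function(x) runif(x, 0, 10)`, while keeping the element names expected by the effect size distribution.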
GWAS results are generated by first drawing hyperparameter values from either the provided distributions
or from a joint distribution produced by kernel smoothing the results of ABC_on_hyperparameters using KernSur.
These values are then passed to the provided effect size distribution in order to draw allele effect sizes.
Phenotypes are then calculated for all individuals based on a heritability randomly drawn from the provided
heritability distribution. A GWAS corrected for population and family structure is then conducted on the
phenotypes and genotypes using the glmmkin and glmm.score functions, with a genetic relationship matrix (GRM)
between individuals as a covariate. The GRM is calculated using the method introduced in Yang et al. 2010 via Gmatrix.
Note that this matrix should be identical to that produced by GCTA or other programs that use the same method.
The resulting p-value distributions are then summarized by a wide range of statistics, which are returned for comparison to GWAS results from the real phenotypes.
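The effect and phenotype steps described above can be sketched in base R. This is a minimal illustration under stated assumptions, not the package's implementation. `draw_effects` is a hypothetical stand-in for an effect size distribution (it is not rbayesB) and assumes that pi is the proportion of loci with zero effect. `make_phenotypes` is a hypothetical helper that adds environmental noise so the genetic share of phenotypic variance is approximately h. The genotypes are simulated.

```r
set.seed(7)
n_ind <- 100
n_snp <- 500

# Hypothetical genotypes: individuals in rows, allele counts (0/1/2) in columns.
p <- runif(n_snp, 0.05, 0.5)
G <- sapply(p, function(f) rbinom(n_ind, 2, f))

# Hypothetical stand-in for an effect size distribution (NOT the package's rbayesB):
# a locus has zero effect with probability pi, otherwise a scaled t-distributed effect.
draw_effects <- function(n, pi, d.f, scale) {
  nonzero <- rbinom(n, 1, 1 - pi)
  nonzero * scale * rt(n, df = d.f)
}

# Phenotype = additive genetic value + environmental noise, with the noise
# variance chosen so that var(genetic) / var(phenotype) is approximately h.
make_phenotypes <- function(G, effects, h, center = TRUE) {
  a <- as.vector(G %*% effects)
  env_sd <- sqrt(var(a) * (1 - h) / h)
  pheno <- a + rnorm(length(a), 0, env_sd)
  if (center) pheno <- pheno - mean(pheno)
  pheno
}

effects <- draw_effects(n_snp, pi = 0.9, d.f = 5, scale = 1)
pheno <- make_phenotypes(G, effects, h = 0.5)
```

In a full simulation the values of pi, d.f, scale, and h would come from the hyperparameter and heritability distributions rather than being fixed, and the resulting phenotypes would then go into the GWAS step.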
Note that very large datasets can result in huge memory and time requirements during this step. As such,
it is possible to pass genotypes as an FBM instead of a standard matrix/data.table/data.frame.
This will result in quicker phenotype calculations. In addition, it is possible to pre-subset the genotypic data and
produce GRMs, input data for glmm.score, window identity data for the subset data, and SNP
metadata for the subset, and to pass these alongside the full genotypic data. They will be used for the GWAS
instead of the full data, which can result in much faster run times and much lower memory requirements.
As long as the same input files are used both for sim_gen and for calculating the real GWAS during
downstream application, the overall method should still be valid.
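The pre-subsetting idea above can be sketched in base R: filter SNPs by minor allele frequency, then compute a GRM from the retained SNPs. This sketch rests on assumptions. It computes the standardized-genotype cross-product that underlies the Yang et al. 2010 estimator for pairs of individuals. The published method uses a separate formula for the diagonal, and Gmatrix may differ in details such as missing data handling, so the output is not guaranteed to match. `maf_filter` and `grm_sketch` are hypothetical helpers, and treating `maf` as a minor allele frequency cutoff is an assumption.

```r
set.seed(3)
n_ind <- 50
n_snp <- 400

# Hypothetical genotypes: individuals in rows, allele counts (0/1/2) in columns.
p_true <- runif(n_snp, 0.01, 0.5)
G <- sapply(p_true, function(f) rbinom(n_ind, 2, f))

# Keep SNPs whose minor allele frequency is at least `maf`.
maf_filter <- function(G, maf = 0.05) {
  p <- colMeans(G) / 2
  pmin(p, 1 - p) >= maf
}

# GRM from standardized genotypes: A = Z Z' / N, where
# z_ij = (x_ij - 2 p_i) / sqrt(2 p_i (1 - p_i)) and N is the number of SNPs.
grm_sketch <- function(G) {
  p <- colMeans(G) / 2
  Z <- sweep(G, 2, 2 * p, `-`)
  Z <- sweep(Z, 2, sqrt(2 * p * (1 - p)), `/`)
  tcrossprod(Z) / ncol(G)
}

keep <- maf_filter(G, maf = 0.05)
G_sub <- G[, keep, drop = FALSE]
grm <- grm_sketch(G_sub)
```

Filtering before standardizing also removes monomorphic SNPs, which would otherwise cause division by zero. Whatever subset is chosen, the same subset and GRM should be used for the real GWAS, as noted above.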