sim_gen: Generate simulated GWAS results
In hemstrow/GeneArchEst: Estimate Genomic Architecture for Quantitative Traits

sim_gen

R Documentation

Generate simulated GWAS results

Description

Generate simulated GWAS results from the data, given an effect size distribution and distributions of the parameters for that distribution. Can accept ABC results, as produced by ABC_on_hyperparameters, to generate a joint effect size distribution using 2D kernel smoothing.

Usage

sim_gen(
  x,
  meta,
  iters,
  center = T,
  scheme = "gwas",
  effect_distribution = rbayesB,
  parameter_distributions = list(pi = function(x) rbeta(x, 25, 1), d.f = function(x)
    runif(x, 1, 100), scale = function(x) rbeta(x, 1, 3) * 100),
  h_dist = function(x) rep(0.5, x),
  par = 1,
  joint_res = NULL,
  joint_acceptance = NULL,
  joint_res_dist = "ks",
  peak_delta = 0.5,
  peak_pcut = 5e-04,
  window_sigma = 50,
  phased = FALSE,
  maf = 0.05,
  pass_windows = NULL,
  pass_G = NULL,
  GMMAT_infile = NULL,
  reg_res = NULL,
  find_similar_effects = F,
  real_effects = NULL,
  save_effects = FALSE
)

Arguments

`x`	object coercible to a matrix or a `FBM`. Input genotypic data.
`meta`	data.frame. Metadata for the SNPs included in the GWAS, where the first column is chromosome/scaffold information and the second is position in base pairs. Note that if a subset of the SNPs is used for the GWAS via the pass_windows, pass_G, and GMMAT_infile options, this metadata should correspond to those SNPs, not those in x.
`iters`	numeric. The number of simulations to perform.
`center`	logical, default TRUE. Determines if the phenotypes should be centered prior to the GWAS.
`scheme`	character, default "gwas". The method to use for p-value/effect size estimation. Currently only "gwas" is supported.
`effect_distribution`	function, default `rbayesB`. The effect size distribution to use.
`parameter_distributions`	list containing named functions, default list(pi = function(x) rbeta(x, 25, 1), d.f = function(x) runif(x, 1, 100), scale = function(x) rbeta(x, 1, 3)*100). Named functions giving the distributions from which to draw the hyperparameters of the effect size distribution.
`save_effects`	character or FALSE, default FALSE. If a file path is given, simulated effects will be saved to that path. Uses the provided
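
As a minimal sketch of what the `parameter_distributions` and `h_dist` arguments might look like when customized: each entry is a function that takes a number of draws and returns that many values, mirroring the defaults in Usage. The names `my_parameter_distributions` and `my_h_dist`, and the specific distributions chosen, are hypothetical and only illustrate the expected shape.

```r
# Hypothetical replacements for the default hyperparameter distributions.
# Each function takes the number of draws and returns that many values.
my_parameter_distributions <- list(
  pi    = function(x) rbeta(x, 10, 1),
  d.f   = function(x) runif(x, 2, 50),
  scale = function(x) rbeta(x, 1, 3) * 10
)

# Hypothetical heritability distribution: uniform instead of a fixed 0.5.
my_h_dist <- function(x) runif(x, 0.2, 0.8)

# Draw five sets of values to check that the functions behave as expected.
set.seed(1)
draws <- sapply(my_parameter_distributions, function(f) f(5))
draws <- cbind(draws, h = my_h_dist(5))
print(draws)
```

Objects like these would then be supplied as `parameter_distributions = my_parameter_distributions` and `h_dist = my_h_dist` in the call to sim_gen.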

Details

GWAS results are generated by first drawing hyperparameter values from either the provided distributions or from a joint distribution produced by kernel smoothing the results of ABC_on_hyperparameters using KernSur. These values are then passed to the provided effect size distribution in order to draw allele effect sizes. Phenotypes are then calculated for all individuals based on a heritability randomly drawn from the provided heritability distribution. A GWAS corrected for population and family structure is then conducted on the phenotypes and genotypes with the glmmkin and glmm.score functions, using a genetic relationship matrix (GRM) between individuals as a covariate. The GRM is calculated using the method introduced in Yang et al. (2010) via Gmatrix. Note that this matrix should be identical to that produced by GCTA or other programs that use the same method.
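
The simulation steps described above can be sketched in base R. This is an illustration under stated assumptions, not the package's implementation: genotypes are assumed to be coded 0/1/2 with individuals in rows, a simple "mostly zero, otherwise scaled t" draw stands in for rbayesB, the environmental variance is sized from the genetic variance in the usual way, and the GRM uses only the off-diagonal formula of Yang et al. (2010) for every entry (the published method treats the diagonal separately).

```r
set.seed(42)
n_ind <- 50
n_snp <- 200

# Toy genotypes: rows = individuals, columns = SNPs, coded 0/1/2 (assumed layout).
freqs <- runif(n_snp, 0.1, 0.9)
geno <- sapply(freqs, function(p) rbinom(n_ind, 2, p))

# Stand-in effect size distribution (NOT rbayesB): most effects are zero,
# the remainder are drawn from a scaled t distribution.
pi_zero <- 0.9
d.f <- 5
scale <- 1
effects <- ifelse(runif(n_snp) < pi_zero, 0, rt(n_snp, d.f) * scale)

# Additive genetic values, plus environmental noise sized so that the
# expected heritability is h.
h <- 0.5
a <- as.vector(geno %*% effects)
env_var <- var(a) * (1 - h) / h
pheno <- a + rnorm(n_ind, mean = 0, sd = sqrt(env_var))
pheno <- pheno - mean(pheno)  # centering, analogous to center = TRUE

# GRM sketch: standardize each SNP by its allele frequency, then average
# the cross-products over SNPs. Monomorphic SNPs are dropped first.
p <- colMeans(geno) / 2
keep <- p > 0 & p < 1
Z <- sweep(geno[, keep], 2, 2 * p[keep])
Z <- sweep(Z, 2, sqrt(2 * p[keep] * (1 - p[keep])), "/")
grm <- tcrossprod(Z) / sum(keep)
```

In sim_gen itself the GWAS is then run on such phenotypes with the GRM as described above; that step is omitted here because it depends on external packages.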

The resulting p-value distributions are then summarized by a wide range of statistics, which are returned for comparison to GWAS results from the real phenotypes.
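
To give a sense of what summarizing a p-value distribution can look like, the sketch below computes a few generic statistics on toy p-values. The particular statistics are hypothetical and are not taken from the package; the cutoff of 5e-04 is borrowed from the default value of `peak_pcut` purely for illustration.

```r
set.seed(7)

# Toy p-values standing in for GWAS output: mostly null, plus a few strong signals.
pvals <- c(runif(990), runif(10, 0, 1e-4))
log_p <- -log10(pvals)

# Illustrative summary statistics (hypothetical choices).
summary_stats <- c(
  mean_log_p  = mean(log_p),
  sd_log_p    = sd(log_p),
  quantile(log_p, c(0.5, 0.9, 0.99)),
  n_below_cut = sum(pvals < 5e-04)
)
print(summary_stats)
```

The same function applied to both simulated and real GWAS p-values yields vectors that can be compared directly.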

Note that very large datasets can result in huge memory and time requirements during this step. As such, it is possible to pass genotypes as an FBM instead of a standard matrix/data.table/data.frame. This will result in quicker phenotype calculations. In addition, it is possible to pre-subset the genotypic data and produce GRMs, input data for glmm.score, window identity data for the subset data, and SNP metadata for the subset, and to pass these data alongside the full genotypic data. These will be used for the GWAS instead of the full data, which can result in much faster run-times and much lower memory requirements. As long as the same input files are used for both sim_gen and for calculating the real GWAS during downstream application, the overall method should still be valid.
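
A small sketch of the two ideas above, under assumptions: the toy genotype matrix has individuals in rows and SNPs in columns (the orientation sim_gen expects is not stated here), the subset is an arbitrary thinning to every fifth SNP, and the FBM conversion uses `as_FBM` from the bigstatsr package, run only if that package is installed. The point is simply that the subset genotypes and the subset metadata must stay aligned.

```r
set.seed(3)
n_ind <- 20
n_snp <- 100

# Toy genotypes (individuals x SNPs, an assumed layout) and matching metadata:
# first column chromosome/scaffold, second column position in base pairs.
geno <- matrix(rbinom(n_ind * n_snp, 2, 0.3), nrow = n_ind)
meta <- data.frame(
  chr = rep(c("chr1", "chr2"), each = 50),
  position = rep(seq(1000, by = 1000, length.out = 50), 2)
)

# Arbitrary subset: every fifth SNP. The same index is applied to both the
# genotypes and the metadata so that they continue to describe the same SNPs.
keep <- seq(1, n_snp, by = 5)
geno_sub <- geno[, keep]
meta_sub <- meta[keep, ]

# Optional: store the full genotypes as a file-backed matrix (FBM).
if (requireNamespace("bigstatsr", quietly = TRUE)) {
  geno_fbm <- bigstatsr::as_FBM(geno)
}
```

Here `meta_sub` is the metadata that corresponds to the subset SNPs, which is what the `meta` argument calls for when a subset is used for the GWAS.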

hemstrow/GeneArchEst documentation built on June 10, 2025, 5:06 a.m.