new_PolyMRDataSim: Simulate exposure, outcome, and genotype data

View source: R/PolyMRDataSim.R

new_PolyMRDataSim    R Documentation

Simulate exposure, outcome, and genotype data

Description

Simulates exposure, outcome, and genotype data corresponding to the provided causal function.

Usage

new_PolyMRDataSim(
  sample_size = 1e+05,
  n_exposure_snps = 100,
  exposure_heritability = 0.3,
  causal_function = get_polynomial_function(c(0.1, 0.05)),
  confounders_list = list(new_Confounder(sample_size)),
  finalize = TRUE,
  gws_thr = 5e-08
)

Arguments

sample_size

Sample size (default is 10^5).

n_exposure_snps

Number of SNPs explaining the exposure_heritability (default is 100). Note that this is not equal to the number of instruments as it includes SNPs which will be filtered out upon finalizing the data (see gws_thr).

exposure_heritability

Heritability of the exposure explained by the n_exposure_snps (default is 0.3).

causal_function

Function defining the true relationship between the exposure and the outcome. It should accept a vector of exposure values and return a vector of outcome values of the same length. This represents the pure contribution of the exposure to the outcome and should not include confounding or noise. Default is function(x) 0.1*x + 0.05*x^2. See also get_polynomial_function().

confounders_list

A list of objects of class Confounder (see new_Confounder()) to be added to the data. Default is a single confounder with linear contributions to both exposure and outcome with coefficients 0.2 and 0.5, respectively.

finalize

Logical indicating whether the data set should be finalized, i.e. whether error terms are added to account for the remaining variance and SNPs are filtered based on genome-wide significance (default is TRUE).

gws_thr

P-value threshold for genome-wide significance, used to filter SNPs when selecting instrumental variables (default is 5e-8).
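
The interplay of n_exposure_snps, exposure_heritability, and gws_thr can be pictured with the base R sketch below. It is not PolyMR's internal code: the uniform allele frequencies, normally distributed effects, per-SNP linear regression, and all variable names are assumptions made purely for illustration. It shows why the number of instruments left after finalizing can be smaller than n_exposure_snps.

```r
# Illustrative only: NOT PolyMR's implementation.
set.seed(1)
n      <- 5000   # individuals (kept small so the sketch runs quickly)
n_snps <- 20     # SNPs contributing to the exposure
h2     <- 0.3    # heritability explained by those SNPs
thr    <- 5e-8   # genome-wide significance threshold

# Genotypes coded 0/1/2, drawn from assumed minor allele frequencies
mafs      <- runif(n_snps, 0.05, 0.5)
genotypes <- sapply(mafs, function(maf) rbinom(n, size = 2, prob = maf))

# Genetic component rescaled to variance h2; noise supplies the remainder
effects  <- rnorm(n_snps)
genetic  <- as.vector(scale(genotypes %*% effects)) * sqrt(h2)
exposure <- genetic + rnorm(n, sd = sqrt(1 - h2))

# Marginal association p-value of each SNP with the exposure
p_values <- apply(genotypes, 2, function(g) {
  summary(lm(exposure ~ g))$coefficients["g", "Pr(>|t|)"]
})

# Keep only the SNPs that pass the threshold
instruments <- which(p_values < thr)
length(instruments)  # never more than n_snps
```

SNPs with small effects or low allele frequencies tend to miss the threshold, which is the behaviour the note under n_exposure_snps describes.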

Value

A list-like object of class PolyMRDataSim. Its main constituents are the exposure and outcome vectors, and the genotypes matrix. In addition, a number of parameters used in data generation are kept as named elements, including n_exposure_snps, exposure_heritability, and causal_function. Some intermediate values are also included, namely the minor allele frequencies (mafs) and the effects on the exposure (exposure_coefficients) of the SNPs remaining after filtering. These represent the ground-truth values used in the data generation.

Examples

  simulated_data <- new_PolyMRDataSim(
    sample_size = 50000,
    n_exposure_snps = 200,
    causal_function = function(x) 0.05*exp(x))
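
Before passing a custom causal_function, it may help to confirm that it meets the contract described under Arguments: it takes a vector and returns a numeric vector of the same length. The helper check_causal_function() below is hypothetical and not part of PolyMR; it is a minimal base R sketch of such a check.

```r
# check_causal_function() is a hypothetical helper, not a PolyMR function.
check_causal_function <- function(f, x = seq(-3, 3, length.out = 101)) {
  y <- f(x)
  # The output must be numeric, finite, and as long as the input
  stopifnot(is.numeric(y), length(y) == length(x), all(is.finite(y)))
  invisible(TRUE)
}

exponential <- function(x) 0.05 * exp(x)          # as in the example above
quadratic   <- function(x) 0.1 * x + 0.05 * x^2   # the documented default

check_causal_function(exponential)
check_causal_function(quadratic)
```

A function that collapses its input, such as function(x) sum(x), fails the length check and would not be a valid causal_function.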


JonSulc/PolyMR documentation built on April 26, 2023, 10:42 a.m.