DGP: Data Generating Process
In AllelicSeries: Allelic Series Test

Source: R/generate_data.R

DGP	R Documentation

Data Generating Process

Description

Generate a data set consisting of:

anno: (snps x 1) annotation vector.
covar: (subjects x 6) covariate matrix.
geno: (subjects x snps) genotype matrix.
pheno: (subjects x 1) phenotype vector.
type: Either "binary" or "quantitative".
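The following base-R sketch shows one way to simulate a data set with these components and shapes. It is not the package's implementation (see R/generate_data.R for that). The helper name `sketch_dgp`, the 0, ..., L - 1 annotation coding, the per-variant MAF drawn uniformly from `maf_range`, the Binomial(2, MAF) genotype model, the placeholder covariate columns, and the N(0, 1) residual are all assumptions made for illustration.

```r
# Hypothetical sketch; NOT the AllelicSeries implementation of DGP().
sketch_dgp <- function(n = 100, snps = 100, beta = c(1, 2, 3),
                       maf_range = c(0.001, 0.005),
                       prop_anno = c(0.5, 0.4, 0.1),
                       random_signs = FALSE) {
  stopifnot(length(beta) == length(prop_anno))
  L <- length(prop_anno)

  # anno: one category per variant, coded 0, ..., L - 1 (assumed coding).
  anno <- sample(0:(L - 1), size = snps, replace = TRUE, prob = prop_anno)

  # geno: assumed MAF ~ Uniform(maf_range) per variant and
  # allele count ~ Binomial(2, MAF) per subject (Hardy-Weinberg).
  maf <- stats::runif(snps, min = maf_range[1], max = maf_range[2])
  geno <- matrix(
    stats::rbinom(n * snps, size = 2, prob = rep(maf, each = n)),
    nrow = n, ncol = snps
  )

  # covar: placeholder intercept plus 5 standard normal columns (assumed).
  covar <- cbind(1, matrix(stats::rnorm(n * 5), nrow = n))

  # pheno: each variant takes its category's coefficient, following the
  # method = "none" reading of `beta`; random signs flip effects at random.
  # Covariate effects are omitted and the residual is N(0, 1) (assumed).
  effects <- beta[anno + 1]
  if (random_signs) {
    effects <- effects * sample(c(-1, 1), size = snps, replace = TRUE)
  }
  pheno <- as.numeric(geno %*% effects) + stats::rnorm(n)

  list(anno = anno, covar = covar, geno = geno, pheno = pheno,
       type = "quantitative")
}

set.seed(1)
sim <- sketch_dgp(n = 50, snps = 20)
dim(sim$geno)  # 50 20
table(sim$anno)
```

Because the MAFs are small, most entries of `geno` are 0, which matches the rare-variant setting implied by the default `maf_range`.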

Usage

DGP(
  anno = NULL,
  beta = c(1, 2, 3),
  binary = FALSE,
  geno = NULL,
  include_residual = TRUE,
  indicator = FALSE,
  maf_range = c(0.001, 0.005),
  method = "none",
  n = 100,
  prop_anno = c(0.5, 0.4, 0.1),
  prop_causal = 1,
  random_signs = FALSE,
  random_var = 0,
  snps = 100,
  weights = c(1, 1, 1)
)

Arguments

`anno`	Annotation vector, if providing genotypes. Length should match the number of columns in geno.
`beta`	If method = "none", a (L x 1) vector of effect sizes, one per annotation category. By default, there are L = 3 annotation categories corresponding to benign missense variants (BMVs), deleterious missense variants (DMVs), and protein-truncating variants (PTVs). If method != "none", a scalar effect size for the allelic series burden score.
`binary`	Generate a binary phenotype? Default: FALSE.
`geno`	Genotype matrix, if providing genotypes.
`include_residual`	Include the residual? If FALSE, the expected value of the phenotype is returned. Intended for testing.
`indicator`	Convert raw counts to indicators? Default: FALSE.
`maf_range`	Range of minor allele frequencies: c(MIN, MAX).
`method`	Genotype aggregation method. Default: "none".
`n`	Sample size (number of subjects). Default: 100.
`prop_anno`	Proportions of annotations in each category. Length should equal the number of annotation categories. Default of c(0.5, 0.4, 0.1) is based on the approximate empirical frequencies of BMVs, DMVs, and PTVs.
`prop_causal`	Proportion of variants that are causal. Default: 1.0.
`random_signs`	Randomize signs? FALSE for burden-type genetic architecture, TRUE for SKAT-type.
`random_var`	Frailty variance in the case of random signs. Default: 0.
`snps`	Number of SNPs in the gene. Default: 100.
`weights`	Annotation category weights. Length should match `prop_anno`.
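When method != "none", `beta` is a scalar effect on an allelic series burden score, and `weights` and `indicator` shape that score. The sketch below assumes one plausible form: per-subject allele counts summed within each annotation category, optionally converted to carrier indicators, then combined as a weighted sum across categories. The helper `burden_score`, this exact formula, and the phenotype line are assumptions for illustration; the accepted `method` values and the package's aggregation code are not documented on this page.

```r
# Hypothetical allelic-series burden score; NOT AllelicSeries internals.
# anno: categories coded 0, ..., L - 1; geno: subjects x snps allele counts.
burden_score <- function(anno, geno, weights, indicator = FALSE) {
  L <- length(weights)
  # Total minor-allele count per subject in each category (subjects x L).
  counts <- vapply(
    0:(L - 1),
    function(l) rowSums(geno[, anno == l, drop = FALSE]),
    numeric(nrow(geno))
  )
  counts <- matrix(counts, nrow = nrow(geno))  # keep matrix shape if n = 1
  # Optionally replace each count by an indicator of carrying any variant.
  if (indicator) counts <- (counts > 0) * 1
  # Weighted sum across categories gives one score per subject.
  as.numeric(counts %*% weights)
}

# Toy inputs: 3 subjects, 4 variants in categories 0, 1, 1, 2.
anno <- c(0, 1, 1, 2)
geno <- rbind(c(1, 0, 0, 0),
              c(0, 1, 1, 0),
              c(0, 0, 0, 1))
w <- c(1, 2, 3)
burden_score(anno, geno, w)                    # 1 4 3
burden_score(anno, geno, w, indicator = TRUE)  # 1 2 3

# Assumed phenotype model when method != "none": scalar beta times score.
set.seed(2)
pheno <- 0.5 * burden_score(anno, geno, w) + stats::rnorm(nrow(geno))
```

Subject 2 carries two category-1 variants, so its raw score is 2 x 2 = 4, while the indicator version counts that category only once (1 x 2 = 2).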

Value

List with components anno (annotations), covar (covariates), geno (genotypes), pheno (phenotypes), and type (phenotype type), as described above.

Examples

# Generate data.
data <- DGP(n = 100)

# View components.
table(data$anno)
head(data$covar)
head(data$geno[, 1:5])
hist(data$pheno)

# Generate data with L != 3 categories.
data <- DGP(
  beta = c(1, 2, 3, 4),
  prop_anno = c(0.25, 0.25, 0.25, 0.25),
  weights = c(1, 1, 1, 1)
)

AllelicSeries documentation built on April 3, 2025, 7:46 p.m.