View source: R/generate_data.R
DGP | R Documentation |
Generate a data set consisting of:
anno: (snps x 1) annotation vector.
covar: (subjects x 6) covariate matrix.
geno: (subjects x snps) genotype matrix.
pheno: (subjects x 1) phenotype vector.
type: Either "binary" or "quantitative".
DGP(
  anno = NULL,
  beta = c(1, 2, 3),
  binary = FALSE,
  geno = NULL,
  include_residual = TRUE,
  indicator = FALSE,
  maf_range = c(0.001, 0.005),
  method = "none",
  n = 100,
  prop_anno = c(0.5, 0.4, 0.1),
  prop_causal = 1,
  random_signs = FALSE,
  random_var = 0,
  snps = 100,
  weights = c(1, 1, 1)
)
anno: Annotation vector, if providing genotypes. Its length should match the number of columns in geno.
beta: If method = "none", an (L x 1) coefficient vector of effect sizes, one per annotation category. By default, there are L = 3 annotation categories corresponding to BMVs, DMVs, and PTVs. If method != "none", a scalar effect size for the allelic series burden score.
binary: Generate a binary phenotype? Default: FALSE.
geno: Genotype matrix, if providing genotypes.
include_residual: Include the residual? If FALSE, returns the expected value. Intended for testing.
indicator: Convert raw counts to indicators? Default: FALSE.
maf_range: Range of minor allele frequencies: c(MIN, MAX).
method: Genotype aggregation method. Default: "none".
n: Sample size.
prop_anno: Proportions of annotations in each category. Length should equal the number of annotation categories. The default of c(0.5, 0.4, 0.1) is based on the approximate empirical frequencies of BMVs, DMVs, and PTVs.
prop_causal: Proportion of variants that are causal. Default: 1.0.
random_signs: Randomize signs? FALSE for a burden-type genetic architecture, TRUE for a SKAT-type architecture.
random_var: Frailty variance in the case of random signs. Default: 0.
snps: Number of SNPs in the gene. Default: 100.
weights: Annotation category weights. Length should match the number of annotation categories.
List containing: genotypes, annotations, covariates, phenotypes.
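To make the roles of maf_range, prop_anno, and indicator more concrete, the following base R sketch builds a genotype matrix and annotation vector of the shapes described above. It is an illustration under stated assumptions, not the implementation used by DGP: the uniform draw of allele frequencies, the Hardy-Weinberg sampling of genotypes, and the 1..L coding of annotation labels are all assumptions made for this example.

```r
# Hypothetical sketch in base R; this is NOT the code behind DGP().
set.seed(101)
n <- 100                        # Subjects.
snps <- 100                     # Variants in the gene.
maf_range <- c(0.001, 0.005)    # c(MIN, MAX), as in the argument list.
prop_anno <- c(0.5, 0.4, 0.1)   # One proportion per annotation category.

# Assumption: one minor allele frequency per variant, uniform on maf_range.
maf <- runif(snps, min = maf_range[1], max = maf_range[2])

# Assumption: minor allele counts in {0, 1, 2} under Hardy-Weinberg equilibrium.
# Each call returns a length-n column, so the result is (subjects x snps).
geno <- sapply(maf, function(p) rbinom(n, size = 2, prob = p))

# Assumption: categories are coded 1..L and sampled with probabilities prop_anno.
anno <- sample(seq_along(prop_anno), size = snps, replace = TRUE, prob = prop_anno)

# What indicator = TRUE plausibly corresponds to: carrier status instead of counts.
geno_ind <- 1 * (geno > 0)

dim(geno)
table(anno)
```

With allele frequencies this low, most columns of geno are entirely zero at n = 100, which is expected for rare variants.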
# Generate data.
data <- DGP(n = 100)
# View components.
table(data$anno)
head(data$covar)
head(data$geno[, 1:5])
hist(data$pheno)
# Generate data with L != 3 categories.
data <- DGP(
  beta = c(1, 2, 3, 4),
  prop_anno = c(0.25, 0.25, 0.25, 0.25),
  weights = c(1, 1, 1, 1)
)
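The distinction between a per-category beta (method = "none") and a scalar beta applied to a burden score (method != "none") can be sketched in base R as follows. This is a rough illustration only: the weighted-sum form of the burden score, the value of beta_scalar, the standard normal residual, and the inflated allele frequency are assumptions for the example, and the package's own aggregation methods may differ.

```r
# Hypothetical sketch in base R; this is NOT the code behind DGP().
set.seed(202)
n <- 200
snps <- 30

# Toy genotypes; the allele frequency is inflated so that counts are not all zero.
geno <- matrix(rbinom(n * snps, size = 2, prob = 0.05), nrow = n, ncol = snps)
anno <- rep(1:3, length.out = snps)   # Assumed 1..L category labels.

beta <- c(1, 2, 3)       # One effect size per category (method = "none").
weights <- c(1, 1, 1)    # One weight per category.

# Per-category minor allele counts: an (n x L) matrix.
counts <- sapply(
  seq_along(beta),
  function(l) rowSums(geno[, anno == l, drop = FALSE])
)

# No aggregation: each category contributes its own effect size.
eta_none <- as.numeric(counts %*% beta)

# Assumed weighted-sum burden score with a scalar effect size.
beta_scalar <- 0.5       # Hypothetical value, chosen for illustration.
score <- as.numeric(counts %*% weights)
eta_burden <- beta_scalar * score

# Quantitative phenotype: expected value plus a residual. Dropping the
# residual mirrors the idea behind include_residual = FALSE.
pheno <- eta_none + rnorm(n)
```

With unit weights, the burden score reduces to each subject's total minor allele count across the gene.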