generate_phenodata: Functions to generate phenotype data.

Description

Functions to generate standard normal or binary phenotypes based on provided genetic data, for specified effect sizes. The functions generate_phenodata_1_simple and generate_phenodata_1 generate one phenotype Y conditional on single nucleotide variants (SNVs) and two covariates. The functions generate_phenodata_2_bvn and generate_phenodata_2_copula generate two phenotypes Y1, Y2, whose dependence is measured by Kendall's tau, conditional on the provided SNVs and two covariates.

Usage

generate_phenodata_1_simple(genodata = NULL, type = "quantitative",
  b = 0, a = c(0, 0.5, 0.5))

generate_phenodata_1(genodata = NULL, type = "quantitative", b = 0.6,
  a = c(0, 0.5, 0.5), MAF_cutoff = 1, prop_causal = 0.1,
  direction = "a")

generate_phenodata_2_bvn(genodata = NULL, tau = NULL, b1 = 0,
  b2 = 0, a1 = c(0, 0.5, 0.5), a2 = c(0, 0.5, 0.5))

generate_phenodata_2_copula(genodata = NULL, phi = NULL, tau = 0.5,
  b1 = 0.6, b2 = 0.6, a1 = c(0, 0.5, 0.5), a2 = c(0, 0.5, 0.5),
  MAF_cutoff = 1, prop_causal = 0.1, direction = "a")

Arguments

genodata

Numeric input vector or dataframe containing the genetic variant(s) in columns. Must be in allelic coding 0, 1, 2.

type

String with value "quantitative" or "binary" specifying whether normally-distributed or binary phenotypes are generated.

b

Numeric value or vector specifying the genetic effect size(s) of the provided SNVs (genodata) in the data generation.

a

Numeric vector specifying the effect sizes of the covariates X1, X2 in the data generation.

MAF_cutoff

Numeric value specifying a minor allele frequency cutoff that determines the set of SNVs from which the causal SNVs are sampled for the phenotype generation.

prop_causal

Numeric value specifying the desired proportion of causal SNVs among all SNVs (for example, 0.1 for 10%).

direction

String with value "a", "b", or "c" specifying whether all causal SNVs have a positive effect on the phenotypes ("a"), 20% of the causal SNVs have a negative effect and 80% a positive effect on the phenotypes ("b"), or 50% of the causal SNVs have a negative effect and 50% a positive effect on the phenotypes ("c").

tau

Numeric value specifying Kendall's tau, which determines the dependence between the two generated phenotypes.

b1

Numeric value or vector specifying the genetic effect size(s) of the provided SNVs (genodata) on the first phenotype in the data generation.

b2

Numeric value or vector specifying the genetic effect size(s) of the provided SNVs (genodata) on the second phenotype in the data generation.

a1

Numeric vector specifying the effect sizes of the covariates X1, X2 on the first phenotype in the data generation.

a2

Numeric vector specifying the effect sizes of the covariates X1, X2 on the second phenotype in the data generation.

phi

Numeric value specifying the parameter φ for the dependence between the two generated phenotypes.

Details

In more detail, the function generate_phenodata_1_simple generates a quantitative or binary phenotype Y with n observations, conditional on the specified SNVs with given effect sizes and conditional on one binary and one standard normally-distributed covariate with specified effect sizes. n is given through the provided SNVs.
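The data-generating model described above can be illustrated with a minimal sketch. This is not the package's implementation: the function name sketch_phenodata_simple is hypothetical, and the sketch assumes that a holds an intercept followed by the effects of X1 and X2, that X1 is the binary covariate (with probability 0.5) and X2 the standard normal one, and that binary phenotypes come from a logistic model.

```r
# Hypothetical sketch, NOT the CJAMP implementation.
# Assumptions: a = c(intercept, effect of X1, effect of X2);
# X1 is binary, X2 is standard normal; binary Y uses a logistic model.
sketch_phenodata_simple <- function(genodata, type = "quantitative",
                                    b = 0, a = c(0, 0.5, 0.5)) {
  G <- as.matrix(genodata)               # n x p matrix of 0/1/2 genotypes
  n <- nrow(G)                           # n is given by the provided SNVs
  b <- rep(b, length.out = ncol(G))      # recycle a single effect size
  X1 <- rbinom(n, size = 1, prob = 0.5)  # binary covariate
  X2 <- rnorm(n)                         # standard normal covariate
  # Linear predictor: covariate effects plus genetic effects
  eta <- a[1] + a[2] * X1 + a[3] * X2 + as.vector(G %*% b)
  if (type == "quantitative") {
    Y <- eta + rnorm(n)                  # add standard normal noise
  } else {
    Y <- rbinom(n, size = 1, prob = plogis(eta))
  }
  data.frame(Y = Y, X1 = X1, X2 = X2)
}

set.seed(1)
G <- matrix(rbinom(200 * 3, size = 2, prob = 0.2), ncol = 3)
pheno_q <- sketch_phenodata_simple(G, type = "quantitative", b = 0.5)
pheno_b <- sketch_phenodata_simple(G[, 1], type = "binary", b = 0)
```

The sketch returns a data frame with columns Y, X1, X2 and one row per individual, mirroring the structure described under Value.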

generate_phenodata_1 extends generate_phenodata_1_simple and additionally allows the user to select the percentage of causal SNVs, a minor allele frequency cutoff on the causal SNVs, and varying effect directions. n is given through the provided SNVs.
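One way to picture the selection of causal SNVs and effect directions is the following sketch. It is an illustration under stated assumptions, not the package's code: the helper sketch_select_causal is hypothetical, SNVs are assumed eligible when their MAF lies below MAF_cutoff, prop_causal is treated as a proportion of all SNVs, and the negative effects for directions "b" and "c" are assigned to a random subset of the causal SNVs.

```r
# Hypothetical sketch, NOT the CJAMP implementation.
# Assumptions: eligible SNVs have MAF < MAF_cutoff; prop_causal is a
# proportion of all SNVs; negative signs go to a random causal subset.
sketch_select_causal <- function(genodata, MAF_cutoff = 1,
                                 prop_causal = 0.1, direction = "a") {
  G <- as.matrix(genodata)
  freq <- colMeans(G) / 2                 # frequency of the counted allele
  maf <- pmin(freq, 1 - freq)             # minor allele frequency
  eligible <- which(maf < MAF_cutoff)
  n_causal <- min(max(1, round(prop_causal * ncol(G))), length(eligible))
  causal <- eligible[sample.int(length(eligible), n_causal)]
  # Share of causal SNVs with a negative effect for each direction
  neg_share <- switch(direction, a = 0, b = 0.2, c = 0.5)
  signs <- rep(1, n_causal)
  signs[sample.int(n_causal, round(neg_share * n_causal))] <- -1
  list(causal = causal, sign = signs)
}

set.seed(3)
G <- matrix(rbinom(500 * 20, size = 2, prob = 0.1), ncol = 20)
sel <- sketch_select_causal(G, MAF_cutoff = 1, prop_causal = 0.5,
                            direction = "b")
```

With 20 SNVs, prop_causal = 0.5 and direction = "b", the sketch picks 10 causal SNVs, 2 of which receive a negative sign.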

The function generate_phenodata_2_bvn generates two quantitative phenotypes Y1, Y2 conditional on one binary and one standard normally-distributed covariate X1, X2 from the bivariate normal distribution so that they have dependence τ given by Kendall's tau.
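For a bivariate normal distribution, Kendall's tau and the Pearson correlation ρ are linked by ρ = sin(π τ / 2). The sketch below uses this standard relation to draw a pair of correlated standard normal error terms with a target tau. The function name sketch_bvn_errors is hypothetical, and whether the package constructs its phenotypes in exactly this way is an assumption.

```r
# Hypothetical sketch, NOT the CJAMP implementation.
# Uses the bivariate normal relation rho = sin(pi * tau / 2).
sketch_bvn_errors <- function(n, tau) {
  rho <- sin(pi * tau / 2)                # Pearson correlation implied by tau
  z1 <- rnorm(n)
  z2 <- rnorm(n)
  e1 <- z1
  e2 <- rho * z1 + sqrt(1 - rho^2) * z2   # correlated with e1 at level rho
  data.frame(e1 = e1, e2 = e2)
}

set.seed(2)
e <- sketch_bvn_errors(2000, tau = 0.5)
tau_hat <- cor(e$e1, e$e2, method = "kendall")  # empirical Kendall's tau
```

Adding such error terms to two linear predictors (covariate plus genetic effects) would give two phenotypes whose residual dependence matches the requested tau up to sampling error.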

The function generate_phenodata_2_copula generates two quantitative phenotypes Y1, Y2 conditional on one binary and one standard normally-distributed covariate X1, X2 from the Clayton copula, using the function generate_clayton_copula, so that Y1, Y2 are marginally normally distributed and their dependence, measured by Kendall's tau, is specified by tau or phi.
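For the Clayton copula, Kendall's tau and the copula parameter are related by τ = φ / (φ + 2), so φ = 2τ / (1 − τ). A minimal sketch of sampling from the copula by conditional inversion and mapping to standard normal margins follows. The name sketch_clayton_normal is hypothetical; the sketch does not call generate_clayton_copula, and how that function is implemented internally is not assumed here.

```r
# Hypothetical sketch, NOT the CJAMP implementation.
# Clayton copula: tau = phi / (phi + 2), sampled by conditional inversion.
sketch_clayton_normal <- function(n, tau = NULL, phi = NULL) {
  if (is.null(phi)) phi <- 2 * tau / (1 - tau)  # convert tau to phi
  stopifnot(phi > 0)
  u1 <- runif(n)
  w <- runif(n)
  # Invert the conditional distribution of U2 given U1 = u1
  u2 <- (u1^(-phi) * (w^(-phi / (1 + phi)) - 1) + 1)^(-1 / phi)
  # Standard normal margins via the quantile transform
  data.frame(e1 = qnorm(u1), e2 = qnorm(u2))
}

set.seed(4)
cc <- sketch_clayton_normal(2000, tau = 0.5)
tau_cc <- cor(cc$e1, cc$e2, method = "kendall")  # empirical Kendall's tau
```

Because Kendall's tau is invariant under monotone transformations, applying qnorm to the uniform draws leaves the dependence unchanged while making each margin standard normal.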

The genetic effect sizes are the specified numeric values b and b1, b2, respectively, in the functions generate_phenodata_1_simple and generate_phenodata_2_bvn. In generate_phenodata_1 and generate_phenodata_2_copula, the genetic effect sizes are computed by multiplying b or b1, b2, respectively, with the absolute value of the log10-transformed minor allele frequencies, so that rarer variants have larger effect sizes.
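The MAF-based weighting described above amounts to a one-line computation. The sketch below is illustrative only; the helper name sketch_maf_weighted_effects is hypothetical, and the MAF values are made up for the example.

```r
# Hypothetical helper illustrating the weighting b * |log10(MAF)|.
sketch_maf_weighted_effects <- function(b, maf) {
  b * abs(log10(maf))
}

maf <- c(0.001, 0.01, 0.1)            # illustrative minor allele frequencies
beta <- sketch_maf_weighted_effects(0.6, maf)
beta
# 1.8 1.2 0.6
```

With b = 0.6, a variant with MAF 0.001 receives effect size 1.8, while a variant with MAF 0.1 receives 0.6, so rarer variants have larger effects.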

Value

A dataframe containing n observations of the phenotype Y or phenotypes Y1, Y2 and of the covariates X1, X2.

Examples

# Generate genetic data:
set.seed(10)
genodata <- generate_genodata(n_SNV = 20, n_ind = 1000)
compute_MAF(genodata)

# Generate different phenotype data:
phenodata1 <- generate_phenodata_1_simple(genodata = genodata[,1],
                                          type = "quantitative", b = 0)
phenodata2 <- generate_phenodata_1_simple(genodata = genodata[,1],
                                          type = "quantitative", b = 2)
phenodata3 <- generate_phenodata_1_simple(genodata = genodata,
                                          type = "quantitative", b = 2)
phenodata4 <- generate_phenodata_1_simple(genodata = genodata,
                                          type = "quantitative",
                                          b = seq(0.1, 2, 0.1))
phenodata5 <- generate_phenodata_1_simple(genodata = genodata[,1],
                                          type = "binary", b = 0)
phenodata6 <- generate_phenodata_1(genodata = genodata[,1],
                                   type = "quantitative", b = 0,
                                   MAF_cutoff = 1, prop_causal = 0.1,
                                   direction = "a")
phenodata7 <- generate_phenodata_1(genodata = genodata,
                                   type = "quantitative", b = 0.6,
                                   MAF_cutoff = 0.1, prop_causal = 0.05,
                                   direction = "a")
phenodata8 <- generate_phenodata_1(genodata = genodata,
                                   type = "quantitative",
                                   b = seq(0.1, 2, 0.1),
                                   MAF_cutoff = 0.1, prop_causal = 0.05,
                                   direction = "a")
phenodata9 <- generate_phenodata_2_bvn(genodata = genodata[,1],
                                       tau = 0.5, b1 = 0, b2 = 0)
phenodata10 <- generate_phenodata_2_bvn(genodata = genodata,
                                        tau = 0.5, b1 = 0, b2 = 0)
phenodata11 <- generate_phenodata_2_bvn(genodata = genodata,
                                        tau = 0.5, b1 = 1,
                                        b2 = seq(0.1,2,0.1))
phenodata12 <- generate_phenodata_2_bvn(genodata = genodata,
                                        tau = 0.5, b1 = 1, b2 = 2)
par(mfrow = c(3, 1))
hist(phenodata12$Y1)
hist(phenodata12$Y2)
plot(phenodata12$Y1, phenodata12$Y2)

phenodata13 <- generate_phenodata_2_copula(genodata = genodata[,1],
                                           MAF_cutoff = 1, prop_causal = 1,
                                           tau = 0.5, b1 = 0, b2 = 0)
phenodata14 <- generate_phenodata_2_copula(genodata = genodata,
                                           MAF_cutoff = 1, prop_causal = 0.5,
                                           tau = 0.5, b1 = 0, b2 = 0)
phenodata15 <- generate_phenodata_2_copula(genodata = genodata,
                                           MAF_cutoff = 1, prop_causal = 0.5,
                                           tau = 0.5, b1 = 0, b2 = 0)
phenodata16 <- generate_phenodata_2_copula(genodata = genodata,
                                           MAF_cutoff = 1, prop_causal = 0.5,
                                           tau = 0.2, b1 = 0.3,
                                           b2 = seq(0.1, 2, 0.1))
phenodata17 <- generate_phenodata_2_copula(genodata = genodata,
                                           MAF_cutoff = 1, prop_causal = 0.5,
                                           tau = 0.2, b1 = 0.3, b2 = 0.3)
par(mfrow = c(3, 1))
hist(phenodata17$Y1)
hist(phenodata17$Y2)
plot(phenodata17$Y1, phenodata17$Y2)

CJAMP documentation built on May 1, 2019, 9:15 p.m.