data.gen: Data Generation


Source: R/data.gen.R

Description

Generates a genotype data matrix G (sample_size by p), a vector of environmental measurements E, and an outcome vector Y of length sample_size. Simulates training, validation, and test datasets.
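To make the description concrete, the base R sketch below simulates data of the same shape under an assumed model: binary genotypes with prevalence pG, a binary exposure with prevalence pE, and a linear predictor with intercept, main-effect, environment, and GxE terms, using Gaussian noise or a logistic link depending on family. The function simulate_gxe and all coefficient values are hypothetical illustrations, not the code in R/data.gen.R.

```r
# Hypothetical sketch of a G x E simulation; NOT gesso's implementation.
# Assumptions: binary G and E, fixed illustrative coefficients,
# N(0, 1) noise for "gaussian" and a logistic link for "binomial".
simulate_gxe <- function(n, p, n_g_non_zero = 15, n_gxe_non_zero = 10,
                         pG = 0.2, pE = 0.3,
                         family = c("gaussian", "binomial")) {
  family <- match.arg(family)
  stopifnot(n_g_non_zero <= p, n_gxe_non_zero <= p)
  G <- matrix(rbinom(n * p, 1, pG), nrow = n, ncol = p)  # n x p genotype matrix
  E <- rbinom(n, 1, pE)                                  # environmental exposure
  GxE <- G * E            # each column of G multiplied element-wise by E
  beta_0 <- 0
  beta_E <- 1
  beta_G <- numeric(p)
  beta_G[seq_len(n_g_non_zero)] <- 1        # non-zero main effects
  beta_GxE <- numeric(p)
  beta_GxE[seq_len(n_gxe_non_zero)] <- 0.5  # non-zero interactions
  eta <- as.vector(beta_0 + G %*% beta_G + beta_E * E + GxE %*% beta_GxE)
  Y <- if (family == "gaussian") eta + rnorm(n) else rbinom(n, 1, plogis(eta))
  list(G = G, E = E, GxE = GxE, Y = Y)
}

set.seed(1)
train <- simulate_gxe(100, 20)
valid <- simulate_gxe(100, 20)
test  <- simulate_gxe(100, 20)
```

Calling the generator once per split mirrors the separate training, validation, and test sets that data.gen returns.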

Usage

data.gen(sample_size = 100, p = 20, n_g_non_zero = 15, n_gxe_non_zero = 10, 
         family = "gaussian", mode = "strong_hierarchical", 
         normalize = FALSE, normalize_response = FALSE, 
         seed = 1, pG = 0.2, pE = 0.3,
         n_confounders = NULL)

Arguments

sample_size

sample size of the data

p

total number of main effects

n_g_non_zero

number of non-zero main effects to generate

n_gxe_non_zero

number of non-zero interaction effects to generate

family

"gaussian" for a continuous outcome Y, or "binomial" for a binary 0/1 outcome Y

mode

either "strong_hierarchical", "hierarchical", or "anti_hierarchical". In the strong hierarchical mode the hierarchical structure is maintained (if beta_g = 0 then beta_gxe = 0) and, in addition, |beta_g| >= |beta_gxe|. In the hierarchical mode the hierarchical structure is maintained, but |beta_g| < |beta_gxe|. In the anti-hierarchical mode the hierarchical structure is violated (if beta_g = 0 then beta_gxe != 0).

normalize

if TRUE, normalize the matrix G and the vector E

normalize_response

if TRUE, normalize the outcome vector Y

pG

genotype prevalence, a value between 0 and 1

pE

environmental prevalence, a value between 0 and 1

seed

random seed

n_confounders

number of confounders to generate, either NULL or >1
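The three mode options can be illustrated with a small, hypothetical helper, make_coefficients, that places non-zero interactions either on top of non-zero main effects (the two hierarchical modes) or only where the main effect is zero (anti_hierarchical). The placement scheme, the uniform [1, 2] magnitudes, and the 0.5 and 2 ratios are assumptions chosen to satisfy the constraints stated for mode; gesso's own generator may differ.

```r
# Hypothetical helper illustrating the three `mode` options; not gesso's implementation.
# Non-zero main effects occupy the first n_g_non_zero positions, magnitudes in [1, 2].
make_coefficients <- function(p, n_g_non_zero, n_gxe_non_zero,
                              mode = c("strong_hierarchical", "hierarchical",
                                       "anti_hierarchical")) {
  mode <- match.arg(mode)
  beta_G <- numeric(p)
  beta_GxE <- numeric(p)
  beta_G[seq_len(n_g_non_zero)] <- runif(n_g_non_zero, 1, 2)
  if (mode == "anti_hierarchical") {
    # Interactions only where the main effect is zero: needs enough zero main effects.
    stopifnot(n_gxe_non_zero <= p - n_g_non_zero)
    idx <- n_g_non_zero + seq_len(n_gxe_non_zero)
    beta_GxE[idx] <- runif(n_gxe_non_zero, 1, 2)
  } else {
    # Interactions only where the main effect is non-zero, so the hierarchy holds.
    stopifnot(n_gxe_non_zero <= n_g_non_zero)
    idx <- seq_len(n_gxe_non_zero)
    # Strong: |beta_gxe| = 0.5 |beta_g|; plain hierarchical: |beta_gxe| = 2 |beta_g|.
    ratio <- if (mode == "strong_hierarchical") 0.5 else 2
    beta_GxE[idx] <- ratio * beta_G[idx]
  }
  list(beta_G = beta_G, beta_GxE = beta_GxE)
}

set.seed(1)
coefs <- make_coefficients(20, 15, 10, "strong_hierarchical")
```

Because this sketch puts anti-hierarchical interactions only on zero main effects, it requires n_gxe_non_zero <= p - n_g_non_zero and stops for the defaults p = 20, n_g_non_zero = 15, n_gxe_non_zero = 10; this page does not say how data.gen itself handles that case.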

Value

A list of simulated datasets and generating coefficients

G_train, G_valid, G_test

generated genotype matrices

E_train, E_valid, E_test

generated vectors of environmental values

Y_train, Y_valid, Y_test

generated outcome vectors

C_train, C_valid, C_test

generated confounder matrices

GxE_train, GxE_valid, GxE_test

generated GxE interaction matrices

Beta_G

main effect coefficients vector

Beta_GxE

interaction coefficients vector

beta_0

intercept coefficient value

beta_E

environment coefficient value

Beta_C

confounder coefficient values

index_beta_non_zero, index_beta_gxe_non_zero, index_beta_zero, index_beta_gxe_zero

internal data-generation variables: indices of the non-zero and zero main-effect and interaction coefficients

n_g_non_zero

number of non-zero main effects generated

n_gxe_non_zero

number of non-zero interactions generated

n_total_non_zero

total number of non-zero variables

SNR_g

signal-to-noise ratio for the main effects

SNR_gxe

signal-to-noise ratio for the interactions

family, p, sample_size, mode, seed

input simulation parameters
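One common way to define values such as SNR_g and SNR_gxe is the variance of the corresponding signal component divided by the noise variance. The base R sketch below computes both under that assumed definition with hypothetical coefficient values, and shows the inverse step a simulator can use to choose the noise standard deviation for a target SNR. This is not necessarily how data.gen computes them.

```r
# Hypothetical sketch: SNR = var(signal component) / var(noise).
# The definition and all values here are assumptions, not gesso's code.
set.seed(1)
n <- 1000
p <- 20
G <- matrix(rbinom(n * p, 1, 0.2), nrow = n)  # binary genotypes
E <- rbinom(n, 1, 0.3)                        # binary exposure
beta_G <- c(rep(0.5, 15), rep(0, p - 15))
beta_GxE <- c(rep(0.3, 10), rep(0, p - 10))
noise <- rnorm(n)

snr <- function(signal, noise) var(as.vector(signal)) / var(noise)
SNR_g <- snr(G %*% beta_G, noise)            # main-effect signal
SNR_gxe <- snr((G * E) %*% beta_GxE, noise)  # interaction signal

# Inverse: noise standard deviation that gives a target main-effect SNR.
target <- 2
sd_noise <- sqrt(var(as.vector(G %*% beta_G)) / target)
```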

Examples

library(gesso)
data = data.gen(sample_size=100, p=100)
G = data$G_train; GxE = data$GxE_train
E = data$E_train; Y = data$Y_train

gesso documentation built on Nov. 30, 2021, 9:09 a.m.