View source: R/generate_dependence.R
generate_dependence | R Documentation |
This function sets the association pattern and the effect sizes between SNP
and phenotype objects previously obtained from the functions generate_snps or
replicate_real_snps, and generate_phenos or replicate_real_phenos. It
therefore adds a genetic contribution to the phenotypic data.
generate_dependence(
  list_snps,
  list_phenos,
  ind_d0,
  ind_p0,
  vec_prob_sh,
  pat = NULL,
  family = "gaussian",
  pve_per_snp = NULL,
  max_tot_pve = 0.5,
  block_phenos = FALSE,
  user_seed = NULL
)
list_snps |
An object of class "list_snps" or "sim_snps" containing
SNPs and their corresponding sample minor allele frequencies. It must be
obtained from the function |
list_phenos |
An object of class "list_phenos" or "sim_phenos"
containing phenotypic data variables, their sample variance and block
structure information. It must be obtained from the function
|
ind_d0 |
A vector of indices specifying the position of the "active"
phenotypes (i.e., which will be associated with at least one SNP). Must
range between 1 and |
ind_p0 |
A vector of indices specifying the position of the "active"
SNPs (i.e., which will be associated with at least one phenotype). Must
range between 1 and |
vec_prob_sh |
If |
pat |
Boolean matrix of size |
family |
Distribution used to generate the phenotypes. Must be either
" |
pve_per_snp |
Average proportion of phenotypic variance explained by
each active SNP (for an active phenotype). Must be |
max_tot_pve |
Maximum proportion of phenotypic variance explained by the
active SNPs across all phenotypes. Must be |
block_phenos |
Boolean for deciding whether the values in
|
user_seed |
Seed set for reproducibility. Default is |
Using the argument vec_prob_sh, the user can provide a set of probabilities
describing the propensity with which a given active SNP (i.e., associated
with at least one phenotype) will be associated with active phenotypes (i.e.,
associated with at least one SNP). If block_phenos is FALSE (default), the
association pattern is created independently of any structure in the
phenotype matrix. If block_phenos is TRUE and the phenotypes have been
generated with some block correlation structure, this block structure is used
to specify the association pattern. If block_phenos is TRUE but the
phenotypes were generated independently of one another, blocks are defined
artificially and the number of blocks corresponds to the length of the vector
vec_prob_sh. More precisely, for each active SNP and each phenotypic block, a
value from this vector is selected uniformly at random; for instance, a large
probability implies that the SNP is highly likely to be associated with each
active phenotype in the block. If a single value is provided, all active SNPs
will have the same probability of being associated with active phenotypes of
all blocks.
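The block-wise sampling described above can be sketched in base R as follows. This is only an illustration of the idea, not the package's internal code: the names block_id, prob_sh, prob_pheno and pat_sketch are hypothetical, and the sketch does not enforce that every active SNP ends up with at least one association.

```r
# Illustrative sketch only; not the implementation used by generate_dependence.
set.seed(1)

p0 <- 5                           # number of active SNPs (hypothetical)
block_id <- rep(1:3, each = 4)    # hypothetical block label of 12 active phenotypes
vec_prob_sh <- c(0.05, 0.5, 0.9)  # candidate association probabilities

n_blocks <- length(unique(block_id))

# For each active SNP (row) and each block (column), select one value of
# vec_prob_sh uniformly at random. Sampling indices avoids the special
# behaviour of sample() on length-one numeric vectors.
idx <- sample.int(length(vec_prob_sh), p0 * n_blocks, replace = TRUE)
prob_sh <- matrix(vec_prob_sh[idx], nrow = p0, ncol = n_blocks)

# Expand the block-level probabilities to one column per active phenotype,
# then draw independent Bernoulli indicators to get a Boolean pattern.
prob_pheno <- prob_sh[, block_id, drop = FALSE]
pat_sketch <- matrix(runif(length(prob_pheno)) < prob_pheno,
                     nrow = p0, ncol = length(block_id))
```

With a single value in vec_prob_sh, every entry of prob_sh is identical, which matches the case where all active SNPs share the same association probability across blocks.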
The user can provide either argument pve_per_snp, specifying the average
proportion of phenotypic variance explained per active SNP for a given active
phenotype, or max_tot_pve, specifying the maximum proportion of variance of
an active phenotype explained by the cumulative genetic effects. If both
pve_per_snp and max_tot_pve are NULL, the proportion of phenotypic variance
explained per SNP is set to its maximum value such that the total proportion
of variance explained stays below 1 for every phenotype. Individual
proportions of variance explained are drawn from a Beta distribution with
shape parameters 2 and 5, putting more weight on smaller effects.
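As a rough illustration of how proportions of variance explained relate to effect sizes, the base R sketch below draws proportions from a Beta(2, 5) distribution for a single phenotype and converts them to regression coefficients. The rescaling to the cap, the assumption of uncorrelated SNPs, and all variable names (maf, sigma2_e, pve_raw, beta) are assumptions made for this sketch; the package may proceed differently.

```r
# Illustrative sketch only; the rescaling and effect-size formula are assumptions.
set.seed(1)

n <- 1000
maf <- c(0.1, 0.25, 0.4)                       # hypothetical minor allele frequencies
X <- sapply(maf, function(f) rbinom(n, 2, f))  # SNPs coded as 0/1/2 allele counts
max_tot_pve <- 0.5                             # cap on the total variance explained
sigma2_e <- 1                                  # residual (non-genetic) variance

# Raw proportions from Beta(2, 5), which favours small effects.
pve_raw <- rbeta(length(maf), shape1 = 2, shape2 = 5)

# One possible choice: rescale so that the total equals the cap.
pve <- pve_raw / sum(pve_raw) * max_tot_pve

# For uncorrelated SNPs, pve_j = beta_j^2 * var(x_j) / var(y) and
# var(y) = sigma2_e / (1 - sum(pve)), which gives the effect sizes below.
var_x <- apply(X, 2, var)
beta <- sqrt(pve * sigma2_e / ((1 - sum(pve)) * var_x))

# Phenotype with a genetic contribution added to Gaussian noise.
y <- drop(X %*% beta) + rnorm(n, sd = sqrt(sigma2_e))
```

Under these assumptions, the ratio sum(beta^2 * var_x) / (sum(beta^2 * var_x) + sigma2_e) recovers the total proportion of variance explained.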
If family is "binomial", the phenotypes are generated from a probit
model, and the phenotypic variance explained by the SNPs is with respect to
the latent Gaussian variables involved in the probit model.
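A minimal base R sketch of this latent-variable view is given below, for one SNP and one binary phenotype. It assumes a unit-variance Gaussian error and a threshold at zero, so that the probability of observing 1 given x is pnorm(beta * x); the threshold and the names z and pve_latent are choices made for the sketch, not documented behaviour of the package.

```r
# Illustrative sketch only; threshold and error variance are assumptions.
set.seed(1)

n <- 2000
x <- rbinom(n, 2, 0.3)  # one SNP coded as 0/1/2 allele counts
pve <- 0.2              # target proportion of latent variance explained

# Effect size giving the target proportion when the latent error variance is 1.
beta <- sqrt(pve / ((1 - pve) * var(x)))

z <- beta * x + rnorm(n)  # latent Gaussian variable
y <- as.integer(z > 0)    # observed binary phenotype

# The proportion of variance explained is measured on z, not on y.
pve_latent <- beta^2 * var(x) / var(z)
```

Here pve_latent is close to the target value up to sampling noise, whereas the same ratio computed on the binary y would generally differ.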
An object of class "sim_data".
phenos |
Matrix containing the updated phenotypic data (whose variance is now partly explained by genetic effects). |
snps |
Matrix containing the original SNPs data. |
beta |
Matrix containing the generated effect sizes between the SNPs (rows) and phenotypes (columns). |
pat |
Matrix of booleans specifying the generated association pattern between the SNPs (rows) and phenotypes (columns). |
pve_per_snp |
Average proportion of phenotypic variance explained by each active SNP (for an active phenotype). |
convert_snps, generate_snps, replicate_real_snps, convert_phenos,
generate_phenos, replicate_real_phenos
user_seed <- 123; set.seed(user_seed)

# n samples, p SNPs (p0 active), d phenotypes (d0 active)
n <- 500; p <- 5000; p0 <- 200; d <- 500; d0 <- 400

list_snps <- generate_snps(n = n, p = p)

# Phenotypes with equicorrelated blocks
cor_type <- "equicorrelated"; vec_rho <- runif(100, min = 0.25, max = 0.95)
list_phenos <- generate_phenos(n, d, cor_type = cor_type, vec_rho = vec_rho,
                               n_cpus = 1)

# Gaussian phenotypes
dat_g <- generate_dependence(list_snps, list_phenos, ind_d0 = sample(1:d, d0),
                             ind_p0 = sample(1:p, p0), vec_prob_sh = 0.05,
                             family = "gaussian", max_tot_pve = 0.5)

# Binary phenotypes
dat_b <- generate_dependence(list_snps, list_phenos, ind_d0 = sample(1:d, d0),
                             ind_p0 = sample(1:p, p0), vec_prob_sh = 0.05,
                             family = "binomial", max_tot_pve = 0.5)