generate_dependence: Generate pleiotropic associations between SNPs and...

View source: R/generate_dependence.R

generate_dependence R Documentation

Generate pleiotropic associations between SNPs and phenotypes.

Description

This function sets the association pattern and the effect sizes between SNP and phenotype objects previously obtained from the functions generate_snps or replicate_real_snps, and generate_phenos or replicate_real_phenos. It therefore adds a genetic contribution to the phenotypic data.

Usage

generate_dependence(
  list_snps,
  list_phenos,
  ind_d0,
  ind_p0,
  vec_prob_sh,
  pat = NULL,
  family = "gaussian",
  pve_per_snp = NULL,
  max_tot_pve = 0.5,
  block_phenos = FALSE,
  user_seed = NULL
)

Arguments

list_snps

An object of class "list_snps" or "sim_snps" containing SNPs and their corresponding sample minor allele frequencies. It must be obtained from the function convert_snps, generate_snps or replicate_real_snps.

list_phenos

An object of class "list_phenos" or "sim_phenos" containing phenotypic data variables, their sample variance and block structure information. It must be obtained from the function convert_phenos, generate_phenos or replicate_real_phenos.

ind_d0

A vector of indices specifying the position of the "active" phenotypes (i.e., which will be associated with at least one SNP). Must range between 1 and ncol(list_phenos$phenos). Must be NULL if pat is supplied.

ind_p0

A vector of indices specifying the position of the "active" SNPs (i.e., which will be associated with at least one phenotype). Must range between 1 and ncol(list_snps$snps). Must be NULL if pat is supplied.

vec_prob_sh

If block_phenos is FALSE (default), vector of length 1 or length(ind_p0) providing the probabilities with which each active SNP will be associated with an additional active phenotype. If block_phenos is TRUE, the vector must have length between 1 and ncol(list_phenos$phenos) and gives the set of candidate probabilities with which an active SNP is associated with an additional active phenotype; the probability used is specific to each phenotypic block. Must be NULL if pat is supplied.

pat

Boolean matrix of size ncol(list_snps$snps) x ncol(list_phenos$phenos) which can be supplied to set the association pattern, instead of providing ind_d0 and ind_p0. Must be NULL if ind_d0 and ind_p0 are provided.

family

Distribution used to generate the phenotypes. Must be either "gaussian" or "binomial" (the latter for binary phenotypes).

pve_per_snp

Average proportion of phenotypic variance explained by each active SNP (for an active phenotype). Must be NULL if max_tot_pve is provided. See Details section.

max_tot_pve

Maximum proportion of phenotypic variance explained by the active SNPs across all phenotypes. Must be NULL if pve_per_snp is provided. See Details section.

block_phenos

Boolean for deciding whether the values in vec_prob_sh should be randomly selected and assigned differently to each block of phenotypes (if the phenotypes have no block-correlation structure, blocks are defined artificially and the number of blocks corresponds to the length of the vector vec_prob_sh). Default is FALSE, in which case no phenotypic block structure is used to create the association pattern. Not used if pat is supplied.

user_seed

Seed set for reproducibility. Default is NULL, no seed set.
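
As an illustration of the pat argument, the following base-R sketch builds a Boolean matrix of the documented size by hand and recovers the active SNPs and phenotypes it implies. The dimensions and the chosen SNP-phenotype pairs are toy values assumed for illustration. The commented call only indicates how such a matrix might be passed, with ind_d0, ind_p0 and vec_prob_sh set to NULL as required above; it assumes list_snps and list_phenos objects of matching dimensions.

```r
# Toy dimensions (assumed for illustration): 6 SNPs, 4 phenotypes.
p <- 6; d <- 4

# Start with no associations, then switch on selected SNP-phenotype pairs.
pat <- matrix(FALSE, nrow = p, ncol = d)
pat[2, c(1, 3)] <- TRUE   # SNP 2 is associated with phenotypes 1 and 3
pat[5, 3] <- TRUE         # SNP 5 is associated with phenotype 3

# Active SNPs and phenotypes implied by the pattern.
active_snps <- which(rowSums(pat) > 0)
active_phenos <- which(colSums(pat) > 0)

# With list_snps (6 SNPs) and list_phenos (4 phenotypes), the call might look like:
# dat <- generate_dependence(list_snps, list_phenos, ind_d0 = NULL, ind_p0 = NULL,
#                            vec_prob_sh = NULL, pat = pat)
```

Rows index SNPs and columns index phenotypes, matching the size ncol(list_snps$snps) x ncol(list_phenos$phenos) stated for pat.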

Details

Using the argument vec_prob_sh, the user can provide a selection of probabilities describing the propensity with which a given active SNP (i.e., associated with at least one phenotype) will be associated with active phenotypes (i.e., associated with at least one SNP). If block_phenos is FALSE (default), the association pattern is created independently of any structure in the phenotype matrix. If block_phenos is TRUE and the phenotypes have been generated with some block correlation structure, this block structure is used to specify the association pattern; otherwise, if the phenotypes were generated independently of one another, blocks are defined artificially and the number of blocks corresponds to the length of the vector vec_prob_sh. More precisely, for each active SNP and each phenotypic block, a value from this vector is selected uniformly at random; for instance, a large probability implies that the SNP is highly likely to be associated with each active phenotype in the block. If a single value is provided, all active SNPs will have the same probability of being associated with active phenotypes of all blocks.
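
The block-wise selection described above can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the package's implementation: the block layout, the number of active SNPs and the object name pat_active are hypothetical, and the sketch omits any step ensuring that every active SNP ends up with at least one association.

```r
set.seed(1)

p0 <- 5                        # number of active SNPs (toy value)
blocks <- list(1:3, 4:6)       # active phenotypes grouped into 2 blocks (assumed layout)
vec_prob_sh <- c(0.05, 0.9)    # candidate sharing probabilities

d0 <- length(unlist(blocks))
pat_active <- matrix(FALSE, nrow = p0, ncol = d0)

for (j in seq_len(p0)) {
  for (b in seq_along(blocks)) {
    # Pick one probability uniformly at random for this SNP-block pair.
    prob <- vec_prob_sh[sample.int(length(vec_prob_sh), 1)]
    # Associate the SNP with each phenotype of the block with that probability.
    pat_active[j, blocks[[b]]] <- runif(length(blocks[[b]])) < prob
  }
}

pat_active
```

Indexing with sample.int avoids the behaviour of sample() on a length-one numeric vector, so the same code also covers the case where vec_prob_sh holds a single value.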

The user can provide either the argument pve_per_snp, specifying the average proportion of phenotypic variance explained per active SNP for a given active phenotype, or max_tot_pve, specifying the maximum, over active phenotypes, of the proportion of variance explained by the cumulative genetic effects. If both pve_per_snp and max_tot_pve are NULL, the proportion of phenotypic variance explained per SNP is set to its maximum value such that the total proportions of variance explained for the phenotypes are all below 1. Individual proportions of variance explained are drawn from a Beta distribution with shape parameters 2 and 5, which puts more weight on smaller effects.
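
To make the role of the Beta(2, 5) draws and of max_tot_pve more concrete, here is a hedged base-R sketch. The toy pattern and the rescaling rule (dividing by the largest column total) are assumptions chosen for illustration; the package may calibrate the proportions differently.

```r
set.seed(1)

# Toy association pattern: 4 active SNPs (rows) x 3 active phenotypes (columns).
pat <- matrix(c(TRUE,  TRUE,  FALSE,
                TRUE,  FALSE, FALSE,
                FALSE, TRUE,  TRUE,
                TRUE,  TRUE,  FALSE), nrow = 4, byrow = TRUE)
max_tot_pve <- 0.5

# Raw proportions for the associated pairs, drawn from Beta(2, 5),
# which favours small values.
pve <- matrix(0, nrow = nrow(pat), ncol = ncol(pat))
pve[pat] <- rbeta(sum(pat), shape1 = 2, shape2 = 5)

# Rescale (assumed rule) so that the largest total per phenotype equals max_tot_pve.
pve <- pve * max_tot_pve / max(colSums(pve))

colSums(pve)      # total proportion of variance explained per phenotype
mean(pve[pat])    # average proportion per association, analogous to pve_per_snp
```

After rescaling, every phenotype has a total proportion of variance explained of at most max_tot_pve, and pairs outside the pattern keep a proportion of exactly zero.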

If family is "binomial", the phenotypes are generated from a probit model, and the phenotypic variance explained by the SNPs is with respect to the latent Gaussian variables involved in the probit model.
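
The latent-variable reading of the probit model can be illustrated with a small base-R sketch. The single SNP, its minor allele frequency, the effect size and the threshold at zero are all assumptions made for this example; they are not taken from the package.

```r
set.seed(1)

n <- 1000
x <- rbinom(n, size = 2, prob = 0.3)   # one SNP coded 0/1/2, MAF 0.3 (toy value)
beta <- 0.4                            # assumed effect on the latent scale

# Latent Gaussian variable: genetic contribution plus standard normal noise.
latent <- beta * x + rnorm(n)

# Observed binary phenotype, obtained by thresholding the latent variable
# (threshold 0 is an assumption of this sketch).
y <- as.integer(latent > 0)

# Proportion of variance explained, measured on the latent scale.
pve_latent <- var(beta * x) / var(latent)
pve_latent
```

The proportion is computed from the latent variable rather than from the 0/1 outcome, which is the sense in which the variance explained refers to the latent Gaussian variables.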

Value

An object of class "sim_data" containing the following components.

phenos

Matrix containing the updated phenotypic data (whose variance is now partly explained by genetic effects).

snps

Matrix containing the original SNP data.

beta

Matrix containing the generated effect sizes between the SNPs (rows) and phenotypes (columns).

pat

Matrix of booleans specifying the generated association pattern between the SNPs (rows) and phenotypes (columns).

pve_per_snp

Average proportion of phenotypic variance explained by each active SNP (for an active phenotype).

See Also

convert_snps, generate_snps, replicate_real_snps, convert_phenos, generate_phenos, replicate_real_phenos

Examples

user_seed <- 123; set.seed(user_seed)
n <- 500; p <- 5000; p0 <- 200; d <- 500; d0 <- 400

list_snps <- generate_snps(n = n, p = p)

cor_type <- "equicorrelated"; vec_rho <- runif(100, min = 0.25, max = 0.95)

list_phenos <- generate_phenos(n, d, cor_type = cor_type, vec_rho = vec_rho,
                               n_cpus = 1)

# Gaussian phenotypes
dat_g <- generate_dependence(list_snps, list_phenos, ind_d0 = sample(1:d, d0),
                             ind_p0 = sample(1:p, p0), vec_prob_sh = 0.05,
                             family = "gaussian", max_tot_pve = 0.5)

# Binary phenotypes
dat_b <- generate_dependence(list_snps, list_phenos, ind_d0 = sample(1:d, d0),
                             ind_p0 = sample(1:p, p0), vec_prob_sh = 0.05,
                             family = "binomial", max_tot_pve = 0.5)


hruffieux/echoseq documentation built on Jan. 10, 2024, 10:06 p.m.