mc_sim_pheno: mc_sim_pheno

View source: R/mc_sim_pheno.R

mc_sim_phenoR Documentation

mc_sim_pheno

Description

Simulate phenotypes with featured and residual causal variants for multiple crosses. It is possible to add a dilution, with noncausal SNP in the genomic feature.

Usage

mc_sim_pheno(
  X,
  map,
  n_QTL = 100,
  QTL_distribution = "random",
  min_distance = NULL,
  prop_f = 0,
  Sp = 100,
  mu = 50,
  h2 = 0.5,
  h2f = 0,
  scale_mk = FALSE,
  theta = 0.95,
  kj = "decrease_random",
  k_val = NULL,
  corr_f = 0.5,
  dilution_percent = 0,
  type_effect = "equal",
  s_effect = 0.9
)

Arguments

X

Numeric genotype marker matrix with genotype x marker information coded as 0, 1, 2 representing the number of copies of the minor allele.

map

genetic marker map: marker id, chromosome, genetic position, physical position.

n_QTL

number of QTLs (causal variants) to be divided into featured and residual variants. Default = 100

QTL_distribution

Fixed to "random" (for the moment)

min_distance

Minimum distance in kb between two selected QTLs. Default = NULL. Not used now

prop_f

Proportion of featured causal variants. Default = 0.5

Sp

Phenotypic variance. Default = 100

mu

Grand mean. Default = 50

h2

heritability. Default = 0.5

h2f

Proportion of genetic variance due to the featured (annotated) markers. Default = 0.5

scale_mk

logical value specifying if marker scores should be scaled. Default = FALSE. NOT YET AVAILABLE!

theta

Parameter values of the marker distribution function (\theta^{n_{QTL}}). Default = 0.95

kj

Character string specifying the distribution of proportion by the causal variant in the featured and residual class. One of "constant", "decrease_along_genome", "decrease_random", "user_specified". Default = "decrease_random"

k_val

numerical vector specifing QTL effect given by the user. Default = NULL

corr_f

numeric value as correction factor in from of the simulated QTL effects. Default = 0.5

dilution_percent

numeric value. Proportion of noncausal markers which are added to the featured markers. Dilution percent must take value between 0 and 100. Default = 0

type_effect

Character string specifying le type d'effet des marqueurs selon les croisements. "equal" pour que tous les croisements aient le même effet. "series" pour que les effets des croisements suivent une série allélique avec le paramètre s_effect. (s, s^2, s^3...). Default = "equal".

s_effet

numeric value. Effet de la série allélique s, uniquement utilisé si type_effet = "series". Default = 0.9.

Details

The phenotype are simulated following this sequence:

1. Underlying model: P_{i} = \mu + G_i + \epsilon_{i} (unreplicated genotypes) with variance structure

\sigma_{p}^2 = \sigma_{g}^2 + \sigma_{\epsilon}^2

with \sigma_{g}^2 = \sigma_{g_f}^2 + \sigma_{g_r}^2

\sigma_{g_f}^2: genetic variance due to functional SNP

\sigma_{g_r}^2: residual genetic variance

The latter are linked through the following relationship h_f^2 = \frac{\sigma_{g_f}^2}{\sigma_{g}^2} = \frac{\sigma_{g_f}^2}{\sigma_{g_f}^2 + \sigma_{g_r}^2}

2. Fix \sigma_{p}^2 = 100 and \mu = 50

3. Get the genetic variance given assumed variance \sigma_{g}^2 = h^2*\sigma_{p}^2

4. Get the genetic variance under functional annotations \sigma_{g_f}^2 = h_f^2*\sigma_{g}^2

5. Get the residual genetic variance \sigma_{g_r}^2 = \sigma_{g}^2 - \sigma_{g_f}^2

6. Select n_{SNP} that are spread into the f and the r class to form X_{f} and X_{r} given the percentage of markers associated to each category.

############ MODIFIER CA 7. Calculate for each selected marker the minor allele frequency p_j

8. Determine the marker effect distribution using the following function

f(\theta, j) = \theta^{j} \quad j=1, ..., n_{SNP}

9. Determine the proportion of genetic variance accounted for associated to each marker. This is done for each class of effect separately. The values of \theta^{j} are randomly sampled from the general distribution to respect the general distribution of the marker effects.

k_j(f) = \frac{\theta^{j}}{\sum_{m=1}^{n_{SNP(f)}}{\theta^{m}}}

10. For each selected SNP, determine the marker effect to meet the determined \sigma_{g_f}^2 and \sigma_{g_r}^2 using the formula proposed by Mollandin et al. (2022)

\beta_{j(f)} = \pm (\frac{1}{2}) \sqrt{\frac{k_{j(f)}*\sigma_{g_f}^2}{2pq}}

11. Form the simulated phenotypes

\tilde{y} = \mu + X_{s(f)}'\beta_{j(f)} + X_{s(r)}'\beta_{j(r)} + \epsilon_{i}

where X_{s(f)} (X_{s(r)}) is the centered and scaled marker matrix containing the f (r) markers.

and \epsilon_i \sim N(0, \sigma_{\epsilon}^2) where \sigma_{\epsilon}^2 = (1-h^2) * \sigma_{p}^2

##############

12. Annotation dilution

n_{QTL.polluted} = n_{QTL} * \frac{dilution.percent}{100}

Dilution percent : if there is 100 causal markers to be divided into featured and residual variants, 10 to 10 markers non causal added to the causal featured ones.

Value

list containing the following object

  1. d_y: data.frame with: simulated phenotype (y_sim), featured causal variants contribution (y_f), residual causal variants contribution (y_r), error contribution (e_i)

  2. mk_sel_f: selected featured causal variants

  3. mk_sel_r: selected residual causal variants

  4. X_f: marker matrix of the featured causal variants (if dilution_percent != 0, marker matrix with also noncausal variants featured selected)

  5. X_r: marker matrix of the residual causal variants

  6. Bf: Additive effect of the featured causal variants

  7. Br: Additive effect of the residual causal variants

Examples


# Generate genotype data
data("geno_par")
rownames(geno_par) <- paste0('P', 1:nrow(geno_par))

data("EUNAM_map")
rownames(map) <- map[, 1]

# single cross
cross_scheme <- cross_scheme_NAM(np = 5)
geno_par <- geno_par[1:5, ]

geno <- sim_mpp_cross(geno_par = geno_par, map = map,
                      cross_scheme = cross_scheme,
                      n_ind_cr = rep(100, nrow(cross_scheme)))

# type effect : equal
y_sim <- mc_sim_pheno(X = geno$geno_num_IBD, map = map,
                            type_effect = "equal")

var(y_sim$d_y$y_r)

# test simulation consistency
res <- matrix(NA, nrow = 100, ncol = 4)

for(i in 1:100){

  d <- mc_sim_pheno(X = geno$geno_num_IBD, map = map,
                    type_effect = "equal")
  res[i, ] <- diag(var(d$d_y))


}

colnames(res) <- c("Sp", "Sgf", "Sg", "Se")
res <- res[, -2]
res <- data.frame(res)
boxplot(res)

abline(h = 100, col = "green")
abline(h = 50, col = "blue")

# type effect : series
y_sim <- mc_sim_pheno(X = geno$geno_num_IBD, map = map,
                      type_effect = "series", s_effect = 0.9)

var(y_sim$d_y$y_r)

# test simulation consistency
res <- matrix(NA, nrow = 100, ncol = 4)

for(i in 1:100){

  d <- mc_sim_pheno(X = geno$geno_num_IBD, map = map,
                    type_effect = "series", s_effect = 0.9)
  res[i, ] <- diag(var(d$d_y))


}

colnames(res) <- c("Sp", "Sgf", "Sg", "Se")
res <- res[, -2]
res <- data.frame(res)
boxplot(res)

abline(h = 100, col = "green")
abline(h = 50, col = "blue")

# with 10% of dilution (not yes) :
# y_sim_dil <- mc_sim_pheno(X = geno$geno_num_IBD, map = map,
dilution_percent = 10)



vincentgarin/mppSim documentation built on Oct. 11, 2024, 3:18 p.m.