mc_sim_pheno: mc_sim_pheno
In vincentgarin/mppSim: Functions for simulation in MPP

mc_sim_pheno

R Documentation

mc_sim_pheno

Description

Simulate phenotypes with featured and residual causal variants for multiple crosses. It is possible to add a dilution, with noncausal SNP in the genomic feature.

Usage

mc_sim_pheno(
  X,
  map,
  n_QTL = 100,
  QTL_distribution = "random",
  min_distance = NULL,
  prop_f = 0,
  Sp = 100,
  mu = 50,
  h2 = 0.5,
  h2f = 0,
  scale_mk = FALSE,
  theta = 0.95,
  kj = "decrease_random",
  k_val = NULL,
  corr_f = 0.5,
  dilution_percent = 0,
  type_effect = "equal",
  s_effect = 0.9
)

Arguments

`X`	`Numeric` genotype marker matrix with genotype x marker information coded as 0, 1, 2 representing the number of copies of the minor allele.
`map`	genetic marker map: marker id, chromosome, genetic position, physical position.
`n_QTL`	number of QTLs (causal variants) to be divided into featured and residual variants. Default = 100
`QTL_distribution`	Fixed to "random" (for the moment)
`min_distance`	Minimum distance in kb between two selected QTLs. Default = NULL. Not used now
`prop_f`	Proportion of featured causal variants. Default = 0.5
`Sp`	Phenotypic variance. Default = 100
`mu`	Grand mean. Default = 50
`h2`	heritability. Default = 0.5
`h2f`	Proportion of genetic variance due to the featured (annotated) markers. Default = 0.5
`scale_mk`	`logical` value specifying if marker scores should be scaled. Default = FALSE. NOT YET AVAILABLE!
`theta`	Parameter values of the marker distribution function (`\theta^{n_{QTL}}`). Default = 0.95
`kj`	Character string specifying the distribution of proportion by the causal variant in the featured and residual class. One of "constant", "decrease_along_genome", "decrease_random", "user_specified". Default = "decrease_random"
`k_val`	`numerical` vector specifing QTL effect given by the user. Default = NULL
`corr_f`	`numeric` value as correction factor in from of the simulated QTL effects. Default = 0.5
`dilution_percent`	`numeric` value. Proportion of noncausal markers which are added to the featured markers. Dilution percent must take value between 0 and 100. Default = 0
`type_effect`	Character string specifying le type d'effet des marqueurs selon les croisements. "equal" pour que tous les croisements aient le même effet. "series" pour que les effets des croisements suivent une série allélique avec le paramètre s_effect. (s, s^2, s^3...). Default = "equal".
`s_effet`	`numeric` value. Effet de la série allélique s, uniquement utilisé si type_effet = "series". Default = 0.9.

Details

The phenotype are simulated following this sequence:

1. Underlying model: P_{i} = \mu + G_i + \epsilon_{i} (unreplicated genotypes) with variance structure

\sigma_{p}^2 = \sigma_{g}^2 + \sigma_{\epsilon}^2

with \sigma_{g}^2 = \sigma_{g_f}^2 + \sigma_{g_r}^2

\sigma_{g_f}^2: genetic variance due to functional SNP

\sigma_{g_r}^2: residual genetic variance

The latter are linked through the following relationship h_f^2 = \frac{\sigma_{g_f}^2}{\sigma_{g}^2} = \frac{\sigma_{g_f}^2}{\sigma_{g_f}^2 + \sigma_{g_r}^2}

2. Fix \sigma_{p}^2 = 100 and \mu = 50

3. Get the genetic variance given assumed variance \sigma_{g}^2 = h^2*\sigma_{p}^2

4. Get the genetic variance under functional annotations \sigma_{g_f}^2 = h_f^2*\sigma_{g}^2

5. Get the residual genetic variance \sigma_{g_r}^2 = \sigma_{g}^2 - \sigma_{g_f}^2

6. Select n_{SNP} that are spread into the f and the r class to form X_{f} and X_{r} given the percentage of markers associated to each category.

############ MODIFIER CA 7. Calculate for each selected marker the minor allele frequency p_j

8. Determine the marker effect distribution using the following function

f(\theta, j) = \theta^{j} \quad j=1, ..., n_{SNP}

9. Determine the proportion of genetic variance accounted for associated to each marker. This is done for each class of effect separately. The values of \theta^{j} are randomly sampled from the general distribution to respect the general distribution of the marker effects.

k_j(f) = \frac{\theta^{j}}{\sum_{m=1}^{n_{SNP(f)}}{\theta^{m}}}

10. For each selected SNP, determine the marker effect to meet the determined \sigma_{g_f}^2 and \sigma_{g_r}^2 using the formula proposed by Mollandin et al. (2022)

\beta_{j(f)} = \pm (\frac{1}{2}) \sqrt{\frac{k_{j(f)}*\sigma_{g_f}^2}{2pq}}

11. Form the simulated phenotypes

\tilde{y} = \mu + X_{s(f)}'\beta_{j(f)} + X_{s(r)}'\beta_{j(r)} + \epsilon_{i}

where X_{s(f)} (X_{s(r)}) is the centered and scaled marker matrix containing the f (r) markers.

and \epsilon_i \sim N(0, \sigma_{\epsilon}^2) where \sigma_{\epsilon}^2 = (1-h^2) * \sigma_{p}^2

##############

12. Annotation dilution

n_{QTL.polluted} = n_{QTL} * \frac{dilution.percent}{100}

Dilution percent : if there is 100 causal markers to be divided into featured and residual variants, 10 to 10 markers non causal added to the causal featured ones.

Value

list containing the following object

d_y: data.frame with: simulated phenotype (y_sim), featured causal variants contribution (y_f), residual causal variants contribution (y_r), error contribution (e_i)
mk_sel_f: selected featured causal variants
mk_sel_r: selected residual causal variants
X_f: marker matrix of the featured causal variants (if dilution_percent != 0, marker matrix with also noncausal variants featured selected)
X_r: marker matrix of the residual causal variants
Bf: Additive effect of the featured causal variants
Br: Additive effect of the residual causal variants

Examples


# Generate genotype data
data("geno_par")
rownames(geno_par) <- paste0('P', 1:nrow(geno_par))

data("EUNAM_map")
rownames(map) <- map[, 1]

# single cross
cross_scheme <- cross_scheme_NAM(np = 5)
geno_par <- geno_par[1:5, ]

geno <- sim_mpp_cross(geno_par = geno_par, map = map,
                      cross_scheme = cross_scheme,
                      n_ind_cr = rep(100, nrow(cross_scheme)))

# type effect : equal
y_sim <- mc_sim_pheno(X = geno$geno_num_IBD, map = map,
                            type_effect = "equal")

var(y_sim$d_y$y_r)

# test simulation consistency
res <- matrix(NA, nrow = 100, ncol = 4)

for(i in 1:100){

  d <- mc_sim_pheno(X = geno$geno_num_IBD, map = map,
                    type_effect = "equal")
  res[i, ] <- diag(var(d$d_y))


}

colnames(res) <- c("Sp", "Sgf", "Sg", "Se")
res <- res[, -2]
res <- data.frame(res)
boxplot(res)

abline(h = 100, col = "green")
abline(h = 50, col = "blue")

# type effect : series
y_sim <- mc_sim_pheno(X = geno$geno_num_IBD, map = map,
                      type_effect = "series", s_effect = 0.9)

var(y_sim$d_y$y_r)

# test simulation consistency
res <- matrix(NA, nrow = 100, ncol = 4)

for(i in 1:100){

  d <- mc_sim_pheno(X = geno$geno_num_IBD, map = map,
                    type_effect = "series", s_effect = 0.9)
  res[i, ] <- diag(var(d$d_y))


}

colnames(res) <- c("Sp", "Sgf", "Sg", "Se")
res <- res[, -2]
res <- data.frame(res)
boxplot(res)

abline(h = 100, col = "green")
abline(h = 50, col = "blue")

# with 10% of dilution (not yes) :
# y_sim_dil <- mc_sim_pheno(X = geno$geno_num_IBD, map = map,
dilution_percent = 10)

vincentgarin/mppSim documentation built on Oct. 11, 2024, 3:18 p.m.