sc_sim_pheno: sc_sim_pheno
In vincentgarin/mppSim: Functions for simulation in MPP

sc_sim_pheno

R Documentation

sc_sim_pheno

Description

Simulate phenotypes with featured and residual causal variants in a single cross situation. It is possible to add a dilution, with noncausal SNP in the genomic feature.

Usage

sc_sim_pheno(
  X,
  map,
  n_QTL = 100,
  QTL_distribution = "random",
  min_distance = NULL,
  prop_f = 0,
  Sp = 100,
  mu = 50,
  h2 = 0.5,
  h2f = 0,
  scale_mk = FALSE,
  theta = 0.95,
  kj = "decrease_random",
  corr_f = 1,
  dilution_percent = 0
)

Arguments

`X`	`Numeric` genotype marker matrix with genotype x marker information coded as 0, 1, 2 representing the number of copies of the minor allele.
`map`	genetic marker map: marker id, chromosome, genetic position, physical position.
`n_QTL`	number of QTLs (causal variants) to be divided into featured and residual variants. Default = 100
`QTL_distribution`	Fixed to "random" (for the moment)
`min_distance`	Minimum distance in kb between two selected QTLs. Default = NULL. Not used now
`prop_f`	Proportion of featured causal variants. Default = 0
`Sp`	Phenotypic variance. Default = 100
`mu`	Grand mean. Default = 50
`h2`	heritability. Default = 0.5
`h2f`	Proportion of genetic variance due to the featured (annotated) markers. Default = 0
`scale_mk`	`logical` value specifying if marker scores should be scaled. Default = FALSE
`theta`	Parameter values of the marker distribution function (`\theta^{n_{QTL}}`). Default = 0.95
`kj`	Character string specifying the distribution of proportion by the causal variant in the featured and residual class. One of "constant", "decrease_along_genome", "decrease_random". Default = "decrease_random"
`corr_f`	`numeric` value as correction factor in from of the simulated QTL effects. Default = 1 (no correction correspond to unscaled marker scores).
`dilution_percent`	`numeric` value. Proportion of noncausal markers which are added to the featured markers. Dilution percent must take value between 0 and 100. Default = 0

Details

The phenotype are simulated following this sequence:

1. Underlying model: P_{i} = \mu + G_i + \epsilon_{i} (unreplicated genotypes) with variance structure

\sigma_{p}^2 = \sigma_{g}^2 + \sigma_{\epsilon}^2

with \sigma_{g}^2 = \sigma_{g_f}^2 + \sigma_{g_r}^2

\sigma_{g_f}^2: genetic variance due to functional SNP

\sigma_{g_r}^2: residual genetic variance

The latter are linked through the following relationship h_f^2 = \frac{\sigma_{g_f}^2}{\sigma_{g}^2} = \frac{\sigma_{g_f}^2}{\sigma_{g_f}^2 + \sigma_{g_r}^2}

2. Fix \sigma_{p}^2 = 100 and \mu = 50

3. Get the genetic variance given assumed variance \sigma_{g}^2 = h^2*\sigma_{p}^2

4. Get the genetic variance under functional annotations \sigma_{g_f}^2 = h_f^2*\sigma_{g}^2

5. Get the residual genetic variance \sigma_{g_r}^2 = \sigma_{g}^2 - \sigma_{g_f}^2

6. Select n_{SNP} that are spread into the f and the r class to form X_{f} and X_{r} given the percentage of markers associated to each category.

7. Calculate for each selected marker the minor allele frequency p_j

8. Determine the marker effect distribution using the following function

f(\theta, j) = \theta^{j} \quad j=1, ..., n_{SNP}

9. Determine the proportion of genetic variance accounted for associated to each marker. This is done for each class of effect separately. The values of \theta^{j} are randomly sampled from the general distribution to respect the general distribution of the marker effects.

k_j(f) = \frac{\theta^{j}}{\sum_{m=1}^{n_{SNP(f)}}{\theta^{m}}}

10. For each selected SNP, determine the marker effect to meet the determined \sigma_{g_f}^2 and \sigma_{g_r}^2 using the formula proposed by Mollandin et al. (2022)

\beta_{j(f)} = \pm (\frac{1}{2}) \sqrt{\frac{k_{j(f)}*\sigma_{g_f}^2}{2pq}}

11. Form the simulated phenotypes

\tilde{y} = \mu + X_{s(f)}'\beta_{j(f)} + X_{s(r)}'\beta_{j(r)} + \epsilon_{i}

where X_{s(f)} (X_{s(r)}) is the centered and scaled marker matrix containing the f (r) markers.

and \epsilon_i \sim N(0, \sigma_{\epsilon}^2) where \sigma_{\epsilon}^2 = (1-h^2) * \sigma_{p}^2

12. Annotation dilution

n_{QTL.polluted} = n_{QTL} * \frac{dilution.percent}{100}

Dilution percent : if there is 100 causal markers to be divided into featured and residual variants, 10 to 10 markers non causal added to the causal featured ones.

Value

list containing the following object

d_y: data.frame with: simulated phenotype (y_sim), featured causal variants contribution (y_f), residual causal variants contribution (y_r), error contribution (e_i)
mk_sel_f: selected featured causal variants (if dilution_percent != 0, Beta = 0 for noncausal variants)
mk_sel_r: selected residual causal variants
X_f: marker matrix of the featured causal variants (if dilution_percent != 0, marker matrix with also noncausal variants featured selected)
X_r: marker matrix of the residual causal variants
Xs_f: standardized marker matrix of the featured causal variants (if dilution_percent != 0, marker matrix with also noncausal variants featured selected)
Xs_r: standardized marker matrix of the residual causal variants
Bf: Additive effect of the featured causal variants
Br: Additive effect of the residual causal variants

Examples


## Not run: 
# Generate genotype data
data("geno_par")
rownames(geno_par) <- paste0('P', 1:nrow(geno_par))

data("EUNAM_map")
rownames(map) <- map[, 1]

# single cross
cross_scheme <- cross_scheme_NAM(np = 2)
geno_par <- geno_par[1:2, ]

geno <- sim_mpp_cross(geno_par = geno_par, map = map,
                      cross_scheme = cross_scheme,
                      n_ind_cr = rep(100, nrow(cross_scheme)))

# without dilution :
geno_c1 <- geno$geno_num_IBD
y_sim <- sc_sim_pheno(X = geno_c1, map = map)

var(y_sim$d_y$y_r)

# test simulation consistency
res <- matrix(NA, nrow = 100, ncol = 4)

for(i in 1:100){

  d <- sc_sim_pheno(X = geno_c1, map = map, n_QTL = 1)
  res[i, ] <- diag(var(d$d_y))


}

colnames(res) <- c("Sp", "Sgf", "Sg", "Se")
res <- res[, -2]
res <- data.frame(res)
boxplot(res)

abline(h = 100, col = "green")
abline(h = 50, col = "blue")

# with 10% of dilution (not yet available)
# y_sim_dil <- sc_sim_pheno(X = geno, map = map, dilution_percent = 10)

## End(Not run)

vincentgarin/mppSim documentation built on Oct. 11, 2024, 3:18 p.m.