sc_sim_pheno: sc_sim_pheno

View source: R/sc_sim_pheno.R

sc_sim_phenoR Documentation

sc_sim_pheno

Description

Simulate phenotypes with featured and residual causal variants in a single cross situation. It is possible to add a dilution, with noncausal SNP in the genomic feature.

Usage

sc_sim_pheno(
  X,
  map,
  n_QTL = 100,
  QTL_distribution = "random",
  min_distance = NULL,
  prop_f = 0,
  Sp = 100,
  mu = 50,
  h2 = 0.5,
  h2f = 0,
  scale_mk = FALSE,
  theta = 0.95,
  kj = "decrease_random",
  corr_f = 1,
  dilution_percent = 0
)

Arguments

X

Numeric genotype marker matrix with genotype x marker information coded as 0, 1, 2 representing the number of copies of the minor allele.

map

genetic marker map: marker id, chromosome, genetic position, physical position.

n_QTL

number of QTLs (causal variants) to be divided into featured and residual variants. Default = 100

QTL_distribution

Fixed to "random" (for the moment)

min_distance

Minimum distance in kb between two selected QTLs. Default = NULL. Not used now

prop_f

Proportion of featured causal variants. Default = 0

Sp

Phenotypic variance. Default = 100

mu

Grand mean. Default = 50

h2

heritability. Default = 0.5

h2f

Proportion of genetic variance due to the featured (annotated) markers. Default = 0

scale_mk

logical value specifying if marker scores should be scaled. Default = FALSE

theta

Parameter values of the marker distribution function (\theta^{n_{QTL}}). Default = 0.95

kj

Character string specifying the distribution of proportion by the causal variant in the featured and residual class. One of "constant", "decrease_along_genome", "decrease_random". Default = "decrease_random"

corr_f

numeric value as correction factor in from of the simulated QTL effects. Default = 1 (no correction correspond to unscaled marker scores).

dilution_percent

numeric value. Proportion of noncausal markers which are added to the featured markers. Dilution percent must take value between 0 and 100. Default = 0

Details

The phenotype are simulated following this sequence:

1. Underlying model: P_{i} = \mu + G_i + \epsilon_{i} (unreplicated genotypes) with variance structure

\sigma_{p}^2 = \sigma_{g}^2 + \sigma_{\epsilon}^2

with \sigma_{g}^2 = \sigma_{g_f}^2 + \sigma_{g_r}^2

\sigma_{g_f}^2: genetic variance due to functional SNP

\sigma_{g_r}^2: residual genetic variance

The latter are linked through the following relationship h_f^2 = \frac{\sigma_{g_f}^2}{\sigma_{g}^2} = \frac{\sigma_{g_f}^2}{\sigma_{g_f}^2 + \sigma_{g_r}^2}

2. Fix \sigma_{p}^2 = 100 and \mu = 50

3. Get the genetic variance given assumed variance \sigma_{g}^2 = h^2*\sigma_{p}^2

4. Get the genetic variance under functional annotations \sigma_{g_f}^2 = h_f^2*\sigma_{g}^2

5. Get the residual genetic variance \sigma_{g_r}^2 = \sigma_{g}^2 - \sigma_{g_f}^2

6. Select n_{SNP} that are spread into the f and the r class to form X_{f} and X_{r} given the percentage of markers associated to each category.

7. Calculate for each selected marker the minor allele frequency p_j

8. Determine the marker effect distribution using the following function

f(\theta, j) = \theta^{j} \quad j=1, ..., n_{SNP}

9. Determine the proportion of genetic variance accounted for associated to each marker. This is done for each class of effect separately. The values of \theta^{j} are randomly sampled from the general distribution to respect the general distribution of the marker effects.

k_j(f) = \frac{\theta^{j}}{\sum_{m=1}^{n_{SNP(f)}}{\theta^{m}}}

10. For each selected SNP, determine the marker effect to meet the determined \sigma_{g_f}^2 and \sigma_{g_r}^2 using the formula proposed by Mollandin et al. (2022)

\beta_{j(f)} = \pm (\frac{1}{2}) \sqrt{\frac{k_{j(f)}*\sigma_{g_f}^2}{2pq}}

11. Form the simulated phenotypes

\tilde{y} = \mu + X_{s(f)}'\beta_{j(f)} + X_{s(r)}'\beta_{j(r)} + \epsilon_{i}

where X_{s(f)} (X_{s(r)}) is the centered and scaled marker matrix containing the f (r) markers.

and \epsilon_i \sim N(0, \sigma_{\epsilon}^2) where \sigma_{\epsilon}^2 = (1-h^2) * \sigma_{p}^2

12. Annotation dilution

n_{QTL.polluted} = n_{QTL} * \frac{dilution.percent}{100}

Dilution percent : if there is 100 causal markers to be divided into featured and residual variants, 10 to 10 markers non causal added to the causal featured ones.

Value

list containing the following object

  1. d_y: data.frame with: simulated phenotype (y_sim), featured causal variants contribution (y_f), residual causal variants contribution (y_r), error contribution (e_i)

  2. mk_sel_f: selected featured causal variants (if dilution_percent != 0, Beta = 0 for noncausal variants)

  3. mk_sel_r: selected residual causal variants

  4. X_f: marker matrix of the featured causal variants (if dilution_percent != 0, marker matrix with also noncausal variants featured selected)

  5. X_r: marker matrix of the residual causal variants

  6. Xs_f: standardized marker matrix of the featured causal variants (if dilution_percent != 0, marker matrix with also noncausal variants featured selected)

  7. Xs_r: standardized marker matrix of the residual causal variants

  8. Bf: Additive effect of the featured causal variants

  9. Br: Additive effect of the residual causal variants

Examples


## Not run: 
# Generate genotype data
data("geno_par")
rownames(geno_par) <- paste0('P', 1:nrow(geno_par))

data("EUNAM_map")
rownames(map) <- map[, 1]

# single cross
cross_scheme <- cross_scheme_NAM(np = 2)
geno_par <- geno_par[1:2, ]

geno <- sim_mpp_cross(geno_par = geno_par, map = map,
                      cross_scheme = cross_scheme,
                      n_ind_cr = rep(100, nrow(cross_scheme)))

# without dilution :
geno_c1 <- geno$geno_num_IBD
y_sim <- sc_sim_pheno(X = geno_c1, map = map)

var(y_sim$d_y$y_r)

# test simulation consistency
res <- matrix(NA, nrow = 100, ncol = 4)

for(i in 1:100){

  d <- sc_sim_pheno(X = geno_c1, map = map, n_QTL = 1)
  res[i, ] <- diag(var(d$d_y))


}

colnames(res) <- c("Sp", "Sgf", "Sg", "Se")
res <- res[, -2]
res <- data.frame(res)
boxplot(res)

abline(h = 100, col = "green")
abline(h = 50, col = "blue")

# with 10% of dilution (not yet available)
# y_sim_dil <- sc_sim_pheno(X = geno, map = map, dilution_percent = 10)

## End(Not run)


vincentgarin/mppSim documentation built on Oct. 11, 2024, 3:18 p.m.