sc_sim_pheno | R Documentation |
Simulate phenotypes with featured and residual causal variants in a single cross situation. It is possible to add a dilution, with noncausal SNP in the genomic feature.
sc_sim_pheno(
X,
map,
n_QTL = 100,
QTL_distribution = "random",
min_distance = NULL,
prop_f = 0,
Sp = 100,
mu = 50,
h2 = 0.5,
h2f = 0,
scale_mk = FALSE,
theta = 0.95,
kj = "decrease_random",
corr_f = 1,
dilution_percent = 0
)
X |
|
map |
genetic marker map: marker id, chromosome, genetic position, physical position. |
n_QTL |
number of QTLs (causal variants) to be divided into featured and residual variants. Default = 100 |
QTL_distribution |
Fixed to "random" (for the moment) |
min_distance |
Minimum distance in kb between two selected QTLs. Default = NULL. Not used now |
prop_f |
Proportion of featured causal variants. Default = 0 |
Sp |
Phenotypic variance. Default = 100 |
mu |
Grand mean. Default = 50 |
h2 |
heritability. Default = 0.5 |
h2f |
Proportion of genetic variance due to the featured (annotated) markers. Default = 0 |
scale_mk |
|
theta |
Parameter values of the marker distribution function
( |
kj |
Character string specifying the distribution of proportion by the causal variant in the featured and residual class. One of "constant", "decrease_along_genome", "decrease_random". Default = "decrease_random" |
corr_f |
|
dilution_percent |
|
The phenotype are simulated following this sequence:
1. Underlying model: P_{i} = \mu + G_i + \epsilon_{i}
(unreplicated genotypes) with variance structure
\sigma_{p}^2 = \sigma_{g}^2 + \sigma_{\epsilon}^2
with \sigma_{g}^2 = \sigma_{g_f}^2 + \sigma_{g_r}^2
\sigma_{g_f}^2
: genetic variance due to functional SNP
\sigma_{g_r}^2
: residual genetic variance
The latter are linked through the following relationship h_f^2 = \frac{\sigma_{g_f}^2}{\sigma_{g}^2} = \frac{\sigma_{g_f}^2}{\sigma_{g_f}^2 + \sigma_{g_r}^2}
2. Fix \sigma_{p}^2 = 100
and \mu = 50
3. Get the genetic variance given assumed variance \sigma_{g}^2 = h^2*\sigma_{p}^2
4. Get the genetic variance under functional annotations \sigma_{g_f}^2 = h_f^2*\sigma_{g}^2
5. Get the residual genetic variance \sigma_{g_r}^2 = \sigma_{g}^2 - \sigma_{g_f}^2
6. Select n_{SNP}
that are spread into the f
and the r
class to form X_{f}
and X_{r}
given the percentage of markers associated to each category.
7. Calculate for each selected marker the minor allele frequency p_j
8. Determine the marker effect distribution using the following function
f(\theta, j) = \theta^{j} \quad j=1, ..., n_{SNP}
9. Determine the proportion of genetic variance accounted for associated to each marker. This is done for each class of effect separately. The values of \theta^{j}
are randomly sampled from the general distribution to respect the general distribution of the marker effects.
k_j(f) = \frac{\theta^{j}}{\sum_{m=1}^{n_{SNP(f)}}{\theta^{m}}}
10. For each selected SNP, determine the marker effect to meet the determined \sigma_{g_f}^2
and \sigma_{g_r}^2
using the formula proposed by Mollandin et al. (2022)
\beta_{j(f)} = \pm (\frac{1}{2}) \sqrt{\frac{k_{j(f)}*\sigma_{g_f}^2}{2pq}}
11. Form the simulated phenotypes
\tilde{y} = \mu + X_{s(f)}'\beta_{j(f)} + X_{s(r)}'\beta_{j(r)} + \epsilon_{i}
where X_{s(f)}
(X_{s(r)}
) is the centered and scaled marker matrix containing the f
(r
) markers.
and \epsilon_i \sim N(0, \sigma_{\epsilon}^2)
where \sigma_{\epsilon}^2 = (1-h^2) * \sigma_{p}^2
12. Annotation dilution
n_{QTL.polluted} = n_{QTL} * \frac{dilution.percent}{100}
Dilution percent : if there is 100 causal markers to be divided into featured and residual variants, 10 to 10 markers non causal added to the causal featured ones.
list containing the following object
d_y: data.frame with: simulated phenotype (y_sim), featured causal variants contribution (y_f), residual causal variants contribution (y_r), error contribution (e_i)
mk_sel_f: selected featured causal variants (if dilution_percent != 0, Beta = 0 for noncausal variants)
mk_sel_r: selected residual causal variants
X_f: marker matrix of the featured causal variants (if dilution_percent != 0, marker matrix with also noncausal variants featured selected)
X_r: marker matrix of the residual causal variants
Xs_f: standardized marker matrix of the featured causal variants (if dilution_percent != 0, marker matrix with also noncausal variants featured selected)
Xs_r: standardized marker matrix of the residual causal variants
Bf: Additive effect of the featured causal variants
Br: Additive effect of the residual causal variants
## Not run:
# Generate genotype data
data("geno_par")
rownames(geno_par) <- paste0('P', 1:nrow(geno_par))
data("EUNAM_map")
rownames(map) <- map[, 1]
# single cross
cross_scheme <- cross_scheme_NAM(np = 2)
geno_par <- geno_par[1:2, ]
geno <- sim_mpp_cross(geno_par = geno_par, map = map,
cross_scheme = cross_scheme,
n_ind_cr = rep(100, nrow(cross_scheme)))
# without dilution :
geno_c1 <- geno$geno_num_IBD
y_sim <- sc_sim_pheno(X = geno_c1, map = map)
var(y_sim$d_y$y_r)
# test simulation consistency
res <- matrix(NA, nrow = 100, ncol = 4)
for(i in 1:100){
d <- sc_sim_pheno(X = geno_c1, map = map, n_QTL = 1)
res[i, ] <- diag(var(d$d_y))
}
colnames(res) <- c("Sp", "Sgf", "Sg", "Se")
res <- res[, -2]
res <- data.frame(res)
boxplot(res)
abline(h = 100, col = "green")
abline(h = 50, col = "blue")
# with 10% of dilution (not yet available)
# y_sim_dil <- sc_sim_pheno(X = geno, map = map, dilution_percent = 10)
## End(Not run)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.