
Simulate a complex trait *y* given a SNP genotype matrix and model parameters (the desired heritability and the true ancestral allele frequencies used to generate the genotypes, or alternatively the kinship matrix of the individuals).
Users can choose the number of causal loci and minimum marginal allele frequency requirements for the causal loci.
The code selects random loci to be causal, draws random Normal effect sizes for these loci (scaled appropriately) and random independent non-genetic effects.
Below, let *m* be the number of loci and *n* the number of individuals.
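The steps described above can be sketched in base R. This is a minimal illustration, not the package's implementation: the function name `sim_trait_sketch` is hypothetical, loci are assumed to be on the rows of `X`, and the scaling rule (genetic variance of an outbred individual set to `herit * sigma_sq`) is an assumption consistent with the argument descriptions on this page.

```
# Hypothetical helper, NOT the simtrait implementation: a base-R sketch only.
# X: m x n genotype matrix (loci on rows) with values in {0, 1, 2}.
sim_trait_sketch <- function(X, m_causal, herit, p_anc,
                             mu = 0, sigma_sq = 1, maf_cut = 0.05) {
  n <- ncol(X)
  # sample allele frequencies, used only for the MAF filter
  p_hat <- rowMeans(X, na.rm = TRUE) / 2
  maf <- pmin(p_hat, 1 - p_hat)
  candidates <- which(maf >= maf_cut)
  stopifnot(length(candidates) >= m_causal)
  # pick causal loci at random among those passing the filter
  causal_indexes <- candidates[sample.int(length(candidates), m_causal)]
  p <- p_anc[causal_indexes]
  # draw raw Normal effect sizes, then rescale them so that (assumption)
  # sum(2 * p * (1 - p) * beta^2) equals herit * sigma_sq
  causal_coeffs <- rnorm(m_causal)
  var_raw <- sum(2 * p * (1 - p) * causal_coeffs^2)
  causal_coeffs <- causal_coeffs * sqrt(herit * sigma_sq / var_raw)
  # genetic component, centered with the parametric genotype means 2 * p
  genetic <- drop(crossprod(X[causal_indexes, , drop = FALSE], causal_coeffs))
  genetic <- genetic - sum(2 * p * causal_coeffs)
  # independent non-genetic effects carry the remaining variance
  noise <- rnorm(n, mean = 0, sd = sqrt((1 - herit) * sigma_sq))
  list(
    trait = mu + genetic + noise,
    causal_indexes = causal_indexes,
    causal_coeffs = causal_coeffs
  )
}

# usage with a made-up genotype matrix and made-up ancestral frequencies
set.seed(1)
X <- matrix(c(0, 1, 2, 1, 2, 1, 0, 0, 1), nrow = 3, byrow = TRUE)
p_anc <- c(0.5, 0.6, 0.2)
obj <- sim_trait_sketch(X, m_causal = 2, herit = 0.8, p_anc = p_anc)
```

Unlike the behavior documented for `sim_trait`, this sketch does not special-case `herit = 0`.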


`X` |
The |

`m_causal` |
The number of causal loci desired. |

`herit` |
The desired heritability (proportion of trait variance due to genetics). |

`p_anc` |
The length- |

`kinship` |
The |

`mu` |
The desired parametric mean value of the trait (default zero).
The sample mean of the trait will not be exactly zero, but instead have an expectation of |

`sigma_sq` |
The desired parametric variance factor of the trait (default 1).
This factor corresponds to the variance of an outbred individual (see |

`maf_cut` |
The optional minimum allele frequency threshold (default 5%).
This prevents rare alleles from being causal in the simulation.
Note that this threshold is applied to the sample allele frequencies and not their true parametric values ( |

`loci_on_cols` |
If |

`mem_factor` |
BEDMatrix-specific, sets proportion of available memory to use loading genotypes.
Ignored if |

`mem_lim` |
BEDMatrix-specific, sets total memory to use loading genotypes, in GB.
If |

In order to center and scale the trait and locus effect size vector correctly to the desired parameters (mean, variance factor, and heritability), the parametric ancestral allele frequencies (`p_anc`) must be known.
This is necessary since in the context of heritability the genotypes are themselves random variables (with means given by `p_anc` and a covariance structure given by `p_anc` and the kinship matrix), so the parameters of the genotypes must be taken into account.
If `p_anc` is indeed known (true for simulated genotypes), then the trait will have the specified mean and covariance matrix in agreement with `cov_trait`.
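The role of `p_anc` and the kinship matrix can be made explicit under the standard kinship model, which the paragraph above appears to assume. The notation here is introduced for illustration only: $x_{ij}$ is the genotype at locus $i$ for individual $j$, $p_i$ the ancestral allele frequency, $\varphi_{jk}$ the kinship coefficient, $\beta_i$ the effect size, $h^2$ the heritability, $\sigma^2$ the variance factor, and $\delta_{jk}$ the Kronecker delta. Loci are assumed independent.

$$
\mathrm{E}[x_{ij}] = 2 p_i,
\qquad
\mathrm{Cov}(x_{ij}, x_{ik}) = 4 p_i (1 - p_i) \varphi_{jk}.
$$

If the effect sizes are scaled so that $\sum_i 2 p_i (1 - p_i) \beta_i^2 = h^2 \sigma^2$ and the non-genetic effects have variance $(1 - h^2) \sigma^2$, then

$$
\mathrm{Cov}(y_j, y_k)
= \sum_i 4 p_i (1 - p_i) \beta_i^2 \, \varphi_{jk} + (1 - h^2) \sigma^2 \delta_{jk}
= \sigma^2 \left( 2 h^2 \varphi_{jk} + (1 - h^2) \delta_{jk} \right).
$$

For an outbred individual, $\varphi_{jj} = 1/2$, so $\mathrm{Var}(y_j) = \sigma^2$, which matches the description of `sigma_sq` as the variance of an outbred individual.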

If the desire is to simulate a trait using real genotypes, where `p_anc` is unknown, a compromise that works well in practice is possible if the kinship matrix (`kinship`) is known (see package vignette).
The kinship matrix can be estimated accurately using the `popkin` package.
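One way such a compromise can work is sketched below. It is an assumption made for illustration, and the package vignette remains the authoritative description. Under the kinship model, the sample allele frequency `p_hat` is unbiased for the ancestral frequency `p`, but `p_hat * (1 - p_hat)` has expectation `p * (1 - p) * (1 - mean_kinship)`, so dividing by `1 - mean_kinship` removes that bias. The helper name `pq_unbiased_sketch` is hypothetical.

```
# Hypothetical helper illustrating the bias correction; not simtrait's code.
# X: m x n genotype matrix (loci on rows); kinship: n x n kinship matrix.
pq_unbiased_sketch <- function(X, kinship) {
  # sample allele frequency per locus
  p_hat <- rowMeans(X, na.rm = TRUE) / 2
  # mean kinship over all pairs of individuals, including self pairs
  mean_kinship <- mean(kinship)
  # E[p_hat * (1 - p_hat)] = p * (1 - p) * (1 - mean_kinship), so divide it out
  p_hat * (1 - p_hat) / (1 - mean_kinship)
}

X <- matrix(c(0, 1, 2, 1, 2, 1, 0, 0, 1), nrow = 3, byrow = TRUE)
kinship <- diag(3) / 2  # made-up: unrelated, outbred individuals
v <- pq_unbiased_sketch(X, kinship)
```

For unrelated outbred individuals the correction factor reduces to the familiar `2n / (2n - 1)` small-sample adjustment, here `6 / 5` for three individuals.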

A list containing the simulated `trait` (length *n*), the vector of causal locus indexes `causal_indexes` (length *m_causal*), and the locus effect size vector `causal_coeffs` (length *m_causal*) at the causal loci.
However, if `herit = 0` then `causal_indexes` and `causal_coeffs` will have zero length regardless of `m_causal`.

```
# load the package that provides sim_trait
library(simtrait)

# construct a dummy genotype matrix (3 loci on rows, 3 individuals on columns)
X <- matrix(
  data = c(0, 1, 2, 1, 2, 1, 0, 0, 1),
  nrow = 3,
  byrow = TRUE
)
# made-up ancestral allele frequency vector for this example
p_anc <- c(0.5, 0.6, 0.2)
# create simulated trait and associated data
obj <- sim_trait(X = X, m_causal = 2, herit = 0.8, p_anc = p_anc)
# trait vector
obj$trait
# randomly picked causal locus indexes
obj$causal_indexes
# locus effect size vector
obj$causal_coeffs
```

OchoaLab/simtrait documentation built on Oct. 18, 2019, 5:42 a.m.
