trioSim() simulates parental genotypes, child genotypes, an environmental
attribute and sub-population membership for affected trios with
informative mating types from a stratified population.
All genotypes are at a test locus that is linked to a causal locus.
trioSim(n, popfs, hapfs, edists, recomb = 0, riskmod, batchsize = 1000)
n |
The desired number of informative case-parent trios to simulate. |
popfs |
A vector of sub-population frequencies whose length is equal to the number of sub-populations. |
hapfs |
A list of haplotype frequency vectors, one for each sub-population. See Details for the assumed order of haplotypes. |
edists |
A list of functions that simulate the environmental attribute, one for each sub-population. |
recomb |
Recombination frequency between the test and causal locus. Currently not implemented. The function will stop execution if a non-zero value is specified. |
riskmod |
A function to evaluate the risk (probability) of disease. The function should take two arguments. The first is the child's genotype, and the second is the environmental attribute. |
batchsize |
Size of the batches of trios to simulate. See Details for more information. |
The function simulates trios from a stratified population. Population stratification is controlled by the user's choice of sub-population sizes, haplotype frequencies in each sub-population and the distribution of the environmental attribute in each sub-population. Given sub-population sizes, the degree of population stratification increases with greater differences in the distributions of the haplotype frequency and the environmental attribute among sub-populations.
The function first simulates sub-population membership for each trio using
the sub-population frequencies supplied by the user in the argument popfs.
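The membership step can be illustrated with a minimal base R sketch. This is not the package's internal code; `n_trios` and `subpop` are names assumed here for illustration, and the 0-based labels follow the numbering described for the returned subpop column.

```r
set.seed(1)
popfs <- c(0.5, 0.5)   # sub-population frequencies, one per sub-population
n_trios <- 10          # hypothetical number of trios to draw

# Draw a label 0, ..., k-1 for each trio with probabilities popfs
subpop <- sample(seq_along(popfs) - 1, size = n_trios,
                 replace = TRUE, prob = popfs)
table(subpop)
```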
Conditional on sub-population, parental haplotypes Hp are simulated assuming
Hardy-Weinberg proportions using the subpopulation-specific haplotype frequencies
in the argument hapfs.
Haplotype frequencies should be in the order N0, N1, R0, R1, where N and R denote
the non-risk and risk alleles at the causal locus, and 0 and 1 denote the non-index
and index alleles at the test locus.
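As a small illustration of this ordering, the sketch below names the four entries of one frequency vector and recovers the marginal allele frequencies at each locus. The element names and the variables `risk_freq` and `index_freq` are assumptions added for readability; only the order N0, N1, R0, R1 comes from the text above.

```r
# One sub-population's haplotype frequencies in the order N0, N1, R0, R1
hapf <- c(N0 = 0.9, N1 = 0, R0 = 0, R1 = 0.1)

# Risk allele frequency at the causal locus: haplotypes carrying R
risk_freq <- unname(hapf["R0"] + hapf["R1"])

# Index allele frequency at the test locus: haplotypes carrying 1
index_freq <- unname(hapf["N1"] + hapf["R1"])

c(risk = risk_freq, index = index_freq)
```

With N1 and R0 both set to zero, the index allele always travels with the risk allele, which is the setting where the test locus coincides with the causal locus.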
To save computation time, only informative parental mating types are considered: one parent is simulated from the conditional distribution given that the parent is heterozygous, and the other parent is simulated without any restrictions. Conditional on parental haplotypes, child haplotypes are sampled according to Mendel's laws. From the sampled haplotypes of the parents and children, their genotypes for the causal and test loci are extracted.
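The following base R sketch shows one way these steps could be carried out with no recombination. It is an illustration under stated assumptions, not the package's implementation: the haplotype frequencies are hypothetical, heterozygosity is taken to mean heterozygous at the test locus, and the helpers `pick` and `n_index` are invented for this example.

```r
set.seed(2)
hapf <- c(N0 = 0.6, N1 = 0.1, R0 = 0.1, R1 = 0.2)  # hypothetical frequencies
haps <- names(hapf)
idx <- c("N1", "R1")   # haplotypes carrying the index allele
non <- c("N0", "R0")   # haplotypes carrying the non-index allele
n_trios <- 5

# Parent 1, heterozygous at the test locus: one haplotype from each set,
# drawn with the frequencies renormalised within the set by sample()
p1a <- sample(idx, n_trios, replace = TRUE, prob = hapf[idx])
p1b <- sample(non, n_trios, replace = TRUE, prob = hapf[non])

# Parent 2, unrestricted: two independent draws (Hardy-Weinberg proportions)
p2a <- sample(haps, n_trios, replace = TRUE, prob = hapf)
p2b <- sample(haps, n_trios, replace = TRUE, prob = hapf)

# Mendelian transmission: each parent passes either haplotype with prob 1/2
pick <- function(a, b) ifelse(runif(length(a)) < 0.5, a, b)
c1 <- pick(p1a, p1b)
c2 <- pick(p2a, p2b)

# Test-locus genotype = number of copies of the index allele
n_index <- function(h1, h2) (h1 %in% idx) + (h2 %in% idx)
parent1 <- n_index(p1a, p1b)
parent2 <- n_index(p2a, p2b)
child   <- n_index(c1, c2)
data.frame(parent1, parent2, child)
```

Counting risk haplotypes (R0, R1) instead of index haplotypes would give the causal-locus genotypes in the same way.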
Assuming conditional independence between the gene and the environmental attribute given
sub-population, the environmental attribute for each trio is simulated conditional on
sub-population using the subpopulation-specific simulation functions in the argument edists.
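A minimal sketch of this step, assuming each function in edists takes a sample size and returns that many draws (as the functions in the Examples do). The `subpop` vector and the name `attr_e` are hypothetical and used only for illustration.

```r
set.seed(3)
# One simulation function per sub-population
edists <- list(function(n) rnorm(n, mean = -0.8, sd = 0.6),
               function(n) rnorm(n, mean =  0.8, sd = 0.6))

subpop <- c(0, 1, 1, 0, 1)           # hypothetical memberships, labelled 0..k-1
attr_e <- numeric(length(subpop))

# Fill in the attribute for each sub-population with its own simulator
for (j in seq_along(edists)) {
  in_j <- subpop == j - 1
  attr_e[in_j] <- edists[[j]](sum(in_j))
}
attr_e
```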
Finally, disease status is simulated according to the risk model in the argument riskmod;
only those trios with affected children are retained.
To speed up computation, the rejection sampling of trios is done in batches of size batchsize
until the desired number of affected trios is obtained.
In the simulation studies we have performed, choosing batchsize on the order of 1/3 of the
desired number of trios appeared to be fastest.
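The batching idea can be sketched generically in base R as below. This is an assumed illustration rather than the function's actual code: the genotype draw is a simple binomial stand-in for the trio simulation, and `risk`, `kept` and `n_wanted` are hypothetical names.

```r
set.seed(4)
n_wanted  <- 200   # desired number of affected trios
batchsize <- 100   # candidates simulated per pass

# Hypothetical risk function returning a probability capped at 1
risk <- function(G, E) pmin(1, exp(-2 + (log(3) / 2) * G - 0.1 * G * E))

kept <- data.frame(G = integer(0), E = numeric(0))
while (nrow(kept) < n_wanted) {
  G <- rbinom(batchsize, size = 2, prob = 0.3)  # stand-in for child genotype
  E <- rnorm(batchsize)                         # stand-in for the attribute
  affected <- rbinom(batchsize, size = 1, prob = risk(G, E)) == 1
  # Keep only the candidates whose child is affected
  kept <- rbind(kept, data.frame(G = G[affected], E = E[affected]))
}
kept <- kept[seq_len(n_wanted), ]   # trim the overshoot from the last batch
nrow(kept)
```

Larger batches reduce the number of passes through the loop but discard more surplus candidates in the final pass, which is the trade-off behind the batch size advice above.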
A data frame with columns
parent1 |
Test locus genotypes for one parent (heterozygous) coded as 0, 1, 2 representing the number of copies of the index allele. |
parent2 |
Test locus genotype for the other parent. |
child |
Test locus genotypes for the child. |
subpop |
Sub-population membership for the trio. Sub-populations are numbered 0,1,...,k-1, where k is the number of sub-populations. |
attr |
The environmental attribute. |
Ji-Hyung Shin <shin@sfu.ca>, Brad McNeney <mcneney@sfu.ca>, Jinko Graham <jgraham@sfu.ca>
trioGxE, plot.trioGxE, test.trioGxE
# Generate case-parent trios from a population composed of
# two equal-sized subpopulations.
# Set up list of functions to sample from each E distribution
e1<-function(n) {
  return(rnorm(n,mean=(-0.8),sd=sqrt(1-.8^2)))
}
e2<-function(n) {
  return(rnorm(n,mean=(0.8),sd=sqrt(1-.8^2)))
}
# Set up haplotype frequency distributions in the two subpopulations:
# The first subpopulation has a risk allele frequency of 0.1, whereas
# the second subpopulation's frequency is 0.9.
# Set up risk model function.
## Simulate informative case-parent trios under additive linear GxE with a negative slope
riskmod<-function(G,E) {
  n<-length(G)
  # Baseline risk. Affects disease prevalence.
  # The higher the prevalence, the less time wasted
  # rejecting unaffected trios.
  k<-(-2)
  betaG<-log(3)/2
  # Interaction
  betaGE<-(-0.1)
  # Log-linear risk with a linear GxE term
  rr<-exp(k+betaG*G + betaGE*G*E)
  rr[rr>1]<-1 # It is up to the user to make sure there are
              # no probabilities greater than one.
  D<-rbinom(n=n,size=1,prob=rr)
  return(D)
}
# Simulate trio data under haplotype-environment dependence
# when marker locus is causal locus.
# allele frequency in subpop 0 is 0.1, allele frequency in subpop 1 is 0.9.
hapf1=c(0.9, 0, 0, 0.1)
hapf2=c(0.1, 0, 0, 0.9)
simdat.HEdep<-trioSim(n=3000,popfs=c(0.5,0.5),riskmod=riskmod,
edists=list(e1,e2),hapfs=list(hapf1,hapf2),
recomb=0,batchsize=1000)
# Simulate trio data under haplotype-environment independence
# when marker locus is causal locus.
# allele frequency in subpop 0 and subpop 1 is 0.1.
hapf1=hapf2=c(0.9, 0, 0, 0.1)
simdat.HEindep<-trioSim(n=3000,popfs=c(0.5,0.5),riskmod=riskmod,
edists=list(e1,e2),hapfs=list(hapf1,hapf2),
recomb=0,batchsize=1000)