Generates familial time-to-event data for specified study design, genetic model and source of residual familial correlation; the generated data frame also contains family structure (individual's id, father id, mother id, relationship to proband, generation), gender, current age, genotypes of major or second genes.
N.fam |
Number of families to generate. |
design |
Family based study design used in the simulations. Possible choices are: |
variation |
Source of residual familial correlation. Possible choices are: |
depend |
Variance of the frailty distribution. Dependence within families increases with depend value. Default value is 1. |
base.dist |
Choice of baseline hazard distribution. Possible choices are: |
frailty.dist |
Choice of frailty distribution. Possible choices are: |
base.parms |
Vector of parameter values for baseline hazard function.
|
vbeta |
Vector of parameter values for gender, majorgene, and secondgene. |
allelefreq |
Vector of population allele frequencies of major and second disease gene alleles. Frequencies must be between 0 and 1. Default frequencies are 0.02 for the major gene allele and 0.2 for the second gene allele.
dominant.m |
Logical; if TRUE, the genetic model of major gene is dominant, otherwise recessive. |
dominant.s |
Logical; if TRUE, the genetic model of second gene is dominant, otherwise recessive. |
mrate |
Proportion of missing genotypes, value between 0 and 1. Default value is 0. |
hr |
Proportion of high risk families, which include at least two affected members, to be sampled from the two stage sampling. This value should be specified when |
age1 |
Vector of mean and standard deviation for the current age of generation 1 or grandparents. Default values are mean of 65 years and standard deviation of 2.5 years.
age2 |
Vector of mean and standard deviation for the current age of generation 2 or proband generation. Default values are mean of 45 years and standard deviation of 2.5 years.
agemin |
Minimum age of disease onset. Default is 20 years of age. |
The design argument defines the type of family-based design to be simulated. Two variants each of the population-based and clinic-based designs can be chosen: "pop" when the proband is affected, "pop+" when the proband is an affected mutation carrier, "cli" when the proband is affected and at least one parent and one sibling are affected, and "cli+" when the proband is an affected mutation carrier and at least one parent and one sibling are affected. The two-stage design, "twostage", is used to oversample high-risk families, where the proportion of high-risk families to include in the sample is specified by hr. High-risk families are those with multiple (at least two) affected members.
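The high-risk classification described above can be illustrated in base R on a toy data frame. This is only a sketch: the data are made up, the column names famID and status are borrowed from the returned data frame, and the "at least two affected members" rule is applied directly rather than through the package's own sampling code.

```r
# Toy data (made up for illustration): one row per individual.
toy <- data.frame(
  famID  = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
  status = c(1, 0, 0, 1, 1, 0, 1, 1, 1)
)

# Number of affected members per family (the analogue of the naff column).
naff <- tapply(toy$status, toy$famID, sum)

# Assumed rule: a family is high risk when it has at least two affected members.
high_risk <- naff >= 2

# Proportion of high-risk families in this toy sample, comparable to hr.
prop_high_risk <- mean(high_risk)
```

In a two-stage design, hr would set this proportion by design instead of leaving it to chance.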
Age at onset is generated from the penetrance model, where residual familial correlation is induced by either a latent random variable called "frailty" or a second gene shared by family members.
The penetrance model with a shared frailty model has the form
h(t|Z) = h_0(t-t_0) Z \exp(β_s x_s + β_{g1} x_{g1})
where Z represents a frailty shared within families and follows either a gamma or log-normal distribution; t_0 is a minimum age of disease onset; x_s indicates males (1) and females (0) and x_{g1} indicates carriers (1) and non-carriers (0) of major gene mutation.
The penetrance model with a second gene variation has the form
h(t|Z) = h_0(t-t_0) \exp(β_s x_s + β_{g1} x_{g1} + β_{g2} x_{g2})
where x_{g2} indicates carriers (1) and non-carriers (0) of a second gene mutation.
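Both hazard forms imply a penetrance (cumulative risk by age t) of 1 - exp(-H_0(t - t_0) Z exp(linear predictor)), where H_0 is the cumulative baseline hazard. The sketch below evaluates this in base R. It assumes a Weibull cumulative baseline hazard H_0(u) = (lambda * u)^rho, which may not match the package's parameterization, and the helper name penetrance is hypothetical. The coefficient and baseline values are taken from the Examples section.

```r
# Hypothetical helper, not part of the package: penetrance by age t under
# the proportional hazards form h(t|Z) = h_0(t - t_0) Z exp(linear predictor).
penetrance <- function(t, lambda, rho, lin_pred, agemin = 20, z = 1) {
  # Assumed Weibull cumulative baseline hazard: H_0(u) = (lambda * u)^rho.
  u <- pmax(t - agemin, 0)
  cum_haz <- (lambda * u)^rho * z * exp(lin_pred)
  1 - exp(-cum_haz)
}

beta_g1 <- 2.35  # major gene effect, value taken from the Examples section

ages <- c(20, 40, 60, 80)
# Female carrier (x_s = 0, x_g1 = 1) versus female non-carrier (x_s = 0, x_g1 = 0),
# both with frailty z fixed at 1.
pen_carrier    <- penetrance(ages, lambda = 0.01, rho = 3, lin_pred = beta_g1)
pen_noncarrier <- penetrance(ages, lambda = 0.01, rho = 3, lin_pred = 0)
```

Setting z to a value other than 1 scales the cumulative hazard for every member of a family, which is how the shared frailty induces within-family correlation.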
The current ages for each generation are simulated from normal distributions. The probands' ages, however, are generated from a left-truncated normal distribution, because a proband's age cannot be less than the minimum age of onset. The mean ages of each generation and its parents are set at least 20 years apart.
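A left-truncated normal draw can be sketched in base R with the inverse-CDF method. The helper name rnorm_left_trunc is hypothetical, and the package may generate these ages differently. The means and standard deviations are the documented defaults for age1 and age2.

```r
# Hypothetical helper: draw n values from a normal distribution truncated
# to [lower, Inf), using the inverse-CDF method.
rnorm_left_trunc <- function(n, mean, sd, lower) {
  p_lower <- pnorm(lower, mean = mean, sd = sd)  # probability mass below the cut
  u <- runif(n, min = p_lower, max = 1)          # uniform on the retained mass
  qnorm(u, mean = mean, sd = sd)
}

set.seed(1)
# Proband generation: mean 45, sd 2.5, truncated at the minimum age of onset (20).
proband_age <- rnorm_left_trunc(1000, mean = 45, sd = 2.5, lower = 20)
# Grandparent generation: ordinary normal draw with mean 65, sd 2.5.
grandparent_age <- rnorm(1000, mean = 65, sd = 2.5)
```

With the default values the truncation point lies far below the mean, so it rarely binds. It matters when the mean age is set close to agemin.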
Returns an object of class 'simfam', a data frame which contains:
famID |
Family identification number (id). |
indID |
Individual id. |
gender |
Gender indicator: 1 for males, 0 for females. |
motherID |
Mother id number. |
fatherID |
Father id number. |
proband |
Proband indicator: 1 if the individual is the proband, 0 otherwise. |
generation |
Individual's generation: 1 = parents of probands, 2 = probands and siblings, 3 = children of probands and siblings. |
majorgene |
Genotype of major gene: 1=AA, 2=Aa, 3=aa where A is disease gene. |
secondgene |
Genotype of second gene: 1=BB, 2=Bb, 3=bb where B is disease gene. |
ageonset |
Age at disease onset. |
currentage |
Current age. |
time |
Observed time: the minimum of current age and age at onset. |
status |
Disease status: 1 for affected and 0 for unaffected (censored). |
mgene |
Carrier status of major gene, which may be missing: 1 for carrier, 0 for non-carrier, NA for missing carrier status. |
relation |
Family members' relationship with the proband is as follows: |
1 | Proband (self) |
2 | Brother or sister |
3 | Son or daughter |
4 | Parent |
5 | Nephew or niece |
6 | Husband |
7 | Brother or sister in law |
fsize |
Family size including parents, siblings and children of the proband and the siblings. |
naff |
Number of affected members in family. |
weight |
Sampling weights. |
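The relationship between ageonset, currentage, time and status can be checked against the printed output in the Examples section. The sketch below reproduces those two columns in base R. The censoring rule (affected when onset occurs by the current age) is an assumption inferred from that output, not taken from the package source.

```r
# Values copied from the printed output in the Examples section.
ageonset   <- c(70, 110, 36, 212, 79, 169)
currentage <- c(68, 68, 40, 50, 19, 16)

# Observed time is the smaller of the two ages.
time <- pmin(ageonset, currentage)

# Assumed rule: affected (1) when onset occurred by the current age,
# otherwise unaffected and censored at the current age (0).
status <- as.integer(ageonset <= currentage)
```

Only the third individual (the proband) has an onset age below the current age, so only that row has status 1.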
Yun-Hee Choi, Wenqing He
Choi, Y.-H., Kopciuk, K. and Briollais, L. (2008) Estimating Disease Risk Associated with Mutated Genes in Family-Based Designs, Human Heredity 66, 238-251.
Choi, Y.-H. and Briollais, L. (2011) An EM Composite Likelihood Approach for Multistage Sampling of Family Data with Missing Genetic Covariates, Statistica Sinica 21, 231-253.
summary.simfam, plot.simfam, penplot
## Example 1: simulate family data from population-based design using
# a Weibull distribution for the baseline hazard and inducing
# residual familial correlation through a shared gamma frailty.
fam <- simfam(N.fam=100, design="pop+", variation="frailty",
base.dist="Weibull", frailty.dist="gamma", depend=1,
allelefreq=0.02, base.parms=c(0.01,3), vbeta=c(-1.13, 2.35))
head(fam)
## Not run:
famID indID gender motherID fatherID proband generation majorgene secondgene
1 1 1 1 0 0 0 1 2 0
2 1 2 0 0 0 0 1 3 0
3 1 3 0 2 1 1 2 2 0
4 1 4 1 0 0 0 0 3 0
5 1 7 0 3 4 0 3 2 0
6 1 8 1 3 4 0 3 3 0
ageonset currentage time status mgene relation fsize naff weight
1 70 68 68 0 1 4 11 1 1
2 110 68 68 0 0 4 11 1 1
3 36 40 36 1 1 1 11 1 1
4 212 50 50 0 0 6 11 1 1
5 79 19 19 0 1 3 11 1 1
6 169 16 16 0 0 3 11 1 1
## End(Not run)
summary(fam)
plot(fam, famid=c(1:2)) # pedigree plots for families with IDs=1 and 2
## Example 2: simulate family data from two stage design to include
# 30% of high risk families in the sample.
fam <- simfam(N.fam=100, design="twostage", variation="frailty",
base.dist="Weibull", frailty.dist="gamma", depend=1, hr=0.3,
base.parms=c(0.01,3), vbeta=c(-1.13, 2.35), allelefreq=0.02)
summary(fam)