Generate familial time-to-event data
Description
This function generates familial time-to-event data for a specified study design, genetic model, and source of residual familial correlation. The generated data frame also contains the family structure (individual ID, father ID, mother ID, relationship to proband, generation), gender, current age, and genotypes of the major and second genes.
Usage
Arguments
N.fam 
Number of families to generate. 
design 
Family-based study design used in the simulations. Possible choices are "pop", "pop+", "cli", "cli+", and "twostage"; see Details.
variation 
Source of residual familial correlation. Possible choices are: 
depend 
Variance of the frailty distribution. Dependence within families increases with depend value. Default value is 1. 
base.dist 
Choice of baseline hazard distribution. Possible choices are: 
frailty.dist 
Choice of frailty distribution. Possible choices are: 
base.parms 
Vector of parameter values for baseline hazard function.

vbeta 
Vector of parameter values for gender, majorgene, and secondgene. 
allelefreq 
Vector of population allele frequencies of the major and second disease gene alleles. Frequencies must be between 0 and 1. Default frequencies are 0.02 for the major gene allele and 0.2 for the second gene allele.
dominant.m 
logical; if TRUE, the genetic model of major gene is dominant, otherwise recessive. 
dominant.s 
logical; if TRUE, the genetic model of second gene is dominant, otherwise recessive. 
mrate 
Proportion of missing genotypes, value between 0 and 1. Default value is 0. 
hr 
Proportion of high-risk families, which include at least two affected members, to be sampled in the two-stage sampling. This value should be specified when design is "twostage".
age1 
Vector of mean and standard deviation for the current age of generation 1 (grandparents). Default values are a mean of 65 years and a standard deviation of 2.5 years.
age2 
Vector of mean and standard deviation for the current age of generation 2 (proband generation). Default values are a mean of 45 years and a standard deviation of 2.5 years.
agemin 
Minimum age of disease onset. Default is 20 years of age. 
Details
The design argument defines the type of family-based design to be simulated. Two variants each of the population-based and clinic-based designs can be chosen: "pop" when the proband is affected, "pop+" when the proband is an affected mutation carrier, "cli" when the proband is affected and at least one parent and one sibling are affected, and "cli+" when the proband is an affected mutation carrier and at least one parent and one sibling are affected. The two-stage design, "twostage", is used to oversample high-risk families, where the proportion of high-risk families to include in the sample is specified by hr. High-risk families are those with multiple (at least two) affected members.
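The high-risk criterion described above (at least two affected members in a family) can be checked on any data frame that has famID and status columns. The following is a minimal sketch in base R; the toy data frame `toy` and the helper name `flag_high_risk` are hypothetical and are not part of the package.

```r
# Hypothetical helper: flag families with at least `min.aff` affected members.
flag_high_risk <- function(data, min.aff = 2) {
  # Count affected members (status == 1) within each family.
  naff <- tapply(data$status == 1, data$famID, sum)
  # Named logical vector with one entry per family ID.
  naff >= min.aff
}

# Toy data: family 1 has two affected members, family 2 has one.
toy <- data.frame(famID  = c(1, 1, 1, 2, 2),
                  status = c(1, 1, 0, 1, 0))
high <- flag_high_risk(toy)
mean(high)  # proportion of high-risk families in the toy sample
```

Comparing this proportion before and after sampling is one way to see the effect of oversampling with hr.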
Age at onset is generated from the penetrance model, where residual familial correlation is induced by either a latent random variable called "frailty" or a second gene shared by family members.
The penetrance model with a shared frailty model has the form
h(t | Z) = h_0(t - t_0) Z \exp(β_s x_s + β_{g1} x_{g1})
where Z represents a frailty shared within families and follows either a gamma or lognormal distribution; t_0 is a minimum age of disease onset; x_s indicates males (1) and females (0) and x_{g1} indicates carriers (1) and noncarriers (0) of major gene mutation.
The penetrance model with a second gene variation has the form
h(t | Z) = h_0(t - t_0) \exp(β_s x_s + β_{g1} x_{g1} + β_{g2} x_{g2})
where x_{g2} indicates carriers (1) and noncarriers (0) of a second gene mutation.
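To make the penetrance models above concrete, ages at onset can be drawn by inverse-transform sampling from the conditional survival function. The sketch below is illustrative only and is not the package's implementation. It assumes a Weibull cumulative baseline hazard of the form H_0(s) = (lambda * s)^rho, a gamma frailty with mean 1 and variance 1, and independent carrier draws with no Mendelian transmission; the helper name `sim_onset` is hypothetical.

```r
# Hypothetical helper: draw ages at onset by inverse-transform sampling,
# assuming a Weibull cumulative baseline hazard H0(s) = (lambda * s)^rho.
sim_onset <- function(xbeta, Z = 1, lambda = 0.01, rho = 3, agemin = 20) {
  u <- runif(length(xbeta))
  # Solve H0(t - agemin) * Z * exp(xbeta) = -log(u) for t.
  agemin + (-log(u) / (Z * exp(xbeta)))^(1 / rho) / lambda
}

set.seed(1)
n.fam <- 200
fam.size <- 4
# One frailty per family (assumed gamma, mean 1, variance 1), shared by its members.
Z  <- rep(rgamma(n.fam, shape = 1, rate = 1), each = fam.size)
xs <- rbinom(n.fam * fam.size, 1, 0.5)  # 1 = male, 0 = female
xg <- rbinom(n.fam * fam.size, 1, 0.5)  # 1 = carrier, 0 = noncarrier (independent draws)
onset <- sim_onset(xbeta = 1.13 * xs + 2.35 * xg, Z = Z)
tapply(onset, xg, median)  # compare median onset age of noncarriers and carriers
```

For the second-gene model, the same helper could be used with Z = 1 and the term β_{g2} x_{g2} added to xbeta.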
The current ages for each generation are simulated from normal distributions. The probands' ages, however, are generated from a left-truncated normal distribution because their ages cannot be less than the minimum age of onset. The mean ages of each generation and of its parents are set at least 20 years apart.
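The age-generation step can be sketched with the inverse-CDF method for a left-truncated normal distribution. This is a minimal illustration in base R under the default means and standard deviations listed in Arguments; the helper name `rtruncnorm_left` is hypothetical and the package may generate ages differently.

```r
# Hypothetical helper: sample from a normal distribution left-truncated at `lower`
# using the inverse-CDF method.
rtruncnorm_left <- function(n, mean, sd, lower) {
  p.lower <- pnorm(lower, mean, sd)      # probability mass below the truncation point
  u <- runif(n, min = p.lower, max = 1)  # uniform draws on the retained part of the CDF
  qnorm(u, mean, sd)
}

set.seed(2)
gen1    <- rnorm(1000, mean = 65, sd = 2.5)                        # generation 1: plain normal
proband <- rtruncnorm_left(1000, mean = 45, sd = 2.5, lower = 20)  # probands: not below agemin
range(proband)
```

With a mean of 45 and a standard deviation of 2.5, truncation at 20 rarely matters; it becomes relevant when the mean age is close to the minimum age of onset.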
Value
The function returns a data frame which contains:
famID 
Family identification number (id). 
indID 
Individual id. 
gender 
Gender indicator: 1 for males, 0 for females. 
motherID 
Mother id number. 
fatherID 
Father id number. 
proband 
Proband indicator: 1 if the individual is the proband, 0 otherwise. 
generation 
Individual's generation: 1 = parents of probands, 2 = probands and siblings, 3 = children of probands and siblings. 
majorgene 
Genotype of major gene: 1=AA, 2=Aa, 3=aa where A is disease gene. 
secondgene 
Genotype of second gene: 1=BB, 2=Bb, 3=bb where B is disease gene. 
ageonset 
Age at disease onset. 
currentage 
Current age. 
time 
Observed time: the minimum of current age and age at onset. 
status 
Disease status: 1 for affected and 0 for unaffected (censored). 
mgene 
Carrier status of the major gene, which may be missing: 1 for carrier, 0 for noncarrier, NA for missing carrier status. 
relation 
Family members' relationship with the proband is as follows 
1  Proband (self) 
2  Brother or sister 
3  Son or daughter 
4  Parent 
5  Nephew or niece 
6  Husband 
7  Brother or sister in law 
fsize 
Family size including parents, siblings and children of the proband and the siblings. 
naff 
Number of affected members in family. 
weight 
Sampling weights. 
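The relationship between the ageonset, currentage, time and status columns can be illustrated with a few lines of base R. This is a sketch using the ages from the first three rows of the example output on this page; the data frame name `toy` is hypothetical, and treating onset exactly at the current age as affected is an assumption.

```r
# Toy ages taken from the first three rows of the example output.
toy <- data.frame(ageonset = c(70, 110, 36), currentage = c(68, 68, 40))
# Observed time is the earlier of age at onset and current age.
toy$time <- pmin(toy$ageonset, toy$currentage)
# Affected (1) if onset occurred by the current age, otherwise censored (0).
toy$status <- as.integer(toy$ageonset <= toy$currentage)
toy
```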
Author(s)
Yun-Hee Choi, Wenqing He
References
Choi, Y.-H., Kopciuk, K. and Briollais, L. (2008) Estimating Disease Risk Associated with Mutated Genes in Family-Based Designs, Human Heredity 66, 238-251
Choi, Y.-H. and Briollais, L. (2011) An EM Composite Likelihood Approach for Multistage Sampling of Family Data with Missing Genetic Covariates, Statistica Sinica 21, 231-253
See Also
summary.simfam, plot.simfam, penplot
Examples
## Example 1: simulate family data from population-based design using
# a Weibull distribution for the baseline hazard and inducing
# residual familial correlation through a shared gamma frailty.
fam <- simfam(N.fam=100, design="pop+", variation="frailty",
       base.dist="Weibull", frailty.dist="gamma", depend=1,
       allelefreq=0.02, base.parms=c(0.01,3), vbeta=c(1.13, 2.35))
head(fam)
#   famID indID gender motherID fatherID proband generation majorgene secondgene
# 1     1     1      1        0        0       0          1         2          0
# 2     1     2      0        0        0       0          1         3          0
# 3     1     3      0        2        1       1          2         2          0
# 4     1     4      1        0        0       0          0         3          0
# 5     1     7      0        3        4       0          3         2          0
# 6     1     8      1        3        4       0          3         3          0
#   ageonset currentage time status mgene relation fsize naff weight
# 1       70         68   68      0     1        4    11    1      1
# 2      110         68   68      0     0        4    11    1      1
# 3       36         40   36      1     1        1    11    1      1
# 4      212         50   50      0     0        6    11    1      1
# 5       79         19   19      0     1        3    11    1      1
# 6      169         16   16      0     0        3    11    1      1
summary(fam)
plot(fam, famid=c(1:2)) # pedigree plots for families with IDs=1 and 2
## Example 2: simulate family data from two-stage design to include
# 30% of high-risk families in the sample.
fam <- simfam(N.fam=100, design="twostage", variation="frailty",
       base.dist="Weibull", frailty.dist="gamma", depend=1, hr=0.3,
       base.parms=c(0.01,3), vbeta=c(1.13, 2.35), allelefreq=0.02)
summary(fam)
