simfam_c | R Documentation |
Generates familial competing risks data for a specified study design, genetic model and source of residual familial correlation. The generated data frame also contains the family structure (individual's id, father id, mother id, relationship to proband, generation), gender, current age, and genotypes of the major or second genes. Similar to the simfam function, except for the addition of the competing option, which provides competing risks data when competing = TRUE.
simfam_c(N.fam, design = "pop+", variation = "none", interaction = FALSE, depend = NULL, base.dist = c("Weibull", "Weibull"), frailty.dist = "none", base.parms=list(c(0.016, 3),c(0.016, 3)), vbeta=list(c(-1.13, 2.35), c(-1, 2)), competing=TRUE, allelefreq = c(0.02, 0.2), dominant.m = TRUE, dominant.s = TRUE, mrate = 0, hr = 0, probandage = c(45, 2), agemin = 20, agemax = 100)
N.fam |
Number of families to generate. |
design |
Family based study design used in the simulations. Possible choices are: |
variation |
Source of residual familial correlation. Possible choices are: |
interaction |
Logical; if |
depend |
Inverse of variance for the frailty distribution. Dependence within families decreases with depend value. Default is |
base.dist |
Choice of baseline hazard distribution. Possible choices are: |
frailty.dist |
Choice of frailty distribution. Possible choices are: |
base.parms |
Vector of parameter values for the specified baseline hazard function. |
vbeta |
Vector of regression coefficients for gender, majorgene, interaction between gender and majorgene (if |
competing |
Logical; if |
allelefreq |
Vector of population allele frequencies of major and second disease gene alleles. Frequencies must be between 0 and 1. Default frequencies are 0.02 for major gene allele and 0.2 for second gene allele, |
dominant.m |
Logical; if |
dominant.s |
Logical; if |
mrate |
Proportion of missing genotypes, value between 0 and 1. Default value is 0. |
hr |
Proportion of high risk families, which include at least two affected members, to be sampled from the two stage sampling. This value should be specified when |
probandage |
Vector of mean and standard deviation for the proband age. Default values are mean of 45 years and standard deviation of 2 years, |
agemin |
Minimum age of disease onset or minimum age. Default is 20 years of age. |
agemax |
Maximum age of disease onset or maximum age. Default is 100 years of age. |
The design argument defines the type of family-based design to be simulated. Two variants of the population-based and clinic-based designs can be chosen: "pop" when the proband is affected, "pop+" when the proband is an affected mutation carrier, "cli" when the proband is affected and at least one parent and one sibling are affected, "cli+" when the proband is an affected mutation carrier and at least one parent and one sibling are affected. The two-stage design, "twostage", is used to oversample high-risk families, where the proportion of high-risk families to include in the sample is specified by hr. High-risk families include multiple (at least two) affected members.
The ages at onset are generated from the following penetrance models, depending on the choice of variation = "none", "frailty" or "secondgene". When variation = "none", the ages at onset are generated independently from the proportional hazard model conditional on gender and carrier status of the major gene mutation, X = c(xs, xg).
The ages at onset correlated within families are generated from the shared frailty model (variation = "frailty") or the two-gene model (variation = "secondgene"), where the residual familial correlation is induced by a frailty or a second gene, respectively, shared within the family.
The proportional hazard model
h(t|X) = h0(t - t0) exp(βs * xs + βg * xg),
where h0(t) is the baseline hazard function, t0 is the minimum age of disease onset, and xs and xg indicate male (1) or female (0) and carrier (1) or non-carrier (0) of the main gene of interest, respectively.
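As a minimal sketch of how an age at onset can be drawn from this proportional hazard model, the base R code below inverts the survival function under a Weibull baseline. The parameterization with cumulative baseline hazard (lambda * u)^rho and the helper name r_onset_ph are assumptions made for illustration, not taken from the package source; the parameter values reuse the defaults shown in the usage.

```r
# Hypothetical helper: draw ages at onset from
#   h(t|X) = h0(t - t0) * exp(bs * xs + bg * xg)
# Assumed Weibull baseline: H0(u) = (lambda * u)^rho for u = t - t0 >= 0.
r_onset_ph <- function(xs, xg, lambda = 0.016, rho = 3,
                       beta = c(-1.13, 2.35), t0 = 20) {
  stopifnot(length(xs) == length(xg))
  lp <- beta[1] * xs + beta[2] * xg   # linear predictor
  u  <- runif(length(xs))             # S(T|X) is uniform on (0, 1)
  # Solve exp(-(lambda * (t - t0))^rho * exp(lp)) = u for t
  t0 + (-log(u) / exp(lp))^(1 / rho) / lambda
}

set.seed(1)
xs <- c(1, 0, 1, 0)   # gender: 1 = male, 0 = female
xg <- c(1, 1, 0, 0)   # carrier status: 1 = carrier, 0 = non-carrier
ages <- r_onset_ph(xs, xg)
```

With a positive carrier coefficient, simulated carriers tend to have earlier onset than non-carriers of the same gender.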
The shared frailty model
h(t|X,Z) = h0(t - t0) Z exp(βs * xs + βg * xg),
where h0(t) is the baseline hazard function, t0 is the minimum age of disease onset, Z represents a frailty shared within families and follows either a gamma or log-normal distribution, and xs and xg indicate male (1) or female (0) and carrier (1) or non-carrier (0) of the main gene of interest, respectively.
The two-gene model
h(t|X) = h0(t - t0) exp(βs * xs + β1 * x1 + β2 * x2),
where x1, x2 indicate carriers (1) and non-carriers (0) of a major gene and of second gene mutation, respectively.
Competing risk model
Event 1:
h1(t|X,Z) = h01(t - t0) Z1 exp(βs1 * xs + βg1 * xg),
Event 2:
h2(t|X,Z) = h02(t - t0) Z2 exp(βs2 * xs + βg2 * xg),
where h01(t) and h02(t) are the baseline hazard functions for event 1 and event 2, respectively, t0 is the minimum age of disease onset, Z1 and Z2 are frailties shared within families for each event and follow a gamma, log-normal, correlated gamma, or correlated log-normal distribution, and xs and xg indicate male (1) or female (0) and carrier (1) or non-carrier (0) of the main gene of interest, respectively.
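One common way to realize two cause-specific hazards of this form is to draw a latent time for each event and keep the earlier one. The base R sketch below does this; whether simfam_c uses this exact scheme internally is not stated here. The Weibull cumulative hazard (lambda * u)^rho, the helper names r_latent and r_competing, and the fixed frailty values z1 = z2 = 1 are all assumptions for illustration.

```r
# Hypothetical helper: latent time for one event, assuming cumulative hazard
# (lambda * (t - t0))^rho * z * exp(lp).
r_latent <- function(n, lambda, rho, z, lp, t0 = 20) {
  t0 + (-log(runif(n)) / (z * exp(lp)))^(1 / rho) / lambda
}

# Hypothetical helper: competing-risks outcome for individuals with
# covariates xs (gender), xg (carrier status) and shared frailties z1, z2.
r_competing <- function(xs, xg, z1 = 1, z2 = 1,
                        parms = list(c(0.016, 3), c(0.016, 3)),
                        vbeta = list(c(-1.13, 2.35), c(-1, 2))) {
  n  <- length(xs)
  t1 <- r_latent(n, parms[[1]][1], parms[[1]][2], z1,
                 vbeta[[1]][1] * xs + vbeta[[1]][2] * xg)
  t2 <- r_latent(n, parms[[2]][1], parms[[2]][2], z2,
                 vbeta[[2]][1] * xs + vbeta[[2]][2] * xg)
  data.frame(ageonset = pmin(t1, t2),            # first event to occur
             event    = ifelse(t1 <= t2, 1L, 2L)) # which event came first
}

set.seed(10)
out <- r_competing(xs = c(1, 0, 1, 0), xg = c(1, 1, 0, 0))
```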
Choice of frailty distributions for competing risk models
frailty.dist = "gamma" shares the frailties within families generated from a gamma distribution independently for each competing event, where Zj follows Gamma(kj, 1/kj).
frailty.dist = "lognormal" shares the frailties within families generated from a log-normal distribution independently for each competing event, where log(Zj) follows a normal distribution with mean 0 and variance 1/kj.
frailty.dist = "cgamma" shares the frailties within families generated from a correlated gamma distribution to allow the frailties between two events to be correlated, where the correlated gamma frailties (Z1, Z2) are generated from three independent gamma variables (Y0, Y1, Y2):
Y0 from Gamma(k0, 1/k0);
Y1 from Gamma(k1, 1/(k0 + k1));
Y2 from Gamma(k2, 1/(k0 + k2)).
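The step that combines Y0, Y1 and Y2 into (Z1, Z2) is not shown here. One construction consistent with the stated distributions is Z1 = k0/(k0 + k1) Y0 + Y1 and Z2 = k0/(k0 + k2) Y0 + Y2, under which each Zj is Gamma(k0 + kj, 1/(k0 + kj)) with mean 1. That formula is an assumption, as is reading the second gamma argument as a scale parameter; the helper name r_cgamma is hypothetical.

```r
# Hypothetical helper: correlated gamma frailties built from a shared
# component y0 and event-specific components y1, y2 (assumed construction).
r_cgamma <- function(n, k1, k2, k0) {
  y0 <- rgamma(n, shape = k0, scale = 1 / k0)
  y1 <- rgamma(n, shape = k1, scale = 1 / (k0 + k1))
  y2 <- rgamma(n, shape = k2, scale = 1 / (k0 + k2))
  cbind(Z1 = k0 / (k0 + k1) * y0 + y1,
        Z2 = k0 / (k0 + k2) * y0 + y2)
}

set.seed(4321)
z <- r_cgamma(1e5, k1 = 1, k2 = 2, k0 = 0.5)
colMeans(z)          # both close to 1 under this construction
cor(z[, 1], z[, 2])  # positive, induced by the shared y0
```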
frailty.dist = "clognormal"
shares the frailties within families generated from a correlated log-normal distribution where
log(Zj) follows a normal distribution with mean 0, variance 1/kj and correlation between two events k0.
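A minimal base R sketch of this correlated log-normal frailty follows. It assumes that k0 is the correlation between log(Z1) and log(Z2) and must lie in [-1, 1]; the helper name r_clognormal is hypothetical.

```r
# Hypothetical helper: log(Z1), log(Z2) are normal with mean 0,
# variances 1/k1 and 1/k2, and correlation k0 (assumed reading of k0).
r_clognormal <- function(n, k1, k2, k0) {
  stopifnot(abs(k0) <= 1)
  e1 <- rnorm(n)
  e2 <- k0 * e1 + sqrt(1 - k0^2) * rnorm(n)  # cor(e1, e2) = k0
  cbind(Z1 = exp(e1 / sqrt(k1)),
        Z2 = exp(e2 / sqrt(k2)))
}

set.seed(99)
zl <- r_clognormal(1e5, k1 = 1, k2 = 2, k0 = 0.5)
cor(log(zl[, 1]), log(zl[, 2]))
```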
depend should specify the values of the related frailty parameters: c(k1, k2) with frailty.dist = "gamma" or frailty.dist = "lognormal"; c(k1, k2, k0) for frailty.dist = "cgamma" or frailty.dist = "clognormal".
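The pairing between frailty.dist and the length of depend can be checked before a long simulation is started. The helpers below are hypothetical and not part of the package; they only encode the lengths described above, plus the usage default of depend = NULL with frailty.dist = "none".

```r
# Hypothetical helper: number of values `depend` needs for each choice.
n_depend <- function(frailty.dist) {
  switch(frailty.dist,
         none = 0L,                      # depend = NULL
         gamma = , lognormal = 2L,       # c(k1, k2)
         cgamma = , clognormal = 3L,     # c(k1, k2, k0)
         stop("unknown frailty.dist: ", frailty.dist))
}

# Hypothetical helper: TRUE when `depend` has the expected length.
check_depend <- function(depend, frailty.dist) {
  length(depend) == n_depend(frailty.dist)
}

ok_gamma  <- check_depend(c(1, 2), "gamma")
ok_cgamma <- check_depend(c(1, 2, 0.5), "cgamma")
bad_clog  <- check_depend(c(1, 2), "clognormal")  # k0 is missing
```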
The current ages for each generation are simulated assuming normal distributions. However, the probands' ages are generated using a left-truncated normal distribution, as their ages cannot be less than the minimum age of onset. The average age difference between each generation and their parents is set to 20 years.
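A left-truncated normal draw can be sketched in base R by sampling uniformly on the probability mass that lies above the truncation point and mapping it back through the normal quantile function. The helper name r_proband_age is hypothetical, and truncating exactly at agemin is an assumption based on the description above.

```r
# Hypothetical helper: proband ages from a normal distribution
# left-truncated at agemin, via the inverse-CDF method.
r_proband_age <- function(n, mean = 45, sd = 2, agemin = 20) {
  p_min <- pnorm(agemin, mean, sd)      # mass below the truncation point
  u <- runif(n, min = p_min, max = 1)   # uniform on the retained mass
  qnorm(u, mean, sd)
}

set.seed(7)
# A low mean makes the truncation visible; with the defaults (45, 2)
# almost no mass lies below 20.
page <- r_proband_age(1000, mean = 25, sd = 5, agemin = 20)
range(page)
```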
Note that simulating family data under the clinic-based designs ("cli" or "cli+") or the two-stage design can be slower, since the ascertainment criteria for high-risk families are difficult to meet in such settings. In particular, the "cli" design can be slower than the "cli+" design: in the "cli" design the proband's mutation status is randomly selected from a disease population, so his or her family members are less likely to be mutation carriers and less likely to be affected, whereas in the "cli+" design the probands are all mutation carriers, so their family members have a higher chance of being carriers and of being affected by disease. Therefore, the "cli" design requires more iterations to sample high-risk families than the "cli+" design. Simulations under any design that include variation = "frailty" can also be slower, in order to generate families with specific familial correlations induced by the chosen frailty distribution.
Returns an object of class 'simfam'
, a data frame which contains:
famID |
Family identification (ID) numbers. |
indID |
Individual ID numbers. |
gender |
Gender indicators: 1 for males, 0 for females. |
motherID |
Mother ID numbers. |
fatherID |
Father ID numbers. |
proband |
Proband indicators: 1 if the individual is the proband, 0 otherwise. |
generation |
Individual's generation: 1 = parents of probands, 2 = probands and siblings, 3 = children of probands and siblings. |
majorgene |
Genotypes of major gene: 1 = AA, 2 = Aa, 3 = aa, where A is the disease gene. |
secondgene |
Genotypes of second gene: 1 = BB, 2 = Bb, 3 = bb, where B is the disease gene. |
ageonset |
Ages at disease onset in years. |
currentage |
Current ages in years. |
time |
Ages at disease onset for the affected or ages of last follow-up for the unaffected. |
status |
Disease statuses: 1 for affected, 0 for unaffected (censored). When |
mgene |
Major gene mutation indicators: 1 for mutated gene carriers, 0 for mutated gene noncarriers, or |
relation |
Family members' relationship with the proband: |
fsize |
Family size including parents, siblings and children of the proband and the siblings. |
naff |
Number of affected members within family. When |
df1 |
Number of members affected by event 1 within family when |
df2 |
Number of members affected by event 2 within family when |
weight |
Sampling weights. |
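As a rough illustration of how the per-family counts relate to the individual rows, the base R sketch below recomputes event counts from a toy data frame. The data are made up, the coding status = 1 or 2 for event 1 or event 2 is an assumption based on the example output, and treating naff as df1 + df2 is likewise an assumption.

```r
# Toy data (not simulated by the package): two families, status coded as
# 0 = censored, 1 = event 1, 2 = event 2 (assumed coding).
toy <- data.frame(
  famID  = c(1, 1, 1, 1, 2, 2, 2),
  indID  = c(1, 2, 3, 4, 1, 2, 3),
  status = c(0, 1, 2, 0, 1, 1, 0)
)

# Count members affected by each event within family.
df1 <- tapply(toy$status == 1, toy$famID, sum)
df2 <- tapply(toy$status == 2, toy$famID, sum)
counts <- data.frame(famID = as.integer(names(df1)),
                     df1   = as.vector(df1),
                     df2   = as.vector(df2))
counts$naff <- counts$df1 + counts$df2   # assumed: any-event count
counts
```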
Yun-Hee Choi
Choi, Y.-H., Briollais, L., He, W. and Kopciuk, K. (2021) FamEvent: An R Package for Generating and Modeling Time-to-Event Data in Family Designs, Journal of Statistical Software 97 (7), 1-30. doi:10.18637/jss.v097.i07.
Choi, Y.-H., Jung, H., Buys, S., Daly, M., John, E.M., Hopper, J., Andrulis, I., Terry, M.B., Briollais, L. (2021) A Competing Risks Model with Binary Time Varying Covariates for Estimation of Breast Cancer Risks in BRCA1 Families, Statistical Methods in Medical Research 30 (9), 2165-2183. https://doi.org/10.1177/09622802211008945.
Choi, Y.-H., Kopciuk, K. and Briollais, L. (2008) Estimating Disease Risk Associated with Mutated Genes in Family-Based Designs, Human Heredity 66, 238-251.
Choi, Y.-H. and Briollais, L. (2011) An EM Composite Likelihood Approach for Multistage Sampling of Family Data with Missing Genetic Covariates, Statistica Sinica 21, 231-253.
summary.simfam_c, plot.simfam_c, penplot_c
## Example 1: simulate competing risk family data from pop+ design using
# Weibull distribution for both baseline hazards and inducing
# residual familial correlation through a correlated gamma frailty.

set.seed(4321)
fam <- simfam_c(N.fam = 10, design = "pop+", variation = "frailty",
       base.dist = "Weibull", frailty.dist = "cgamma", depend = c(1, 2, 0.5),
       allelefreq = 0.02, base.parms = list(c(0.01, 3), c(0.01, 3)),
       vbeta = list(c(-1.13, 2.35), c(-1, 2)))

head(fam)

## Not run:
  famID indID gender motherID fatherID proband generation majorgene secondgene  ageonset
1     1     1      1        0        0       0          1         3          0 124.23752
2     1     2      0        0        0       0          1         2          0  54.66936
3     1     3      0        2        1       1          2         2          0  32.75208
4     1     4      1        0        0       0          0         3          0 136.44926
5     1    11      1        3        4       0          3         3          0  71.53672
6     1    12      1        3        4       0          3         3          0 152.47073
  currentage     time status true_status mgene relation fsize naff df1 df2 weight
1   65.30602 65.30602      0           2     0        4    25    2   1   1      1
2   68.62107 54.66936      1           1     1        4    25    2   1   1      1
3   47.07842 32.75208      2           2     1        1    25    2   1   1      1
4   45.09295 45.09295      0           2     0        6    25    2   1   1      1
5   25.32819 25.32819      0           1     0        3    25    2   1   1      1
6   22.95059 22.95059      0           2     0        3    25    2   1   1      1
## End(Not run)

summary(fam)

plot(fam, famid = 1) # pedigree plots for family with ID = 1