simfam2: Generate familial time-to-event data with Kinship or IBD...
In FamEvent: Family Age-at-Onset Data Simulation and Penetrance Estimation

simfam2

R Documentation

Generate familial time-to-event data with Kinship or IBD matrices.

Description

Generates familial time-to-event data from a correlated frailty model with kinship and/or IBD matrices, given pedigree data.

Usage

simfam2(inputdata = NULL, IBD = NULL, design = "pop", variation = "none", depend = NULL, 
base.dist = "Weibull", base.parms = c(0.016, 3), var_names = c("gender", "mgene"), 
vbeta = c(1, 1), agemin = 20, hr = NULL)

Arguments

`inputdata`	Data frame containing the variables `famID`, `indID`, `gender`, `motherID`, `fatherID`, `proband`, `generation`, `currentage`, and any other variables to be used in generating time-to-event data.
`IBD`	IBD matrix
`design`	Family-based study design used in the simulations. Possible choices are `"pop"`, `"pop+"`, `"cli"`, `"cli+"`, `"twostage"`, or `"noasc"`. `"pop"` is the population-based design, in which families are ascertained through affected probands; `"pop+"` is similar to `"pop"` but with mutation carrier probands; `"cli"` is the clinic-based design, which includes affected probands with at least one parent and one sib affected; `"cli+"` is similar to `"cli"` but with mutation carrier probands; `"twostage"` is the two-stage design, which randomly samples families from the population in the first stage and oversamples high-risk families (those with at least two affected members) in the second stage; and `"noasc"` applies no ascertainment correction, for families obtained by simple random sampling. Default is `"pop"`.
`variation`	Source of residual familial correlation. Possible choices are: `"kinship"` for correlated frailties within families generated by kinship matrix, `"IBD"` for correlated frailties by IBD matrix, `c("kinship", "IBD")` by both kinship and IBD matrices, or `"none"` for no residual familial correlation. Default is `"none"`.
`depend`	Inverse of variance for the frailty distribution. A single value should be specified when `variation = "IBD"` or `variation = "kinship"` or a vector of two values when `variation = c("kinship", "IBD")`, where the first element corresponds to kinship matrix and the second element corresponds to IBD matrix. Default is `NULL`.
`base.dist`	Choice of baseline hazard distribution. Possible choices are: `"Weibull"`, `"loglogistic"`, `"Gompertz"`, `"lognormal"`, `"gamma"`, or `"logBurr"`. Default is `"Weibull"`.
`base.parms`	Vector of parameter values for the specified baseline hazard function. `base.parms = c(lambda, rho)` should be specified for `base.dist = "Weibull"`, `"loglogistic"`, `"Gompertz"`, `"gamma"`, and `"lognormal"`. For `base.dist = "logBurr"`, three parameters should be specified `base.parms = c(lambda, rho, eta)`. Default value is `base.parms = c(0.016, 3)` for `base.dist = "Weibull"`.
`var_names`	Names of variables to be used in generating time-to-event data. Specified variables should be part of `inputdata`.
`vbeta`	Vector of regression coefficients for the variables specified by `var_names`.
`hr`	Proportion of high-risk families, which include at least two affected members, to be sampled in the two-stage sampling. This value should be specified when `design = "twostage"` and should lie between 0 and 1. Default value is 0.
`agemin`	Minimum age of disease onset or minimum age. Default is 20 years of age.

Details

The ages at onset are generated from the correlated frailties and covariates using the following model:

The correlated shared frailty model with kinship and/or IBD matrices

h(t|X,Z) = h₀(t - t₀) Z exp(Xβ),

where h₀(t) is the baseline hazard function, t₀ is the minimum age of disease onset, and Z is a vector of frailties following a multivariate log-normal distribution with mean 0 and variance 2*K*sig1 + D*sig2. Here K is the kinship matrix, D is the IBD matrix, and sig1 and sig2 are the variance components associated with each matrix; their values are specified through depend = c(1/sig1, 1/sig2). X is a vector of variables whose names are specified by var_names, and β is the vector of corresponding coefficients, whose values are specified by vbeta.

The variance structure of the frailties shared within families is selected with variation = "kinship", variation = "IBD", or both with variation = c("kinship", "IBD").
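
As a rough illustration of this variance structure, the following base R sketch builds the matrix 2*K*sig1 + D*sig2 for one hypothetical nuclear family (two parents, two children) and draws log-normal frailties from it. The kinship matrix is entered by hand, the identity IBD matrix is only a placeholder, and both the Cholesky-based sampler and the reading of the variance as the covariance of log(Z) are assumptions made for illustration; none of this is taken from the package source.

```r
# Hypothetical nuclear family: father, mother, child1, child2.
# Kinship coefficients: 0.5 with oneself, 0.25 for parent-child and
# full-sibling pairs, 0 between the (unrelated) parents.
K <- matrix(c(0.50, 0.00, 0.25, 0.25,
              0.00, 0.50, 0.25, 0.25,
              0.25, 0.25, 0.50, 0.25,
              0.25, 0.25, 0.25, 0.50),
            nrow = 4, byrow = TRUE)
D <- diag(1, 4)                    # placeholder IBD matrix (identity)

depend <- c(1, 2)                  # c(1/sig1, 1/sig2)
sig1 <- 1 / depend[1]
sig2 <- 1 / depend[2]

# Assumption: this matrix is the covariance of the log-frailties.
Sigma <- 2 * K * sig1 + D * sig2

set.seed(1)
L <- t(chol(Sigma))                          # lower-triangular Cholesky factor
logZ <- as.vector(L %*% rnorm(nrow(Sigma)))  # mean-zero multivariate normal draw
Z <- exp(logZ)                               # one positive frailty per member
```

With depend = c(1, 2), each diagonal entry of Sigma is 2*0.5*1 + 1*0.5 = 1.5, and the parent-child and sibling covariances are 2*0.25*1 = 0.5, so relatives share part of their frailty while the unrelated parents do not.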

When variation = "none", the ages at onset are generated independently from the proportional hazards model conditional on the covariates X.
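
In the independent case, ages at onset can be drawn by inverting the survival function. The base R sketch below does this under an assumed Weibull parameterization with cumulative baseline hazard H₀(t) = (lambda*t)^rho. That parameterization, the helper name sim_onset, and the toy covariate values are assumptions for illustration and may differ from what the package does internally.

```r
# Hypothetical helper: draw ages at onset from
#   h(t | X, Z) = h0(t - t0) * Z * exp(X %*% beta),
# assuming a Weibull baseline with H0(t) = (lambda * t)^rho.
# Z = 1 corresponds to no frailty (variation = "none").
sim_onset <- function(X, beta, lambda = 0.016, rho = 3, agemin = 20, Z = 1) {
  lp <- as.vector(X %*% beta)   # linear predictor X * beta
  u  <- runif(nrow(X))          # survival probabilities, uniform on (0, 1)
  # Solve exp(-(lambda * (T - agemin))^rho * Z * exp(lp)) = u for T
  agemin + (-log(u) / (Z * exp(lp)))^(1 / rho) / lambda
}

set.seed(2)
X <- cbind(gender = c(1, 0, 1, 0), mgene = c(1, 1, 0, 0))
ageonset <- sim_onset(X, beta = c(1, 1))
```

Every simulated age exceeds agemin by construction, and larger values of the linear predictor shift onset toward earlier ages.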

The design argument defines the type of family-based design to be simulated. Two variants each of the population-based and clinic-based designs can be chosen: "pop" when the proband is affected, "pop+" when the proband is an affected mutation carrier, "cli" when the proband is affected and at least one parent and one sibling are affected, and "cli+" when the proband is an affected mutation carrier and at least one parent and one sibling are affected. The two-stage design, "twostage", is used to oversample high-risk families, where the proportion of high-risk families to include in the sample is specified by hr. High-risk families are those with multiple (at least two) affected members. Use design = "noasc" when no ascertainment correction is needed.
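
To make the ascertainment conditions concrete, here is a small base R sketch that checks the "pop" and "pop+" criteria on a toy data frame. The data values and the helper name ascertained are invented for illustration, and the package may apply these conditions differently (for example, during simulation rather than as a filter afterwards).

```r
# Toy pedigree data: one row per individual (values are made up).
ped <- data.frame(
  famID   = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
  proband = c(1, 0, 0, 1, 0, 0, 1, 0, 0),
  status  = c(1, 0, 1, 0, 1, 0, 1, 0, 0),   # 1 = affected
  mgene   = c(1, 0, 1, 1, 1, 0, 0, 0, 0)    # 1 = mutation carrier
)

# Hypothetical helper: IDs of families meeting the "pop" or "pop+" criterion.
ascertained <- function(ped, design = c("pop", "pop+")) {
  design <- match.arg(design)
  pb <- ped[ped$proband == 1, ]              # one proband row per family
  keep <- pb$status == 1                     # proband must be affected
  if (design == "pop+") {
    keep <- keep & pb$mgene == 1             # and must carry the mutation
  }
  pb$famID[keep]
}

ascertained(ped, "pop")    # families 1 and 3
ascertained(ped, "pop+")   # family 1 only
```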

Value

Returns an object of class 'simfam', a data frame which contains inputdata and the following:

`ageonset`	Ages at disease onset in years.
`time`	Ages at disease onset for the affected or ages of last follow-up for the unaffected.
`status`	Disease statuses: 1 for affected, 0 for unaffected (censored).
`fsize`	Family size including parents, siblings and children of the proband and the siblings.
`naff`	Number of affected members in family.
`weight`	Sampling weights.

References

Choi, Y.-H., Briollais, L., He, W. and Kopciuk, K. (2021) FamEvent: An R Package for Generating and Modeling Time-to-Event Data in Family Designs, Journal of Statistical Software 97 (7), 1-30. doi:10.18637/jss.v097.i07

Choi, Y.-H., Kopciuk, K. and Briollais, L. (2008) Estimating Disease Risk Associated with Mutated Genes in Family-Based Designs, Human Heredity 66, 238-251.

Choi, Y.-H. and Briollais, L. (2011) An EM Composite Likelihood Approach for Multistage Sampling of Family Data with Missing Genetic Covariates, Statistica Sinica 21, 231-253.

Examples


## Example: simulate family data from a population-based design using
#  a Weibull distribution for the baseline hazard and inducing 
#  residual familial correlation through kinship and IBD matrices.

# inputdata and an IBD matrix should be provided;
# simulated inputdata is used here as an example.

data <- simfam(N.fam = 10, design = "noasc", variation = "none",
         base.dist = "Weibull", base.parms = c(0.016, 3), vbeta = c(1, 1))

IBDmatrix <- diag(1, dim(data)[1])
data <- data[ , c(1:7, 11, 14)]

fam2 <- simfam2(inputdata = data, IBD = IBDmatrix, design = "pop", 
        variation = c("kinship","IBD"), depend = c(1, 1), 
        base.dist = "Weibull", base.parms = c(0.016, 3),
        var_names = c("gender", "mgene"), vbeta = c(1,1),
        agemin=20) 

head(fam2)

summary(fam2)

FamEvent documentation built on July 3, 2024, 5:07 p.m.