simfam2 R Documentation

Generate familial time-to-event data with Kinship or IBD matrices.

Description

Generate familial time-to-event data from a correlated frailty model with kinship and/or IBD matrices, given pedigree data.

Usage

simfam2(inputdata = NULL, IBD = NULL, design = "pop", variation = "none", depend = NULL, 
base.dist = "Weibull", base.parms = c(0.016, 3), var_names = c("gender", "mgene"), 
vbeta = c(1, 1), agemin = 20, hr = NULL)

Arguments

inputdata

Data frame containing the variables famID, indID, gender, motherID, fatherID, proband, generation and currentage, together with any other variables used to generate the time-to-event data (see var_names).

IBD

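Identity-by-descent (IBD) matrix, with one row and one column per individual in inputdata. Used when variation includes "IBD".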

design

Family-based study design used in the simulations. Possible choices are "pop", "pop+", "cli", "cli+", "twostage", and "noasc".

"pop" is a population-based design with affected probands; "pop+" is similar to "pop" but with mutation-carrying probands; "cli" is a clinic-based design with affected probands and at least one affected parent and one affected sibling; "cli+" is similar to "cli" but with mutation-carrying probands; "twostage" is a two-stage design that oversamples high-risk families (with at least two affected members); and "noasc" corresponds to simple random sampling without ascertainment correction. Default is "pop".

variation

Source of residual familial correlation. Possible choices are "kinship" for correlated frailties based on a kinship matrix, "IBD" for correlated frailties based on an IBD matrix, c("kinship", "IBD") for correlated frailties based on both matrices, and "none" for no residual familial correlation. Default is "none".

depend

Inverse of the variance of the frailty distribution. Specify a single value when variation = "kinship" or variation = "IBD", or a vector of two values when variation = c("kinship", "IBD"), where the first element corresponds to the kinship matrix and the second to the IBD matrix. Default is NULL.

base.dist

Choice of baseline hazard distribution. Possible choices are "Weibull", "loglogistic", "Gompertz", "lognormal", "gamma", and "logBurr". Default is "Weibull".

base.parms

Vector of parameter values for the specified baseline hazard function. Specify base.parms = c(lambda, rho) for base.dist = "Weibull", "loglogistic", "Gompertz", "gamma", or "lognormal". For base.dist = "logBurr", three parameters must be specified: base.parms = c(lambda, rho, eta). Default is base.parms = c(0.016, 3), which corresponds to the default base.dist = "Weibull".

var_names

Names of variables to be used in generating time-to-event data. Specified variables should be part of inputdata.

vbeta

Vector of regression coefficients for the variables specified by var_names.

hr

Proportion of high-risk families (families with at least two affected members) to be sampled under the two-stage design. This value must be specified when design = "twostage" and should lie between 0 and 1. Default is NULL.

agemin

Minimum age of disease onset, t_0 in the model under Details. Default is 20 years of age.
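As a worked illustration of base.parms and agemin, the sketch below computes the penetrance (probability of onset by a given age) implied by the default Weibull baseline. It assumes, without confirmation from this page, that the Weibull baseline is parameterized as h_0(t) = lambda*rho*(lambda*t)^(rho - 1), so that the cumulative hazard is H_0(t) = (lambda*t)^rho. The function penetrance and its lp argument are hypothetical, not part of FamEvent.

```r
# Hypothetical helper, not part of FamEvent.
# ASSUMED Weibull parameterization: H0(t) = (lambda * t)^rho.
# lp is the linear predictor X %*% beta; the frailty is fixed at Z = 1.
penetrance <- function(age, lambda = 0.016, rho = 3, agemin = 20, lp = 0) {
  t <- pmax(age - agemin, 0)             # time at risk since agemin
  cumhaz <- (lambda * t)^rho * exp(lp)   # H0(age - agemin) * exp(X beta)
  1 - exp(-cumhaz)                       # P(onset <= age)
}

penetrance(70)          # (0.016 * 50)^3 = 0.512, so about 0.401
penetrance(70, lp = 2)  # gender = 1, mgene = 1, vbeta = c(1, 1): about 0.977
penetrance(18)          # before agemin: 0
```

With vbeta = c(1, 1), a male mutation carrier has hazard multiplied by exp(2), which is why his penetrance by age 70 is far higher than the baseline value.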

Details

The ages at onset are generated from the correlated frailties and covariates using the following model:

The correlated shared frailty model with kinship and/or IBD matrices

h(t|X,Z) = h_0(t - t_0) Z \exp(X\beta),

where h_0(t) is the baseline hazard function and t_0 is the minimum age of disease onset (agemin). Z is a vector of frailties following a multivariate log-normal distribution: log(Z) is multivariate normal with mean 0 and covariance matrix 2*K*sig1 + D*sig2, where K is the kinship matrix, D is the IBD matrix, and sig1 and sig2 are the variance components associated with each matrix, specified through depend = c(1/sig1, 1/sig2). X is a vector of variables whose names are given by var_names, and \beta is the vector of corresponding coefficients, given by vbeta.
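Ages at onset under this model can be simulated by inverse-transform sampling: because S(t | X, Z) = exp{-H_0(t - t_0) Z exp(X\beta)} for t >= t_0, setting S equal to a Uniform(0, 1) draw U and solving for t gives an onset age. The R sketch below does this for a Weibull baseline, assuming H_0(s) = (lambda*s)^rho. sim_onset is a hypothetical helper illustrating the technique, not FamEvent's internal code.

```r
# Hypothetical helper illustrating inverse-transform sampling; not FamEvent code.
# ASSUMED Weibull baseline: H0(s) = (lambda * s)^rho.
sim_onset <- function(X, beta, Z, lambda = 0.016, rho = 3, t0 = 20) {
  X <- as.matrix(X)
  u <- runif(nrow(X))                     # U ~ Uniform(0, 1)
  mult <- Z * exp(drop(X %*% beta))       # Z * exp(X beta) per individual
  # Solve exp(-(lambda * s)^rho * mult) = u for s = t - t0.
  t0 + (-log(u) / mult)^(1 / rho) / lambda
}

set.seed(2)
n <- 2000
X <- cbind(gender = rbinom(n, 1, 0.5), mgene = rbinom(n, 1, 0.3))
onset <- sim_onset(X, beta = c(1, 1), Z = rep(1, n))  # Z = 1: variation = "none"
tapply(onset, X[, "mgene"], median)                   # carriers have earlier onset
```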

The covariance structure of the frailties shared within families is selected with variation = "kinship", variation = "IBD", or variation = c("kinship", "IBD") to use both matrices.
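A minimal sketch of drawing the correlated log-normal frailties for one family, using the covariance 2*K*sig1 + D*sig2 with depend = c(1/sig1, 1/sig2) as described above: log(Z) is drawn as a multivariate normal vector through a Cholesky factor. draw_frailty is a hypothetical helper, not FamEvent code. The kinship matrix in the usage lines (a nuclear family with two unrelated parents and two children, 0.5 on the diagonal and 0.25 for parent-child and sibling pairs) is written by hand for illustration, and the identity IBD matrix mirrors the placeholder used in the Examples.

```r
# Hypothetical helper, not part of FamEvent.
# K: kinship matrix, D: IBD matrix, depend: inverse variance component(s).
draw_frailty <- function(K, D = NULL, variation = "none", depend = NULL) {
  n <- nrow(K)
  if (identical(variation, "none")) return(rep(1, n))   # no residual correlation
  Sigma <- switch(
    paste(sort(variation), collapse = "+"),
    "kinship"     = 2 * K / depend[1],                    # 2 * K * sig1
    "IBD"         = D / depend[1],                        # D / depend
    "IBD+kinship" = 2 * K / depend[1] + D / depend[2],    # depend = c(1/sig1, 1/sig2)
    stop("unsupported value of variation")
  )
  L <- t(chol(Sigma))             # lower-triangular factor: Sigma = L %*% t(L)
  exp(drop(L %*% rnorm(n)))       # Z = exp(log Z), log Z ~ N(0, Sigma)
}

K <- matrix(c(0.50, 0.00, 0.25, 0.25,
              0.00, 0.50, 0.25, 0.25,
              0.25, 0.25, 0.50, 0.25,
              0.25, 0.25, 0.25, 0.50), nrow = 4, byrow = TRUE)
D <- diag(4)
set.seed(3)
z <- draw_frailty(K, D, variation = c("kinship", "IBD"), depend = c(1, 1))
```

Larger depend values shrink the variance components, so the frailties move closer to 1 and the residual familial correlation weakens.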

When variation = "none", the ages at onset are generated independently from the proportional hazards model, conditional on the covariates X.

The design argument specifies the type of family-based design to be simulated. Two variants each of the population-based and clinic-based designs are available: "pop", where the proband is affected; "pop+", where the proband is affected and a mutation carrier; "cli", where the proband is affected and at least one parent and one sibling are also affected; and "cli+", where the proband is affected, is a mutation carrier, and has at least one affected parent and one affected sibling. The two-stage design, "twostage", oversamples high-risk families, which include multiple (at least two) affected members; the proportion of high-risk families in the sample is specified by hr. Finally, design = "noasc" applies no ascertainment criteria.
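The ascertainment rules and the two-stage selection described above can be sketched in R as follows. This is a hypothetical illustration, not FamEvent's implementation: it assumes one proband per family, that mgene == 1 marks a mutation carrier, that founders have motherID = fatherID = 0, and that two-stage sampling simply draws round(hr * n_fam) high-risk families (naff >= 2) and fills the rest with other families.

```r
# Hypothetical helpers, not part of FamEvent.
# fam: one family with columns indID, motherID, fatherID, proband, status, mgene.
meets_design <- function(fam, design = "pop") {
  if (design == "noasc") return(TRUE)            # no ascertainment criteria
  p <- fam[fam$proband == 1, ]                   # assumes exactly one proband
  affected <- p$status == 1
  carrier  <- p$mgene == 1
  par_aff  <- any(fam$status[fam$indID %in% c(p$motherID, p$fatherID)] == 1)
  is_sib   <- fam$indID != p$indID & p$motherID != 0 &
              fam$motherID == p$motherID & fam$fatherID == p$fatherID
  sib_aff  <- any(fam$status[is_sib] == 1)
  switch(design,
    "pop"  = affected,
    "pop+" = affected && carrier,
    "cli"  = affected && par_aff && sib_aff,
    "cli+" = affected && carrier && par_aff && sib_aff,
    stop("design not covered by this sketch"))
}

# Two-stage selection: a proportion hr of the n_fam sampled families are
# high-risk (at least two affected members); naff has one entry per family.
two_stage_sample <- function(naff, n_fam, hr) {
  stopifnot(hr >= 0, hr <= 1)
  high <- which(naff >= 2)
  low  <- which(naff < 2)
  n_high <- round(hr * n_fam)
  stopifnot(length(high) >= n_high, length(low) >= n_fam - n_high)
  c(high[sample.int(length(high), n_high)],
    low[sample.int(length(low), n_fam - n_high)])
}

fam <- data.frame(indID = 1:4, motherID = c(0, 0, 2, 2), fatherID = c(0, 0, 1, 1),
                  proband = c(0, 0, 1, 0), status = c(1, 0, 1, 1),
                  mgene = c(1, 0, 1, 0))
meets_design(fam, "cli+")    # TRUE: affected carrier proband, affected father and sibling

naff <- rep(0:3, each = 25)  # 100 hypothetical families
set.seed(4)
idx <- two_stage_sample(naff, n_fam = 40, hr = 0.25)
sum(naff[idx] >= 2)          # 10 high-risk families
```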

Value

Returns an object of class 'simfam', a data frame which contains inputdata and the following:

ageonset

Ages at disease onset in years.

time

Ages at disease onset for the affected or ages of last follow-up for the unaffected.

status

Disease statuses: 1 for affected, 0 for unaffected (censored).

fsize

Family size including parents, siblings and children of the proband and the siblings.

naff

Number of affected members in family.

weight

Sampling weights.
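To show how time, status and naff relate, here is a small R sketch. It assumes, as the descriptions of time and status suggest, that an individual is affected when the simulated onset age does not exceed currentage and is otherwise censored at currentage. derive_outcomes is a hypothetical helper, not FamEvent code, and it does not reproduce fsize or weight.

```r
# Hypothetical helper, not part of FamEvent.
# ASSUMPTION: follow-up ends at currentage, so onset is observed only if
# ageonset <= currentage.
derive_outcomes <- function(famID, ageonset, currentage) {
  status <- as.integer(ageonset <= currentage)  # 1 = affected, 0 = censored
  time   <- pmin(ageonset, currentage)          # onset age or age at last follow-up
  naff   <- ave(status, famID, FUN = sum)       # affected count, repeated per member
  data.frame(famID, time, status, naff)
}

out <- derive_outcomes(famID      = c(1, 1, 1, 2, 2),
                       ageonset   = c(55, 80, 42, 90, 61),
                       currentage = c(70, 72, 45, 50, 66))
out$status  # 1 0 1 0 1
out$naff    # 2 2 2 1 1
```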

See Also

summary.simfam2, plot.simfam, penplot

Examples


## Example: simulate family data from a population-based design using
#  a Weibull distribution for the baseline hazard and inducing 
#  residual familial correlation through kinship and IBD matrices.

# inputdata and the IBD matrix must be supplied by the user;
# here inputdata is simulated with simfam() for illustration.

data <- simfam(N.fam = 10, design = "noasc", variation = "none",
         base.dist = "Weibull", base.parms = c(0.016, 3), vbeta = c(1, 1))

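# Placeholder IBD matrix: the identity, one row/column per individual
IBDmatrix <- diag(1, nrow(data))
# Keep the pedigree variables (columns 1-7), currentage and mgene
data <- data[ , c(1:7, 11, 14)]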

fam2 <- simfam2(inputdata = data, IBD = IBDmatrix, design = "pop", 
        variation = c("kinship","IBD"), depend = c(1, 1), 
        base.dist = "Weibull", base.parms = c(0.016, 3),
        var_names = c("gender", "mgene"), vbeta = c(1,1),
        agemin=20) 

head(fam2)

summary(fam2)


FamEvent documentation built on May 15, 2026, 1:06 a.m.