penmodelEM: EM algorithm for estimating the penetrance model with missing... In FamEvent: Family Age-at-Onset Data Simulation and Penetrance Estimation

Description

Fits a penetrance model for family data with missing genotypes via the EM algorithm and provides model parameter estimates and corresponding gender- and genotype-specific penetrance estimates.

Usage

 1 2 penmodelEM(parms, vbeta, data, design="pop", base.dist="Weibull", robust=FALSE, method="data", mode="dominant", q=0.02) 

Arguments

 parms Vector of initial values for baseline parameters. parms=c(lambda, rho), where lambda and rho are the initial values for the scale and shape parameters, respectively. For the "lognormal" baseline distribution, rho > 0; for the other baselinse distributions, lambda > 0 and rho > 0. vbeta Vector of initial values for the regression coefficients for gender and majorgene, vbeta=c(beta.s, beta.g). data Data frame generated from simfam or data frame containing specific variables: famID, indID, generation, gender, currentage, mgene, time, status and weight with attr(data,"agemin") specified. design Study design of the family data. Possible choices are: "pop", "pop+", "cli", "cli+" or "twostage", where "pop" is for the population-based design with affected probands whose mutation status can be either carrier or non-carrier, "pop+" is similar to "pop" but with mutation carrier probands, "cli" is for the clinic-based design that includes affected probands with at least one parent and one sib affected, "cli+" is similar to "cli" but with mutation carrier probands, and "twostage" is for the two-stage design with oversampling of high risks families. Default is "pop". base.dist Choice of baseline hazard distribution to fit. Possible choices are: "Weibull", "loglogistic", "Gompertz", "lognormal", or "gamma". Default is "Weibull". robust Logical; if TRUE, use robust 'sandwich' standard errors and variance matrix, otherwise use conventional standard errors and variance matrix. method Choice of methods for calculating the carrier probabilities for individuals with missing mutation status. Possible choices are "data" for empirical calculation of the carrier probabilities based on the observed carriers' statuses in the entire sample, specific to generation and proband's mutation status or "mendelian" for calculating carrier probabilities based on Mendelian transmission probabilies with the given allele frequency and mutation statuses observed in the family. Default is "data". If method="mendelian", specify both mode of the inheritance and the allele frequency q. mode Choice of modes of inheritance for calculating carrier probabilies for individuals with missing mutation status. Possible choices are "dominant" for dominant model or "recessive" for recessive model. Default is "dominant". q Frequency of the disease causing allele used for calculating carrier pobabilities. The value should be between 0 and 1. If NULL, the estimated allele frequency from data will be used. Default value is 0.02.

Details

The expectation and maximization (EM) algorithm is applied for making inference about the missing genotypes. In the expectation step, for individuals with unknown carrier status, we first compute their carrier probabilities given their family's observed phenotype and genotype information based on current estimates of parameters θ

w_{fi} = P(X_{fi}=1|Y_{fi}, X^o_f) ,

where X_{fi} represents the mutation carrier status and Y_{fi} represents the phenotype (t_{fi}, δ_{fi}) in terms of age at onset t_{fi} and disease status δ_{fi} for individual i in family f and X^o_f represents the observed genotypes in family f.

Then, we obtain the conditional expectation of the log-likelihood function of the complete data given the observed data as a weighted log-likelihood, which has the form

E_{θ} [\ell (θ) | Y, X^o)] = ∑_f^n ∑_i^{n_f} \ell_{fi}(θ | X_{fi}=1) w_{fi} + \ell_{fi}(θ | X_{fi}=0) (1-w_{fi}),

In the maximization step, the updated parameter estimates are obtained by maximizing the weighted log likelihood computed in the E-step.

These expectation and maximization steps iterate until convergence to obtain the maximum likelihood estimates.

See more details in Choi and Briollais (2011) or Choi et al. (2014).

Transformed baseline parameters (λ, ρ) were used for estimation; see penmodel for details.

Value

Returns an object of class 'penmodel', including the following elements:

 coefficients Parameter estimates of transformed baseline parameters (λ, ρ) and regression coefficients for gender and mutation status (β_s, β_g) including their standard errors and also their robust standard errors. varcov Variance covariance matrix of parameter estimates. If robust=TRUE, robust ‘sandwich’ variance covariance matrix is returned. se Standard errors of parameter estimates. If robust=TRUE, robust ‘sandwich’ standard errors are returned. pen70.est Penetrance estimates by age 70 specific to gender and mutation-status subgroups. pen70.se Standard errors of penetrance estimates by age 70 specific to gender and mutation-status subgroups. pen70.ci 95% confidence interval estimates of penetrance by age 70 specific to gender and mutation-status subgroups. ageonset Vector of ages of onset ranging from agemin to 90 years. pen.maleCarr Vector of penetrance estimates for male carriers from agemin to 90 years. pen.femaleCarr Vector of penetrance estimates for female carriers from agemin to 90 years. pen.maleNoncarr Vector of penetrance estimates for male non-carriers from agemin to 90 years. pen.femaleNoncarr Vector of penetrance estimates for female non-carriers from agemin to 90 years. logLik Loglikelihood value for the fitted penetrance model.

Yun-Hee Choi

References

Choi, Y.-H. and Briollais, L. (2011) An EM composite likelihood approach for multistage sampling of family data with missing genetic covariates, Statistica Sinica 21, 231-253.

Choi, Y.-H., Briollais, L., Green, J., Parfrey, P., and Kopciuk, K. (2014) Estimating successive cancer risks in Lynch Syndrome families using a progressive three-state model, Statistics in Medicine 33, 618-638.

simfam, penmodel, print.penmodel, summary.penmodel, print.summary.penmodel,
plot.penmodel,carrierprob
  1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 # Family data simulated with 30% of members missing their genetic information. fam <- simfam(N.fam=100, design="pop+", base.dist="Weibull", base.parms=c(0.01,3), vbeta=c(-1.13, 2.35), agemin=20, allelefreq=0.02, mrate=0.3) # EM algorithm for fitting family data with missing genotypes fit <- penmodelEM(parms=c(0.01, 3), vbeta=c(-1.13, 2.35), data=fam, design="pop+", base.dist="Weibull", method="mendelian", mode="dominant", q=NULL) # Summary of the model parameter and penetrance estimates from model fit # by penmodelEM summary(fit) # Generate the lifetime penetrance curves from model fit for gender and # mutation status groups along with their non-parametric penetrance curves # based on observed data plot(fit)