EM algorithm for estimating the penetrance model with missing genotypes
Description
This function fits family data with missing genotypes via the EM algorithm and provides model parameter estimates and the corresponding gender- and genotype-specific penetrance estimates.
Usage
penmodelEM(parms, vbeta, data, design="pop", base.dist="Weibull",
  method="data", mode="dominant", q=0.02)

Arguments
parms 
Vector of initial values for baseline parameters.

vbeta 
Vector of initial values for the regression coefficients for gender and major gene.

data 
Family data structure should follow the format of the data generated from 
design 
The study design of the family data. Possible choices are: 
base.dist 
Choice of baseline hazard distribution to fit. Possible choices are: 
method 
Choice of method for calculating the carrier probabilities for individuals with missing mutation status. Possible choices are If 
mode 
Choice of mode of inheritance for calculating carrier probabilities for individuals with missing mutation status. Possible choices are 
q 
Frequency of the disease-causing allele used for calculating carrier probabilities. The value should be between 0 and 1. If 
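As a point of reference for the q and mode arguments, the sketch below shows how an allele frequency translates into a population carrier probability under Hardy-Weinberg equilibrium. The helper carrier_prior() is hypothetical and not part of the package; the carrier probabilities described in Details additionally condition on the genotypes observed in the family, which this sketch ignores.

```r
# Hypothetical helper: population (Hardy-Weinberg) probability that an
# individual carries the disease-causing allele, given allele frequency q.
# This is only the unconditional prior; relatives' genotypes are ignored.
carrier_prior <- function(q, mode = c("dominant", "recessive")) {
  mode <- match.arg(mode)
  stopifnot(is.numeric(q), all(q > 0 & q < 1))
  switch(mode,
    # dominant: at least one copy of the allele, 1 - (1 - q)^2
    dominant  = 1 - (1 - q)^2,
    # recessive: two copies of the allele, q^2
    recessive = q^2)
}

carrier_prior(0.02, "dominant")   # 0.0396
carrier_prior(0.02, "recessive")  # 4e-04
```

With the default q = 0.02 a dominant carrier is about twice as common as the allele itself, while a recessive genotype is far rarer.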
Details
The expectation-maximization (EM) algorithm is used to make inference about the missing genotypes. In the expectation step, for each individual with unknown carrier status, we first compute the carrier probability given the family's observed phenotype and genotype information, based on the current estimates of the parameters θ:
w_{fi} = P(X_{fi} = 1 | Y_{fi}, X^o_f),
where X_{fi} represents the mutation carrier status and Y_{fi} represents the phenotype (t_{fi}, δ_{fi}) in terms of age at onset t_{fi} and disease status δ_{fi} for individual i in family f and X^o_f represents the observed genotypes in family f.
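The E-step weight can be read as Bayes' rule applied to one individual: the prior P(X_{fi}=1 | X^o_f) obtained from the family's observed genotypes is updated by the likelihood of the individual's own phenotype under each carrier status. A minimal sketch, assuming a Weibull proportional-hazards model with baseline cumulative hazard (λt)^ρ and assuming Y_{fi} is independent of X^o_f given X_{fi}; the functions loglik_i() and estep_weight() and the parameter values are hypothetical, and the prior is taken as an input rather than computed.

```r
# Log-likelihood contribution l(theta | X = g) of each person under an
# assumed Weibull proportional-hazards model with H0(t) = (lambda * t)^rho;
# time is age at onset or censoring, delta the disease status, s gender.
loglik_i <- function(theta, time, delta, s, g) {
  eta <- theta[3] * s + theta[4] * g                 # beta_s * s + beta_g * g
  log_h <- log(theta[1] * theta[2]) + (theta[2] - 1) * log(theta[1] * time) + eta
  delta * log_h - (theta[1] * time)^theta[2] * exp(eta)
}

# w = P(X = 1 | Y, X^o) by Bayes' rule; prior = P(X = 1 | X^o) is taken as
# given (for example, from a Mendelian calculation on relatives' genotypes).
estep_weight <- function(theta, time, delta, s, prior) {
  l1 <- log(prior) + loglik_i(theta, time, delta, s, g = 1)
  l0 <- log(1 - prior) + loglik_i(theta, time, delta, s, g = 0)
  plogis(l1 - l0)                  # equals exp(l1) / (exp(l1) + exp(l0))
}

theta <- c(0.01, 3, -1, 2)   # hypothetical (lambda, rho, beta_s, beta_g)
# Two individuals with s = 0 and prior 0.5 at time 40: one affected, one censored
w <- estep_weight(theta, time = c(40, 40), delta = c(1, 0), s = c(0, 0), prior = 0.5)
w
```

The affected individual's weight rises above the prior of 0.5 and the censored individual's weight falls below it, since a positive β_g makes early onset more likely for carriers. Working on the log scale with plogis() avoids underflow when the two likelihoods are very small.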
Then, we obtain the conditional expectation of the complete-data log-likelihood given the observed data as a weighted log-likelihood of the form
E_{θ}[\ell(θ) | Y, X^o] = ∑_{f=1}^{n} ∑_{i=1}^{n_f} \{ \ell_{fi}(θ | X_{fi}=1) w_{fi} + \ell_{fi}(θ | X_{fi}=0) (1 - w_{fi}) \}.
In the maximization step, the updated parameter estimates are obtained by maximizing the weighted log-likelihood computed in the E-step.
These expectation and maximization steps iterate until convergence to obtain the maximum likelihood estimates.
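To make the E- and M-steps concrete, the following base-R sketch runs EM on simulated, unrelated individuals, 30% of whom have missing carrier status. It is not the package's implementation: it ignores family structure and the study design, assumes the Weibull parametrization H0(t) = (λt)^ρ, treats the prior carrier probability as known, and uses optim() for the M-step. All function names and settings are hypothetical.

```r
set.seed(1)

# l(theta | X = g) per person; assumed Weibull PH with H0(t) = (lambda * t)^rho
loglik_i <- function(theta, time, delta, s, g) {
  eta <- theta[3] * s + theta[4] * g
  log_h <- log(theta[1] * theta[2]) + (theta[2] - 1) * log(theta[1] * time) + eta
  delta * log_h - (theta[1] * time)^theta[2] * exp(eta)
}

# E-step: w = P(X = 1 | Y, X^o) by Bayes' rule with a known prior
estep_weight <- function(theta, time, delta, s, prior) {
  plogis(log(prior) + loglik_i(theta, time, delta, s, 1) -
         log(1 - prior) - loglik_i(theta, time, delta, s, 0))
}

# Weighted log-likelihood: sum of w * l(theta | X = 1) + (1 - w) * l(theta | X = 0)
wloglik <- function(theta, w, time, delta, s) {
  sum(w * loglik_i(theta, time, delta, s, 1) +
      (1 - w) * loglik_i(theta, time, delta, s, 0))
}

# Unconstrained scale for optim(): log(lambda), log(rho), beta_s, beta_g
to_theta <- function(par) c(exp(par[1]), exp(par[2]), par[3], par[4])

# Simulated data (hypothetical settings)
n <- 2000
truth <- c(0.01, 3, -1, 2)
s <- rbinom(n, 1, 0.5)                     # gender indicator
g <- rbinom(n, 1, 0.3)                     # true carrier status
onset <- (rexp(n) / exp(truth[3] * s + truth[4] * g))^(1 / truth[2]) / truth[1]
cens <- runif(n, 20, 80)
time <- pmin(onset, cens)
delta <- as.numeric(onset <= cens)
miss <- runif(n) < 0.3                     # 30% of genotypes treated as missing
prior <- 0.3                               # carrier probability, assumed known

p <- c(log(0.02), log(2), 0, 0)            # starting values
for (iter in 1:100) {
  # E-step: observed genotypes keep weight g, missing ones get the posterior
  w <- ifelse(miss, estep_weight(to_theta(p), time, delta, s, prior), g)
  # M-step: maximize the weighted log-likelihood
  m <- optim(p, function(par) -wloglik(to_theta(par), w, time, delta, s),
             method = "BFGS")
  done <- max(abs(m$par - p)) < 1e-5
  p <- m$par
  if (done) break
}
est <- to_theta(p)
round(est, 3)
```

The estimates in est can be compared with truth. Optimizing on the log scale for λ and ρ keeps both positive without constrained optimization, and warm-starting each M-step at the previous estimate keeps the iterations cheap.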
See more details in Choi and Briollais (2011) or Choi et al. (2014).
Value
An object of class penmodel, a list including the following elements:
parms.est 
Estimates of the baseline parameters (λ, ρ) and of the regression coefficients for gender and mutation status (β_s, β_g), together with their standard errors and robust standard errors. 
parms.cov 
Covariance matrix of parameter estimates. 
parms.se 
Standard errors of parameter estimates. 
parms.rcov 
Robust (sandwich) covariance matrix of parameter estimates. 
parms.rse 
Robust standard errors of parameter estimates. 
pen70.est 
Penetrance estimates by age 70 specific to gender and mutation-status subgroups. 
pen70.se 
Standard errors of penetrance estimates by age 70 specific to gender and mutation-status subgroups. 
pen70.ci 
95% confidence interval estimates of penetrance by age 70 specific to gender and mutation-status subgroups. 
ageonset 
Vector of ages of onset ranging from 
pen.maleCarr 
Vector of penetrance estimates for male carriers from 
pen.femaleCarr 
Vector of penetrance estimates for female carriers from 
pen.maleNoncarr 
Vector of penetrance estimates for male noncarriers from 
pen.femaleNoncarr 
Vector of penetrance estimates for female noncarriers from 
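The penetrance elements above can be read as the cumulative probability of disease onset by a given age within each gender and carrier subgroup. A minimal sketch of that computation, assuming the Weibull proportional-hazards form H0(t) = (λt)^ρ with t measured from a minimum age agemin (an assumed convention), a gender coding of s = 1 for male (also assumed), and made-up parameter values. The function penetrance() is hypothetical; the subgroup names only mirror the element names listed above.

```r
# Hypothetical helper: P(onset by age | gender s, carrier status g) under an
# assumed Weibull PH model, with time measured from 'agemin'.
penetrance <- function(age, theta, s, g, agemin = 20) {
  tt <- pmax(age - agemin, 0)
  H <- (theta[1] * tt)^theta[2] * exp(theta[3] * s + theta[4] * g)
  1 - exp(-H)                                  # 1 - S(t)
}

theta <- c(0.01, 3, -1, 2)   # hypothetical (lambda, rho, beta_s, beta_g)
groups <- expand.grid(s = c(1, 0), g = c(1, 0))   # s = 1 assumed to be male
rownames(groups) <- c("maleCarr", "femaleCarr", "maleNoncarr", "femaleNoncarr")

# Penetrance by age 70 for each gender/carrier subgroup
pen70 <- mapply(function(s, g) penetrance(70, theta, s, g), groups$s, groups$g)
names(pen70) <- rownames(groups)
round(pen70, 3)

# Penetrance curves over a grid of ages, one column per subgroup
ages <- 20:80
curves <- sapply(seq_len(nrow(groups)),
                 function(k) penetrance(ages, theta, groups$s[k], groups$g[k]))
colnames(curves) <- rownames(groups)
```

Because the covariates enter multiplicatively on the cumulative hazard, the four curves never cross: with a positive β_g, carriers lie above noncarriers of the same gender at every age.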
Author(s)
Yun-Hee Choi
References
Choi, Y.H. and Briollais, L. (2011) An EM composite likelihood approach for multistage sampling of family data with missing genetic covariates, Statistica Sinica 21, 231-253.
Choi, Y.H., Briollais, L., Green, J., Parfrey, P., and Kopciuk, K. (2014) Estimating successive cancer risks in Lynch Syndrome families using a progressive three-state model, Statistics in Medicine 33, 618-638.
See Also
simfam, penmodel, summary.penmodel, plot.penmodel, carrierprob
Examples
# Family data simulated with 30% of members missing their genetic information.
fam <- simfam(N.fam=100, design="pop+", base.dist="Weibull", base.parms=c(0.01,3),
  vbeta=c(1.13, 2.35, 0.5), agemin=20, allelefreq=0.02, mrate=0.3)
# EM algorithm for fitting family data with missing genotypes
fit <- penmodelEM(parms=c(0.01, 3), vbeta=c(1.13, 2.35), data=fam, design="pop+",
  base.dist="Weibull", method="mendelian", mode="dominant", q=NULL)
# Summary of the model parameter and penetrance estimates from model fit
# by penmodelEM
summary(fit)
# Generate the lifetime penetrance curves from model fit for gender and
# mutation status groups along with their nonparametric penetrance curves
# based on observed data
plot(fit)
