Performs selection of a latent class model for phenotypic measurements
in pedigrees using one of two methods: likelihood-based cross-validation
or Bayesian Information Criterion (BIC) selection. This is the top-level
function for performing a Latent Class Analysis (LCA); it calls the
model-fitting function lca.model. Model selection is performed among
models of one of two types: with or without familial dependence. Two
families of distributions are currently implemented: product multinomial
for discrete (or ordinal) data and multivariate normal for continuous data.
ped 
a matrix containing variables coding the pedigree
structure and the phenotype measurements: 
distribution 
a character variable taking the value 
trans.const 
a logical variable indicating whether the parental constraint is used. The parental constraint means that the class of a subject must be one
of his or her parents' classes. Default is 
optim.param 
a variable indicating how the measurement distribution parameter optimization is performed (see below for more details), 
optim.probs.indic 
a vector of logical values indicating which probability parameters to estimate (see below for more details), 
famdep 
a logical variable indicating if the familial dependence model is used or not. Default is 
selec 
a character variable taking the value 
H 
an integer giving the number of equal parts into which the data will be split for likelihood-based cross-validation model selection (see below for more details), 
K.vec 
a vector of integers, the number of latent classes of
candidate models, if 
tol 
a small number governing the stopping rule of the EM algorithm. Default is 0.001, 
x 
a matrix of covariates (optional), default is 
var.list 
a list of integers indicating the columns of

In the case of the likelihood-based cross-validation method, the data are
split into H parts: H-1 parts form the training set and the remaining part
forms the test set. For each model, a validation log-likelihood is obtained
by evaluating the log-likelihood of the test-set data at the parameter
values estimated on the training set. This is repeated H times, using a
different part as the test set each time, and a total validation
log-likelihood is obtained by summing over the H test sets. The best model
is the one with the largest total validation log-likelihood. In the case of
the BIC selection method, the BIC is computed for each candidate model, and
the model with the smallest BIC is selected.
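The two selection rules can be sketched in base R. The helpers assign_folds, select_cv and select_bic are hypothetical illustrations, not functions of the package. They assume that whole families are assigned to the H parts, that the validation log-likelihoods have already been computed for every part and candidate model, and that the BIC takes the usual form -2 logL + d log(n), with d the number of free parameters and n the number of subjects; model.select itself may split the data and count parameters differently.

```r
# Hypothetical helpers illustrating the selection rules; not package code.

# Assign each family to one of H parts of (almost) equal size.
# Assumption: whole families are kept together, so a pedigree is never
# split between the training set and the test set.
assign_folds <- function(fam, H) {
  ids <- unique(fam)
  part <- rep_len(seq_len(H), length(ids))[sample.int(length(ids))]
  part[match(fam, ids)]  # one part label per subject
}

# Cross-validation: ll.valid.by.part is an H x length(K.vec) matrix whose
# entry [h, j] is the log-likelihood of test part h evaluated at the
# parameters estimated on the other H - 1 parts for candidate K.vec[j].
select_cv <- function(ll.valid.by.part, K.vec) {
  total <- colSums(ll.valid.by.part)  # total validation log-likelihood
  K.vec[which.max(total)]             # the largest total wins
}

# BIC: ll and n.par hold the maximum log-likelihood and the number of
# free parameters of each candidate; n is the sample size (assumed here
# to be the number of subjects). The smallest BIC wins.
select_bic <- function(ll, n.par, n, K.vec) {
  bic <- -2 * ll + n.par * log(n)
  K.vec[which.min(bic)]
}
```

For instance, with m <- rbind(c(-10, -8, -9), c(-12, -7, -11)), select_cv(m, K.vec = 1:3) returns 2, because the column totals are -22, -15 and -20.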
The symptom status vector (column 6 of ped) takes the value 1 for
subjects who have been examined and show no symptoms (i.e. completely
unaffected subjects). When applying the LCA to measurements available on
all subjects, the status vector must take the value 2 for every individual
with measurements. If covariates are used, covariate values must be provided for subjects with symptom status 0 (missing) but not for subjects with symptom status 1 (if covariate values are provided for them, they are ignored).
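As an illustration of this status coding, the hypothetical helper below sets the status to 2 for every subject with at least one observed measurement. Beyond what is stated above, it assumes that ped is a numeric matrix whose measurements occupy the columns from first.meas.col (7 by default) onward and that missing measurements are coded NA; check the layout of your own data before relying on anything like it.

```r
# Hypothetical helper, not package code.
# Assumptions: column 6 holds the symptom status and the measurements
# start at column first.meas.col, with missing values coded as NA.
set_status_all_measured <- function(ped, first.meas.col = 7) {
  meas <- ped[, first.meas.col:ncol(ped), drop = FALSE]
  has.meas <- rowSums(!is.na(meas)) > 0  # at least one observed measurement
  ped[has.meas, 6] <- 2                  # status 2 for measured subjects
  ped                                    # other rows are left unchanged
}
```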
optim.param is a variable indicating how the measurement distribution
parameter optimization in the M step is performed. Two possibilities,
optim.noconst.ordi and optim.const.ordi, are currently available for
discrete or ordinal measurements, and four possibilities are available
for continuous measurements:
optim.indep.norm (measurements are independent; diagonal variance-covariance matrix),
optim.diff.norm (general variance-covariance matrix, but equal for all classes),
optim.equal.norm (variance-covariance matrices differ between classes, but within a class all variances are equal and all covariances are equal) and
optim.gene.norm (general variance-covariance matrices for all classes).
The chosen value of optim.param must be entered without quotes, that is,
as an R object name rather than a character string.
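To make the continuous-data options more concrete, the sketch below builds one example covariance matrix per latent class with each of the four shapes. The helpers compound_symmetry, random_spd and example_covs, and the type labels "indep", "diff", "equal" and "gene", are hypothetical; they only illustrate the structures and are not the package's optimization functions, which are passed to model.select by name (for example optim.param=optim.indep.norm). For the independent case the text does not say whether the diagonal differs between classes; the sketch assumes it does.

```r
# Hypothetical helpers showing the covariance shapes; not package code.

# Equal variance v on the diagonal and equal covariance cv off it.
compound_symmetry <- function(p, v, cv) {
  m <- matrix(cv, p, p)
  diag(m) <- v
  m
}

# A random p x p symmetric positive-definite matrix (general structure).
random_spd <- function(p) crossprod(matrix(rnorm(2 * p * p), 2 * p, p))

# One example covariance matrix per latent class (K classes, p measurements).
example_covs <- function(type, K, p) {
  switch(type,
    # independent measurements: diagonal matrix (assumed class-specific)
    indep = lapply(seq_len(K), function(k) diag(k, p)),
    # one general matrix shared by all classes
    diff = rep(list(random_spd(p)), K),
    # class-specific; equal variances and equal covariances within a class
    equal = lapply(seq_len(K), function(k) compound_symmetry(p, v = k, cv = k / 4)),
    # a different general matrix for each class
    gene = lapply(seq_len(K), function(k) random_spd(p)),
    stop("unknown type: ", type))
}
```

The shared versus class-specific distinction is what separates the "diff" and "gene" shapes, while "equal" needs only two values (one variance, one covariance) per class.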
optim.probs.indic is a vector of logical values, of length 4 for models
with familial dependence and of length 2 for models without familial
dependence, indicating which probability parameters to estimate. See the
help page for lca.model for a definition of the parameters.
For models with familial dependence:
optim.probs.indic[1] indicates whether p0 will be estimated,
optim.probs.indic[2] indicates whether p0connect will be estimated,
optim.probs.indic[3] indicates whether p.found will be estimated,
optim.probs.indic[4] indicates whether p.connect will be estimated.
For models without familial dependence:
optim.probs.indic[1] indicates whether p0 will be estimated,
optim.probs.indic[2] indicates whether p.aff will be estimated.
All defaults are TRUE.
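Because only the position of each logical value matters, it can help to build optim.probs.indic from parameter names. The function make_probs_indic below is a hypothetical convenience wrapper, not part of the package: it returns an unnamed vector of length 4 (famdep=TRUE) or 2 (famdep=FALSE) in the order listed above, with every entry TRUE unless overridden by name.

```r
# Hypothetical wrapper, not package code: builds optim.probs.indic in the
# documented order, with all entries TRUE unless overridden by name.
make_probs_indic <- function(famdep = TRUE, ...) {
  defaults <- if (famdep) {
    c(p0 = TRUE, p0connect = TRUE, p.found = TRUE, p.connect = TRUE)
  } else {
    c(p0 = TRUE, p.aff = TRUE)
  }
  overrides <- c(...)
  if (length(overrides) > 0) {
    if (is.null(names(overrides)) || any(names(overrides) == ""))
      stop("overrides must be named")
    unknown <- setdiff(names(overrides), names(defaults))
    if (length(unknown) > 0)
      stop("unknown parameter(s): ", paste(unknown, collapse = ", "))
    defaults[names(overrides)] <- overrides
  }
  unname(defaults)  # positional vector, as in c(TRUE,TRUE,TRUE,TRUE)
}
```

For example, make_probs_indic(famdep = FALSE, p.aff = FALSE) returns c(TRUE, FALSE), and asking for p.aff with famdep=TRUE raises an error.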
The function returns a list of 5 elements. The first 3 elements are common to the BIC and cross-validation model selection methods:
param 
the maximum likelihood estimate (MLE) of the measurement distribution parameters of the selected model, 
probs 
the maximum likelihood estimate (MLE) of the probability parameters of the selected model, 
weight 
an array of dimension 
If the cross-validation selection method is used, the function also returns
ll 
the maximum log-likelihood (logML) of the selected model, 
ll.valid 
the total cross-validation log-likelihood of all candidate models, 
and if the Bayesian Information Criterion selection method is used, the function also returns
ll 
the maximum log-likelihood (logML) of all candidate models, 
bic 
the Bayesian Information Criterion

TAYEB, A., LABBE, A., BUREAU, A. and MERETTE, C. (2011) Solving Genetic Heterogeneity in Extended Families by Identifying Subtypes of Complex Diseases. Computational Statistics, 26(3): 539-560. DOI: 10.1007/s00180-010-0224-2.
LABBE, A., BUREAU, A. and MERETTE, C. (2009) Integration of Genetic Familial Dependence Structure in Latent Class Models. The International Journal of Biostatistics, 5(1): Article 6.
See also lca.model.
library(LCAextend)
#data
data(ped.cont)
fam <- ped.cont[,1]
#apply the function to the first two families of ped.cont
model.select(ped.cont[fam%in%1:2,],distribution="normal",trans.const=TRUE,
optim.param=optim.indep.norm,optim.probs.indic=c(TRUE,TRUE,TRUE,TRUE),
famdep=TRUE,selec="bic",K.vec=1:3,tol=0.001,x=NULL,var.list=NULL)
