Latent class analysis of polytomous outcome variables

Description

Estimates latent class and latent class regression models for polytomous outcome variables.

Usage

poLCA(formula, data, nclass = 2, maxiter = 1000, graphs = FALSE, 
      tol = 1e-10, na.rm = TRUE, probs.start = NULL, nrep = 1, 
      verbose = TRUE, calc.se = TRUE)

Arguments

formula

A formula expression of the form response ~ predictors. The details of model specification are given below.

data

A data frame containing variables in formula. Manifest variables must contain only integer values, and must be coded with consecutive values from 1 to the maximum number of outcomes for each variable. All missing values should be entered as NA.
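If a manifest variable arrives with arbitrary codes (for example 0/5/10, or strings), it has to be recoded to 1..K before being passed to poLCA. The base-R sketch below uses a hypothetical helper, recode_consecutive, which is not part of poLCA. factor() sorts the distinct observed values, so the new codes follow that sort order, categories that never occur are dropped, and NA values stay NA.

```r
# Hypothetical helper (not part of poLCA): recode any vector so that its
# distinct observed values become consecutive integers 1, 2, ..., K.
# factor() sorts the levels and as.integer() returns each value's level
# index; missing values remain NA, as poLCA expects.
recode_consecutive <- function(x) {
  as.integer(factor(x))
}

# A 3-point item coded 0/5/10, with one missing value
raw <- c(10, 0, 5, NA, 0, 10)
recode_consecutive(raw)
# [1]  3  1  2 NA  1  3
```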

nclass

The number of latent classes to assume in the model. Setting nclass=1 results in poLCA estimating the loglinear independence model. The default is two.
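With nclass=1 and fully observed data, the maximum-likelihood estimate of each response probability is simply the observed marginal proportion of that outcome, so the independence log-likelihood has a closed form. The sketch below assumes complete data coded 1..K_j. independence_llik is a hypothetical helper, not a poLCA function.

```r
# Hypothetical helper: log-likelihood of the loglinear independence model
# (nclass = 1) for fully observed data. With one class, the MLE of each
# response probability is the observed marginal proportion, so no
# iteration is needed.
independence_llik <- function(y) {
  # y: data frame or matrix of manifest variables coded 1..K_j, no NAs
  sum(sapply(seq_len(ncol(y)), function(j) {
    n_k <- table(y[, j])                 # counts per observed outcome
    sum(n_k * log(n_k / sum(n_k)))       # sum of n_k * log(proportion)
  }))
}

y <- data.frame(A = c(1, 1, 2, 2), B = c(1, 2, 1, 1))
independence_llik(y)
```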

maxiter

The maximum number of iterations through which the estimation algorithm will cycle.

graphs

Logical, indicating whether poLCA should graphically display the parameter estimates once the estimation algorithm finishes. The default is FALSE.

tol

A tolerance value for judging when convergence has been reached. When the one-iteration change in the estimated log-likelihood is less than tol, the estimation algorithm stops updating and considers the maximum log-likelihood to have been found.

na.rm

Logical, determining how poLCA handles cases with missing values on the manifest variables. If TRUE, those cases are removed (listwise deleted) before the model is estimated. If FALSE, cases with missing values are retained. Cases with missing covariates are always removed. The default is TRUE.

probs.start

A list of matrices of class-conditional response probabilities to be used as the starting values for the estimation algorithm. Each matrix in the list corresponds to one manifest variable, with one row for each latent class, and one column for each outcome. The default is NULL, producing random starting values. Note that if nrep>1, then any user-specified probs.start values are only used in the first of the nrep attempts.
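The structure probs.start expects can be built by hand. The sketch below uses a hypothetical helper, make_probs_start (not part of poLCA), and assumes the list is ordered like the variables inside cbind(...). Each matrix has nclass rows and K_j columns, and each row is a probability vector.

```r
# Hypothetical helper (not part of poLCA): build a probs.start-style list
# of random starting values. Each element is an nclass x K_j matrix whose
# rows (one per latent class) sum to one across the K_j outcomes.
make_probs_start <- function(K, nclass) {
  lapply(K, function(k) {
    m <- matrix(runif(nclass * k), nrow = nclass, ncol = k)
    m / rowSums(m)        # divide each row by its total
  })
}

set.seed(1)
ps <- make_probs_start(K = c(2, 2, 3, 2), nclass = 2)
sapply(ps, dim)           # 2 rows (classes) by K_j columns for each variable
```

Fixed (non-random) starting values are useful, for example, to reproduce a solution or to control the order in which classes are numbered.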

nrep

Number of times to estimate the model, using different values of probs.start. The default is one. Setting nrep>1 automates the search for the global, rather than merely a local, maximum of the log-likelihood function. poLCA returns the parameter estimates corresponding to the model with the greatest log-likelihood.

verbose

Logical, indicating whether poLCA should print the model results to the screen. If FALSE, no output is produced. The default is TRUE.

calc.se

Logical, indicating whether poLCA should calculate the standard errors of the estimated class-conditional response probabilities and mixing proportions. The default is TRUE. It can be set to FALSE only when estimating a basic model with no concomitant variables specified in formula.

Details

Latent class analysis, also known as latent structure analysis, is a technique for the analysis of clustering among observations in multi-way tables of qualitative/categorical variables. The central idea is to fit a model in which any confounding between the manifest variables can be explained by a single unobserved "latent" categorical variable. poLCA uses the assumption of local independence to estimate a mixture model of latent multi-way tables, the number of which (nclass) is specified by the user. Estimated parameters include the class-conditional response probabilities for each manifest variable, the "mixing" proportions denoting population share of observations corresponding to each latent multi-way table, and coefficients on any class-predictor covariates, if specified in the model.
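Written out, local independence makes the probability of an observed response pattern a weighted sum over classes of products of class-conditional response probabilities. The notation here is introduced for illustration and follows the standard formulation of the basic latent class model: R = nclass classes, J manifest variables, K_j outcomes for variable j, p_r the mixing proportions, π_jrk the probability that a member of class r gives outcome k on variable j, and Y_ijk = 1 if case i gave outcome k on variable j (0 otherwise). The last line is Bayes' rule for posterior class membership.

```latex
\Pr(\mathbf{Y}_i) = \sum_{r=1}^{R} p_r \prod_{j=1}^{J} \prod_{k=1}^{K_j} \left(\pi_{jrk}\right)^{Y_{ijk}}
\log L = \sum_{i=1}^{N} \log \sum_{r=1}^{R} p_r \prod_{j=1}^{J} \prod_{k=1}^{K_j} \left(\pi_{jrk}\right)^{Y_{ijk}}
\widehat{\Pr}(r \mid \mathbf{Y}_i) = \frac{\hat{p}_r \prod_{j=1}^{J} \prod_{k=1}^{K_j} \left(\hat{\pi}_{jrk}\right)^{Y_{ijk}}}{\sum_{q=1}^{R} \hat{p}_q \prod_{j=1}^{J} \prod_{k=1}^{K_j} \left(\hat{\pi}_{jqk}\right)^{Y_{ijk}}}
```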

Model specification: Latent class models have more than one manifest variable, so the response variables are combined as cbind(dv1,dv2,dv3,...), where dv# refers to variable names in the data frame. For models with no covariates, the formula is cbind(dv1,dv2,dv3)~1. For models with covariates, replace the ~1 with the desired function of predictors iv1,iv2,iv3,..., for example cbind(dv1,dv2,dv3)~iv1+iv2*iv3.
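When the manifest variables are held in a character vector, the cbind(...) formula can be assembled programmatically rather than typed by hand. make_lca_formula is a hypothetical helper in base R, not a poLCA function. It creates the formula in the caller's environment.

```r
# Hypothetical helper (not part of poLCA): build a poLCA-style formula from
# character vectors of manifest-variable and covariate names.
make_lca_formula <- function(manifest, covariates = NULL) {
  lhs <- paste0("cbind(", paste(manifest, collapse = ","), ")")
  rhs <- if (length(covariates) == 0) "1" else paste(covariates, collapse = "+")
  # env = parent.frame() attaches the caller's environment, not this function's
  as.formula(paste(lhs, "~", rhs), env = parent.frame())
}

make_lca_formula(c("A", "B", "C", "D"))                  # cbind(A, B, C, D) ~ 1
make_lca_formula(c("dv1", "dv2", "dv3"), c("iv1", "iv2*iv3"))
```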

poLCA treats all manifest variables as qualitative/categorical/nominal – NOT as ordinal.

Value

poLCA returns an object of class poLCA; a list containing the following elements:

y

data frame of manifest variables.

x

data frame of covariates, if specified.

N

number of cases used in model.

Nobs

number of fully observed cases (less than or equal to N).

probs

estimated class-conditional response probabilities.

probs.se

standard errors of estimated class-conditional response probabilities, in the same format as probs.

P

sizes of each latent class; equal to the mixing proportions in the basic latent class model, or the mean of the priors in the latent class regression model.

P.se

the standard errors of the estimated P.

posterior

matrix of posterior class membership probabilities; also see function poLCA.posterior.

predclass

vector of predicted class memberships, by modal assignment.
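posterior and predclass are related by Bayes' rule and modal assignment. The sketch below assumes the basic model (no covariates) and fully observed data coded 1..K_j. posterior_sketch is hypothetical, and the probs/P arguments are assumed to have the same layout as the returned probs and P elements. poLCA itself also has to handle retained missing values and covariate-dependent priors, which this sketch omits.

```r
# Hypothetical sketch: posterior class-membership probabilities via Bayes'
# rule, assuming fully observed y (cases x variables, coded 1..K_j),
# probs = list of nclass x K_j matrices, and P = vector of class shares.
posterior_sketch <- function(y, probs, P) {
  R <- length(P)
  lik <- matrix(1, nrow(y), R)   # lik[i, r] = Pr(pattern of case i | class r)
  for (j in seq_along(probs)) {
    # probs[[j]][, y[, j]] picks, for each case, the column of its outcome
    lik <- lik * t(probs[[j]][, y[, j], drop = FALSE])
  }
  joint <- sweep(lik, 2, P, `*`) # multiply column r by P[r]
  joint / rowSums(joint)         # normalize each row to sum to one
}

probs <- list(matrix(c(0.9, 0.2, 0.1, 0.8), 2),   # rows = classes 1 and 2
              matrix(c(0.7, 0.3, 0.3, 0.7), 2))
P <- c(0.5, 0.5)
y <- rbind(c(1, 1), c(2, 2))
post <- posterior_sketch(y, probs, P)
apply(post, 1, which.max)        # modal assignment, as in predclass
```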

predcell

table of observed versus predicted cell counts for cases with no missing values; also see functions poLCA.table and poLCA.predcell.

llik

maximum value of the log-likelihood.

numiter

number of iterations until reaching convergence.

maxiter

maximum number of iterations through which the estimation algorithm was set to run.

coeff

multinomial logit coefficient estimates on covariates (when estimated). coeff is a matrix with nclass-1 columns and one row for each term of the covariate model, including the intercept. All logit coefficients are calculated for classes 2 through nclass relative to class 1 as the reference.
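Prior class-membership probabilities follow from coeff through the multinomial logit with class 1 as the reference. This is the same computation the election example below performs with pidmat %*% nes2a$coeff. class_priors and the coefficient values here are hypothetical, chosen for illustration.

```r
# Hypothetical sketch: convert a coeff matrix (rows = intercept plus
# covariates, columns = classes 2..nclass) into prior class probabilities.
class_priors <- function(X, coeff) {
  eta <- X %*% coeff             # log-odds of classes 2..R versus class 1
  num <- cbind(1, exp(eta))      # class 1 has linear predictor 0, exp(0) = 1
  num / rowSums(num)             # each row sums to one
}

# Hypothetical coefficients for nclass = 3 and a single covariate
coeff <- matrix(c(-1,  0.5,      # class 2 vs 1: intercept, slope
                   1, -0.5),     # class 3 vs 1: intercept, slope
                nrow = 2)
X <- cbind(1, 1:7)               # intercept column plus covariate values
round(class_priors(X, coeff), 3)
```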

coeff.se

standard errors of coefficient estimates on covariates (when estimated), in the same format as coeff.

coeff.V

covariance matrix of coefficient estimates on covariates (when estimated).

aic

Akaike Information Criterion.

bic

Bayesian Information Criterion.
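The usual definitions of both criteria combine llik and npar. The sketch below assumes that BIC uses N, the number of cases in the model, as its sample size. info_criteria is hypothetical, and the input values are illustrative rather than taken from a real fit.

```r
# Hypothetical sketch: standard AIC and BIC from a maximized log-likelihood,
# a parameter count, and a sample size (assumed here to be N).
info_criteria <- function(llik, npar, N) {
  c(aic = -2 * llik + 2 * npar,
    bic = -2 * llik + log(N) * npar)
}

info_criteria(llik = -100, npar = 5, N = 50)   # aic = 210, bic is about 219.56
```

Lower values are preferred. When comparing models with different nclass, BIC penalizes extra parameters more heavily than AIC once N exceeds about 7, because log(N) > 2 for N >= 8.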

Gsq

Likelihood ratio/deviance statistic.

Chisq

Pearson Chi-square goodness of fit statistic for fitted vs. observed multiway tables.
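Both statistics compare observed and expected cell counts of the multiway table. The sketch below assumes the inputs cover every cell, including empty ones. Cells with zero observed count contribute nothing to G-squared but do contribute to the Pearson statistic. fit_stats is hypothetical, and poLCA's handling of empty cells may differ in detail.

```r
# Hypothetical sketch: deviance (G^2) and Pearson chi-square from observed
# and expected counts over all cells of the table.
fit_stats <- function(observed, expected) {
  pos <- observed > 0            # 0 * log(0) is taken as 0
  c(Gsq   = 2 * sum(observed[pos] * log(observed[pos] / expected[pos])),
    Chisq = sum((observed - expected)^2 / expected))
}

obs <- c(10, 20, 30, 0)
ex  <- c(12, 18, 25, 5)
fit_stats(obs, ex)
```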

time

length of time it took to run the model.

npar

number of degrees of freedom used by the model (estimated parameters).

resid.df

number of residual degrees of freedom.
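The parameter count follows from the model structure. Each class has K_j - 1 free response probabilities per variable, and the class-membership part has nclass - 1 free parameters per column of the covariate design matrix (S = 1, the intercept only, in the basic model). The residual df shown here is the textbook count for a fully observed table with prod(K_j) cells. poLCA's own computation may differ in edge cases, so compare against the returned resid.df. Both helpers are hypothetical.

```r
# Hypothetical helpers: parameter count and textbook residual df for a
# latent class model with R classes, outcome counts K, and S design columns.
lca_npar <- function(K, R, S = 1) {
  R * sum(K - 1) + S * (R - 1)
}
lca_resid_df <- function(K, R, S = 1) {
  (prod(K) - 1) - lca_npar(K, R, S)   # free cell probabilities minus npar
}

lca_npar(K = c(2, 2, 2, 2), R = 2)       # 9
lca_resid_df(K = c(2, 2, 2, 2), R = 2)   # 6
```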

attempts

a vector containing the maximum log-likelihood values found in each of the nrep attempts to fit the model.

eflag

Logical, error flag. TRUE if estimation algorithm needed to automatically restart with new initial parameters. A restart is caused in the event of computational/rounding errors that result in nonsensical parameter estimates.

probs.start

A list of matrices containing the class-conditional response probabilities used as starting values in the estimation algorithm. If the algorithm needed to restart (see eflag), then this contains the starting values used for the final, successful, run.

probs.start.ok

Logical. FALSE if probs.start was incorrectly specified by the user, otherwise TRUE.

call

function call to poLCA.

Note

poLCA uses EM and Newton-Raphson algorithms to maximize the latent class model log-likelihood function. Depending on the starting parameters, these algorithms may locate only a local, rather than the global, maximum. The problem becomes more severe as nclass increases. It is therefore highly advisable to run poLCA multiple times until you are reasonably confident that you have located the global maximum log-likelihood. As long as probs.start=NULL, each function call will use different (random) initial starting parameters. Alternatively, setting nrep to a value greater than one estimates the latent class model multiple times within a single call to poLCA, automating the search for the global maximizer.
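The EM iteration and the multiple-start strategy can be illustrated in a few lines of base R. This is a minimal sketch of the general technique for the basic model only (no covariates, no missing data). It is not poLCA's implementation, which also uses Newton-Raphson for covariate coefficients, handles NA values, and restarts automatically on numerical errors. lca_em and best_of are hypothetical names, and the data are simulated.

```r
# Hypothetical EM sketch for the basic latent class model.
lca_em <- function(y, R, maxiter = 1000, tol = 1e-10) {
  y <- as.matrix(y)
  N <- nrow(y); J <- ncol(y)
  K <- apply(y, 2, max)                       # outcomes per variable, coded 1..K_j
  # joint[i, r] = P[r] * Pr(pattern of case i | class r), by local independence
  joint_probs <- function(probs, P) {
    lik <- matrix(1, N, R)
    for (j in seq_len(J)) lik <- lik * t(probs[[j]][, y[, j], drop = FALSE])
    sweep(lik, 2, P, `*`)
  }
  # Random starting values; rows of each R x K_j matrix sum to one
  probs <- lapply(K, function(k) {
    m <- matrix(runif(R * k), R, k)
    m / rowSums(m)
  })
  P <- rep(1 / R, R)
  llik_old <- -Inf
  for (iter in seq_len(maxiter)) {
    joint <- joint_probs(probs, P)            # E-step
    llik <- sum(log(rowSums(joint)))
    if (llik - llik_old < tol) break          # one-iteration change below tol
    llik_old <- llik
    post <- joint / rowSums(joint)            # posterior class probabilities
    P <- colMeans(post)                       # M-step: class shares
    probs <- lapply(seq_len(J), function(j) { # M-step: response probabilities
      counts <- matrix(sapply(seq_len(K[j]), function(k)
        colSums(post[y[, j] == k, , drop = FALSE])), nrow = R)
      counts / rowSums(counts)
    })
  }
  llik <- sum(log(rowSums(joint_probs(probs, P))))  # llik of returned estimates
  list(probs = probs, P = P, llik = llik, numiter = iter)
}

# Repeat from several random starts and keep the fit with the largest
# log-likelihood, analogous in spirit to poLCA's nrep argument.
best_of <- function(y, R, nrep = 5, ...) {
  fits <- lapply(seq_len(nrep), function(i) lca_em(y, R, ...))
  attempts <- sapply(fits, `[[`, "llik")
  fit <- fits[[which.max(attempts)]]
  fit$attempts <- attempts
  fit
}

# Simulated data: two classes, four binary items
set.seed(42)
cls <- sample(1:2, 500, replace = TRUE)
y <- sapply(1:4, function(j) ifelse(runif(500) < ifelse(cls == 1, 0.85, 0.2), 1, 2))
fit <- best_of(y, R = 2, nrep = 5)
fit$attempts
round(fit$P, 2)
```

If the values in attempts disagree, some starts stopped at local maxima. This is the situation the multiple runs described above are meant to detect.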

The term "Latent class regression" (LCR) can have two meanings. In this package, LCR models refer to latent class models in which the probability of class membership is predicted by one or more covariates. However, in other contexts, LCR is also used to refer to regression models in which the manifest variable is partitioned into some specified number of latent classes as part of estimating the regression model. It is a way to simultaneously fit more than one regression to the data when the latent data partition is unknown. The flexmix function in package flexmix will estimate this other type of LCR model. Because of these terminology issues, the LCR models this package estimates are sometimes termed "latent class models with covariates" or "concomitant-variable latent class analysis," both of which are accurate descriptions of this model.

A more detailed user's manual is available online at http://userwww.service.emory.edu/~dlinzer/poLCA.

Examples

library(poLCA)

##
## Three models without covariates:
## M0: Loglinear independence model.
## M1: Two-class latent class model.
## M2: Three-class latent class model.
##
data(values)
f <- cbind(A,B,C,D)~1
M0 <- poLCA(f,values,nclass=1) # log-likelihood: -543.6498
M1 <- poLCA(f,values,nclass=2) # log-likelihood: -504.4677
M2 <- poLCA(f,values,nclass=3,maxiter=8000) # log-likelihood: -503.3011

##
## Three-class model with a single covariate.
##
data(election)
f2a <- cbind(MORALG,CARESG,KNOWG,LEADG,DISHONG,INTELG,
             MORALB,CARESB,KNOWB,LEADB,DISHONB,INTELB)~PARTY
nes2a <- poLCA(f2a,election,nclass=3,nrep=5)    # log-likelihood: -16222.32 
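## Prior probabilities of class membership at each of the 7 PARTY values,
## from the multinomial logit coefficients (class 1 is the reference)
pidmat <- cbind(1,c(1:7))
exb <- exp(pidmat %*% nes2a$coeff)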
matplot(c(1:7),(cbind(1,exb)/(1+rowSums(exb))),ylim=c(0,1),type="l",
    main="Party ID as a predictor of candidate affinity class",
    xlab="Party ID: strong Democratic (1) to strong Republican (7)",
    ylab="Probability of latent class membership",lwd=2,col=1)
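## Class numbering depends on the random starting values, so check
## nes2a$probs and adjust these labels if the classes come out reordered
text(5.9,0.35,"Other")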
text(5.4,0.7,"Bush affinity")
text(1.8,0.6,"Gore affinity")