covLCA: Latent Class Models with Covariate Effects on Underlying and...


Description

Fits latent class models with covariate effects on underlying and measured variables. The measured variables are dichotomous or polytomous, all with the same number of categories.

Usage

covLCA(formula1, formula2, data, nclass = 2, maxiter = 1000, tol = 1e-10, 
beta.start = NULL, alpha.start = NULL, gamma.start = NULL, beta.auto = TRUE, 
alpha.auto = TRUE, gamma.auto = TRUE, nrep = 1, verbose = TRUE, calc.se = TRUE)

Arguments

formula1

The formula where the dependent variables are the manifest variables, grouped by cbind(), and the independent variables are the covariates for the latent class probabilities.

formula2

The formula where the dependent variables are the manifest variables, grouped by cbind(), and the independent variables are the covariates for the conditional probabilities.

data

a data frame containing all variables appearing in formula1 and formula2. Manifest variables must contain only integer values, and must be coded with consecutive values from 1 to the maximum number of outcomes for each variable. All missing values should be entered as NA; all cases containing missing values (in the manifest variables or in the covariates) are removed before estimating the model.
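The coding requirement can be met with a small recoding step before calling covLCA(). The following is a minimal base-R sketch; the data frame raw, its columns and the helper recode_consecutive() are made up for illustration and are not part of the package.

```r
# Hypothetical raw data: one item coded 0/1, one item with character labels.
raw <- data.frame(
  item1 = c(0, 1, 1, NA, 0),
  item2 = c("low", "high", "mid", "low", "high"),
  stringsAsFactors = FALSE
)

# Recode a variable to consecutive integers 1..K.
# match() against the ordered levels keeps NA as NA.
recode_consecutive <- function(v, levels = sort(unique(v))) {
  as.integer(match(v, levels))
}

dat <- data.frame(
  item1 = recode_consecutive(raw$item1),
  item2 = recode_consecutive(raw$item2, levels = c("low", "mid", "high"))
)

# Cases with any NA are dropped before estimation; doing it explicitly
# shows how many complete cases remain.
dat_complete <- na.omit(dat)
```

Passing the levels explicitly, as done for item2, keeps the category order under the user's control instead of relying on alphabetical sorting.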

nclass

the number of latent classes assumed in the model.

maxiter

the maximum number of iterations through which the estimation algorithm will cycle.

tol

A tolerance value for judging when convergence has been reached. When the one-iteration change in the estimated log-likelihood is less than tol, the estimation algorithm stops updating and considers the maximum log-likelihood to have been found.
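The stopping rule can be sketched as a simple loop. This is a toy illustration under stated assumptions, not the package's internal code: toy_update() is a hypothetical stand-in for one iteration of the estimation algorithm, and the change is compared to tol in absolute value.

```r
# Hypothetical stand-in for one update step: moves the log-likelihood
# halfway towards its maximum of -100.
toy_update <- function(llik) llik + 0.5 * (-100 - llik)

tol <- 1e-10
maxiter <- 1000
llik <- -500

for (iter in seq_len(maxiter)) {
  llik_new <- toy_update(llik)
  # Stop when the one-iteration change falls below tol.
  converged <- abs(llik_new - llik) < tol
  llik <- llik_new
  if (converged) break
}
```

If the loop reaches maxiter without the change dropping below tol, converged stays FALSE, which mirrors the case where the iteration limit rather than the tolerance ends the estimation.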

beta.start

a vector of parameters β_{jp} to be used as the starting values for the estimation algorithm. There is one parameter for each latent class-covariate pair (the index of the covariate moving faster), except for the last class, which is the reference and for which β_{Jp}=0 \forall p. The default is NULL, leading either to an automatic search for “reasonable” initial values (when beta.auto=TRUE, the default) or to the generation of random starting values (when beta.auto=FALSE). Note that if nrep>1, then any user-specified beta.start values are only used in the first of the nrep attempts.
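The ordering of beta.start can be illustrated by unrolling a matrix column by column. This is only a sketch of the ordering described above: the values are arbitrary, and whether the intercept counts among the coefficients of each class is an assumption here (n_coef is a hypothetical name).

```r
J <- 3       # latent classes; class J is the reference
n_coef <- 2  # coefficients per class (assumed: intercept + one covariate)

# Rows = coefficients, columns = non-reference classes 1..(J-1).
beta_mat <- matrix(c( 0.2, -1.0,   # class 1
                     -0.4,  0.8),  # class 2
                   nrow = n_coef, ncol = J - 1)

# Column-major unrolling makes the coefficient index move fastest
# within each class.
beta.start <- as.vector(beta_mat)
```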

alpha.start

an M \times L(K-1) matrix of parameters α_{mlk} to be used as the starting values for the estimation algorithm. Rows correspond to manifest variable m. Within each row, columns correspond to covariates l and categories of manifest variables k (except the last category, for which α_{mlK_m}=0), the index of the latter moving faster. The default is NULL, leading either to an automatic search for “reasonable” initial values (when alpha.auto=TRUE, the default) or to the generation of random starting values (when alpha.auto=FALSE). Note that if nrep >1, then any user-specified alpha.start values are only used in the first of the nrep attempts.

gamma.start

an M \times J(K-1) matrix of parameters γ_{mjk} to be used as the starting values for the estimation algorithm. Rows correspond to manifest variable m. Within each row, columns correspond to latent classes j and categories of manifest variables k (except the last category, for which γ_{mjK_m}=0), the index of the latter moving faster. The default is NULL, leading either to an automatic search for “reasonable” initial values (when gamma.auto=TRUE, the default) or to the generation of random starting values (when gamma.auto=FALSE). Note that if nrep >1, then any user-specified gamma.start values are only used in the first of the nrep attempts.
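The column layout can be made explicit with a small index helper. This sketch reads "the index of the latter moving faster" as categories varying fastest within each class; gamma_col() and the dimensions are hypothetical choices for illustration.

```r
# Illustrative dimensions (assumptions, not package defaults).
M <- 3  # manifest variables
J <- 2  # latent classes
K <- 4  # categories per manifest variable

# Column holding gamma_{mjk} in row m: classes vary slowest, categories fastest.
gamma_col <- function(j, k, K) (j - 1) * (K - 1) + k

gamma.start <- matrix(0, nrow = M, ncol = J * (K - 1))
colnames(gamma.start) <- paste0(
  "class", rep(seq_len(J), each = K - 1),
  ".cat", rep(seq_len(K - 1), times = J)
)

# Set the starting value of gamma_{mjk} for m = 2, j = 2, k = 1.
gamma.start[2, gamma_col(j = 2, k = 1, K = K)] <- 0.5
```

Under the same reading, the layout of alpha.start follows the same rule with covariates l in place of latent classes j.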

beta.auto

logical, indicating whether covLCA() should calculate “reasonable” initial values for parameters β. If TRUE, the approach advised by Huang and Bandeen-Roche (2004) is applied: a standard latent class model assuming nclass latent classes is estimated, then each individual is assigned to a class with the posterior probabilities of class membership from this model, and finally a multinomial logistic regression model relating the latent classes to covariates x is fitted, whose coefficient estimates give initial estimates of \boldsymbol{β}. If FALSE, either random initial values are generated (if beta.start=NULL) or values provided by the user are used.

alpha.auto

logical, indicating whether covLCA() should calculate “reasonable” initial values for parameters α. If TRUE, the approach advised by Huang and Bandeen-Roche (2004) is applied: M different multinomial logistic regression models for (Y_{i1}, \mathbf{z}_{i1}), …, (Y_{iM}, \mathbf{z}_{iM}) are fitted and the corresponding estimated coefficients are initial values for parameters α. If FALSE, either random initial values are generated (if alpha.start=NULL) or values provided by the user are used.

gamma.auto

logical, indicating whether covLCA() should calculate “reasonable” initial values for parameters γ. If TRUE, the approach advised by Huang and Bandeen-Roche (2004) is applied: M different multinomial logistic regression models for (Y_{i1}, \mathbf{z}_{i1}), …, (Y_{iM}, \mathbf{z}_{iM}) are fitted and the corresponding estimated coefficients are initial values for parameters γ. If FALSE, either random initial values are generated (if gamma.start=NULL) or values provided by the user are used.

nrep

number of times the model is estimated, using different values of beta.start, alpha.start and gamma.start. The default is one. Setting nrep>1 automates the search for the global (rather than just a local) maximum of the log-likelihood function. covLCA() returns the parameter estimates corresponding to the model with the greatest log-likelihood.
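The idea behind nrep can be shown on a toy objective with two local maxima. This is a sketch of the multi-start principle only: toy_llik(), toy_grad() and climb() are hypothetical, and plain gradient ascent stands in for the actual estimation algorithm.

```r
# Toy "log-likelihood" with a local maximum near x = -0.96
# and the global maximum near x = 1.04.
toy_llik <- function(x) -(x^2 - 1)^2 + 0.3 * x
toy_grad <- function(x) -4 * x * (x^2 - 1) + 0.3

# Fixed-step gradient ascent from a given starting value.
climb <- function(start, step = 0.01, iters = 5000) {
  x <- start
  for (i in seq_len(iters)) x <- x + step * toy_grad(x)
  list(par = x, llik = toy_llik(x))
}

# Each start converges to the maximum of its own basin.
starts <- c(-1.5, 1.5)
fits <- lapply(starts, climb)

# One maximum log-likelihood per attempt; keep the best fit.
attempts <- vapply(fits, function(f) f$llik, numeric(1))
best <- fits[[which.max(attempts)]]
```

The first start ends in the lower local maximum, so a single run from that point would miss the global one; comparing the attempts recovers it.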

verbose

logical, indicating whether covLCA() should output to the screen the results of the model.

calc.se

logical, indicating whether covLCA() should calculate the standard errors of the estimated parameters β_{jp}, α_{mlk} and γ_{mjk}.

Details

We denote individuals by i (i=1,…,N), manifest variables (items) by Y_m (m=1,…,M), levels of the manifest variables by k (k=1,…,K), and the latent variable by S (taking values j=1,…,J). There are two sets of covariates: those related to the latent class probabilities, \mathbf{x}_i=(1,x_{i1},…,x_{iP})^T, and those that can have a direct effect on the manifest variables, \mathbf{z}_i=(\mathbf{z}_{i1},…,\mathbf{z}_{iM}) with \mathbf{z}_{im}=(1,z_{im1},…,z_{imL})^T, m=1,…,M. The parameters of the model are the latent class probabilities π_j(\mathbf{x}'_i\boldsymbol{β})=P(S_i=j;\mathbf{x}_i) and the conditional probabilities p_{mkj}(\boldsymbol{γ}_{mj}+\mathbf{z}'_{im}\boldsymbol{α}_m)=P(Y_{im}=k|S_i=j;\mathbf{z}_{im}).

The model is

P(\mathbf{Y_i}=\mathbf{y}|\mathbf{x}_i, \mathbf{z}_i)=P(Y_{i1}=y_{1},…,Y_{iM}=y_M|\mathbf{x}_i, \mathbf{z}_i)

=∑_{j=1}^{J} \left\{π_j(\mathbf{x}'_i\boldsymbol{β}) ∏_{m=1}^{M} ∏_{k=1}^{K} p_{mkj}^{y_{imk}}(\boldsymbol{γ}_{mj}+\mathbf{z}'_{im}\boldsymbol{α}_m)\right\}

with

\log\left(\frac{π_j(\mathbf{x}'_i\boldsymbol{β})}{π_J(\mathbf{x}'_i\boldsymbol{β})}\right)= \mathbf{x}'_i\boldsymbol{β}_j \qquad i=1,…,N ;\quad j=1,…,(J-1)

and

\log\left(\frac{p_{mkj}(\boldsymbol{γ}_{mj}+\mathbf{z}'_{im}\boldsymbol{α}_m)}{p_{mKj}(\boldsymbol{γ}_{mj}+\mathbf{z}'_{im}\boldsymbol{α}_m)}\right)=γ_{mjk}+\mathbf{z}'_{im}\boldsymbol{α}_{mk}
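Both logit equations invert to a softmax with the last category as reference. The sketch below evaluates them for one individual with made-up coefficients; softmax_ref() is a hypothetical helper, and for simplicity the covariate for the manifest variable is a single value without a constant term, which is an assumption rather than the package's convention.

```r
# Multinomial logit with the last category as reference
# (its linear predictor is fixed at 0).
softmax_ref <- function(eta) {
  e <- exp(c(eta, 0))
  e / sum(e)
}

# Latent class probabilities: J = 3 classes, one covariate plus intercept.
x_i  <- c(1, 0.5)                # (1, x_i1)
beta <- cbind(c( 0.2, -1.0),     # beta_1
              c(-0.4,  0.8))     # beta_2; beta_3 = 0 (reference class)
pi_i <- softmax_ref(drop(crossprod(beta, x_i)))

# Conditional probabilities for one item m in one class j: K = 3 categories.
z_im     <- 1.2                  # single covariate value (assumed, no constant)
gamma_mj <- c(0.5, -0.3)         # gamma_{mjk} for k = 1, 2; k = 3 is the reference
alpha_m  <- c(0.1,  0.4)         # alpha_{m1k} for k = 1, 2
p_mj <- softmax_ref(gamma_mj + z_im * alpha_m)
```

By construction log(pi_i[1] / pi_i[3]) equals the linear predictor of class 1, and each probability vector sums to one.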

Value

The output of function covLCA() is a list containing the following elements:

llik

The log-likelihood value of the estimated model.

attempts

A vector containing the maximum log-likelihood values found in each of the nrep attempts to fit the model.

beta.start

A vector containing the initial values for parameters β when such values were provided by the user (in beta.start) or when they were randomly generated (when beta.start=NULL and beta.auto=FALSE).

alpha.start

A vector containing the initial values for parameters α when such values were provided by the user (in alpha.start) or when they were randomly generated (when alpha.start=NULL and alpha.auto=FALSE).

gamma.start

A vector containing the initial values for parameters γ when such values were provided by the user (in gamma.start) or when they were randomly generated (when gamma.start=NULL and gamma.auto=FALSE).

beta.auto

Logical, indicating whether the user asked for “reasonable” initial estimates of parameters β to be automatically computed (with the argument beta.auto).

alpha.auto

Logical, indicating whether the user asked for “reasonable” initial estimates of parameters α to be automatically computed (with the argument alpha.auto).

gamma.auto

Logical, indicating whether the user asked for “reasonable” initial estimates of parameters γ to be automatically computed (with the argument gamma.auto).

beta.initAuto

A vector containing the initial values for parameters β when “reasonable” values are automatically computed (when beta.auto=TRUE).

alpha.initAuto

A vector containing the initial values for parameters α when “reasonable” values are automatically computed (when alpha.auto=TRUE).

gamma.initAuto

A vector containing the initial values for parameters γ when “reasonable” values are automatically computed (when gamma.auto=TRUE).

probs

An N\times M\times K\times J array containing the estimated conditional probabilities \hat{p}_{imkj}=\hat{p}_{mkj}(\boldsymbol{γ}_{mj}+\mathbf{z}'_{im}\boldsymbol{α}_m), where the first to fourth dimensions correspond to individuals, manifest variables, categories of manifest variables and latent classes, respectively.

prior

An N \times J matrix containing the estimated latent class probabilities \hat{π}_{ij}=\hat{π}_j(\mathbf{x}'_i\boldsymbol{β}), where rows correspond to individuals and columns, to latent classes.

posterior

An N \times J matrix containing the estimated posterior latent class probabilities h_{ij}(\hat{φ}), where rows correspond to individuals and columns to latent classes.

predclass

A vector of length N of predicted class memberships, by modal assignment.
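The relation between prior, probs, posterior and predclass can be sketched with Bayes' rule on toy inputs. This is an illustration of the standard computation, not the package's code: posterior_from() is a hypothetical helper and all numbers are made up.

```r
# Toy inputs (assumptions): N = 2 individuals, M = 2 items,
# K = 2 categories, J = 2 classes.
prior <- matrix(0.5, nrow = 2, ncol = 2)
probs <- array(NA_real_, dim = c(2, 2, 2, 2))
probs[, , 1, 1] <- 0.9; probs[, , 2, 1] <- 0.1   # class 1
probs[, , 1, 2] <- 0.2; probs[, , 2, 2] <- 0.8   # class 2
y <- rbind(c(1, 1), c(2, 2))                     # observed responses

posterior_from <- function(prior, probs, y) {
  N <- nrow(prior); J <- ncol(prior); M <- ncol(y)
  lik <- matrix(1, N, J)
  # Class-conditional likelihood of each observed response pattern.
  for (i in seq_len(N)) for (j in seq_len(J)) for (m in seq_len(M)) {
    lik[i, j] <- lik[i, j] * probs[i, m, y[i, m], j]
  }
  joint <- prior * lik
  joint / rowSums(joint)   # normalise each row to sum to one
}

posterior <- posterior_from(prior, probs, y)

# Modal assignment: the class with the largest posterior probability.
predclass <- apply(posterior, 1, which.max)
```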

P

The respective size of each latent class, equal to the mean of the priors.

numiter

The number of iterations required by the estimation algorithm to achieve convergence.

coeffBeta

A P \times J matrix of estimated β_{pj}, where rows correspond to covariates and columns to latent classes.

param.se

A vector containing the standard error of each estimated parameter, in the following order: β_{jp}, γ_{mjk}, α_{mlk} where the last index always moves faster.

param.V

The covariance matrix of the coefficient estimates (in the same order as in param.se).

coeffGamma

An M \times J(K-1) matrix of estimated parameters γ_{mjk}. Each row corresponds to manifest variable m. Within each row, columns correspond to latent classes j and categories of manifest variables k (except the last category, for which γ_{mjK_m}=0), the index of the latter moving faster.

coeffAlpha

An M \times L(K-1) matrix of estimated parameters α_{mlk}. Each row corresponds to manifest variable m. Within each row, columns correspond to covariates l and categories of manifest variables k (except the last category, for which α_{mlK_m}=0), the index of the latter moving faster.

meanProbs

An M \times K \times J array of estimated conditional probabilities evaluated at the sample mean of the covariates. The first to third dimensions correspond to manifest variables, categories of manifest variables and latent classes, respectively.

eflag

Logical, error flag. TRUE if estimation algorithm needed to automatically restart with new initial parameters, otherwise FALSE. A restart is caused in the event of computational/rounding errors that result in nonsensical parameter estimates.

npar

The number of estimated parameters.

aic

Value of the AIC criterion for the estimated model.

bic

Value of the BIC criterion for the estimated model.
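For reference, the usual definitions of the two criteria are shown below with made-up numbers. The formulas covLCA() uses are not spelled out here, so treating them as the standard ones is an assumption.

```r
# Illustrative values (assumptions, not output from a fitted model).
llik <- -1234.5   # maximised log-likelihood
npar <- 20        # number of estimated parameters
Nobs <- 200       # number of fully observed cases

# Standard definitions: smaller values indicate a better trade-off
# between fit and complexity.
aic <- -2 * llik + 2 * npar
bic <- -2 * llik + log(Nobs) * npar
```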

Nobs

Number of fully observed cases.

x

A dataframe containing the covariates for the latent class probabilities.

z

A dataframe containing the covariates for the conditional probabilities.

y

A dataframe containing the manifest variables.

identifiability

A list containing the eigenvalues and the inverse condition number of the matrices involved in conditions (iii') and (iv') of Theorem 1 (Local Identifiability) in Huang and Bandeen-Roche (2004).

maxiter

The maximum number of iterations of the estimation algorithm.

resid.df

The number of residual degrees of freedom, equal to the lesser of N and MK, minus npar.

time

Computation time of model estimation.

Note

This function is an extension of the source code of the R package poLCA (Linzer and Lewis, 2011) to the methodology proposed by Huang and Bandeen-Roche (2004).

References

Bertrand, A., Hafner, C.M. (2011) On heterogeneous latent class models with applications to the analysis of rating scores. Louvain-la-Neuve: Universite catholique de Louvain, Institute of Statistics, Biostatistics and Actuarial Sciences. Discussion paper 2011/28. Available at: http://uclouvain.be/cps/ucl/doc/stat/documents/ISBADP2011-28_On_heterogeneous_latent_class_models...pdf

Huang, G.-H., Bandeen-Roche, K. (2004) Building an identifiable latent class model with covariate effects on underlying and measured variables. Psychometrika, 69(1), 5–32.

Linzer, D.A., Lewis, J. (2011) poLCA: Polytomous Variable Latent Class Analysis. R package version 1.3.1.

Linzer, D.A., Lewis, J. (2011) poLCA: an R Package for Polytomous Variable Latent Class Analysis. Journal of Statistical Software, 42(10), 1–29. http://www.jstatsoft.org/v42/i10

Examples

## 2 models for a subset of dataset election in package poLCA
library("poLCA")
data("election",package="poLCA")
election$GENDER <- factor(election$GENDER)
elec <- election[,c(1:3,7:12,16:17)]
elec <- na.omit(elec)
elec <- elec[1:200,]
## Model 1: 3 classes, 1 covariate for modelling latent class membership
fm1 <- cbind(MORALG,CARESG,KNOWG,MORALB,CARESB,
KNOWB)~PARTY
poLCA1 <- poLCA(formula=fm1,data=elec,nclass=3,nrep=10)

## Model 2: 3 classes, 1 covariate in the model for latent class membership,
## 1 covariate in the model for the manifest variables probabilities
fm2 <- cbind(MORALG,CARESG,KNOWG,MORALB,CARESB,
KNOWB)~1+PARTY
fm3 <- cbind(MORALG,CARESG,KNOWG,MORALB,CARESB,
KNOWB)~1+GENDER

covLCA1 <- covLCA(formula1=fm2,formula2=fm3,data=elec,nclass=3,
beta.auto=TRUE,gamma.auto=TRUE,alpha.auto=TRUE,maxiter=10000)

## Not run:
## 2 models for dataset election in package poLCA
library("poLCA")
data("election",package="poLCA")
election$GENDER <- factor(election$GENDER)
elec <- election[,c(1:12,16:17)]
elec <- na.omit(elec)

## Model 1: 3 classes, 1 covariate for modelling latent class membership
fm1 <- cbind(MORALG,CARESG,KNOWG,LEADG,DISHONG,INTELG,MORALB,CARESB,
KNOWB,LEADB,DISHONB,INTELB)~PARTY
poLCA1 <- poLCA(formula=fm1,data=elec,nclass=3,nrep=10)

## Model 2: 3 classes, 1 covariate in the model for latent class membership,
## 1 covariate in the model for the manifest variables probabilities
fm2 <- cbind(MORALG,CARESG,KNOWG,LEADG,DISHONG,INTELG,MORALB,CARESB,
KNOWB,LEADB,DISHONB,INTELB)~1+PARTY
fm3 <- cbind(MORALG,CARESG,KNOWG,LEADG,DISHONG,INTELG,MORALB,CARESB,
KNOWB,LEADB,DISHONB,INTELB)~1+GENDER

covLCA1 <- covLCA(formula1=fm2,formula2=fm3,data=elec,nclass=3,
beta.auto=TRUE,gamma.auto=TRUE,alpha.auto=TRUE,maxiter=10000)

## End(Not run)

covLCA documentation built on May 2, 2019, 9:35 a.m.
