poLCA.entropy: Entropy of a fitted latent class model


View source: R/poLCA.entropy.R

Description

Calculates the entropy of the cross-classification table of the manifest variables, as estimated by a fitted latent class model.

Usage

poLCA.entropy(lc)

Arguments

lc

A model object estimated using the poLCA function.

Details

Entropy is a measure of dispersion (or concentration) in a probability mass function. For multivariate categorical data it is calculated as

H = -∑_c p_c log(p_c)

where p_c is the share of the probability in the c-th cell of the cross-classification table, the sum runs over all cells, and log is the natural logarithm. A fitted latent class model produces a smoothed density estimate of the underlying distribution of cell probabilities in the multi-way table of the manifest variables. This function calculates the entropy of that estimated probability mass function.
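A minimal base-R sketch of this computation follows. It assumes the parameter layout that poLCA uses for fitted models: class population shares in a vector (lc$P) and item-response probabilities as a list of R x K_j matrices (lc$probs). Under the latent class model, each cell probability is the sum over classes of the class share times the product of the item-response probabilities for that response pattern. The helpers entropy() and lc_cell_probs() are hypothetical, written only for illustration; they are not part of poLCA and are not necessarily how poLCA.entropy is implemented.

```r
# Hypothetical helper: entropy (natural log) of a probability vector,
# using the convention 0 * log(0) = 0.
entropy <- function(p) {
  p <- p[p > 0]        # drop empty cells
  -sum(p * log(p))
}

# Hypothetical helper: cell probabilities implied by a latent class model.
# P     : vector of R class shares (assumed layout, like lc$P)
# probs : list of J matrices, each R x K_j with rows summing to 1 (like lc$probs)
lc_cell_probs <- function(P, probs) {
  K <- sapply(probs, ncol)                    # categories per item
  cells <- expand.grid(lapply(K, seq_len))    # all prod(K) response patterns
  p <- numeric(nrow(cells))
  for (r in seq_along(P)) {
    pr <- rep(P[r], nrow(cells))              # start with class share P_r
    for (j in seq_along(probs)) {
      pr <- pr * probs[[j]][r, cells[[j]]]    # times Pr(item j = k | class r)
    }
    p <- p + pr                               # sum over latent classes
  }
  p
}

# Toy two-class model with two dichotomous items (made-up parameters)
P <- c(0.5, 0.5)
probs <- list(
  matrix(c(0.9, 0.1,
           0.2, 0.8), nrow = 2, byrow = TRUE),
  matrix(c(0.7, 0.3,
           0.1, 0.9), nrow = 2, byrow = TRUE)
)
p <- lc_cell_probs(P, probs)
print(p)            # approximately 0.325 0.075 0.225 0.375
print(entropy(p))   # entropy of the smoothed 2 x 2 table
```

Because the fitted cell probabilities mix several product-multinomial distributions, the resulting table is smoother than the raw sample proportions, which is the sense in which the model gives a density estimate.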

Value

A number taking a minimum value of 0 (representing complete concentration of probability on one cell) and a maximum value equal to the logarithm of the total number of cells in the fitted cross-classification table (representing complete dispersion, or equal probability across every cell).
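As a quick check of these bounds, the base-R sketch below evaluates the entropy formula for the two extreme cases in a table the size of the carcinoma example (seven dichotomous items, so 2^7 = 128 cells). The helper entropy() is hypothetical and defined only for illustration. The uniform case reproduces the maximum, log(128), which is the first value printed in the example output.

```r
# Hypothetical helper: entropy with natural log and 0 * log(0) taken as 0
entropy <- function(p) {
  p <- p[p > 0]
  -sum(p * log(p))
}

K <- rep(2, 7)          # seven items with two categories each
n_cells <- prod(K)      # 128 cells in the cross-classification table

H_max <- entropy(rep(1 / n_cells, n_cells))   # equal probability in every cell
H_min <- entropy(c(1, rep(0, n_cells - 1)))   # all probability in one cell

print(c(H_max = H_max, log_cells = log(n_cells), H_min = H_min))
```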

See Also

poLCA

Examples

library(poLCA)
data(carcinoma)
f <- cbind(A,B,C,D,E,F,G)~1
lca2 <- poLCA(f,carcinoma,nclass=2) # log-likelihood: -317.2568
lca3 <- poLCA(f,carcinoma,nclass=3) # log-likelihood: -293.705
# Random starting values are used, so results may differ between runs
lca4 <- poLCA(f,carcinoma,nclass=4,nrep=10,maxiter=5000) # log-likelihood: -289.2858

# Maximum entropy (probability spread equally over all 2^7 = 128 cells)
log(prod(sapply(lca2$probs,ncol)))

# Sample entropy ("plug-in" estimator, or MLE)
p.hat <- lca2$predcell$observed/lca2$N
H.hat <- -sum(p.hat * log(p.hat))
H.hat   # 2.42

# Entropy of fitted latent class models
poLCA.entropy(lca2)
poLCA.entropy(lca3)
poLCA.entropy(lca4)

Example output

Loading required package: scatterplot3d
Loading required package: MASS
Conditional item response (column) probabilities,
 by outcome variable, for each class (row) 
 
$A
           Pr(1)  Pr(2)
class 1:  0.0000 1.0000
class 2:  0.8835 0.1165

$B
           Pr(1)  Pr(2)
class 1:  0.0169 0.9831
class 2:  0.6456 0.3544

$C
           Pr(1)  Pr(2)
class 1:  0.2391 0.7609
class 2:  1.0000 0.0000

$D
           Pr(1)  Pr(2)
class 1:  0.4589 0.5411
class 2:  1.0000 0.0000

$E
           Pr(1)  Pr(2)
class 1:  0.0214 0.9786
class 2:  0.7771 0.2229

$F
           Pr(1)  Pr(2)
class 1:  0.5773 0.4227
class 2:  1.0000 0.0000

$G
           Pr(1)  Pr(2)
class 1:  0.0000 1.0000
class 2:  0.8835 0.1165

Estimated class population shares 
 0.5012 0.4988 
 
Predicted class memberships (by modal posterior prob.) 
 0.5 0.5 
 
========================================================= 
Fit for 2 latent classes: 
========================================================= 
number of observations: 118 
number of estimated parameters: 15 
residual degrees of freedom: 103 
maximum log-likelihood: -317.2568 
 
AIC(2): 664.5137
BIC(2): 706.0739
G^2(2): 62.36543 (Likelihood ratio/deviance statistic) 
X^2(2): 92.64814 (Chi-square goodness of fit) 
 
Conditional item response (column) probabilities,
 by outcome variable, for each class (row) 
 
$A
           Pr(1)  Pr(2)
class 1:  0.9427 0.0573
class 2:  0.0000 1.0000
class 3:  0.4872 0.5128

$B
           Pr(1)  Pr(2)
class 1:  0.8621 0.1379
class 2:  0.0191 0.9809
class 3:  0.0000 1.0000

$C
           Pr(1)  Pr(2)
class 1:  1.0000 0.0000
class 2:  0.1425 0.8575
class 3:  1.0000 0.0000

$D
           Pr(1)  Pr(2)
class 1:  1.0000 0.0000
class 2:  0.4138 0.5862
class 3:  0.9424 0.0576

$E
           Pr(1)  Pr(2)
class 1:  0.9449 0.0551
class 2:  0.0000 1.0000
class 3:  0.2494 0.7506

$F
           Pr(1)  Pr(2)
class 1:  1.0000 0.0000
class 2:  0.5236 0.4764
class 3:  1.0000 0.0000

$G
           Pr(1)  Pr(2)
class 1:  1.0000 0.0000
class 2:  0.0000 1.0000
class 3:  0.3693 0.6307

Estimated class population shares 
 0.3736 0.4447 0.1817 
 
Predicted class memberships (by modal posterior prob.) 
 0.3729 0.4322 0.1949 
 
========================================================= 
Fit for 3 latent classes: 
========================================================= 
number of observations: 118 
number of estimated parameters: 23 
residual degrees of freedom: 95 
maximum log-likelihood: -293.705 
 
AIC(3): 633.41
BIC(3): 697.1357
G^2(3): 15.26171 (Likelihood ratio/deviance statistic) 
X^2(3): 20.50336 (Chi-square goodness of fit) 
 
Model 1: llik = -292.493 ... best llik = -292.493
Model 2: llik = -291.9084 ... best llik = -291.9084
Model 3: llik = -289.7889 ... best llik = -289.7889
Model 4: llik = -289.7889 ... best llik = -289.7889
Model 5: llik = -289.2858 ... best llik = -289.2858
Model 6: llik = -289.7889 ... best llik = -289.2858
Model 7: llik = -289.7889 ... best llik = -289.2858
Model 8: llik = -289.2858 ... best llik = -289.2858
Model 9: llik = -289.7889 ... best llik = -289.2858
Model 10: llik = -289.2858 ... best llik = -289.2858
Conditional item response (column) probabilities,
 by outcome variable, for each class (row) 
 
$A
           Pr(1)  Pr(2)
class 1:  0.0000 1.0000
class 2:  0.0000 1.0000
class 3:  0.9422 0.0578
class 4:  0.4634 0.5366

$B
           Pr(1)  Pr(2)
class 1:  0.0905 0.9095
class 2:  0.0000 1.0000
class 3:  0.8584 0.1416
class 4:  0.0000 1.0000

$C
           Pr(1)  Pr(2)
class 1:  0.0186 0.9814
class 2:  0.1561 0.8439
class 3:  1.0000 0.0000
class 4:  1.0000 0.0000

$D
           Pr(1)  Pr(2)
class 1:  1.0000 0.0000
class 2:  0.2421 0.7579
class 3:  1.0000 0.0000
class 4:  0.9404 0.0596

$E
           Pr(1)  Pr(2)
class 1:  0.0000 1.0000
class 2:  0.0000 1.0000
class 3:  0.9443 0.0557
class 4:  0.2341 0.7659

$F
           Pr(1)  Pr(2)
class 1:  1.0000 0.0000
class 2:  0.3823 0.6177
class 3:  1.0000 0.0000
class 4:  1.0000 0.0000

$G
           Pr(1)  Pr(2)
class 1:  0.0000 1.0000
class 2:  0.0000 1.0000
class 3:  1.0000 0.0000
class 4:  0.3482 0.6518

Estimated class population shares 
 0.0936 0.343 0.3751 0.1882 
 
Predicted class memberships (by modal posterior prob.) 
 0.1186 0.3136 0.3729 0.1949 
 
========================================================= 
Fit for 4 latent classes: 
========================================================= 
number of observations: 118 
number of estimated parameters: 31 
residual degrees of freedom: 87 
maximum log-likelihood: -289.2858 
 
AIC(4): 640.5717
BIC(4): 726.4629
G^2(4): 6.423452 (Likelihood ratio/deviance statistic) 
X^2(4): 10.08438 (Chi-square goodness of fit) 
 
[1] 4.85203
[1] 2.424357
[1] 2.693452
[1] 2.494442
[1] 2.458558
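The plug-in estimator in the examples divides observed cell counts by N. The poLCA documentation describes predcell as covering only cells with at least one observation, so no zero proportions reach log() there. Applied to a full table that contains empty cells, however, the same expression returns NaN in R, because 0 * log(0) evaluates to 0 * -Inf. The sketch below uses hypothetical counts to show the problem and the usual fix of dropping empty cells (the convention 0 log 0 = 0).

```r
# Hypothetical observed counts for a 2 x 2 table; one cell is empty
counts <- c(30, 10, 0, 60)
p.hat <- counts / sum(counts)

# Naive formula: the empty cell contributes 0 * -Inf = NaN
H.naive <- -sum(p.hat * log(p.hat))

# Plug-in entropy that skips empty cells
nz <- p.hat > 0
H.hat <- -sum(p.hat[nz] * log(p.hat[nz]))

print(c(naive = H.naive, plug_in = H.hat))
```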

poLCA documentation built on May 29, 2017, 5:59 p.m.