Description Details Author(s) References See Also Examples
This package provides additional data sets, documentation, and
a few functions designed to extend the vcd
package for Visualizing Categorical Data
and the gnm
package for Generalized Nonlinear Models.
In particular, vcdExtra extends mosaic, assoc and sieve plots from vcd to handle glm() and gnm() models and
adds a 3D version in mosaic3d
.
This package is now a support package for the book, Discrete Data Analysis with R by Michael Friendly and David Meyer, Chapman & Hall/CRC, 2016, https://www.crcpress.com/Discrete-Data-Analysis-with-R-Visualization-and-Modeling-Techniques-for/Friendly-Meyer/9781498725835 with a number of additional data sets, and functions. The web site for the book is http://ddar.datavis.ca.
Package: | vcdExtra |
Type: | Package |
Version: | 0.7-1 |
Date: | 2016-10-12 |
License: | GPL version 2 or newer |
LazyLoad: | yes |
The main purpose of this package is to serve as a sandbox for
introducing extensions of
mosaic plots and related graphical methods
that apply to loglinear models fitted using
glm()
and related, generalized nonlinear models fitted
with gnm()
in the gnm-package
package.
A related purpose is to fill in some holes in the analysis of
categorical data in R, not provided in base R, the vcd,
or other commonly used packages.
The method mosaic.glm
extends the mosaic.loglm
method in the vcd
package to this wider class of models. This method also works for
the generalized nonlinear models fit with the gnm-package
package,
including models for square tables and models with multiplicative associations.
mosaic3d
introduces a 3D generalization of mosaic displays using the
rgl package.
In addition, there are several new data sets, a tutorial vignette,
Working with categorical data with R and the vcd package, vignette("vcd-tutorial", package = "vcdExtra")
and a few functions for manipulating categorical data sets and working with models for categorical data.
A new class, glmlist
, is introduced for working with
collections of glm
objects, e.g., Kway
for fitting
all K-way models from a basic marginal model, and LRstats
for brief statistical summaries of goodnes-of-fit for a collection of
models.
For square tables with ordered factors, Crossings
supplements the
specification of terms in model formulas using
Symm
,
Diag
,
Topo
, etc. in the gnm-package
.
Some of these extensions may be migrated into vcd or gnm.
A collection of demos is included to illustrate fitting and visualizing a wide variety of models:
Mental health data: mosaics for glm() and gnm() models
Occupational status data: Compare mosaic using expected= to mosaic.glm
UCBAdmissions data: Conditional independence via loglm() and glm()
VisualAcuity data: Quasi- and Symmetry models
Yaish data: Unidiff model for 3-way table
Political views and support for women to work (U, R, C, R+C and RC(1) models)
Political views, support for women to work and national welfare spending (3-way, marginal, and conditional independence models)
Visualize glm(), multinom() and polr() models from example(housing, package="MASS")
Use demo(package="vcdExtra")
for a complete current list.
The vcdExtra package now contains a large number of data sets illustrating various forms of categorical data analysis
and related visualizations, from simple to advanced. Use data(package="vcdExtra")
for a
complete list, or datasets(package="vcdExtra")
for an annotated one showing the class
and
dim
for each data set.
Michael Friendly
Maintainer: Michael Friendly <friendly AT yorku.ca>
Friendly, M. Visualizing Categorical Data, Cary NC: SAS Insitute, 2000. Web materials: http://www.datavis.ca/books/vcd/.
Friendly, M. and Meyer, D. (2016). Discrete Data Analysis with R: Visualization and Modeling Techniques for Categorical and Count Data. Boca Raton, FL: Chapman & Hall/CRC. http://ddar.datavis.ca.
Meyer, D.; Zeileis, A. & Hornik, K.
The Strucplot Framework: Visualizing Multi-way Contingency Tables with vcd
Journal of Statistical Software, 2006, 17, 1-48.
Available in R via vignette("strucplot", package = "vcd")
Turner, H. and Firth, D. Generalized nonlinear models in R: An overview of the gnm package,
2007,
http://eprints.ncrm.ac.uk/472/. Available in R via vignette("gnmOverview", package = "gnm")
.
gnm-package
, for an extended range of models for contingency tables
mosaic
for details on mosaic displays within the strucplot framework.
1 2 3 | example(mosaic.glm)
demo("mental-glm")
|
Loading required package: vcd
Loading required package: grid
Loading required package: gnm
msc.gl> GSStab <- xtabs(count ~ sex + party, data=GSS)
msc.gl> # using the data in table form
msc.gl> mod.glm1 <- glm(Freq ~ sex + party, family = poisson, data = GSStab)
msc.gl> res <- residuals(mod.glm1)
msc.gl> std <- rstandard(mod.glm1)
msc.gl> # For mosaic.default(), need to re-shape residuals to conform to data
msc.gl> stdtab <- array(std, dim=dim(GSStab), dimnames=dimnames(GSStab))
msc.gl> mosaic(GSStab, gp=shading_Friendly, residuals=stdtab, residuals_type="Std\nresiduals",
msc.gl+ labeling = labeling_residuals)
msc.gl> # Using externally calculated residuals with the glm() object
msc.gl> mosaic.glm(mod.glm1, residuals=std, labeling = labeling_residuals, shade=TRUE)
msc.gl> # Using residuals_type
msc.gl> mosaic.glm(mod.glm1, residuals_type="rstandard", labeling = labeling_residuals, shade=TRUE)
msc.gl> ## Ordinal factors and structured associations
msc.gl> data(Mental)
msc.gl> xtabs(Freq ~ mental+ses, data=Mental)
ses
mental 1 2 3 4 5 6
Well 64 57 57 72 36 21
Mild 94 94 105 141 97 71
Moderate 58 54 65 77 54 54
Impaired 46 40 60 94 78 71
msc.gl> long.labels <- list(set_varnames = c(mental="Mental Health Status", ses="Parent SES"))
msc.gl> # fit independence model
msc.gl> # Residual deviance: 47.418 on 15 degrees of freedom
msc.gl> indep <- glm(Freq ~ mental+ses,
msc.gl+ family = poisson, data = Mental)
msc.gl> long.labels <- list(set_varnames = c(mental="Mental Health Status",
msc.gl+ ses="Parent SES"))
msc.gl> mosaic(indep,residuals_type="rstandard", labeling_args = long.labels, labeling=labeling_residuals)
msc.gl> # or, show as a sieve diagram
msc.gl> mosaic(indep, labeling_args = long.labels, panel=sieve, gp=shading_Friendly)
msc.gl> # fit linear x linear (uniform) association. Use integer scores for rows/cols
msc.gl> Cscore <- as.numeric(Mental$ses)
msc.gl> Rscore <- as.numeric(Mental$mental)
msc.gl> linlin <- glm(Freq ~ mental + ses + Rscore:Cscore,
msc.gl+ family = poisson, data = Mental)
msc.gl> mosaic(linlin,residuals_type="rstandard",
msc.gl+ labeling_args = long.labels, labeling=labeling_residuals, suppress=1, gp=shading_Friendly,
msc.gl+ main="Lin x Lin model")
msc.gl> ## Goodman Row-Column association model fits even better (deviance 3.57, df 8)
msc.gl> if (require(gnm)) {
msc.gl+ Mental$mental <- C(Mental$mental, treatment)
msc.gl+ Mental$ses <- C(Mental$ses, treatment)
msc.gl+ RC1model <- gnm(Freq ~ ses + mental + Mult(ses, mental),
msc.gl+ family = poisson, data = Mental)
msc.gl+
msc.gl+ mosaic(RC1model,residuals_type="rstandard",
msc.gl+ labeling_args = long.labels, labeling=labeling_residuals, suppress=1, gp=shading_Friendly,
msc.gl+ main="RC1 model")
msc.gl+ }
Initialising
Running start-up iterations..
Running main iterations........
Done
msc.gl> ############# UCB Admissions data, fit using glm()
msc.gl>
msc.gl> structable(Dept ~ Admit+Gender,UCBAdmissions)
Dept A B C D E F
Admit Gender
Admitted Male 512 353 120 138 53 22
Female 89 17 202 131 94 24
Rejected Male 313 207 205 279 138 351
Female 19 8 391 244 299 317
msc.gl> berkeley <- as.data.frame(UCBAdmissions)
msc.gl> berk.glm1 <- glm(Freq ~ Dept * (Gender+Admit), data=berkeley, family="poisson")
msc.gl> summary(berk.glm1)
Call:
glm(formula = Freq ~ Dept * (Gender + Admit), family = "poisson",
data = berkeley)
Deviance Residuals:
Min 1Q Median 3Q Max
-3.4776 -0.4144 0.0098 0.3089 2.2321
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 6.27557 0.04248 147.744 < 2e-16 ***
DeptB -0.40575 0.06770 -5.993 2.06e-09 ***
DeptC -1.53939 0.08305 -18.536 < 2e-16 ***
DeptD -1.32234 0.08159 -16.207 < 2e-16 ***
DeptE -2.40277 0.11014 -21.816 < 2e-16 ***
DeptF -3.09624 0.15756 -19.652 < 2e-16 ***
GenderFemale -2.03325 0.10233 -19.870 < 2e-16 ***
AdmitRejected -0.59346 0.06838 -8.679 < 2e-16 ***
DeptB:GenderFemale -1.07581 0.22860 -4.706 2.52e-06 ***
DeptC:GenderFemale 2.63462 0.12343 21.345 < 2e-16 ***
DeptD:GenderFemale 1.92709 0.12464 15.461 < 2e-16 ***
DeptE:GenderFemale 2.75479 0.13510 20.391 < 2e-16 ***
DeptF:GenderFemale 1.94356 0.12683 15.325 < 2e-16 ***
DeptB:AdmitRejected 0.05059 0.10968 0.461 0.645
DeptC:AdmitRejected 1.20915 0.09726 12.432 < 2e-16 ***
DeptD:AdmitRejected 1.25833 0.10152 12.395 < 2e-16 ***
DeptE:AdmitRejected 1.68296 0.11733 14.343 < 2e-16 ***
DeptF:AdmitRejected 3.26911 0.16707 19.567 < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for poisson family taken to be 1)
Null deviance: 2650.095 on 23 degrees of freedom
Residual deviance: 21.736 on 6 degrees of freedom
AIC: 216.8
Number of Fisher Scoring iterations: 4
msc.gl> mosaic(berk.glm1, gp=shading_Friendly, labeling=labeling_residuals, formula=~Admit+Dept+Gender)
msc.gl> # the same, displaying studentized residuals; note use of formula to reorder factors in the mosaic
msc.gl> mosaic(berk.glm1, residuals_type="rstandard", labeling=labeling_residuals, shade=TRUE,
msc.gl+ formula=~Admit+Dept+Gender, main="Model: [DeptGender][DeptAdmit]")
msc.gl> ## all two-way model
msc.gl> berk.glm2 <- glm(Freq ~ (Dept + Gender + Admit)^2, data=berkeley, family="poisson")
msc.gl> summary(berk.glm2)
Call:
glm(formula = Freq ~ (Dept + Gender + Admit)^2, family = "poisson",
data = berkeley)
Deviance Residuals:
1 2 3 4 5 6 7 8
-0.75481 0.99471 1.96454 -3.15768 -0.03402 0.04449 0.15709 -0.22034
9 10 11 12 13 14 15 16
1.01273 -0.73839 -0.74367 0.54896 0.06760 -0.04741 -0.06911 0.05080
17 18 19 20 21 22 23 24
1.05578 -0.61236 -0.73617 0.42678 -0.20117 0.05113 0.19803 -0.05370
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 6.27150 0.04271 146.855 < 2e-16 ***
DeptB -0.40322 0.06784 -5.944 2.78e-09 ***
DeptC -1.57790 0.08949 -17.632 < 2e-16 ***
DeptD -1.35000 0.08526 -15.834 < 2e-16 ***
DeptE -2.44982 0.11755 -20.840 < 2e-16 ***
DeptF -3.13787 0.16174 -19.401 < 2e-16 ***
GenderFemale -1.99859 0.10593 -18.866 < 2e-16 ***
AdmitRejected -0.58205 0.06899 -8.436 < 2e-16 ***
DeptB:GenderFemale -1.07482 0.22861 -4.701 2.58e-06 ***
DeptC:GenderFemale 2.66513 0.12609 21.137 < 2e-16 ***
DeptD:GenderFemale 1.95832 0.12734 15.379 < 2e-16 ***
DeptE:GenderFemale 2.79519 0.13925 20.073 < 2e-16 ***
DeptF:GenderFemale 2.00232 0.13571 14.754 < 2e-16 ***
DeptB:AdmitRejected 0.04340 0.10984 0.395 0.693
DeptC:AdmitRejected 1.26260 0.10663 11.841 < 2e-16 ***
DeptD:AdmitRejected 1.29461 0.10582 12.234 < 2e-16 ***
DeptE:AdmitRejected 1.73931 0.12611 13.792 < 2e-16 ***
DeptF:AdmitRejected 3.30648 0.16998 19.452 < 2e-16 ***
GenderFemale:AdmitRejected -0.09987 0.08085 -1.235 0.217
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for poisson family taken to be 1)
Null deviance: 2650.095 on 23 degrees of freedom
Residual deviance: 20.204 on 5 degrees of freedom
AIC: 217.26
Number of Fisher Scoring iterations: 4
msc.gl> mosaic.glm(berk.glm2, residuals_type="rstandard", labeling = labeling_residuals, shade=TRUE,
msc.gl+ formula=~Admit+Dept+Gender, main="Model: [DeptGender][DeptAdmit][AdmitGender]")
msc.gl> anova(berk.glm1, berk.glm2, test="Chisq")
Analysis of Deviance Table
Model 1: Freq ~ Dept * (Gender + Admit)
Model 2: Freq ~ (Dept + Gender + Admit)^2
Resid. Df Resid. Dev Df Deviance Pr(>Chi)
1 6 21.735
2 5 20.204 1 1.5312 0.2159
msc.gl> # Add 1 df term for association of [GenderAdmit] only in Dept A
msc.gl> berkeley <- within(berkeley, dept1AG <- (Dept=='A')*(Gender=='Female')*(Admit=='Admitted'))
msc.gl> berkeley[1:6,]
Admit Gender Dept Freq dept1AG
1 Admitted Male A 512 0
2 Rejected Male A 313 0
3 Admitted Female A 89 1
4 Rejected Female A 19 0
5 Admitted Male B 353 0
6 Rejected Male B 207 0
msc.gl> berk.glm3 <- glm(Freq ~ Dept * (Gender+Admit) + dept1AG, data=berkeley, family="poisson")
msc.gl> summary(berk.glm3)
Call:
glm(formula = Freq ~ Dept * (Gender + Admit) + dept1AG, family = "poisson",
data = berkeley)
Deviance Residuals:
1 2 3 4 5 6 7 8
0.00000 0.00000 0.00000 0.00000 -0.06316 0.08273 0.29514 -0.40088
9 10 11 12 13 14 15 16
0.55733 -0.41519 -0.41820 0.30511 -0.30655 0.21843 0.32036 -0.23141
17 18 19 20 21 22 23 24
0.69837 -0.41419 -0.49916 0.28628 -0.42032 0.10861 0.42684 -0.11382
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 6.23832 0.04419 141.157 < 2e-16 ***
DeptB -0.36850 0.06879 -5.357 8.47e-08 ***
DeptC -1.50215 0.08394 -17.895 < 2e-16 ***
DeptD -1.28509 0.08250 -15.577 < 2e-16 ***
DeptE -2.36552 0.11081 -21.347 < 2e-16 ***
DeptF -3.05899 0.15803 -19.357 < 2e-16 ***
GenderFemale -2.80176 0.23628 -11.858 < 2e-16 ***
AdmitRejected -0.49212 0.07175 -6.859 6.94e-12 ***
dept1AG 1.05208 0.26271 4.005 6.21e-05 ***
DeptB:GenderFemale -0.30730 0.31243 -0.984 0.325
DeptC:GenderFemale 3.40313 0.24615 13.825 < 2e-16 ***
DeptD:GenderFemale 2.69560 0.24676 10.924 < 2e-16 ***
DeptE:GenderFemale 3.52330 0.25220 13.970 < 2e-16 ***
DeptF:GenderFemale 2.71207 0.24787 10.941 < 2e-16 ***
DeptB:AdmitRejected -0.05074 0.11181 -0.454 0.650
DeptC:AdmitRejected 1.10781 0.09966 11.116 < 2e-16 ***
DeptD:AdmitRejected 1.15699 0.10381 11.145 < 2e-16 ***
DeptE:AdmitRejected 1.58162 0.11933 13.254 < 2e-16 ***
DeptF:AdmitRejected 3.16777 0.16848 18.803 < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for poisson family taken to be 1)
Null deviance: 2650.0952 on 23 degrees of freedom
Residual deviance: 2.6815 on 5 degrees of freedom
AIC: 199.74
Number of Fisher Scoring iterations: 3
msc.gl> mosaic.glm(berk.glm3, residuals_type="rstandard", labeling = labeling_residuals, shade=TRUE,
msc.gl+ formula=~Admit+Dept+Gender, main="Model: [DeptGender][DeptAdmit] + DeptA*[GA]")
msc.gl> anova(berk.glm1, berk.glm3, test="Chisq")
Analysis of Deviance Table
Model 1: Freq ~ Dept * (Gender + Admit)
Model 2: Freq ~ Dept * (Gender + Admit) + dept1AG
Resid. Df Resid. Dev Df Deviance Pr(>Chi)
1 6 21.7355
2 5 2.6815 1 19.054 1.271e-05 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Warning messages:
1: no formula provided, assuming ~ses + mental
2: no formula provided, assuming ~ses + mental
3: no formula provided, assuming ~ses + mental
4: no formula provided, assuming ~ses + mental
demo(mental-glm)
---- ~~~~~~~~~~
> ## Mental health data: mosaics for glm() and gnm() models
> library(gnm)
> library(vcdExtra)
> data(Mental)
> # display the frequency table
> (Mental.tab <- xtabs(Freq ~ mental+ses, data=Mental))
ses
mental 1 2 3 4 5 6
Well 64 57 57 72 36 21
Mild 94 94 105 141 97 71
Moderate 58 54 65 77 54 54
Impaired 46 40 60 94 78 71
> # fit independence model
> # Residual deviance: 47.418 on 15 degrees of freedom
> indep <- glm(Freq ~ mental+ses,
+ family = poisson, data = Mental)
> deviance(indep)
[1] 47.41785
> long.labels <- list(set_varnames = c(mental="Mental Health Status", ses="Parent SES"))
> mosaic(indep,residuals_type="rstandard", labeling_args = long.labels, labeling=labeling_residuals,
+ main="Mental health data: Independence")
> # as a sieve diagram
> mosaic(indep, labeling_args = long.labels, panel=sieve, gp=shading_Friendly,
+ main="Mental health data: Independence")
> # fit linear x linear (uniform) association. Use integer scores for rows/cols
> Cscore <- as.numeric(Mental$ses)
> Rscore <- as.numeric(Mental$mental)
> # column effects model (ses)
> coleff <- glm(Freq ~ mental + ses + Rscore:ses,
+ family = poisson, data = Mental)
> mosaic(coleff,residuals_type="rstandard",
+ labeling_args = long.labels, labeling=labeling_residuals, suppress=1, gp=shading_Friendly,
+ main="Mental health data: Col effects (ses)")
> # row effects model (mental)
> roweff <- glm(Freq ~ mental + ses + mental:Cscore,
+ family = poisson, data = Mental)
> mosaic(roweff,residuals_type="rstandard",
+ labeling_args = long.labels, labeling=labeling_residuals, suppress=1, gp=shading_Friendly,
+ main="Mental health data: Row effects (mental)")
> linlin <- glm(Freq ~ mental + ses + Rscore:Cscore,
+ family = poisson, data = Mental)
> # compare models
> anova(indep, roweff, coleff, linlin)
Analysis of Deviance Table
Model 1: Freq ~ mental + ses
Model 2: Freq ~ mental + ses + mental:Cscore
Model 3: Freq ~ mental + ses + Rscore:ses
Model 4: Freq ~ mental + ses + Rscore:Cscore
Resid. Df Resid. Dev Df Deviance
1 15 47.418
2 12 6.281 3 41.137
3 10 6.829 2 -0.549
4 14 9.895 -4 -3.066
> AIC(indep, roweff, coleff, linlin)
df AIC
indep 9 209.5908
roweff 12 174.4537
coleff 14 179.0023
linlin 10 174.0681
> mosaic(linlin,residuals_type="rstandard",
+ labeling_args = long.labels, labeling=labeling_residuals, suppress=1, gp=shading_Friendly,
+ main="Mental health data: Linear x Linear")
> ## Goodman Row-Column association model fits well (deviance 3.57, df 8)
> Mental$mental <- C(Mental$mental, treatment)
> Mental$ses <- C(Mental$ses, treatment)
> RC1model <- gnm(Freq ~ mental + ses + Mult(mental, ses),
+ family = poisson, data = Mental)
Initialising
Running start-up iterations..
Running main iterations......
Done
> mosaic(RC1model,residuals_type="rstandard",
+ labeling_args = long.labels, labeling=labeling_residuals, suppress=1, gp=shading_Friendly,
+ main="Mental health data: RC1 model")
Warning messages:
1: no formula provided, assuming ~ses + mental
2: no formula provided, assuming ~ses + mental
3: no formula provided, assuming ~ses + mental
4: no formula provided, assuming ~ses + mental
5: no formula provided, assuming ~ses + mental
6: no formula provided, assuming ~ses + mental
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.