augusticeps <- lizards[lizards$species == "augusticeps", -1]

Following @fienberg80, we might first consider the relation between perch height and perch diameter for the sagrei species. To look at the eikosogram, the data first need to be turned into a table using the cross-tabulation function xtabs():

sagreiTable <- xtabs(count ~ perch_height_ft + perch_diameter_inches, data = sagrei)

and then drawn:

eikos(perch_height_ft ~ perch_diameter_inches, data = sagreiTable, main = "Habitat of adult male anolis sagrei lizards")

include_graphics("img/DataAnalysis/sagreiPerch.png")

The two bars are very nearly the same height, suggesting that the height and the diameter of the perch might be independent variates for the anolis sagrei lizard of Bimini. Indeed, a formal test of independence gives no evidence against this hypothesis.

chisq.test(sagreiTable)

Now consider the same for anolis augusticeps: first the table

augusticepsTable <- xtabs(count ~ perch_height_ft + perch_diameter_inches, data = augusticeps)

and then its probability picture

eikos(perch_height_ft ~ perch_diameter_inches, data = augusticepsTable, main = "Habitat of adult male anolis augusticeps lizards")

include_graphics("img/DataAnalysis/augusticepsPerch.png")

Clearly, the bars are not of equal height. That the eikosogram is not even nearly flat suggests that perch height and diameter are not independent for the species anolis augusticeps. However, some caution is needed here. As with anolis sagrei, one needs to formally test the hypothesis of independence for a sample of counts. There are so few observations (1, 2, and 3 in three cells and 21 in the largest) that any formal test will also show no evidence against the hypothesis of independence in this case. (This includes a lineup test using eikosograms as the display for each generated data set!)

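
To make concrete what a "flat" eikosogram means numerically, the following base R sketch computes the conditional proportions that the bar heights represent and compares them with those of a table that is exactly independent. It uses R's built-in mtcars data rather than the lizards data, purely so that it runs without the package; the object names (obs, condObs, expected, condExp) are illustrative choices, not part of eikosograms.

```r
# Observed two-way table of counts (rows: am, columns: vs); mtcars ships with R
obs <- xtabs(~ am + vs, data = mtcars)

# Bar heights: P(am | vs), i.e. each column rescaled to sum to 1
condObs <- prop.table(obs, margin = 2)

# A table with the same margins but no association (exact independence)
expected <- outer(rowSums(obs), colSums(obs)) / sum(obs)
condExp <- prop.table(expected, margin = 2)

# Under independence every column of condExp is identical (a flat picture);
# the columns of condObs generally differ
condObs
condExp
```

How far the observed proportions may stray from the flat ones by chance alone depends on the counts, which is why a formal test is still needed.
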
All three variates could be examined at once by producing the three-way cross-classified table and the corresponding eikosograms. The table

lizardsTable <- xtabs(count ~ species + perch_height_ft + perch_diameter_inches, data = lizards)

and its picture

eikos(species ~ ., data = lizardsTable, lock_aspect = FALSE, yprobs = seq(0, 1, 0.1))

include_graphics("img/DataAnalysis/lizards3way.png")

# listings (data frame rows)

The final way in which categorical data is commonly presented is in a listing (e.g. a data.frame) where each row corresponds to a single occurrence of a combination of values for the variates. For example, consider the mtcars dataset in R, the first few rows of which are

knitr::kable(head(mtcars))

Although all of these variables are numeric, it is clear that many of them could be treated as categorical. For example, vs is a simple indicator variable for the shape of the engine block: 0 if the cylinders are arranged in a "V" pattern and 1 if they are aligned in a straight line. This variable might instead have been represented as a factor. The same could be said of the variable am, which is also binary, with 0 indicating an automatic transmission and 1 a manual. To make the point, we replace these two numeric variables by factors.

mtcars$vs <- factor(mtcars$vs, labels = c("V-shaped", "straight"))
mtcars$am <- factor(mtcars$am, labels = c("automatic", "manual"))

so that the first few rows of mtcars now look like

knitr::kable(head(mtcars))

The eikosogram of these two factors can be drawn without constructing a table; eikos() does the counting of matching rows to produce the picture.

eikos(am ~ vs, data = mtcars)

include_graphics("img/DataAnalysis/amvs.png")

This matching does not depend on the variables being factors.
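
The relation between a listing and a table of counts can be sketched in base R. This is only an analogue of the counting described above, not a description of how eikos() is implemented; the names cars, tabFromRows, countForm and tabFromCounts are illustrative.

```r
# Listing form: one row per car (mtcars ships with R)
cars <- mtcars[, c("am", "vs")]

# Counting matching rows gives the contingency table
tabFromRows <- xtabs(~ am + vs, data = cars)

# Count form: one row per combination of values, with a count column (Freq)
countForm <- as.data.frame(tabFromRows)

# Rebuilding the table from the counts recovers the same numbers
tabFromCounts <- xtabs(Freq ~ am + vs, data = countForm)
```

Both routes give the same table, so the choice between a listing and a count column is a matter of how the data happen to be stored.
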
Any variable in a data.frame can be used in this way and is treated as if it were categorical. In the mtcars data, there are several variables that are effectively ordinal categorical variates, namely cyl, gear, and carb. These too could be explored using eikosograms. Transmission type versus number of cylinders:

eikos(am ~ cyl, data = mtcars)

include_graphics("img/DataAnalysis/amordinal1.png")

and number of forward gears versus number of cylinders:

eikos(gear ~ cyl, data = mtcars)

include_graphics("img/DataAnalysis/ordinal2.png")

# fitted models

There are numerous types of models that can be fitted to categorical data. Among the more popular are generalized linear models. For example, we might fit a log-linear model to the lizards habitat data.

fittedModel <- glm(count ~ species + perch_height_ft, family = "poisson", data = lizards)

This is a "main effects" only model and contains no interaction term. To see what this model asserts about the relationship between species and perch_height_ft, we need to use the fitted.values from the model as the expected counts. A new data.frame is constructed from the fit as follows:

# Can simply append the fitted values to the lizards to get a new data frame
lizardsFit <- data.frame(lizards, fit = fittedModel$fitted.values)
# and create the table
fitTable <- xtabs(fit ~  species + perch_height_ft,  data=lizardsFit)


The eikosogram corresponding to the model fitted to these two variables is

eikos("species", "perch_height_ft", data = fitTable)

include_graphics("img/DataAnalysis/poisson1.png")


from which we can see that the model is asserting that species and perch_height_ft are independently distributed. In log-linear modelling, independence and conditional independence are asserted by which interaction terms appear in, or are absent from, these (hierarchical) models.
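
The claim that a main-effects-only Poisson model reproduces the independence table can be checked numerically. The sketch below again substitutes the built-in mtcars data for lizards so that it is self-contained; counts, fit, fitTab, obs and indep are illustrative names.

```r
# Count form of a two-way table; mtcars stands in for the lizards data here
counts <- as.data.frame(xtabs(~ am + vs, data = mtcars))

# Main-effects-only log-linear model: no am:vs interaction
fit <- glm(Freq ~ am + vs, family = poisson, data = counts)

# Put the fitted (expected) counts back into table form
counts$fit <- fitted(fit)
fitTab <- xtabs(fit ~ am + vs, data = counts)

# Closed-form expected counts under independence: row total * column total / n
obs <- xtabs(Freq ~ am + vs, data = counts)
indep <- outer(rowSums(obs), colSums(obs)) / sum(obs)

# Largest discrepancy between the two; should be numerically negligible
max(abs(as.vector(fitTab) - as.vector(indep)))
```

The fitted counts agree with the row-total-times-column-total table up to the convergence tolerance of glm(), which is why the eikosogram of the fitted table is flat.
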

For example, the following model forces independence between perch_height_ft and perch_diameter_inches, conditional on species.

fittedModel3way <- glm(count ~ species + perch_height_ft + perch_diameter_inches +
                         perch_height_ft * species +
                         perch_diameter_inches * species,
                       family = "poisson",
                       data = lizards)
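
Written out, a model of this form makes the separability explicit. The notation below is an assumption introduced here for illustration (it does not come from the package): $s$, $h$ and $d$ index species, perch height and perch diameter, $\mu_{shd}$ is the expected count, and the $\lambda$ terms are the usual log-linear parameters.

```latex
\begin{aligned}
\log \mu_{shd} &= \lambda + \lambda^{S}_{s} + \lambda^{H}_{h} + \lambda^{D}_{d}
                  + \lambda^{SH}_{sh} + \lambda^{SD}_{sd} \\[4pt]
\mu_{shd} &= a_{sh}\, b_{sd},
\qquad a_{sh} = \exp\!\left(\lambda + \lambda^{S}_{s} + \lambda^{H}_{h} + \lambda^{SH}_{sh}\right),
\quad b_{sd} = \exp\!\left(\lambda^{D}_{d} + \lambda^{SD}_{sd}\right) \\[4pt]
P(h, d \mid s) &= \frac{a_{sh}\, b_{sd}}{\left(\sum_{h'} a_{sh'}\right)\left(\sum_{d'} b_{sd'}\right)}
               = \frac{a_{sh}}{\sum_{h'} a_{sh'}} \cdot \frac{b_{sd}}{\sum_{d'} b_{sd'}}
               = P(h \mid s)\, P(d \mid s)
\end{aligned}
```

Because no term involves $h$ and $d$ together, the expected count factors into a piece depending on $(s, h)$ and a piece depending on $(s, d)$, and the conditional distribution factors accordingly.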


The absence of the terms perch_height_ft * perch_diameter_inches and perch_height_ft * perch_diameter_inches * species means that when species is held fixed, the variables perch_height_ft and perch_diameter_inches are separable in the model. Hence they are conditionally independent. As before, we can see this by viewing the eikosogram for the fitted values.

# Can simply append the fitted values to the lizards to get a new data frame
lizardsFit3way <- data.frame(lizards, fit = fittedModel3way$fitted.values)
# and create the table
fitTable3way <- xtabs(fit ~  species + perch_height_ft + perch_diameter_inches,  data=lizardsFit3way)
# and show the eikosograms
eikos(y = "perch_diameter_inches", x = c("perch_height_ft", "species"),
      data = fitTable3way, xlab_rot = 30)

include_graphics("img/DataAnalysis/poisson2.png")


and the conditional independence asserted by the model becomes obvious. See also the vignette on independence exploration.
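
The same conditional independence can be checked numerically without the picture. The sketch below is an analogue under stated assumptions: it uses the built-in three-way table HairEyeColor in place of the lizards data, with Sex playing the role of species, and the names hec, fit, fitTab, condEyeGivenHair and maxSpread are illustrative.

```r
# HairEyeColor ships with R; as.data.frame gives columns Hair, Eye, Sex, Freq
hec <- as.data.frame(HairEyeColor)

# Hair and Eye each interact with Sex, but not with each other
fit <- glm(Freq ~ Sex * Hair + Sex * Eye, family = poisson, data = hec)

# Fitted (expected) counts back in three-way table form
hec$fit <- fitted(fit)
fitTab <- xtabs(fit ~ Hair + Eye + Sex, data = hec)

# For one level of Sex: each row of the result is P(Eye | Hair)
condEyeGivenHair <- function(sex) prop.table(fitTab[, , sex], margin = 1)

# Spread of each Eye proportion across Hair levels, per Sex;
# values near zero correspond to flat bars within that level of Sex
maxSpread <- sapply(dimnames(fitTab)$Sex, function(sex) {
  max(apply(condEyeGivenHair(sex), 2, function(p) diff(range(p))))
})
maxSpread
```

Within each level of the conditioning variable the fitted conditional proportions do not vary, which is the numerical counterpart of the flat bars in the eikosogram of the fitted table.
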

# references


eikosograms documentation built on May 1, 2019, 10:52 p.m.