In nevrome/varnastats: Analyse some aspects of prehistoric burial sites

devtools::load_all()
library(reshape2)
library(ggplot2)
library(corrplot)
library(dplyr)
library(igraph)
library(ca)

This Vignette shows a workflow for archaeologists workig on burial sites. Starting point is a data.frame with burials (objects) and their classified attributes (grave goods, orientation, size, sex etc.). An example dataset with 50 fictional burials and 18 stereotypical attributes is provided in data/bs1. Here's an extract of bs1:

bs1[1:10,9:16]

For bivariate analysis it's useful to use categorized data. In the context of burialsites the simple information wether a grave good is present or absent in a certain grave is often already sufficient. To reduce our dataset to this information the function booleanize() can be applied. Empty graves and attributes that never appeare can be removed with itremove(x,1).

bs <- quantAAR::booleanize(bs1)
bs <- quantAAR::itremove(bs,1)
bs[1:10,9:16]

Let's look at the appearences of the individual grave goods with presencecount(). While most goods are frequent, pottery_4 is quite rare - too rare to be meaningfully analyzed with chi square statistics.

presencematerial <- varnastats::presencecount(bs[,7:18], dim = 1)

presence.m <- reshape2::melt(presencematerial)
ggplot(presence.m,
       aes(x = variable,
           y = value)) +
  geom_bar(stat = "identity")

With itremove() we can delete every grave good that appeares less than ten times. As a consequence pottery_4 is removed.

bsred <- bs[,7:18]
bsred <- quantAAR::itremove(bsred, cmin = 10, rmin = 0)

presencematerial <- varnastats::presencecount(bsred, dim = 1)
presence.m <- reshape2::melt(presencematerial)
ggplot(presence.m,
       aes(x = variable,
           y = value)) +
  geom_bar(stat = "identity")

Now we can calculate the bivariate relations of all attributes with the function corrmat(). corrmat() offers different correlation values. In this case we use the testdecision of the chi square test on a significance level of 2% and the phi coefficient. rmnegcorr() is used to remove significant but negative relationships.

bsprep <- data.frame(bs[,1:6], bsred)
corrtablechi2test <- varnastats::corrmat(bsprep, method = "chi2", dim = 1, chi2limit = 0.02)
corrtablephi <- varnastats::corrmat(bsprep, method = "phi", dim = 1)
mastercorr <- varnastats::rmnegcorr(corrtablephi, bsprep, niv = 0.1, dim = 1)

col2 <- grDevices::colorRampPalette(c("white","white", "chartreuse4"))
corrplot::corrplot(
  t(mastercorr),
  method = c("color")   
  )

reltable() creates a list of the significant relationships within the correlation table. We choose the phi coefficient to be the first correlation value and the chi square test decision to be the second.

signicorr <- varnastats::reltable(mastercorr, corrtablechi2test)
signicorr <- dplyr::filter(signicorr, corrvalue2 == TRUE)

signicorr

The network of bivariate relationships can be plotted as a graph with the igraph-package. The male and the female clusters are clearly distinguishable.

signicorrmod <- data.frame(
  from = signicorr$namevar1, 
  to = signicorr$namevar2, 
  weight = signicorr$corrvalue)

graphbasis <- igraph::graph.data.frame(signicorrmod, directed = TRUE)
igraph::plot.igraph(graphbasis)

Within signicorr it's easy to search for the directly to an individual attribute linked attributes (level 1 relations).

mvar <- c("sex_male", "sex_female")

mvar1male <- dplyr::filter(
  signicorr, 
  namevar1 == mvar[1] | 
    namevar2 == mvar[1]
)

mvar1male

mvar1female <- dplyr::filter(
  signicorr, 
  namevar1 == mvar[2] | 
    namevar2 == mvar[2]
)


mvar1female

predictvo() uses the initial data.frame and signicorr to make a prediction about whether an object could contain a variable or not based on cross-references. Useful to determine for example the sex of buried individuals based on their grave goods when no anthropological determination is availbable.

predictgen <- varnastats::predictvo(bsprep, signicorr, mvar, level = 1)

sexprediction <- data.frame(
  m = predictgen[,1], 
  w = predictgen[,3],
  statsex = NA,
  mtat = bsprep$sex_male, 
  wtat = bsprep$sex_female,
  names = make.names(rownames(bsprep))
  )

for (i in 1:length(sexprediction[,1])){
  if (sexprediction$m[i] >= 1.5*sexprediction$w[i]){
    sexprediction$statsex[i] <- "m"
  } else if (sexprediction$w[i] >= 1.5*sexprediction$m[i]) {
    sexprediction$statsex[i] <- "w"
  } else {
    sexprediction$statsex[i] <- "uncertain"
  }
}

sexprediction