In zooaRch: Analytical Tools for Zooarchaeological Data

Zooarchaeology is concerned with the analysis and inference of faunal remains recovered from archaeological sites. The zooaRch package provides analytical tools for zooarchaeological data. Functions in this package allow users to:

read
manipulate
visualize
and analyze zooarchaeological data

1. Survivorship and Mortality^[The surv.func function is only intended for traditional survivorship analyses, while mort.func is the analog for analyses of mortality profiles.That is, the data must be assignable to discrete age classes. In zooarchaeology, this is traditionally dental eruption/wear data. Epiphyseal fusion data are treated in section 2.]

1.1 Introduction

Functions surv.func and mort.func create confidence intervals around survivorship and mortality data respectively. surv.func takes bootstrap samples of the original dataset, calculates the Kaplan-Meier Estimator (KME) for each, and then calculates the confidence interval. The function mort.func bootstraps the mortality profiles. Defaults are 1000 iterations for the bootstrap and 95% for the confidence interval.

1.2 Setting up the Data for mortality and survivorship

The data must be coded in the long way (i.e., each specimen must have its own entry). The dataframe should include two columns: one called "Genus" and one called "Ageclass." For example, the first six entries of the Marj Rabba sheep/goat data from Price et al.^[Price, M.D., Buckley, M., Rowan, Y.M., Kersel, M., 2013. Animal Management Strategies during the Chalcolithic in the Lower Galilee: New Data from Marj Rabba, Paleorient 39, 183-200.] is:

|Genus |Ageclass| |:--------:|:----:| |Ovis/Capra| 1| |Ovis/Capra| 2| |Ovis/Capra| 3| |Ovis/Capra| 3| |Ovis/Capra| 3| |Ovis/Capra| 3|

Note: the Ageclass must be numeric (i.e., you cannot use "A", "B", "C", etc. for age classes).

1.3 Arguments.

Function arguments are very similar for both surv.func and mort.func. Here we describe the surv.func arguments. See help(mort.func) and help(surv.func) for more details on the arguments for mortality profile and survivorship bootstrapping.

surv.func(SurviveData, labels = NULL, models = NULL, usermod = NULL, ci = 95, plot = TRUE, iter = 1000)

SurviveData The dataset described in section 1.2 (in mort.func this is "mortData").
Labels The names of the age classes. By default, they are numeric from 1 to x, where x is the total number of age classes. One can define labels following this example for nine age classes A-I: Labels.ageclass<-c("A", "B", "C", "D", "E", "F", "G", "H", "I")
models User-defined models. Already existing models include:
```
    model.security: .904, .904, .645, .38, .25, .239, .171, .118, 0
    model.milk: .47, .42, .39, .35, .28, .23, .19, .1, 0
    model.wool: .85, .75, .65, .63, .57, .50, .43, .20, 0
```
For nine age classes (Payne 1973; Marom and Bar-Oz 2009 ^[Payne, S., 1973. Kill-off Patterns in Sheep and Goats: The Mandibles from Asvan Kale, Anatolian Studies 23, 281-303; Marom, N., Bar-Oz, G., 2009. Culling Profiles: The Indeterminacy of Archaeozoological Data to Survivorship Curve Modelling of Sheep and Goat Herd Maintenance Strategies, Journal of Archaeological Science 36, 1184-1187.])
```
    model.catastrophe: .86, .73, .60, .49, .39, .31, .23, .16, .11, .07, .03, .01, 0
    model.attritional: .76, .53, .48, .46, .45, .42, .39, .34, .29, .20, .11, .03, 0
```
For 13 age classes (Stiner 1990; Speth 1983 ^[Stiner, M.C., 1990. The Use of Mortality Patterns in Archaeological Studies of Homonid Predatory Adaptations, Journal of Anthropological Archaeology 9, 305-351. Speth, J.D., 1983. Bison Kills and Bone Counts: Decision Making by Ancient Hunters, University of Chicago Press, London. ])
usermod The user can define new models following this example, which uses five age classes (note: models must be a list):

new.model <- c(.9, .7, .5, .2, .1) new.model2 <- c(.9, .8, .4, .3, .1) models <- list(Model1=new.model, Model2=new.model2) surv.func(SurviveData, models=NULL, usermod=models) This graphs the two new models and labels them as "Model1" and "Model2" in the legend.
ci The confidence interval. Change to, e.g., ci = 90 or ci = 68 as desired)
plot If TRUE, it plots the data on a survivorship curve. If FALSE, it does not.
iter The number of bootstrap iterations. Change to, e.g., iter = 10000.

1.4 What the Function surv.func Does.

The function calculates the KME values for the original dataset.

> survivorcurve.Eq4 <- function(data)
{
  vector <- rep(1, N.ages+1) 
  for(i in 1:N.ages+1) 
  {
    vector[i] <- vector[i-1]*(sum(data >= (i-1)) - sum(data == > (i-1)))/sum(data >= (i-1)) 
 }
 vector[is.na(vector)] <- 0
 round(vector,4) 
}

It then takes x samples of the dataset, with x equal to the number of iterations (Default = 1000). The samples are initially structured in the same way as the original dataset, with each sample size equal to the size of the original (e.g., in the Marj Rabba example, there are 42 entries in each sample). The function calculates the KME for each sample

survive.matrix <- matrix(NA, ncol = N.ages+1, nrow = 1000)
survive.matrix[1,] <- survivorcurve.Eq4(SurviveData$Ageclass)
for(i in 2:1000)
{
  bootstrap <- sample(1:nrow(SurviveData), nrow(SurviveData),   replace = TRUE)
  survive.matrix[i,] <-
survivorcurve.Eq4(SurviveData$Ageclass[bootstrap])
}

Plotting and the calculation of the confidence intervals is straightforward:

plot(x = (1:N.ages), y = survive.matrix[1,-1], type = "l", lwd = 2, xlab = "Age Class", 
     ylab = "Proportion Survived", axes = FALSE, ylim = 
c(0,1))
axis(side = 1, at = 1:N.ages, 
     labels = Labels.ageclass)
axis(side = 2)

lines(x = 1:N.ages, y = apply(survive.matrix[,-1], MARGIN = 2, FUN = quantile, probs = 0.025), lty = "dashed", ylim = c(0,1))

lines(x = 1:N.ages, y = apply(survive.matrix[,-1], MARGIN = 2, FUN = quantile, probs = 0.975), lty = "dashed", ylim = c(0,1))

Plotting models and a legend is also straightforward:

lines(x = 1:N.ages, y = model.catastrophe, col = "blue", lwd = 2, lty = "dotted")
lines(x = 1:N.ages, y = model.attritional, col = "red", lwd = 2, lty = "dotdash")

legend(x = "topright", cex = .75, lwd = c(2,1,2,2,2),lty = 
c("solid", "dashed", "dotted", "dotdash"), 
       col = c("black", "black", "blue", "red"),
       legend = c("Survivorship", "95% Confidence Interval", 
    "Catastrophe", "Attritional"))

Finally, the function makes a table of the results:

data.LowerCI <- apply(survive.matrix[,-1], MARGIN = 2, FUN = 
quantile, prob = 0.025)
data.PointValue <- survive.matrix[1,-1]
data.UpperCI <- apply(survive.matrix[,-1], MARGIN = 2, FUN =  quantile, prob = 0.975)

Taxon <- rep(unique(SurviveData$Genus), times = N.ages)

Output.Matrix <- data.frame(Taxon = Taxon, AgeClassLabs =  
    Labels.ageclass, LowerCI = data.LowerCI, PointValue = 
    data.PointValue, UpperCI = data.UpperCI)

Output.Matrix

1.5 What the Function mort.func Does.

The function calculates mortality profile values using the age distribution as a proportion from the original dataset.

> mortprof <- function(data){
    vector <- rep(NA, N.ages)   
    for(i in 1:N.ages) {
      vector[i] <- sum(data==i)/length(data)
    }
    vector[is.na(vector)] <- 0
    round(vector,4) 
  }

It then takes x samples of the dataset, with x equal to the number of iterations (Default iter = 1000). The samples are initially structured in the same way as the original dataset, with each sample size equal to the size of the original (e.g., in the Marj Rabba example, there are 42 entries in each sample). The function calculates the mortality profile using the proportional age distribution for each sample.

mortality.matrix <- matrix(NA, ncol = N.ages, nrow = iter)
mortality.matrix[1,] <- mortprof(data)
for(i in 2:iter){
  bootstrap <- sample(1:length(data), length(data), replace = TRUE)
  mortality.matrix[i,] <- unlist(mortprof(data[bootstrap]))
}

Plotting and the calculation of the confidence intervals is straightforward:

bar<-barplot(mortality.matrix[1,],ylim=c(0,(max(upCI)+.1)),
            names=Labels.ageclass,ylab=ylab,
            xlab=xlab,beside=T)
      g<-(max(bar)-min(bar))/110
      for (i in 1:length(bar))         {
        lines(c(bar[i],bar[i]),c(upCI[i],loCI[i]))
        lines(c(bar[i]-g,bar[i]+g),c(upCI[i],upCI[i]))
        lines(c(bar[i]-g,bar[i]+g),c(loCI[i],loCI[i]))
      }

Plotting models and a legend and making a table of the results is also a sstraight forward as shown for surv.func.

2. Survivorship using the %Fused Method

2.1 Introduction

The fuse.func function creates confidence intervals for the survivorship based on epiphyseal fusion, using the %Fused method. This method takes the ratio of fused to fused-plus-unfused bones for each fusion group (a fusion group is a group of elements that fuse around the same time). It is important to note that this is not equivalent to survivorship based on age classes. Fusion groups are not age classes, since the specimens pertinent to each fusion group are from animals of different ages. In effect, each fusion group is really just a survivorship curve with two age classes - fused and unfused.

The fuse.func function requires the user to load a .csv file with ("Data", below). The user is then prompted to enter the number of fusion groups, the number of elements in each fusion group, and finally the names of the elements in each fusion group. The function then takes bootstrap samples of the original dataset for the %Fused in each fusion group. It then calculates the confidence interval. Defaults are 1000 iterations for the bootstrap and 95% for the confidence interval.

Note: the graphical output requires the package ggplot2^[Wickman, H., 2009. ggplot2: elegant graphics for data analysis, Springer, New York.]. The code automatically installs and loads ggplot2 if not already loaded.

2.2 Setting up the Data

The data must be coded in the long way (i.e., each specimen must have its own entry) in a .csv file. The dataframe should include three columns: "Identification" (a unique identification number, this can be left blank); "Element" (these should be differentiated according to proximal and distal, when necessary); "Fusion" ("Fused" or "Unfused"). An example:

2.3 Arguments

The fuse.func function is in the following format by default:

fuse.func(data, iter=100, ci = 95, plotci=TRUE, plot.title=NULL)

data The dataset described in section 2.2. Must be a dataframe.
iter The number of bootstrap iterations.
ci The confidence interval. Change to, e.g., ci = 90 or ci = 68 as desired).
plotci Command for plotting the %Fused graph with confidence intervals. If FALSE, no graph is produced. Uses ggplot2.
plot.title This is the title of graph. Change, e.g., to plot.title = "Example Graph".

2.4 What the Function Does

The function first requires the user to define the data. Then the function initiates a series of command prompts. First, for example,

> Enter number of fusion groups
> 5

Then

> Enter number of skeletal elements for fusion group A
> 2

.
.
.

> Enter number of skeletal elements for fusion group E 
> 1

Then

> Enter the 2 names of skeletal elements for fusion group A then > press enter 
> Px.Radius
> Ds.Humerus

.
.
.

> Enter the 1 names of skeletal elements for fusion group E then > press enter 
> Phalanx1

The R code for this is as follows:

fuse.func<-function(data,iter=1000,plotci=TRUE,plot.title=NULL){
  require(ggplot2)
  cat(paste("Enter number of fusion groups"), "\n")
  ans<-readLines(n = 1)
  ans <- as.numeric(ans)  
  fu.grps<-LETTERS[1:ans]
  ske.n<-numeric(length(fu.grps))  
  for(i in 1:length(ske.n)){
    cat(paste("Enter number of skeletal elements for fusion group"), fu.grps[i],"\n")
    ans<-readLines(n = 1)
    ske.n[i] <- as.numeric(ans)
  }  
  ele.list<-as.list(rep(NA,length(ske.n)))
  names(ele.list)<-fu.grps
  for(i in 1: length(ske.n)){
    cat(paste("Enter the", ske.n[i],"names of skeletal elements for fusion group"), fu.grps[i],"then press enter","\n")
    ele.list[[i]]<-readLines(n = ske.n[i])
  }

Then the function calculates the %Fused for each fusion group:

pctfuse<-function(dat){
    pct.ufu<-n<-numeric(length(ele.list))
    names(pct.ufu)<-fu.grps
    wh<-function(it){which(dat$Element==ele.list[[i]][it])} 
    for (i in 1:length(pct.ufu)){
      tab<-table(dat$Fusion[unlist(lapply(1:ske.n[i],wh))])
      fu<-tab[which(names(table(dat$Fusion[unlist(lapply(1:ske.n[i],wh))] ))=="Fused")]
      ufu<-tab[which(names(table(dat$Fusion[unlist(lapply(1:ske.n[i],wh))] ))=="Unfused")]
      if (is.nan(fu/(fu+ufu))){
        pct.ufu[i]<-0} else {pct.ufu[i]<-fu/(fu+ufu)}
      n[i]<-(fu+ufu)    
    }
    return(list(pct.ufu,n))
  }

It then bootstraps the %Fused values for each fusion group (an analogous process to part 1.4). It provides a table of the results, with confidence intervals (default = 95%).

  boot <- matrix(NA, ncol = length(ele.list), nrow = iter) 
  boot[1,] <- pctfuse(data)[[1]]
  for(i in 2:iter){
    data.boot<-data[sample(1:dim(data)[1],dim(data)[1],replace=T),]
    boot[i,] <- pctfuse(data.boot)[[1]]
  }

  ### Provide a Table of the Bootstrap Results
  quantilematrix <- matrix(NA, ncol = 2, nrow = length(fu.grps))
  for(i in 1:ncol(boot)){
    quantilematrix[i,] <- quantile(boot[,i], probs = c(0.025,0.975), na.rm = T)
  }
  outputtable <- data.frame(Fusion.groups = fu.grps, 
                            Data = round(boot[1,],2), 
                            LowerCI = round(quantilematrix[,1],2), UpperCI = round(quantilematrix[,2],2), 
                            Count = pctfuse(data)[[2]])

Plotting the results is the final step. The fuse.func function uses ggplot2 to graph the results. The resulting graph shows the %Fused on the y-axis, with lines representing the confidence intervals. The x-axis shows the fusion groups (which are labeled "A" , "B" , "C", etc.). Directly above the graphed %Fused values and confidence intervals are the numbers of specimens per fusion group.

  ### Plotting the %Fusion data
   ciplot<-ggplot(outputtable, aes(x = Fusion.groups, y = Data))+
    #now add the points
    geom_point(size = 3)+
    #add in the 95% confidence interval bars
    geom_errorbar(aes(ymax = UpperCI, ymin = LowerCI), width = 0.2)+
    #add in the sample size label for each fusion group
    #this uses the previously-made function
    geom_text(aes(x = Fusion.groups, y = rep(1.05, length(Fusion.groups)), label = Count))+
    #add in the theme (all of the background plotting details)
    #the size for element_text is font size, for element_line it is the thickness
    #element_blank() makes it so there is no background color to the plot
    theme(panel.background = element_blank(),
          panel.grid.minor = element_blank(),
          panel.border = element_rect(fill=NA, color = "black"),
          axis.title = element_text(color = "black", size = 20),
          axis.text = element_text(color = "black", size = 15),
          axis.ticks = element_line(color = "black", size = 0.75),
          plot.title = element_text(color = "black", size = 24))+
    #add in the titles/labels
    ylab("%Fused")+xlab("Fusion Group")+
    ggtitle(plot.title)
  if(plotci==TRUE){print(ciplot)}
  list(Output = outputtable, Bootstrap.Data = boot)
}