Introduction

Statistical tests for this study were conducted using the Bayesian framework described in Kruschke (2011). Bayesian statistics are familiar to archaeologists in the context of radiocarbon dating, where they provide formal methods for improving the calibration of radiocarbon ages by combining them with other data, such as stratigraphic information, that place additional constraints on the chronological order of the dated samples (Ramsey, 2009). Beyond calibrating radiocarbon dates, there have been very few applications of Bayesian methods of statistical inference in archaeology (Buck and Litton, 1990a; Buck and Litton, 1990b; Buck et al., 1996; Dellaportas, 1998; Gowland and Chamberlain, 2002; Halekoh and Vach, 2004). Given the rarity of Bayesian inference in archaeology more broadly, we present here an extended discussion to motivate its use in place of classical Null Hypothesis Significance Tests (NHST). These classical tests, such as the t-test, chi-square and ANOVA, are based on frequentist inference and are very common in the archaeological literature.

Although frequentist inference dominates the archaeological literature, we were motivated to explore alternatives by several concerns. Serious disadvantages of the frequentist approach have been identified in other disciplines over recent decades (some of the circa 500 publications on this topic include: Guttman, 1985; Cohen, 1994; Schmidt, 1996; Gill, 1999; Johnson, 1999; Nickerson, 2000; Stephens et al., 2007; Wagenmakers, 2007; Lambdin, 2012). The catalog of problems and misunderstandings surrounding the use of NHST is extensive, so here we only briefly review some of the most frequently noted objections before describing how Bayesian methods provide options for avoiding these problems in our specific case.

Many of the criticisms focus on misuse of the p-values generated by NHST methods (Halsey et al., 2015). For example, lower p-values are sometimes interpreted as indicating greater significance (an effect size statistic is necessary to obtain this information; Gliner et al., 2002). Further misunderstandings include beliefs that the p-value gives the odds that a research hypothesis is correct, that a result will replicate, or that the null hypothesis is true, and that statistical significance indicated by the p-value is equivalent to scientific significance (Carver, 1993; Cohen, 1994; Lambdin, 2012). Even when NHST results are interpreted correctly, others have pointed out that NHSTs are often under-powered, and that small differences in estimates of population parameters from large samples, no matter how scientifically insignificant, will yield significant NHST results (Nix and Barnette, 1998; Johnson, 1999). While some of these problems can be mitigated by including related statistical measures (such as effect sizes and confidence intervals), there have been calls for researchers to abandon NHST as a method of statistical inference (Loftus, 1996) or to be forbidden from using it in publication (Shrout, 1997; Fidler et al., 2004).

Although several alternatives to frequentist inference exist, our choice of a Bayesian approach is motivated by its suitability to the specific details of this analysis. We were especially motivated by the relative conceptual and computational simplicity of conducting Bayesian analyses compared to the alternatives (i.e. likelihood-based statistics and Akaike Information Criterion-based statistics). It is noteworthy that among philosophers of science, both supporters and critics consider Bayesianism to be the dominant view in their field, and the paradigm has been central to recent developments in disciplines such as philosophy, statistics, ecology, and computer science (Bandyopadhyay and Forster, 2009).

There are two specific details of this analysis that are relevant to our choice of a Bayesian approach. The first is that Bayesian methods generally provide a more coherent basis for working with data from non-repeatable events such as an archaeological excavation (compared with, for example, agricultural field trials; cf. Fisher, 1921). Bayesian methods use the data at hand to produce posterior probability statements about distributions of parameters and hypotheses (Puga et al., 2015). NHST procedures do the reverse, computing the probability of an event as a point value indicating its relative frequency of occurrence in an infinite sequence of repeated experiments. In this way, NHST methods use a null hypothesis to assess the plausibility of the observed data (and of more ‘extreme’ data sets that were not observed but might have been with additional sampling or experiments), with another step of reasoning required to either reject or fail to reject the null hypothesis (Jackman, 2009). Bayesian inference, by contrast, is based on the data that actually occurred, not on all possible data sets that might have occurred in an infinite number of hypothetical repetitions of the study (Bolstad, 2007). We consider Bayesian inference to be a conceptually simpler and more direct approach because we compute the probability of our hypotheses given our data, rather than the probability of our data given a null hypothesis, as the NHST framework does.

Although Bayesian methods have intuitive appeal, they have been criticized as subjective and arbitrary because the analyst must typically choose a prior distribution over the unknown parameters of the model (e.g. mean, standard deviation, etc.) to capture their beliefs about the situation before working with the data. Once the data are collected, Bayes’ rule is used to combine the prior distribution with the data to compute a posterior distribution of the unknown parameters. Most current applications formalize the choice of prior distributions by following expert recommendations, estimating the distribution from the observed data (known as an empirical Bayesian approach; Carlin and Louis, 2011), using pre-existing data sets to generate priors (Anholt et al., 2000; McCarthy and Masters, 2005; Carlin and Louis, 2011), or using uninformative or weakly informative prior distributions (such as the uniform distribution) that do not give strong prior probability to any hypotheses about the data (Burnham and Anderson, 2002; Efron, 2013). We describe our choice of prior and likelihood functions in the methods section below. We have also made freely available all the raw data and R programming code used to compute the analyses presented here, to facilitate close inspection and reuse of our quantitative methods.
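The prior-to-posterior updating performed by Bayes’ rule can be sketched with a simple conjugate example (not part of our analysis; the prior and data values here are purely illustrative). With a beta prior on a proportion and binomially distributed observations, the posterior has a closed form, making the combination of prior and data explicit:

```r
# Illustrative sketch of Bayes' rule with a conjugate beta-binomial model.
# A Beta(a, b) prior encodes beliefs before seeing the data; after observing
# binomial data, the posterior is Beta(a + successes, b + failures).
prior_a <- 1; prior_b <- 1        # uniform (uninformative) prior
successes <- 7; failures <- 3     # hypothetical observed data

post_a <- prior_a + successes     # posterior shape parameters combine
post_b <- prior_b + failures      # the prior counts with the observed counts
post_mean <- post_a / (post_a + post_b)   # posterior mean = 8/12
```

With a uniform prior the posterior is driven almost entirely by the data, which is the sense in which weakly informative priors leave the data to dominate the inference.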

Bayesian one-way ANOVA and Bayesian Poisson exponential ANOVA

We used two types of Bayesian methods: a one-way ANOVA for comparing groups of metric data (such as artefact measurements over different phases of occupation, where an ANOVA is the common NHST), and a Poisson exponential ANOVA for contingency table analysis (such as artefact counts, where a chi-square test is the common NHST). Our supplementary information includes further discussion of the motivation for using a Bayesian approach, as well as an R package that makes these methods available for use with other datasets. For an extended technical description and graphical models of these methods, see Kruschke (2011). Our Bayesian one-way ANOVA takes the metric predicted variable (e.g. artefact length or mass) and considers how it is deflected by each of the nominally scaled predictor variables (e.g. depositional phase). These deflection parameters are the primary interest. Following Gelman (2005, 2006), we apply a hierarchical (or multilevel) model with a folded-t prior distribution on the standard deviation (because it does not have infinite density near zero, it behaves well when group-level variance is at or near zero), a uniform (or flat) distribution on the variance between levels (as an uninformative prior), and a fat-tailed normal distribution on the likelihood (to better accommodate outliers in the data). The parameters of these distributions are derived from the observed data. The combination of these specific types of prior distributions, with parameters taken from the observed data, means that this Bayesian ANOVA uses weakly informative priors that are not intended to represent our actual prior state of knowledge (which is small in this case) but rather to constrain the posterior distribution, to an extent allowed by the data (Gelman, 2006).
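The folded-t prior can be sketched by taking the absolute value of draws from a t distribution; the scale and degrees of freedom below are illustrative only, not the values used in our analysis (which are derived from the observed data, as described above):

```r
# Sketch of a folded-t prior on a group-level standard deviation
# (cf. Gelman 2006): absolute values of t-distributed draws.
# Scale and df here are illustrative assumptions, not our fitted values.
set.seed(42)
df_t   <- 2
scale  <- 10
sigma_draws <- abs(rt(10000, df = df_t)) * scale

# The folded-t has finite density near zero, so it remains well behaved
# when the between-group variance is at or near zero, while its fat tail
# still permits large standard deviations when the data demand them.
summary(sigma_draws)
```

Because every draw is non-negative, the distribution is a valid prior for a standard deviation, unlike priors on the log scale that exclude zero entirely.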

For the contingency table analysis, a Poisson likelihood distribution is connected to an underlying ANOVA model via an exponential link function. The term ‘ANOVA’ here does not imply groups of a nominal measurement variable in our observed data (which are counts in this case). Instead, it refers to the general ANOVA-like approach of comparing the distributions of cell frequencies that we produce when predicting the cell frequencies. Where the chi-square test computes a single estimated value per cell of the table, the Bayesian approach generates a distribution of values for each cell. Comparing these distributions is analogous to two-way ANOVA, and investigating the relationships within and between variables in the table is analogous to main effects and interaction contrasts in ANOVA. The Poisson distribution is well suited as the likelihood function for count data because it returns only non-negative integer values and is widely used to model discrete occurrences in time or across space (Sadiku and Tofighi, 1999; Jackman, 2009). This method also uses the folded-t prior, like the one-way ANOVA above, and thus has a weakly informative prior that reflects our low degree of prior knowledge about these assemblages.
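The structure of this model can be sketched as follows: log cell frequencies are composed of a baseline plus ANOVA-like row and column deflections, and the exponential link maps them to the positive Poisson means. All numeric values below are illustrative assumptions, not estimates from our assemblages:

```r
# Sketch of the Poisson exponential ANOVA structure for a 2 x 3 table.
# Log-scale cell frequencies = baseline + row deflection + column deflection;
# the exponential link guarantees positive Poisson means.
set.seed(1)
baseline <- 3                        # overall log-frequency (illustrative)
row_defl <- c(0.5, -0.5)             # e.g. hypothetical phase deflections
col_defl <- c(0.2, 0, -0.2)          # e.g. hypothetical artefact-type deflections

log_lambda <- outer(row_defl, col_defl, `+`) + baseline
lambda     <- exp(log_lambda)        # expected cell frequencies, all > 0
counts     <- matrix(rpois(length(lambda), lambda), nrow = 2)
```

In the full Bayesian model the deflections are parameters with folded-t hierarchical priors, so each cell receives a posterior distribution of frequencies rather than the single expected value computed by a chi-square test.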

To obtain a result from these tests we computed the posterior distributions; that is, we evaluated the outcome of combining the data with our beliefs about the processes that produced the data (weakly held, in this case, so that the data contribute much more information than the prior to the parameters of the posterior). The general process of obtaining the posterior distributions is to combine, or integrate, the prior and likelihood distributions. Because this integral is usually analytically intractable, we used Markov chain Monte Carlo (MCMC) methods, which draw a very large number of representative (though autocorrelated) samples from the distribution proportional to the product of the prior and the likelihood, thereby approximating the posterior. From this approximated distribution we computed descriptive statistics of the posterior to interpret the substantive significance of the output.
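The logic of MCMC sampling can be illustrated with a minimal random-walk Metropolis sampler (a deliberately simplified stand-in for the samplers used in practice; the data, proposal width, and burn-in below are illustrative assumptions). With a flat prior on the mean of normal data with known standard deviation, the posterior is proportional to the likelihood, so the chain should concentrate near the sample mean:

```r
# Minimal Metropolis sampler sketch: posterior of a normal mean (sd known = 1)
# under a flat prior, so log-posterior = log-likelihood up to a constant.
set.seed(7)
y <- rnorm(50, mean = 2, sd = 1)     # hypothetical data

log_post <- function(mu) sum(dnorm(y, mean = mu, sd = 1, log = TRUE))

n_iter <- 5000
mu <- numeric(n_iter)
mu[1] <- 0                            # arbitrary starting value
for (i in 2:n_iter) {
  proposal <- mu[i - 1] + rnorm(1, 0, 0.5)           # random-walk proposal
  if (log(runif(1)) < log_post(proposal) - log_post(mu[i - 1])) {
    mu[i] <- proposal                                 # accept
  } else {
    mu[i] <- mu[i - 1]                                # reject: keep current value
  }
}
posterior_mean <- mean(mu[-(1:1000)])                 # discard burn-in draws
```

The retained draws approximate the posterior distribution, and summaries such as the mean or an HDI are then computed directly from them.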

Our results from these Bayesian tests are reported here as highest probability density intervals (HDIs), with more extensive output presented in the SOM. The HDI is the range of values that contains 95% of the values in the posterior distribution produced by the MCMC sampling. Our supplementary materials also include the equivalent null hypothesis significance test for direct comparison by readers who are unfamiliar with Bayesian tests. The SOM also includes the raw data used for all tests so that others may reuse the measurements in their own analyses and combine them with other datasets.
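Computing an HDI from MCMC output amounts to finding the shortest interval containing 95% of the posterior draws. A small sketch (the helper function and the stand-in normal draws are illustrative, not our analysis code):

```r
# Sketch: 95% highest density interval (HDI) from a vector of MCMC samples,
# found as the shortest interval containing the required mass of draws.
hdi <- function(samples, cred_mass = 0.95) {
  sorted <- sort(samples)
  n      <- length(sorted)
  n_in   <- ceiling(cred_mass * n)              # draws inside the interval
  # width of every candidate interval [i, i + n_in - 1]:
  widths <- sorted[n_in:n] - sorted[1:(n - n_in + 1)]
  i      <- which.min(widths)                   # shortest candidate wins
  c(lower = sorted[i], upper = sorted[i + n_in - 1])
}

set.seed(3)
draws    <- rnorm(10000)     # stand-in for posterior samples
interval <- hdi(draws)       # near c(-1.96, 1.96) for a standard normal
```

For symmetric posteriors the HDI coincides with the central 95% interval, but for skewed posteriors the HDI is narrower, which is why we report it in preference to equal-tailed quantiles.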

References

Anholt, B.R., Werner, E., Skelly, D.K., 2000. Effect of food and predators on the activity of four larval ranid frogs. Eco. 81, 3509-3521.

Bandyopadhyay, P., Forster, M., 2009. Introduction to the Philosophy of Statistics, Handbook for the Philosophy of Science: Philosophy of Statistics. Elsevier, Amsterdam, pp. 1-50.

Bolstad, W.M., 2007. Introduction to Bayesian statistics. John Wiley & Sons, New York.

Buck, C.E., Litton, C., 1990a. A computational Bayes approach to some common archaeological problems. In: Proceedings of the 1990 Computer Applications in Archaeology Conference, http://proceedings.caaconference.org/files/1990/15_Buck_Litton_CAA_1990.pdf.

Buck, C.E., Litton, C., 1990b. A computational Bayes approach to some common archaeological problems. Comp. Apps. and Quant. Meth. in Arch., 93-99.

Buck, C.E., Cavanagh, W.G., Litton, C.D., 1996. Bayesian Approach to Interpreting Archaeological Data. Wiley, Chichester.

Burnham, K.P., Anderson, D.R., 2002. Model Selection and Multimodel Inference: A Practical Information-Theoretic Approach. Springer-Verlag, New York.

Carlin, B.P., Louis, T.A., 2011. Bayesian Methods for Data Analysis. CRC Press.

Carver, R.P., 1993. The case against statistical significance testing, revisited. J. Exper. Edu. 61, 287-292.

Cohen, J., 1994. The earth is round (p < .05). Am. Psych. 49, 997-1003.

Dellaportas, P., 1998. Bayesian classification of neolithic tools. Appl. Stat. 47, 279-297.

Efron, B., 2013. Bayes’ theorem in the 21st century. Science 340, 1177-1178.

Fidler, F., Geoff, C., Mark, B., Neil, T., 2004. Statistical reform in medicine, psychology and ecology. The J. of Socio-Economics 33, 615-630.

Fisher, R.A., 1921. Studies in crop variation. I. An examination of the yield of dressed grain from Broadbalk. The Journal of Ag. Sci. 11, 107-135.

Gelman, A., 2005. Analysis of variance: why it is more important than ever. Ann. Stat. 33, 1-53.

Gelman, A., 2006. Prior distributions for variance parameters in hierarchical models (comment on article by Browne and Draper). Bayesian Anal. 1, 515-534.

Gill, J., 1999. The insignificance of null hypothesis significance testing. Pol. Res. Qtly 52, 647-674.

Gliner, J.A., Leech, N.L., Morgan, G.A., 2002. Problems with null hypothesis significance testing (NHST): what do the textbooks say? The J. of Exp. Edu. 71, 83-92.

Gowland, R.L., Chamberlain, A.T., 2002. A Bayesian approach to ageing perinatal skeletal material from archaeological sites: implications for the evidence for infanticide in Roman-Britain. J. Archaeol. Sci. 29, 677-685.

Guttman, L., 1985. The illogic of statistical inference for cumulative science. Appl. Stoch. Mod. and Data Anal. 1, 3-9.

Halekoh, U., Vach, W., 2004. A Bayesian approach to seriation problems in archaeology. Comp.Stats. & Data Anal. 45, 651-673.

Halsey, L.G., Curran-Everett, D., Vowler, S.L., Drummond, G.B., 2015. The fickle P value generates irreproducible results. Nature Methods 12, 179-185.

Jackman, S., 2009. Bayesian Analysis for the Social Sciences. John Wiley & Sons, Melbourne.

Johnson, D.H., 1999. The insignificance of statistical significance testing. The J. of Wildlife Mngmt 63, 763-772.

Kruschke, J.K., 2011. Doing Bayesian Data Analysis: A Tutorial with R and BUGS. Academic Press, New York.

Lambdin, C., 2012. Significance tests as sorcery: Science is empirical—significance tests are not. Theory & Psychology 22, 67-90.

Loftus, G.R., 1996. Psychology will be a much better science when we change the way we analyze data. Curr. Dir. in Psych. Sci. 5, 161-171.

McCarthy, M.A., Masters, P.I.P., 2005. Profiting from prior information in Bayesian analyses of ecological data. J. of Appl. Eco. 42, 1012-1019.

Nickerson, R.S., 2000. Null hypothesis significance testing: A review of an old and continuing controversy. Psych. Meth. 5, 241-301.

Nix, T.W., Barnette, J.J., 1998. The data analysis dilemma: Ban or abandon. A review of null hypothesis significance testing. Research in the Schools 5, 3-14.

Puga, J.L., Krzywinski, M., Altman, N., 2015. Points of significance: Bayesian statistics. Nature Methods 12, 377-378.

Ramsey, C.B., 2009. Bayesian analysis of radiocarbon dates. Radiocarbon 51, 337–360.

Reimer, P.J., Bard, E., Bayliss, A., Beck, J.W., Blackwell, P.G., Ramsey, C.B., Buck, C.E., Cheng, H., Edwards, R.L., Friedrich, M., 2013. IntCal13 and Marine13 radiocarbon age calibration curves 0–50,000 years cal BP. Radiocarbon 55, 1869-1887.

Sadiku, M.N.O., Tofighi, M.R., 1999. A tutorial on simulation of queueing models. Intl. J. of Elect. Engin. Edu. 36, 102-120.

Schmidt, F.L., 1996. Statistical significance testing and cumulative knowledge in psychology: Implications for training of researchers. Psych. Meth. 1, 115.

Shrout, P.E., 1997. Should significance tests be banned? Introduction to a special section exploring the pros and cons. Psych. Sci. 8, 1-2.

Stephens, P.A., Buskirk, S.W., del Rio, C.M., 2007. Inference in ecology and evolution. Trends in Eco. & Evo. 22, 192-197.

Trafimow, D., Marks, M., 2015. Editorial. Basic and Applied Social Psychology 37, 1-2.

Wagenmakers, E.-J., 2007. A practical solution to the pervasive problems of p values. Psycho. Bull. & Rev. 14, 779-804.

Colophon

This report was generated on `r Sys.time()` using the following computational environment and dependencies:

```r
# which R packages and versions?
sessionInfo()

# what commit is this file at?
library(git2r)
repo <- repository(path = "../")
last_commit <- commits(repo)[[1]]
```

The current git commit of this file is `r last_commit@sha`, which is on the `r branches(repo)[[1]]@name` branch and was made by `r last_commit@committer@name` on `r when(last_commit)`. The current commit message is "`r last_commit@summary`".



benmarwick/Pleistocene-aged-stone-artefacts-from-Jerimalai--East-Timor documentation built on May 12, 2019, 1:01 p.m.