The R package eratosthenes aims to provide a coherent foundation for archaeological chronology-building by incorporating, computationally, all relevant sources of information on uncertain archaeological or historical dates. Archaeological dates are often subject to relational conditions (via seriation or stratigraphic relationships) and absolute constraints (such as radiocarbon dates, datable artifacts, or other known historical events, as termini post or ante quos), which prompt the use of a joint conditional probability density to convey those relationships. The date of any one event can then be marginalized from that full, joint conditional distribution, which is achieved using a two-stage Gibbs sampler that draws estimates uniformly between potential earliest and latest bounds. Ancillary functions include checking for discrepancies in sequences of events and constraining optimal seriations to known sequences.
While software exists for calibrating and conditioning radiocarbon dates upon relative constraints, such as BCal (Buck, Christen, and James 1999) and OxCal (Bronk Ramsey 2009), the aim of eratosthenes is to extend the application of probability theory more generally to dating all archaeological phenomena, especially the production dates of artifact types. Sampling in eratosthenes involves a two-step process: iterative routines of Gibbs sampling first determine the initial values for the main sampler, which then uses consistent batch means (CBM) and Monte Carlo standard errors (MCSE) to determine convergence (Flegal, Haran, and Jones 2008). Furthermore, eratosthenes provides tools for analyzing the impact of events on each other within the conditional structure stipulated by the investigator, by implementing a jackknife-style estimator of squared displacement (how much the date of one event shifts, squared, when another is omitted). Rcpp is required for faster Gibbs sampling.
The package is named after Eratosthenes of Cyrene, author of the Chronographiai.
To obtain the current development version of eratosthenes from GitHub, install the package in the R command line with devtools:
library(devtools)
install_github("scollinselliott/eratosthenes", dependencies = TRUE, build_vignettes = TRUE)
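Once installed, the package is loaded in the usual way:

library(eratosthenes)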
The following comments are intended as a general introduction. See vignettes for more information on package functionality.
The basic objects of interest in eratosthenes are:

- relative sequences of contexts or events
- finds, i.e., artifacts or other elements associated with those contexts or events, optionally assigned one or more types
- absolute constraints, as termini post quos and termini ante quos

Information related to these three items must be formatted in objects of a list class, as follows.
Relative sequences should run in order from left (earliest) to right (latest). All sequences should consist of vectors, contained in a list. In the following example, the object contexts contains three sequences of events:
x <- c("A", "B", "C", "D", "E", "F", "G", "H", "I", "J")
y <- c("B", "D", "G", "H", "K")
z <- c("F", "K", "L", "M")
contexts <- list(x, y, z)
See also the section Evaluating Sequences below.
Finds should be formatted as a list of lists, each of which contains the following entries:

- id: a unique identification number or code
- assoc: an element in the sequences list to which that find or element pertains
- type: optional; one or more types, attributes, features, or aspects that pertain to that find (NULL if none)

In the following example, the artifacts object contains six artifacts which pertain to elements of the sequences contained in contexts above:
f1 <- list(id = "find01", assoc = "D", type = c("type1", "form1"))
f2 <- list(id = "find02", assoc = "E", type = c("type1", "form2"))
f3 <- list(id = "find03", assoc = "G", type = c("type1", "form1"))
f4 <- list(id = "find04", assoc = "H", type = c("type2", "form1"))
f5 <- list(id = "find05", assoc = "I", type = "type2")
f6 <- list(id = "find06", assoc = "H", type = NULL)
artifacts <- list(f1, f2, f3, f4, f5, f6)
Constraints should be given as two separate lists, one for termini post quos and the other for termini ante quos. These take the same form as the finds object, as a list of lists, with the same headings, but include one additional heading, samples, which contains the absolute dates pertinent to that t.p.q. or t.a.q.
coin1 <- list(id = "coin1", assoc = "B", type = NULL, samples = runif(100, -320, -300))
coin2 <- list(id = "coin2", assoc = "G", type = NULL, samples = runif(100, 37, 41))
destr <- list(id = "destr", assoc = "J", type = NULL, samples = 79)
tpq_info <- list(coin1, coin2)
taq_info <- list(destr)
Absolute dates can take any form: 79 for 79 CE; -91:-88, seq(-91, -88, length = 10^5), or runif(10^5, -91, -88) for 91-88 BCE.

eratosthenes does not provide functionality for calibrating dates, which can be accomplished using preexisting software or directly from a calibration curve. The R package Bchron (Haslett and Parnell 2008) provides functions for calibrating dates. As a brief example, given an uncalibrated date and its standard deviation, a crude sample of calibrated dates can be drawn from the IntCal20 curve data, available from IntCal (Reimer et al. 2020), using the following script:

intcal20 <- read.csv("../path/to/intcal20.14c")
# 14c date mean and st.dev.
mu <- 2040
sigma <- 30
# samples of 14c dates
uncalib <- round(rnorm(10^5, mu, sigma))
calib <- c()
for (i in 1:length(uncalib)) {
x <- intcal20$CAL.BP[ intcal20$X14C.age == uncalib[i] ]
#g <- intcal20$Sigma[ intcal20$X14C.age == uncalib[i] ]
if (length(x) > 0) {
for (j in 1:length(x)) {
calib <- c(calib, x[j])
}
}
}
# convert cal BP samples to calendar years (BCE as negative)
calBC <- 1950 - calib
hist(calBC, breaks = 100)
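Alternatively, a comparable set of samples can be drawn with Bchron itself. The following is a minimal sketch, assuming the BchronCalibrate() function with the intcal20 curve and that each calibrated date in the returned object exposes an ageGrid and densities:

library(Bchron)
# calibrate a 14C determination of 2040 +/- 30 BP on the IntCal20 curve
cal <- BchronCalibrate(ages = 2040, ageSds = 30, calCurves = "intcal20")
# draw samples of cal BP ages in proportion to the calibrated density
calBP <- sample(cal$Date1$ageGrid, 10^5, replace = TRUE, prob = cal$Date1$densities)
# convert to calendar years, with BCE as negative, as in the script above
calBC <- 1950 - calBP
hist(calBC, breaks = 100)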
The core approach of eratosthenes is a Gibbs sampler, a common Markov Chain Monte Carlo (MCMC) technique used for dating archaeological events, above all radiocarbon dates (Geman and Geman 1984; Buck, Cavanagh, and Litton 1996; Bronk Ramsey 2009). Gibbs sampling, however, can take a number of different forms, and so it is worthwhile to describe explicitly how it is conducted in eratosthenes. In brief, an initial run of Gibbs sampling establishes starting values; the main sampler then draws the date of each event uniformly between its current earliest and latest bounds, as determined by the other events in the joint conditional distribution, and continues until the mean Monte Carlo standard error, computed via consistent batch means, falls below a stipulated criterion.
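To make the sampling step concrete, the following toy sketch (not the package's internal implementation, which relies on Rcpp) illustrates a single sweep over one ordered sequence, redrawing each date uniformly between the dates of its neighbors:

# toy sequence of three ordered dates, earliest to latest (BCE as negative)
dates <- c(-200, -150, -100)
# illustrative overall bounds, standing in for the alpha_ and omega_ arguments described below
alpha_ <- -400
omega_ <- 0
# one Gibbs sweep: each date is redrawn uniformly between its neighboring dates
for (k in seq_along(dates)) {
  lower <- if (k > 1) dates[k - 1] else alpha_
  upper <- if (k < length(dates)) dates[k + 1] else omega_
  dates[k] <- runif(1, lower, upper)
}
dates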
There are two functions in eratosthenes for estimating dates:

- gibbs_ad() is the primary function, which will yield samples for dates of deposition, production, and any absolute constraints themselves (that is, the density of that extrinsic date as impacted by all other events in the joint distribution).
- gibbs_ad_use() is the secondary function, for estimating the date of use of a find given its production and depositional date.

See the section Evaluating Displacement below for tools on assessing the effective influence of events upon each other within the joint conditional density.
The function gibbs_ad() takes as inputs the following objects:

- sequences: a list of relative sequences of contexts or events.
- finds: a list of any elements which belong to a context or event, which may be assigned a given type.
- max_samples: the maximum number of samples to run, which will stop the main sampling routine even if convergence has not been achieved (default is 10^5).
- size: how many samples to take between each check for convergence (default is 10^3).
- mcse_crit: the criterion of the mean MCSE at which to stop the sampler (default is 0.5).
- tpq and taq: separate lists that indicate any elements that provide extrinsic (i.e., absolute) chronological information, as termini post and ante quos.
- alpha_ and omega_: lowest and highest bounds within which to sample.
- trim: whether to remove contexts from the output that are before or after user-provided t.p.q. and t.a.q. (i.e., those which depend on alpha_ and omega_).
- rule: the rule for determining the earliest date of production of an artifact type. Initial threshold boundaries are first established between the earliest depositional context containing an artifact of that type and the next earliest context which lacks it. Then, the following rules will sample a date accordingly:
  - naive: samples are drawn between the initial threshold sample and the depositional date of that artifact.
  - earliest: samples are drawn within the initial threshold boundaries.

For example, to sample from the sequences, finds, and constraints given above, the following inputs are entered into the gibbs_ad() function:
result <- gibbs_ad(contexts, finds = artifacts, tpq = tpq_info, taq = taq_info)
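A more fully specified call might also set the optional arguments listed above explicitly; the values here are purely illustrative:

result2 <- gibbs_ad(contexts, finds = artifacts, tpq = tpq_info, taq = taq_info,
                    alpha_ = -500, omega_ = 200, trim = TRUE, rule = "naive",
                    max_samples = 10^5, size = 10^3, mcse_crit = 0.5)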
The output is a list object of class marginals containing the following objects:

- deposition: a list of the marginal densities of the date of the final deposition of contexts.
- externals: a list of the marginal densities of the date of any terminus post quem or terminus ante quem, as affected by depositional variates in the joint conditional distribution.
- production: a list of the marginal densities of the production date of a given type or class of artifact, given the rule stipulated in the input.
- mcse: a vector of the MCSE of all events.

Information on the marginals object can be accessed with print() and summary(). Density plots and density histograms of one or more events can be produced using plot() and hist() respectively (see documentation for details).
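For instance, with the result object computed above (the arguments of plot() and hist() for selecting events are not shown here; see the documentation):

print(result)
summary(result)
# plot(result)   # density plots of one or more events
# hist(result)   # density histograms of one or more events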
Determining the use date proceeds along the same method of Gibbs sampling discussed above, using consistent batch means to determine convergence. To compute a density of the use date of an artifact or artifact type, an object of class marginals is necessary (i.e., the estimation of the production and depositional dates must first be performed). Then, the gibbs_ad_use() function will return a density conditional on the production and depositional dates.
Only one type at a time can be estimated with gibbs_ad_use(). A type can be defined on the basis of:

- id in the finds list.
- type in the finds list.

That is, one can pool together multiple finds as a type on the basis of their id, even if they were not explicitly given a type in the finds object. Similarly, one can pool together more than one type of artifact, e.g., if one is dealing with multiple subtypes and wants to evaluate them as a single type (e.g., pooling the labels of “Late Greco-Italic amphora”, “MGS V amphora”, and “MGS VI amphora” into a single type).
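As a sketch, pooling two of the type labels from the artifacts object above into a single type could look as follows (the grouping here is purely illustrative):

# treat "type1" and "type2" as a single pooled type
gibbs_ad_use(result, artifacts, type = c("type1", "type2"))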
The gibbs_ad_use() function takes the following inputs, similar to gibbs_ad(), but with a field for either id or type (only one of the two may be used):

- gibbs: a list object of class marginals, as computed via gibbs_ad().
- finds: either the list object of finds originally used as input to produce gibbs, or a data.frame of two columns, the first column listing the context and the second the incidence of the id or type in that context.
- id: a vector of the id of one or more specific finds whose use date is to be estimated. The values of id must match those in the list of finds. If type is used, id is ignored.
- type: a vector of one or more types to estimate a use density for. Must contain a value if id is left as NULL.
- max_samples, size, mcse_crit: the same information used for determining the maximum length of the Gibbs sampler and when convergence has been achieved.

Using the result object above, the densities of the use dates of the following types are computed with the gibbs_ad_use() function as follows:
# use dates by specifying ids
gibbs_ad_use(result, artifacts, id = c("find04", "find05"))
# use dates by specifying types
gibbs_ad_use(result, artifacts, type = "type1")
Adjusting the values of max_samples and mcse_crit is recommended to reduce computational time.
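For example, a quicker but rougher estimate could lower max_samples and raise mcse_crit; the values below are illustrative:

gibbs_ad_use(result, artifacts, type = "type1", max_samples = 2 * 10^4, mcse_crit = 2)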
The result is an object of class use_marginals, which contains information on the density of the date of use as well as the MCSE of the type specified, in the same fashion as the result of the gibbs_ad() function.
Base R graphics are provided by eratosthenes to examine traceplots of the results of gibbs_ad(), as well as to produce density histograms of the results of gibbs_ad() and gibbs_ad_use(). For gibbs_ad(), histograms may contain up to 12 distinct events. For gibbs_ad_use(), the production, use, and deposition of the stipulated artifact type are shown.
Managing and evaluating the validity of relative sequences consists of checking multiple partial sequences against one another. Not all relative sequences are of the same informational validity, and not all sequences will contain the same elements. An investigator may therefore seek to constrain one sequence against another, i.e., keeping the elements of one sequence as close as possible to their given order while reordering only some of the elements.
Some functions related to relative sequences:

- seq_check() sees whether partial sequences agree in their relative ordering of elements.
- seq_adj() provides the means to coerce an “input” sequence to a discrepant “target” sequence which contains fewer elements. E.g., if one has obtained an optimal seriation of contexts (of both single, unrelated deposits and stratigraphic deposits) as determined by the presence/absence of find-types, which conflicts with a sequence obtained from a stratigraphic sequence whose physical relationships are certain, this function will reorder the optimal seriation, fitting any single deposits missing from the stratigraphic sequence accordingly. A brief sketch follows below.

The package eratosthenes does not have functionality to produce seriations or ordinations, as the packages seriation, vegan, and lakhesis can perform this task already.
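A minimal sketch using the sequences defined above follows; that seq_check() accepts a list of sequences and that seq_adj() takes the input and target orderings in that order are assumptions to be checked against the package documentation:

# check whether the partial sequences x, y, and z agree in their relative ordering
seq_check(contexts)
# coerce the input sequence x to a shorter, discrepant target sequence (hypothetical target)
seq_adj(x, c("D", "B", "H"))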
As real-world joint conditional densities will comprise hundreds of events or more, it is easy for an investigator to lose track of which relative or absolute events are determinative or influential upon others, in terms of the estimation of their date. eratosthenes assesses such influence within the conditional structure via the estimation of “displacement.” That is, given the omission of an event $j$ (either a depositional event or an absolute constraint) from the set of all events, how much does the estimation of the date of another event change?
The squared displacement $\delta^2(i,j)$ of a target event $i$ caused by the omission of $j$ is computed as follows. Let $x_i$ be the estimated marginalized Monte Carlo mean date using all events within the full joint conditional, and then let $\tilde{x}_i^{(-j)}$ be the “jackknife” estimated date, when event $j$ has been omitted from all sequences and absolute constraints. Squared displacement of $j$ upon $i$ is then:
$$ \delta^2(i,j) = (\tilde{x}_i^{(-j)} - x_i)^2 $$
If squared displacement is high, then the omission of $j$ has greatly shifted the date of $i$. If squared displacement is low, then the omission of $j$ has not altered the date of $i$ much. Squared displacement is measured in continuous time, whichever scale the investigator is using (typically years).
Conversely, one can estimate the effective influence of an event $j$ upon all others by taking the mean squared displacement (MSD). This involves taking the mean of the squared displacements of all other events when $j$ is omitted. Where $\Theta$ represents the set of all relative and absolute events, the MSD is defined as
$$ \text{MSD}(j) = \frac{1}{n-1} \sum_{i \in \Theta, i \neq j} \delta^2 (i,j) $$
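As a toy illustration of these two formulas in plain R (the mean dates below are invented purely for the sake of the arithmetic):

# Monte Carlo mean dates of events A, C, and D under the full joint conditional (invented values)
x_full <- c(A = -200, C = -150, D = -120)
# jackknife mean dates of the same events when event "B" is omitted (invented values)
x_omit_B <- c(A = -198, C = -139, D = -118)
# squared displacement delta^2(i, B) for each remaining event i
sq_disp_B <- (x_omit_B - x_full)^2
# mean squared displacement MSD(B)
msd_B <- mean(sq_disp_B)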
The squared displacement and MSD are computed in eratosthenes after running the gibbs_ad() function, as follows. Note that squared displacement may be computed for any event $i$ that represents a relative or absolute constraint, as well as an artifact production date, while $j$ can only be a relative event or absolute constraint (it would make no sense to omit an artifact production date, since these are conditional upon relative/absolute dates to begin with). Similarly, MSD can only be computed for relative/absolute events.

Objects in the example below are provided from the section Usage above. As these routines are fairly intensive, computational time can be reduced by lowering the value of max_samples and/or raising mcse_crit.
# run gibbs_ad() first
result <- gibbs_ad(contexts, finds = artifacts, tpq = tpq_info, taq = taq_info)
# squared displacement is estimated for a target event ("i" above), given the omission of each other event ("j") in turn
# squared displacement for depositional context "E"
sq_disp(result, target = "E", sequences = contexts,
max_samples = 20000, mcse_crit = 2, tpq = tpq_info, taq = taq_info)
# squared displacement for production of artifact type "type1"
sq_disp(result, target = "type1", sequences = contexts, finds = artifacts,
max_samples = 20000, mcse_crit = 2, tpq = tpq_info, taq = taq_info)
# mean squared displacement (MSD) is estimated for all relative and absolute dates
result_msd <- msd(result, contexts, finds = artifacts,
mcse_crit = 1, tpq = tpq_info, taq = taq_info)