knitr::opts_chunk$set(echo=TRUE, error=TRUE)
knitr::opts_chunk$set(comment = "")
library("sdam")
options(width = 96)
Install and load a version of the "sdam" package.
install.packages("sdam") # from CRAN
devtools::install_github("sdam-au/sdam") # development version
devtools::install_github("mplex/cedhar", subdir="pkg/sdam") # legacy version R 3.6.x
# load and check versions
library(sdam)
packageVersion("sdam")
Temporal data is significant when it comes to analysing the history of archaeological
artefacts like written markers from the Ancient Mediterranean.
In the EDH dataset, for example, dates for inscriptions are plausible timespans of existence with the endpoints in variables not_before and not_after that, from the perspective of the timespan, are the terminus ante quem (TAQ) and terminus post quem (TPQ) of the time segment.
However, not all inscriptions have these two variables filled by domain experts and replacing missing dating data constitutes a challenge.
Besides EDH, other datasets and related functions in the "sdam" package involve dating data from the ancient Mediterranean, for example by displaying dates and time segments in a plot, by organising dates within Roman provinces, and by performing imputation techniques for missing dating data.
An example of plotting dates is with the Shipwrecks external dataset, which is a semicolon-separated file of different variables.
References for the shipwrecks data are in the "sdam" package. When reading the shipwrecks external dataset with read.csv, make sure to use the right separator in sep and leave the names of the variables untouched.
# load shipwrecks external dataset
sw <- system.file("extdata","StraussShipwrecks.csv",package="sdam") |>
  read.csv(sep=";", check.names=FALSE)
# variables in shipwrecks dataset
colnames(sw)
Plot the time segments with function plot.dates() and a customized 'id', where variables 15 to 16 in sw have timespans of existence as 'taq' and 'tpq'.
# shipwrecks dates with Wreck ID
plot.dates(sw, id="Wreck ID", type="rg", taq="Earliest date", tpq="Latest date", col=4)
The mid points and range of the shipwrecks data are computed explicitly by function prex() with the mp option in the 'type' argument.
'vars' stands for the variables, which in this case are TAQ and TPQ, and the 'keep' option retains the rest of the variables in the output, which for prex() with mid points is a data frame.
# add mid points and range to shipwrecks data
prex(sw[c(1,7,15:16)], type="mp", vars=c("Earliest date", "Latest date"), keep=TRUE) |> tail()
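As a rough illustration of what a mid point and a range mean for a timespan, the sketch below computes both in base R on a toy data frame. The rows and the column names Mid and Range are hypothetical, and the formulas (mean of the endpoints, difference of the endpoints) are an assumption about the usual definitions, not a description of how prex() is implemented.

```r
# Toy timespans (hypothetical values, not taken from the shipwrecks dataset)
toy <- data.frame(
  "Earliest date" = c(-100, 0, 150),
  "Latest date"   = c(100, 50, 150),
  check.names = FALSE
)

# Assumed definitions: mid point = mean of the endpoints,
# range = distance between the endpoints
toy$Mid   <- (toy[["Earliest date"]] + toy[["Latest date"]]) / 2
toy$Range <- toy[["Latest date"]] - toy[["Earliest date"]]

toy
```

A record whose two endpoints coincide, like the third row, has a range of zero and its mid point is the date itself.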
The default 'type' option and chronological phase in prex() are the aoristic sum with a five-period bin, or bin5.
# aoristic sum shipwrecks
prex(sw[c(1,7,15:16)], vars=c("Earliest date", "Latest date"))
For an eight-period chronological bin in the shipwrecks dataset:
# aoristic sum shipwrecks 8 bin
prex(sw[c(1,7,15:16)], vars=c("Earliest date", "Latest date"), cp="bin8")
For the aoristic sum algorithm, cf. Temporal Uncertainty.
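To make the idea behind an aoristic sum more concrete, here is a minimal base R sketch under stated assumptions: each record spreads a total weight of 1 uniformly over its timespan, and each bin collects the share of every record that overlaps it. The function aoristic_sum(), the toy dates, and the equal-width breaks are all hypothetical; the chronological periods and internals of prex() may differ.

```r
# Hypothetical helper: aoristic sum over bins given by 'breaks'
# (a generic sketch, not the prex() implementation; assumes no missing dates)
aoristic_sum <- function(taq, tpq, breaks) {
  stopifnot(length(taq) == length(tpq), all(tpq >= taq))
  lo <- head(breaks, -1)   # lower bound of each bin
  hi <- tail(breaks, -1)   # upper bound of each bin
  out <- numeric(length(lo))
  for (i in seq_along(taq)) {
    span <- tpq[i] - taq[i]
    if (span == 0) {
      # point date: the whole weight goes to the bin that contains it
      k <- findInterval(taq[i], breaks, rightmost.closed = TRUE)
      if (k >= 1 && k <= length(out)) out[k] <- out[k] + 1
    } else {
      # share of the timespan that falls inside each bin
      overlap <- pmax(0, pmin(hi, tpq[i]) - pmax(lo, taq[i]))
      out <- out + overlap / span
    }
  }
  names(out) <- paste(lo, hi, sep = " to ")
  out
}

# Toy timespans (hypothetical values)
taq <- c(-100, 0, 150)
tpq <- c(100, 50, 150)
res <- aoristic_sum(taq, tpq, breaks = seq(-200, 200, by = 100))
res
```

When every timespan lies inside the outer breaks, the bin values add up to the number of records, which is a quick sanity check on the result.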
Many functions and datasets in "sdam" are related to temporal information of the Roman world, particularly from the Roman Empire during the classical ancient period.
Function plot.map() depicts cartographical maps per Roman province or region, and it has a 'date' argument to display dates within the caption. Dates in this case are one or two years, either for the consolidation of the Italian peninsula or for the affiliation of the region to the Roman Empire.
# silhouette of Italian peninsula
plot.map(x="Ita", date=TRUE) ## not run
rpmcd has the shapes and colours used in the cartographical maps with plot.map(), and some dates related to provinces as well.
# 59 provinces dates, colors, and shapes
data("rpmcd")
# province acronyms as in EDH
names(rpmcd)
The establishment dates of Roman provinces used in the cartographical map captions are in the second component of rpmcd.
# pipe dataset for dates in second component
rpmcd |> lapply(function (x) x[[2]]) |> head()
A vector of establishment dates in years from the "rpmcd" dataset is recorded in object est, which allows making a chronology of the Roman provinces.
# second component in dataset
est <- rpmcd |> lapply(function (x) x[[2]]) |> unlist(use.names=FALSE)
est
The establishment dates of Roman provinces and regions are in vector est, and these dates can be made more standard with the function cln() for further processing.
This is a cleaning function where, for instance, level 9 removes all content after the first parenthesis in the input, while the other levels are for specific needs.
# clean levels are 0-9
cln(est, level=9)
After this transformation of the data in est, it is possible to format dates as numerical data with function dts(), which takes the first value when there are two competing dates in the input, unless the opposite is specified in the 'last' argument.
# update object with establishment dates
est <- est |> cln(level=9) |> dts()
est
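The two cleaning steps described above (dropping what follows the first parenthesis, then keeping the first of two competing dates) can be imitated in base R. This is only a sketch: the input strings and the helper names strip_paren() and pick_year() are hypothetical, the real entries in est may be formatted differently, and cln() and dts() may handle more cases than this.

```r
# Hypothetical date strings; the real entries in 'est' may look different
raw <- c("-27 (note)", "43", "106/107 (two candidate years)")

# Drop everything from the first opening parenthesis onwards
strip_paren <- function(x) trimws(sub("\\(.*$", "", x))

# Keep the first (or, with last = TRUE, the last) integer in each string.
# Note: a hyphen used as a separator would be read as a minus sign here.
pick_year <- function(x, last = FALSE) {
  hits <- regmatches(x, gregexpr("-?[0-9]+", x))
  vapply(hits, function(h) {
    if (length(h) == 0) return(NA_real_)
    as.numeric(if (last) h[length(h)] else h[1])
  }, numeric(1))
}

clean <- strip_paren(raw)
years_first <- pick_year(clean)
years_last  <- pick_year(clean, last = TRUE)
```

In this toy input only the third entry has two competing years, so years_first and years_last differ in that position alone.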
Object est has a chronology for the establishment dates of Mediterranean regions and territories as Roman provinces, which corresponds to the provinces in the "rpmcd" dataset.
The union of the names of provinces and dates of establishment as a Roman province is a data frame object rpde that displays better without the row names.
# Roman province dates of establishment (strings still strings)
rpde <- cbind(names(rpmcd), dts(est)) |> as.data.frame(stringsAsFactors=FALSE)
rownames(rpde) <- NULL
head(rpde)
Because the dates have a numerical format from function dts(), the data frame allows producing a chronology of affiliation dates for the provinces and regions to the Roman Empire by ordering the second variable in rpde.
# order of affiliation of provinces
rpde[order(as.numeric(rpde$V2)),1]
The regions in the Italian peninsula have the earliest affiliation dates, and Mesopotamia has the latest affiliation date to the Roman Empire.
"rpcp" has influence periods of the Roman Empire.
# list with 45 early and late influence dates provinces
data("rpcp")
# look at data internal structure
str(rpcp)
Visualize time intervals of early Roman influence in provinces and regions.
# early influence dates are in first list of 'rpcp'
plot.dates(x=rpcp[[1]], taq="EarInf", tpq="OffPrv", main="Early period", ylab="province")
plot.dates(x=rpcp[[1]], taq="EarInf", tpq="OffPrv", main="Early period", ylab="province", yaxt="n")
Time intervals of late Roman influence in provinces and regions are depicted with mid points, and with the range of the interval if it is longer than one.
# late influence dates are in second list of 'rpcp'
plot.dates(x=rpcp[[2]], type="mp", taq="LateInf", tpq="Fall", lwd=5, col="red", main="Late period", ylab="province")
plot.dates(x=rpcp[[2]], type="mp", taq="LateInf", tpq="Fall", lwd=5, col="red", main="Late period", ylab="province", yaxt="n")
rpd has time intervals for "not_before" and "not_after" that correspond to the dating data in the EDH dataset.
# Roman provinces dates from EDH
data("rpd")
# Rome
summary(rpd$Rom)
# Aegyptus
summary(rpd$Aeg)
These intervals are the basis for a restricted imputation of missing dating data in EDH.
Function edhwpd() constructs, for a chosen province, a list of data frames with the components made of its inscriptions related by attribute co-occurrences.
The replacement of missing dates occurs in this setting with function rmids(), which stands for restricted multiple imputation on data subsets.
An example of restricted multiple imputation is the province of Armenia, which has the fewest inscriptions in the EDH dataset. Dataset rpd is a list where each component corresponds to a province and where the component class provides the HD ids of inscriptions.
# Armenia
rpd$Arm
Imputation from similarities of attribute variables per province and dates is organised with the wrapper function edhwpd(), which has different argument options.
# list with arguments
formals(edhwpd)
By default, the input data for this function is the EDH dataset and the organisation is based on characteristics of the artefacts in vars.
# characteristics of inscriptions
vars = c("findspot_ancient", "type_of_inscription", "type_of_monument", "language")
Function rmids() performs the multiple imputation of missing dating data in EDH by default, or in another dataset given as input.
In the case of Arm, record HD015521 has censored data in dates while the other two records have completely missing dating data.
# Armenia: restricted imputation of dates
edhwpd(vars=vars, province="Arm") |> rmids()
The warnings tell us that the imputation values are taken from the respective province in the rpd dataset, where avg len TS stands for average length of timespan, min TAQ is the minimum value of not_before, and max TPQ is the maximum value of not_after.
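One way to picture an imputation restricted by these three province values is to draw candidate intervals of average length that stay between min TAQ and max TPQ. The sketch below does this in base R for a record with both dates missing. It is an assumed scheme for illustration only: draw_intervals() and the numbers in prov are hypothetical, and rmids() may proceed quite differently.

```r
# Hypothetical province summary, named after the labels in the warnings
prov <- list(avg_len_ts = 100, min_taq = -50, max_tpq = 400)

# Draw m candidate intervals for one record with both dates missing.
# Each interval starts at a random year, has the average timespan length
# (capped at max_tpq), and lies inside [min_taq, max_tpq].
# This is an assumed scheme, not the rmids() algorithm.
draw_intervals <- function(prov, m = 5) {
  upper <- max(prov$min_taq, prov$max_tpq - prov$avg_len_ts)
  nb <- round(runif(m, min = prov$min_taq, max = upper))
  data.frame(not_before = nb,
             not_after  = pmin(nb + prov$avg_len_ts, prov$max_tpq))
}

set.seed(1)
imp <- draw_intervals(prov, m = 5)
imp
```

Each row is one of the m imputations, so the output has the same shape as a set of results that would later need pooling.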
Since there are multiple imputations of missing dating data, one next step is to combine the m results from function rmids() by pooling rules into final point estimates plus standard error.
Pooling options for time intervals are to take:
avg len TS
min TAQ and max TPQ
max TAQ and min TPQ
With these options, there is a single imputed value per variable with implied consequences.
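The three pooling options listed above can be sketched in base R on a toy set of m = 3 imputed intervals for one record. The data and the helper pool_intervals() are hypothetical, and reading the options as "mean interval length", "widest envelope", and "narrowest common interval" is an assumption about their meaning rather than the package's definition.

```r
# Hypothetical result of m = 3 imputations for a single record
imp <- data.frame(not_before = c(100, 120, 90),
                  not_after  = c(200, 180, 210))

# Pool the m intervals under the three options (assumed interpretation)
pool_intervals <- function(imp) {
  list(
    # avg len TS: mean length of the imputed timespans
    avg_len_ts = mean(imp$not_after - imp$not_before),
    # min TAQ and max TPQ: widest envelope covering all imputations
    widest     = c(taq = min(imp$not_before), tpq = max(imp$not_after)),
    # max TAQ and min TPQ: narrowest interval shared by all imputations
    narrowest  = c(taq = max(imp$not_before), tpq = min(imp$not_after))
  )
}

pooled <- pool_intervals(imp)
pooled
```

The widest envelope is the most cautious choice, while the narrowest interval can become empty (TAQ later than TPQ) when the imputed intervals do not all overlap, which is one of the implied consequences to keep in mind.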