knitr::opts_chunk$set(echo=TRUE, error=TRUE)
knitr::opts_chunk$set(comment = "")
library("sdam")
options(width = 96)
Install and load a version of the "sdam" package.
install.packages("sdam") # from CRAN
devtools::install_github("sdam-au/sdam") # development version
devtools::install_github("mplex/cedhar", subdir="pkg/sdam") # legacy version R 3.6.x
# load and check versions
library(sdam)
packageVersion("sdam")
EDH is a dataset in "sdam" that contains the texts of Latin and Latin-Greek inscriptions of the Roman Empire, retrieved from the Epigraphic Database Heidelberg API repository through the routines get.edh() and get.edhw(). Since 2022, and still today, the API repository has not supported people variables, so the EDH dataset serves as an alternative for the analysis of people-related inscriptions.
One challenge with people variables in EDH is that some records contain Greek and Latin Extended characters that need re-encoding to render and display properly.
people
in EDH
Ancient inscriptions in some Roman provinces are written with Greek characters. Because of the encoding and decoding steps in the extraction, loading, and transformation of the data (perhaps UTF-8 bytes being treated as Windows-1252), Greek and some Latin characters are not displayed properly in the current version of the EDH dataset. Most of the encoding issues are in variables related to people, and some examples with inscriptions from Roman provinces follow.
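The suspected failure mode can be reproduced in base R. The sketch below is an illustration only and not the procedure used by "sdam": it takes a short Greek word chosen for this example, misreads its UTF-8 bytes as a single-byte encoding, and then reverses the step. It uses "latin1" as a stand-in for Windows-1252, an assumption made to keep the example portable across platforms.

```r
# A Greek word written with Unicode escapes (hypothetical sample, not from EDH)
greek <- "\u0396\u03c9\u03ae"   # zeta, omega, eta with tonos

# Misread the UTF-8 bytes as latin1 and convert them "again" to UTF-8:
# each two-byte Greek letter turns into two unrelated characters
garbled <- iconv(greek, from = "latin1", to = "UTF-8")

# Reverse the step: recover the original bytes and declare them as UTF-8
repaired <- iconv(garbled, from = "UTF-8", to = "latin1")
Encoding(repaired) <- "UTF-8"

# Compare character counts before and after the misreading
print(c(original = nchar(greek), garbled = nchar(garbled)))

# Check that the round trip restores the original string
print(repaired == greek)
```

A round trip like this only works when no byte was lost along the way, which is one reason a dedicated cleaning routine is preferable for real records.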
The Roman province of Achaia in the EDH dataset has inscriptions related to people.
plot.map("Ach", cap=TRUE, name=FALSE)
Function edhw() obtains the available inscriptions per province in the EDH dataset as a list, and this list is the input for the same function to extract the people variables cognomen and nomen. In this case, the 'province' argument is Ach, which stands for Achaia.
# select two people variables from Achaia
Ach <- edhw(province="Ach") |>
  edhw(vars="people", select=c("cognomen","nomen"))
Ach <- edhw(province="Ach") |> suppressWarnings() |> edhw(vars="people", select=c("cognomen","nomen")) |> suppressWarnings()
There are 1539 records with people in Ach, which corresponds to the number of rows in this data frame.
# number of people entries in Achaia
nrow(Ach)
However, some records have either missing data or are inscriptions where cognomen and nomen are not available.
# also remove NAs
Ach <- edhw(province="Ach") |>
  edhw(vars="people", select=c("cognomen","nomen"), na.rm=TRUE)
nrow(Ach)
Ach <- edhw(province="Ach") |> suppressWarnings() |> edhw(vars="people", select=c("cognomen","nomen"), na.rm=TRUE) |> suppressWarnings()
nrow(Ach)
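The effect of dropping records with missing values can be sketched in base R on a toy table. The data and the two filtering rules below are assumptions for illustration; the text does not state which rule na.rm=TRUE applies in edhw(), so both common readings are shown.

```r
# Toy people table (hypothetical values, not taken from EDH)
people <- data.frame(
  id       = 1:4,
  cognomen = c("Rufus", NA, NA, "Felix"),
  nomen    = c("Valerius", "Sentius", NA, NA),
  stringsAsFactors = FALSE
)

vars <- c("cognomen", "nomen")

# Rows where every selected variable is missing
all_missing <- rowSums(is.na(people[, vars])) == length(vars)

# Reading A: drop only the rows that carry no information at all
some_data <- people[!all_missing, ]

# Reading B: keep only the rows where all selected variables are present
full_data <- people[complete.cases(people[, vars]), ]

print(nrow(some_data))
print(nrow(full_data))
```

Comparing nrow() before and after filtering, as the text does for Ach, is a quick way to see how many records a given rule discards.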
Working with people attribute variables often requires re-encoding, which is one of the options of function cln().
For instance, values in cognomen in the first entries of Ach
are likely in Greek.
# some people entries in Achaia
head(Ach)
Function cln()
serves to re-encode Greek and Latin characters to render Greek, Greek extended, and Latin extended glyphs.
# re-encode in Ach cognomen
Ach$cognomen |> head() |> cln()
cognomen
knitr::asis_output(cln(head(Ach$cognomen,6))[1])
knitr::asis_output(cln(head(Ach$cognomen,6))[2])
knitr::asis_output(cln(head(Ach$cognomen,6))[3])
knitr::asis_output(cln(head(Ach$cognomen,6))[4])
knitr::asis_output(cln(head(Ach$cognomen,6))[5])
knitr::asis_output(cln(head(Ach$cognomen,6))[6])
detach("package:sdam", unload=TRUE)
sdam::cln(tail(Ach))
The same applies to cognomen in the last people entries in Achaia.
# last entries
tail(Ach)
After re-encoding the last records in Ach with cln(), it is easier to see, for example, that some records have an identical cognomen, and that entries with <NA> in the input become NA.
In the case of the province of Aegyptus, three people variables mix Greek and Latin characters and need re-encoding as well.
plot.map("Aeg", cap=TRUE, name=FALSE)
# Aegyptus people
Aeg <- edhw(province="Aeg") |>
  edhw(vars="people")
Aeg <- edhw(province="Aeg") |> suppressWarnings() |> edhw(vars="people") |> suppressWarnings()
# three variables of the last eight records
Aeg[ , c(3,5:6)] |> tail(8)
For people in Aegyptus, columns three, five, and six correspond to cognomen, name, and nomen, and the output from cln() in the console is a data frame.
# re-encode three variables from last entries
Aeg[ ,c(3,5:6)] |> tail() |> cln()
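Selecting columns by position depends on the column order of the data frame. As a hedged alternative, the base R sketch below selects the same three variables by name on a toy table; the column layout and values are hypothetical and are not the actual layout of the EDH people data.

```r
# Toy table with a hypothetical column layout (not the real EDH layout)
toy <- data.frame(
  id       = 1:3,
  gender   = c("male", "female", "male"),
  cognomen = c("Rufus", "Prima", "Felix"),
  status   = c(NA, NA, NA),
  name     = c("Valerius Rufus", "Sentia Prima", "Felix"),
  nomen    = c("Valerius", "Sentia", NA),
  stringsAsFactors = FALSE
)

# Selection by position, as in the text above
by_position <- toy[, c(3, 5:6)]

# Selection by name, which still works if the column order changes
by_name <- toy[, c("cognomen", "name", "nomen")]

# Both selections give the same result for this layout
print(identical(by_position, by_name))
print(tail(by_name, 2))
```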
Some entries in Aeg have Greek extended characters, and one entry in Latin has a special character at the end (Sulpicius*), which can be omitted for further computations by raising the cleaning level to 2.
The benefits of re-encoding and cleaning text from the EDH dataset are evident, for example, when counting occurrences in the different attribute variables, as with nomen in Aeg.
# default cleaning level 1
Aeg$nomen |> cln() |> table() |> sort(decreasing=TRUE)
knitr::asis_output(cln(sort(unique(Aeg$nomen)),level=1)[32])
as.vector(sort(table(Aeg$nomen),decreasing=TRUE))[1]
knitr::asis_output(cln(sort(unique(Aeg$nomen)),level=1)[22])
as.vector(sort(table(Aeg$nomen),decreasing=TRUE))[2]
knitr::asis_output(cln(sort(unique(Aeg$nomen)),level=1)[23])
as.vector(sort(table(Aeg$nomen),decreasing=TRUE))[3]
knitr::asis_output(cln(sort(unique(Aeg$nomen)),level=1)[1])
as.vector(sort(table(Aeg$nomen),decreasing=TRUE))[4]
etc.
...
By raising the cleaning level to 2, all special characters are removed from the end of each entry, and it is possible to see that, in the Roman province of Aegyptus, Sempronius, Sentius, and Valerius are the three most common values of nomen in inscriptions, with four occurrences each.
# raise cleaning level and remove NAs
Aeg$nomen |> cln(level=2, na.rm=TRUE) |> table() |> sort(decreasing=TRUE)
knitr::asis_output(names(sort(table(cln(x=Aeg$nomen, level=2, na.rm=TRUE)), decreasing=TRUE))[1])
as.vector(sort(table(cln(x=Aeg$nomen, level=2, na.rm=TRUE)), decreasing=TRUE))[1]
knitr::asis_output(names(sort(table(cln(x=Aeg$nomen, level=2, na.rm=TRUE)), decreasing=TRUE))[2])
as.vector(sort(table(cln(x=Aeg$nomen, level=2, na.rm=TRUE)), decreasing=TRUE))[2]
knitr::asis_output(names(sort(table(cln(x=Aeg$nomen, level=2, na.rm=TRUE)), decreasing=TRUE))[3])
as.vector(sort(table(cln(x=Aeg$nomen, level=2, na.rm=TRUE)), decreasing=TRUE))[3]
knitr::asis_output(names(sort(table(cln(x=Aeg$nomen, level=2, na.rm=TRUE)), decreasing=TRUE))[4])
as.vector(sort(table(cln(x=Aeg$nomen, level=2, na.rm=TRUE)), decreasing=TRUE))[4]
etc.
...
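Why trailing special characters distort counts can be sketched in base R with a toy vector. The names, the set of characters to strip, and the helper strip_trailing() are hypothetical; this is a simple stand-in for the idea and not how cln() is implemented.

```r
# Toy nomen values with trailing marks (hypothetical, not from EDH)
nomen <- c("Valerius", "Sulpicius*", "Sulpicius", NA,
           "Valerius?", "Sentius", "Valerius")

# Counting the raw values treats "Valerius" and "Valerius?" as different names
raw_counts <- sort(table(nomen), decreasing = TRUE)

# Hypothetical helper: remove an assumed set of special characters at the end
strip_trailing <- function(x) sub("[*?+!]+$", "", x)

# Clean the values, drop the missing ones, and count again
clean <- strip_trailing(nomen)
clean <- clean[!is.na(clean)]
clean_counts <- sort(table(clean), decreasing = TRUE)

print(raw_counts)
print(clean_counts)
```

After cleaning, variants of the same name fall into one category, so the ranking reflects names rather than spelling marks.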
See the Warnings section in the manual.