```r
# knitr setup used to render this README
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)
options(width = 100)  # console width must be numeric, not a string
library(unstruwwel)
library(magrittr)
```
This R package provides means to detect and parse historic dates and to convert them, e.g., to ISO 8601-2:2019. It automatically converts language-specific verbal date descriptions, e.g., “circa 1st half of the 19th century”, into their standardized numerical counterparts, e.g., “1801-01-01~/1850-12-31~”. The package follows the recommendations of MIDAS (Marburger Informations-, Dokumentations- und Administrations-System), see, e.g., https://doi.org/10.11588/artdok.00003770. Internally, it relies on the lubridate package. The name of the package is inspired by Heinrich Hoffmann’s rhymed story “Struwwelpeter”, which goes as follows:
Just look at him! there he stands, with his nasty hair and hands. See! his nails are never cut; they are grimed as black as soot; and the sloven, I declare, never once has combed his hair; anything to me is sweeter than to see Shock-headed Peter.
For the German-language original text, see the online digital library Wikisource.
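The conversion of “circa 1st half of the 19th century” into “1801-01-01~/1850-12-31~” mentioned above rests on simple century arithmetic: the n-th century runs from year (n - 1) * 100 + 1 to year n * 100, and each half covers 50 years. The following base-R snippet is only a minimal sketch of that arithmetic, not unstruwwel's implementation. The helper `century_half_to_iso()` is hypothetical, handles only AD centuries split into halves, and marks approximation with the `~` qualifier as in the example.

```r
# Hypothetical helper (NOT part of unstruwwel): ISO 8601-2-style interval
# covered by one half of an AD century.
century_half_to_iso <- function(century, half, approximate = FALSE) {
  stopifnot(century >= 1, half %in% c(1, 2))
  # the n-th century starts in year (n - 1) * 100 + 1
  first_year <- (century - 1) * 100 + 1
  start <- first_year + (half - 1) * 50
  end <- start + 49
  # "~" flags each end of the interval as approximate ("circa")
  q <- if (approximate) "~" else ""
  sprintf("%04d-01-01%s/%04d-12-31%s", start, q, end, q)
}

century_half_to_iso(19, 1, approximate = TRUE)
#> [1] "1801-01-01~/1850-12-31~"
```

The same call with `half = 2` yields the second half of the century, i.e., 1851 to 1900.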
You can install the released version of unstruwwel from CRAN with:
```r
install.packages("unstruwwel")
```
To install the development version from GitHub, use:

```r
# install.packages("devtools")
devtools::install_github("stefanieschneider/unstruwwel")
```
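The unstruwwel package contains only one function, `unstruwwel()`, which performs all of the language-specific standardization. The `language` argument selects the language of the input (e.g., `"en"` or `"de"`), and the `scheme` argument selects the output: `"iso-format"` returns ISO 8601-2:2019 strings, whereas `"time-span"` returns a numerical interval of length 2. `unstruwwel()` returns a named list in which each element is the result of applying the function to the corresponding element of the input vector.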
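To make the shape of such a return value concrete, here is a minimal base-R sketch of the same structure: one list element per input, named after the input string. `parse_year()` is a hypothetical toy parser that only recognizes bare four-digit years; it is not how unstruwwel parses dates. The last step flattens the named list into one row per input, much like the examples below do with the real output.

```r
# Hypothetical toy parser (NOT unstruwwel's logic): recognizes only bare
# four-digit years and returns c(start, end).
parse_year <- function(s) {
  if (grepl("^[0-9]{4}$", s)) {
    rep(as.numeric(s), 2)
  } else {
    c(NA_real_, NA_real_)  # unparseable input
  }
}

x <- c("1963", "unknown", "1856")

# one result per input element, named after the input string
res <- setNames(lapply(x, parse_year), x)

# flatten the named list into a data frame with one row per input
spans <- data.frame(
  text = names(res),
  start = vapply(res, `[`, numeric(1), 1),
  end = vapply(res, `[`, numeric(1), 2),
  row.names = NULL
)
print(spans)
```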
```r
# English-language dates
dates <- c(
  "5th century b.c.", "unknown", "late 16th century",
  "mid-12th century", "mid-1880s", "June 1963",
  "August 11, 1958", "ca. 1920", "before 1856"
)

# returns valid ISO 8601-2:2019 dates
unlist(unstruwwel(dates, language = "en", scheme = "iso-format"), use.names = FALSE)

# returns a numerical interval of length 2 per date; the list names hold
# the input text, elements that cannot be extracted become NA
spans <- unstruwwel(dates, language = "en", scheme = "time-span")
tibble::tibble(
  text = names(spans),
  start = purrr::map_dbl(spans, 1, .default = NA_real_),
  end = purrr::map_dbl(spans, 2, .default = NA_real_)
)
```
```r
# German-language dates (English glosses in comments)
dates <- c(
  "letztes Drittel 15. und 1. Hälfte 16. Jahrhundert", # last third of the 15th and first half of the 16th century
  "undatiert",                  # undated
  "1460?",
  "wohl nach 1923",             # probably after 1923
  "spätestens 1750er Jahre",    # 1750s at the latest
  "1897 (Guss vmtl. vor 1906)"  # 1897 (cast presumably before 1906)
)

# returns valid ISO 8601-2:2019 dates
unlist(unstruwwel(dates, language = "de", scheme = "iso-format"), use.names = FALSE)

# returns a numerical interval of length 2 per date
spans <- unstruwwel(dates, language = "de", scheme = "time-span")
tibble::tibble(
  text = names(spans),
  start = purrr::map_dbl(spans, 1, .default = NA_real_),
  end = purrr::map_dbl(spans, 2, .default = NA_real_)
)
```
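When working with the `iso-format` strings directly, e.g., to sort or filter records by year, a closed interval can be split at its `/` separator, and the ISO 8601-2:2019 qualifiers can be detected: `?` marks a date as uncertain, `~` as approximate, and `%` as both. The snippet below is a minimal base-R sketch under stated assumptions: `iso_interval_years()` and `iso_qualifiers()` are hypothetical helpers, not part of unstruwwel; the former only accepts closed intervals whose ends begin with a non-negative four-digit year (so BC dates and open-ended ranges are out of scope), and the latter merely flags whether a qualifier appears anywhere in the string.

```r
# Hypothetical helper (NOT part of unstruwwel). Assumes a closed interval
# "start/end" whose ends both begin with a non-negative four-digit year.
iso_interval_years <- function(x) {
  ends <- strsplit(x, "/", fixed = TRUE)[[1]]
  stopifnot(length(ends) == 2, all(grepl("^[0-9]{4}", ends)))
  c(start = as.numeric(substr(ends[1], 1, 4)),
    end = as.numeric(substr(ends[2], 1, 4)))
}

# Hypothetical helper: ISO 8601-2 qualifiers "?" (uncertain),
# "~" (approximate) and "%" (uncertain and approximate)
iso_qualifiers <- function(x) {
  c(approximate = grepl("[~%]", x), uncertain = grepl("[?%]", x))
}

interval <- "1801-01-01~/1850-12-31~"
iso_interval_years(interval)
iso_qualifiers(interval)
```

For numeric years alone, the `time-span` scheme already returns numbers, so this kind of parsing is only needed when the ISO strings are the stored representation.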
Please report issues, feature requests, and questions to the GitHub issue tracker. We have a Contributor Code of Conduct. By participating in unstruwwel you agree to abide by its terms.