Transkribus: Transkribus API
In DIGI-VUB/madoc.utils: Tools for interacting with Madoc

Transkribus

R Documentation

Transkribus API

Description

Connect to Transkribus, inspect collections, documents, pages, perform handwritten text recognition

Public fields

JSESSIONID: character string with the JSESSIONID to use once logged in

Methods

Method `new()`

Usage

Transkribus$new(
  url = "https://transkribus.eu/TrpServer/rest/auth/login",
  user,
  password
)

Arguments

url: character string with the url to use in the call to the Transkribus API
user: character string with your Transkribus user in order to connect
password: character string with your Transkribus password in order to connect

Method `map_name_to_id()`

Map labels to identifiers

Usage

Transkribus$map_name_to_id(
  name,
  type = c("collection", "document", "htr-model"),
  collection
)

Arguments

name: character string with the label to map to the identifier
type: type of mapping: either one of 'collection' of 'document'
collection: id of the collection

Method `list_collections()`

List all collections you have access to

Usage

Transkribus$list_collections(
  url = "https://transkribus.eu/TrpServer/rest/collections/list"
)

Arguments

url: character string with the url to use in the call to the Transkribus API

Method `list_collection()`

List the content (the documents) of a collection

Usage

Transkribus$list_collection(
  url = "https://transkribus.eu/TrpServer/rest/collections/%s/list",
  collection
)

Arguments

url: character string with the url to use in the call to the Transkribus API
collection: id of the collection

Method `create_collection()`

Create a collection

Usage

Transkribus$create_collection(
  url = "https://transkribus.eu/TrpServer/rest/collections/createCollection",
  label
)

Arguments

url: character string with the url to use in the call to the Transkribus API
label: character string with the name of the collection to create

Method `delete_collection()`

Delete a collection

Usage

Transkribus$delete_collection(
  url = "https://transkribus.eu/TrpServer/rest/collections/{collection}",
  collection
)

Arguments

url: character string with the url to use in the call to the Transkribus API
collection: id of the collection

Method `list_document()`

List the content (the pages) of a document

Usage

Transkribus$list_document(
  url = "https://transkribus.eu/TrpServer/rest/collections/%s/%s/fulldoc",
  collection,
  document,
  type = c("pages", "raw")
)

Arguments

url: character string with the url to use in the call to the Transkribus API
collection: id of the collection
document: id of the document
type: character string with the type of extraction, either 'pages' or 'raw'. Defaults to 'pages'

Method `list_dictionaries()`

Retrieve the set of dictionaries containing possible letters as output

Usage

Transkribus$list_dictionaries(
  url = "https://transkribus.eu/TrpServer/rest/recognition/dicts"
)

Arguments

url: character string with the url to use in the call to the Transkribus API

Method `list_models()`

Retrieve all HTR/OCR models you have access to within a collection

Usage

Transkribus$list_models(
  url = "https://transkribus.eu/TrpServer/rest/recognition/%s/list",
  collection
)

Arguments

url: character string with the url to use in the call to the Transkribus API
collection: id of the collection

Method `list_job()`

List all jobs or get the information of one specific job

Usage

Transkribus$list_job(
  url = "https://transkribus.eu/TrpServer/rest/jobs/list",
  job
)

Arguments

url: character string with the url to use in the call to the Transkribus API
job: id of the job

Method `transcribe()`

Transcribe a set of pages with a model

Usage

Transkribus$transcribe(
 
    url = "https://transkribus.eu/TrpServer/rest/recognition/{collection}/{model}/htrCITlab?id={document}&pages={page}&dict={dictionary}",
  collection,
  document,
  page,
  model,
  dictionary
)

Arguments

url: character string with the url to use in the call to the Transkribus API
collection: id of the collection
document: id of the document
page: id of the page to transcribe
model: id of the Transkribus model to use
dictionary: character string with the dictionary (set of letters) to use

Method `upload()`

Upload a set of images in a collection

Usage

Transkribus$upload(
  url = c("https://transkribus.eu/TrpServer/rest/uploads?collId={collection}",
    "https://transkribus.eu/TrpServer/rest/uploads/{uploadId}"),
  collection,
  data,
  document,
  author = "R-API",
  trace = TRUE
)

Arguments

url: character string with the url to use in the call to the Transkribus API
collection: id of the collection
data: a character vector with the full path(s) to the image files on disk
document: the title of the document
author: the author of the document
trace: logical indicating to show progress

Method `layout()`

Perform layout analysis on all pages of a document in a collection

Usage

Transkribus$layout(
  url = "https://transkribus.eu/TrpServer/rest/LA?collId={collection}",
  collection,
  document,
  doBlockSeg = FALSE,
  doLineSeg = TRUE,
  doPolygonToBaseline = FALSE,
  doBaselineToPolygon = FALSE
)

Arguments

url: character string with the url to use in the call to the Transkribus API
collection: id of the collection
document: id of the document
doBlockSeg: if TRUE, existing layout will be deleted, if FALSE keep existing text block regions
doLineSeg: if TRUE, detect lines in text blocks, if FALSE keep existing lines
doPolygonToBaseline: if TRUE, inspect line polygons and add baselines, if FALSE keep existing baselines
doBaselineToPolygon: if TRUE, extrapolate new line polygons from baselines, if FALSE do not extrapolate

Method `clone()`

The objects of this class are cloneable with this method.

Usage

Transkribus$clone(deep = FALSE)

Arguments

deep: Whether to make a deep clone.

Examples

library(madoc.utils)
api <- Transkribus$new(user     = "jan.wijffels@vub.ac.be", 
                       password = Sys.getenv("TRANSKRIBUS_PWD"))

## Get pages of a collection
collections   <- api$list_collections()
collections
id_collection <- sample(collections$colId, size = 1)
documents     <- api$list_collection(collection = id_collection)
documents
id_document   <- sample(documents$docId, size = 1)
pages         <- api$list_document(collection = id_collection, document = id_document)
pages

## Create a collection, upload some images to the collection, delete it again
id  <- api$create_collection(label = "example-collection")
img <- c(system.file(package = "madoc.utils", "extdata", "alto-example.jpg"),
         system.file(package = "madoc.utils", "extdata", "example.png"))
api$upload(data = img, collection = id, document = paste("Upload", Sys.time()), author = "R-API")
api$list_collection(collection = id)           
api$delete_collection(collection = id)

## Look at relevant models and dictionaries
dicts    <- api$list_dictionaries()
grep(dicts, pattern = "Dutch", ignore.case = TRUE, value = TRUE)
models   <- api$list_models(collection = id_collection)
str(models)
dutch    <- grep(models$language, pattern = "Dutch", ignore.case = TRUE, value = TRUE)
dutch    <- subset(models, language %in% dutch)
dutch    <- c("Dutch Mountains (18th Century)", "IJsberg", "Dutch Notarial Model 18th Century")
dutch    <- subset(models, name %in% dutch)
dutch    <- subset(models, name %in% "Dutch Mountains (18th Century)" & provider == "CITlabPlus")
str(dutch)
id_model <- dutch$htrId

## Inspect jobs
jobs   <- api$list_job()
jobs


## Not run: 
id  <- api$create_collection(label = "test-collection")
img <- c(system.file(package = "madoc.utils", "extdata", "example.png"),
         system.file(package = "madoc.utils", "extdata", "alto-example.jpg"))
api$upload(data = img, collection = id, document = "Doc with 2 images", author = "R-API")

##
## This section shows how to transcribe using the API
##    >> note that this consumes Transkribus credits

##
## Inspect one image and transcribe it
##
##  - id_model      <- 21683  ## Dutch Mountains HTR+
id_collection <- id
id_document   <- docs$docId
docs          <- api$list_collection(collection = id_collection)
pages         <- api$list_document(collection = id_collection, document = id_document)
page          <- head(pages, n = 1)
id_job        <- api$transcribe(collection = id_collection, document = id_document, 
                                page = 1,
                                model = 21683, 
                                dictionary = "Combined_Dutch_Model_M1.dict")
x             <- read_pagexml(page$page_xml) 

##
## A random document from a collection
##
library(magick)
pages  <- api$list_document(collection = id_collection, document = id_document)
page   <- tail(pages, n = 1)
page
img    <- image_read(page$thumbUrl)
img    <- image_read(page$url)
image_resize(img, "x600")
id_job <- api$transcribe(collection = id_collection, document = id_document, page = page$pageNr, 
                         model = id_model, dictionary = "Combined_Dutch_Model_M1.dict")
api$list_job(job = id_job)   
 
## After the job has finished, we have a Page-XML file which we can read in
pages  <- api$list_document(collection = id_collection, document = id_document)  
page   <- tail(pages, n = 1)
img    <- image_read(page$url)
x      <- read_pagexml(page$page_xml)  
bl     <- image_draw_baselines(img, x = x$baseline, col = "darkgreen", lwd = 4)
image_resize(bl, "x900")
bl    <- image_crop_baselineareas(img, 
                                  x = setNames(x$baseline, x$id), 
                                  textregion = x$points, 
                                  extend = FALSE, overview = FALSE)
bl    <- image_rbind(bl, color = "red", geometry = "2x2")   
image_resize(bl, "x900")

api$delete_collection(collection = id)

## End(Not run)

DIGI-VUB/madoc.utils documentation built on Sept. 14, 2022, 3:03 p.m.

DIGI-VUB/madoc.utils index

README.md

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

DIGI-VUB/madoc.utils Tools for interacting with Madoc

Transkribus: Transkribus API In DIGI-VUB/madoc.utils: Tools for interacting with Madoc

Transkribus API

Description

Public fields

Methods

Public methods

Method new()

Usage

Arguments

Method map_name_to_id()

Usage

Arguments

Method list_collections()

Usage

Arguments

Method list_collection()

Usage

Arguments

Method create_collection()

Usage

Arguments

Method delete_collection()

Usage

Arguments

Method list_document()

Usage

Arguments

Method list_dictionaries()

Usage

Arguments

Method list_models()

Usage

Arguments

Method list_job()

Usage

Arguments

Method transcribe()

Usage

Arguments

Method upload()

Usage

Arguments

Method layout()

Usage

Arguments

Method clone()

Usage

Arguments

Examples

Related to Transkribus in DIGI-VUB/madoc.utils...

R Package Documentation

Browse R Packages

We want your feedback!

DIGI-VUB/madoc.utils
Tools for interacting with Madoc

Transkribus: Transkribus API
In DIGI-VUB/madoc.utils: Tools for interacting with Madoc

Method `new()`

Method `map_name_to_id()`

Method `list_collections()`

Method `list_collection()`

Method `create_collection()`

Method `delete_collection()`

Method `list_document()`

Method `list_dictionaries()`

Method `list_models()`

Method `list_job()`

Method `transcribe()`

Method `upload()`

Method `layout()`

Method `clone()`