getAvailableLanguages: Obtain a List of Languages Supported by Tesseract...

View source: R/ext.R

getAvailableLanguages    R Documentation

Obtain a List of Languages Supported by Tesseract Installation

Description

This function returns the languages that the local Tesseract installation supports. Any of these values can be passed via the lang parameter in the call to tesseract to specify the language of a document's content, allowing Tesseract to use the appropriate alphabet and trained model for that language. Because any subset of the available trained languages may be installed, this function lets us query programmatically whether a given language is supported in this installation of Tesseract.

Usage

getAvailableLanguages(api = tesseract())

Arguments

api

A Tesseract API object. This can be omitted, in which case a default instance is created.

Details

The supported languages are computed via the Tesseract API. The API looks in the directory identified by the TESSDATA_PREFIX environment variable and reports the files named <lan>.traineddata.
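The lookup described above can be approximated in base R. This is only a sketch of the idea, not how Rtesseract or the Tesseract API implements it: listTrainedDataLanguages is a hypothetical helper, and it assumes the trained-data files sit directly in the directory passed to it.

```r
# Hypothetical helper (not part of Rtesseract): derive language codes by
# scanning a directory for files named <lan>.traineddata.
listTrainedDataLanguages <- function(dir = Sys.getenv("TESSDATA_PREFIX")) {
    files <- list.files(dir, pattern = "\\.traineddata$")
    # Strip the extension so only the language code remains.
    sub("\\.traineddata$", "", files)
}

# Demonstrate on a temporary directory holding two placeholder files
# plus one unrelated file that should be ignored.
demoDir <- tempfile("tessdata")
dir.create(demoDir)
invisible(file.create(file.path(demoDir,
                                c("eng.traineddata", "fra.traineddata", "README"))))
demoLangs <- listTrainedDataLanguages(demoDir)
print(demoLangs)
```

If TESSDATA_PREFIX is unset, the default argument is an empty string and the helper returns an empty character vector.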

Value

A character vector giving the supported languages.

Author(s)

Duncan Temple Lang

References

Tesseract https://code.google.com/p/tesseract-ocr/, specifically http://zdenop.github.io/tesseract-doc/classtesseract_1_1_tess_base_a_p_i.html

See Also

tesseract

Examples

getAvailableLanguages()
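A possible way to use the result is to check for a language before requesting it. The sketch below assumes a hypothetical helper, isLanguageSupported, which is not part of Rtesseract; its available argument defaults to the value of getAvailableLanguages() but can be supplied directly, so the example runs without a Tesseract installation.

```r
# Hypothetical helper (not part of Rtesseract): test whether each requested
# language appears in the vector of available languages.
isLanguageSupported <- function(lang, available = getAvailableLanguages()) {
    lang %in% available
}

# Supplying 'available' explicitly means getAvailableLanguages() is never
# called here; the values below are made up for illustration.
supported <- isLanguageSupported(c("eng", "deu"), available = c("eng", "fra"))
print(supported)
```

With Rtesseract loaded, isLanguageSupported("eng") would query the local installation instead.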

duncantl/Rtesseract documentation built on March 25, 2022, 5:50 a.m.