make_hocr: Make hOCR file

View source: R/xml.R

make_hocr    R Documentation

Make hOCR file

Description

Creates an hOCR file from Document AI output.

Usage

make_hocr(type, output, outfile_name = "out.hocr", dir = getwd())

Arguments

type

one of "sync" or "async" depending on the function used to process the original document.

output

either an HTTP response object (from dai_sync()) or the path to a JSON file (from dai_async()).

outfile_name

a string with the desired filename. Must end with either .hocr, .html, or .xml.

dir

a string with the path to the desired output directory.

Details

hOCR is an open standard for representing formatted text obtained from optical character recognition. It can be used, for example, to generate searchable PDFs. This function generates a file compliant with the official hOCR specification (https://github.com/kba/hocr-spec), complete with token-level confidence scores. It also works with non-Latin scripts and right-to-left languages.
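To illustrate what "token-level confidence scores" look like in hOCR, the sketch below renders a single word as an hOCR `ocrx_word` element, whose `title` attribute carries a `bbox` (x0 y0 x1 y1) and an `x_wconf` confidence on a 0-100 scale, as described in the hOCR specification. The helper `hocr_word()` is hypothetical and is not part of daiR; it assumes the bounding box is already in pixel coordinates and that the confidence arrives as a proportion between 0 and 1. It is not how make_hocr() is implemented internally.

```r
# Hypothetical helper, NOT part of daiR: renders one recognized word as an
# hOCR "ocrx_word" element. Assumes bbox is c(x0, y0, x1, y1) in pixels and
# conf is a proportion in [0, 1], rescaled to the 0-100 x_wconf range.
hocr_word <- function(id, text, bbox, conf) {
  # Escape characters that would otherwise break the XML/HTML markup
  text <- gsub("&", "&amp;", text, fixed = TRUE)
  text <- gsub("<", "&lt;", text, fixed = TRUE)
  text <- gsub(">", "&gt;", text, fixed = TRUE)
  sprintf(
    '<span class="ocrx_word" id="%s" title="bbox %s; x_wconf %d">%s</span>',
    id, paste(bbox, collapse = " "), as.integer(round(conf * 100)), text
  )
}

word <- hocr_word("word_1_1", "Hello", c(10, 20, 60, 40), 0.97)
cat(word, "\n")
```

A full hOCR file nests such word elements inside line, paragraph, and page elements (for example `ocr_line` and `ocr_page`).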

Value

No return value; called for its side effect of writing the hOCR file to dir.

Examples

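## Not run: 
# Output from dai_async(): pass the path to the JSON file
make_hocr(type = "async", output = "output.json")

# Output from dai_sync(): pass the HTTP response object
resp <- dai_sync("file.pdf")
make_hocr(type = "sync", output = resp)
# Custom filename (must end in .hocr, .html, or .xml)
make_hocr(type = "sync", output = resp, outfile_name = "myfile.xml")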

## End(Not run)

daiR documentation built on Sept. 8, 2023, 5:43 p.m.