paragraphs: Extract text paragraphs and bounding boxes from hocr file.

View source: R/ocr-utils.R

paragraphs    R Documentation

Description

Constructs a data frame containing paragraphs and their bounding boxes from hOCR files created by Tesseract OCR.

Usage

paragraphs(hocr_files)

Arguments

hocr_files

List of paths to hOCR files.
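
A minimal calling sketch, assuming the TessTools package is installed and that hOCR files exist at the paths shown. The file names are hypothetical placeholders; the guard makes the snippet a no-op when either assumption fails.

```r
# Hypothetical hOCR files, e.g. produced by running Tesseract with hOCR output.
hocr_files <- list("page-001.hocr", "page-002.hocr")

# Only call paragraphs() if the package and the files are actually available.
have_pkg <- requireNamespace("TessTools", quietly = TRUE)
have_files <- all(file.exists(unlist(hocr_files)))

if (have_pkg && have_files) {
  pars <- TessTools::paragraphs(hocr_files)
  str(pars)  # inspect the returned columns
}
```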

Value

Data frame with columns "bbox1", "bbox2", "bbox3", "bbox4" and "text": the four bounding-box coordinates of each paragraph and its text content.
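
To illustrate the shape of the return value, here is a base-R sketch that builds a data frame with the same column names from a small, made-up hOCR fragment. It is not the package's implementation: it uses regular expressions instead of an XML parser, and it assumes that "bbox1" to "bbox4" follow the order in which hOCR writes a bounding box (`bbox x0 y0 x1 y1`).

```r
# A tiny hOCR fragment in the style Tesseract emits (made-up content).
hocr <- paste0(
  "<div class='ocr_carea' title='bbox 36 92 580 160'>",
  "<p class='ocr_par' id='par_1_1' title='bbox 36 92 580 122'>",
  "<span class='ocrx_word' title='bbox 36 92 96 116'>Hello</span> ",
  "<span class='ocrx_word' title='bbox 109 92 179 116'>world</span>",
  "</p>",
  "<p class='ocr_par' id='par_1_2' title='bbox 36 130 400 160'>",
  "<span class='ocrx_word' title='bbox 36 130 120 154'>Second</span> ",
  "<span class='ocrx_word' title='bbox 130 130 210 154'>one</span>",
  "</p></div>"
)

# Pull out each paragraph element (assumes paragraphs are not nested).
par_pattern <- "<p class='ocr_par'[^>]*>.*?</p>"
pars <- regmatches(hocr, gregexpr(par_pattern, hocr, perl = TRUE))[[1]]

# Read the four integers from the bbox in each paragraph's opening tag.
coords <- t(vapply(pars, function(p) {
  bbox <- sub("^<p[^>]*title='bbox ([0-9 ]+)'[^>]*>.*$", "\\1", p, perl = TRUE)
  as.integer(strsplit(bbox, " ")[[1]])
}, integer(4), USE.NAMES = FALSE))

# Strip the remaining tags to get the paragraph text.
result <- data.frame(
  bbox1 = coords[, 1],
  bbox2 = coords[, 2],
  bbox3 = coords[, 3],
  bbox4 = coords[, 4],
  text = trimws(gsub("<[^>]+>", "", pars)),
  stringsAsFactors = FALSE
)
print(result)
```

Each row corresponds to one paragraph element; the word-level bounding boxes inside a paragraph are discarded along with the tags.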


OlivierBinette/TessTools documentation built on March 13, 2024, 7:33 p.m.