hocr_parse: Parse hOCR file into a tibble

View source: R/hocr.R

Description

Parse hOCR file into a tibble

Usage

hocr_parse(x)

Arguments

x

XHTML output of an OCR engine in hOCR format (see https://en.wikipedia.org/wiki/HOCR for details)

Value

A tibble with one row per recognized word and columns identifying the line, paragraph, content area and page that each word belongs to
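
To make the shape of such a table concrete, the following is a minimal sketch of how an hOCR document could be flattened to one row per word. It is not the implementation in R/hocr.R: it assumes the xml2 and tibble packages, the class names commonly emitted by Tesseract (ocr_page, ocr_carea, ocr_par, ocr_line, ocrx_word), a small hand-written hOCR fragment, and column names (page, carea, par, line, word_id, word, title) chosen only for illustration.

```r
library(xml2)
library(tibble)

# Hand-written hOCR fragment for illustration only (not real OCR output)
hocr_txt <- "<html><body>
<div class='ocr_page' id='page_1' title='bbox 0 0 200 100'>
 <div class='ocr_carea' id='block_1_1' title='bbox 10 10 190 90'>
  <p class='ocr_par' id='par_1_1' title='bbox 10 10 190 90'>
   <span class='ocr_line' id='line_1_1' title='bbox 10 10 190 40'>
    <span class='ocrx_word' id='word_1_1' title='bbox 10 10 60 40; x_wconf 96'>Hello</span>
    <span class='ocrx_word' id='word_1_2' title='bbox 70 10 130 40; x_wconf 91'>world</span>
   </span>
  </p>
 </div>
</div>
</body></html>"

doc   <- read_html(hocr_txt)
words <- xml_find_all(doc, "//span[@class='ocrx_word']")

# Hypothetical helper: id of the nearest ancestor carrying the given hOCR class
ancestor_id <- function(nodes, class) {
  xpath <- sprintf("ancestor::*[@class='%s'][1]", class)
  xml_attr(xml_find_first(nodes, xpath), "id")
}

# One row per word; the remaining columns locate the word in the page layout
words_tbl <- tibble(
  page    = ancestor_id(words, "ocr_page"),
  carea   = ancestor_id(words, "ocr_carea"),
  par     = ancestor_id(words, "ocr_par"),
  line    = ancestor_id(words, "ocr_line"),
  word_id = xml_attr(words, "id"),
  word    = xml_text(words),
  title   = xml_attr(words, "title")  # raw bounding box and confidence string
)

print(words_tbl)
```

The columns actually returned by hocr_parse may be named and typed differently; the sketch only shows the one-row-per-word layout with the enclosing layout elements carried along as columns.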

Examples

## Not run: 
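library(tesseract)
library(hocr)
library(magrittr)  # provides the %>% pipe

# Run OCR with hOCR output enabled, then parse the result into a tibble
ocr("file.png", HOCR = TRUE) %>%
  hocr_parse()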

## End(Not run)

dmi3kno/hocr documentation built on April 27, 2020, 10:39 a.m.