apply_ocr: Apply tesseract::ocr_data() and clean result.

View source: R/prepare_dataset.R

apply_ocr R Documentation

Apply tesseract::ocr_data() and clean result.

Description

Runs tesseract::ocr_data() on an image and cleans the result into a data.frame of words and their bounding boxes.

Usage

apply_ocr(image)

Arguments

image

File path, URL, or raw vector of an image (png, tiff, jpeg, etc.).

Value

A data.frame of words and their associated bounding boxes.

Examples

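library(docformer)
library(magick)  # provides the image_*() functions and the %>% pipe

# Good-quality scan: run OCR directly on the file path
image <- system.file("2106.11539_1.png", package = "docformer")
df <- apply_ocr(image)

# Poor-quality scan: preprocess with magick, then run OCR on the raw vector
df <- image %>%
  image_read() %>%
  image_resize("2000x") %>%   # rescale to a width of 2000 px
  image_trim(fuzz = 40) %>%   # crop the near-uniform border
  image_write(format = "png", density = "300x300") %>%  # no path given, so a raw vector is returned
  apply_ocr()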
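
The Value above pairs each word with a bounding box. As a rough illustration of what such a cleaning step can look like, the sketch below splits the "x1,y1,x2,y2" bbox strings that tesseract::ocr_data() produces into numeric columns and drops empty words. The helper split_bbox(), the column names xmin/ymin/xmax/ymax, and the sample data are assumptions made for this example; they are not taken from the docformer source, and apply_ocr() may clean its output differently.

```r
# Hypothetical stand-in for the output of tesseract::ocr_data():
# one row per word, with `bbox` stored as an "x1,y1,x2,y2" string.
ocr_raw <- data.frame(
  word = c("DocFormer", "", "Abstract"),
  confidence = c(96.1, 12.4, 91.7),
  bbox = c("120,80,410,130", "0,0,5,5", "130,200,290,236"),
  stringsAsFactors = FALSE
)

# Illustrative helper (not part of docformer): split `bbox` into integer
# corner columns and drop rows whose word is empty or whitespace only.
split_bbox <- function(df) {
  parts <- unlist(strsplit(df$bbox, ",", fixed = TRUE))
  coords <- matrix(as.integer(parts), ncol = 4, byrow = TRUE)
  colnames(coords) <- c("xmin", "ymin", "xmax", "ymax")
  out <- cbind(df[, c("word", "confidence")], as.data.frame(coords))
  out <- out[nzchar(trimws(out$word)), , drop = FALSE]
  rownames(out) <- NULL
  out
}

words <- split_bbox(ocr_raw)
print(words)
```

Keeping the coordinates as separate integer columns makes it straightforward to filter or rescale boxes later, for instance after the image has been resized.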

cregouby/docformer documentation built on May 27, 2023, 11:19 p.m.