View source: R/prepare_dataset.R
apply_ocr    R Documentation
Apply tesseract::ocr_data() to an image and clean the result.
apply_ocr(image)
image    file path, URL, or raw vector of an image (png, tiff, jpeg, etc.)
A data.frame of words and their associated bounding boxes.
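To make the shape of such a result concrete, here is a minimal base R sketch of one plausible cleaning step. It is not the package's actual implementation. It assumes the raw OCR output has `word`, `confidence`, and `bbox` columns, with `bbox` packed as an "x1,y1,x2,y2" string, as tesseract::ocr_data() returns. The sample rows and the output column names (`xmin`, `ymin`, `xmax`, `ymax`) are hypothetical and chosen for illustration only.

```r
# Hypothetical stand-in for the output of tesseract::ocr_data():
# one row per word, with the bounding box packed as "x1,y1,x2,y2".
ocr_raw <- data.frame(
  word = c("DocFormer", "", "2021"),
  confidence = c(96.1, 12.4, 91.8),
  bbox = c("36,92,210,116", "0,0,5,5", "220,92,270,116"),
  stringsAsFactors = FALSE
)

# Split each bbox string into four integer coordinates
# (the column names are illustrative, not taken from the package).
coords <- do.call(rbind, strsplit(ocr_raw$bbox, ",", fixed = TRUE))
storage.mode(coords) <- "integer"
colnames(coords) <- c("xmin", "ymin", "xmax", "ymax")

# Drop rows whose word is empty, then attach the numeric coordinates.
keep <- nzchar(trimws(ocr_raw$word))
ocr_clean <- cbind(
  ocr_raw[keep, c("word", "confidence")],
  coords[keep, , drop = FALSE]
)
rownames(ocr_clean) <- NULL

print(ocr_clean)
```

Numeric coordinate columns are easier to filter, rescale, or normalize than a packed string, which is why a split like this is a common first step after OCR.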
# good quality scan
image <- system.file("2106.11539_1.png", package = "docformer")
df <- apply_ocr(image)
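# poor quality scan: upscale and trim the image before running OCR
library(magick)
df <- image %>%
  image_read() %>%
  image_resize("2000x") %>%
  image_trim(fuzz = 40) %>%
  # with no path given, image_write() returns the image as a raw vector
  image_write(format = "png", density = "300x300") %>%
  apply_ocr()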