R/ocr.R

#' Image Text OCR
#'
#' Extract text from an image using the [tesseract][tesseract::tesseract] package.
#'
#' To use this function you need to install tesseract first:
#'
#' ```
#'   install.packages("tesseract")
#' ```
#'
#' Best results are obtained if you set the correct language in [tesseract][tesseract::tesseract].
#' To install additional languages see instructions in [tesseract_download()][tesseract::tesseract_download].
#'
#' @export
#' @family image
#' @name ocr
#' @rdname ocr
#' @inheritParams editing
#' @param language passed to [tesseract][tesseract::tesseract]. To install additional languages see
#' instructions in [tesseract_download()][tesseract::tesseract_download].
#' @param HOCR if `TRUE`, return results as hOCR XML instead of plain text
#' @param ... additional parameters passed to [tesseract][tesseract::tesseract]
#' @examples
#' \donttest{
#' if(require("tesseract")){
#' img <- image_read("http://jeroen.github.io/images/testocr.png")
#' image_ocr(img)
#' image_ocr_data(img)
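#'
#' # A hedged sketch, not part of the original examples: arguments in `...`
#' # are forwarded to tesseract::tesseract(), so engine options can be set
#' # there. The digit whitelist is an assumed illustration; whether it is
#' # honoured depends on the installed Tesseract version.
#' image_ocr(img, options = list(tessedit_char_whitelist = "0123456789"))
#'
#' # Return hOCR XML (word bounding boxes and confidences) instead of plain text
#' image_ocr(img, HOCR = TRUE)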
#' }
#' }
image_ocr <- function(image, language = "eng", HOCR = FALSE, ...){
  assert_image(image)
  tesseract::ocr(image, engine = tesseract::tesseract(language, ...), HOCR = HOCR)
}


#' @export
#' @rdname ocr
image_ocr_data <- function(image, language = "eng", ...){
  assert_image(image)
  tesseract::ocr_data(image, engine = tesseract::tesseract(language, ...))
}

magick documentation built on Oct. 22, 2023, 5:06 p.m.