scan_with_hocr: doing a Tesseract scan with HOCR output
In HanOostdijk/HOQCutil: Utilities by Han Oostdijk

scan_with_hocr

R Documentation

doing a Tesseract scan with HOCR output

Description

A Tesseract scan with hOCR output returns an XHTML document that contains not only each scanned word, but also the line on which the word was found and its bounding box. The function scan_with_hocr performs the scan and converts the document to a data frame. See Details and Acknowledgment.
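
To make the conversion concrete, the sketch below pulls the word, bounding box and confidence out of a small hand-written hOCR fragment using base R regular expressions. The fragment and its values are made up for illustration, and this is not the implementation used by scan_with_hocr; a real implementation would normally use an XML parser rather than regular expressions.

```r
# Hand-written hOCR fragment (illustrative; real Tesseract output is longer).
hocr <- paste0(
  "<span class='ocr_line' id='line_1_1' title='bbox 19 227 1087 251'>",
  "<span class='ocrx_word' id='word_1_1' ",
  "title='bbox 19 227 120 251; x_wconf 96'>Hello</span> ",
  "<span class='ocrx_word' id='word_1_2' ",
  "title='bbox 135 227 260 251; x_wconf 91'>world</span>",
  "</span>"
)

# Match every ocrx_word span, capturing its title attribute and its text.
pattern <- "<span class='ocrx_word'[^>]*title='([^']*)'[^>]*>([^<]*)</span>"
spans <- regmatches(hocr, gregexpr(pattern, hocr, perl = TRUE))[[1]]
title <- sub(pattern, "\\1", spans, perl = TRUE)
word  <- sub(pattern, "\\2", spans, perl = TRUE)

# The title attribute holds the bounding box and the word confidence.
words <- data.frame(
  word = word,
  bbox = sub(".*bbox ([0-9 ]+).*", "\\1", title),
  conf = as.numeric(sub(".*x_wconf ([0-9.]+).*", "\\1", title)),
  stringsAsFactors = FALSE
)
print(words)
```

Each row of `words` corresponds to one recognized word, which is the same shape as the result described under Details.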

Usage

scan_with_hocr(
  img,
  confsel = F,
  extract_bbox = T,
  add_header_cols = F,
  engine = tesseract::tesseract("eng")
)

Arguments

`img`	An image object or a character string with the name of an image file
`confsel`	A Boolean indicating if the confidence rate should also be selected
`extract_bbox`	A Boolean indicating if the bounding box should be unpacked (into x and y coordinates)
`add_header_cols`	A Boolean indicating if two header columns (`header_col` and `header_col_seq`) should be added to the result and initialized to `0` and `1`, respectively. Useful when `extract_table()` is used later on.
`engine`	The OCR engine to use. See `tesseract::tesseract()`

Value

A data.frame with one row for each scanned word. See Details.

Details

The result is a data.frame with one row for each word found and the following columns:

line : the line on which the word was found
fldnr: the sequence number of the word on this line
word : the word that is recognized by the engine
bbox : the bounding box where the word was found (character string, e.g. '19 227 1087 251', indicating x-coordinates x1=19 and x2=1087 and y-coordinates y1=227 and y2=251). Present only when extract_bbox=F is set; when extract_bbox=T (the default) the columns x1, x2, y1 and y2 are present instead.
conf : the confidence rate of the word (only when confsel=T is set)
header_col : column of zeros (only when add_header_cols=T is set)
header_col_seq : column of ones (only when add_header_cols=T is set)
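
As an illustration of what unpacking the bounding box means, the following sketch splits bbox strings into numeric x1, y1, x2 and y2 columns with base R. The data frame and the derived `width` column are made up for this example, and the code is an assumption about how such unpacking can be done, not the code used by scan_with_hocr.

```r
# Example rows as they might look with extract_bbox = FALSE (values are made up).
words <- data.frame(
  line  = c(1, 1),
  fldnr = c(1, 2),
  word  = c("Hello", "world"),
  bbox  = c("19 227 120 251", "135 227 260 251"),
  stringsAsFactors = FALSE
)

# Split each bbox string on spaces; the values are ordered x1 y1 x2 y2.
parts <- do.call(rbind, strsplit(words$bbox, " ", fixed = TRUE))
storage.mode(parts) <- "numeric"
colnames(parts) <- c("x1", "y1", "x2", "y2")

# Replace the bbox column by the four numeric columns.
unpacked <- cbind(words[, names(words) != "bbox"], parts)

# Numeric coordinates make derived quantities easy, e.g. the word width.
unpacked$width <- unpacked$x2 - unpacked$x1
print(unpacked)
```

Numeric coordinates are convenient when words have to be assigned to table columns by position, which is presumably why the unpacked form is the default.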

Acknowledgment

This function extends a snippet by Jeroen Ooms. I only added the extraction of the line information. Afterwards I turned it into a function and made it usable in combination with extract_table().

Examples

## Not run: 
# img2 is assumed to be an image object or the name of an image file
df1 <- scan_with_hocr(img2, add_header_cols = FALSE)

## End(Not run)

HanOostdijk/HOQCutil documentation built on July 28, 2023, 5:56 p.m.
