bboxToDF: Utility Function for Manipulating Bounding Box Collection

bboxToDF    R Documentation

Utility Function for Manipulating Bounding Box Collection

Description

The bounding boxes of the elements tesseract recognizes in an image are typically returned as a matrix, with the text for each element in the row names. A matrix is used because a data frame does not allow duplicate row names, and the same text can appear more than once in an image. The bboxToDF function converts the matrix to a data frame, adding the text as a 5th column.
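
As a rough illustration of this conversion, the following base R sketch builds a small matrix by hand and turns it into a data frame. The column names (left, bottom, right, top), the coordinate values, and the helper name toDF are assumptions made for this example; it mimics the documented behavior and is not the package's implementation.

```r
# Hypothetical bounding-box matrix: one row per recognized element,
# with the text stored in the row names (note the duplicated "the").
bb <- matrix(c(10, 50, 40, 60,
               45, 50, 90, 60,
               10, 20, 40, 30),
             ncol = 4, byrow = TRUE,
             dimnames = list(c("the", "cat", "the"),
                             c("left", "bottom", "right", "top")))

# Illustrative stand-in for bboxToDF(): move the row names into a 5th column.
toDF <- function(m) {
    txt <- rownames(m)
    rownames(m) <- NULL        # drop the (possibly duplicated) row names
    df <- as.data.frame(m)     # coordinate columns keep their names
    df$text <- txt             # the text becomes an ordinary column
    df
}

bbDF <- toDF(bb)
```

Because the text is now a regular column, the duplicated "the" causes no conflict with the data frame's row names.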

Having the bounding boxes as a data frame can be useful when splitting the rows into groups based on some criterion.
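
One way such a split might look is sketched below with base R's split(). The data frame is built by hand in the shape described above; its column names and values, and the choice of grouping by the bottom coordinate, are assumptions made for illustration.

```r
# Hypothetical data frame: four coordinate columns plus the recognized
# text in a 5th column.
bbDF <- data.frame(left   = c(10, 45, 10, 50),
                   bottom = c(50, 50, 20, 20),
                   right  = c(40, 90, 40, 95),
                   top    = c(60, 60, 30, 30),
                   text   = c("the", "cat", "the", "mat"),
                   stringsAsFactors = FALSE)

# Group the boxes that share the same bottom coordinate.
byLine <- split(bbDF, bbDF$bottom)

# Collapse each group into a single string, ordered by the left coordinate.
lineText <- sapply(byLine, function(d) {
    paste(d$text[order(d$left)], collapse = " ")
})
```

Each element of byLine is itself a data frame, so the text column travels with its coordinates through the split.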

orderBBox is a convenience function for arranging the rows of a bounding box collection from left to right, from top to bottom, or in the reverse of either. It is shorthand for bbox[ order(bbox[, colName], decreasing = decreasing), ]
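
The indexing idiom quoted above can be tried directly in base R. This sketch uses a hand-built matrix whose column names and values are assumptions for illustration, and it does not call orderBBox itself.

```r
# Hypothetical bounding-box matrix with the text in the row names.
bbox <- matrix(c(10, 20, 40, 30,
                 45, 50, 90, 60,
                 30, 80, 60, 90),
               ncol = 4, byrow = TRUE,
               dimnames = list(c("mat", "cat", "the"),
                               c("left", "bottom", "right", "top")))

# Rows by decreasing "bottom" value.
byBottom <- bbox[order(bbox[, "bottom"], decreasing = TRUE), ]

# Rows by increasing "left" value, i.e. left to right.
byLeft <- bbox[order(bbox[, "left"], decreasing = FALSE), ]
```

The row names move with their rows, so the text stays attached to its coordinates after reordering.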

Usage

bboxToDF(bb)
orderBBox(bbox, colName = "bottom", decreasing = TRUE) 

Arguments

bb

the bounding box as a matrix

bbox

the bounding box as a matrix or data.frame

colName

the name or index/position of the column of the bounding box by which to order the rows.

decreasing

a logical value controlling whether the rows are ordered by the values in colName in decreasing or increasing order.

Value

bboxToDF returns a data frame with the text for each bounding box in a 5th column named text.

orderBBox returns the same object as its input, with the rows in a different order.

Author(s)

Duncan Temple Lang

See Also

GetBoxes


duncantl/Rtesseract documentation built on March 25, 2022, 5:50 a.m.