Goals.md
In dsidavis/GetDocElements: Reconstruct PDF Document Elements

Goals for GetDocElements

Given bounding boxes from either Rtesseract or ReadPDF,

- Convert them to a common format for subsequent operations.
- Identify/reconstruct elements from the bounding boxes, including:
  - columns: 2, 3 or more
  - header, footer, and page numbers, etc.
  - document title, authors, and date
  - section headers
  - section text, including sections that span pages or are interrupted by tables or figures
  - images/tables with captions: not parsed at this stage, but identified and collected
  - lines, boxes and other page dividers?
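
To make the first two goals concrete, here is a minimal sketch in Python of a common bounding-box format and a naive column split. It is illustrative only and is not the package's implementation. The `Box` class, the two input record layouts (`from_ocr`, `from_pdf`), and the `min_gap` parameter are all assumptions made for this example; they do not describe the actual output of Rtesseract or ReadPDF.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Box:
    """Hypothetical common format: corner coordinates, top-left origin."""
    left: float
    top: float
    right: float
    bottom: float
    text: str
    page: int = 1


def from_ocr(row: Dict, page: int = 1) -> Box:
    """Assumed OCR-style record: corner coordinates, top-left origin."""
    return Box(row["left"], row["top"], row["right"], row["bottom"],
               row["text"], page)


def from_pdf(row: Dict, page_height: float, page: int = 1) -> Box:
    """Assumed PDF-style record: x, y, width, height, bottom-left origin.

    The y axis is flipped so both sources share a top-left origin.
    """
    top = page_height - (row["y"] + row["height"])
    bottom = page_height - row["y"]
    return Box(row["x"], top, row["x"] + row["width"], bottom,
               row["text"], page)


def split_columns(boxes: List[Box], min_gap: float) -> List[List[Box]]:
    """Group boxes into columns separated by horizontal gaps >= min_gap.

    Boxes are swept left to right; a new column starts whenever a box
    begins at least min_gap to the right of everything seen so far.
    """
    columns: List[List[Box]] = []
    current_right = None
    for box in sorted(boxes, key=lambda b: b.left):
        if current_right is None or box.left - current_right >= min_gap:
            columns.append([box])
            current_right = box.right
        else:
            columns[-1].append(box)
            current_right = max(current_right, box.right)
    # Within each column, restore reading order from top to bottom.
    for column in columns:
        column.sort(key=lambda b: b.top)
    return columns


# Usage: two boxes from the OCR-style source, one from the PDF-style source.
boxes = [
    from_ocr({"left": 50, "top": 100, "right": 250, "bottom": 115, "text": "a"}),
    from_ocr({"left": 300, "top": 100, "right": 500, "bottom": 115, "text": "b"}),
    from_pdf({"x": 50, "y": 600, "width": 200, "height": 15, "text": "c"},
             page_height=792),
]
columns = split_columns(boxes, min_gap=20)
column_texts = [[b.text for b in column] for column in columns]
```

A gap-based sweep like this merges everything into one column as soon as a full-width element (a title, header, or wide table) is present, so in practice such elements would need to be identified and set aside before the column split.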