read_bbox_layout_xhtml: Read PDF element positions from pdftotext.

View source: R/parse_pdftotext_layout_output.R

Description

read_bbox_layout_xhtml parses the XHTML layout file that pdftotext produces when run with the -bbox-layout option (pdftotext -bbox-layout file). It returns a tibble with bounding box information for each word, line, block and page.
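To make the input format concrete, the following base R sketch pulls word-level bounding boxes out of a small hand-written fragment that resembles pdftotext -bbox-layout output. The sample markup, its coordinate values, and the names sample_xhtml, word_pattern and words are assumptions made for illustration. This is not the package's implementation, which may rely on a proper XML parser and also returns line, block and page boxes.

```r
# Hypothetical fragment resembling `pdftotext -bbox-layout` output.
# The coordinates and words are made up for illustration.
sample_xhtml <- c(
  '<page width="612.000000" height="792.000000">',
  '<flow>',
  '<block xMin="72.0" yMin="70.0" xMax="150.0" yMax="84.0">',
  '<line xMin="72.0" yMin="70.0" xMax="150.0" yMax="84.0">',
  '<word xMin="72.0" yMin="70.0" xMax="100.0" yMax="84.0">Hello</word>',
  '<word xMin="104.0" yMin="70.0" xMax="150.0" yMax="84.0">world</word>',
  '</line>',
  '</block>',
  '</flow>',
  '</page>'
)

# Capture the four coordinates and the text of each <word> element.
word_pattern <- paste0(
  '<word xMin="([^"]+)" yMin="([^"]+)" ',
  'xMax="([^"]+)" yMax="([^"]+)">([^<]*)</word>'
)

word_lines <- grep(word_pattern, sample_xhtml, value = TRUE)
fields <- regmatches(word_lines, regexec(word_pattern, word_lines))

# Element 1 of each match is the full match; 2 to 6 are the capture groups.
pick <- function(i) vapply(fields, function(m) m[i], character(1))

words <- data.frame(
  text  = pick(6),
  x_min = as.numeric(pick(2)),
  y_min = as.numeric(pick(3)),
  x_max = as.numeric(pick(4)),
  y_max = as.numeric(pick(5)),
  stringsAsFactors = FALSE
)
```

A regular expression is enough for this flat sample, but it assumes each word element sits on one line with attributes in a fixed order. Real files are safer to read with an XML parser.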

Usage

read_bbox_layout_xhtml(path_to_html)

Arguments

path_to_html

Path to the XHTML file generated by pdftotext -bbox-layout.
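If the layout file does not exist yet, it can be generated from R before calling the function. The helper below is a minimal sketch under stated assumptions: pdf_to_bbox_layout is a hypothetical name that is not part of pdfparser, and it assumes the pdftotext command-line tool is installed and available on the PATH.

```r
# Hypothetical helper (not part of pdfparser): convert a PDF into the
# XHTML layout file that read_bbox_layout_xhtml() takes as input.
pdf_to_bbox_layout <- function(pdf_path,
                               html_path = tempfile(fileext = ".html")) {
  # Fail early if the pdftotext executable cannot be found.
  if (!nzchar(Sys.which("pdftotext"))) {
    stop("pdftotext was not found on the PATH")
  }

  # Equivalent to running: pdftotext -bbox-layout <pdf_path> <html_path>
  status <- system2(
    "pdftotext",
    args = c("-bbox-layout", shQuote(pdf_path), shQuote(html_path))
  )
  if (status != 0) {
    stop("pdftotext exited with status ", status)
  }

  # Return the output path so it can be passed on directly.
  html_path
}
```

With a PDF at hand, the result could be passed straight on, for example read_bbox_layout_xhtml(pdf_to_bbox_layout("report.pdf")), where report.pdf is a placeholder file name.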

Examples

doc <- system.file("extdata", "edi_2009_frcho43c6mmlx5lyohqy_doc#immrrkosg.html", package = "pdfparser")
read_bbox_layout_xhtml(doc)

balthasars/pdfparser documentation built on May 10, 2020, 12:33 a.m.