grab_text: Grab raw text from pdf


Description

This function returns the entire text of a PDF file.

Usage

grab_text(pdf, pdf2txt = "pdf2txt.py", remove_nulls = TRUE,
  remove_singles = TRUE, ...)

Arguments

pdf

The path to the PDF file

pdf2txt

The shell command used to run the Python script pdf2txt.py (default = "pdf2txt.py")

remove_nulls

A flag indicating that empty lines should be ignored (default = TRUE)

remove_singles

A flag indicating that lines containing only one character should be ignored (default = TRUE)

...

Ignored
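
The effect of the remove_nulls and remove_singles flags can be illustrated with a small sketch. The helper filter_lines below is hypothetical and is not part of the package; it only mimics the documented behaviour on a character vector of lines. It assumes that an "empty" line has zero characters and that a "single" line has exactly one character, which may differ from how grab_text actually treats whitespace.

```r
# Hypothetical helper (an assumption, not the package's implementation):
# drops empty lines and one-character lines from a character vector.
filter_lines <- function(lines, remove_nulls = TRUE, remove_singles = TRUE) {
  if (remove_nulls) {
    # Keep only lines that contain at least one character
    lines <- lines[nchar(lines) > 0]
  }
  if (remove_singles) {
    # Drop lines that consist of exactly one character
    lines <- lines[nchar(lines) != 1]
  }
  lines
}

# Made-up input resembling raw extractor output
raw <- c("Title", "", "a", "Body text", "7", "")
filtered <- filter_lines(raw)
print(filtered)
```

With both flags set to FALSE the input is returned unchanged, which is one way to inspect what the extractor produced before any filtering.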

Value

The raw text from the file.

Note

This is the main function through which PDF text is to be filtered. It requires pdf2txt.py to be installed on the machine.
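
Because of this dependency, it may help to check that the script can be found before calling grab_text. The following is a minimal sketch using base R's Sys.which; the file name "example.pdf" is a placeholder, and the assumption that grab_text is exported from a package named footprints is inferred from this page rather than confirmed.

```r
# Look up pdf2txt.py on the search path; on most systems Sys.which()
# returns "" when the command is not found.
pdf2txt_path <- Sys.which("pdf2txt.py")
has_pdf2txt <- unname(!is.na(pdf2txt_path) & nzchar(pdf2txt_path))

pdf_path <- "example.pdf"  # placeholder path, replace with a real file

if (!has_pdf2txt) {
  message("pdf2txt.py was not found on the PATH; grab_text() needs it.")
} else if (requireNamespace("footprints", quietly = TRUE) && file.exists(pdf_path)) {
  # Assumes grab_text is exported by the footprints package
  txt <- footprints::grab_text(pdf_path, pdf2txt = "pdf2txt.py")
  print(head(txt))
}
```

If the script is installed somewhere outside the search path, the pdf2txt argument can be given the full command instead of the default.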

Author(s)

Rodney J. Dyer <rjdyer@vcu.edu>


dyerlab/footprints documentation built on May 15, 2019, 7:21 p.m.