extract_text: extract_text


Description

Extract text from a PDF file.

Usage

extract_text(file, pages = NULL, password = NULL, encoding = NULL)

Arguments

file

A character string specifying the path or URL to a PDF file.

pages

An optional integer vector specifying pages to extract from.

password

Optionally, a character string containing a user password to access a secured PDF.

encoding

Optionally, a character string specifying an encoding for the text, to be passed to the assignment method of Encoding.
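As a rough illustration of what the encoding argument does, the sketch below applies the assignment form of Encoding to a stand-in string using base R only. The variable txt is hypothetical and merely stands in for text returned by extract_text; treating its bytes as UTF-8 is an assumption about the source PDF.

```r
# Hypothetical stand-in for text returned by extract_text():
# these raw bytes are the UTF-8 encoding of "café".
txt <- rawToChar(as.raw(c(0x63, 0x61, 0x66, 0xc3, 0xa9)))

# Declare the encoding of the string. This marks the string
# without converting its bytes.
Encoding(txt) <- "UTF-8"

# Inspect the declared encoding and the character count.
Encoding(txt)
nchar(txt)
```

Because the assignment only declares how the bytes should be interpreted, a wrong value can yield garbled output; iconv is the base R tool for actual conversion.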

Details

This function converts the contents of a PDF file into a single unstructured character string.
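Because the result is unstructured, some post-processing is usually needed. The following is a minimal base R sketch, assuming the returned string separates lines with newline characters; the variable txt is a made-up stand-in, not actual output from extract_text.

```r
# Hypothetical stand-in for the string returned by extract_text().
txt <- "Header line\nfirst row 1 2 3\nsecond row 4 5 6\n"

# Split on newlines (tolerating Windows-style line endings).
lines <- strsplit(txt, "\r?\n")[[1]]

# Trim surrounding whitespace and drop empty lines.
lines <- trimws(lines)
lines <- lines[nzchar(lines)]

lines
```

For tabular content, extract_tables (listed under See Also) is likely the better starting point than parsing the text by hand.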

Value

If pages = NULL (the default), a character vector of length 1; otherwise, a character vector of length length(pages).

Author(s)

Thomas J. Leeper <thosjleeper@gmail.com>

See Also

extract_tables, extract_areas, split_pdf

Examples

## Not run: 
# simple demo file
f <- system.file("examples", "data.pdf", package = "tabulizer")

# extract all text from page 1 only
extract_text(f, pages = 1)

# extract all text
extract_text(f)

## End(Not run)

Logiwo/tabulizer documentation built on May 9, 2019, 1:57 a.m.