extract: Extract text from a single PDF document

View source: R/extract.R

Description

This function wraps several methods for extracting text from non-scanned PDFs; no OCR is performed. The available methods are xpdf, Ghostscript, and Poppler (via pdftools).

Usage

extract(paths, which = "xpdf", ...)

Arguments

paths

(character) One or more paths to PDF files

which

(character) One of gs, xpdf (default), or pdftools

...

Further arguments passed on to the selected extraction method

Value

A single object or a list of objects, each of class gs_extr, xpdf_extr, or poppler_extr. All three classes also share the common class extr.
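
Because the return value may be either one object or a list of them, downstream code may want to normalise it first. The sketch below is one way to do that, using the shared extr class as the test. It runs without extractr installed: as_extr_list() is a hypothetical helper, not part of the package, and mock is a stand-in whose fields (meta, data) and class order are assumptions based on the Examples section, not the package's documented structure. It also assumes that the multi-path result is a plain list that does not itself carry the extr class.

```r
# Hypothetical helper (not part of extractr): always return a list of
# extr objects, whether extract() gave back one object or several.
as_extr_list <- function(x) {
  if (inherits(x, "extr")) list(x) else x
}

# Mock standing in for a real extract() result. The meta/data fields and
# the class vector are assumptions made for illustration only.
mock <- structure(
  list(meta = list(), data = "some text"),
  class = c("xpdf_extr", "extr")
)

single <- as_extr_list(mock)              # one object, wrapped in a list
many   <- as_extr_list(list(mock, mock))  # already a list, returned as-is

# Either way, the result can be iterated over uniformly.
vapply(many, function(x) inherits(x, "extr"), logical(1))
```

With a real result, the same helper would be called as as_extr_list(extract(paths)), provided the assumptions above hold.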

Examples

## Not run: 
path <- system.file("examples", "example1.pdf", package = "extractr")

# xpdf
xpdf <- extract(path, "xpdf")
xpdf$meta
xpdf$data

# Ghostscript
gs <- extract(path, "gs")
gs$meta
gs$data

# pdftools
pdft <- extract(path, "pdftools")
pdft$meta
cat(pdft$data)

# Pass many paths at once
path1 <- system.file("examples", "example1.pdf", package = "extractr")
path2 <- system.file("examples", "example2.pdf", package = "extractr")
path3 <- system.file("examples", "example3.pdf", package = "extractr")
extract(c(path1, path2, path3))

## End(Not run)

ropensci/extractr documentation built on May 16, 2018, 6:59 a.m.