extract: Extract text from a single pdf document

View source: R/extract.R

extract R Documentation

Extract text from a single pdf document

Description

This function wraps several methods for extracting text from non-scanned PDFs; no OCR is performed. Available methods are xpdf, Ghostscript, and Poppler (via pdftools).

Usage

extract(paths, which = "xpdf", ...)

Arguments

paths

(character) Paths to one or more PDF files

which

(character) One of gs, xpdf (default), or pdftools

...

Further arguments passed on to the extraction method
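
To illustrate how a which-style argument with a default is commonly validated in R, here is a minimal sketch using match.arg(). The function name choose_backend is hypothetical and this is not extractr's actual implementation; the mapping from method name to class name is inferred from the Value section and should be treated as an assumption.

```r
# Hypothetical helper, for illustration only; not part of extractr.
choose_backend <- function(which = c("xpdf", "gs", "pdftools")) {
  # match.arg() returns the first choice when `which` is left at its
  # default, and raises an error for a value that matches no choice.
  which <- match.arg(which)

  # Assumed mapping from method name to result class (see Value).
  switch(which,
    xpdf     = "xpdf_extr",
    gs       = "gs_extr",
    pdftools = "poppler_extr"
  )
}

choose_backend()            # default method: xpdf
choose_backend("pdftools")  # Poppler via pdftools
```

An unsupported value such as "ocr" stops with an error rather than silently falling back to the default.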

Value

A list or a single object of class gs_extr, xpdf_extr, or poppler_extr, depending on which. All three also carry the shared class extr.
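
Because every result carries the shared class extr, calling code can test for it with inherits() regardless of the method used. The sketch below uses a hand-built mock object rather than a real result, so it runs without extractr installed. The meta and data fields mirror those used in the Examples; the helper as_extr_list is hypothetical and assumes that passing several paths returns a plain list of extr objects.

```r
# Mock object standing in for a result; a real one comes from extract().
mock <- structure(
  list(meta = list(), data = "some text"),
  class = c("xpdf_extr", "extr")
)

inherits(mock, "extr")       # TRUE: shared class
inherits(mock, "xpdf_extr")  # TRUE: method-specific class
inherits(mock, "gs_extr")    # FALSE: produced by a different method

# Hypothetical helper: wrap a single result in a list so that downstream
# code can always iterate, whether one path or several were supplied.
as_extr_list <- function(x) {
  if (inherits(x, "extr")) list(x) else x
}

# Works the same for a single object and for a list of objects.
texts <- lapply(as_extr_list(mock), function(res) res$data)
```

Normalising to a list first keeps the calling code free of branches on the number of input paths.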

Examples

## Not run: 
path <- system.file("examples", "example1.pdf", package = "extractr")

# xpdf
xpdf <- extract(path, "xpdf")
xpdf$meta
xpdf$data

# Ghostscript
gs <- extract(path, "gs")
gs$meta
gs$data

# pdftools
pdft <- extract(path, "pdftools")
pdft$meta
cat(pdft$data)

# Pass many paths at once
path1 <- system.file("examples", "example1.pdf", package = "extractr")
path2 <- system.file("examples", "example2.pdf", package = "extractr")
path3 <- system.file("examples", "example3.pdf", package = "extractr")
extract(c(path1, path2, path3))

## End(Not run)

ropensci/extractr documentation built on May 18, 2022, 9:56 a.m.