Extract text from one or more PDF documents into a tm Corpus or VCorpus.

Description

Extract text from one or more PDF documents into a tm Corpus or VCorpus.

Usage

ft_extract_corpus(paths, which = "xpdf", ...)

Arguments

paths

Path(s) to one or more PDF files.

which

Extraction backend: one of "gs" (Ghostscript) or "xpdf". Defaults to "xpdf".

...

Further arguments passed on to the readerControl parameter of tm's Corpus.

Value

A tm Corpus (VCorpus support is planned for later).

See Also

ft_extract

Examples

## Not run: 
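# Locate the example PDF shipped with the fulltext package
path <- system.file("examples", "example1.pdf", package = "fulltext")
# Extract the text with xpdf; the outer parentheses print the result
(res <- ft_extract_corpus(path, "xpdf"))
# Build a term-document matrix from the extracted corpus
tm::TermDocumentMatrix(res$data)

# Same extraction using Ghostscript instead of xpdf
(res_gs <- ft_extract_corpus(path, "gs"))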

## End(Not run)
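
Since paths accepts more than one file, a whole directory of PDFs can be read into a single corpus. The following is a minimal sketch rather than part of the package: list_pdfs() is a hypothetical helper, and the code assumes that the installed version of fulltext still exports ft_extract_corpus(), that the xpdf tools are available on the system, and that the package's examples directory contains at least one PDF. The extraction step is skipped when those package checks fail.

```r
# Hypothetical helper (not part of fulltext): list every PDF in a directory.
list_pdfs <- function(dir) {
  list.files(dir, pattern = "\\.pdf$", full.names = TRUE, ignore.case = TRUE)
}

# Only attempt the extraction when fulltext and tm are installed and this
# version of fulltext exports ft_extract_corpus().
can_run <- requireNamespace("fulltext", quietly = TRUE) &&
  requireNamespace("tm", quietly = TRUE) &&
  "ft_extract_corpus" %in% getNamespaceExports("fulltext")

if (can_run) {
  # Assumption: the package's examples directory holds one or more PDFs.
  paths <- list_pdfs(system.file("examples", package = "fulltext"))

  if (length(paths) > 0) {
    # One corpus built from all files; needs the xpdf tools on the system.
    res <- fulltext::ft_extract_corpus(paths, which = "xpdf")

    # Term-document matrix, then the terms occurring at least 5 times.
    tdm <- tm::TermDocumentMatrix(res$data)
    print(tm::findFreqTerms(tdm, lowfreq = 5))
  }
}
```

The threshold of 5 in findFreqTerms() is arbitrary and only serves to keep the printed list short.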