ft_extract_corpus: Extract text from one to many pdf documents into a tm Corpus...


Description

Extract text from one or more PDF documents into a tm Corpus or VCorpus.

Usage

ft_extract_corpus(paths, which = "xpdf", ...)

Arguments

paths

Path to one or more PDF files.

which

One of "gs" or "xpdf" (the default).

...

Further arguments passed on to the readerControl parameter of Corpus.

Value

A tm Corpus (VCorpus support is intended to come later).

See Also

ft_extract

Examples

## Not run: 
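# Locate the example PDF shipped with the fulltext package
path <- system.file("examples", "example1.pdf", package = "fulltext")
# Extract the text with xpdf and print the result
(res <- ft_extract_corpus(path, "xpdf"))
# Build a term-document matrix from the extracted corpus
tm::TermDocumentMatrix(res$data)

# Same extraction, using Ghostscript instead of xpdf
(res_gs <- ft_extract_corpus(path, "gs"))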

## End(Not run)
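
Because paths accepts more than one file, a call with several PDFs might look like the sketch below. The file names paper1.pdf and paper2.pdf are hypothetical placeholders, and the corpus is assumed to sit in the data element of the result, as in the example above. The call is guarded so that nothing runs unless the fulltext package is installed and both files exist.

```r
# Hypothetical file names; replace them with PDFs that exist on your system.
paths <- c("paper1.pdf", "paper2.pdf")

# Run only when the fulltext package is installed and every file is present.
if (requireNamespace("fulltext", quietly = TRUE) && all(file.exists(paths))) {
  # Extract all documents in one call, using the xpdf backend.
  res <- fulltext::ft_extract_corpus(paths, which = "xpdf")

  # Assumption: the corpus is stored in res$data, as in the example above.
  tdm <- tm::TermDocumentMatrix(res$data)
  print(dim(tdm))  # number of terms by number of documents
}
```

Passing all paths in a single call keeps the documents in one corpus, so a single term-document matrix can cover every file.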

