extract_corpus: Extract text from one or more PDF documents into a tm Corpus.


View source: R/extract_corpus.R

Description

Extract text from one or more PDF documents into a tm Corpus.

Usage

extract_corpus(paths, which, ...)

Arguments

paths

Path to a PDF file, or a character vector of paths to several PDF files.

which

The extraction back end to use. One of "gs" or "xpdf".

...

Further arguments passed on.
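Both values of which rely on an external program, so it can help to confirm that the program is installed before calling extract_corpus. The sketch below is not part of extractr: backend_available is a hypothetical helper, and the executable names are assumptions ("gs" for Ghostscript on Unix-like systems, "pdftotext" as the xpdf text extractor). It only checks whether those names resolve on the PATH.

```r
# Hypothetical helper, not part of extractr: report which back-end
# executables can be found on the PATH.
# Assumed executable names: "gs" (Ghostscript) and "pdftotext" (xpdf).
backend_available <- function(binaries = c(gs = "gs", xpdf = "pdftotext")) {
  # Sys.which() returns the full path of each executable, or "" if not found
  found <- Sys.which(binaries)
  ok <- !is.na(found) & nzchar(found)
  # Label the result with the values accepted by the `which` argument
  stats::setNames(ok, names(binaries))
}

# Named logical vector: TRUE where the executable was found
backend_available()
```

A FALSE entry suggests the corresponding back end should be installed, or its location added to the PATH, before it is passed as which. Executable names can differ by platform (for example on Windows), so adjust the binaries argument as needed.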

Value

A tm Corpus or VCorpus

Examples

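library(extractr)

# Paths to the PDF files to read (these must exist on your machine)
paths <- c("~/github/sac/scott/pdfs/BarraquandEtal2014peerj.pdf",
           "~/github/sac/scott/pdfs/Chamberlain&Holland2009Ecology.pdf",
           "~/github/sac/scott/pdfs/Revell&Chamberlain2014MEE.pdf")

# Extract text using the "gs" back end
res <- extract_corpus(paths, "gs")
res
# Build a term-document matrix from the extracted corpus
tm::TermDocumentMatrix(res$data)

# Extract text from the same files using the "xpdf" back end
res <- extract_corpus(paths, "xpdf")
res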

ropensci/extractr documentation built on May 16, 2018, 6:59 a.m.