extract_corpus: Extract text from one or more PDF documents into a tm Corpus.

View source: R/extract_corpus.R

extract_corpus    R Documentation

Extract text from one or more PDF documents into a tm Corpus.

Description

Extract text from one or more PDF documents into a tm Corpus.

Usage

extract_corpus(paths, which, ...)

Arguments

paths

Path to one or more PDF files, given as a character vector.

which

Extraction method to use: one of "gs" or "xpdf".

...

Further arguments passed on.
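Because paths is a vector of file locations, it can help to confirm that every entry points to an existing PDF before starting an extraction. The following is a minimal sketch in base R; existing_pdfs is a hypothetical helper written for illustration and is not part of extractr.

```r
# Hypothetical helper (not part of extractr): keep only the paths that
# point to existing files with a .pdf extension, and warn about the rest.
existing_pdfs <- function(paths) {
  paths <- path.expand(paths)  # resolve a leading "~"
  ok <- file.exists(paths) & grepl("\\.pdf$", paths, ignore.case = TRUE)
  if (any(!ok)) {
    warning("Skipping: ", paste(paths[!ok], collapse = ", "))
  }
  paths[ok]
}

# Demonstration with an empty temporary file standing in for a real PDF
tmp <- tempfile(fileext = ".pdf")
file.create(tmp)
kept <- suppressWarnings(existing_pdfs(c(tmp, "no/such/file.pdf")))
length(kept)
```

The filtered vector could then be passed as the paths argument, for example extract_corpus(existing_pdfs(paths), "gs").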

Value

A tm Corpus or VCorpus. In the examples, the corpus is accessed as res$data.
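To illustrate what a tm corpus looks like without needing any PDF files, the sketch below builds a small VCorpus by hand and passes it to tm::TermDocumentMatrix. The strings in texts are made up and only stand in for extracted text; this is not how extract_corpus builds its result, only an illustration of the kind of object described here.

```r
library(tm)

# Made-up strings standing in for text extracted from two PDFs
texts <- c("species interactions in ecological networks",
           "phylogenetic methods for ecological networks")

# VCorpus is tm's volatile (in-memory) corpus class
corp <- VCorpus(VectorSource(texts))
inherits(corp, "VCorpus")
inherits(corp, "Corpus")

# Terms in rows, documents in columns
tdm <- TermDocumentMatrix(corp)
dim(tdm)
```

Any function that accepts a tm corpus, such as tm::TermDocumentMatrix or tm::tm_map, can be applied to such an object in the same way.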

Examples


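# Paths to local PDF files; adjust these to files on your own machine
paths <- c("~/github/sac/scott/pdfs/BarraquandEtal2014peerj.pdf",
           "~/github/sac/scott/pdfs/Chamberlain&Holland2009Ecology.pdf",
           "~/github/sac/scott/pdfs/Revell&Chamberlain2014MEE.pdf")

# Extract with the "gs" method and build a term-document matrix
res <- extract_corpus(paths, "gs")
res
tm::TermDocumentMatrix(res$data)

# Extract the same files with the "xpdf" method
res <- extract_corpus(paths, "xpdf")
res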


ropensci/extractr documentation built on May 18, 2022, 9:56 a.m.