extract_corpus: Extract text from one to many pdf documents into a tm Corpus.
In ropensci/extractr: Extract Text from 'PDFs'

View source: R/extract_corpus.R

extract_corpus

R Documentation


Description

Extract text from one or more PDF documents into a tm corpus, using either the Ghostscript ("gs") or the xpdf ("xpdf") extraction method.

Usage

extract_corpus(paths, which, ...)

Arguments

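`paths`	Character vector of paths to one or more PDF files.
`which`	Extraction method: one of "gs" (Ghostscript) or "xpdf".
`...`	Further arguments, passed on to the underlying extraction method.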

Value

A tm Corpus or VCorpus

Examples

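# Load the package
library("extractr")

# Paths to local PDF files; replace these with paths to your own PDFs
paths <- c("~/github/sac/scott/pdfs/BarraquandEtal2014peerj.pdf",
           "~/github/sac/scott/pdfs/Chamberlain&Holland2009Ecology.pdf",
           "~/github/sac/scott/pdfs/Revell&Chamberlain2014MEE.pdf")

# Extract text with Ghostscript
res <- extract_corpus(paths, "gs")
res
# Build a term-document matrix from the extracted corpus
tm::TermDocumentMatrix(res$data)

# Extract the same files with xpdf
res <- extract_corpus(paths, "xpdf")
res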
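The example above needs PDF files on the local disk. The sketch below assumes only that the tm package is installed. It uses tm's own `VCorpus()`, `VectorSource()` and `TermDocumentMatrix()` on a few made-up strings to show the kind of object a tm corpus is and what a term-document matrix contains. It does not call extractr, and it is not how `extract_corpus()` builds its corpus internally.

```r
# Sketch only: the sample strings are hypothetical stand-ins for text
# that extract_corpus() would pull out of PDF files.
library("tm")

texts <- c("species occupancy model",
           "occupancy detection model")

# One document per element of the character vector
corpus <- VCorpus(VectorSource(texts))

# Rows are terms, columns are documents, entries are term counts.
# With tm's defaults, terms are lower-cased and words shorter than
# 3 characters are dropped.
tdm <- TermDocumentMatrix(corpus)
m <- as.matrix(tdm)
print(m)
```

The same `TermDocumentMatrix()` call applies to `res$data` in the example above once real PDFs have been extracted.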

ropensci/extractr documentation built on May 18, 2022, 9:56 a.m.