Extract text from a single PDF document
path | (character) path to a file; the file must exist
raw | (raw) raw bytes
try_ocr | (logical) whether to try extracting OCRed pages with
... | args passed on to
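The `raw` argument takes the bytes of a PDF instead of a path to one. Below is a minimal base-R sketch of turning a path into such a raw vector. The helper `read_pdf_bytes()` is hypothetical (it is not part of crminer), and the `%PDF` header check is an illustrative assumption, not something `crm_extract()` is documented to do. A throwaway PDF is written with base R's `pdf()` device only so the example has a real file to read.

```r
# Hypothetical helper (not part of crminer): read a PDF file into a raw vector
# that could be passed as `raw`.
read_pdf_bytes <- function(path) {
  # mirror the documented requirement that the file must exist
  if (!file.exists(path)) stop("file does not exist: ", path)
  bytes <- readBin(path, what = "raw", n = file.size(path))
  # PDF files normally begin with the ASCII signature "%PDF"
  if (length(bytes) < 4 || rawToChar(bytes[1:4]) != "%PDF") {
    stop("not a PDF file: ", path)
  }
  bytes
}

# Usage: write a one-page PDF with base R graphics, then read its bytes
tmp <- tempfile(fileext = ".pdf")
pdf(tmp)
plot.new()
text(0.5, 0.5, "hello crminer")
invisible(dev.off())

bytes <- read_pdf_bytes(tmp)
class(bytes)   # "raw"
# crm_extract(raw = bytes)   # requires crminer
```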
We use pdftools under the hood for PDF text extraction. You must supply either path or raw, not both.
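The rule that exactly one of `path` or `raw` must be given can be sketched as a small argument check. This is a hypothetical illustration: `check_pdf_input()` is not a crminer function, and crminer's actual checks and error messages may differ.

```r
# Hypothetical check (not crminer's code): accept exactly one of `path` or `raw`
# and report which input was used.
check_pdf_input <- function(path = NULL, raw = NULL) {
  has_path <- !is.null(path)
  has_raw <- !is.null(raw)
  if (has_path && has_raw) stop("supply either `path` or `raw`, not both")
  if (!has_path && !has_raw) stop("supply one of `path` or `raw`")
  if (has_path) {
    if (!is.character(path) || length(path) != 1) {
      stop("`path` must be a single string")
    }
    if (!file.exists(path)) stop("file does not exist: ", path)
    return("path")
  }
  if (!is.raw(raw)) stop("`raw` must be a raw vector")
  "raw"
}

f <- tempfile(fileext = ".pdf")
writeBin(charToRaw("%PDF-1.4"), f)
check_pdf_input(path = f)                  # "path"
check_pdf_input(raw = charToRaw("%PDF"))   # "raw"
try(check_pdf_input(path = f, raw = charToRaw("%PDF")))  # error: not both
```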
An object of class crm_pdf with a slot for info (essentially the PDF metadata) and one for text (the extracted text), plus an attribute path holding the path to the PDF on disk.
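Assuming the returned crm_pdf object behaves like a named list (the examples read it with `res$info` and `res$text`), its parts can be accessed as below. The object here is a hand-built stand-in from the hypothetical `mock_crm_pdf()`, not real crminer output; the field names inside `info` are made up, and storing one string per page in `text` is an assumption based on how pdftools reports page text.

```r
# Hypothetical stand-in for a crm_pdf result; real objects come from crm_extract()
mock_crm_pdf <- function(path, info, text) {
  structure(list(info = info, text = text), class = "crm_pdf", path = path)
}

res <- mock_crm_pdf(
  path = "/tmp/example.pdf",
  info = list(pages = 2),
  text = c("Page one text\n", "Page two text\n")
)

inherits(res, "crm_pdf")   # TRUE
res$info$pages             # 2
attr(res, "path")          # "/tmp/example.pdf"
# join the page texts into a single string, e.g. for searching
full_text <- paste(res$text, collapse = "\n")
grepl("two", full_text)    # TRUE
```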
# path to an example PDF shipped with crminer
path <- system.file("examples", "MairChamberlain2014RJournal.pdf",
  package = "crminer")
(res <- crm_extract(path))
res$info
res$text
# with newlines, pretty print
cat(res$text)
# another example
path <- system.file("examples", "ChamberlainEtal2013Ecosphere.pdf",
  package = "crminer")
(res <- crm_extract(path))
res$info
cat(res$text)
# with raw pdf bytes
path <- system.file("examples", "raw-example.rds", package = "crminer")
rds <- readRDS(path)
class(rds)
crm_extract(raw = rds)
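The last example reads raw PDF bytes from an .rds file. Assuming raw-example.rds simply stores a raw vector (which is what `crm_extract(raw = ...)` expects), a similar file can be made from any PDF with base R. The temporary file names below are placeholders, and the PDF is generated with base graphics only to have something to read.

```r
# make a small PDF to stand in for your own document
pdf_file <- tempfile(fileext = ".pdf")
pdf(pdf_file)
plot.new()
text(0.5, 0.5, "raw bytes example")
invisible(dev.off())

# read the whole file as raw bytes and store them in an .rds file
bytes <- readBin(pdf_file, what = "raw", n = file.size(pdf_file))
rds_file <- tempfile(fileext = ".rds")
saveRDS(bytes, rds_file)

# reading it back gives the same raw vector
rds <- readRDS(rds_file)
class(rds)               # "raw"
identical(rds, bytes)    # TRUE
# crm_extract(raw = rds)  # requires crminer
```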