ft_extract: Extract text from a single PDF document

View source: R/ft_extract.R

ft_extract    R Documentation

Description

ft_extract attempts to make it easy to extract text from PDFs, using pdftools. Input can be either a path to a PDF file or the output of ft_get() (class ft_data).
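
As a rough illustration of the kind of extraction pdftools provides, the sketch below calls pdftools::pdf_text() directly on a throwaway PDF. It is an assumption that ft_extract relies on this particular function internally; the temporary file and the sample string are made up for the demo and are not part of fulltext.

```r
# Build a one-page PDF with base graphics so the example needs no external file.
# (tmp and the sample text are hypothetical, for illustration only.)
tmp <- tempfile(fileext = ".pdf")
grDevices::pdf(tmp)
plot.new()
text(0.5, 0.5, "hello fulltext")
invisible(grDevices::dev.off())

# pdf_text() returns a character vector with one element per page.
pages <- pdftools::pdf_text(tmp)
length(pages)
```

Working with one string per page keeps page boundaries intact; collapse with paste(pages, collapse = "\n") if a single string is easier to search.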

Usage

ft_extract(x)

Arguments

x

Path to a PDF file, or an object of class ft_data (the output of ft_get())

Value

An object of class pdft_char for character input, or an object of class ft_data for ft_data input

Examples

## Not run: 
path <- system.file("examples", "example1.pdf", package = "fulltext")
(res <- ft_extract(path))

# use on the output of ft_get() to extract text from the downloaded PDFs
## arxiv
res <- ft_get('cond-mat/9309029', from = "arxiv")
res2 <- ft_extract(res)
res$arxiv$data
res2$arxiv$data

## biorxiv
res <- ft_get('10.1101/012476')
res2 <- ft_extract(res)
res$biorxiv$data
res2$biorxiv$data

## End(Not run)

ropensci/fulltext documentation built on Sept. 12, 2022, 7:57 a.m.