ft_extract: Extract text from a single PDF document


View source: R/ft_extract.R

Description

ft_extract attempts to make it easy to extract text from PDFs using the pdftools package. Input can be either a path to a PDF file or the output of ft_get() (an object of class ft_data).
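The pdftools step can be sketched roughly as follows. This is a minimal sketch, not the package's own implementation: `pdf_to_text()` is a hypothetical helper name, and the code assumes the pdftools package is installed. `pdftools::pdf_text()` returns one character string per page, which the sketch joins into a single string.

```r
# Hypothetical helper (not part of fulltext): read a PDF and return its text
# as one string. Assumes the pdftools package is installed.
pdf_to_text <- function(path) {
  if (!file.exists(path)) {
    stop("file not found: ", path, call. = FALSE)
  }
  # pdf_text() returns a character vector with one element per page
  pages <- pdftools::pdf_text(path)
  # join the pages into a single string, one page per block
  paste(pages, collapse = "\n")
}
```

For example, pdf_to_text(system.file("examples", "example1.pdf", package = "fulltext")) would read the same example file used in the Examples section.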

Usage

ft_extract(x)

Arguments

x

Path to a PDF file, or an object of class ft_data (the output of ft_get()).

Value

An object of class pdft_char when x is a character path, or an object of class ft_data when x is an ft_data object.
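The class-dependent return value can be illustrated with a small S3 generic. This is a hypothetical sketch: `extract_sketch()` and its methods are illustrative names, not fulltext functions, and they only mimic the documented return classes without reading any PDF.

```r
# Hypothetical generic mirroring the documented return types:
# character input -> "pdft_char", ft_data input -> "ft_data".
extract_sketch <- function(x) UseMethod("extract_sketch")

extract_sketch.character <- function(x) {
  # A real implementation would read the PDF here (e.g. with pdftools);
  # this sketch only records the path so it runs without any files.
  structure(list(path = x, data = NULL), class = "pdft_char")
}

extract_sketch.ft_data <- function(x) {
  # Extracted text would be attached to the existing object,
  # so the ft_data class is preserved.
  x
}

extract_sketch.default <- function(x) {
  stop("x must be a file path or an ft_data object", call. = FALSE)
}
```

Dispatching on the class of x lets a single function name accept both a file path and an ft_data object.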

Examples

## Not run: 
library(fulltext)

# extract text from a local PDF file
path <- system.file("examples", "example1.pdf", package = "fulltext")
(res <- ft_extract(path))

# use on the output of ft_get() to extract text from the retrieved PDFs
## arxiv
res <- ft_get('cond-mat/9309029', from = "arxiv")
res2 <- ft_extract(res)
# compare the data element before and after extraction
res$arxiv$data
res2$arxiv$data

## biorxiv
res <- ft_get('10.1101/012476')
res2 <- ft_extract(res)
res$biorxiv$data
res2$biorxiv$data

## End(Not run)

fulltext documentation built on June 12, 2021, 9:06 a.m.