pdf_extract: Parallelized Conversion of PDF to TXT: Extracts text from PDF...

In SeanFobbe/databuilder: Create New Data Sets (esp. Large Text Corpora)

View source: R/pdf_extract.R

pdf_extract

R Documentation

Parallelized Conversion of PDF to TXT

Extracts text from PDF files and writes the result to disk as TXT files. Parallel implementation with the future package. Resulting TXT files have the same filename as the original document (only the extension is modified).

Description

Note that you must declare your own future evaluation strategy before calling the function to enable parallelization; by default, the function is evaluated sequentially. On Windows, use future::plan(multisession, workers = n); on Linux/Mac, use future::plan(multicore, workers = n), where n is the number of CPU cores you wish to use. Because the function needs to read from disk, it may not work properly on high-performance clusters.
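As a minimal sketch of the setup described above, the following selects a backend by platform before any call to pdf_extract. The worker count of 2 is an arbitrary assumption; adjust it to your machine. Note that the future package may fall back to sequential evaluation for multicore in some environments (for example, inside RStudio).

```r
library(future)

# Assumed number of workers; not a recommendation from the package.
n <- 2

# Choose the backend the description recommends for each platform.
if (.Platform$OS.type == "windows") {
  plan(multisession, workers = n)
} else {
  plan(multicore, workers = n)
}

# ... call pdf_extract() here ...

# Return to the default sequential strategy once the work is done.
plan(sequential)
```

Resetting the plan at the end is optional, but it avoids leaving background workers configured for unrelated code later in the session.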

Usage

pdf_extract(x, outputdir = NULL, quiet = TRUE)

Arguments

`x`	A vector of PDF filenames.
`quiet`	Suppress messages.

Value

A set of TXT files on disk with the same basename as the original PDF files. Invisible return in R session.
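A hedged usage sketch follows. The directory names "pdf" and "txt" and the helper expected_txt() are hypothetical and not part of the package. The helper assumes the output extension is a lowercase ".txt" and that files land directly in outputdir; neither detail is confirmed by this page, and it is also not stated whether outputdir must already exist.

```r
# Hypothetical helper: the TXT paths we would expect for a set of PDFs,
# assuming the same basename with the extension replaced by ".txt".
expected_txt <- function(pdf_files, outputdir) {
  stems <- tools::file_path_sans_ext(basename(pdf_files))
  file.path(outputdir, paste0(stems, ".txt"))
}

# Collect PDF files from an assumed input directory named "pdf".
pdf_files <- list.files("pdf", pattern = "\\.pdf$",
                        full.names = TRUE, ignore.case = TRUE)

# Run the conversion only if the package is installed and there is input.
if (requireNamespace("databuilder", quietly = TRUE) && length(pdf_files) > 0) {
  # Create the assumed output directory up front, to be safe.
  dir.create("txt", showWarnings = FALSE)
  databuilder::pdf_extract(pdf_files, outputdir = "txt", quiet = FALSE)

  # Report any expected TXT files that are missing afterwards.
  missing <- expected_txt(pdf_files, "txt")
  missing <- missing[!file.exists(missing)]
  if (length(missing) > 0) message("Not found: ", toString(missing))
}
```

Because the function returns invisibly, checking the files on disk, as done here, is the practical way to confirm that the conversion produced output.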

SeanFobbe/databuilder documentation built on July 20, 2022, 4:50 a.m.
