PDE_analyzer: Extracting data from PDF (Portable Document Format) files


PDE_analyzer    R Documentation

Extracting data from PDF (Portable Document Format) files

Description

PDE_analyzer extracts sentences and tables from multiple PDF files.

Usage

PDE_analyzer(PDE_parameters_file_path = NA, verbose = TRUE)

Arguments

PDE_parameters_file_path

String. Path to the parameter file, which contains all parameters needed to run PDE_extr_data_from_pdfs on multiple PDF files. If PDE_parameters_file_path does not exist or is NA, a dialog box opens prompting the user to select the parameter file.

verbose

Logical. Indicates whether messages will be printed in the console. Default: TRUE.
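
Because a missing or NA path opens a file dialog, a script that runs unattended may want to check the path before calling the function. The following is a minimal sketch, assuming the PDE package and its bundled example parameter file are installed; the variable name tsv_path is illustrative only and not part of the package.

```r
# Locate the example parameter file shipped with the PDE package.
# system.file() returns "" when the package or the file is not found.
tsv_path <- system.file("examples", "tsvs",
                        "PDE_parameters_v1.4_all_files+-0.tsv",
                        package = "PDE")

# Call PDE_analyzer() only when the file exists, so that no dialog box
# is opened, and suppress console messages with verbose = FALSE.
if (nzchar(tsv_path) && file.exists(tsv_path) &&
    isTRUE(PDE::PDE_check_Xpdf_install())) {
  PDE::PDE_analyzer(PDE_parameters_file_path = tsv_path, verbose = FALSE)
}
```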

Value

If tables were extracted from the PDF file, the function returns a list of the following tables/items: 1) htmltablelines, 2) txttablelines, 3) keeplayouttxttablelines, 4) id, 5) out_msg. The tablelines items are tables that give the heading and position of the detected tables. The id gives the name of the PDF file. The out_msg includes all messages printed to the console, or the suppressed messages if verbose=FALSE.
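
One way to explore the returned value is to print a compact overview of its structure. This is a sketch under the assumption that the PDE package, its example files, and Xpdf are installed; res and tsv_path are illustrative names. How the items are nested when several PDF files are processed is not described here, so the example inspects the structure rather than assuming a layout.

```r
# Example parameter file bundled with the PDE package ("" if not found).
tsv_path <- system.file("examples", "tsvs",
                        "PDE_parameters_v1.4_all_files+-0.tsv",
                        package = "PDE")

if (nzchar(tsv_path) && isTRUE(PDE::PDE_check_Xpdf_install())) {
  # Messages are not printed; per the Value section they are still
  # available in out_msg.
  res <- PDE::PDE_analyzer(tsv_path, verbose = FALSE)

  # Show the top two levels of the result, e.g. to locate
  # htmltablelines, txttablelines, id and out_msg.
  str(res, max.level = 2)
}
```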

Details

The parameter file (also referred to as the .tsv file) can be filled in either manually or with the help of the PDE_analyzer_i interface.
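
For manual editing, a reasonable starting point is a copy of the example parameter file. The sketch below assumes the PDE package and its example files are installed; the target file name my_PDE_parameters.tsv is hypothetical, and base R functions are used for copying and previewing.

```r
# Example parameter file bundled with the PDE package ("" if not found).
tsv_path <- system.file("examples", "tsvs",
                        "PDE_parameters_v1.4_all_files+-0.tsv",
                        package = "PDE")

if (nzchar(tsv_path)) {
  # Copy the example to a scratch location (hypothetical file name)
  # so the original stays untouched while editing.
  my_copy <- file.path(tempdir(), "my_PDE_parameters.tsv")
  file.copy(tsv_path, my_copy, overwrite = TRUE)

  # Preview the first lines of the tab-separated file as plain text.
  print(readLines(my_copy, n = 5, warn = FALSE))
}
```

The edited copy can then be passed to PDE_analyzer through PDE_parameters_file_path.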

Note

A detailed description of the parameters in the TSV file can be found in the markdown file (README_PDE.md) and in the description of PDE_extr_data_from_pdfs.

See Also

PDE_extr_data_from_pdfs

Examples

 ## run only if the Xpdf command line tools are installed:
 if(PDE_check_Xpdf_install() == TRUE){
   PDE_analyzer(paste0(system.file(package = "PDE"),
                       "/examples/tsvs/PDE_parameters_v1.4_all_files+-0.tsv"))
 }

## Not run: 
 ## requires user file choice:
 PDE_analyzer()

## End(Not run)


erikstricker/PDE documentation built on Jan. 25, 2024, 2:10 p.m.