PDE_analyzer: Extracting data from PDF (Portable Document Format) files


View source: R/PDE.R

Description

PDE_analyzer extracts sentences and tables from multiple PDF files.

Usage

PDE_analyzer(PDE_parameters_file_path = NA, verbose = TRUE)

Arguments

PDE_parameters_file_path

String. Path to the parameter file, which contains all parameters needed to run PDE_extr_data_from_pdfs on multiple PDF files. If PDE_parameters_file_path is NA or the file does not exist, a dialog box opens prompting the user to select the parameter file.

verbose

Logical. Indicates whether messages will be printed in the console. Default: TRUE.

Value

If tables were extracted from the PDF file, the function returns a list of the following tables/items: 1) htmltablelines, 2) txttablelines, 3) keeplayouttxttablelines, 4) id, 5) out_msg. The tablelines items are tables that give the heading and position of each detected table. The id gives the name of the PDF file. The out_msg contains all messages printed to the console, or the suppressed messages if verbose=FALSE.
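As a rough illustration of inspecting the return value, the sketch below runs the example parameter file with verbose = FALSE and prints only the top-level structure of the result. It assumes the PDE package and Xpdf are installed. The variable name res is hypothetical, and the exact shape of the returned object (for example, whether items are grouped per PDF file) is not confirmed here, which is why str() is used rather than access by name.

```r
# Sketch only: assumes the PDE package and Xpdf are installed.
# `res` is a hypothetical variable name used for illustration.
res <- NULL

if (requireNamespace("PDE", quietly = TRUE) &&
    isTRUE(PDE::PDE_check_Xpdf_install())) {
  # Example parameter file referenced in the Examples section
  tsv <- paste0(PDE::PDE_path(),
                "examples/tsvs/PDE_parameters_v1.0_all_files+-0.tsv")

  # verbose = FALSE suppresses console output; per the Value section,
  # the suppressed messages are then returned in out_msg
  res <- PDE::PDE_analyzer(tsv, verbose = FALSE)

  # Show the top-level items without assuming their exact layout
  str(res, max.level = 1)
}
```

If the package or Xpdf is missing, the block does nothing and res stays NULL.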

Details

The parameter file (also referred to as the .tsv file) can be filled in either manually or with the help of the PDE_analyzer_i interface.

Note

A detailed description of the parameters in the TSV file can be found in the markdown file (README_PDE.md) and in the description of PDE_extr_data_from_pdfs.

See Also

PDE_extr_data_from_pdfs

Examples

## run the analysis with the example parameter file (requires Xpdf):
if (PDE_check_Xpdf_install() == TRUE) {
  PDE_analyzer(paste0(PDE_path(), "examples/tsvs/PDE_parameters_v1.0_all_files+-0.tsv"))
}

## Not run: 
 ## requires user file choice:
 PDE_analyzer()

## End(Not run)
 

erikstricker/PDE documentation built on Jan. 1, 2021, 1:08 a.m.