View source: R/inpdfr_PRO_extractTxt.R
| getPDF | R Documentation |
getPDF builds word-occurrence data.frames from PDF files.
It requires XPDF (http://www.foolabs.com/xpdf/download.html)
and uses the parallel package to run the computation in parallel.
getPDF(
myPDFs,
minword = 1,
maxword = 20,
minFreqWord = 1,
pathToPdftotext = ""
)
| Argument | Description |
|---|---|
| myPDFs | A character vector containing PDF file names. |
| minword | An integer specifying the minimum number of letters per word in the returned data.frame. |
| maxword | An integer specifying the maximum number of letters per word in the returned data.frame. |
| minFreqWord | An integer specifying the minimum word frequency in the returned data.frame. |
| pathToPdftotext | A character string giving an alternative path to the XPDF pdftotext program. |
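The three numeric arguments act as filters on the word counts. The sketch below shows one plausible reading of them using base R only. `filter_words()` is a hypothetical helper, not part of inpdfr, and it assumes that the length bounds are inclusive and that filtering happens after counting. The package's actual tokenization and filtering may differ.

```r
# Hypothetical helper (not part of inpdfr): illustrates how minword, maxword
# and minFreqWord could filter a word-occurrence table. Inclusive bounds are
# an assumption.
filter_words <- function(words, minword = 1, maxword = 20, minFreqWord = 1) {
  # Count occurrences of each distinct word
  counts <- table(words)
  freq <- data.frame(word = names(counts),
                     freq = as.integer(counts),
                     stringsAsFactors = FALSE)
  # Keep words whose number of letters lies within [minword, maxword]
  len <- nchar(freq$word)
  freq <- freq[len >= minword & len <= maxword, , drop = FALSE]
  # Keep words that occur at least minFreqWord times
  freq <- freq[freq$freq >= minFreqWord, , drop = FALSE]
  # Sort by decreasing frequency, breaking ties alphabetically
  freq <- freq[order(-freq$freq, freq$word), , drop = FALSE]
  rownames(freq) <- NULL
  freq
}

words <- c("pdf", "text", "a", "text", "mining", "pdf", "text")
filter_words(words, minword = 2, maxword = 5, minFreqWord = 2)
```

With these settings, "a" is dropped as too short and "mining" as too long. Only "text" (3 occurrences) and "pdf" (2 occurrences) remain.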
getPDF uses the XPDF pdftotext program to extract the
content of each PDF file into a TXT file. If pdftotext is not on the
PATH, provide the full path to the program through the
pathToPdftotext argument.
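The extraction step can be sketched in base R as follows. This is not inpdfr's own code. `pdf_to_txt()` is a hypothetical helper that assumes XPDF's usual command line, `pdftotext <PDF-file> <text-file>`. `Sys.which()` returns an empty string when a program is not on the PATH, which gives a quick way to check whether pathToPdftotext is needed.

```r
# Hypothetical sketch (not part of inpdfr): convert one PDF to a TXT file
# with XPDF's pdftotext and read the extracted lines back into R.
pdf_to_txt <- function(pdf,
                       txt = sub("\\.pdf$", ".txt", pdf, ignore.case = TRUE),
                       pdftotext = Sys.which("pdftotext")) {
  # Sys.which() gives "" when pdftotext is not on the PATH
  if (!nzchar(pdftotext) || !file.exists(pdftotext)) {
    stop("pdftotext not found; install XPDF or give its full path")
  }
  # Assumed invocation: pdftotext <PDF-file> <text-file>
  status <- system2(pdftotext, args = c(shQuote(pdf), shQuote(txt)))
  if (status != 0) stop("pdftotext failed with exit status ", status)
  readLines(txt, warn = FALSE)
}
```

For example, `pdf_to_txt("mypdf.pdf")` would write `mypdf.txt` next to the PDF and return its lines. Counting words in those lines is where arguments such as minword and minFreqWord would come into play.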
A list of lists holding the word-occurrence data.frame and the file name.
## Not run:
getPDF(myPDFs = "mypdf.pdf")
## End(Not run)