Description
We need the list of PNGs to be sorted by page number, but list.files
returns the names in alphabetical (lexicographic) order. For a document with
more than ten pages, it would return 'file-0.png file-1.png file-10.png file-2.png ...';
if the TXT files were concatenated in that order, the pages would come out of
sequence. This function returns the files in page order for OCR, and that same
order is then used to concatenate the OCR'd pages.
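To illustrate the difference, here is a minimal sketch that assumes the page number follows the last hyphen in each file name (as in doc1-10.png). The helper order_by_page is hypothetical and is not the package's own implementation. It extracts the numeric page and orders by that number instead of by the raw string.

```r
# Hypothetical helper (not part of the package): order file names by the
# page number that follows the last "-" before the extension.
order_by_page <- function(files) {
  # "doc1-10.png" -> "10"; names that do not match become NA (with a warning)
  # and are placed last by order()
  page <- as.integer(sub("^.*-([0-9]+)\\.[^.]+$", "\\1", basename(files)))
  files[order(page)]
}

files <- c("doc1-0.png", "doc1-1.png", "doc1-10.png", "doc1-2.png")
sort(files)          # alphabetical: "doc1-10.png" lands before "doc1-2.png"
order_by_page(files) # "doc1-0.png" "doc1-1.png" "doc1-2.png" "doc1-10.png"
```

Because basename() is used only for extracting the page number, full paths such as "OCR_tmp/doc1/doc1-10.png" are returned intact.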
Usage

get_sorted_files(path, ext)
Arguments

path: Path to the directory containing the files to sort (e.g., TXT files from Tesseract)
ext: The extension of the files to match (e.g., png)
Value

The vector of files with extension ext, sorted in page order
Examples

## Not run:
get_sorted_files("OCR_tmp/doc1/", "png")
# Returns a vector of files such as c("doc1-0.png", "doc1-1.png", "doc1-2.png",
# ..., "doc1-10.png") rather than c("doc1-0.png", "doc1-1.png", "doc1-10.png",
# "doc1-2.png", ...)
## End(Not run)
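Since the example above is not run, the following self-contained sketch reproduces the behaviour in a temporary directory. sorted_files_sketch is a hypothetical stand-in for get_sorted_files, assuming files are named <prefix>-<page>.<ext>; the real function may be implemented differently.

```r
# Hypothetical stand-in for get_sorted_files (not the package's code).
sorted_files_sketch <- function(path, ext) {
  # Keep only files ending in ".<ext>", e.g. ".png"
  files <- list.files(path, pattern = paste0("\\.", ext, "$"), full.names = TRUE)
  # Order by the page number after the last "-", not alphabetically
  page <- as.integer(sub("^.*-([0-9]+)\\.[^.]+$", "\\1", basename(files)))
  files[order(page)]
}

# Demo: twelve empty "page images" plus one file that should be ignored
tmp <- file.path(tempdir(), "doc1")
dir.create(tmp, showWarnings = FALSE)
invisible(file.create(file.path(tmp, c(sprintf("doc1-%d.png", 0:11), "notes.txt"))))

basename(sorted_files_sketch(tmp, "png"))
# "doc1-0.png" "doc1-1.png" ... "doc1-9.png" "doc1-10.png" "doc1-11.png"
```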