| Function | Description |
|---|---|
| asciimostly | An example 'tesseract' config file |
| cat_pages | Concatenate OCR'd pages to a single file |
| check_embed | Check whether a PDF's embedded text is not from OCR |
| check_pdf | Check if file is a PDF |
| convert_to_imgs | Convert a file (PDF) to per-page images (PNG) |
| get_sorted_files | Return a list of correctly sorted images for Tesseract OCR |
| load_text | Load text extracted from a PDF into a list |
| make_main_dirs | Create main directories expected by 'pdftext' |
| ocr_pages | Perform optical character recognition on PNGs |
| ocr_pdf | Perform optical character recognition on a PDF |
| pdftext | pdftext: A package to extract text from PDFs |
| pdf_to_txt | Extract text from a PDF and write it to a txt file |
| run_unpaper | Run 'unpaper' to fix rotation angles |
| save_imgs | Save the images directory from options()$pdftext.wkdir |
| save_pages | Save the pages directory from tempdir() |
| save_txts | Save the text directory from tempdir() |
| set_tess_conf | Set a custom 'tesseract' config for OCR |
| set_wkdir | Set the option for the working directory |
| test_embed | Test if a PDF has embedded text |
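
Page images usually have to be ordered numerically rather than alphabetically before OCR, since a plain string sort can place a name like `page-10.png` before `page-2.png`. The sketch below illustrates that idea only. It assumes each image name ends in a page number followed by an extension (e.g. `page-3.png`). `sort_page_imgs()` is a hypothetical helper and is not the package's `get_sorted_files()`.

```r
# Hypothetical helper (not pdftext's get_sorted_files()): order page images
# by the page number at the end of each file name, e.g. "page-10.png" -> 10.
sort_page_imgs <- function(files) {
  # Capture the digit run immediately before the final extension;
  # perl = TRUE enables the lazy ".*?" so the whole number is captured.
  nums <- as.integer(sub("^.*?(\\d+)\\.[^.]+$", "\\1", basename(files), perl = TRUE))
  # order() returns indices for ascending page numbers
  files[order(nums)]
}

sort_page_imgs(c("page-10.png", "page-2.png", "page-1.png"))
# returns c("page-1.png", "page-2.png", "page-10.png")
```

Because it uses `basename()`, the helper also works on full paths and returns them unchanged, only reordered.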