batch_get_gold | Extract text from a set of PDFs with embedded text. |
batch_get_ngrams | Get n-grams from one or more texts |
batch_simulate_degrade_set | Simulate degraded PDFs from a set of input PDFs |
char_ngrams | Return a df with counts of all characters in df |
check_embed | Check that embedded text is not from OCR |
create_dirs | Create directories for 'ocrerrs' |
degrade_blur | Degrade PDF quality by simulating blurred text |
degrade_complex | Degrade PDF quality by combining degradation parameters |
degrade_density | Degrade PDF quality by manipulating pixel density |
degrade_fax | Degrade PDF quality by simulating a fax |
degrade_pages | Wrap degrade functions of split PDF files |
degrade_rotate | Degrade PDF quality by simulating page rotation |
find_errors | Find errors from OCR by comparing to gold standard |
find_min_dists | Find the minimum string edit distance for each bad word |
get_bg_1grams | Get ngrams and counts for bad and gold strings |
get_delta_words | Get words with different frequencies between bad and gold... |
get_dist_mat | Return a matrix of optimal string alignment distances for... |
get_embed_pages | Return a vector of pages with embedded text |
get_file_base | Return the base name of a file |
get_gold | Extract text from a PDF with embedded text. |
get_ngrams | Get a set of n-grams from text |
get_POS | Return a table of parts of speech |
hello | Hello, World! |
hunspell_errors | Use hunspell to find errors |
label_delta_words | Label words as correct or errors |
make_gold_path | Create a path to which 'gold standard' results are written |
normalize_text | Clean EOL characters from text |
ocr_pages | Wrap optical character recognition around a set of files |
save_gold_text | Save the extracted text as a .rda |
simulate_degrade_set | Simulate degraded PDFs from an input PDF |
split_pdf | Split a PDF into multiple pages |
summarize_gold | Summarize the text from a gold-standard PDF |
tess_ocr | Perform optical character recognition with tesseract |
write_gold_text | Write the extracted text to file |