tesseract: Tesseract Engine

Description

Create an OCR engine for a given language and control parameters. This can be used by the ocr and ocr_data functions to recognize text.

Usage

tesseract(
  language = "eng",
  datapath = NULL,
  configs = NULL,
  options = NULL,
  cache = TRUE
)

tesseract_params(filter = "")

tesseract_info()

Arguments

language

string with language for training data. Usually defaults to eng

datapath

path with the training data for this language. Default uses the system library.

configs

character vector with files, each containing one or more parameter values. Each config file can be a file in the current directory or one of the standard tesseract config files that live in the tessdata directory. See details.

options

a named list with tesseract parameters. See details.

cache

logical; if TRUE, engines are cached to speed things up

filter

only list parameters containing a particular string
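
As a rough illustration of how these arguments fit together with ocr(), the sketch below creates an English engine and passes it on. The file name scan.png is a hypothetical input image, and the code assumes the tesseract package and its English training data are installed; it does nothing otherwise.

```r
image_path <- "scan.png"  # hypothetical input image, not shipped with the package

# Only run when the package is available and the image actually exists
if (requireNamespace("tesseract", quietly = TRUE) && file.exists(image_path)) {
  # Engine for English, using the default training data location
  engine <- tesseract::tesseract(language = "eng", cache = TRUE)

  # Hand the engine to ocr() to recognize the text in the image
  text <- tesseract::ocr(image_path, engine = engine)
  cat(text)
}
```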

Details

Tesseract control parameters can be set either via a named list in the options parameter, or in a plain-text config file that contains the parameter name followed by a space and then the value, one parameter per line. Use tesseract_params() to list or find parameters. Note that some parameters are only supported in certain versions of libtesseract, and that invalid parameters can sometimes cause libtesseract to crash.
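
The two ways of setting a parameter might look as follows. This is a minimal sketch, not taken from the package documentation: tessedit_char_whitelist is used only as an illustrative parameter, the file name digits_only.config is hypothetical, and passing a full path in configs is an assumption (the text above only mentions the current directory and the tessdata directory). Engine creation is wrapped in try() because it fails when the English training data is missing.

```r
# Config file format: parameter name, a space, then the value, one per line
config_file <- file.path(tempdir(), "digits_only.config")  # hypothetical file name
writeLines("tessedit_char_whitelist 0123456789", config_file)

# The same setting expressed as a named list for the options argument
digit_options <- list(tessedit_char_whitelist = "0123456789")

if (requireNamespace("tesseract", quietly = TRUE)) {
  # Variant 1: parameters passed directly as a named list
  engine_from_list <- try(tesseract::tesseract(options = digit_options), silent = TRUE)

  # Variant 2: parameters read from the config file
  # (assumption: a full path to the file is accepted here)
  engine_from_file <- try(tesseract::tesseract(configs = config_file), silent = TRUE)
}
```

Checking the parameter name with tesseract_params() first is a sensible precaution, given that invalid parameters can crash libtesseract.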

References

tesseract wiki: control parameters

See Also

Other tesseract: ocr(), tesseract_download()

Examples

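The example call itself is not preserved on this page. The output below lists only parameters whose names contain "debug", so it was presumably a filtered call to tesseract_params(). A minimal sketch under that assumption:

```r
# Assumption: the tesseract package is installed; skipped quietly if not
if (requireNamespace("tesseract", quietly = TRUE)) {
  # List only the parameters whose name contains "debug"
  debug_params <- tesseract::tesseract_params("debug")
  print(debug_params)
}
```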

Example output

Error opening data file /usr/share/tesseract-ocr/tessdata/eng.traineddata
Please make sure the TESSDATA_PREFIX environment variable is set to the parent directory of your "tessdata" directory.
Failed loading language 'eng'
Tesseract couldn't load any languages!
sh: 1: cannot create /dev/null: Permission denied
Warning messages:
1: Unable to find English training data 
2: In if (grepl("ubuntu|debian", os, TRUE)) { :
  the condition has length > 1 and only the first element will be used
                                 param default desc
15               textord_debug_tabfind       0   NA
16                  textord_debug_bugs       0   NA
22         devanagari_split_debuglevel       0   NA
46                 textord_debug_block       0   NA
54                textord_debug_images       0   NA
55             textord_debug_printable       0   NA
63         devanagari_split_debugimage       0   NA
66                         edges_debug       0   NA
68                        gapmap_debug       0   NA
84              textord_debug_xheights       0   NA
88                  textord_debug_blob       0   NA
90                 textord_oldbl_debug       0   NA
91             textord_debug_baselines       0   NA
110           textord_debug_pitch_test       0   NA
113         textord_debug_pitch_metric       0   NA
133                         poly_debug       0   NA
139           editor_debug_config_file           NA
140                       fx_debugfile FXDebug   NA
143                         debug_file           NA
227                 ambigs_debug_level       0   NA
229               classify_debug_level       0   NA
231                matcher_debug_level       0   NA
232                matcher_debug_flags       0   NA
233      classify_learning_debug_level       0   NA
244                   dawg_debug_level       0   NA
245                 hyphen_debug_level       0   NA
248                stopper_debug_level       0   NA
250                    fragments_debug       0   NA
253                         chop_debug       0   NA
262               segment_adjust_debug       0   NA
263                wordrec_debug_level       0   NA
265              segsearch_debug_level       0   NA
268         language_model_debug_level       0   NA
278                         bidi_debug       0   NA
279                     applybox_debug       1   NA
281              tessedit_bigram_debug       0   NA
282                   debug_x_ht_level       0   NA
286              paragraph_debug_level       0   NA
287                   cube_debug_level       0   NA
294                       crunch_debug       0   NA
297              debug_fix_space_level       0   NA
300                  superscript_debug       0   NA
308       tessdata_manager_debug_level       0   NA
311                      segment_debug       0   NA
313                   tosp_debug_level       0   NA
320             textord_baseline_debug       0   NA
333  classify_enable_adaptive_debugger       0   NA
336 classify_debug_character_fragments       0   NA
337     matcher_debug_separate_windows       0   NA
360               wordrec_debug_blamer       0   NA
373            tessedit_adaption_debug       0   NA
379              tessedit_timing_debug       0   NA
385               tessedit_debug_fonts       0   NA
386     tessedit_debug_block_rejection       0   NA
388               debug_acceptable_wds       0   NA
403       tessedit_debug_doc_rejection       0   NA
404     tessedit_debug_quality_metrics       0   NA
427           tessedit_rejection_debug       0   NA
448                      permute_debug       0   NA
484                textord_noise_debug       0   NA
486           classify_learn_debug_str           NA
490                      word_to_debug           NA
491              word_to_debug_lengths           NA

tesseract documentation built on Jan. 10, 2022, 5:07 p.m.