scanner_functions: Scanner functions

scanner_functions    R Documentation

Scanner functions

Description

cleanup_bw(), scan_with_hocr() and extract_table() can be used to clean up an image, scan it (OCR) and extract a table from the result into a data.frame. The workflow is then:

  • read an image from file with magick::image_read()
    e.g. img1 = magick::image_read('example1.png')

  • define a list with the cleanup options
    e.g. cln_options1 = list(resize = "4000x", trim = 10, enhance = TRUE, sharpen = 1)

  • apply cleanup_bw() to the image with this list
    e.g. img2 = cleanup_bw(img1, cln_options1)

  • scan (OCR) the cleaned image with scan_with_hocr()
    e.g. df1 = scan_with_hocr(img2, add_header_cols = FALSE)

  • indicate in the columns of df1 which fields belong to the table headers (or, alternatively, define a headers list)

  • extract the table with extract_table()
    e.g. df2 = extract_table(df1, headers = NULL, lastline = Inf, desc_above = TRUE) when the headers were indicated in df1, or
    df2 = extract_table(df1, headers = headers, lastline = Inf, desc_above = TRUE) when a headers list was defined
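The steps above can be collected in two small helper functions. This is a minimal sketch, assuming that the magick and HOQCutil packages are installed and that cleanup_bw(), scan_with_hocr() and extract_table() are exported from HOQCutil. The names scan_image_file() and extract_scanned_table() are hypothetical and not part of the package. The workflow is split in two because indicating the header fields in df1 is a manual step between scanning and extraction.

```r
# Hypothetical helper (not part of HOQCutil): reads an image file, cleans it
# up and scans it, returning the OCR data.frame (df1 in the steps above).
# Assumes magick and HOQCutil are installed; package functions are only
# looked up when the helper is called, not when it is defined.
scan_image_file <- function(path,
                            cln_options = list(resize = "4000x", trim = 10,
                                               enhance = TRUE, sharpen = 1)) {
  # read the image from file
  img1 <- magick::image_read(path)
  # clean up the image with the given list of options
  img2 <- HOQCutil::cleanup_bw(img1, cln_options)
  # scan (OCR) the cleaned image
  HOQCutil::scan_with_hocr(img2, add_header_cols = FALSE)
}

# Hypothetical helper: extracts the table once the headers are known, either
# indicated in the columns of df1 (headers = NULL) or given as a headers list.
extract_scanned_table <- function(df1, headers = NULL) {
  HOQCutil::extract_table(df1, headers = headers,
                          lastline = Inf, desc_above = TRUE)
}

# Usage (requires an image file such as 'example1.png'):
# df1 <- scan_image_file("example1.png")
# ... indicate the header fields in df1, or define a headers list ...
# df2 <- extract_scanned_table(df1)
```

Keeping the cleanup options as a default argument makes it easy to try other settings per image without editing the helper.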


HanOostdijk/HOQCutil documentation built on July 28, 2023, 5:56 p.m.