OCR text and handwritten forms using Captricity. Captricity's big advantage over Abbyy Cloud OCR is that it lets you easily specify the position of the text blocks that you want to OCR through a simple web-based UI. The quality of the OCR can be checked using compare_txt from recognize.
To get the latest version on CRAN:
install.packages("captr")
To get the current development version from GitHub:
install.packages("devtools")
devtools::install_github("soodoku/captr", build_vignettes = TRUE)
Read the vignette:
vignette("using_captr", package = "captr")
or follow the overview below.
Start by getting an application token and setting it using:
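A minimal sketch of this step, assuming `set_token()` accepts the token string as its first argument and that you keep the token in an environment variable rather than in your script. The variable name `CAPTRICITY_APP_TOKEN` and the helper `configure_captr_token()` are hypothetical, chosen only for illustration.

```r
# Hypothetical helper: read the application token from an environment
# variable (the name CAPTRICITY_APP_TOKEN is an assumption) and hand it
# to captr. Fails early if the variable is not set.
configure_captr_token <- function(env_var = "CAPTRICITY_APP_TOKEN") {
  token <- Sys.getenv(env_var, unset = "")
  if (!nzchar(token)) {
    stop("No token found in environment variable ", env_var, call. = FALSE)
  }
  # Assumes set_token() takes the token as its first argument.
  captr::set_token(token)
  invisible(TRUE)
}
```

Calling `configure_captr_token()` once at the top of a session keeps the token out of version control.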
Then, create a batch using:
Captricity requires a template, which tells it what data to pull from where. Templates can be created using the Web UI. Once you have created a batch, you need to get the template ID.
Next, assign the template ID to a batch:
Next, upload image(s) to the batch:
Next, check whether the batch is ready to be processed:
You may also want to find out how much processing the batch will cost:
Once you are ready, submit the batch:
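The batch steps above can be sketched as one function. This is a sketch under stated assumptions, not the package's documented example: apart from `submit_batch`, the function names (`create_batch`, `set_batch_template`, `upload_image`, `test_readiness`, `batch_price`), their argument names, and the `id` field on the returned batch are assumptions to check against `help(package = "captr")`. The wrapper `prepare_batch()` and its `submit` flag are hypothetical.

```r
# Hypothetical wrapper around the batch workflow. All captr function and
# argument names below, except submit_batch, are assumptions; verify them
# with help(package = "captr") before use.
prepare_batch <- function(batch_name, template_id, image_paths, submit = FALSE) {
  # Check the local files before touching the API.
  missing_files <- image_paths[!file.exists(image_paths)]
  if (length(missing_files) > 0) {
    stop("Image file(s) not found: ",
         paste(missing_files, collapse = ", "), call. = FALSE)
  }

  batch <- captr::create_batch(batch_name)

  # Assumed name for the "assign the template ID to a batch" step;
  # batch$id is assumed to hold the batch id.
  captr::set_batch_template(batch_id = batch$id, template_id = template_id)

  for (path in image_paths) {
    captr::upload_image(batch_id = batch$id, path_to_image = path)
  }

  readiness <- captr::test_readiness(batch_id = batch$id)
  price <- captr::batch_price(batch_id = batch$id)

  # Submit only when explicitly asked, so readiness and price can be
  # inspected first.
  submitted <- if (submit) captr::submit_batch(batch_id = batch$id) else NULL

  list(batch = batch, readiness = readiness, price = price, submitted = submitted)
}
```

Leaving `submit = FALSE` by default lets you look at the readiness report and the price before committing to a paid job.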
Captricity excels in nomenclature confusion. So once a batch is submitted, it is then called a job. The id for the job can be obtained from the list that is returned from submit_batch. The field name is
To track progress of a job, use:
List all forms (instance sets) associated with a job:
If you want to download data from a particular form, use list_instance_sets to get the form (instance_set) id and run:
Get csv of all your results from a job:
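The job-side steps above can be sketched together as follows. Only `list_instance_sets` is named in the text; `track_progress` and `get_all`, their `job_id` argument, and the assumption that `get_all()` has a default output location are not confirmed here and should be checked against the package help. The helper `fetch_job_results()` is hypothetical.

```r
# Hypothetical helper: check progress, list the forms, and download the
# CSV of results for one job. track_progress() and get_all() are assumed
# names; verify them with help(package = "captr").
fetch_job_results <- function(job_id) {
  job_id <- as.character(job_id)
  if (length(job_id) != 1 || is.na(job_id) || !nzchar(job_id)) {
    stop("job_id must be a single non-empty value", call. = FALSE)
  }

  progress <- captr::track_progress(job_id = job_id)
  forms <- captr::list_instance_sets(job_id = job_id)

  # Relies on get_all()'s default output location (an assumption).
  captr::get_all(job_id = job_id)

  list(progress = progress, forms = forms)
}
```

The returned `forms` element is where you would look up an instance_set id if you only want the data for a single form.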
Scripts are released under the MIT License.
The project welcomes contributions from everyone! In fact, it depends on it. To maintain this welcoming atmosphere, and to collaborate in a fun and productive way, we expect contributors to the project to abide by the Contributor Code of Conduct.