OCR text and handwritten forms using Captricity. Captricity's big advantage over ABBYY Cloud OCR is that it lets the user easily specify the position of the text blocks they want to OCR, using a simple web-based UI. The quality of the OCR can be checked using compare_txt from the recognize package.
To get the latest version on CRAN:
install.packages("captr")
To get the current development version from GitHub:
install.packages("devtools")
devtools::install_github("soodoku/captr", build_vignettes = TRUE)
Read the vignette:
vignette("using_captr", package = "captr")
or follow the overview below.
Start by getting an application token and setting it using:
set_token("token")
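Hard-coding the token in a script makes it easy to leak, for example by committing it to a repository. A minimal sketch, assuming you keep the token in an environment variable. The variable name CAPTRICITY_TOKEN and the helper captr_token_from_env() are hypothetical and not part of captr:

```r
# Hypothetical helper (not part of captr): read the Captricity token from an
# environment variable so it never appears in the script itself.
captr_token_from_env <- function(var = "CAPTRICITY_TOKEN") {
  token <- Sys.getenv(var)  # returns "" when the variable is unset
  if (!nzchar(token)) {
    stop("Environment variable ", var, " is empty or unset", call. = FALSE)
  }
  token
}

# Usage, e.g. after adding CAPTRICITY_TOKEN=... to ~/.Renviron:
# captr::set_token(captr_token_from_env())
```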
Then, create a batch using:
create_batch("batch_name")
Once you have created a batch, you need a template ID. Captricity requires a template, which tells it what data to pull from where on each form; templates can be created using the web UI. Set the template ID using:
set_template_id("id")
Next, assign the template ID to a batch:
set_batch_template("batch_id", "template_id")
Next, upload one or more images to the batch:
upload_image(batch_id="batch_id", path_to_image="image_path")
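Since upload_image() takes one path at a time, uploading a folder of scans means calling it once per file. A minimal sketch; list_images() and upload_images() are hypothetical helpers, and the file-extension pattern is an assumption you should adjust to the formats you actually upload:

```r
# Hypothetical helpers (not part of captr): upload every image in a folder
# to one batch by calling upload_image() once per file.
list_images <- function(dir, pattern = "\\.(png|jpe?g|tiff?)$") {
  # full.names = TRUE gives paths that can be passed straight to upload_image()
  sort(list.files(dir, pattern = pattern, full.names = TRUE, ignore.case = TRUE))
}

upload_images <- function(batch_id, dir, pattern = "\\.(png|jpe?g|tiff?)$") {
  paths <- list_images(dir, pattern)
  if (length(paths) == 0) {
    stop("No matching images found in ", dir, call. = FALSE)
  }
  # One upload_image() call per file; the individual results come back as a list
  lapply(paths, function(p) {
    captr::upload_image(batch_id = batch_id, path_to_image = p)
  })
}

# Usage:
# upload_images(batch_id = "batch_id", dir = "scans")
```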
Next, check whether the batch is ready to be processed:
test_readiness(batch_id="batch_id")
You may also want to find out how much processing the batch will cost:
batch_price(batch_id="batch_id")
Once you are ready, submit the batch:
submit_batch(batch_id="batch_id")
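Because submitting a batch costs money, it can help to check readiness and price right before submitting. A minimal sketch of a hypothetical wrapper, submit_with_confirmation() (not part of captr), that prints whatever test_readiness() and batch_price() return and then asks before calling submit_batch(); in non-interactive sessions it refuses unless yes = TRUE is passed:

```r
# Hypothetical wrapper (not part of captr): show readiness and price, then
# submit only after an explicit confirmation.
submit_with_confirmation <- function(batch_id, yes = FALSE) {
  print(captr::test_readiness(batch_id = batch_id))
  print(captr::batch_price(batch_id = batch_id))
  if (!yes) {
    if (!interactive()) {
      stop("Not interactive; pass yes = TRUE to submit without a prompt",
           call. = FALSE)
    }
    answer <- readline("Submit this batch? [y/N] ")
    if (!tolower(trimws(answer)) %in% c("y", "yes")) {
      message("Batch not submitted.")
      return(invisible(NULL))
    }
  }
  captr::submit_batch(batch_id = batch_id)
}
```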
Captricity excels in nomenclature confusion: once a batch is submitted, it is called a job. The job ID can be obtained from the list returned by submit_batch; the field name is related_job_id.
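Pulling the job ID out of the submission result can be wrapped so that a missing field fails loudly instead of silently yielding NULL. A minimal sketch; job_id_from_submission() is a hypothetical helper, and the only thing it relies on is the related_job_id field described above:

```r
# Hypothetical helper (not part of captr): pull the job ID out of the list
# returned by submit_batch(), failing loudly if the field is missing.
job_id_from_submission <- function(submission) {
  job_id <- submission[["related_job_id"]]
  if (is.null(job_id)) {
    stop("No related_job_id field in the submit_batch() result", call. = FALSE)
  }
  job_id
}

# Usage:
# submission <- submit_batch(batch_id = "batch_id")
# job_id <- job_id_from_submission(submission)
```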
To track progress of a job, use:
track_progress(job_id="job_id")
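Jobs take time to finish, so a script usually polls track_progress() until the job is done. A minimal sketch of a hypothetical polling helper, wait_for_job(). The structure of track_progress()'s result is not described here, so the caller supplies an is_done() predicate written against what it actually returns; progress_fun exists only so the loop can be exercised without the API:

```r
# Hypothetical helper (not part of captr): poll track_progress() until a
# caller-supplied predicate says the job is finished, or until a timeout.
wait_for_job <- function(job_id, is_done, interval = 30, timeout = 3600,
                         progress_fun = captr::track_progress) {
  deadline <- Sys.time() + timeout  # timeout is in seconds
  repeat {
    progress <- progress_fun(job_id = job_id)
    if (isTRUE(is_done(progress))) return(progress)
    if (Sys.time() >= deadline) {
      stop("Timed out waiting for job ", job_id, call. = FALSE)
    }
    Sys.sleep(interval)  # wait before asking again
  }
}
```

Inspect one result with str(track_progress(job_id = "job_id")) first, then write is_done() to test whatever field signals completion.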
List all forms (instance sets) associated with a job:
list_instance_sets(job_id="job_id")
If you want to download data from a particular form, use list_instance_sets to get the form (instance set) ID and run:
get_instance_set(instance_set_id="instance_set_id")
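To fetch every form in a job individually, the two calls above can be combined in a loop. A minimal sketch of a hypothetical helper, get_all_instance_sets(). It assumes list_instance_sets() returns an object whose id element holds the instance set IDs (for example a data frame with an id column); check with str() and change id_field if the field is named differently:

```r
# Hypothetical helper (not part of captr): download every form in a job by
# calling get_instance_set() on each ID from list_instance_sets().
# ASSUMPTION: the IDs live in the `id_field` element of list_instance_sets()'s
# result; verify with str(list_instance_sets(job_id = ...)).
get_all_instance_sets <- function(job_id, id_field = "id") {
  sets <- captr::list_instance_sets(job_id = job_id)
  ids <- sets[[id_field]]
  if (is.null(ids)) {
    stop("No '", id_field, "' field in list_instance_sets() output",
         call. = FALSE)
  }
  results <- lapply(ids, function(id) {
    captr::get_instance_set(instance_set_id = id)
  })
  names(results) <- as.character(ids)  # index results by instance set ID
  results
}
```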
Get a CSV of all the results from a job:
get_all(job_id="job_id")
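To keep the results, write them to disk. A minimal sketch, assuming get_all() returns a data frame or something coercible to one; save_results() is a hypothetical helper built on base R's write.csv():

```r
# Hypothetical helper (not part of captr): write job results to a CSV file.
# ASSUMPTION: the results are a data frame or can be coerced to one.
save_results <- function(results, path) {
  utils::write.csv(as.data.frame(results), path, row.names = FALSE)
  invisible(path)
}

# Usage:
# save_results(get_all(job_id = "job_id"), "job_results.csv")
```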
Scripts are released under the MIT License.
The project welcomes contributions from everyone! In fact, it depends on it. To maintain this welcoming atmosphere, and to collaborate in a fun and productive way, we expect contributors to the project to abide by the Contributor Code of Conduct.