- Cloud OCR SDK Documentation
The method allows you to extract the value of a text field on an image. The method loads the image, creates a processing task for the image with the specified parameters, and passes the task for processing.
The image file is transmitted in the request body. See the list of supported input formats.
The fieldLevelRecognition profile is used for processing.
The result of recognition is returned in XML format.
See How to Recognize Text Fields to learn how to tune the parameters.
For details on task cost, see the billing terms.
|Parameter||Is required||Default value||Description|
|region||No||"-1,-1,-1,-1"||Specifies the region of the text field on the image. The coordinates of the region are measured in pixels relative to the left top corner of the image and are specified in the following order: left, top, right, bottom. By default, the region of the whole image is used.|
|language||No||"English"||Specifies the recognition language of the document. This parameter can contain several language names separated with commas, for example "English,French,German". See the list of available recognition languages. Note that not all languages are available for handprint recognition. The languages that are available for handprint recognition are marked with a special comment.|
|letterSet||No||""||Specifies the letter set to use during recognition. Contains a string with the letter set characters, for example "ABCDabcd'-.". By default, the letter set of the language specified in the language parameter is used.|
Specifies the regular expression which defines which words are allowed in the field and which are not. See the description of regular expressions. By default, the set of allowed words is defined by the dictionary of the language, specified in the language parameter.
Note that regular expressions do not strictly limit the set of characters in the output, i.e. the recognized value may contain characters that are not included in the regular expression. During recognition, every hypothesis for a word is checked against the specified regular expression. A recognition variant that conforms to the expression has a higher probability of being selected as the final output. If no variant matches the regular expression, however, the result will not conform to it. To limit the set of characters that can be recognized, use the letterSet parameter.
|textType||No||"normal"||Specifies the type of the text in the field. This parameter may also contain several text types separated with commas, for example "normal,matrix". The following values can be used:
|oneTextLine||No||"false"||Specifies whether the field contains only one text line. The value should be true, if there is one text line in the field; otherwise it should be false.|
|oneWordPerTextLine||No||"false"||Specifies whether the field contains only one word in each text line. The value should be true if no text line contains more than one word (each line of text will then be recognized as a single word); otherwise it should be false.|
|markingType||No||"simpleText"||This property is valid only for the handprint recognition. Specifies the type of marking around letters (for example, underline, frame, box, etc.). By default, there is no marking around letters. The value can be one of the following:
|placeholdersCount||No||"1"||Specifies the number of character cells for the field. This property is meaningful only for the field marking types (the markingType parameter) that imply splitting the text into cells. The default value is 1, but you should set the appropriate value for the text to be recognized correctly.|
|writingStyle||No||"default"||Provides additional information about handprinted letters writing style. It can be one of the following:
|description||No||""||Contains the description of the processing task. Must contain no more than 255 characters.|
|pdfPassword||No||""||Contains a password for accessing password-protected images in PDF format.|
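The parameter table above can be turned into a request URL with a small helper. The following is a minimal sketch, not the SDK's own client code. It assumes that the parameters are passed in the query string (this section only states that the image travels in the request body) and that the caller supplies the method URL, since the endpoint address is not given here. The function names `format_region`, `build_query` and `build_url` are hypothetical.

```python
from urllib.parse import urlencode


def format_region(left, top, right, bottom):
    """Format a region as "left,top,right,bottom" (pixels from the top-left corner)."""
    return f"{left},{top},{right},{bottom}"


def build_query(region=None, languages=None, letter_set=None,
                one_text_line=None, description=None):
    """Build a query string; parameters left as None fall back to server defaults."""
    params = {}
    if region is not None:
        params["region"] = format_region(*region)
    if languages:
        # Several languages are joined with commas, e.g. "English,French".
        params["language"] = ",".join(languages)
    if letter_set:
        params["letterSet"] = letter_set
    if one_text_line is not None:
        params["oneTextLine"] = "true" if one_text_line else "false"
    if description is not None:
        # The table limits the description to 255 characters.
        if len(description) > 255:
            raise ValueError("description must contain no more than 255 characters")
        params["description"] = description
    return urlencode(params)


def build_url(base_url, **kwargs):
    """Append the query string to base_url (assumption: supplied by the caller)."""
    query = build_query(**kwargs)
    return f"{base_url}?{query}" if query else base_url
```

Omitting a parameter entirely, rather than sending its default value, keeps the request short and leaves the defaults under the server's control.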
Status codes and response format
General status codes and response format of the method are described in HTTP Status Codes and Response Formats.
The following status codes can be returned when this method is called:
|200||Successful method call.|
|450||Incorrect parameters have been passed. One of the following errors occurred:
|550||An internal program error occurred while processing the image.|
|551||An error occurred on the server side:
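A client can map the status codes listed above to readable messages before deciding how to proceed. This is a minimal sketch using only the four codes and descriptions from the table. The names `STATUS_MEANINGS`, `describe_status` and `is_success` are hypothetical, and the sketch does not cover the detailed error causes for codes 450 and 551.

```python
# Status codes and descriptions taken from the table above.
STATUS_MEANINGS = {
    200: "Successful method call.",
    450: "Incorrect parameters have been passed.",
    550: "An internal program error occurred while processing the image.",
    551: "An error occurred on the server side.",
}


def describe_status(code):
    """Return the documented meaning of a status code, or a generic note if unlisted."""
    return STATUS_MEANINGS.get(code, f"Unlisted status code: {code}")


def is_success(code):
    """Only 200 is documented as a successful call."""
    return code == 200
```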
Output file format
The output XML file has the following format:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<document xmlns="@link" xmlns:xsi="@link" xsi:schemaLocation="@link" version="1.0">
  <field left="0" top="0" right="199" bottom="100" type="text">
    <value encoding="UTF-16">Data Capture Sample Text Data</value>
    <line left="0" top="0" right="199" bottom="100">
      <char left="0" top="0" right="199" bottom="100" confidence="98">
      </char>
      …
    </line>
    …
  </field>
</document>
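The output format above can be read with Python's standard `xml.etree.ElementTree` module. The following is a minimal sketch, assuming the response follows the structure shown (a `field` element containing `value`, `line` and `char` elements). Because the namespace URI is not given in this section, the code matches elements by local name only. The function name `parse_field_result` is hypothetical.

```python
import xml.etree.ElementTree as ET


def _local(tag):
    """Strip a "{namespace}" prefix from an element tag, if present."""
    return tag.rsplit("}", 1)[-1]


def parse_field_result(xml_text):
    """Extract the field value, its region, and per-character confidences."""
    root = ET.fromstring(xml_text)
    field = next((el for el in root.iter() if _local(el.tag) == "field"), None)
    if field is None:
        raise ValueError("no <field> element in the response")
    value = next((el for el in field if _local(el.tag) == "value"), None)
    chars = [el for el in field.iter() if _local(el.tag) == "char"]
    return {
        "value": value.text if value is not None else None,
        "region": tuple(int(field.get(k)) for k in ("left", "top", "right", "bottom")),
        "confidences": [int(c.get("confidence")) for c in chars
                        if c.get("confidence") is not None],
    }
```

A caller might, for example, flag a field for manual review when `min(result["confidences"])` falls below a threshold of its own choosing. The threshold is application-specific and is not defined by this section.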