
Cloud OCR SDK Documentation

XML Parameters of Field Recognition

The processing parameters of the processFields method are specified in an XML file, which is sent in the request body. You can use the XSD schema of the XML file to create a file with the necessary settings. The main XML tags are described below:

Name / Description

document

The root tag. Contains the fieldTemplates and page elements. The fieldTemplates element specifies a set of field templates that can be reused on several pages. The page element specifies the settings of individual field instances.

fieldTemplates

Contains a set of field templates of the text, barcode, and checkmark types. For each type of field template you can specify its own parameters (see the descriptions of the text (textFieldType), barcode (barcodeFieldType), and checkmark (checkmarkFieldType) tags). You can create several templates of the same type.

Templates are useful if the document contains several fields that can be recognized with the same parameters. To reuse a field template in the document, specify its id attribute. This id can then be given in the template attribute of a field instance.
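
As a rough sketch of this reuse, the fragment below defines one text template and refers to it from two field instances. The ids (dateTemplate, issueDate, dueDate), the coordinates, and the page index are made up for illustration; only the element and attribute names come from this page.

```xml
<document xmlns="http://ocrsdk.com/schema/taskDescription-1.0.xsd">
  <fieldTemplates>
    <!-- Hypothetical template: settings shared by both fields below -->
    <text id="dateTemplate">
      <language>English</language>
      <oneTextLine>true</oneTextLine>
    </text>
  </fieldTemplates>
  <page applyTo="0">
    <!-- Each instance names the template by its id in the template attribute -->
    <text template="dateTemplate" id="issueDate" left="10" top="10" right="200" bottom="40" />
    <text template="dateTemplate" id="dueDate" left="10" top="60" right="200" bottom="90" />
  </page>
</document>
```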

page
(pageType)

Contains the set of field instances that should be found on a page. The document tag can contain several page tags. The page tag has one required attribute, applyTo. Only the pages listed in this attribute are processed with the settings given in this page tag.

A page can contain several field instances of the text (textFieldInstanceType), barcode (barcodeFieldInstanceType), and checkmark (checkmarkFieldInstanceType) types.
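
For illustration, a document with two page tags might look like the sketch below. The applyTo values follow the comma-separated form used in the samples on this page; the field ids and coordinates are invented for the example, and the empty fieldTemplates element is kept as in the simple sample.

```xml
<document xmlns="http://ocrsdk.com/schema/taskDescription-1.0.xsd">
  <fieldTemplates></fieldTemplates>
  <!-- Settings applied only to page 0 -->
  <page applyTo="0">
    <text id="title" left="10" top="10" right="300" bottom="40">
      <language>English</language>
    </text>
  </page>
  <!-- Different settings applied to pages 1 and 2 -->
  <page applyTo="1,2">
    <barcode id="pageCode" left="10" top="10" right="200" bottom="60">
      <type>code39</type>
    </barcode>
  </page>
</document>
```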

text
(textFieldType or textFieldInstanceType)

Text field. A field of type textFieldType contains the template of a text field, while a field of type textFieldInstanceType contains the settings of an individual text field. Both types can include the following elements:

  • language: a string with the recognition languages of the field. It can contain several language names separated by commas. See the list of available recognition languages.
  • letterSet: a string with the characters of the letter set.
  • regExp: a string with the regular expression that defines which words are allowed in the field and which are not. See the description of regular expressions.
  • textType: the type of the text in the field. See the description of available text types.
  • oneTextLine: a boolean value that specifies whether the field contains only one text line.
  • oneWordPerLine: a boolean value that specifies whether the field contains only one word in each text line.
  • markingType: valid only for handprint recognition. Specifies the type of marking around letters.
  • placeholdersCount: the number of character cells in the field. This property makes sense only for field marking types that imply splitting the text into cells.
  • writingStyle: a string that provides additional information about the writing style of handprinted letters.

You can find additional information on the parameters in How to Recognize Text Fields.

Both the field template and the field instance have id, left, top, right, and bottom attributes. The id attribute specifies the id of the field template or field instance; the other attributes specify coordinates.

The field instance has an additional template attribute, which specifies the id of the field template that should be used for this field. The parameters of the field template can be overridden by the parameters of the field instance.
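
The sketch below combines several of these elements in a template and overrides one of them in a field instance. The letter set, ids, and coordinates are illustrative assumptions, and the handprint-specific elements (markingType, placeholdersCount, writingStyle) are left out because their allowed values are not listed on this page.

```xml
<document xmlns="http://ocrsdk.com/schema/taskDescription-1.0.xsd">
  <fieldTemplates>
    <text id="digitsTemplate">
      <language>English</language>
      <!-- Assumed letter set: restricts recognition to digits -->
      <letterSet>0123456789</letterSet>
      <oneTextLine>true</oneTextLine>
    </text>
  </fieldTemplates>
  <page applyTo="0">
    <!-- Uses the template settings unchanged -->
    <text template="digitsTemplate" id="zipCode" left="10" top="10" right="120" bottom="40" />
    <!-- Overrides oneTextLine for a multi-line field; the remaining
         settings are expected to come from the template -->
    <text template="digitsTemplate" id="phoneList" left="10" top="60" right="200" bottom="200">
      <oneTextLine>false</oneTextLine>
      <oneWordPerLine>true</oneWordPerLine>
    </text>
  </page>
</document>
```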

barcode
(barcodeFieldType or barcodeFieldInstanceType)

Barcode field. A field of type barcodeFieldType contains the template of a barcode field, while a field of type barcodeFieldInstanceType contains the settings of an individual barcode field. Both types can include the following elements:

  • type: specifies the type of barcode. See the description of barcode types for details.
  • containsBinaryData: a boolean value that makes sense only for PDF417 and Aztec barcodes that encode binary data. If this parameter is set to true, the binary data encoded in the barcode is saved as a sequence of hexadecimal values of the corresponding bytes.

You can find additional information on the parameters in How to Recognize Barcodes.

Both the field template and the field instance have id, left, top, right, and bottom attributes. The id attribute specifies the id of the field template or field instance; the other attributes specify coordinates.

The field instance has an additional template attribute, which specifies the id of the field template that should be used for this field. The parameters of the field template can be overridden by the parameters of the field instance.
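
As a hedged example, a field instance for a barcode that carries binary data could be described as follows. The type value pdf417 is an assumption about how PDF417 is spelled in the list of barcode types, so check that list before relying on it. The id and coordinates are invented.

```xml
<document xmlns="http://ocrsdk.com/schema/taskDescription-1.0.xsd">
  <fieldTemplates></fieldTemplates>
  <page applyTo="0">
    <barcode id="shippingLabel" left="50" top="400" right="450" bottom="520">
      <!-- Assumed spelling of the PDF417 type value -->
      <type>pdf417</type>
      <!-- Ask for the encoded bytes as hexadecimal values -->
      <containsBinaryData>true</containsBinaryData>
    </barcode>
  </page>
</document>
```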

checkmark
(checkmarkFieldType or checkmarkFieldInstanceType)

Checkmark field. A field of type checkmarkFieldType contains the template of a checkmark field, while a field of type checkmarkFieldInstanceType contains the settings of an individual checkmark field. Both types can include the following elements:

  • type: specifies the type of checkmark. It can be either empty or square.
  • correctionAllowed: a boolean value; if set to true, the checkmark can be selected and then corrected.

Similar parameters are used in the processCheckmarkField method.

Both the field template and the field instance have id, left, top, right, and bottom attributes. The id attribute specifies the id of the field template or field instance; the other attributes specify coordinates.

The field instance has an additional template attribute, which specifies the id of the field template that should be used for this field. The parameters of the field template can be overridden by the parameters of the field instance.
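
A minimal sketch of a checkmark template with two instances, using the square type named above. The ids and coordinates are placeholders chosen for the example.

```xml
<document xmlns="http://ocrsdk.com/schema/taskDescription-1.0.xsd">
  <fieldTemplates>
    <!-- Hypothetical template for square checkboxes without corrections -->
    <checkmark id="squareBox">
      <type>square</type>
      <correctionAllowed>false</correctionAllowed>
    </checkmark>
  </fieldTemplates>
  <page applyTo="0">
    <checkmark template="squareBox" id="optionYes" left="40" top="300" right="60" bottom="320" />
    <!-- This instance overrides correctionAllowed from the template -->
    <checkmark template="squareBox" id="optionNo" left="40" top="340" right="60" bottom="360">
      <correctionAllowed>true</correctionAllowed>
    </checkmark>
  </page>
</document>
```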

Samples

A simple XML file that contains processing settings for only one field:

<document xmlns="http://ocrsdk.com/schema/taskDescription-1.0.xsd" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://ocrsdk.com/schema/taskDescription-1.0.xsd http://ocrsdk.com/schema/taskDescription-1.0.xsd">

<fieldTemplates></fieldTemplates>

<page applyTo="0,1,3,4">
        <text id="myTextBlock" left="0" top="0" right="20" bottom="50">
                <language>English,German</language>
                <textType>matrix</textType>
                <oneTextLine>true</oneTextLine>
        </text>
</page>
</document>

A more complex sample, which includes templates for several fields:

<?xml version="1.0" encoding="UTF-8"?>
<document xmlns="http://ocrsdk.com/schema/taskDescription-1.0.xsd" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://ocrsdk.com/schema/taskDescription-1.0.xsd http://ocrsdk.com/schema/taskDescription-1.0.xsd">
<fieldTemplates>
        <text id="textField" bottom="0" left="0" right="0" top="0">
                <language>English,German</language>
                <textType>matrix</textType>
                <oneTextLine>true</oneTextLine>
        </text> 
        <barcode bottom="0" id="barcodeField">
                <type>ean13</type>
                <containsBinaryData>true</containsBinaryData>
        </barcode> 
        <checkmark id="checkmarkField">
                <type>empty</type>
                <correctionAllowed>true</correctionAllowed>
        </checkmark>
</fieldTemplates>
<page applyTo="0,3">
        <text template="textField" id="field1" left="10" top="10" right="20" bottom="30">
                <language>English</language>
        </text>
        <text template="textField" id="field2" left="10" top="50" right="20" bottom="70">
                <language>Chinese,Japanese</language>
        </text>       
        <text template="textField" id="field3" left="10" top="50" right="20" bottom="70">
                <language>French</language>
        </text>       
        <barcode template="barcodeField" id="barcode1" left="100" top="10" right="150" bottom="200">
                <type>code39</type>
        </barcode>
        <barcode template="barcodeField" id="barcode2" left="100" top="210" right="150" bottom="400" />       
        <checkmark template="checkmarkField" id="cm1" left="200" top="10" right="220" bottom="30" />
</page>
</document>