Cloud OCR SDK Documentation

processTextField Method

This method extracts the value of a text field from an image. It loads the image, creates a processing task for the image with the specified parameters, and submits the task for processing.

[POST] http(s)://cloud.ocrsdk.com/processTextField

The image file is transmitted in the request body. See the list of supported input formats.

The fieldLevelRecognition profile is used for processing.

The result of recognition is returned in XML format.

See How to Recognize Text Fields for guidance on tuning the parameters.

For details on task cost, see the billing terms.
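The request described above can be sketched in Python as follows. This is a minimal illustration, not an official client: it assumes that parameters are passed in the query string (the page only states that the image travels in the request body), that the service accepts HTTP Basic authentication with an application ID and password, and that a generic binary content type is acceptable. The function name build_request is hypothetical. The request is only constructed here, not sent.

```python
import base64
import urllib.parse
import urllib.request

# Endpoint taken from this page.
BASE_URL = "https://cloud.ocrsdk.com/processTextField"


def build_request(image_bytes, app_id, password, **params):
    """Build (but do not send) a POST request for processTextField."""
    # Assumption: method parameters go into the query string.
    query = urllib.parse.urlencode(params)
    url = f"{BASE_URL}?{query}" if query else BASE_URL
    request = urllib.request.Request(url, data=image_bytes, method="POST")
    # Assumption: HTTP Basic authentication with application ID and password.
    token = base64.b64encode(f"{app_id}:{password}".encode("utf-8")).decode("ascii")
    request.add_header("Authorization", f"Basic {token}")
    # Assumption: the image is sent as a raw binary body.
    request.add_header("Content-Type", "application/octet-stream")
    return request
```

To send the request, pass the returned object to urllib.request.urlopen and read the XML response body.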

Parameters

Parameter    Is required    Default value    Description
region No "-1,-1,-1,-1" Specifies the region of the text field on the image. The coordinates of the region are measured in pixels relative to the top-left corner of the image and are specified in the following order: left, top, right, bottom. By default, the whole image is used as the region.
language No "English" Specifies the recognition language of the document. This parameter can contain several language names separated with commas, for example "English,French,German". See the list of available recognition languages. Note that not all languages are available for handprint recognition. The languages that are available for handprint recognition are marked with a special comment.
letterSet No "" Specifies the letter set to use during recognition. Contains a string with the letter set characters, for example "ABCDabcd'-.". By default, the letter set of the language specified in the language parameter is used.
regExp No "" Specifies the regular expression that defines which words are allowed in the field. See the description of regular expressions. By default, the set of allowed words is defined by the dictionary of the language specified in the language parameter.

Note that a regular expression does not strictly limit the characters in the output; the recognized value may contain characters that the expression does not include. During recognition, every hypothesis for a word is checked against the specified expression, and a variant that conforms to it is more likely to be selected as the final output. If no variant matches the expression, the result will not conform to it. To restrict the set of characters that can be recognized, use the letterSet parameter instead.

textType No "normal" Specifies the type of the text in the field. This parameter may also contain several text types separated with commas, for example "normal,matrix". The following values can be used:
  • normal
  • typewriter
  • matrix
  • index
  • handprinted
  • ocrA
  • ocrB
  • e13b
  • cmc7
  • gothic
oneTextLine No "false" Specifies whether the field contains only one text line. Set the value to true if there is one text line in the field; otherwise set it to false.
oneWordPerTextLine No "false" Specifies whether each text line in the field contains only one word. Set the value to true if no text line contains more than one word (each line of text is then recognized as a single word); otherwise set it to false.
markingType No "simpleText" This parameter applies only to handprint recognition. Specifies the type of marking around letters (for example, underline, frame, box, etc.). By default, there is no marking around letters. The value can be one of the following:
  • simpleText
  • underlinedText
  • textInFrame
  • greyBoxes
  • charBoxSeries
  • simpleComb
  • combInFrame
  • partitionedFrame
Note: For correct handprint recognition specify the value of the placeholdersCount parameter.
placeholdersCount No "1" Specifies the number of character cells for the field.

This parameter is meaningful only for field marking types (the markingType parameter) that imply splitting the text into cells.

The default value is 1, but you should set the actual number of cells for the text to be recognized correctly.

writingStyle No "default" Provides additional information about the writing style of handprinted letters. It can be one of the following:
  • default
  • american
  • german
  • russian
  • polish
  • thai
  • japanese
  • arabic
  • baltic
  • british
  • bulgarian
  • canadian
  • czech
  • croatian
  • french
  • greek
  • hungarian
  • italian
  • romanian
  • slovak
  • spanish
  • turkish
  • ukrainian
  • common
  • chinese
  • azerbaijan
  • kazakh
  • kirgiz
  • latvian
description No "" Contains the description of the processing task. Must contain no more than 255 characters.
pdfPassword No "" Contains a password for accessing password-protected images in PDF format.
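As an illustration of how the parameters above fit together, the following Python sketch formats a region string and assembles a parameter set for a handprinted field. The helper names (format_region, handprint_params) are hypothetical and not part of the SDK; the checks only mirror constraints stated on this page (the listed marking types, a positive cell count, and the 255-character description limit).

```python
# Marking types exactly as listed in the markingType row above.
MARKING_TYPES = {
    "simpleText", "underlinedText", "textInFrame", "greyBoxes",
    "charBoxSeries", "simpleComb", "combInFrame", "partitionedFrame",
}


def format_region(left, top, right, bottom):
    """Return the region string in the documented order: left, top, right, bottom."""
    return f"{left},{top},{right},{bottom}"


def handprint_params(language, marking_type, placeholders_count,
                     writing_style="default", description=""):
    """Assemble request parameters for a handprinted field (hypothetical helper)."""
    if marking_type not in MARKING_TYPES:
        raise ValueError(f"unknown markingType: {marking_type}")
    if placeholders_count < 1:
        raise ValueError("placeholdersCount must be a positive number")
    if len(description) > 255:
        raise ValueError("description must contain no more than 255 characters")
    params = {
        "language": language,
        "textType": "handprinted",
        "markingType": marking_type,
        "placeholdersCount": str(placeholders_count),
        "writingStyle": writing_style,
    }
    if description:
        params["description"] = description
    return params
```

For example, handprint_params("English", "simpleComb", 8) describes an eight-cell comb field, and format_region(10, 20, 110, 60) yields "10,20,110,60". Validating locally avoids a round trip that would otherwise end in a parameter error.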

Status codes and response format

General status codes and response format of the method are described in HTTP Status Codes and Response Formats.

The following status codes can be returned when this method is called:

Code    Description
200 Successful method call.
450 Incorrect parameters have been passed. One of the following errors occurred:
  • Image file has not been specified.
  • Incorrect region of the field has been specified.
  • Incorrect recognition language has been specified.
  • Incorrect regular expression has been specified.
  • Incorrect text type has been specified.
  • Handprinted text type is not supported for the specified languages.
  • Incorrect field marking type has been specified.
  • The number of cells in the field is not a positive number.
  • Incorrect writing style has been specified.
  • Task description length exceeds 255 characters.
  • Incorrect password for accessing password-protected image file has been specified.
  • Exceeded quota to add images. This error is returned if the number of images you have uploaded exceeds the number of images you can process with the credits available on your account plus some threshold. You can resolve this issue by topping up your account or by removing the tasks which have been submitted but have not been processed.
550 An internal program error occurred while processing the image.
551 An error occurred on the server side:
  • The format of the image file passed for processing is not supported.
  • The PDF file passed for processing has restrictions on creating raster images.
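A client might map the status codes listed above to messages along these lines. This is only a sketch: the function names are hypothetical, the messages are condensed from the table, and the treatment of 450 as an error the caller should fix is an interpretation of the table rather than documented behavior.

```python
# Status codes and summaries taken from the table above.
STATUS_MEANINGS = {
    200: "Successful method call.",
    450: "Incorrect parameters have been passed.",
    550: "An internal program error occurred while processing the image.",
    551: "An error occurred on the server side.",
}


def describe_status(code):
    """Return the documented summary for a status code, if it is listed."""
    return STATUS_MEANINGS.get(code, f"Status {code} is not listed for this method.")


def is_parameter_error(code):
    """Interpretation: 450 means the request itself must be corrected."""
    return code == 450
```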

Output file format

The output XML file has the following format:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<document xmlns="@link" xmlns:xsi="@link" xsi:schemaLocation="@link" version="1.0">
	<field left="0" top="0" right="199" bottom="100" type="text">
		<value encoding="UTF-16">Data Capture Sample Text Data</value>
		<line left="0" top="0" right="199" bottom="100">
			<char left="0" top="0" right="199" bottom="100" confidence="98">D</char>
			…
		</line>
		…
	</field>
</document>
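A response in this format could be read with Python's standard xml.etree.ElementTree module, as in the sketch below. The sample document is invented for illustration: its namespace URI is a placeholder (the real one is not shown on this page), and the coordinates and confidence values are made up. Elements are matched by local name so that the code does not depend on the actual namespace. The function name parse_field is hypothetical.

```python
import xml.etree.ElementTree as ET

# Invented sample; the namespace URI is a placeholder, not the real schema.
SAMPLE = """<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<document xmlns="urn:example:placeholder" version="1.0">
  <field left="0" top="0" right="199" bottom="100" type="text">
    <value encoding="UTF-16">Da</value>
    <line left="0" top="0" right="199" bottom="100">
      <char left="0" top="0" right="20" bottom="100" confidence="98">D</char>
      <char left="21" top="0" right="40" bottom="100" confidence="45">a</char>
    </line>
  </field>
</document>"""


def local_name(tag):
    """Strip a '{namespace}' prefix from an ElementTree tag, if present."""
    return tag.rsplit("}", 1)[-1]


def parse_field(xml_bytes):
    """Return (field value, [(character, confidence), ...]) from the response."""
    root = ET.fromstring(xml_bytes)
    value = None
    chars = []
    for element in root.iter():
        name = local_name(element.tag)
        if name == "value" and value is None:
            value = element.text
        elif name == "char":
            confidence = element.get("confidence")
            chars.append((
                (element.text or "").strip(),
                int(confidence) if confidence is not None else None,
            ))
    return value, chars


value, chars = parse_field(SAMPLE.encode("utf-8"))
```

The per-character confidence values can then be used to flag uncertain characters for review, for instance by collecting those below a threshold chosen by the application.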