
Cloud OCR SDK Documentation

Output XML with Receipt Data

To recognize receipts with Cloud OCR SDK, use the processReceipt method, which returns the data extracted from the receipt in XML format.

The fields extracted from the receipt are represented by XML elements in the output file. Two features are shared by many of the fields:

  1. The field value may be presented in two forms: normalized value (standard for all fields of this type: e.g. for phone numbers the spaces would be removed, useful for further automatic processing) and recognized value (the field text as it is found on the receipt, useful for manual verification by the operator). The two forms are represented by child elements called normalizedValue and recognizedValue.
  2. A confidence level and a boolean flag indicate how likely it is that the field was extracted correctly. They are represented by the optional confidence and isSuspicious attributes.
    • confidence is a numerical estimate of the likelihood that the field was extracted correctly, in the range from 0 to 100.
    • isSuspicious uses the same estimate internally to determine when the field extraction is uncertain. The cutoff level is currently tuned to provide 97% accuracy, which matches the probable error level for a human operator.
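
The two shared features can be read with a few lines of Python. This is a minimal sketch, not the SDK's own client code: the SAMPLE fragment is hypothetical and only shaped after the description in this section, the helper name read_field is made up for illustration, and real output may differ in detail (for example, it may declare an XML namespace, or store the normalized value differently).

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment; not taken from real service output.
SAMPLE = """
<date confidence="42" isSuspicious="true">
  <normalizedValue>2015-03-07</normalizedValue>
  <recognizedValue><text>07/03/15</text></recognizedValue>
</date>
"""

def read_field(elem):
    """Return both value forms and the confidence data of one field element."""
    conf = elem.get("confidence")          # optional attribute, 0 to 100
    suspicious = elem.get("isSuspicious")  # optional attribute, XML boolean
    return {
        # Assumption: normalizedValue holds its value as plain element text.
        "normalized": elem.findtext("normalizedValue"),
        "recognized": elem.findtext("recognizedValue/text"),
        "confidence": float(conf) if conf is not None else None,
        "suspicious": suspicious in ("true", "1") if suspicious is not None else None,
    }

field = read_field(ET.fromstring(SAMPLE))
# Send the field to manual verification if it is flagged or has no normalized form.
needs_review = bool(field["suspicious"]) or field["normalized"] is None
```

Because both attributes are optional, the sketch returns None for them when they are absent instead of guessing a default.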

The main elements of this XML file are described in the table below. See also the XML schema of the output document.

Name/Type
Description
receipts
(receiptsType)
The root element, representing a collection of receipts recognized in one method call. Contains a sequence of receipt elements. The element has the following attributes:
  • count — the number of receipts (integer)
receipt
(receiptType)
A recognized receipt. May include the following elements, depending on the fields that were found on the receipt:
  • vendor — the vendor organization which issued the receipt
  • date — the date the receipt was printed
  • time — the time the receipt was printed
  • subTotal — the subtotal of the receipt
  • total — the total sum of the receipt
  • tax — the sales tax
  • payment — information about payment
  • recognizedItems — the collection of lines of the receipt, each corresponding to one type of purchased item
  • recognizedText — the full text of the receipt as a string
  • ruEklz [Russian receipts only] — EKLZ (fiscal memory device) identifier
  • ruKpkNumber [Russian receipts only] — KPK number (security code identifier)
  • ruKpkValue [Russian receipts only] — KPK value (security code value)
The element has the following attributes:
  • currency — the receipt currency. Can have one of the following values: $, USD, JPY, EUR, BRL, RUB, CNY, KRW, TWD, GBP, AUD, SGD, CAD, Undefined.
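
As an illustration of the top-level structure, the following sketch walks the receipts root and collects a few fields per receipt. The SAMPLE document is hypothetical, the function name summarize is made up, and the sketch assumes that normalizedValue holds plain text and that the output has no XML namespace; adjust the paths if the real output differs.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; not taken from real service output.
SAMPLE = """
<receipts count="1">
  <receipt currency="USD">
    <total confidence="88"><normalizedValue>12.50</normalizedValue></total>
  </receipt>
</receipts>
"""

def summarize(root):
    """Collect currency, total and date for every receipt element."""
    summary = []
    for receipt in root.findall("receipt"):
        summary.append({
            "currency": receipt.get("currency", "Undefined"),
            # Fields that were not found on the receipt are simply absent,
            # so findtext returns None for them.
            "total": receipt.findtext("total/normalizedValue"),
            "date": receipt.findtext("date/normalizedValue"),
        })
    return summary

root = ET.fromstring(SAMPLE)
summary = summarize(root)
# The count attribute should agree with the number of receipt elements.
count_matches = int(root.get("count")) == len(summary)
```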
vendor
(vendorType)

The vendor organization which issued the receipt. May include the following elements:

  • name — the name of the vendor
  • address — the address of the vendor
  • phone — the phone number
  • fax — the fax number
  • purchaseType — purchase type (string)
  • city — the city component of the address
  • zip — the zip code component of the address
  • administrativeRegion — the administrative region (e.g. state, province, region) component of the address
  • vatNumber — the taxpayer identifier specified on the receipt (VATIN, TIN, etc.)
The element has the following attributes:
  • confidence
  • isSuspicious

date
(dateType)

The date when the receipt was printed. Includes the following elements:
  • normalizedValue
  • recognizedValue
The element has the following attributes:
  • confidence
  • isSuspicious
time
(timeType)
The time when the receipt was printed. Includes the following elements:
  • normalizedValue
  • recognizedValue
The element has the following attributes:
  • confidence
  • isSuspicious
tax
(taxType)
The sales tax of the receipt. Includes the following elements:
  • normalizedValue
  • recognizedValue
The element has the following attributes:
  • total — indicates if the element represents the total tax for the receipt (boolean)
  • type — (optional) the name of the tax, only for specific taxes; this attribute will not be filled in if total attribute is true (string)
  • rate — (optional) the rate of the tax, only for specific taxes; this attribute will not be filled in if total attribute is true (decimal)
  • confidence
  • isSuspicious
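
The total attribute can be used to separate the overall tax from specific taxes. The sketch below assumes that a receipt may contain several tax elements and that normalizedValue holds plain text; the SAMPLE fragment and the name split_taxes are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment; not taken from real service output.
SAMPLE = """
<receipt currency="USD">
  <tax total="false" type="VAT" rate="7.5"><normalizedValue>1.50</normalizedValue></tax>
  <tax total="true"><normalizedValue>1.50</normalizedValue></tax>
</receipt>
"""

def split_taxes(receipt):
    """Return (total tax value, list of (type, rate, value) for specific taxes)."""
    total_tax, specific = None, []
    for tax in receipt.findall("tax"):
        value = tax.findtext("normalizedValue")
        if tax.get("total") in ("true", "1"):
            # type and rate are not filled in for the total tax.
            total_tax = value
        else:
            specific.append((tax.get("type"), tax.get("rate"), value))
    return total_tax, specific

total_tax, specific = split_taxes(ET.fromstring(SAMPLE))
```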
payment
(paymentType)
Payment information for the receipt. Includes the following elements:
  • value — the sum paid
  • cardNumber — the number of the card, if any
The element has the following attributes:
  • type — the type of payment. Can have one of the following values: Card, Cash, Undefined, Mixed.
  • cardType — (optional) the type of the card used. Can have one of the following values: AmericanExpress, EuroCard, MasterCard, Visa, Undefined
  • confidence
  • isSuspicious
recognizedItems
(recognizedItemsType)
Contains a sequence of item elements. The element has the following attributes:
  • count — the number of items (integer)
item
(recognizedItemType)
Represents one line of the receipt, containing one type of purchased item. May include the following elements:
  • name — the name of the item
  • price — the price of the item
  • count — the number or amount of the items purchased
  • total — the total sum paid for the purchased amount of the item
  • tax — the sales tax for this item
  • recognizedText — the full text of the line (string)
  • sku — the stock-keeping unit identifier (SKU) of the item
  • amountUnits — the unit of measurement for this item
The element has the following attributes:
  • index — the index of the line in the receipt (integer)
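
A possible way to list the purchased items is sketched below. The types of the name and total child elements are not spelled out in this section, so the hypothetical helper field_text tries the normalized value first, then the recognized text, then the element's own text. The SAMPLE fragment is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment; not taken from real service output.
SAMPLE = """
<recognizedItems count="2">
  <item index="1">
    <name><recognizedValue><text>MILK 1L</text></recognizedValue></name>
    <total><normalizedValue>0.99</normalizedValue></total>
  </item>
  <item index="2">
    <name><recognizedValue><text>BREAD</text></recognizedValue></name>
    <total><normalizedValue>2.49</normalizedValue></total>
  </item>
</recognizedItems>
"""

def field_text(parent, name):
    """Best-effort text of a child field, or None if the field is missing."""
    elem = parent.find(name)
    if elem is None:
        return None
    return (elem.findtext("normalizedValue")
            or elem.findtext("recognizedValue/text")
            or (elem.text or "").strip()
            or None)

def list_items(items_elem):
    """Return one dict per item element, in document order."""
    return [
        {
            "index": int(item.get("index")),
            "name": field_text(item, "name"),
            "total": field_text(item, "total"),
        }
        for item in items_elem.findall("item")
    ]

rows = list_items(ET.fromstring(SAMPLE))
```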
amountUnits
(amountUnitsType)

The units of measurement. Based on the enumeration with the following values: unit (piece), l (liter), ml (milliliter), fl oz (fluid ounce), gal (gallon), kg (kilogram), g (gram), q (quintal), lb (pound), oz (ounce), m (meter), cm (centimeter), ft (foot), in (inch), pt (point), Unknown.

The element has the following attributes:
  • confidence
  • isSuspicious
classifiedItems

This element and all its child elements will not be used in the result of receipt recognition by Cloud OCR SDK.

It is supported only by ABBYY Receipt Capture SDK, where it delivers the collection of items on the receipt which are taking part in a specific customer loyalty management program. The client application is responsible for providing information about types of products, coupons, etc., and creating, on the basis of this information, a commodity model which will be used to determine which items are of interest.

totalType
A sum of money specified in one of the lines, subtotal or total of the receipt. Includes the following elements:
  • normalizedValue
  • recognizedValue
The element has the following attributes:
  • confidence
  • isSuspicious

normalizedValueType

Provides the normalized and recognized values for a receipt field. Includes the following elements:
  • normalizedValue
  • recognizedValue
The element has the following attributes:
  • confidence
  • isSuspicious
recognizedValueType
Provides the recognized text of a receipt field, with the information on which characters were recognized uncertainly. Includes the following elements:
  • text — the recognized text (string)
  • suspiciousCharacters — uncertainly recognized characters
  • region — the region on the image enclosing the recognized text
The element has the following attributes:
  • confidence
  • isSuspicious
suspiciousCharacters
(suspiciousCharactersType)
Contains a sequence of sc elements, representing character indices for all uncertainly recognized characters. Note that indexing starts with 1.
sc
(charType)
An uncertainly recognized character. The element has the following attributes:
  • — the index of the character in the text (integer). Note that indexing starts with 1.
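
Since the character indices start with 1 rather than 0, it is easy to be off by one when highlighting uncertain characters for an operator. The sketch below assumes the indices have already been read from the sc elements into a list of integers; the function name mark_suspicious and the bracket notation are illustrative choices only.

```python
def mark_suspicious(text, indices):
    """Wrap each uncertainly recognized character in square brackets.

    `indices` are 1-based positions in `text`.
    """
    flagged = set(indices)
    return "".join(
        f"[{ch}]" if pos in flagged else ch
        for pos, ch in enumerate(text, start=1)  # count from 1, like the XML
    )

marked = mark_suspicious("TOTAL 12.5O", [11])
# marked is "TOTAL 12.5[O]"
```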
region
(regionType)

The region on the image. Contains a sequence of vertex elements, representing the vertices of the region.

The element has the following attributes:
  • isInOriginalImage — indicates if the region coordinates are on the original uncorrected image (boolean)
vertex
(vertexType)
A vertex of the region. The element has the following attributes:
  • x — the horizontal coordinate of the point (integer)
  • y — the vertical coordinate of the point (integer)
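
To crop or highlight a field on the image, the vertices of a region can be reduced to a bounding rectangle. This is a sketch under assumptions: the horizontal coordinate attribute is taken to be named x by analogy with y, the SAMPLE fragment is hypothetical, and bounding_box is a made-up helper name.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment; not taken from real service output.
SAMPLE = """
<region isInOriginalImage="true">
  <vertex x="10" y="20"/><vertex x="110" y="20"/>
  <vertex x="110" y="45"/><vertex x="10" y="45"/>
</region>
"""

def bounding_box(region):
    """Return (left, top, right, bottom) enclosing all vertices of the region."""
    xs = [int(v.get("x")) for v in region.findall("vertex")]
    ys = [int(v.get("y")) for v in region.findall("vertex")]
    return min(xs), min(ys), max(xs), max(ys)

region = ET.fromstring(SAMPLE)
box = bounding_box(region)
# Only map the box onto the original image when the coordinates refer to it.
on_original = region.get("isInOriginalImage") in ("true", "1")
```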