Cloud OCR SDK Documentation

processDocument Method

The method starts the processing task with the specified parameters.

[GET] http(s)://cloud.ocrsdk.com/processDocument

This method allows you to process several images using the same settings and obtain the recognition result as a multi-page document. You can upload several images to one task using the submitImage method.

It is also possible to specify up to three file formats for the result, in which case the server response for the completed task will contain several result URLs.
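
The three-format limit can be checked on the client before the request is sent. The following is a minimal sketch, not part of the SDK: the helper name join_export_formats is hypothetical, the format names are taken from the exportFormat parameter description on this page (plus "rtf", the documented default), and exact, case-sensitive matching of format names is an assumption.

```python
# Export format names listed in the exportFormat parameter description.
# "rtf" is included because it is the documented default value.
AVAILABLE_FORMATS = {
    "txt", "txtUnstructured", "rtf", "docx", "xlsx", "pptx",
    "pdfSearchable", "pdfTextAndImages", "pdfa",
    "xml", "xmlForCorrectedImage", "alto",
}
MAX_FORMATS = 3  # the method accepts up to three export formats


def join_export_formats(formats):
    """Return the comma-separated exportFormat value (hypothetical helper)."""
    formats = list(formats)
    if not 1 <= len(formats) <= MAX_FORMATS:
        raise ValueError("between one and three export formats are allowed")
    # Assumption: format names are matched exactly, including case.
    unknown = [name for name in formats if name not in AVAILABLE_FORMATS]
    if unknown:
        raise ValueError("unknown export format(s): " + ", ".join(unknown))
    return ",".join(formats)


if __name__ == "__main__":
    print(join_export_formats(["pdfa", "txt", "xml"]))  # pdfa,txt,xml
```

Rejecting a fourth format locally avoids a round trip that would otherwise end in an incorrect-parameters error.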

Only a task with the Submitted, Completed, or NotEnoughCredits status can be started using this method.
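
A client can apply the status rule above before calling the method. This is a small sketch under stated assumptions: the helper name can_start is hypothetical, and the task status is assumed to be available to the caller as a plain string.

```python
# Statuses from which a task can be started, as stated above.
STARTABLE_STATUSES = {"Submitted", "Completed", "NotEnoughCredits"}


def can_start(task_status):
    """Return True if a task in this status may be started (hypothetical helper)."""
    return task_status in STARTABLE_STATUSES


if __name__ == "__main__":
    print(can_start("Submitted"))
    # "SomeOtherStatus" is a made-up value used only for illustration.
    print(can_start("SomeOtherStatus"))
```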

For details on task cost, please see the billing terms.


Parameter    Is required    Default value    Description
taskId Yes No Specifies the identifier of the task. If the task with the specified identifier does not exist or has been deleted, an error is returned.
language No "English" Specifies the recognition language of the document. This parameter can contain several language names separated with commas, for example "English,French,German". See the list of available recognition languages.
profile No "documentConversion" Specifies a profile with predefined processing settings. It can be one of the following:
  • documentConversion
  • documentArchiving
  • textExtraction
  • fieldLevelRecognition
  • barcodeRecognition
textType No "normal" Specifies the type of the text in the document. This parameter may also contain several text types separated with commas, for example "normal,matrix". The following values can be used:
  • normal
  • typewriter
  • matrix
  • index
  • ocrA
  • ocrB
  • e13b
  • cmc7
  • gothic
imageSource No "auto" Specifies the source of the image. It can be either a scanned image or a photograph taken with a digital camera. Special preprocessing operations can be performed on the image depending on the selected source. For example, the system can automatically correct distorted text lines, poor focus, and poor lighting in photos. The value of this parameter can be one of the following:
  • auto
    The image source is detected automatically.
  • photo
  • scanner
correctOrientation No "true" Specifies whether the orientation of the image should be automatically detected and corrected. It can have one of the following values:
  • true
    The page orientation is automatically detected, and if it differs from normal the image is rotated.
  • false
    The page orientation detection and correction is not performed.
correctSkew No "true" Specifies whether the skew of the image should be automatically detected and corrected. It can have either true or false value.
readBarcodes No "true" for xml export format and "false" in other cases Specifies whether barcodes must be detected on the image, recognized and exported to the result file. It can have either true or false value.
exportFormat No "rtf" Specifies the export format. This parameter can contain up to three export formats, separated with commas (example: "pdfa,txt,xml"). The available formats are:
  • txt
    The recognized text is exported to the file line by line from left to right. For example, if the text was originally laid out in columns, the first line of every column is saved, then the second lines, and so on.
    Note that only text is saved in this format; no images or barcodes remain in the output file. To save barcode recognition results in the exported file, use the txtUnstructured format.
  • txtUnstructured
    The exported file contains the text saved in the order of the original blocks.
  • docx
  • xlsx
  • pptx
  • pdfSearchable
    The entire image is saved as a picture, and the recognized text is put under it.
  • pdfTextAndImages
    The recognized text is saved as text, and the pictures are saved as pictures.
  • pdfa
    The file is saved in the PDF/A-1b format, with the entire image saved as a picture, and recognized text put under it.
  • xml
  • xmlForCorrectedImage
    The same as xml, but all coordinates written into the output XML file relate to the corrected image, not the original.
  • alto

If either of the XML export formats is selected, barcodes on the image are recognized and saved to the output XML regardless of which profile is used for recognition.

Please note that setting multiple export formats does not affect the cost of task processing.

xml:writeRecognitionVariants No "false" Specifies whether character recognition variants should be written to the output file in XML format. This parameter can be used only if the exportFormat parameter contains the xml value. The parameter can have one of the following values:
  • true
  • false
pdf:writeTags No "auto" Specifies whether the result must be written as tagged PDF. This parameter can be used only if the exportFormat parameter contains one of the values for export to PDF. It can have one of the following values:
  • auto
    Automatic selection: the tags are written into the output PDF file if it must comply with the PDF/A-1a standard, and are not written otherwise.
  • write
  • dontWrite
description No "" Contains the description of the processing task. Cannot contain more than 255 characters.
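
The parameters in the table above are passed as query-string parameters of the GET request. The sketch below only assembles the request URL and checks a few of the documented constraints; it does not send anything. The helper name build_process_document_url and the task identifier in the example are hypothetical, the https scheme is one of the two the endpoint line allows, and authentication is not covered on this page, so it is left out.

```python
from urllib.parse import urlencode

BASE_URL = "https://cloud.ocrsdk.com/processDocument"
PDF_FORMATS = {"pdfSearchable", "pdfTextAndImages", "pdfa"}


def build_process_document_url(task_id, options=None):
    """Build the processDocument request URL (hypothetical helper)."""
    options = dict(options or {})
    if not task_id:
        raise ValueError("taskId is required")
    if len(options.get("description", "")) > 255:
        raise ValueError("description cannot contain more than 255 characters")

    # "rtf" is the documented default when exportFormat is omitted.
    formats = options.get("exportFormat", "rtf").split(",")
    if len(formats) > 3:
        raise ValueError("exportFormat can contain up to three formats")
    if "xml:writeRecognitionVariants" in options and "xml" not in formats:
        raise ValueError("xml:writeRecognitionVariants requires the xml format")
    if "pdf:writeTags" in options and not PDF_FORMATS.intersection(formats):
        raise ValueError("pdf:writeTags requires a PDF export format")

    params = {"taskId": task_id, **options}
    # Commas and colons are left unescaped so list values stay readable.
    return BASE_URL + "?" + urlencode(params, safe=",:")


if __name__ == "__main__":
    url = build_process_document_url(
        "hypothetical-task-id",
        {"language": "English,French", "exportFormat": "pdfa,txt"},
    )
    print(url)
```

Omitted parameters are simply not sent, so the server applies the default values listed in the table.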

Status codes and response format

General status codes and response format of the method are described in HTTP Status Codes and Response Formats.

The following status codes can be returned when this method is called:

200 Successful method call.
450 Incorrect parameters have been passed. One of the following errors occurred:
  • The identifier of the task has not been specified.
  • Incorrect recognition language has been specified.
  • Incorrect processing profile has been specified.
  • Incorrect export format has been specified.
  • The task with the specified identifier has been deleted.
  • There is no task with the specified identifier.
  • The task with the specified identifier cannot be started (e.g. because it is already being processed).
  • Task description length exceeds 255 characters.
550 An internal program error occurred while processing the image.
551 The number of uploaded images exceeds the allowed limit.
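
Client code can map the status codes listed above to readable messages. The following is a minimal sketch: the names STATUS_MEANINGS, describe_status, and is_parameter_error are hypothetical, and the fallback text for unlisted codes is an assumption rather than documented behavior.

```python
# Status codes documented for processDocument and their meanings.
STATUS_MEANINGS = {
    200: "Successful method call.",
    450: "Incorrect parameters have been passed.",
    550: "An internal program error occurred while processing the image.",
    551: "The number of uploaded images exceeds the allowed limit.",
}


def describe_status(code):
    """Return the documented meaning of a status code (hypothetical helper)."""
    fallback = "Status code %d is not listed for this method." % code
    return STATUS_MEANINGS.get(code, fallback)


def is_parameter_error(code):
    """True when the request itself should be corrected before retrying."""
    return code == 450


if __name__ == "__main__":
    for code in (200, 450, 551):
        print(code, describe_status(code))
```

With Python's urllib.request, a non-2xx response is raised as urllib.error.HTTPError, and its code attribute can be passed to describe_status.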