OCR API Documentation
Complete reference for OCR document processing endpoints
Authentication
Include your API key in the Authorization header using Bearer token format:
Authorization: Bearer your_api_key_here
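As a minimal sketch, the header can be assembled in Python as shown here. The helper name auth_headers is hypothetical and not part of the API.

```python
def auth_headers(api_key: str) -> dict:
    """Return the Authorization header in Bearer token format."""
    if not api_key:
        raise ValueError("api_key must be a non-empty string")
    return {"Authorization": f"Bearer {api_key}"}


print(auth_headers("your_api_key_here"))
```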
Health Check
Authentication
Not required
Response
200 OK - Service is running
Use Case
Infrastructure monitoring and health checks
Submit File for Processing
Request Parameters
file
IFormFile - File to be processed. Maximum size depends on your plan (Free: 4 MB; paid plans have higher limits). Supported formats: PDF, JPEG, PNG, TIFF, BMP
languages
string - Languages for OCR using ISO codes (e.g., "en,lt,ru")
outputs
string - Output formats (e.g., "pdf,text")
models
string - OCR model to use (default: "tesseract"). Only "tesseract" is available.
priority
integer - Job priority (1-10, where 1 is highest, default: 5)
webhookUrl
string - URL to notify when processing is complete. Must be publicly accessible
reference
string - Customer's own ID/reference for document identification
jobExpiryInMinutes
integer - Job expiry time in minutes (default: 1440, range: 1-10080)
response
string - Result handling mode (default: "polling"). Options: "polling", "direct", "webhook"
Response
{
  "id": "uuid",                     // Unique identifier for the task
  "status": "string",               // Current status of the task
  "reference": "string",            // Customer's reference code if provided
  "models": "string",               // OCR model used for processing
  "priority": 5,                    // Job priority (1-10)
  "processingTimeInSeconds": 0.0,   // Time taken to process request
  "responseMode": "string",         // How results are returned (direct, polling, webhook)
  "content": [{                     // Array of immediate results (direct mode only)
    "text": "string",               // Plain text content
    "pdf": "string"                 // Base64-encoded PDF content
  }],
  "error": {                        // Error details if failed (omitted if null)
    "message": "string"             // Human-readable error description
  }
}
Example Request
curl -X POST https://api.ainova.systems/api/v1/ocr/submit-file \
  -H "Authorization: Bearer your_api_key" \
  -F "file=@document.pdf" \
  -F "languages=en,ru" \
  -F "outputs=text,pdf" \
  -F "models=tesseract" \
  -F "priority=5" \
  -F "response=polling"
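The documented ranges for the request parameters can be checked on the client before a file is uploaded. The sketch below only prepares the non-file form fields; the function name build_submit_fields and its keyword names are hypothetical, and the file part itself would still need to be attached by whatever HTTP client you use.

```python
RESPONSE_MODES = {"polling", "direct", "webhook"}


def build_submit_fields(languages, outputs, *, models="tesseract", priority=5,
                        job_expiry_in_minutes=1440, response="polling",
                        webhook_url=None, reference=None):
    """Validate values against the documented ranges and return form fields."""
    if not 1 <= priority <= 10:
        raise ValueError("Priority must be between 1 and 10")
    if not 1 <= job_expiry_in_minutes <= 10080:
        raise ValueError("jobExpiryInMinutes must be between 1 and 10080")
    if response not in RESPONSE_MODES:
        raise ValueError(f"response must be one of {sorted(RESPONSE_MODES)}")
    fields = {
        "languages": ",".join(languages),   # e.g. "en,ru"
        "outputs": ",".join(outputs),       # e.g. "text,pdf"
        "models": models,
        "priority": str(priority),
        "jobExpiryInMinutes": str(job_expiry_in_minutes),
        "response": response,
    }
    # Optional fields are only sent when a value was supplied.
    if webhook_url is not None:
        fields["webhookUrl"] = webhook_url
    if reference is not None:
        fields["reference"] = reference
    return fields


print(build_submit_fields(["en", "ru"], ["text", "pdf"]))
```

Rejecting out-of-range values locally avoids a round trip that would otherwise end in a validation error.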
Submit Base64 Document
Request Body (JSON)
{
  "documentBase64": "string",    // Required - Base64-encoded document content (max size depends on plan)
  "filename": "string",          // Optional - Filename with extension
  "languages": "string",         // Required - Languages for OCR (e.g., "en,lt,ru")
  "outputs": "string",           // Required - Output formats (e.g., "pdf,text")
  "models": "string",            // Optional - OCR model (default: "tesseract")
  "priority": 5,                 // Optional - Job priority (1-10, default: 5)
  "webhookUrl": "string",        // Optional - URL for completion notifications
  "reference": "string",         // Optional - Customer's reference ID
  "jobExpiryInMinutes": 1440,    // Optional - Job expiry (default: 1440)
  "response": "string"           // Optional - "polling", "direct", "webhook"
}
Example Request
curl -X POST https://api.ainova.systems/api/v1/ocr/submit-base64 \
  -H "Authorization: Bearer your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "documentBase64": "JVBERi0xLjQKJdPr6eEKMSAwIG9iago8PAo...",
    "filename": "document.pdf",
    "languages": "en,ru",
    "outputs": "text,pdf",
    "response": "polling"
  }'
Response format: Same as submit-file endpoint
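To illustrate how the documentBase64 field is produced, here is a small Python sketch that encodes raw document bytes and builds the JSON body. The function name build_base64_payload is hypothetical; only the JSON field names come from the request body above.

```python
import base64
import json
from typing import Optional


def build_base64_payload(document: bytes, languages: str, outputs: str,
                         filename: Optional[str] = None,
                         response: str = "polling") -> str:
    """Return the JSON request body for the submit-base64 endpoint."""
    body = {
        "documentBase64": base64.b64encode(document).decode("ascii"),
        "languages": languages,
        "outputs": outputs,
        "response": response,
    }
    if filename is not None:
        body["filename"] = filename
    return json.dumps(body)


# The first bytes of a PDF file, used here as stand-in document content.
payload = build_base64_payload(b"%PDF-1.4\n", "en,ru", "text,pdf",
                               filename="document.pdf")
print(payload)
```

Base64 encoding grows the payload by roughly one third, which is worth keeping in mind against the plan's size limit.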
Submit External URL Document
Request Body (JSON)
{
  "sourceUrl": "string",         // Required - External URL where document is stored
  "languages": "string",         // Required - Languages for OCR (e.g., "en,lt,ru")
  "outputs": "string",           // Required - Output formats (e.g., "pdf,text")
  "models": "string",            // Optional - OCR model (default: "tesseract")
  "priority": 5,                 // Optional - Job priority (1-10, default: 5)
  "webhookUrl": "string",        // Optional - URL for completion notifications
  "reference": "string",         // Optional - Customer's reference ID
  "jobExpiryInMinutes": 1440,    // Optional - Job expiry (default: 1440)
  "response": "string"           // Optional - "polling", "direct", "webhook"
}
Example Request
curl -X POST https://api.ainova.systems/api/v1/ocr/submit-url \
  -H "Authorization: Bearer your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "sourceUrl": "https://example.com/document.pdf",
    "languages": "en,ru",
    "outputs": "text,pdf",
    "response": "polling"
  }'
Response format: Same as submit-file endpoint
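The same request can be prepared with Python's standard library. This is a sketch that only builds the request object and does not send it; the helper name make_submit_url_request is hypothetical, while the URL, headers, and field names mirror the curl example.

```python
import json
import urllib.request

API_BASE = "https://api.ainova.systems/api/v1/ocr"


def make_submit_url_request(api_key, source_url, languages, outputs,
                            response="polling"):
    """Build (but do not send) a POST request for the submit-url endpoint."""
    body = {
        "sourceUrl": source_url,
        "languages": languages,
        "outputs": outputs,
        "response": response,
    }
    return urllib.request.Request(
        f"{API_BASE}/submit-url",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = make_submit_url_request("your_api_key",
                              "https://example.com/document.pdf",
                              "en,ru", "text,pdf")
print(req.get_method(), req.full_url)
```

Passing the request to urllib.request.urlopen(req) would send it; that step is left out so the sketch runs without network access.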
Get Job Status
Path Parameters
jobId
UUID - The unique identifier of the task returned from submit endpoints
Response
{
  "jobId": "uuid",                   // Unique identifier for the task
  "status": "string",                // Current status: created, accepted, processing, completed, failed, cancelled
  "reference": "string",             // Customer's reference code if provided
  "models": "string",                // OCR model used for processing
  "priority": 5,                     // Job priority (1-10)
  "error": "string",                 // Error message if failed
  "createdAt": "datetime",           // When the job was created
  "startedAt": "datetime",           // When processing started
  "completedAt": "datetime",         // When processing completed
  "expiryAt": "datetime",            // When the job expires
  "processingTimeInSeconds": 0.0,    // Time elapsed from creation to completion
  "files": {                         // Download URLs when status is "completed"
    "text": {
      "url": "string",               // Pre-signed URL for text file
      "expiration": "datetime"       // URL expiration time (typically 1 hour)
    },
    "pdf": {
      "url": "string",               // Pre-signed URL for PDF file
      "expiration": "datetime"       // URL expiration time
    }
  }
}
Example Request
curl -X GET https://api.ainova.systems/api/v1/ocr/jobs/123e4567-e89b-12d3-a456-426614174000 \
  -H "Authorization: Bearer your_api_key"
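In polling mode the client calls this endpoint repeatedly until the job finishes. The following is a sketch of such a loop, not an official client. It assumes that "completed", "failed", and "cancelled" are final statuses, and it takes a caller-supplied fetch_status function (hypothetical) that performs the GET request and returns the parsed JSON.

```python
import time

# Assumption: these three statuses are final and will not change again.
TERMINAL_STATUSES = {"completed", "failed", "cancelled"}


def wait_for_job(fetch_status, job_id, interval_seconds=2.0,
                 timeout_seconds=300.0, sleep=time.sleep):
    """Call fetch_status(job_id) until a terminal status or the timeout."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        job = fetch_status(job_id)
        if job["status"] in TERMINAL_STATUSES:
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"job {job_id} still {job['status']} after {timeout_seconds}s")
        sleep(interval_seconds)


# Demo with a stubbed fetch function instead of a real HTTP call.
_fake_responses = iter([
    {"status": "accepted"},
    {"status": "processing"},
    {"status": "completed"},
])
final = wait_for_job(lambda job_id: next(_fake_responses),
                     "123e4567-e89b-12d3-a456-426614174000",
                     sleep=lambda seconds: None)
print(final["status"])
```

Injecting the fetch and sleep functions keeps the loop independent of any particular HTTP library and easy to exercise without waiting.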
Data Types & Enums
polling
Files are stored and the client must poll for status using the job status endpoint. Best for most use cases.
direct
Files are returned directly in the API response content array. Use for small documents requiring immediate results.
webhook
Files are stored and a webhook notification is sent to the specified URL upon completion. Best for async processing.
Webhook Integration
When response is set to "webhook" and webhookUrl is provided:
- System processes the document asynchronously
- Upon completion, sends POST request to webhook URL
- Webhook payload contains the complete job status result - identical to API response
Webhook Consistency
- Webhook payload structure is identical to the GET /api/v1/ocr/jobs/{jobId} API response
- Same enum serialization (lowercase strings)
- Same file structure with pre-signed URLs
- Same error handling and optional fields
Webhook Payload Structure
{
  "result": {
    "id": "uuid",                      // Required - Unique identifier for the task
    "status": "string",                // Required - Current status (e.g., "completed", "failed")
    "reference": "string",             // Optional - Customer's reference code if provided
    "models": "string",                // Required - OCR model(s) used (comma-separated string)
    "priority": 5,                     // Required - Job priority (1-10, where 1 is highest)
    "responseMode": "string",          // Required - How results are returned ("webhook")
    "error": "string",                 // Optional - Error message if failed
    "createdAt": "datetime",           // Required - When the job was created
    "startedAt": "datetime",           // Optional - When processing started
    "completedAt": "datetime",         // Optional - When processing completed
    "expiryAt": "datetime",            // Optional - When the job expires
    "processingTimeInSeconds": 0.0,    // Required - Time elapsed from creation to completion
    "files": [{                        // Optional - Array of download URLs (empty array if no files ready)
      "models": "string",              // OCR models used for this result
      "pdf": {                         // Download URL for PDF output (omitted if not requested)
        "url": "string",               // Pre-signed URL for downloading the file
        "expiration": "datetime"       // When the download URL expires
      },
      "text": {                        // Download URL for text output (omitted if not requested)
        "url": "string",               // Pre-signed URL for downloading the file
        "expiration": "datetime"       // When the download URL expires
      }
    }]
  }
}
Example Payload: Completed Job
{
  "result": {
    "id": "132376e4-201a-43c1-b71e-cd2e86980706",
    "status": "completed",
    "reference": "A123456789",
    "models": "tesseract",
    "priority": 5,
    "responseMode": "webhook",
    "createdAt": "2025-01-05T18:00:00Z",
    "startedAt": "2025-01-05T18:01:00Z",
    "completedAt": "2025-01-05T18:10:00Z",
    "expiryAt": "2025-01-06T18:00:00Z",
    "processingTimeInSeconds": 600.0,
    "files": [{
      "models": "tesseract",
      "pdf": {
        "url": "https://s3.example.com/results/132376e4-201a-43c1-b71e-cd2e86980706_pdf.pdf?X-Amz-Expires=3600",
        "expiration": "2025-01-05T19:10:00Z"
      },
      "text": {
        "url": "https://s3.example.com/results/132376e4-201a-43c1-b71e-cd2e86980706_text.txt?X-Amz-Expires=3600",
        "expiration": "2025-01-05T19:10:00Z"
      }
    }]
  }
}
Example Payload: Failed Job
{
  "result": {
    "id": "132376e4-201a-43c1-b71e-cd2e86980706",
    "status": "failed",
    "reference": "A123456789",
    "models": "tesseract",
    "priority": 5,
    "responseMode": "webhook",
    "error": "Invalid file format: corrupted PDF",
    "createdAt": "2025-01-05T18:00:00Z",
    "startedAt": "2025-01-05T18:01:00Z",
    "completedAt": "2025-01-05T18:02:00Z",
    "expiryAt": "2025-01-06T18:00:00Z",
    "processingTimeInSeconds": 120.0,
    "files": []
  }
}
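A webhook receiver typically needs only a few fields from this payload. The sketch below pulls out the job ID, status, error, and download URLs from the raw request body. The function name summarize_webhook is hypothetical, and the code assumes the "result" wrapper and "files" array shown in the payloads above; if several file entries carry the same output type, the last one wins.

```python
import json


def summarize_webhook(raw_body: str) -> dict:
    """Extract job id, status, error, and download URLs from a webhook body."""
    result = json.loads(raw_body)["result"]
    urls = {}
    for entry in result.get("files", []):
        for kind in ("pdf", "text"):
            if kind in entry:  # outputs that were not requested are omitted
                urls[kind] = entry[kind]["url"]
    return {
        "id": result["id"],
        "status": result["status"],
        "error": result.get("error"),  # None when the job did not fail
        "urls": urls,
    }


failed_body = json.dumps({
    "result": {
        "id": "132376e4-201a-43c1-b71e-cd2e86980706",
        "status": "failed",
        "error": "Invalid file format: corrupted PDF",
        "files": [],
    }
})
print(summarize_webhook(failed_body))
```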
Supported Languages
European Languages
sq - Albanian
eu - Basque
be - Belarusian
bg - Bulgarian
ca - Catalan
hr - Croatian
cs - Czech
da - Danish
nl - Dutch
en - English
et - Estonian
fi - Finnish
fr - French
gl - Galician
de - German
el - Greek
hu - Hungarian
is - Icelandic
ga - Irish
it - Italian
lv - Latvian
lt - Lithuanian
mk - Macedonian
mt - Maltese
no - Norwegian
pl - Polish
pt - Portuguese
ro - Romanian
ru - Russian
sr - Serbian
sk - Slovak
sl - Slovenian
es - Spanish
sv - Swedish
uk - Ukrainian
cy - Welsh
bs - Bosnian
br - Breton
lb - Luxembourgish
gd - Scottish Gaelic
Asian Languages
as - Assamese
bn - Bengali
ceb - Cebuano
zh-cn - Chinese (Simplified)
zh-tw - Chinese (Traditional)
hi - Hindi
ja - Japanese
ko - Korean
pa - Punjabi
ta - Tamil
te - Telugu
th - Thai
bo - Tibetan
ur - Urdu
vi - Vietnamese
Middle Eastern
ar - Arabic
az - Azerbaijani
tr - Turkish
African Languages
am - Amharic
sw - Swahili
Note: If an unsupported language code is provided, the API will return a 400 Bad Request error with details about supported languages.
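A client can catch unsupported codes before submitting and so avoid the 400 response. The sketch below copies the codes listed above into a set and reports any entry of a comma-separated languages value that is not in it. The names SUPPORTED_LANGUAGES and unsupported_languages are hypothetical, and the comparison is case-sensitive because the documentation does not say how the API treats uppercase codes.

```python
SUPPORTED_LANGUAGES = frozenset("""
sq eu be bg ca hr cs da nl en et fi fr gl de el hu is ga it
lv lt mk mt no pl pt ro ru sr sk sl es sv uk cy bs br lb gd
as bn ceb zh-cn zh-tw hi ja ko pa ta te th bo ur vi
ar az tr
am sw
""".split())


def unsupported_languages(languages: str) -> list:
    """Return the codes in a comma-separated string that are not supported."""
    codes = [code.strip() for code in languages.split(",") if code.strip()]
    return [code for code in codes if code not in SUPPORTED_LANGUAGES]


print(unsupported_languages("en,lt,ru"))
print(unsupported_languages("en,xx,zh-cn"))
```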
Error Handling
"Invalid file format"
Unsupported file type
"File too large"
Exceeds size limits
"Processing timeout"
Job took too long
"Invalid language code"
Unsupported OCR language
"Priority must be between 1 and 10"
Invalid priority value
{
  "error": {
    "message": "Human-readable error description"
  }
}
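The submit endpoints document "error" as an object with a "message" field, while the job status response documents it as a plain string. A client-side helper can accept both shapes, as in this sketch; the function name error_message is hypothetical.

```python
def error_message(response_body: dict):
    """Return the error text from a parsed response, or None if there is none."""
    error = response_body.get("error")
    if error is None:
        return None
    if isinstance(error, dict):
        # Shape used by the submit endpoints: {"error": {"message": "..."}}
        return error.get("message")
    # Shape used by the job status response: {"error": "..."}
    return str(error)


print(error_message({"error": {"message": "File too large"}}))
print(error_message({"error": "Invalid file format: corrupted PDF"}))
print(error_message({"status": "completed"}))
```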
Rate Limits & Constraints
Processing Logic: Jobs are processed by priority first, then by creation time (FIFO) within the same priority level.
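The ordering rule can be illustrated with a small priority queue. This is a sketch of the rule as stated, not the service's actual scheduler; the function name processing_order and the sample job IDs are hypothetical. Lower priority numbers come first, and ties are broken by the earlier creation time.

```python
import heapq
from datetime import datetime


def processing_order(jobs):
    """Return job ids ordered by priority (1 first), then by creation time."""
    heap = [
        (job["priority"],
         datetime.fromisoformat(job["createdAt"].replace("Z", "+00:00")),
         job["id"])
        for job in jobs
    ]
    heapq.heapify(heap)
    return [heapq.heappop(heap)[2] for _ in range(len(heap))]


jobs = [
    {"id": "a", "priority": 5, "createdAt": "2025-01-05T18:00:00Z"},
    {"id": "b", "priority": 1, "createdAt": "2025-01-05T18:05:00Z"},
    {"id": "c", "priority": 5, "createdAt": "2025-01-05T17:59:00Z"},
]
print(processing_order(jobs))
```

Job "b" runs first despite being submitted last because it has the highest priority; "c" precedes "a" because both share priority 5 and "c" was created earlier.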