OCR Overview - PraisonAI

Source must be a URL (https://) or base64-encoded document. Local file paths are not supported.
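Because local paths are rejected, a file on disk has to be encoded before it is passed in. The following is a minimal sketch using only the Python standard library. The helper names `file_to_base64` and `file_to_data_uri` are hypothetical and not part of PraisonAI, and this page does not say whether the agent expects a raw base64 string or a `data:` URI, so both forms are shown as assumptions.

```python
import base64
import mimetypes
from pathlib import Path


def file_to_base64(path: str) -> str:
    """Return the file's bytes as a base64-encoded ASCII string."""
    return base64.b64encode(Path(path).read_bytes()).decode("ascii")


def file_to_data_uri(path: str) -> str:
    """Wrap the base64 payload in a data URI.

    The MIME type is guessed from the file extension; unknown
    extensions fall back to application/octet-stream.
    """
    mime, _ = mimetypes.guess_type(path)
    mime = mime or "application/octet-stream"
    return f"data:{mime};base64,{file_to_base64(path)}"
```

The resulting string would then be passed where the examples on this page pass a URL, for instance `agent.read(file_to_base64("report.pdf"))`. Which of the two forms is accepted is an assumption to verify against the provider's documentation.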

Extract Text

Basic

from praisonaiagents import OCRAgent

agent = OCRAgent()
text = agent.read("https://arxiv.org/pdf/2201.04234")
print(text)

Advanced

from praisonaiagents import OCRAgent

agent = OCRAgent()

# Extract specific pages (0-indexed)
result = agent.extract("https://arxiv.org/pdf/2201.04234", pages=[0, 1])
for page in result.pages:
    print(f"Page {page.index}: {page.markdown[:100]}")
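When the per-page results need to become a single document, the pages can be concatenated in index order. This sketch relies only on the `index` and `markdown` attributes shown above. The helper `join_pages` and the separator are hypothetical, and the stand-in objects replace a real `result.pages` so the snippet runs without network access.

```python
from types import SimpleNamespace


def join_pages(pages, separator="\n\n---\n\n"):
    """Concatenate per-page markdown, ordered by page index."""
    ordered = sorted(pages, key=lambda p: p.index)
    return separator.join(p.markdown for p in ordered)


# Stand-in objects that mimic the attributes used in the example above;
# with a real result you would call join_pages(result.pages) instead.
demo_pages = [
    SimpleNamespace(index=1, markdown="Second page"),
    SimpleNamespace(index=0, markdown="# Title"),
]
combined = join_pages(demo_pages)
print(combined)
```

Sorting by `index` keeps the output stable even if pages arrive out of order.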

From Image URL

from praisonaiagents import OCRAgent

agent = OCRAgent()
text = agent.read("https://example.com/screenshot.png")
print(text)

Providers

Mistral

Best-in-class OCR

Mistral OCR

Document and image OCR