Skip to main content

VisionAgent

Defined in the Vision Agent module.
AI Agent A specialized agent for image analysis and understanding. Provides:
  • Image analysis and description
  • Multi-image comparison
  • Text extraction from images
Supported Providers:
  • OpenAI: gpt-4o, gpt-4o-mini, gpt-4-turbo
  • Anthropic: claude-3-5-sonnet-20241022, claude-3-opus-20240229
  • Google: gemini/gemini-1.5-pro, gemini/gemini-1.5-flash

Constructor

name
Optional
No description available.
instructions
Optional
No description available.
llm
Optional
No description available.
model
Optional
No description available.
base_url
Optional
No description available.
api_key
Optional
No description available.
vision
Optional
No description available.
verbose
Union
default:"True"
No description available.

Methods

console()

Lazily initialize Rich Console.

litellm()

Lazy load litellm module when needed.

analyze()

Analyze an image and return analysis.

describe()

Generate a detailed description of an image.

compare()

Compare multiple images.

extract_text()

Extract text from an image (OCR-like functionality).

aanalyze()

Async version of analyze().

adescribe()

Async version of describe().

acompare()

Async version of compare().

aextract_text()

Async version of extract_text().

Usage

from praisonaiagents import VisionAgent
    
    # Simple usage
    agent = VisionAgent()
    description = agent.describe("https://example.com/image.jpg")
    print(description)
    
    # Analyze with custom prompt
    result = agent.analyze(
        "https://example.com/chart.png",
        prompt="What data does this chart show?"
    )
    
    # Compare images
    comparison = agent.compare([
        "image1.jpg",
        "image2.jpg"
    ])
    
    # Extract text
    text = agent.extract_text("document.png")

Source

View on GitHub

praisonaiagents/agent/vision_agent.py at line 48

Agents Concept

Single Agent Guide

Multi-Agent Guide

Agent Configuration

Auto Agents