Browser Agent

Control web browsers using AI agents through a Chrome Extension connected to PraisonAI.

Quick Start

1. Start the Bridge Server

praisonai browser start --port 8765 --model gpt-4o

2. Load the Chrome Extension

  1. Open chrome://extensions
  2. Enable “Developer mode”
  3. Load unpacked extension from praisonai-chrome-extension/dist

3. Use the Side Panel

Click the extension icon or press Ctrl+Shift+P to open the side panel. Enter your goal and the AI will control the browser.

Architecture

Flow: Chrome Extension ↔ WebSocket ↔ Bridge Server ↔ PraisonAI Agent

The system consists of:
  • Chrome Extension: Captures page state and executes actions via CDP
  • Bridge Server: FastAPI WebSocket server that routes messages to agents
  • BrowserAgent: PraisonAI agent that decides actions based on observations
  • SessionManager: SQLite-based persistence for session history
  • Hybrid Mode: Falls back to on-device Gemini Nano if the bridge server is unavailable

Session Flow

Session States

Smart Features

Click Fallbacks

When clicks fail, the agent automatically tries:
  1. Viewport click using getBoundingClientRect() + scrollIntoView()
  2. JavaScript click via element.click()
  3. Focus + Enter for buttons
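
The fallback order above can be sketched as a simple chain that tries each strategy in turn. This is a minimal illustration, not the extension's actual implementation: the function name `click_with_fallbacks`, the `ClickError` class, and the strategy callables are all hypothetical stand-ins for whatever drives the browser.

```python
from typing import Callable, Sequence, Tuple


class ClickError(Exception):
    """Raised when every click strategy has failed for a selector."""


def click_with_fallbacks(
    selector: str,
    strategies: Sequence[Tuple[str, Callable[[str], bool]]],
) -> str:
    """Try each (name, strategy) pair in order; return the name of the first success.

    A strategy signals failure by returning False or by raising an exception.
    """
    for name, strategy in strategies:
        try:
            if strategy(selector):
                return name
        except Exception:
            # An exception counts as a failed attempt; move on to the next strategy.
            continue
    raise ClickError(f"All click methods failed for: {selector}")
```

A caller would pass the strategies in the documented order (viewport click, JavaScript click, focus + Enter) and surface the `ClickError` message to the LLM when all of them fail.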

Goal Context & Self-Correction

Every observation sent to the LLM includes:
  • Original goal: Always visible to prevent drift
  • Action history: Last 5 actions with success/failure status
  • Progress notes: Summary of steps completed

Failure Communication

When actions fail, the LLM receives explicit feedback:
⛔ LAST ACTION FAILED!
   Error: All click methods failed for: a.MV3Tnb
   → You MUST try a DIFFERENT approach!
This enables the agent to self-correct and find alternate paths.
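
As a rough sketch of how the goal, the recent action history, and the failure notice might be combined into one block of text for the LLM, consider the following. The class names `ActionRecord` and `GoalContext` and the exact layout of the rendered text are assumptions made for illustration; only the ingredients (original goal, last 5 actions, progress notes, failure message) come from the description above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ActionRecord:
    description: str
    success: bool
    error: str = ""


@dataclass
class GoalContext:
    goal: str
    history: List[ActionRecord] = field(default_factory=list)
    progress_notes: str = ""

    def render(self, max_history: int = 5) -> str:
        """Build the context text: goal first, then recent actions, then any failure."""
        lines = [f"Original goal: {self.goal}"]
        recent = self.history[-max_history:]  # keep only the most recent actions
        if recent:
            lines.append("Recent actions:")
            for record in recent:
                status = "ok" if record.success else "FAILED"
                lines.append(f"  - {record.description} [{status}]")
        if self.progress_notes:
            lines.append(f"Progress: {self.progress_notes}")
        if recent and not recent[-1].success:
            # Make the failure impossible to overlook.
            lines.append("LAST ACTION FAILED!")
            lines.append(f"   Error: {recent[-1].error}")
            lines.append("   You MUST try a DIFFERENT approach!")
        return "\n".join(lines)
```

Keeping the goal on the first line of every observation is what guards against drift; capping the history keeps the prompt short while still showing the model what it has already tried.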

CLI Commands

Run Browser Agent

Execute a goal directly from CLI with live progress display:
praisonai browser run "Go to google and search praisonai"
praisonai browser run "Find flights to Paris" --model gpt-4o
praisonai browser run "task" --debug  # Show all WebSocket messages
Options:
  • --url, -u: Start URL (default: https://www.google.com)
  • --model, -m: LLM model (default: gpt-4o-mini)
  • --timeout, -t: Timeout in seconds (default: 120)
  • --debug, -d: Debug mode - show all events
Example Output:
🚀 Starting browser agent
   Goal: Go to google and search praisonai
   Model: gpt-4o-mini

Session: 4a703667

Step 0: ▶ TYPE → textarea#APjFqb
        📍 https://www.google.com/

Step 1: ▶ CLICK

Step 2: ▶ CLICK
        📍 https://www.google.com/search?q=praisonai

✅ Task completed!

Launch Browser with Goal

Launch Chrome with the extension and optionally run a goal:
# Just launch Chrome with extension
praisonai browser launch

# Launch and run goal
praisonai browser launch "Go to google and search AI"

# With specific engine
praisonai browser launch "Search for AI" --engine cdp
praisonai browser launch "Search for AI" --engine extension
Options:
  • --url, -u: Start URL (default: https://www.google.com)
  • --model, -m: LLM model (default: gpt-4o-mini)
  • --max-steps: Maximum steps (default: 20)
  • --engine: Automation engine: extension, cdp, auto (default: auto)
  • --debug, -d: Debug mode with detailed logging
  • --record-video: Record video of browser session
  • --profile: Enable performance profiling
  • --deep-profile: Enable deep profiling with cProfile

Performance Profiling

Track execution time per step to identify bottlenecks:
praisonai browser launch "Go to google, search for AI" --profile
Example Output:
📊 Performance Profile
──────────────────────────────────────────────────────────────────────
Total Time: 16.4s | Steps: 3 | Avg: 5.5s/step

Step |    LLM | Screen | Action | Verify | Stable |  Total
──────────────────────────────────────────────────────────────────────
   0 |   0.0s |   0.0s |   0.0s |   0.0s |   0.0s |   5.1s
   1 |   0.0s |   0.0s |   0.0s |   0.0s |   0.0s |   1.5s
   2 |   0.0s |   0.0s |   0.0s |   0.0s |   0.0s |   3.6s
──────────────────────────────────────────────────────────────────────
Total |   0.0s |   0.0s |   0.0s |   0.0s |   0.0s |  16.4s

Bottlenecks: LLM 0% | Verify 0% | Stable 0%
For deep function-level profiling (cProfile):
praisonai browser launch "goal" --deep-profile
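
To illustrate how a per-step table like the one printed by --profile could be reduced to totals, an average, and per-phase shares, here is a small sketch. The function name `summarize_profile` and the dictionary keys (`llm`, `screen`, `action`, `verify`, `stable`, `total`) are assumptions chosen to mirror the column headers; the profiler's real data structures may differ.

```python
from typing import Dict, List

# Phase names mirror the columns of the profile table (assumed keys).
PHASES = ("llm", "screen", "action", "verify", "stable")


def summarize_profile(steps: List[Dict[str, float]]) -> Dict[str, object]:
    """Aggregate per-step timings (seconds) into totals and percentage shares."""
    total = sum(step["total"] for step in steps)
    phase_totals = {p: sum(step.get(p, 0.0) for step in steps) for p in PHASES}
    shares = {
        p: (100.0 * t / total if total > 0 else 0.0)
        for p, t in phase_totals.items()
    }
    return {
        "total": total,
        "steps": len(steps),
        "average": total / len(steps) if steps else 0.0,
        "shares": shares,
        # The phase with the largest share is the likeliest bottleneck.
        "bottleneck": max(shares, key=shares.get) if total > 0 else None,
    }
```

A phase whose share dominates (for example, LLM latency) is the first place to look when a run feels slow.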

Tab Management

praisonai browser tabs              # List all tabs
praisonai browser tabs --new https://google.com  # Open new tab
praisonai browser tabs --close TAB_ID    # Close tab
praisonai browser tabs --focus TAB_ID    # Focus tab
praisonai browser navigate "https://github.com"
praisonai browser navigate "https://docs.praison.ai" --tab TAB_ID

Execute JavaScript

praisonai browser execute "document.title"
praisonai browser execute "document.querySelectorAll('a').length"

Page Inspection (New)

Inspect browser pages without the extension:
# List all open pages
praisonai browser pages

# Get DOM tree
praisonai browser dom <PAGE_ID>

# Read page content as text
praisonai browser content <PAGE_ID>

# Capture console logs
praisonai browser console <PAGE_ID>

# Execute JavaScript
praisonai browser js <PAGE_ID> "document.title"
These commands work via CDP (Chrome DevTools Protocol) and require Chrome running with --remote-debugging-port=9222.
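
For context, Chrome's remote debugging port exposes an HTTP endpoint, /json/list, that returns the open targets as JSON. The sketch below queries it with the standard library only. It is an illustration of the underlying mechanism and not necessarily how the praisonai commands are implemented; the helper names `parse_pages` and `list_pages` are hypothetical.

```python
import json
from typing import Dict, List
from urllib.request import urlopen


def parse_pages(payload: str) -> List[Dict[str, str]]:
    """Keep only targets of type 'page' from a /json/list response body."""
    targets = json.loads(payload)
    return [
        {"id": t["id"], "title": t.get("title", ""), "url": t.get("url", "")}
        for t in targets
        if t.get("type") == "page"
    ]


def list_pages(host: str = "127.0.0.1", port: int = 9222) -> List[Dict[str, str]]:
    """Fetch the target list from a Chrome started with --remote-debugging-port."""
    with urlopen(f"http://{host}:{port}/json/list", timeout=5) as response:
        return parse_pages(response.read().decode("utf-8"))
```

Calling `list_pages()` while Chrome is running with --remote-debugging-port=9222 returns one entry per open tab; each `id` is the kind of value the commands above take as <PAGE_ID>.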

Automation Engines

Choose different execution engines with --engine:
# Extension mode (default) - requires extension
praisonai browser run "task" --engine extension

# CDP mode - direct Chrome control, no extension needed
praisonai browser run "task" --engine cdp

# Playwright mode - cross-browser, headless support
praisonai browser run "task" --engine playwright
Engine     | Extension | Headless | Multi-Browser
─────────────────────────────────────────────────────────
extension  | Required  | No       | No
cdp        | No        | Yes      | Chrome only
playwright | No        | Yes      | Chrome/Firefox/WebKit

Screenshot

praisonai browser screenshot -o page.png
praisonai browser screenshot --fullpage -o full.png

Start Server

praisonai browser start [OPTIONS]
Options:
  • --port, -p: Port to listen on (default: 8765)
  • --host, -H: Host to bind to (default: 0.0.0.0)
  • --model, -m: LLM model (default: gpt-4o-mini)
  • --max-steps: Maximum steps per session (default: 20)
  • --verbose, -v: Enable verbose logging

List Sessions

praisonai browser sessions [OPTIONS]
Options:
  • --status, -s: Filter by status (running, completed, failed)
  • --limit, -l: Maximum sessions to show

View History

praisonai browser history <SESSION_ID>

Clear Sessions

praisonai browser clear --status completed --yes

Reload Extension

Reload the Chrome extension after making changes:
praisonai browser reload
praisonai browser reload --port 9222  # Custom Chrome debug port

Health Diagnostics

Run health checks for the browser automation system:
praisonai browser doctor          # Run all checks
praisonai browser doctor server   # Check bridge server
praisonai browser doctor chrome   # Check Chrome debugging
praisonai browser doctor extension  # Check extension loaded
praisonai browser doctor db       # Check session database
Example Output:
Browser Health Check

✅ Server: ok
   Connections: 1
   Sessions: 0

✅ Chrome: Chrome/131.0.6778.85
   WebSocket: ws://127.0.0.1:9222/devtools/browser/...

✅ Extension loaded
   URL: chrome-extension://fkmfdklcegbbpipbcimb...

✅ Session database
   Path: ~/.praisonai/browser_sessions.db
   Sessions: 42
   Steps: 387

Python API

from praisonai.browser import BrowserServer, BrowserAgent

# Option 1: start the bridge server
server = BrowserServer(port=8765, model="gpt-4o")
server.start()  # Blocks until the server stops

# Option 2: create an agent directly, without the server
agent = BrowserAgent(model="gpt-4o")
action = agent.process_observation({
    "task": "Search for AI frameworks",
    "url": "https://google.com",
    "title": "Google",
    "elements": [{"selector": "#search", "tag": "input", "text": ""}]
})

Session Management

from praisonai.browser.sessions import SessionManager

manager = SessionManager()

# Create session
session = manager.create_session("Find best restaurants")
print(session["session_id"])

# List sessions
sessions = manager.list_sessions(status="running")

# Get session details with steps
details = manager.get_session(session["session_id"])
for step in details["steps"]:
    print(f"Step {step['step_number']}: {step['action']}")

Hybrid Mode (Extension)

The Chrome Extension supports hybrid mode:
  1. Bridge Mode: Connect to PraisonAI server for cloud LLMs
  2. Built-in Mode: Use Chrome’s Gemini Nano on-device
If the bridge server is unavailable, it automatically falls back to built-in AI.
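
The fallback decision can be pictured as a reachability probe: try to open a connection to the bridge server and switch to built-in AI if that fails. The extension itself runs in the browser, so the following Python version is only an analogy under that assumption; the function name `choose_mode` and the mode labels are hypothetical.

```python
import socket


def choose_mode(host: str = "localhost", port: int = 8765, timeout: float = 1.0) -> str:
    """Return "bridge" if a TCP connection to the server succeeds, else "builtin"."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "bridge"
    except OSError:
        # Connection refused, timed out, or host unreachable: use on-device AI.
        return "builtin"
```

A short timeout matters here, since the user is waiting while the probe runs.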

Keyboard Shortcuts

Shortcut     | Action
─────────────────────────────────
Ctrl+Shift+P | Toggle side panel
Alt+A        | Start agent
Alt+S        | Capture screenshot

Supported Actions

Action      | Description
──────────────────────────────────────────────────────────────
click       | Click on element
type        | Enter text
submit      | Press Enter to submit forms
scroll      | Scroll page
navigate    | Go to URL
clear_input | Clear input field (fixes garbled/duplicated text)
wait        | Wait for page
screenshot  | Capture screen
done        | Task complete

Error Detection & Recovery (v1.3+)

The agent automatically detects and recovers from errors:

Detected Errors

  • Garbled/duplicated text in input fields
  • Wrong page navigation (user or browser interference)
  • Failed actions (click not working, submit didn’t fire)
  • Blocking elements (popups, consent dialogs, login walls)

Recovery Actions

When errors are detected, the agent will:
  1. Set error_detected: true with description
  2. Report input_field_value showing actual text visible
  3. Use clear_input to fix garbled input
  4. Use navigate to return to correct URL if off-track

Step Timestamps

Debug mode now shows elapsed time for each step:
praisonai browser launch "goal" --debug
Output:
[+0.0s] Step 1: type → #APjFqb = "search term" (done=False)
   📝 Input field shows: "search term"
   📊 Progress: 50% [✓ on track]

[+2.3s] Step 2: submit → #APjFqb (done=False)
   📊 Progress: 75% [✓ on track]

[+4.1s] Step 3: done (done=True)
   📊 Progress: 100% [✓ on track]

Performance Optimized

Action delays have been optimized for faster execution:
  • Click: 200ms (was 500ms)
  • Submit: 300ms (was 500ms)
  • Search: 400ms (was 1000ms)

WebSocket Protocol

Connect to ws://localhost:8765/ws and send/receive JSON messages:
// Start session
{"type": "start_session", "goal": "Find flights to Paris", "model": "gpt-4o"}

// Send observation
{"type": "observation", "session_id": "...", "task": "...", 
 "url": "...", "elements": [...]}

// Receive action
{"type": "action", "action": "click", "selector": "#search", 
 "thought": "Clicking search button"}

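A minimal Python client for the messages above might look like the following sketch. It assumes the third-party `websockets` package is installed and that the server replies with a JSON message after `start_session`; what that first reply contains is not specified here, so the code simply returns it. The helper names are hypothetical.

```python
import json
from typing import Any, Dict


def start_session_message(goal: str, model: str = "gpt-4o-mini") -> str:
    """Serialize a start_session message in the documented shape."""
    return json.dumps({"type": "start_session", "goal": goal, "model": model})


def parse_action(raw: str) -> Dict[str, Any]:
    """Decode a server message and check that it is an action."""
    message = json.loads(raw)
    if message.get("type") != "action":
        raise ValueError(f"Expected an action message, got: {message.get('type')!r}")
    return message


async def start_session(goal: str, uri: str = "ws://localhost:8765/ws") -> Dict[str, Any]:
    """Open a connection, start a session, and return the first reply as a dict."""
    import websockets  # third-party: pip install websockets

    async with websockets.connect(uri) as ws:
        await ws.send(start_session_message(goal))
        return json.loads(await ws.recv())
```

With the bridge server running, `asyncio.run(start_session("Find flights to Paris"))` sends the first message; `parse_action` can then validate the action messages received after each observation.
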
Environment Variables

Variable          | Description
──────────────────────────────────────────────────
OPENAI_API_KEY    | OpenAI API key for GPT models
ANTHROPIC_API_KEY | Anthropic API key for Claude
GOOGLE_API_KEY    | Google API key for Gemini
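
Before starting the server, it can help to check which of these keys are actually set. The following is a small standalone sketch; the function name `configured_providers` and the provider labels are illustrative and not part of PraisonAI.

```python
import os
from typing import List, Mapping, Optional

# Provider label -> environment variable, taken from the table above.
PROVIDER_KEYS = {
    "OpenAI": "OPENAI_API_KEY",
    "Anthropic": "ANTHROPIC_API_KEY",
    "Google": "GOOGLE_API_KEY",
}


def configured_providers(environ: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the providers whose API key variable is set to a non-blank value."""
    env = os.environ if environ is None else environ
    return [name for name, var in PROVIDER_KEYS.items() if env.get(var, "").strip()]
```

Calling `configured_providers()` with no argument inspects the current process environment; passing a mapping makes the check easy to test.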