Browser Agent

Control web browsers using AI agents through a Chrome Extension connected to PraisonAI.

Quick Start

1. Start the Bridge Server

praisonai browser start --port 8765 --model gpt-4o

2. Load the Chrome Extension

Open chrome://extensions
Enable “Developer mode”
Load unpacked extension from praisonai-chrome-extension/dist

3. Use the Side Panel

Click the extension icon or press Ctrl+Shift+P to open the side panel. Enter your goal and the AI will control the browser.

Architecture

Flow: Chrome Extension ↔ WebSocket ↔ Bridge Server ↔ PraisonAI Agent

The system consists of:

Chrome Extension: Captures page state and executes actions via CDP
Bridge Server: FastAPI WebSocket server that routes messages to agents
BrowserAgent: PraisonAI agent that decides actions based on observations
SessionManager: SQLite-based persistence for session history
Hybrid Mode: Falls back to on-device Gemini Nano if the bridge server is unavailable

Session Flow

Session States

Smart Features

Click Fallbacks

When clicks fail, the agent automatically tries:

Viewport click using getBoundingClientRect() + scrollIntoView()
JavaScript click via element.click()
Focus + Enter for buttons
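
The fallback order above can be sketched as a chain of strategies tried in turn. This is only an illustration of the pattern, not the extension's actual code: `click_with_fallbacks` and the strategy names are hypothetical, and each strategy is assumed to return True on success.

```python
from typing import Callable, List, Tuple

# Each strategy is a (name, callable) pair; the callable takes a selector
# and returns True if the click succeeded. All names here are hypothetical.
Strategy = Tuple[str, Callable[[str], bool]]


def click_with_fallbacks(selector: str, strategies: List[Strategy]) -> str:
    """Try each strategy in order and return the name of the first that works."""
    for name, attempt in strategies:
        try:
            if attempt(selector):
                return name
        except Exception:
            # A strategy that raises is treated the same as one that fails.
            continue
    raise RuntimeError(f"All click methods failed for: {selector}")


# Stand-in strategies for demonstration: the first two fail, the third succeeds.
demo_strategies: List[Strategy] = [
    ("viewport_click", lambda sel: False),
    ("javascript_click", lambda sel: False),
    ("focus_and_enter", lambda sel: True),
]

used = click_with_fallbacks("button#submit", demo_strategies)
print(used)
```

When every strategy fails, the raised error carries the selector, which is the kind of message that can then be passed back to the LLM.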

Goal Context & Self-Correction

Every observation sent to the LLM includes:

Original goal: Always visible to prevent drift
Action history: Last 5 actions with success/failure status
Progress notes: Summary of steps completed
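
One way to picture this context is a small object that keeps the goal fixed and a rolling window of the last five actions. The sketch below is an assumption about the general shape only; `ObservationContext` and its rendered text format are hypothetical and are not taken from the PraisonAI source.

```python
from collections import deque
from typing import Deque, Dict, List


class ObservationContext:
    """Keeps the goal, a rolling window of recent actions, and progress notes."""

    def __init__(self, goal: str, history_size: int = 5) -> None:
        self.goal = goal
        # deque(maxlen=...) silently drops the oldest entry once full.
        self.history: Deque[Dict[str, object]] = deque(maxlen=history_size)
        self.notes: List[str] = []

    def record(self, action: str, success: bool) -> None:
        self.history.append({"action": action, "success": success})

    def render(self) -> str:
        """Build the text block that would accompany each observation."""
        lines = [f"Original goal: {self.goal}", "Action history:"]
        for item in self.history:
            status = "ok" if item["success"] else "FAILED"
            lines.append(f"  - {item['action']} [{status}]")
        lines.append("Progress notes: " + "; ".join(self.notes))
        return "\n".join(lines)


ctx = ObservationContext("Search for praisonai")
for i in range(7):
    ctx.record(f"step-{i}", success=(i != 3))
ctx.notes.append("typed query")
print(ctx.render())
```

Because the goal is rendered first on every turn, it stays visible no matter how long the session runs, while older actions fall out of the window.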

Failure Communication

When actions fail, the LLM receives explicit feedback:

⛔ LAST ACTION FAILED!
   Error: All click methods failed for: a.MV3Tnb
   → You MUST try a DIFFERENT approach!

This enables the agent to self-correct and find alternate paths.
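
A feedback block like the one above could be produced by a small formatter. The function name `failure_feedback` is hypothetical; only the message text is taken from the example.

```python
from typing import Optional


def failure_feedback(error: Optional[str]) -> str:
    """Return the feedback text for the last action, or "" if it succeeded."""
    if error is None:
        return ""
    return (
        "⛔ LAST ACTION FAILED!\n"
        f"   Error: {error}\n"
        "   → You MUST try a DIFFERENT approach!"
    )


message = failure_feedback("All click methods failed for: a.MV3Tnb")
print(message)
```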

CLI Commands

Run Browser Agent

Execute a goal directly from CLI with live progress display:

praisonai browser run "Go to google and search praisonai"
praisonai browser run "Find flights to Paris" --model gpt-4o
praisonai browser run "task" --debug  # Show all WebSocket messages

Options:

--url, -u: Start URL (default: https://www.google.com)
--model, -m: LLM model (default: gpt-4o-mini)
--timeout, -t: Timeout in seconds (default: 120)
--debug, -d: Debug mode - show all events

Example Output:

🚀 Starting browser agent
   Goal: Go to google and search praisonai
   Model: gpt-4o-mini

Session: 4a703667

Step 0: ▶ TYPE → textarea#APjFqb
        📍 https://www.google.com/

Step 1: ▶ CLICK

Step 2: ▶ CLICK
        📍 https://www.google.com/search?q=praisonai

✅ Task completed!

Launch Browser with Goal

Launch Chrome with the extension and optionally run a goal:

# Just launch Chrome with extension
praisonai browser launch

# Launch and run goal
praisonai browser launch "Go to google and search AI"

# With specific engine
praisonai browser launch "Search for AI" --engine cdp
praisonai browser launch "Search for AI" --engine extension

Options:

--url, -u: Start URL (default: https://www.google.com)
--model, -m: LLM model (default: gpt-4o-mini)
--max-steps: Maximum steps (default: 20)
--engine: Automation engine: extension, cdp, auto (default: auto)
--debug, -d: Debug mode with detailed logging
--record-video: Record video of browser session
--profile: Enable performance profiling
--deep-profile: Enable deep profiling with cProfile

Performance Profiling

Track execution time per step to identify bottlenecks:

praisonai browser launch "Go to google, search for AI" --profile

Example Output:

📊 Performance Profile
──────────────────────────────────────────────────────────────────────
Total Time: 16.4s | Steps: 3 | Avg: 5.5s/step

Step |    LLM | Screen | Action | Verify | Stable |  Total
──────────────────────────────────────────────────────────────────────
   0 |   0.0s |   0.0s |   0.0s |   0.0s |   0.0s |   5.1s
   1 |   0.0s |   0.0s |   0.0s |   0.0s |   0.0s |   1.5s
   2 |   0.0s |   0.0s |   0.0s |   0.0s |   0.0s |   3.6s
──────────────────────────────────────────────────────────────────────
Total |   0.0s |   0.0s |   0.0s |   0.0s |   0.0s |  16.4s

Bottlenecks: LLM 0% | Verify 0% | Stable 0%
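
To show how a per-phase breakdown like this can be gathered, here is a minimal sketch using `time.perf_counter`. `StepProfiler` is a hypothetical helper written for this illustration; it is not the profiler behind `--profile`.

```python
import time
from collections import defaultdict
from contextlib import contextmanager
from typing import Dict, Iterator


class StepProfiler:
    """Accumulates wall-clock time per phase (e.g. "llm", "action")."""

    def __init__(self) -> None:
        self.totals: Dict[str, float] = defaultdict(float)

    @contextmanager
    def phase(self, name: str) -> Iterator[None]:
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record the time even if the wrapped code raises.
            self.totals[name] += time.perf_counter() - start

    def shares(self) -> Dict[str, float]:
        """Return each phase's percentage of the total measured time."""
        total = sum(self.totals.values())
        if total == 0:
            return {name: 0.0 for name in self.totals}
        return {name: 100.0 * t / total for name, t in self.totals.items()}


profiler = StepProfiler()
with profiler.phase("llm"):
    time.sleep(0.02)  # stands in for a model call
with profiler.phase("action"):
    time.sleep(0.01)  # stands in for a browser action

for name, pct in profiler.shares().items():
    print(f"{name}: {pct:.0f}%")
```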

For deep function-level profiling (cProfile):

praisonai browser launch "goal" --deep-profile

Tab Management

praisonai browser tabs              # List all tabs
praisonai browser tabs --new https://google.com  # Open new tab
praisonai browser tabs --close TAB_ID    # Close tab
praisonai browser tabs --focus TAB_ID    # Focus tab

Navigate

praisonai browser navigate "https://github.com"
praisonai browser navigate "https://docs.praison.ai" --tab TAB_ID

Execute JavaScript

praisonai browser execute "document.title"
praisonai browser execute "document.querySelectorAll('a').length"

Page Inspection (New)

Inspect browser pages without the extension:

# List all open pages
praisonai browser pages

# Get DOM tree
praisonai browser dom <PAGE_ID>

# Read page content as text
praisonai browser content <PAGE_ID>

# Capture console logs
praisonai browser console <PAGE_ID>

# Execute JavaScript
praisonai browser js <PAGE_ID> "document.title"

These commands work via CDP (Chrome DevTools Protocol) and require Chrome running with --remote-debugging-port=9222.
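
When Chrome runs with `--remote-debugging-port=9222`, it serves a list of debuggable targets as JSON at `http://127.0.0.1:9222/json/list`. The sketch below reads that endpoint with the standard library. Whether `praisonai browser pages` uses this exact endpoint is an assumption, and the helper names `parse_pages` and `list_pages` are hypothetical.

```python
import json
from typing import Dict, List
from urllib.request import urlopen


def parse_pages(payload: str) -> List[Dict[str, str]]:
    """Keep only targets of type "page" from a /json/list response body."""
    targets = json.loads(payload)
    return [
        {"id": t["id"], "title": t.get("title", ""), "url": t.get("url", "")}
        for t in targets
        if t.get("type") == "page"
    ]


def list_pages(port: int = 9222) -> List[Dict[str, str]]:
    """Query a locally running Chrome started with --remote-debugging-port."""
    with urlopen(f"http://127.0.0.1:{port}/json/list", timeout=5) as resp:
        return parse_pages(resp.read().decode("utf-8"))


# Offline demonstration with a hand-written sample payload.
sample = json.dumps([
    {"id": "A1", "type": "page", "title": "Google",
     "url": "https://www.google.com/"},
    {"id": "B2", "type": "service_worker", "title": "",
     "url": "chrome-extension://example/"},
])
pages = parse_pages(sample)
print(pages)
```

Calling `list_pages()` requires Chrome to be running with the debugging port open; the sample payload above lets the parsing logic be checked without it.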

Automation Engines

Choose different execution engines with --engine:

# Extension mode (default) - requires extension
praisonai browser run "task" --engine extension

# CDP mode - direct Chrome control, no extension needed
praisonai browser run "task" --engine cdp

# Playwright mode - cross-browser, headless support
praisonai browser run "task" --engine playwright

Engine	Extension	Headless	Multi-Browser
extension	Required	No	No
cdp	No	Yes	Chrome only
playwright	No	Yes	Chrome/Firefox/WebKit

Screenshot

praisonai browser screenshot -o page.png
praisonai browser screenshot --fullpage -o full.png

Start Server

praisonai browser start [OPTIONS]

Options:

--port, -p: Port to listen on (default: 8765)
--host, -H: Host to bind to (default: 0.0.0.0)
--model, -m: LLM model (default: gpt-4o-mini)
--max-steps: Maximum steps per session (default: 20)
--verbose, -v: Enable verbose logging

List Sessions

praisonai browser sessions [OPTIONS]

Options:

--status, -s: Filter by status (running, completed, failed)
--limit, -l: Maximum sessions to show

View History

praisonai browser history <SESSION_ID>

Clear Sessions

praisonai browser clear --status completed --yes

Reload Extension

Reload the Chrome extension after making changes:

praisonai browser reload
praisonai browser reload --port 9222  # Custom Chrome debug port

Health Diagnostics

Run health checks for the browser automation system:

praisonai browser doctor          # Run all checks
praisonai browser doctor server   # Check bridge server
praisonai browser doctor chrome   # Check Chrome debugging
praisonai browser doctor extension  # Check extension loaded
praisonai browser doctor db       # Check session database

Example Output:

Browser Health Check

✅ Server: ok
   Connections: 1
   Sessions: 0

✅ Chrome: Chrome/131.0.6778.85
   WebSocket: ws://127.0.0.1:9222/devtools/browser/...

✅ Extension loaded
   URL: chrome-extension://fkmfdklcegbbpipbcimb...

✅ Session database
   Path: ~/.praisonai/browser_sessions.db
   Sessions: 42
   Steps: 387

Python API

from praisonai.browser import BrowserServer, BrowserAgent

# Option 1: start the bridge server
server = BrowserServer(port=8765, model="gpt-4o")
server.start()  # Blocks until the server stops

# Option 2 (in a separate script, since start() blocks): create an agent directly
agent = BrowserAgent(model="gpt-4o")
action = agent.process_observation({
    "task": "Search for AI frameworks",
    "url": "https://google.com",
    "title": "Google",
    "elements": [{"selector": "#search", "tag": "input", "text": ""}]
})

Session Management

from praisonai.browser.sessions import SessionManager

manager = SessionManager()

# Create session
session = manager.create_session("Find best restaurants")
print(session["session_id"])

# List sessions
sessions = manager.list_sessions(status="running")

# Get session details with steps
details = manager.get_session(session["session_id"])
for step in details["steps"]:
    print(f"Step {step['step_number']}: {step['action']}")
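
For readers curious how SQLite-backed session history can work, the following is a rough sketch using only the standard library. The schema and helper functions are assumptions made for illustration; they mirror the field names used above (`session_id`, `steps`, `step_number`, `action`) but are not the actual `SessionManager` implementation.

```python
import sqlite3
import uuid
from typing import Any, Dict

# In-memory database for demonstration; a real store would use a file path.
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
conn.executescript("""
    CREATE TABLE sessions (
        session_id TEXT PRIMARY KEY,
        goal       TEXT NOT NULL,
        status     TEXT NOT NULL DEFAULT 'running'
    );
    CREATE TABLE steps (
        session_id  TEXT NOT NULL REFERENCES sessions(session_id),
        step_number INTEGER NOT NULL,
        action      TEXT NOT NULL
    );
""")


def create_session(goal: str) -> str:
    session_id = uuid.uuid4().hex[:8]
    conn.execute(
        "INSERT INTO sessions (session_id, goal) VALUES (?, ?)",
        (session_id, goal),
    )
    return session_id


def add_step(session_id: str, step_number: int, action: str) -> None:
    conn.execute(
        "INSERT INTO steps (session_id, step_number, action) VALUES (?, ?, ?)",
        (session_id, step_number, action),
    )


def get_session(session_id: str) -> Dict[str, Any]:
    """Return the session row plus its steps, ordered by step number."""
    row = conn.execute(
        "SELECT * FROM sessions WHERE session_id = ?", (session_id,)
    ).fetchone()
    if row is None:
        raise KeyError(session_id)
    steps = conn.execute(
        "SELECT step_number, action FROM steps "
        "WHERE session_id = ? ORDER BY step_number",
        (session_id,),
    ).fetchall()
    return {**dict(row), "steps": [dict(s) for s in steps]}


sid = create_session("Find best restaurants")
add_step(sid, 1, "click")
add_step(sid, 0, "type")
print(get_session(sid))
```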

Hybrid Mode (Extension)

The Chrome Extension supports hybrid mode:

Bridge Mode: Connect to PraisonAI server for cloud LLMs
Built-in Mode: Use Chrome’s Gemini Nano on-device

If the bridge server is unavailable, it automatically falls back to built-in AI.

Keyboard Shortcuts

Shortcut	Action
`Ctrl+Shift+P`	Toggle side panel
`Alt+A`	Start agent
`Alt+S`	Capture screenshot

Supported Actions

Action	Description
`click`	Click on element
`type`	Enter text
`submit`	Press Enter to submit forms
`scroll`	Scroll page
`navigate`	Go to URL
`clear_input`	Clear input field (fixes garbled/duplicated text)
`wait`	Wait for page
`screenshot`	Capture screen
`done`	Task complete

Error Detection & Recovery (v1.3+)

The agent automatically detects and recovers from errors:

Detected Errors

Garbled/duplicated text in input fields
Wrong page navigation (user or browser interference)
Failed actions (click not working, submit didn’t fire)
Blocking elements (popups, consent dialogs, login walls)

Recovery Actions

When errors are detected, the agent will:

Set error_detected: true with a description of the problem
Report input_field_value showing the text actually visible in the field
Use clear_input to fix garbled input
Use navigate to return to the correct URL if off-track
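
The recovery rules above can be read as a small decision function. The sketch below is hypothetical: `choose_recovery`, its parameters, and the `url` key on the navigate action are assumptions for illustration, while `input_field_value`, `clear_input`, and `navigate` come from the lists above.

```python
from typing import Dict, Optional


def choose_recovery(
    state: Dict[str, object],
    expected_text: Optional[str],
    expected_url_prefix: str,
) -> Optional[Dict[str, str]]:
    """Return a recovery action dict, or None if the state looks healthy."""
    url = str(state.get("url", ""))
    # Off-track navigation takes priority: fixing input on the wrong page is pointless.
    if not url.startswith(expected_url_prefix):
        return {"action": "navigate", "url": expected_url_prefix}
    shown = state.get("input_field_value")
    # Garbled or duplicated text: what the field shows differs from what was typed.
    if expected_text is not None and shown is not None and shown != expected_text:
        return {"action": "clear_input"}
    return None


garbled = {
    "url": "https://www.google.com/",
    "input_field_value": "praisonaipraisonai",
}
off_track = {"url": "https://example.com/", "input_field_value": "praisonai"}
healthy = {"url": "https://www.google.com/", "input_field_value": "praisonai"}

for state in (garbled, off_track, healthy):
    print(choose_recovery(state, "praisonai", "https://www.google.com"))
```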

Step Timestamps

Debug mode now shows elapsed time for each step:

praisonai browser launch "goal" --debug

Output:

[+0.0s] Step 1: type → #APjFqb = "search term" (done=False)
   📝 Input field shows: "search term"
   📊 Progress: 50% [✓ on track]

[+2.3s] Step 2: submit → #APjFqb (done=False)
   📊 Progress: 75% [✓ on track]

[+4.1s] Step 3: done (done=True)
   📊 Progress: 100% [✓ on track]
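
The `[+2.3s]` prefix is the time elapsed since the run started. A minimal sketch of producing such lines follows; `StepLogger` is a hypothetical helper, and the injected clock exists only to make the demonstration deterministic.

```python
import time
from typing import Callable


class StepLogger:
    """Formats step lines prefixed with the time elapsed since creation."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._start = clock()

    def line(self, step: int, description: str) -> str:
        elapsed = self._clock() - self._start
        return f"[+{elapsed:.1f}s] Step {step}: {description}"


# A fake clock stands in for real time so the output is repeatable.
ticks = iter([100.0, 100.0, 102.3, 104.1])
logger = StepLogger(clock=lambda: next(ticks))
lines = [
    logger.line(1, "type → #APjFqb"),
    logger.line(2, "submit → #APjFqb"),
    logger.line(3, "done"),
]
print("\n".join(lines))
```

Using a monotonic clock by default avoids negative or jumping offsets if the system time changes during a run.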

Performance Optimized

Action delays have been optimized for faster execution:

Click: 200ms (was 500ms)
Submit: 300ms (was 500ms)
Search: 400ms (was 1000ms)

WebSocket Protocol

Connect to ws://localhost:8765/ws and send/receive JSON messages:

// Start session
{"type": "start_session", "goal": "Find flights to Paris", "model": "gpt-4o"}

// Send observation
{"type": "observation", "session_id": "...", "task": "...", 
 "url": "...", "elements": [...]}

// Receive action
{"type": "action", "action": "click", "selector": "#search", 
 "thought": "Clicking search button"}

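As an illustration of the message shapes above, the sketch below builds a `start_session` message and parses an `action` reply. The helper names are hypothetical. The `run_once` coroutine uses the third-party `websockets` package and assumes the server replies with a JSON text frame, which the documentation above does not spell out.

```python
import json
from typing import Any, Dict


def start_session_message(goal: str, model: str) -> str:
    """Serialize a start_session message in the shape shown above."""
    return json.dumps({"type": "start_session", "goal": goal, "model": model})


def parse_action(raw: str) -> Dict[str, Any]:
    """Decode a server reply and check that it is an action message."""
    message = json.loads(raw)
    if message.get("type") != "action":
        raise ValueError(f"Expected an action message, got: {message.get('type')!r}")
    return message


async def run_once(goal: str, uri: str = "ws://localhost:8765/ws") -> None:
    # Imported lazily so the helpers above work without the package installed.
    import websockets  # pip install websockets

    async with websockets.connect(uri) as ws:
        await ws.send(start_session_message(goal, "gpt-4o"))
        # Assumption: the first reply is a JSON text frame.
        reply = await ws.recv()
        print(json.loads(reply))


# Offline check of the helpers using the action example from above.
raw = (
    '{"type": "action", "action": "click", "selector": "#search", '
    '"thought": "Clicking search button"}'
)
action = parse_action(raw)
print(action["action"], action["selector"])
```

To try it against a running bridge server, call `asyncio.run(run_once("Find flights to Paris"))`. Note that the `//` lines in the examples above are annotations and are not part of the JSON payloads.
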
Environment Variables

Variable	Description
`OPENAI_API_KEY`	OpenAI API key for GPT models
`ANTHROPIC_API_KEY`	Anthropic API key for Claude
`GOOGLE_API_KEY`	Google API key for Gemini
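
Before starting the server, it can help to confirm which of these keys are present. This is a small convenience sketch; `configured_providers` is a hypothetical helper and not part of PraisonAI.

```python
import os
from typing import Dict, Mapping

PROVIDER_KEYS = {
    "OPENAI_API_KEY": "OpenAI (GPT models)",
    "ANTHROPIC_API_KEY": "Anthropic (Claude)",
    "GOOGLE_API_KEY": "Google (Gemini)",
}


def configured_providers(env: Mapping[str, str] = os.environ) -> Dict[str, bool]:
    """Report which provider keys are set to a non-empty value."""
    return {name: bool(env.get(name, "").strip()) for name in PROVIDER_KEYS}


for name, is_set in configured_providers().items():
    print(f"{name} ({PROVIDER_KEYS[name]}): {'set' if is_set else 'missing'}")
```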
