Browser Agent
Control web browsers using AI agents through a Chrome Extension connected to PraisonAI.
Quick Start
1. Start the Bridge Server
praisonai browser start --port 8765 --model gpt-4o
2. Load the Chrome Extension
- Open chrome://extensions
- Enable "Developer mode"
- Load the unpacked extension from praisonai-chrome-extension/dist
3. Use the Side Panel
Click the extension icon or press Ctrl+Shift+P to open the side panel. Enter your goal and the AI will control the browser.
Architecture
Flow: Chrome Extension → WebSocket → Bridge Server → PraisonAI Agent
The system consists of:
- Chrome Extension: Captures page state and executes actions via CDP
- Bridge Server: FastAPI WebSocket server that routes messages to agents
- BrowserAgent: PraisonAI agent that decides actions based on observations
- SessionManager: SQLite-based persistence for session history
- Hybrid Mode: Falls back to on-device Gemini Nano if the server is unavailable
Smart Features
Click Fallbacks
When clicks fail, the agent automatically tries:
- Viewport click using getBoundingClientRect() + scrollIntoView()
- JavaScript click via element.click()
- Focus + Enter for buttons
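The fallback order above amounts to a chain that tries each strategy in turn and stops at the first success. The following is a minimal Python illustration of that pattern, not PraisonAI's implementation (the real clicks are executed by the extension via CDP). `click_with_fallbacks`, `ClickFailed`, and the strategy callables are hypothetical names; the error text mirrors the failure message quoted under Failure Communication.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical strategy signature: takes a CSS selector and returns True on
# success; returning False or raising both count as a failed attempt.
ClickStrategy = Callable[[str], bool]


class ClickFailed(Exception):
    """Raised when every fallback strategy has failed."""

    def __init__(self, selector: str, errors: List[str]):
        super().__init__(f"All click methods failed for: {selector}")
        self.errors = errors  # one entry per attempted strategy


def click_with_fallbacks(
    selector: str, strategies: Sequence[Tuple[str, ClickStrategy]]
) -> str:
    """Try each (name, strategy) in order and return the name of the first success."""
    errors: List[str] = []
    for name, strategy in strategies:
        try:
            if strategy(selector):
                return name
            errors.append(f"{name}: returned False")
        except Exception as exc:  # one broken method must not stop the chain
            errors.append(f"{name}: {exc}")
    raise ClickFailed(selector, errors)
```

Because a strategy can fail either by raising or by returning False, a misbehaving method never aborts the chain early, and the collected `errors` list records why each attempt failed.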
Goal Context & Self-Correction
Every observation sent to the LLM includes:
- Original goal: Always visible to prevent drift
- Action history: Last 5 actions with success/failure status
- Progress notes: Summary of steps completed
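One way to assemble this context is to keep a bounded history so that only the last five actions survive. The sketch below assumes the context is rendered as plain text; `GoalContext`, `ActionRecord`, and the exact line layout are hypothetical and only illustrate the three ingredients listed above.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, List


@dataclass
class ActionRecord:
    action: str
    success: bool


@dataclass
class GoalContext:
    goal: str
    # deque(maxlen=5) silently drops the oldest entry once five are stored
    history: Deque[ActionRecord] = field(default_factory=lambda: deque(maxlen=5))
    progress_notes: List[str] = field(default_factory=list)

    def record(self, action: str, success: bool) -> None:
        self.history.append(ActionRecord(action, success))

    def render(self) -> str:
        """Build the text block attached to every observation."""
        lines = [f"Original goal: {self.goal}", "Action history:"]
        for rec in self.history:
            status = "success" if rec.success else "FAILED"
            lines.append(f"  - {rec.action} [{status}]")
        lines.append("Progress notes: " + ("; ".join(self.progress_notes) or "none"))
        return "\n".join(lines)
```

Repeating the goal on every call costs a few tokens per step but keeps the model anchored even after many page transitions.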
Failure Communication
When actions fail, the LLM receives explicit feedback:
LAST ACTION FAILED!
Error: All click methods failed for: a.MV3Tnb
You MUST try a DIFFERENT approach!
This enables the agent to self-correct and find alternate paths.
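A sketch of how such feedback could be wired in: when the previous action reported an error, a banner is prepended to the next observation so the model sees it first. `failure_banner`, `build_prompt`, and the placement at the top of the prompt are assumptions; only the message wording comes from the example above.

```python
from typing import Optional


def failure_banner(last_error: Optional[str]) -> str:
    """Return feedback text for a failed action, or an empty string on success."""
    if last_error is None:
        return ""
    return (
        "LAST ACTION FAILED!\n"
        f"Error: {last_error}\n"
        "You MUST try a DIFFERENT approach!\n"
    )


def build_prompt(observation_text: str, last_error: Optional[str]) -> str:
    # Put the failure notice first so the model cannot overlook it.
    return failure_banner(last_error) + observation_text
```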
CLI Commands
Run Browser Agent
Execute a goal directly from CLI with live progress display:
praisonai browser run "Go to google and search praisonai"
praisonai browser run "Find flights to Paris" --model gpt-4o
praisonai browser run "task" --debug # Show all WebSocket messages
Options:
--url, -u: Start URL (default: https://www.google.com)
--model, -m: LLM model (default: gpt-4o-mini)
--timeout, -t: Timeout in seconds (default: 120)
--debug, -d: Debug mode - show all events
Example Output:
Starting browser agent
Goal: Go to google and search praisonai
Model: gpt-4o-mini
Session: 4a703667
Step 0: ▶ TYPE → textarea#APjFqb
https://www.google.com/
Step 1: ▶ CLICK
Step 2: ▶ CLICK
https://www.google.com/search?q=praisonai
✅ Task completed!
Launch Browser with Goal
Launch Chrome with the extension and optionally run a goal:
# Just launch Chrome with extension
praisonai browser launch
# Launch and run goal
praisonai browser launch "Go to google and search AI"
# With specific engine
praisonai browser launch "Search for AI" --engine cdp
praisonai browser launch "Search for AI" --engine extension
Options:
--url, -u: Start URL (default: https://www.google.com)
--model, -m: LLM model (default: gpt-4o-mini)
--max-steps: Maximum steps (default: 20)
--engine: Automation engine: extension, cdp, auto (default: auto)
--debug, -d: Debug mode with detailed logging
--record-video: Record video of browser session
--profile: Enable performance profiling
--deep-profile: Enable deep profiling with cProfile
Use --profile to track the execution time of each step and identify bottlenecks:
praisonai browser launch "Go to google, search for AI" --profile
Example Output:
Performance Profile
──────────────────────────────────────────────────────────────────────
Total Time: 16.4s | Steps: 3 | Avg: 5.5s/step
Step | LLM | Screen | Action | Verify | Stable | Total
──────────────────────────────────────────────────────────────────────
0 | 0.0s | 0.0s | 0.0s | 0.0s | 0.0s | 5.1s
1 | 0.0s | 0.0s | 0.0s | 0.0s | 0.0s | 1.5s
2 | 0.0s | 0.0s | 0.0s | 0.0s | 0.0s | 3.6s
──────────────────────────────────────────────────────────────────────
Total | 0.0s | 0.0s | 0.0s | 0.0s | 0.0s | 16.4s
Bottlenecks: LLM 0% | Verify 0% | Stable 0%
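In the example, the average appears to be the wall-clock total divided by the step count (16.4 s / 3 ≈ 5.5 s/step). Note that the total is larger than the sum of the per-step totals, so it presumably includes time outside the steps. The Python sketch below recomputes such a summary from per-step phase timings. The `StepTiming` structure, and the assumption that a bottleneck percentage is a phase's share of the wall-clock total, are our own and are not taken from the PraisonAI source.

```python
from dataclasses import dataclass
from typing import Dict, List

PHASES = ("llm", "screen", "action", "verify", "stable")


@dataclass
class StepTiming:
    """Seconds spent in each phase of one step (hypothetical structure)."""
    llm: float = 0.0
    screen: float = 0.0
    action: float = 0.0
    verify: float = 0.0
    stable: float = 0.0


def summarize(total_time: float, steps: List[StepTiming]) -> Dict[str, float]:
    """Average time per step plus each phase's share of the wall-clock total."""
    summary: Dict[str, float] = {"total": total_time}
    summary["avg_per_step"] = total_time / len(steps) if steps else 0.0
    for phase in PHASES:
        phase_sum = sum(getattr(s, phase) for s in steps)
        summary[f"{phase}_pct"] = 100.0 * phase_sum / total_time if total_time else 0.0
    return summary
```

For example, `summarize(16.4, [StepTiming()] * 3)["avg_per_step"]` formatted with one decimal gives `5.5`, matching the summary line above.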
For deep function-level profiling (cProfile):
praisonai browser launch "goal" --deep-profile
Tab Management
praisonai browser tabs # List all tabs
praisonai browser tabs --new https://google.com # Open new tab
praisonai browser tabs --close TAB_ID # Close tab
praisonai browser tabs --focus TAB_ID # Focus tab
Navigate
praisonai browser navigate "https://github.com"
praisonai browser navigate "https://docs.praison.ai" --tab TAB_ID
Execute JavaScript
praisonai browser execute "document.title"
praisonai browser execute "document.querySelectorAll('a').length"
Page Inspection (New)
Inspect browser pages without the extension:
# List all open pages
praisonai browser pages
# Get DOM tree
praisonai browser dom <PAGE_ID>
# Read page content as text
praisonai browser content <PAGE_ID>
# Capture console logs
praisonai browser console <PAGE_ID>
# Execute JavaScript
praisonai browser js <PAGE_ID> "document.title"
These commands use the Chrome DevTools Protocol (CDP) and require Chrome to be running with --remote-debugging-port=9222.
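For context, Chrome started with `--remote-debugging-port=9222` serves a JSON list of debuggable targets over HTTP at `http://127.0.0.1:9222/json/list`. The standard-library sketch below lists page targets from that endpoint. It illustrates the underlying CDP mechanism only and is not how `praisonai browser pages` is implemented; `parse_pages` and `list_pages` are hypothetical names.

```python
import json
import urllib.request
from typing import Any, Dict, List


def parse_pages(targets: List[Dict[str, Any]]) -> List[Dict[str, str]]:
    """Keep only 'page' targets, skipping service workers, extension pages, etc."""
    return [
        {"id": t["id"], "title": t.get("title", ""), "url": t.get("url", "")}
        for t in targets
        if t.get("type") == "page"
    ]


def list_pages(port: int = 9222) -> List[Dict[str, str]]:
    """Fetch the target list from Chrome's DevTools HTTP endpoint."""
    with urllib.request.urlopen(f"http://127.0.0.1:{port}/json/list", timeout=5) as resp:
        return parse_pages(json.load(resp))
```

With Chrome running in debug mode, `for page in list_pages(): print(page["id"], page["url"])` prints one line per open tab. The `id` field is the target identifier that CDP tools use to address a page.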
Automation Engines
Choose different execution engines with --engine:
# Extension mode (default) - requires extension
praisonai browser run "task" --engine extension
# CDP mode - direct Chrome control, no extension needed
praisonai browser run "task" --engine cdp
# Playwright mode - cross-browser, headless support
praisonai browser run "task" --engine playwright
| Engine | Extension | Headless | Multi-Browser |
|---|---|---|---|
| extension | Required | No | No |
| cdp | No | Yes | Chrome only |
| playwright | No | Yes | Chrome/Firefox/WebKit |
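The table can also be read as a capability lookup. The sketch below picks the first engine, in table order, that satisfies a set of requirements. It encodes only the table: this page does not describe how `--engine auto` actually decides, and `pick_engine` and `ENGINES` are hypothetical.

```python
from typing import Any, Dict, Optional

# Capabilities transcribed from the engine table.
ENGINES: Dict[str, Dict[str, Any]] = {
    "extension": {"needs_extension": True, "headless": False, "browsers": {"chrome"}},
    "cdp": {"needs_extension": False, "headless": True, "browsers": {"chrome"}},
    "playwright": {
        "needs_extension": False,
        "headless": True,
        "browsers": {"chrome", "firefox", "webkit"},
    },
}


def pick_engine(
    headless: bool = False, browser: str = "chrome", extension_loaded: bool = False
) -> Optional[str]:
    """Return the first engine that meets every requirement, or None."""
    for name, caps in ENGINES.items():
        if caps["needs_extension"] and not extension_loaded:
            continue
        if headless and not caps["headless"]:
            continue
        if browser.lower() not in caps["browsers"]:
            continue
        return name
    return None
```

For example, a headless Firefox run can only be served by `playwright`, while a headed Chrome session with the extension loaded resolves to `extension`.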
Screenshot
praisonai browser screenshot -o page.png
praisonai browser screenshot --fullpage -o full.png
Start Server
praisonai browser start [OPTIONS]
Options:
--port, -p: Port to listen on (default: 8765)
--host, -H: Host to bind to (default: 0.0.0.0)
--model, -m: LLM model (default: gpt-4o-mini)
--max-steps: Maximum steps per session (default: 20)
--verbose, -v: Enable verbose logging
List Sessions
praisonai browser sessions [OPTIONS]
Options:
--status, -s: Filter by status (running, completed, failed)
--limit, -l: Maximum sessions to show
View History
praisonai browser history <SESSION_ID>
Clear Sessions
praisonai browser clear --status completed --yes
Reload Extension
Reload the Chrome extension after making changes:
praisonai browser reload
praisonai browser reload --port 9222 # Custom Chrome debug port
Health Diagnostics
Run health checks for the browser automation system:
praisonai browser doctor # Run all checks
praisonai browser doctor server # Check bridge server
praisonai browser doctor chrome # Check Chrome debugging
praisonai browser doctor extension # Check extension loaded
praisonai browser doctor db # Check session database
Example Output:
Browser Health Check
✅ Server: ok
   Connections: 1
   Sessions: 0
✅ Chrome: Chrome/131.0.6778.85
   WebSocket: ws://127.0.0.1:9222/devtools/browser/...
✅ Extension loaded
   URL: chrome-extension://fkmfdklcegbbpipbcimb...
✅ Session database
   Path: ~/.praisonai/browser_sessions.db
   Sessions: 42
   Steps: 387
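The Chrome check corresponds to data Chrome exposes on its DevTools HTTP endpoint: `GET /json/version` returns the browser version under `Browser` and the browser-level WebSocket URL under `webSocketDebuggerUrl`. Below is a standard-library sketch of a similar check, assuming Chrome listens on 127.0.0.1. `check_chrome` and `summarize_version` are hypothetical names and are not part of `praisonai browser doctor`.

```python
import json
import urllib.request
from typing import Any, Dict


def summarize_version(info: Dict[str, Any]) -> Dict[str, str]:
    """Extract the fields shown in the health-check output from a /json/version reply."""
    return {
        "status": "ok",
        "browser": info.get("Browser", "unknown"),
        "websocket": info.get("webSocketDebuggerUrl", ""),
    }


def check_chrome(port: int = 9222, timeout: float = 3.0) -> Dict[str, str]:
    """Query Chrome's DevTools HTTP endpoint and report ok or error."""
    url = f"http://127.0.0.1:{port}/json/version"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            info = json.load(resp)
    except (OSError, ValueError) as exc:  # URLError is an OSError; bad JSON is a ValueError
        return {"status": "error", "detail": str(exc)}
    return summarize_version(info)
```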
Python API
from praisonai.browser import BrowserServer, BrowserAgent
# Start server
server = BrowserServer(port=8765, model="gpt-4o")
server.start() # Blocks
# Or create agent directly
agent = BrowserAgent(model="gpt-4o")
action = agent.process_observation({
"task": "Search for AI frameworks",
"url": "https://google.com",
"title": "Google",
"elements": [{"selector": "#search", "tag": "input", "text": ""}]
})
Session Management
from praisonai.browser.sessions import SessionManager
manager = SessionManager()
# Create session
session = manager.create_session("Find best restaurants")
print(session["session_id"])
# List sessions
sessions = manager.list_sessions(status="running")
# Get session details with steps
details = manager.get_session(session_id)
for step in details["steps"]:
print(f"Step {step['step_number']}: {step['action']}")
Hybrid Mode (Extension)
The Chrome Extension supports hybrid mode:
- Bridge Mode: Connect to PraisonAI server for cloud LLMs
- Built-in Mode: Use Chrome's Gemini Nano on-device
If the bridge server is unavailable, it automatically falls back to built-in AI.
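The fallback logic can be pictured as follows: ask the bridge first, and switch to the built-in model only when the bridge cannot be reached. Chrome extensions run JavaScript, so this Python version is only a language-neutral illustration. `decide_action` and the callable signatures are hypothetical, and treating any `OSError` as "bridge unavailable" is an assumption.

```python
from typing import Any, Callable, Dict, Tuple

# Hypothetical signature: observation in, action message out.
Decider = Callable[[Dict[str, Any]], Dict[str, Any]]


def decide_action(
    observation: Dict[str, Any], bridge: Decider, builtin: Decider
) -> Tuple[str, Dict[str, Any]]:
    """Return (mode, action), preferring the bridge server over on-device AI."""
    try:
        return "bridge", bridge(observation)
    except OSError:  # covers ConnectionError and TimeoutError
        return "builtin", builtin(observation)
```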
Keyboard Shortcuts
| Shortcut | Action |
|---|---|
| Ctrl+Shift+P | Toggle side panel |
| Alt+A | Start agent |
| Alt+S | Capture screenshot |
Supported Actions
| Action | Description |
|---|---|
| click | Click on element |
| type | Enter text |
| submit | Press Enter to submit forms |
| scroll | Scroll page |
| navigate | Go to URL |
| clear_input | Clear input field (fixes garbled/duplicated text) |
| wait | Wait for page |
| screenshot | Capture screen |
| done | Task complete |
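A client consuming these actions might validate them before executing them. In the sketch below, the action names come from the table and `selector` comes from the WebSocket example under WebSocket Protocol. The `text` and `url` field names, and the `REQUIRED_FIELDS` and `validate_action` helpers, are hypothetical.

```python
from typing import Any, Dict, List

SUPPORTED_ACTIONS = {
    "click", "type", "submit", "scroll", "navigate",
    "clear_input", "wait", "screenshot", "done",
}

# Assumed required fields; only "selector" is confirmed by the protocol example.
REQUIRED_FIELDS: Dict[str, List[str]] = {
    "click": ["selector"],
    "type": ["selector", "text"],
    "clear_input": ["selector"],
    "navigate": ["url"],
}


def validate_action(msg: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the action looks executable."""
    name = msg.get("action")
    if name not in SUPPORTED_ACTIONS:
        return [f"unsupported action: {name!r}"]
    return [
        f"{name} requires '{key}'"
        for key in REQUIRED_FIELDS.get(name, [])
        if not msg.get(key)
    ]
```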
Error Detection & Recovery (v1.3+)
The agent automatically detects and recovers from errors:
Detected Errors
- Garbled/duplicated text in input fields
- Wrong page navigation (user or browser interference)
- Failed actions (click not working, submit didn't fire)
- Blocking elements (popups, consent dialogs, login walls)
Recovery Actions
When errors are detected, the agent will:
- Set error_detected: true with a description
- Report input_field_value showing the actual text visible in the field
- Use clear_input to fix garbled input
- Use navigate to return to the correct URL if off-track
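The garbled-input recovery can be sketched as follows: if the reported input_field_value differs from the intended text, clear the field and type it again. The field names `error_detected` and `input_field_value` come from the list above. `error_description`, `text`, and `recovery_plan` are hypothetical, and exact string comparison is a simplification of whatever detection the agent really performs.

```python
from typing import Any, Dict, List


def recovery_plan(selector: str, expected: str, input_field_value: str) -> List[Dict[str, Any]]:
    """Return corrective actions when the visible input text differs from the intended text."""
    if input_field_value == expected:
        return []  # field already correct, nothing to fix
    return [
        {
            "action": "clear_input",
            "selector": selector,
            "error_detected": True,
            "error_description": f"input shows {input_field_value!r}, expected {expected!r}",
        },
        {"action": "type", "selector": selector, "text": expected},
    ]
```

For a duplicated entry such as `"aiai"` when `"ai"` was intended, the plan clears the field before retyping, rather than trying to edit the existing text in place.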
Step Timestamps
In debug mode, each step is prefixed with the time elapsed since the run started:
praisonai browser launch "goal" --debug
Output:
[+0.0s] Step 1: type → #APjFqb = "search term" (done=False)
   Input field shows: "search term"
   Progress: 50% [✓ on track]
[+2.3s] Step 2: submit → #APjFqb (done=False)
   Progress: 75% [✓ on track]
[+4.1s] Step 3: done (done=True)
   Progress: 100% [✓ on track]
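The `[+Ns]` prefix is the elapsed time since the run started. A minimal sketch of a logger that produces lines in this format follows. `StepLogger` is hypothetical. It uses a monotonic clock so that system clock adjustments cannot produce negative offsets, and the clock can be injected for testing.

```python
import time
from typing import Callable, Optional


class StepLogger:
    """Prefix each step line with seconds elapsed since the logger was created."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._clock = clock
        self._start = clock()

    def line(self, step: int, action: str, target: Optional[str] = None, done: bool = False) -> str:
        elapsed = self._clock() - self._start
        target_part = f" → {target}" if target else ""
        return f"[+{elapsed:.1f}s] Step {step}: {action}{target_part} (done={done})"
```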
Action delays have been optimized for faster execution:
- Click: 200ms (was 500ms)
- Submit: 300ms (was 500ms)
- Search: 400ms (was 1000ms)
WebSocket Protocol
Connect to ws://localhost:8765/ws and send/receive JSON messages:
// Start session
{"type": "start_session", "goal": "Find flights to Paris", "model": "gpt-4o"}
// Send observation
{"type": "observation", "session_id": "...", "task": "...",
"url": "...", "elements": [...]}
// Receive action
{"type": "action", "action": "click", "selector": "#search",
"thought": "Clicking search button"}
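The message shapes above can be built and parsed with the standard `json` module. The sketch below also includes an optional client coroutine that uses the third-party `websockets` package (imported only when the coroutine runs). The helper names are hypothetical, and because the reply to `start_session` is not documented here, the coroutine returns it undecoded beyond JSON parsing.

```python
import json
from typing import Any, Dict, List, Optional


def start_session_msg(goal: str, model: str = "gpt-4o") -> str:
    return json.dumps({"type": "start_session", "goal": goal, "model": model})


def observation_msg(session_id: str, task: str, url: str, elements: List[Dict[str, Any]]) -> str:
    return json.dumps({"type": "observation", "session_id": session_id,
                       "task": task, "url": url, "elements": elements})


def parse_action(raw: str) -> Optional[Dict[str, Any]]:
    """Return the decoded message if it is an action, else None."""
    msg = json.loads(raw)
    return msg if msg.get("type") == "action" else None


async def start_session(goal: str, uri: str = "ws://localhost:8765/ws") -> Dict[str, Any]:
    """Send start_session and return the server's first reply as a dict."""
    import websockets  # third-party; imported lazily so the helpers above work without it

    async with websockets.connect(uri) as ws:
        await ws.send(start_session_msg(goal))
        return json.loads(await ws.recv())
```

With the bridge server running, `asyncio.run(start_session("Find flights to Paris"))` sends the first message shown above and returns whatever the server sends back.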
Environment Variables
| Variable | Description |
|---|---|
| OPENAI_API_KEY | OpenAI API key for GPT models |
| ANTHROPIC_API_KEY | Anthropic API key for Claude |
| GOOGLE_API_KEY | Google API key for Gemini |
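Before starting a run, it can help to confirm that the key for the chosen model is set. The sketch below assumes that model names start with `gpt`, `claude`, or `gemini`, which is a heuristic rather than PraisonAI's routing logic. `KEY_FOR_PREFIX` and `missing_key` are hypothetical.

```python
import os
from typing import Mapping, Optional

# Assumed mapping from model-name prefix to the variable in the table above.
KEY_FOR_PREFIX = {
    "gpt": "OPENAI_API_KEY",
    "claude": "ANTHROPIC_API_KEY",
    "gemini": "GOOGLE_API_KEY",
}


def missing_key(model: str, env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the name of the required-but-unset variable for `model`, or None."""
    for prefix, var in KEY_FOR_PREFIX.items():
        if model.lower().startswith(prefix):
            return None if env.get(var) else var
    return None  # unknown prefix: nothing to check
```

For example, `missing_key("gpt-4o-mini")` returns `"OPENAI_API_KEY"` when that variable is unset, and returns `None` otherwise.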