The `profile` command provides detailed cProfile-based profiling of query execution, showing per-function and per-file timing, call graphs, and latency metrics.
## Quick Start

```bash
# Profile a query with detailed timing breakdown
praisonai profile query "What is 2+2?"

# Profile with file grouping
praisonai profile query "Hello" --show-files --limit 20

# Profile startup time
praisonai profile startup

# Profile import times
praisonai profile imports
```
## Subcommands

### Query

Profile a query execution with detailed timing breakdown.

```bash
praisonai profile query "Your prompt here"
```
Options:

| Option | Description |
|---|---|
| `--model`, `-m` | Model to use |
| `--stream`/`--no-stream` | Use streaming mode |
| `--deep` | Enable deep call tracing (higher overhead) |
| `--limit`, `-n` | Top N functions to show (default: 30) |
| `--sort`, `-s` | Sort by `cumulative` or `tottime` |
| `--show-files` | Group timing by file/module |
| `--show-callers` | Show caller functions |
| `--show-callees` | Show callee functions |
| `--importtime` | Show module import times |
| `--first-token` | Track time to first token (streaming) |
| `--save` | Save artifacts to path (`.prof`, `.txt`) |
| `--format`, `-f` | Output format: `text` or `json` |
Example:

```bash
praisonai profile query "Write a poem about AI" --show-files --limit 15
```
Output:

```
======================================================================
PraisonAI Profile Report
======================================================================
## System Information
Timestamp: 2025-12-31T17:37:46.662247Z
Python Version: 3.12.11
Platform: macOS-15.7.4-arm64-arm-64bit
PraisonAI: 2.9.2
Model: default

## Timing Breakdown
CLI Parse:          0.00 ms
Imports:          867.21 ms
Agent Construct:    0.06 ms
Model Init:         0.00 ms
Total Run:       2302.64 ms

## Per-Function Timing (Top Functions)
----------------------------------------------------------------------
Function             Calls    Cumulative (ms)    Self (ms)
----------------------------------------------------------------------
start                    1            2302.57         0.03
chat                     1            2302.54         0.03
_chat_completion         1            2302.45         0.02
...
======================================================================
```
### Imports

Profile module import times to identify slow imports.

```bash
praisonai profile imports
```
Output:

```
======================================================================
Import Time Analysis
======================================================================
Module                         Self (μs)    Cumul (μs)
----------------------------------------------------------------------
praisonaiagents                    12345        123456
praisonaiagents.agent               5432         98765
...
----------------------------------------------------------------------
Total import time: 123.45 ms
```
### Startup

Profile CLI startup time.

```bash
praisonai profile startup
```
Output:

```
==================================================
Startup Time Analysis
==================================================
Cold Start: 60.81 ms
Warm Start: 61.40 ms
==================================================
```
## Advanced Usage

### Deep Call Tracing

Enable deep call tracing for detailed call graph analysis:

```bash
praisonai profile query "Test" --deep --show-callers --show-callees
```

Deep call tracing adds significant overhead. Use it only for detailed debugging.
### Save Artifacts

Save profiling artifacts for later analysis:

```bash
praisonai profile query "Test" --save ./profile_results
```

This creates:

- `profile_results.prof` - Binary cProfile data (can be loaded with `pstats`)
- `profile_results.txt` - Human-readable report
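Since the `.prof` file is standard cProfile output, it can be explored later with Python's built-in `pstats` module. A minimal sketch:

```python
import pstats

# Load the binary profile saved by --save ./profile_results
stats = pstats.Stats("./profile_results.prof")

# Print the 20 most expensive functions by cumulative time
stats.sort_stats("cumulative").print_stats(20)

# Show which callers account for the hot functions
stats.print_callers(10)
```

Visualizers that understand cProfile data (e.g. snakeviz) can open the same file.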
### JSON Output

Get machine-readable output for CI/CD integration:

```bash
praisonai profile query "Test" --format json > profile.json
```
### Streaming with First Token Tracking

Track time to first token in streaming mode:

```bash
praisonai profile query "Test" --stream --first-token
```
## Python API

You can also use the profiler programmatically:

```python
from praisonai.cli.features.profiler import (
    ProfilerConfig,
    QueryProfiler,
    format_profile_report,
)

# Configure profiler
config = ProfilerConfig(
    deep=False,
    limit=20,
    show_files=True,
)

# Run profiled query
profiler = QueryProfiler(config)
result = profiler.profile_query("What is 2+2?", model="gpt-4o-mini")

# Print report
print(format_profile_report(result, config))

# Access timing data
print(f"Total time: {result.timing.total_run_ms:.2f} ms")
print(f"Imports: {result.timing.imports_ms:.2f} ms")
```
## Use Cases

### Identify Slow Imports

```bash
# Find which modules are slowing down startup
praisonai profile imports

# Profile with file grouping to find hotspots
praisonai profile query "Complex task" --show-files --limit 30
```
### Debug Latency Issues

```bash
# Track time to first token for streaming
praisonai profile query "Test" --stream --first-token
```
### CI/CD Integration

```bash
# Export JSON for automated analysis
praisonai profile query "Test" --format json --save ./ci_profile
```
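A CI job can then fail the build when timings drift past a budget. A minimal sketch; the key path used below (`timing` / `total_run_ms`) is an assumption modeled on the Python API's `result.timing` fields, so adjust it to the actual JSON schema:

```python
import json
import sys

# Assumed schema: a top-level "timing" object mirroring result.timing
with open("profile.json") as f:
    data = json.load(f)

BUDGET_MS = 5000  # example latency budget for this query
total_ms = data["timing"]["total_run_ms"]

if total_ms > BUDGET_MS:
    print(f"FAIL: query took {total_ms:.0f} ms (budget {BUDGET_MS} ms)")
    sys.exit(1)
print(f"OK: {total_ms:.0f} ms within budget")
```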
## Safety Notes

- Secrets (API keys, tokens) are redacted from profiling output
- Deep tracing is opt-in due to its overhead
- Prompts are not logged unless explicitly saved
- Safe by default: minimal overhead in normal mode
## Profile Suite

Run comprehensive profiling across multiple scenarios:

```bash
# Full suite (4 scenarios, 3 iterations each)
praisonai profile suite

# Quick mode (2 scenarios, 1 iteration)
praisonai profile suite --quick

# Custom output directory
praisonai profile suite --output ./my_profile_results

# More iterations for statistical significance
praisonai profile suite --iterations 5
```

Output Files:

- `suite_results.json` - Machine-readable JSON with all timing data
- `suite_report.txt` - Human-readable summary report
Scenarios Tested:

- `simple_non_stream` - Simple prompt, non-streaming
- `simple_stream` - Simple prompt, streaming
- `medium_non_stream` - Medium prompt, non-streaming
- `medium_stream` - Medium prompt, streaming
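`suite_results.json` lends itself to dashboards or trend tracking. A hedged sketch; the layout assumed here (scenario name mapped to a list of per-iteration totals) is illustrative only, so adapt the keys to the file's actual structure:

```python
import json
from statistics import mean

# Assumed layout: {"scenarios": {"simple_stream": [ms, ms, ms], ...}}
with open("./my_profile_results/suite_results.json") as f:
    suite = json.load(f)

for name, runs_ms in suite["scenarios"].items():
    print(f"{name}: mean {mean(runs_ms):.1f} ms over {len(runs_ms)} runs")
```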
## Observed Timing Breakdown

Based on profiling runs, here’s where time is spent:

| Phase | Time (ms) | % of Total |
|---|---|---|
| CLI Startup | 60-400 | 1-8% |
| Import `praisonaiagents` | 1300-2800 | 25-55% |
| Agent Construction | 0.1-1 | 0.1% |
| Model API Call | 2000-5000 | 40-70% |
| Total | 2300-7000 | 100% |
## Import Time Hotspots

Top modules by import time:

| Module | Time (ms) | Notes |
|---|---|---|
| `praisonaiagents` | 2700-3500 | Root import |
| `openai` | 1300-1400 | OpenAI SDK |
| `openai.types` | 1100-1200 | Type definitions |
| `openai.types.batch` | 600-700 | Batch types |
| `openai._models` | 250-650 | Pydantic models |
## Function Time Hotspots

Top functions by cumulative time:

| Function | File | Time (ms) |
|---|---|---|
| `start` | agent.py | 2300-6900 |
| `chat` | agent.py | 2300-6900 |
| `_chat_completion` | agent.py | 2300-6900 |
| `create` | completions.py | 2000-4200 |
| `send` | _client.py | 2000-4200 |
## Root Causes

- Heavy OpenAI SDK imports (~1.3s)
  - Pydantic model validation at import time
  - Type definitions loaded eagerly
- Network latency (~2-5s)
  - API round-trip dominates total time
  - Cannot be optimized locally
- Streaming vs non-streaming
  - Streaming shows faster time-to-first-token
  - Total time similar or slightly better
## Optimization Opportunities

Tier 0 (Safe, Fast Wins):

- Lazy import the OpenAI SDK only when needed (see the sketch after this list)
- Cache provider resolution

Tier 1 (Medium Effort):

- Preload common providers in the background
- Connection pooling for repeated calls

Tier 2 (Architectural):

- Optional “lite” mode without full type checking
- Async initialization pipeline
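To illustrate the Tier 0 lazy-import idea: the heavy SDK import is deferred until a code path actually needs the client, so invocations that never call the API skip the cost entirely. A generic sketch of the pattern, not PraisonAI's actual implementation:

```python
from functools import lru_cache

@lru_cache(maxsize=1)
def _openai_module():
    # The ~1.3s OpenAI SDK import is paid on first use, then cached.
    import openai
    return openai

def make_client(**kwargs):
    # Commands that never reach this line never pay the import cost.
    return _openai_module().OpenAI(**kwargs)
```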
## Performance Snapshots

Create baseline snapshots and compare against them to detect regressions:

```bash
# Create a baseline snapshot
praisonai profile snapshot --baseline

# Later, compare against baseline
praisonai profile snapshot current --compare

# Save snapshot with custom name
praisonai profile snapshot v2.0

# Get JSON output
praisonai profile snapshot --format json
```
Output (comparison):

```
======================================================================
Performance Comparison Report
======================================================================
Baseline: baseline (2025-01-01T00:00:00Z)
Current:  current (2025-01-02T00:00:00Z)
----------------------------------------------------------------------
Metric                 Baseline    Current       Diff         %
----------------------------------------------------------------------
Startup Cold (ms)        100.00     105.00      +5.00     +5.0%
Import Time (ms)         500.00     520.00     +20.00     +4.0%
Query Time (ms)         2000.00    2100.00    +100.00     +5.0%
----------------------------------------------------------------------
✅ No significant regression
======================================================================
```
## Performance Optimizations

Configure opt-in performance optimizations:

```bash
# Show current optimization status
praisonai profile optimize --show

# Enable provider pre-warming
praisonai profile optimize --prewarm

# Show lite mode configuration
praisonai profile optimize --lite
```
### Environment Variables

Enable optimizations via environment variables:

| Variable | Description |
|---|---|
| `PRAISONAI_LITE_MODE=1` | Enable lite mode (skip heavy validation) |
| `PRAISONAI_SKIP_TYPE_VALIDATION=1` | Skip type validation |
| `PRAISONAI_MINIMAL_IMPORTS=1` | Use minimal imports |
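When driving PraisonAI from Python rather than a shell, the same flags can be set in-process. A sketch, assuming the flags are read at import time and therefore must be set before the first `praisonai` import:

```python
import os

# Set optimization flags before any praisonai import so that
# import-time code paths can observe them.
os.environ["PRAISONAI_LITE_MODE"] = "1"
os.environ["PRAISONAI_MINIMAL_IMPORTS"] = "1"

import praisonai  # noqa: E402  (imported intentionally after env setup)
```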
### Optimization Tiers

Tier 0 (Always Safe):

- Provider/model resolution caching
- Lazy imports for heavy modules
- CLI startup path optimization

Tier 1 (Opt-in):

- Connection pooling for repeated API calls
- Provider pre-warming (background initialization)

Tier 2 (Opt-in, Architectural):

- Lite mode (skip expensive validation)
- Performance snapshot baselines