Knowledge Backends

PraisonAI supports multiple knowledge storage backends through a protocol-driven architecture. This allows you to choose the best backend for your use case while maintaining a consistent API.

Available Backends

Backend	Description	Best For
mem0 (default)	Long-term memory with semantic search	Multi-user apps, persistent memory
chroma	Local vector database	Development, single-user apps
internal	Built-in lightweight storage	Simple use cases

Agent-First Usage

The recommended way to use knowledge is through the Agent API:

from praisonaiagents import Agent

# Create agent with knowledge (uses mem0 by default)
agent = Agent(
    name="ResearchAssistant",
    instructions="You are a research assistant.",
    knowledge=["./documents/"],  # Add documents
    memory={"user_id": "user123"},  # Scope identifier required by the mem0 backend
)

# Chat automatically retrieves relevant context
response = agent.chat("What are the main findings?")

Scope Identifiers

Knowledge backends support three scope identifiers for multi-tenant isolation:

Identifier	Purpose	Example
`user_id`	Isolate per user	`"user_alice"`
`agent_id`	Isolate per agent type	`"research_agent_v1"`
`run_id`	Isolate per session	`"session_abc123"`

The mem0 backend requires at least one scope identifier. If none is provided, operations will fail with a ScopeRequiredError.
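You can think of the scope requirement as a guard that runs before any read or write. The sketch below is only an illustration. `require_scope` and the local `ScopeRequiredError` class are hypothetical stand-ins, not the classes exported by `praisonaiagents.knowledge`.

```python
from typing import Dict, Optional


class ScopeRequiredError(Exception):
    """Hypothetical stand-in for the library's ScopeRequiredError."""


def require_scope(user_id: Optional[str] = None,
                  agent_id: Optional[str] = None,
                  run_id: Optional[str] = None) -> Dict[str, str]:
    """Return the scope identifiers that were supplied, or raise if there are none."""
    scope = {
        name: value
        for name, value in (("user_id", user_id), ("agent_id", agent_id), ("run_id", run_id))
        if value  # treat None and "" as missing
    }
    if not scope:
        raise ScopeRequiredError("provide at least one of user_id, agent_id, or run_id")
    return scope


print(require_scope(user_id="alice"))  # {'user_id': 'alice'}

try:
    require_scope()
except ScopeRequiredError as exc:
    print(f"rejected: {exc}")  # rejected: provide at least one of user_id, agent_id, or run_id
```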

Example with Scope

from praisonaiagents import Agent

# User-scoped knowledge
agent = Agent(
    name="PersonalAssistant",
    instructions="You are a personal assistant.",
    knowledge=["./user_docs/"],
    memory={"user_id": "alice"},  # Knowledge scoped to Alice
)

# Agent-scoped knowledge (shared across users)
shared_agent = Agent(
    name="CompanyBot",
    instructions="You answer company policy questions.",
    knowledge=["./policies/"],
    agent_id="company_bot_v1",  # Shared knowledge
)

Identifier Naming Rules

SQL/CQL backends (PGVector, SingleStore, Cassandra) enforce strict identifier validation for security. Collection names and related identifiers may contain only ASCII letters, digits, and underscores. The whole value must match `[A-Za-z0-9_]+`, which blocks SQL injection through identifiers.

Affected Fields

Field	Backend	Description
`collection_name`	PGVector, SingleStore, Cassandra	Table/collection name
`schema`	PGVector	PostgreSQL schema name
`keyspace`	Cassandra	Cassandra keyspace name
`table_prefix`	All SQL backends	Prefix for generated table names

The mem0 and chroma backends do not apply this validation, so they accept identifiers outside this pattern. Chroma still enforces its own collection-name rules.

Passing Examples

# These identifiers pass validation
config = {
    "vector_store": {
        "provider": "pgvector",
        "config": {
            "collection_name": "user_docs",      # ✅ Valid
            "schema": "ai_knowledge",            # ✅ Valid
            "table_prefix": "knowledge_store_"   # ✅ Valid
        }
    }
}

Failing Examples

# These identifiers fail validation with ValueError
config = {
    "vector_store": {
        "provider": "pgvector", 
        "config": {
            "collection_name": "my-collection",    # ❌ Contains dash
            "schema": "public.docs",               # ❌ Contains dot
            "table_prefix": "data with spaces"     # ❌ Contains spaces
        }
    }
}
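Here is a minimal sketch of the validation rule. It assumes a full-string match against `[A-Za-z0-9_]+` and a `ValueError` on failure. `validate_identifier` and `validate_vector_store_config` are hypothetical names chosen for illustration, not PraisonAI's internal functions.

```python
import re

# One or more ASCII letters, digits, or underscores, and nothing else.
IDENTIFIER_PATTERN = re.compile(r"[A-Za-z0-9_]+")

# Identifier-like fields checked by the SQL/CQL backends (from the table above).
VALIDATED_FIELDS = ("collection_name", "schema", "keyspace", "table_prefix")


def validate_identifier(field: str, value: object) -> str:
    """Return value unchanged if it is a safe identifier, else raise ValueError."""
    if not isinstance(value, str) or IDENTIFIER_PATTERN.fullmatch(value) is None:
        raise ValueError(f"invalid {field}: {value!r} (allowed characters: [A-Za-z0-9_])")
    return value


def validate_vector_store_config(config: dict) -> None:
    """Validate every identifier-like field present under vector_store.config."""
    fields = config["vector_store"]["config"]
    for field in VALIDATED_FIELDS:
        if field in fields:
            validate_identifier(field, fields[field])


# The values from the failing example above are all rejected.
for bad in ("my-collection", "public.docs", "data with spaces"):
    try:
        validate_identifier("collection_name", bad)
    except ValueError as exc:
        print(exc)
```

The choice of `fullmatch` over `match` or `search` matters. A prefix check would accept `user_docs; DROP TABLE x`, because its first characters are valid.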

Security Context

This validation was added in PraisonAI 4.6.34 to address GHSA-3643-7v76-5cj2 (SQL identifier injection). Previously, user-controlled collection names were interpolated directly into SQL DDL/DML statements. For more details, see the security advisory.
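The sketch below shows why interpolating identifiers is dangerous. It uses Python's built-in `sqlite3` as a stand-in for PostgreSQL or SingleStore, and it is not PraisonAI's code. `executescript` plays the role of a driver that accepts several statements in one string, and the helper names are hypothetical. Placeholders such as `?` bind values, not table names. That is why an identifier has to be validated (or quoted by the driver) before it is written into the SQL text.

```python
import re
import sqlite3

IDENTIFIER_PATTERN = re.compile(r"[A-Za-z0-9_]+")


def create_collection_unsafe(conn: sqlite3.Connection, collection_name: str) -> None:
    # Vulnerable pattern: the name is pasted straight into the DDL text.
    conn.executescript(f"CREATE TABLE {collection_name} (id INTEGER, body TEXT)")


def create_collection_safe(conn: sqlite3.Connection, collection_name: str) -> None:
    # Patched pattern: reject the name before it ever reaches the SQL string.
    if IDENTIFIER_PATTERN.fullmatch(collection_name) is None:
        raise ValueError(f"invalid collection_name: {collection_name!r}")
    conn.execute(f"CREATE TABLE {collection_name} (id INTEGER, body TEXT)")


def table_exists(conn: sqlite3.Connection, name: str) -> bool:
    # Here the name is a value, so a normal "?" placeholder is safe.
    row = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?", (name,)
    ).fetchone()
    return row is not None


malicious = "docs (id INTEGER); DROP TABLE secrets; --"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE secrets (value TEXT)")
create_collection_unsafe(conn, malicious)
print(table_exists(conn, "secrets"))  # False: the injected DROP TABLE ran

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE secrets (value TEXT)")
try:
    create_collection_safe(conn, malicious)
except ValueError as exc:
    print(exc)
print(table_exists(conn, "secrets"))  # True: nothing was executed
```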

Direct Knowledge API

For advanced use cases, you can use the Knowledge class directly:

from praisonaiagents.knowledge import Knowledge

# Initialize with config
knowledge = Knowledge(config={
    "vector_store": {
        "provider": "chroma",
        "config": {
            "collection_name": "my_docs",
            "path": "./.praison/knowledge/my_docs",
        }
    }
})

# Add documents, scoped to the same user_id used for search
knowledge.add("./documents/", user_id="user123")

# Search
results = knowledge.search("query", user_id="user123", limit=10)

Normalization Guarantees

PraisonAI normalizes all backend results to ensure consistent behavior:

`metadata` is always a dict (never `None`)
`text` is always present (for mem0, it is mapped from the `memory` field)
`score` is always a float (defaults to `0.0`)

This means you can read metadata fields without first checking for None:

# Safe: normalization guarantees metadata is a dict, even if the backend stored None
for result in results['results']:
    source = result['metadata'].get('source', 'unknown')
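The guarantees above amount to a small mapping applied to each raw hit. `normalize_result` below is a hypothetical helper that sketches this mapping under the stated rules. It is not the function PraisonAI actually uses, and the sample hit is made up.

```python
from typing import Any, Dict


def normalize_result(raw: Dict[str, Any]) -> Dict[str, Any]:
    """Apply the normalization guarantees to one raw backend hit (hypothetical helper)."""
    result = dict(raw)  # shallow copy so the backend's own dict is not mutated

    # mem0 stores the content under "memory"; expose it as "text".
    text = result.get("text")
    if text is None:
        text = result.get("memory") or ""
    result["text"] = text

    # Missing or None metadata becomes an empty dict.
    result["metadata"] = result.get("metadata") or {}

    # Missing or None score becomes 0.0; anything else is coerced to float.
    score = result.get("score")
    result["score"] = float(score) if score is not None else 0.0
    return result


raw_hit = {"id": "m1", "memory": "example memory text", "metadata": None}
hit = normalize_result(raw_hit)
print(hit["text"], hit["metadata"], hit["score"])  # example memory text {} 0.0
print(hit["metadata"].get("source", "unknown"))    # unknown
```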

Protocol-Driven Architecture

All backends implement the KnowledgeStoreProtocol:

from praisonaiagents.knowledge import KnowledgeStoreProtocol

class MyCustomBackend:
    """Custom backend implementing the protocol."""
    
    def search(self, query, *, user_id=None, agent_id=None, run_id=None, **kwargs):
        # Your implementation
        pass
    
    def add(self, content, *, user_id=None, agent_id=None, run_id=None, **kwargs):
        # Your implementation
        pass
    
    # ... other methods
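To show how scope identifiers flow through a backend, here is a toy in-memory store written against a hypothetical two-method protocol built with `typing.Protocol`. `SimpleKnowledgeStore` and `InMemoryStore` are illustrative names only. The real `KnowledgeStoreProtocol` has further methods (elided above) and may use different return shapes.

```python
from typing import Any, Dict, List, Optional, Protocol, runtime_checkable


@runtime_checkable
class SimpleKnowledgeStore(Protocol):
    """Hypothetical two-method subset of a scoped knowledge-store protocol."""

    def add(self, content: str, *, user_id: Optional[str] = None,
            agent_id: Optional[str] = None, run_id: Optional[str] = None,
            **kwargs: Any) -> Dict[str, Any]: ...

    def search(self, query: str, *, user_id: Optional[str] = None,
               agent_id: Optional[str] = None, run_id: Optional[str] = None,
               **kwargs: Any) -> Dict[str, Any]: ...


class InMemoryStore:
    """Toy backend: case-insensitive substring search, filtered by scope."""

    def __init__(self) -> None:
        self._items: List[Dict[str, Any]] = []

    def add(self, content, *, user_id=None, agent_id=None, run_id=None, **kwargs):
        item = {
            "id": str(len(self._items)),
            "text": content,
            "metadata": dict(kwargs.get("metadata") or {}),
            "scope": {"user_id": user_id, "agent_id": agent_id, "run_id": run_id},
        }
        self._items.append(item)
        return {"results": [{"id": item["id"], "text": content}]}

    def search(self, query, *, user_id=None, agent_id=None, run_id=None,
               limit=10, **kwargs):
        requested = {"user_id": user_id, "agent_id": agent_id, "run_id": run_id}
        wanted = {k: v for k, v in requested.items() if v is not None}
        hits = []
        for item in self._items:
            # Skip items stored under a different value for any requested scope.
            # An unscoped call sees everything; mem0 would reject it instead.
            if any(item["scope"][k] != v for k, v in wanted.items()):
                continue
            if query.lower() in item["text"].lower():
                hits.append({"id": item["id"], "text": item["text"],
                             "metadata": item["metadata"], "score": 1.0})
        return {"results": hits[:limit]}


store = InMemoryStore()
print(isinstance(store, SimpleKnowledgeStore))  # True
store.add("Alice prefers email", user_id="alice")
store.add("Bob prefers phone", user_id="bob")
print([hit["text"] for hit in store.search("prefers", user_id="alice")["results"]])
# ['Alice prefers email']
```

`Protocol` is structural, so `InMemoryStore` never inherits from `SimpleKnowledgeStore`. The `isinstance` check succeeds because `@runtime_checkable` only verifies that the named methods exist. It does not check their signatures.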

Configuration Options

mem0 Backend (Default)

config = {
    "vector_store": {
        "provider": "qdrant",  # mem0 uses qdrant by default
        "config": {
            "collection_name": "my_collection",
        }
    }
}

Chroma Backend

config = {
    "vector_store": {
        "provider": "chroma",
        "config": {
            "collection_name": "my_collection",
            "path": "./.praison/knowledge/my_collection",
        }
    }
}

Error Handling

from praisonaiagents.knowledge import (
    ScopeRequiredError,
    BackendNotAvailableError,
)

try:
    results = knowledge.search("query")  # Missing scope!
except ScopeRequiredError as e:
    print(f"Please provide user_id, agent_id, or run_id: {e}")
except BackendNotAvailableError as e:
    print(f"Backend not available: {e}")

Best Practices

Always provide a scope identifier when using the mem0 backend
Use user_id for user-specific data (multi-tenant apps)
Use agent_id for knowledge shared by an agent (company policies, FAQs)
Use run_id for ephemeral session data (conversation context)
Prefer the Agent API over the direct Knowledge API for most use cases
