Uses embeddings to split text at semantic boundaries. Groups related content together for better retrieval.
Quick Start
Agent with Semantic Chunking
from praisonaiagents import Agent

agent = Agent(
    instructions="Answer questions from research papers.",
    knowledge={
        "sources": ["papers/"],
        "chunker": {
            "type": "semantic",
            "chunk_size": 512,
            "embedding_model": "all-MiniLM-L6-v2"
        }
    }
)

response = agent.start("What methodology did they use?")
When to Use
Good For
Research papers
Topic-dense content
Multi-subject documents
Quality over speed
Consider Alternatives
Speed-critical pipelines
Uniform chunk sizes needed
Simple structured content
Very short documents
Parameters
Parameter        Type  Default  Description
chunk_size       int   512      Max tokens per chunk
embedding_model  str   auto     Model for semantic similarity
Examples
Research Analysis
agent = Agent(
    instructions="Analyze academic papers.",
    knowledge={
        "sources": ["research/"],
        "chunker": {
            "type": "semantic",
            "chunk_size": 512,
            "embedding_model": "all-MiniLM-L6-v2"
        }
    }
)
Knowledge Base
agent = Agent(
    instructions="Answer from knowledge base.",
    knowledge={
        "sources": ["wiki/", "faq.txt"],
        "chunker": {
            "type": "semantic",
            "chunk_size": 256  # Smaller for precise retrieval
        }
    }
)
How It Works
Semantic chunking:
Splits document into sentences
Generates embeddings for each sentence
Groups consecutive similar sentences
Creates new chunk when topic changes
Semantic chunking requires computing embeddings and is slower than token/sentence chunking. Use for quality-sensitive applications where retrieval accuracy matters more than speed.
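The four steps above can be sketched in a few lines. This is a minimal illustration, not the library's implementation: it uses a toy bag-of-words embedding as a stand-in for a real sentence-embedding model, and a hypothetical `threshold` parameter to decide when the topic has changed.

```python
import math
import re

def embed(sentence):
    """Toy embedding: a word-count vector (real chunkers use model embeddings)."""
    counts = {}
    for word in re.findall(r"\w+", sentence.lower()):
        counts[word] = counts.get(word, 0) + 1
    return counts

def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(v * b.get(w, 0) for w, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def semantic_chunks(text, threshold=0.2):
    # 1. Split the document into sentences.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    # 2. Generate an embedding for each sentence.
    vectors = [embed(s) for s in sentences]
    # 3. Group consecutive similar sentences;
    # 4. start a new chunk when similarity drops (topic change).
    chunks, current = [], [sentences[0]]
    for prev, cur, sent in zip(vectors, vectors[1:], sentences[1:]):
        if cosine(prev, cur) >= threshold:
            current.append(sent)
        else:
            chunks.append(" ".join(current))
            current = [sent]
    chunks.append(" ".join(current))
    return chunks

text = "Cats purr. Cats sleep a lot. Stock markets fell today. Markets are volatile."
print(semantic_chunks(text, threshold=0.1))
```

With the sample text, the two cat sentences share vocabulary and land in one chunk, while the market sentences form a second chunk. A production chunker would also enforce `chunk_size` by splitting oversized groups.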
Embedding Models
The default embedding model is all-MiniLM-L6-v2. You can use any model supported by the chonkie library:
knowledge={
    "sources": ["docs/"],
    "chunker": {
        "type": "semantic",
        "embedding_model": "all-MiniLM-L6-v2"  # Default
        # Or: "text-embedding-3-small", etc.
    }
}