Enable your agents to speak and listen with text-to-speech (TTS) and speech-to-text (STT) tools.
Quick Start
# Enable TTS tool for your bot
praisonai bot telegram --token $TOKEN --tts
# Enable both TTS and STT
praisonai bot telegram --token $TOKEN --tts --stt
# Auto-convert all responses to speech
praisonai bot telegram --token $TOKEN --auto-tts
from praisonai.tools.audio import create_tts_tool, create_stt_tool
from praisonaiagents import Agent
# Create agent with audio tools
agent = Agent(
    name="voice-assistant",
    instructions="You can speak and listen.",
    tools=[create_tts_tool(), create_stt_tool()]
)
# Agent can now use tts() and stt() tools
response = agent.chat("Say hello in audio format")
Convert text to speech and get an audio file.
Usage
Direct Function
from praisonai.tools.audio import tts_tool
result = tts_tool("Hello world!", voice="alloy")
if result["success"]:
    print(result["audio_path"])  # /tmp/tts_abc123.mp3
    print(result["media_line"])  # MEDIA:/tmp/tts_abc123.mp3
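Judging by the example output, the `media_line` value is the audio path prefixed with `MEDIA:`. The following is a minimal sketch, assuming that format holds, of recovering the path from such a line. `parse_media_line` is a hypothetical helper written for illustration, not part of praisonai.

```python
from typing import Optional

# Assumption: the prefix is taken from the example output "MEDIA:/tmp/tts_abc123.mp3".
MEDIA_PREFIX = "MEDIA:"


def parse_media_line(line: str) -> Optional[str]:
    """Return the path from a 'MEDIA:<path>' line, or None if the line does not match."""
    line = line.strip()
    if not line.startswith(MEDIA_PREFIX):
        return None
    path = line[len(MEDIA_PREFIX):].strip()
    # An empty path after the prefix is treated as "no media".
    return path or None


print(parse_media_line("MEDIA:/tmp/tts_abc123.mp3"))  # /tmp/tts_abc123.mp3
```

A helper like this can be useful when a bot or log consumer sees only the text of a response and needs to find the attached audio file.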
Options
| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| text | str | required | Text to convert |
| voice | str | "alloy" | Voice: alloy, echo, fable, onyx, nova, shimmer |
| model | str | "openai/tts-1" | TTS model |
| output_format | str | "mp3" | Format: mp3, opus, aac, flac, wav |
| output_dir | str | temp dir | Directory to save audio |
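The options above can be checked before any audio is requested. The sketch below validates `voice` and `output_format` against the values listed in the table. `validate_tts_options` is a hypothetical helper, and the allowed sets are assumptions copied from this page; other providers may accept different voices.

```python
# Allowed values copied from the options table (OpenAI voices and formats).
VOICES = {"alloy", "echo", "fable", "onyx", "nova", "shimmer"}
FORMATS = {"mp3", "opus", "aac", "flac", "wav"}


def validate_tts_options(text: str, voice: str = "alloy", output_format: str = "mp3") -> dict:
    """Return a dict of checked options, raising ValueError on a bad value."""
    if not text or not text.strip():
        raise ValueError("text is required")
    if voice not in VOICES:
        raise ValueError(f"unknown voice {voice!r}; expected one of {sorted(VOICES)}")
    if output_format not in FORMATS:
        raise ValueError(f"unknown format {output_format!r}; expected one of {sorted(FORMATS)}")
    return {"text": text, "voice": voice, "output_format": output_format}


opts = validate_tts_options("Hello world!", voice="nova", output_format="wav")
print(opts)
```

The checked options could then be passed on, for example as `tts_tool(opts["text"], voice=opts["voice"], output_format=opts["output_format"])`, assuming `tts_tool` accepts the keyword names shown in the table.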
Transcribe audio files to text.
Usage
Direct Function
from praisonai.tools.audio import stt_tool
result = stt_tool("recording.mp3", language="en")
if result["success"]:
    print(result["text"])  # Transcribed text
Options
| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| audio_path | str | required | Path to audio file |
| language | str | auto | Language code (en, es, fr, etc.) |
| model | str | "openai/whisper-1" | STT model |
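The STT example only covers the success branch. The sketch below shows one way to handle both outcomes. It takes the transcription function as an argument so it can be exercised without network access. `transcribe_or_none` and `fake_stt` are hypothetical names, and the `"error"` key is an assumption: this page documents only `"success"` and `"text"`.

```python
from typing import Callable, Optional


def transcribe_or_none(stt: Callable[..., dict], audio_path: str,
                       language: Optional[str] = None) -> Optional[str]:
    """Call an stt_tool-like function and return the text, or None on failure."""
    # Omit language entirely when not given, so the tool can auto-detect it.
    kwargs = {"language": language} if language else {}
    result = stt(audio_path, **kwargs)
    if result.get("success"):
        return result.get("text")
    # Assumption: a failed result may carry an "error" message.
    print(f"Transcription failed: {result.get('error', 'unknown error')}")
    return None


def fake_stt(audio_path: str, language: Optional[str] = None) -> dict:
    """Stand-in for stt_tool, used only to demonstrate the helper."""
    if not audio_path.endswith(".mp3"):
        return {"success": False, "error": "unsupported file"}
    return {"success": True, "text": f"transcript of {audio_path}"}


print(transcribe_or_none(fake_stt, "recording.mp3", language="en"))
```

With the real tool, the call would presumably be `transcribe_or_none(stt_tool, "recording.mp3", language="en")`.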
Bot CLI Options
Enable audio tools when starting bots:
| Option | Description |
| --- | --- |
| --tts | Enable TTS tool |
| --tts-voice VOICE | Voice (alloy, echo, fable, onyx, nova, shimmer) |
| --tts-model MODEL | TTS model (default: openai/tts-1) |
| --auto-tts | Auto-convert all responses to speech |
| --stt | Enable STT tool |
| --stt-model MODEL | STT model (default: openai/whisper-1) |
Examples
# Basic TTS
praisonai bot telegram --token $TOKEN --tts
# Custom voice
praisonai bot telegram --token $TOKEN --tts --tts-voice nova
# Auto-TTS mode (all responses become audio)
praisonai bot telegram --token $TOKEN --auto-tts
# Full audio capabilities
praisonai bot telegram --token $TOKEN --tts --stt --auto-tts
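When the bot is launched from a script rather than typed by hand, the same flags can be assembled programmatically. The sketch below builds the argument list using only the options listed on this page. `build_bot_command` is a hypothetical helper, and it only constructs the command; it does not run it.

```python
import shlex
from typing import List, Optional


def build_bot_command(token: str, tts: bool = False, stt: bool = False,
                      auto_tts: bool = False, tts_voice: Optional[str] = None,
                      tts_model: Optional[str] = None,
                      stt_model: Optional[str] = None) -> List[str]:
    """Assemble a 'praisonai bot telegram' argument list from keyword options."""
    cmd = ["praisonai", "bot", "telegram", "--token", token]
    if tts:
        cmd.append("--tts")
    if tts_voice:
        cmd += ["--tts-voice", tts_voice]
    if tts_model:
        cmd += ["--tts-model", tts_model]
    if stt:
        cmd.append("--stt")
    if stt_model:
        cmd += ["--stt-model", stt_model]
    if auto_tts:
        cmd.append("--auto-tts")
    return cmd


cmd = build_bot_command("TOKEN", tts=True, tts_voice="nova")
print(shlex.join(cmd))  # praisonai bot telegram --token TOKEN --tts --tts-voice nova
```

Passing the list to `subprocess.run(cmd)` avoids shell quoting problems with the token. `shlex.join` is used here only to display the command.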
Supported Providers
Audio tools use the core AudioAgent, which supports multiple providers via LiteLLM:
TTS providers:

| Provider | Model | Notes |
| --- | --- | --- |
| OpenAI | openai/tts-1, openai/tts-1-hd | Default, high quality |
| Azure | azure/tts-1 | Enterprise |
| ElevenLabs | elevenlabs/eleven_multilingual_v2 | Premium voices |
| Gemini | gemini/gemini-2.5-flash-preview-tts | Google |
STT providers:

| Provider | Model | Notes |
| --- | --- | --- |
| OpenAI | openai/whisper-1 | Default, accurate |
| Azure | azure/whisper | Enterprise |
| Groq | groq/whisper-large-v3 | Fast |
| Deepgram | deepgram/nova-2 | Real-time |
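Every model identifier listed here follows a `provider/model` pattern. The sketch below splits an identifier into its two parts, which can help when checking that the matching provider credentials are configured. `split_model_id` is a hypothetical helper, and it assumes the provider is everything before the first slash.

```python
from typing import Tuple


def split_model_id(model: str) -> Tuple[str, str]:
    """Split 'provider/model' into (provider, model), raising ValueError if malformed."""
    # partition splits on the first "/" only, so model names containing slashes stay intact.
    provider, sep, name = model.partition("/")
    if not sep or not provider or not name:
        raise ValueError(f"expected 'provider/model', got {model!r}")
    return provider, name


print(split_model_id("groq/whisper-large-v3"))  # ('groq', 'whisper-large-v3')
```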
Voice Options
Available voices for OpenAI TTS:
| Voice | Description |
| --- | --- |
| alloy | Neutral, balanced (default) |
| echo | Warm, conversational |
| fable | Expressive, storytelling |
| onyx | Deep, authoritative |
| nova | Friendly, upbeat |
| shimmer | Clear, professional |
Architecture
Audio tools are in the wrapper layer (praisonai), not the core SDK. They wrap the core AudioAgent for easy use with agents and bots.