Enable your agents to speak and listen with TTS and STT tools.

Quick Start

# Enable TTS tool for your bot
praisonai bot telegram --token $TOKEN --tts

# Enable both TTS and STT
praisonai bot telegram --token $TOKEN --tts --stt

# Auto-convert all responses to speech
praisonai bot telegram --token $TOKEN --auto-tts

TTS Tool

Convert text to speech and get an audio file.

Usage

from praisonai.tools.audio import tts_tool

result = tts_tool("Hello world!", voice="alloy")

if result["success"]:
    print(result["audio_path"])   # /tmp/tts_abc123.mp3
    print(result["media_line"])   # MEDIA:/tmp/tts_abc123.mp3
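The `media_line` value in the example above appears to be the audio path with a `MEDIA:` prefix. A minimal sketch of recovering the path from such a line follows; the helper name `media_line_to_path` is hypothetical, and the prefix format is inferred only from the sample output shown here.

```python
from pathlib import Path
from typing import Optional

# Prefix inferred from the sample output above (an assumption, not a documented constant).
MEDIA_PREFIX = "MEDIA:"


def media_line_to_path(media_line: str) -> Optional[Path]:
    """Return the file path carried by a MEDIA: line, or None if the line is not one."""
    line = media_line.strip()
    if not line.startswith(MEDIA_PREFIX):
        return None
    raw = line[len(MEDIA_PREFIX):].strip()
    # An empty remainder means the line carried no path at all.
    return Path(raw) if raw else None
```

This can help when a bot handler receives plain text and needs to decide whether to send it as a message or attach the referenced audio file.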

Options

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| text | str | required | Text to convert |
| voice | str | "alloy" | Voice: alloy, echo, fable, onyx, nova, shimmer |
| model | str | "openai/tts-1" | TTS model |
| output_format | str | "mp3" | Format: mp3, opus, aac, flac, wav |
| output_dir | str | temp dir | Directory to save audio |
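Checking options before making a paid API call can catch typos early. The sketch below validates the documented values and returns keyword arguments for `tts_tool`. The helper `build_tts_kwargs` is hypothetical. It assumes the options can be passed as keyword arguments, and that the listed voices apply only to `openai/` models (other providers may accept different voice names).

```python
from typing import Dict, Optional

# Values copied from the options table above.
OPENAI_VOICES = {"alloy", "echo", "fable", "onyx", "nova", "shimmer"}
OUTPUT_FORMATS = {"mp3", "opus", "aac", "flac", "wav"}


def build_tts_kwargs(
    voice: str = "alloy",
    model: str = "openai/tts-1",
    output_format: str = "mp3",
    output_dir: Optional[str] = None,
) -> Dict[str, str]:
    """Validate TTS options and return them as keyword arguments."""
    # Only check voice names for OpenAI models (assumption: other providers differ).
    if model.startswith("openai/") and voice not in OPENAI_VOICES:
        raise ValueError(f"unknown OpenAI voice: {voice!r}")
    if output_format not in OUTPUT_FORMATS:
        raise ValueError(f"unsupported output format: {output_format!r}")
    kwargs = {"voice": voice, "model": model, "output_format": output_format}
    # Leave output_dir out so the tool falls back to its temp directory.
    if output_dir is not None:
        kwargs["output_dir"] = output_dir
    return kwargs
```

A call might then look like `tts_tool("Hello world!", **build_tts_kwargs(voice="nova", output_format="wav"))`.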

STT Tool

Transcribe audio files to text.

Usage

from praisonai.tools.audio import stt_tool

result = stt_tool("recording.mp3", language="en")

if result["success"]:
    print(result["text"])  # Transcribed text

Options

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| audio_path | str | required | Path to audio file |
| language | str | auto | Language code (en, es, fr, etc.) |
| model | str | "openai/whisper-1" | STT model |
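A small wrapper can turn the result dictionary into either a transcript or `None`. This is a sketch only: the helper `transcribe` is hypothetical, it takes the STT function as an argument so it can be exercised without network access, and it assumes that omitting `language` leaves the tool on its documented `auto` default. The shape of a failed result beyond the `success` key is not documented here, so nothing else is read from it.

```python
from typing import Callable, Optional


def transcribe(
    audio_path: str,
    stt: Callable[..., dict],
    language: Optional[str] = None,
) -> Optional[str]:
    """Call an STT function and return the transcript, or None on failure."""
    # Pass language only when the caller set it, so the tool's default applies otherwise.
    kwargs = {} if language is None else {"language": language}
    result = stt(audio_path, **kwargs)
    if result.get("success"):
        return result.get("text")
    return None
```

With the real tool this would be called as `transcribe("recording.mp3", stt_tool, language="en")`.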

Bot CLI Options

Enable audio tools when starting bots:
| Option | Description |
| --- | --- |
| --tts | Enable TTS tool |
| --tts-voice VOICE | Voice (alloy, echo, fable, onyx, nova, shimmer) |
| --tts-model MODEL | TTS model (default: openai/tts-1) |
| --auto-tts | Auto-convert all responses to speech |
| --stt | Enable STT tool |
| --stt-model MODEL | STT model (default: openai/whisper-1) |

Examples

# Basic TTS
praisonai bot telegram --token $TOKEN --tts

# Custom voice
praisonai bot telegram --token $TOKEN --tts --tts-voice nova

# Auto-TTS mode (all responses become audio)
praisonai bot telegram --token $TOKEN --auto-tts

# Full audio capabilities
praisonai bot telegram --token $TOKEN --tts --stt --auto-tts
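When the bot is launched from a script rather than a shell, the same flags can be assembled as an argument list. The sketch below uses only the options listed above; the function `build_bot_command` and its parameter names are hypothetical, and `telegram` is the only platform shown in this page.

```python
from typing import List, Optional


def build_bot_command(
    token: str,
    platform: str = "telegram",
    tts: bool = False,
    stt: bool = False,
    auto_tts: bool = False,
    tts_voice: Optional[str] = None,
    tts_model: Optional[str] = None,
    stt_model: Optional[str] = None,
) -> List[str]:
    """Build the argv list for `praisonai bot` with the audio flags documented above."""
    cmd = ["praisonai", "bot", platform, "--token", token]
    if tts:
        cmd.append("--tts")
    if tts_voice:
        cmd += ["--tts-voice", tts_voice]
    if tts_model:
        cmd += ["--tts-model", tts_model]
    if auto_tts:
        cmd.append("--auto-tts")
    if stt:
        cmd.append("--stt")
    if stt_model:
        cmd += ["--stt-model", stt_model]
    return cmd
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`, which avoids shell quoting problems with the token.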

Supported Providers

Audio tools use the core AudioAgent, which supports multiple providers via LiteLLM:
TTS providers:

| Provider | Model | Notes |
| --- | --- | --- |
| OpenAI | openai/tts-1, openai/tts-1-hd | Default, high quality |
| Azure | azure/tts-1 | Enterprise |
| ElevenLabs | elevenlabs/eleven_multilingual_v2 | Premium voices |
| Gemini | gemini/gemini-2.5-flash-preview-tts | Google |

STT providers:

| Provider | Model | Notes |
| --- | --- | --- |
| OpenAI | openai/whisper-1 | Default, accurate |
| Azure | azure/whisper | Enterprise |
| Groq | groq/whisper-large-v3 | Fast |
| Deepgram | deepgram/nova-2 | Real-time |
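Every model string in the lists above follows a `provider/model` pattern. A minimal sketch of splitting such a string, for example to log which provider a bot is configured with, could look like this; `split_model` is a hypothetical helper and not part of praisonai or LiteLLM.

```python
from typing import Tuple


def split_model(model: str) -> Tuple[str, str]:
    """Split a 'provider/model' string into (provider, model_name)."""
    # partition splits on the first slash only, so model names may contain slashes.
    provider, sep, name = model.partition("/")
    if not sep or not provider or not name:
        raise ValueError(f"expected 'provider/model', got {model!r}")
    return provider, name
```

For instance, `split_model("groq/whisper-large-v3")` yields the provider `groq` and the model name `whisper-large-v3`.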

Voice Options

Available voices for OpenAI TTS:
| Voice | Description |
| --- | --- |
| alloy | Neutral, balanced (default) |
| echo | Warm, conversational |
| fable | Expressive, storytelling |
| onyx | Deep, authoritative |
| nova | Friendly, upbeat |
| shimmer | Clear, professional |

Architecture

Audio tools live in the wrapper layer (the praisonai package), not the core SDK. They wrap the core AudioAgent so that agents and bots can use it easily.