
AudioAgent

Defined in the Audio Agent module.
A specialized agent for audio processing using AI models. It provides:
  • Text-to-Speech (TTS): Convert text to spoken audio
  • Speech-to-Text (STT): Transcribe audio to text
TTS Providers:
  • OpenAI: openai/tts-1, openai/tts-1-hd
  • Azure: azure/tts-1
  • Gemini: gemini/gemini-2.5-flash-preview-tts
  • Vertex AI: vertex_ai/gemini-2.5-flash-preview-tts
  • ElevenLabs: elevenlabs/eleven_multilingual_v2
  • MiniMax: minimax/speech-01
STT Providers:
  • OpenAI: openai/whisper-1
  • Azure: azure/whisper
  • Groq: groq/whisper-large-v3
  • Deepgram: deepgram/nova-2
  • Gemini: gemini/gemini-2.0-flash
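The identifiers above all share a `provider/model` shape. As a minimal sketch, a lookup built only from the models listed here can report which task an identifier is documented for. The names `TTS_MODELS`, `STT_MODELS`, `split_model`, and `task_for` are hypothetical helpers for illustration and are not part of the library.

```python
# Hypothetical helpers: classify a model identifier using only the
# provider lists documented above. Not part of praisonaiagents.
TTS_MODELS = {
    "openai/tts-1",
    "openai/tts-1-hd",
    "azure/tts-1",
    "gemini/gemini-2.5-flash-preview-tts",
    "vertex_ai/gemini-2.5-flash-preview-tts",
    "elevenlabs/eleven_multilingual_v2",
    "minimax/speech-01",
}
STT_MODELS = {
    "openai/whisper-1",
    "azure/whisper",
    "groq/whisper-large-v3",
    "deepgram/nova-2",
    "gemini/gemini-2.0-flash",
}


def split_model(identifier):
    """Split 'provider/model' into (provider, model)."""
    provider, sep, model = identifier.partition("/")
    if not sep or not provider or not model:
        raise ValueError(f"expected 'provider/model', got {identifier!r}")
    return provider, model


def task_for(identifier):
    """Return 'tts', 'stt', or None if the identifier is not listed above."""
    if identifier in TTS_MODELS:
        return "tts"
    if identifier in STT_MODELS:
        return "stt"
    return None
```

A check like this can catch the mistake of passing an STT model to a TTS call before any request is made. A `None` result only means the identifier is not in the lists above, not that the model is unsupported.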

Constructor

No descriptions are available for these parameters.
  • name: Optional
  • instructions: Optional
  • llm: Optional
  • model: Optional
  • base_url: Optional
  • api_key: Optional
  • audio: Optional
  • verbose: Union, default: True
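Since the parameters are undocumented, the following is only a sketch of one way to assemble constructor arguments. It uses the `llm`, `api_key`, and `base_url` parameter names listed above. The environment variable names `AUDIO_API_KEY` and `AUDIO_BASE_URL` and the helper `audio_agent_kwargs` are hypothetical, chosen for this example.

```python
import os


def audio_agent_kwargs(llm, env=os.environ):
    """Build constructor keyword arguments, skipping unset values.

    AUDIO_API_KEY and AUDIO_BASE_URL are hypothetical variable names
    chosen for this sketch; the library does not document them.
    """
    candidates = {
        "llm": llm,
        "api_key": env.get("AUDIO_API_KEY"),
        "base_url": env.get("AUDIO_BASE_URL"),
    }
    # Drop anything unset so the constructor's own defaults apply.
    return {key: value for key, value in candidates.items() if value}
```

The resulting dictionary could then be unpacked into the constructor, for example `AudioAgent(**audio_agent_kwargs("openai/tts-1"))`, assuming the constructor accepts these names as keyword arguments.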

Methods

console()

Lazily initialize Rich Console.

litellm()

Lazy load litellm module when needed.

speech()

Convert text to speech.

aspeech()

Async version of speech().

transcribe()

Transcribe audio to text.

atranscribe()

Async version of transcribe().

say()

Quick TTS - convert text and save to file.

asay()

Async version of say().

listen()

Quick STT - transcribe audio file.

alisten()

Async version of listen().
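The async variants make it possible to run several requests concurrently. The sketch below assumes that `atranscribe()` accepts a file path and returns the transcribed text, mirroring `transcribe()`; the source does not show its signature. `transcribe_many` and `limit` are hypothetical names, and the function works with any object exposing an awaitable `atranscribe(path)`.

```python
import asyncio


async def transcribe_many(agent, paths, limit=3):
    """Transcribe several files concurrently, at most `limit` at a time.

    Assumes `agent.atranscribe(path)` is awaitable and returns text.
    """
    semaphore = asyncio.Semaphore(limit)

    async def transcribe_one(path):
        # The semaphore caps how many requests are in flight at once.
        async with semaphore:
            return await agent.atranscribe(path)

    # gather() preserves input order, so results line up with paths.
    results = await asyncio.gather(*(transcribe_one(p) for p in paths))
    return dict(zip(paths, results))
```

With a configured agent this would be driven by something like `asyncio.run(transcribe_many(agent, ["a.mp3", "b.mp3"]))`. Capping concurrency is a design choice to avoid hitting provider rate limits; adjust `limit` to suit the provider.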

Usage

    from praisonaiagents import AudioAgent

    # Text-to-Speech
    agent = AudioAgent(llm="openai/tts-1")
    agent.speech("Hello world!", output="hello.mp3")

    # Speech-to-Text
    agent = AudioAgent(llm="openai/whisper-1")
    text = agent.transcribe("audio.mp3")
    print(text)
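The two calls above can be combined into a round trip that synthesizes text and then transcribes the resulting file, which is a quick way to sanity-check a TTS/STT pairing. This is a minimal sketch: `round_trip` is a hypothetical helper, and it relies only on the `speech(text, output=...)` and `transcribe(path)` call shapes shown in the usage example.

```python
def round_trip(tts_agent, stt_agent, text, path="round_trip.mp3"):
    """Synthesize `text` to `path`, then transcribe that file back.

    Works with any two objects that expose speech(text, output=...)
    and transcribe(path), as in the usage example above.
    """
    tts_agent.speech(text, output=path)
    return stt_agent.transcribe(path)
```

For example, `round_trip(AudioAgent(llm="openai/tts-1"), AudioAgent(llm="openai/whisper-1"), "Hello world!")` would return whatever the STT model hears. The transcript may differ from the input in punctuation or casing, so compare loosely.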

Source


praisonaiagents/agent/audio_agent.py at line 67
