Voice-Enabled Agents

Give your Agents the ability to speak and listen. Build voice assistants, phone bots, and audio interfaces with OpenAI TTS/Whisper and ElevenLabs.

Voice Agent - Complete Example

import { Agent, createOpenAIVoice } from 'praisonai';
import { readFileSync, writeFileSync } from 'fs';

const voice = createOpenAIVoice();

const agent = new Agent({
  name: 'Voice Assistant',
  instructions: 'You are a friendly voice assistant. Keep responses concise for natural speech.'
});

// Complete voice interaction loop
async function voiceChat(audioInput: Buffer): Promise<Buffer> {
  // 1. Listen: Convert user's speech to text
  const userMessage = await voice.listen(audioInput);
  console.log('👤 User:', userMessage);

  // 2. Think: Agent processes the message
  const response = await agent.chat(userMessage);
  console.log('🤖 Agent:', response);

  // 3. Speak: Convert Agent's response to audio
  const audioResponse = await voice.speak(response, { voice: 'nova' });
  
  return audioResponse;
}

// Usage
const userAudio = readFileSync('user-question.mp3');
const agentAudio = await voiceChat(userAudio);
writeFileSync('agent-response.mp3', agentAudio);

Multi-Agent Voice System

Different Agents with different voices:

import { Agent, createOpenAIVoice, createElevenLabsVoice } from 'praisonai';

const openaiVoice = createOpenAIVoice();
const elevenLabsVoice = createElevenLabsVoice({ apiKey: process.env.ELEVENLABS_API_KEY });

// Agent 1: Greeter with friendly voice
const greeterAgent = new Agent({
  name: 'Greeter',
  instructions: 'Warmly greet users and understand their needs.',
  voice: { provider: openaiVoice, voiceId: 'nova' }
});

// Agent 2: Expert with professional voice
const expertAgent = new Agent({
  name: 'Expert',
  instructions: 'Provide detailed technical explanations.',
  voice: { provider: openaiVoice, voiceId: 'onyx' }
});

// Agent 3: Custom ElevenLabs voice
const brandAgent = new Agent({
  name: 'Brand Voice',
  instructions: 'Represent the company brand.',
  voice: { provider: elevenLabsVoice, voiceId: 'custom-voice-id' }
});

async function multiVoiceConversation(userAudio: Buffer) {
  const userText = await openaiVoice.listen(userAudio);
  
  // Greeter handles initial interaction
  const greeting = await greeterAgent.chat(userText);
  const greetingAudio = await openaiVoice.speak(greeting, { voice: 'nova' });
  
  // Expert provides detailed answer
  const explanation = await expertAgent.chat(`Explain in detail: ${userText}`);
  const explanationAudio = await openaiVoice.speak(explanation, { voice: 'onyx' });
  
  return { greetingAudio, explanationAudio };
}

Agent with Voice Tools

Give your Agent tools to control voice output:
import { Agent, createTool, createOpenAIVoice } from 'praisonai';

const voice = createOpenAIVoice();

// Tool to speak with specific emotion/style
const speakTool = createTool({
  name: 'speak_aloud',
  description: 'Speak text aloud with a specific voice style',
  parameters: {
    type: 'object',
    properties: {
      text: { type: 'string', description: 'Text to speak' },
      voice: { type: 'string', enum: ['alloy', 'echo', 'nova', 'onyx'], description: 'Voice style' },
      speed: { type: 'number', description: 'Speed (0.5-2.0)' }
    },
    required: ['text']
  },
  execute: async ({ text, voice: voiceId = 'nova', speed = 1.0 }) => {
    const audio = await voice.speak(text, { voice: voiceId, speed });
    // In real app: play audio or send to client
    return `Spoke: "${text}" with ${voiceId} voice`;
  }
});

// Tool to listen for user input
const listenTool = createTool({
  name: 'listen_for_input',
  description: 'Listen for voice input from the user',
  parameters: { type: 'object', properties: {} },
  execute: async () => {
    // In a real app: capture audio from the microphone.
    // captureAudio() is a placeholder you must implement; it should resolve to a Buffer.
    const audioBuffer = await captureAudio();
    const transcript = await voice.listen(audioBuffer);
    return transcript;
  }
});

const voiceAgent = new Agent({
  name: 'Interactive Voice Agent',
  instructions: `You can speak aloud and listen to users. 
Use speak_aloud to respond verbally. Use listen_for_input to hear the user.`,
  tools: [speakTool, listenTool]
});

await voiceAgent.chat('Greet the user and ask how you can help');
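
Because the model chooses the arguments passed to speak_aloud, it can help to sanitize them before they reach the voice provider. The following is a minimal sketch, assuming the four voices and the 0.5-2.0 speed range listed in the tool schema above. `normalizeSpeakArgs` is a hypothetical helper and not part of praisonai.

```typescript
// Hypothetical helper: sanitizes model-supplied arguments for the speak_aloud tool.
const ALLOWED_VOICES = ['alloy', 'echo', 'nova', 'onyx'] as const;
type AllowedVoice = (typeof ALLOWED_VOICES)[number];

interface SpeakArgs {
  text: string;
  voice: AllowedVoice;
  speed: number;
}

function isAllowedVoice(value: unknown): value is AllowedVoice {
  return typeof value === 'string' && (ALLOWED_VOICES as readonly string[]).includes(value);
}

function normalizeSpeakArgs(raw: { text?: unknown; voice?: unknown; speed?: unknown }): SpeakArgs {
  const text = typeof raw.text === 'string' ? raw.text.trim() : '';
  if (text.length === 0) {
    throw new Error('speak_aloud requires non-empty text');
  }

  // Fall back to the tool's default voice when the value is not in the enum.
  const voice: AllowedVoice = isAllowedVoice(raw.voice) ? raw.voice : 'nova';

  // Default to 1.0, then clamp into the range advertised in the schema.
  const requested =
    typeof raw.speed === 'number' && Number.isFinite(raw.speed) ? raw.speed : 1.0;
  const speed = Math.min(2.0, Math.max(0.5, requested));

  return { text, voice, speed };
}
```

Inside the tool's execute function, the sanitized values could then be forwarded as `voice.speak(args.text, { voice: args.voice, speed: args.speed })`.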

Phone/IVR Agent

Build an automated phone system:
import { Agent, createOpenAIVoice } from 'praisonai';

const voice = createOpenAIVoice();

const phoneAgent = new Agent({
  name: 'Phone Support',
  instructions: `You are a phone support agent. 
- Keep responses under 30 words for natural phone conversation
- Ask one question at a time
- Confirm important details by repeating them back`
});

class PhoneSession {
  private history: { role: string; content: string }[] = [];
  
  async handleCall(audioInput: Buffer): Promise<Buffer> {
    // Transcribe caller's speech
    const callerMessage = await voice.listen(audioInput);
    this.history.push({ role: 'user', content: callerMessage });
    
    // Build context
    const context = this.history.map(m => `${m.role}: ${m.content}`).join('\n');
    
    // Get Agent response
    const response = await phoneAgent.chat(context);
    this.history.push({ role: 'assistant', content: response });
    
    // Convert to speech
    return await voice.speak(response, { 
      voice: 'nova',
      speed: 0.9  // Slightly slower for phone clarity
    });
  }
  
  async startCall(): Promise<Buffer> {
    const greeting = await phoneAgent.chat('Start the call with a greeting');
    this.history.push({ role: 'assistant', content: greeting });
    return await voice.speak(greeting, { voice: 'nova' });
  }
}
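
PhoneSession sends the full history on every turn, so the prompt grows with the length of the call. One option is to cap the context at the most recent turns. This is a sketch only: `buildContext` and its default of 10 turns are hypothetical choices, not praisonai behavior.

```typescript
interface Turn {
  role: string;
  content: string;
}

// Keeps only the last `maxTurns` entries and formats them the same way
// PhoneSession does: "role: content", one turn per line.
function buildContext(history: Turn[], maxTurns = 10): string {
  // slice(-0) would return the whole array, so handle non-positive limits explicitly.
  const recent = maxTurns > 0 ? history.slice(-maxTurns) : [];
  return recent.map((m) => `${m.role}: ${m.content}`).join('\n');
}
```

In handleCall, `buildContext(this.history)` would replace the inline `map(...).join('\n')` expression. Older turns are dropped outright, so details confirmed early in a long call may need to be stored separately.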

Podcast/Content Agent

Agent that creates audio content:

import { Agent, createOpenAIVoice } from 'praisonai';
import { writeFileSync } from 'fs';

const voice = createOpenAIVoice();

const scriptWriter = new Agent({
  name: 'Script Writer',
  instructions: 'Write engaging podcast scripts with natural conversational flow.'
});

const narrator = new Agent({
  name: 'Narrator',
  instructions: 'You narrate content. Add appropriate pauses with "..." for dramatic effect.'
});

async function createPodcastEpisode(topic: string) {
  // Agent writes the script
  const script = await scriptWriter.chat(`Write a 2-minute podcast intro about: ${topic}`);
  
  // Agent refines for narration
  const narrationScript = await narrator.chat(`Prepare this for voice narration: ${script}`);
  
  // Generate audio
  const audio = await voice.speak(narrationScript, {
    voice: 'fable',  // Expressive voice for narration
    speed: 0.95
  });
  
  // Compute the path once so the returned path matches the file that was written
  const audioPath = `podcast-${Date.now()}.mp3`;
  writeFileSync(audioPath, audio);
  return { script: narrationScript, audioPath };
}
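
A 2-minute intro is short, but longer scripts can exceed a TTS provider's per-request input limit. OpenAI's speech endpoint has documented a 4096-character maximum; treat that number as an assumption and check the current provider documentation. The sketch below splits a script at sentence boundaries so each piece fits under a limit. `splitForTts` is a hypothetical helper, and it collapses runs of whitespace into single spaces.

```typescript
// Hypothetical helper: splits text into chunks of at most `maxChars` characters,
// preferring to break between sentences.
function splitForTts(text: string, maxChars = 4096): string[] {
  if (maxChars <= 0) {
    throw new Error('maxChars must be positive');
  }

  // Group words into sentences: a word ending in . ! or ? closes a sentence.
  const words = text.split(/\s+/).filter((w) => w.length > 0);
  const sentences: string[] = [];
  let buffer: string[] = [];
  for (const word of words) {
    buffer.push(word);
    if (/[.!?]["')\]]?$/.test(word)) {
      sentences.push(buffer.join(' '));
      buffer = [];
    }
  }
  if (buffer.length > 0) {
    sentences.push(buffer.join(' '));
  }

  // Greedily pack sentences into chunks.
  const chunks: string[] = [];
  let current = '';
  for (const sentence of sentences) {
    let remaining = sentence;
    // A single sentence longer than the limit is hard-split.
    while (remaining.length > maxChars) {
      if (current) {
        chunks.push(current);
        current = '';
      }
      chunks.push(remaining.slice(0, maxChars));
      remaining = remaining.slice(maxChars);
    }
    const candidate = current ? `${current} ${remaining}` : remaining;
    if (candidate.length <= maxChars) {
      current = candidate;
    } else {
      chunks.push(current);
      current = remaining;
    }
  }
  if (current) {
    chunks.push(current);
  }
  return chunks;
}
```

Each chunk could then be passed to `voice.speak` in order. How the resulting audio segments are joined depends on the output format and is left out of this sketch.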

Supported Providers

| Provider | TTS | STT | Best For |
|---|---|---|---|
| OpenAI | ✅ | ✅ (Whisper) | General purpose Agents |
| ElevenLabs | ✅ | | Custom brand voices |

OpenAI Voices

| Voice | Style | Best For |
|---|---|---|
| nova | Female, friendly | Customer service Agents |
| alloy | Neutral | General purpose |
| onyx | Male, deep | Authority/expertise |
| shimmer | Female, clear | Instructions/tutorials |
| echo | Male, warm | Conversational |
| fable | Expressive | Storytelling/content |
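
The voice table can be encoded as a lookup so that each Agent role is given a consistent voice. This is a sketch: the `UseCase` labels are hypothetical names derived from the "Best For" column, and falling back to alloy for unknown roles is an assumption based on its "General purpose" description.

```typescript
type OpenAIVoiceName = 'nova' | 'alloy' | 'onyx' | 'shimmer' | 'echo' | 'fable';

// Hypothetical labels, one per "Best For" entry in the table.
type UseCase =
  | 'customer-service'
  | 'general'
  | 'expert'
  | 'tutorial'
  | 'conversational'
  | 'storytelling';

const VOICE_FOR_USE_CASE: Record<UseCase, OpenAIVoiceName> = {
  'customer-service': 'nova',
  general: 'alloy',
  expert: 'onyx',
  tutorial: 'shimmer',
  conversational: 'echo',
  storytelling: 'fable',
};

// Returns the voice for a use case, or 'alloy' when the label is unknown.
function pickVoice(useCase: string): OpenAIVoiceName {
  return Object.prototype.hasOwnProperty.call(VOICE_FOR_USE_CASE, useCase)
    ? VOICE_FOR_USE_CASE[useCase as UseCase]
    : 'alloy';
}
```

The result can be passed as the `voice` option, for example `voice.speak(text, { voice: pickVoice('storytelling') })`.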

Environment Variables

OPENAI_API_KEY=sk-...
ELEVENLABS_API_KEY=...
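
A missing key usually shows up only when the first speech request fails, so it can be worth checking for both variables at startup. A minimal sketch follows; `missingEnvVars` is a hypothetical helper that takes the environment as a parameter so it can be tested without touching the real process environment.

```typescript
// Hypothetical helper: returns the names of required variables that are unset or blank.
function missingEnvVars(
  env: Record<string, string | undefined>,
  required: string[]
): string[] {
  return required.filter((name) => {
    const value = env[name];
    return value === undefined || value.trim() === '';
  });
}

// Usage at startup in Node.js:
// const missing = missingEnvVars(process.env, ['OPENAI_API_KEY', 'ELEVENLABS_API_KEY']);
// if (missing.length > 0) {
//   throw new Error(`Missing environment variables: ${missing.join(', ')}`);
// }
```

ELEVENLABS_API_KEY only needs to be in the required list when an ElevenLabs voice is actually created.
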

  • Agents - Core Agent documentation
  • Tools - Creating Agent tools
  • Sessions - Maintaining voice conversation state