Text-to-Speech Providers - Burki Voice AI Docs

🎙️ Give Your Assistant a Voice

Transform text into natural, human-like speech with our integrated TTS providers. Each provider offers unique advantages for different use cases.

Quick Provider Comparison

⚡ ElevenLabs

Premium Quality & Customization
70+ languages, voice cloning, advanced controls
Best for: High-quality customer interactions

🚀 Deepgram Aura

Ultra-Low Latency
Vendor-reported 3x faster than competitors, phone-optimized
Best for: Real-time conversations

🎯 Cartesia Sonic 3

Multilingual Excellence
42 languages, voice cloning, low latency
Best for: Multilingual agents, global deployments

☁️ Azure Speech

Enterprise Scale
500+ neural voices, 100+ languages
Best for: Enterprise, Azure ecosystem

🎭 Inworld.ai

AI-Powered Emotions
Multilingual, emotional markup, voice cloning
Best for: Expressive, contextual responses

🎙️ Resemble AI

Custom Voice Creation
WebSocket streaming, personalized voices
Best for: Brand-specific voice identity

🧠 OpenAI TTS

Instruction-Aware Voices
tts-1, tts-1-hd, and gpt-4o-mini-tts
Best for: OpenAI-native stacks and voice instructions

🧩 Additional Providers

Kokoro, Uplift, Murf, Soniox
Self-hosted and specialized provider options
Best for: Custom deployments and specialized voices

Feature Matrix

Provider	Latency	Languages	Voice Cloning	Streaming	Best For
ElevenLabs	Vendor-reported low latency	70+	✅ Advanced	WebSocket	Premium quality
Deepgram	Vendor-reported very low latency	English	❌	WebSocket	Speed & phone calls
Cartesia	Vendor-reported low latency	42	✅ Yes	WebSocket	Multilingual
Azure	Provider/config dependent	100+	❌	HTTP	Microsoft ecosystem
Inworld	Provider/config dependent	11	✅ Zero-shot	HTTP/WS	Emotional expression
Resemble	Provider/config dependent	English	✅ Custom	WebSocket	Brand voices
OpenAI	Provider/config dependent	English-first	❌	HTTP streaming	OpenAI-native stacks
Kokoro	Self-hosted dependent	Model-dependent	❌	HTTP	Self-hosted/local deployments
Uplift	Provider-dependent	Specialized	❌	Streaming service	Uplift voices
Murf	Provider-dependent	Multi-language	❌	HTTP	Style/rate/pitch controls
Soniox	Provider-dependent	Model-dependent	❌	Streaming service	Soniox voice stack

New to TTS? Start with ElevenLabs for the best balance of quality and features, Deepgram if speed is your priority, or Cartesia for multilingual support.
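
As a rough illustration of how the matrix can drive a first shortlist, the sketch below transcribes the Languages, Voice Cloning, and Streaming columns into Python and filters on them. `ProviderInfo`, `FEATURE_MATRIX`, and `shortlist` are hypothetical names used only for this example and are not part of Burki. Treating "Streaming service" entries as non-WebSocket is an assumption about the table's wording.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProviderInfo:
    name: str
    languages: str
    voice_cloning: bool
    streaming: str


# Rows transcribed from the feature matrix (latency column omitted).
FEATURE_MATRIX = [
    ProviderInfo("ElevenLabs", "70+", True, "WebSocket"),
    ProviderInfo("Deepgram", "English", False, "WebSocket"),
    ProviderInfo("Cartesia", "42", True, "WebSocket"),
    ProviderInfo("Azure", "100+", False, "HTTP"),
    ProviderInfo("Inworld", "11", True, "HTTP/WS"),
    ProviderInfo("Resemble", "English", True, "WebSocket"),
    ProviderInfo("OpenAI", "English-first", False, "HTTP streaming"),
    ProviderInfo("Kokoro", "Model-dependent", False, "HTTP"),
    ProviderInfo("Uplift", "Specialized", False, "Streaming service"),
    ProviderInfo("Murf", "Multi-language", False, "HTTP"),
    ProviderInfo("Soniox", "Model-dependent", False, "Streaming service"),
]


def uses_websocket(streaming: str) -> bool:
    """True when the Streaming cell lists WebSocket (or its 'WS' shorthand)."""
    return any(part in ("WebSocket", "WS") for part in streaming.split("/"))


def shortlist(need_cloning: bool = False, need_websocket: bool = False) -> list[str]:
    """Return provider names that satisfy the requested capabilities."""
    names = []
    for provider in FEATURE_MATRIX:
        if need_cloning and not provider.voice_cloning:
            continue
        if need_websocket and not uses_websocket(provider.streaming):
            continue
        names.append(provider.name)
    return names


if __name__ == "__main__":
    print(shortlist(need_cloning=True, need_websocket=True))
```

A shortlist like this only narrows the field. Latency and voice quality still need to be checked against each provider's own guide.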

Setup Overview

All providers follow the same basic setup pattern:

Get API Credentials

Configure in Burki

Add your credentials in the assistant’s AI Configuration → TTS tab

Select Voice & Model

Choose from available voices and models for your use case

Fine-tune Settings

Adjust speed, stability, and other provider-specific options

Provider Deep Dives

🎭 ElevenLabs - Premium Voice Quality

Latest Models: Flash v2.5 (75 ms), v3 (70+ languages), Turbo v2.5
Key Features: Advanced voice controls, multilingual support, custom voice creation
Perfect For: Customer service, content creation, multilingual applications
→ Complete ElevenLabs Guide

⚡ Deepgram Aura - Ultra-Fast TTS

Latest Models: Aura-2 (next-gen), Aura (proven)
Key Features: Industry-leading speed, phone optimization, µ-law encoding
Perfect For: Real-time phone calls, live chat, interactive applications
→ Complete Deepgram Guide

🎯 Cartesia Sonic 3 - Multilingual Excellence

Latest Models: sonic-3 (latest), sonic-3-2025-10-27 (stable)
Key Features: 42 languages, voice cloning from ~5 sec, low-latency WebSocket
Perfect For: Global deployments, multilingual voice agents
→ Complete Cartesia Guide

☁️ Azure Speech - Enterprise Scale

Latest Models: Neural (high-quality), Standard (basic)
Key Features: 500+ voices, 100+ languages, SSML support, Microsoft integration
Perfect For: Enterprise applications, Azure ecosystem users
→ Complete Azure Guide

🎪 Inworld.ai - AI-Powered Expression

Latest Models: inworld-tts-1, inworld-tts-1-max, inworld-tts-1.5-max, inworld-tts-1.5-mini
Key Features: Emotional markup, context awareness, 11 languages
Perfect For: Gaming, entertainment, emotional customer support
→ Complete Inworld Guide

🎙️ Resemble AI - Custom Brand Voices

Key Features: WebSocket streaming, unlimited voice creation, business plans
Perfect For: Brand consistency, personalized experiences, enterprise
→ Complete Resemble Guide

🧠 OpenAI TTS - Instruction-Aware Speech

Latest Models: tts-1, tts-1-hd, gpt-4o-mini-tts
Key Features: Built-in OpenAI voices, speed control, instruction support on GPT-4o mini TTS models
Perfect For: Teams already using OpenAI keys and model-specific voice instructions
→ Complete OpenAI TTS Guide
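
Because instruction support is limited to the GPT-4o mini TTS model, a caller may want to drop voice instructions when an older model is selected. The sketch below shows one way to do that. It is a minimal illustration, assuming the parameter names of OpenAI's speech endpoint (`model`, `voice`, `input`, `instructions`); `speech_params` is a hypothetical helper, and how Burki passes these values internally is not documented here.

```python
# Models listed in this section.
KNOWN_MODELS = {"tts-1", "tts-1-hd", "gpt-4o-mini-tts"}

# Only this model accepts voice instructions, per the feature list above.
INSTRUCTION_MODELS = {"gpt-4o-mini-tts"}


def speech_params(model, voice, text, instructions=None):
    """Build a request payload, keeping instructions only where supported."""
    if model not in KNOWN_MODELS:
        raise ValueError(f"Unknown OpenAI TTS model: {model!r}")
    params = {"model": model, "voice": voice, "input": text}
    if instructions and model in INSTRUCTION_MODELS:
        params["instructions"] = instructions
    return params


if __name__ == "__main__":
    # Instructions are kept for gpt-4o-mini-tts and silently dropped for tts-1.
    print(speech_params("gpt-4o-mini-tts", "alloy", "Hello!", "Speak warmly."))
    print(speech_params("tts-1", "alloy", "Hello!", "Speak warmly."))
```

Dropping the field silently keeps calls working when the model changes; raising an error instead would be the stricter choice if instructions are essential to the assistant's persona.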

Additional Supported Providers

The backend also supports these TTS providers through `tts_settings.provider`:

Provider	Provider Key	Important Options
Kokoro	`kokoro`	`voice_id`, `model_id`, `language`, `speed`, `kokoro_base_url`; no API key required by default
Uplift	`uplift`	`voice_id`, `model_id`, `language`, `output_format`
Murf	`murf`	`voice_id`, `model_id`, `style`, `rate`, `pitch`, `region`, optional `variation`, `locale`, `pronunciation_dictionary`
Soniox	`soniox`	`voice_id`, `model_id`, `language`

{
  "tts_settings": {
    "provider": "kokoro",
    "voice_id": "af_heart",
    "model_id": "kokoro",
    "language": "en",
    "provider_config": {
      "kokoro_base_url": "http://localhost:8880"
    }
  }
}
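
The same payload shape can be produced programmatically. The sketch below builds a `tts_settings` object and rejects provider keys or options that the table above does not list. `build_tts_settings` and `ALLOWED_OPTIONS` are hypothetical names for illustration. Placing every option other than `voice_id`, `model_id`, and `language` under `provider_config` is an assumption extrapolated from the Kokoro example, so check the backend schema before relying on it for other providers.

```python
import json

# Option names per provider key, copied from the table above.
ALLOWED_OPTIONS = {
    "kokoro": {"voice_id", "model_id", "language", "speed", "kokoro_base_url"},
    "uplift": {"voice_id", "model_id", "language", "output_format"},
    "murf": {
        "voice_id", "model_id", "style", "rate", "pitch", "region",
        "variation", "locale", "pronunciation_dictionary",
    },
    "soniox": {"voice_id", "model_id", "language"},
}

# Assumption: these keys sit at the top level, as in the Kokoro example;
# everything else is nested under provider_config.
TOP_LEVEL_KEYS = {"voice_id", "model_id", "language"}


def build_tts_settings(provider, **options):
    """Return a {"tts_settings": ...} dict for one of the additional providers."""
    if provider not in ALLOWED_OPTIONS:
        raise ValueError(f"Unsupported provider key: {provider!r}")
    unknown = set(options) - ALLOWED_OPTIONS[provider]
    if unknown:
        raise ValueError(f"Options not listed for {provider}: {sorted(unknown)}")

    settings = {"provider": provider}
    provider_config = {}
    for key, value in options.items():
        if key in TOP_LEVEL_KEYS:
            settings[key] = value
        else:
            provider_config[key] = value
    if provider_config:
        settings["provider_config"] = provider_config
    return {"tts_settings": settings}


if __name__ == "__main__":
    payload = build_tts_settings(
        "kokoro",
        voice_id="af_heart",
        model_id="kokoro",
        language="en",
        kokoro_base_url="http://localhost:8880",
    )
    print(json.dumps(payload, indent=2))
```

Called with the Kokoro values shown above, the helper reproduces the JSON example. A provider key outside the table raises `ValueError` rather than producing a payload the TTS factory would not recognize.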

AWS Polly and Google TTS appear only as future placeholders in backend enum comments and are not registered in the active TTS factory. Do not document them as supported providers until they are wired.

Advanced Topics

🎛️ Voice Tuning

Master stability, similarity, and style controls across all providers

🔧 Troubleshooting

Common issues and solutions with step-by-step fixes

📈 Best Practices

Performance optimization, cost reduction, and production tips

🔗 See Also

Configuration: Learn how to configure TTS in your AI Configuration settings.
Integration: Understand how TTS fits into the overall Architecture of Burki Voice AI.
Call Management: Discover how TTS works with Call Management features.

Quick Start Guide

Business Calls
Customer Support
Multilingual
Enterprise
Real-time Apps

Recommended Setup:

Provider: ElevenLabs or Deepgram
Model: Flash v2.5 (ElevenLabs) or Aura-2 (Deepgram)
Voice: Professional, such as Rachel (ElevenLabs) or Asteria (Deepgram)
Settings: Stability 0.5, Speaker Boost ON (ElevenLabs voice settings)

API Rate Limits: Each provider has different rate limits and pricing models. Check the individual provider pages for detailed pricing information.

🚀 Ready to Get Started?

Choose your provider and dive into the detailed setup guides, or check out our Best Practices for optimization tips.
