
🔧 TTS Troubleshooting Guide

Solve common TTS issues quickly with provider-specific solutions and general troubleshooting strategies.

Quick Diagnosis

🩺 Identify Your Issue

Start here to quickly identify the type of problem you're experiencing.
  • No Audio
  • Poor Quality
  • High Latency
  • API Errors
Symptoms: TTS request completes but no audio is produced
Quick Checks:
  • ✅ API key is valid and has TTS permissions
  • ✅ Voice ID exists and is spelled correctly
  • ✅ Audio format is supported by your system
  • ✅ Network connectivity is stable
Jump to: No Audio Output

No Audio Output

Common Causes & Solutions (ElevenLabs):
Voice ID Issues:
# Check if voice exists
curl -X GET "https://api.elevenlabs.io/v1/voices" \
  -H "xi-api-key: YOUR_API_KEY"
  • Verify voice ID is correct (case-sensitive)
  • Ensure voice is available on your plan
  • Try with default voice: 21m00Tcm4TlvDq8ikWAM (Rachel)
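The voice ID check above can also be done in code once the voice list has been fetched. The following is a minimal sketch, assuming the parsed response is an object with a `voices` list whose entries carry a `voice_id` field; the helper names `find_voice` and `suggest_case_mismatch` are hypothetical and the sample payload is made up for illustration.

```python
def find_voice(voices_payload, voice_id):
    """Return the voice entry whose voice_id matches exactly, or None."""
    for voice in voices_payload.get("voices", []):
        if voice.get("voice_id") == voice_id:  # exact, case-sensitive comparison
            return voice
    return None


def suggest_case_mismatch(voices_payload, voice_id):
    """Return a voice_id that differs from the input only by letter case, if any."""
    for voice in voices_payload.get("voices", []):
        candidate = voice.get("voice_id", "")
        if candidate != voice_id and candidate.lower() == voice_id.lower():
            return candidate
    return None


# Hypothetical sample payload, shaped like a parsed voice-list response (assumption)
sample = {"voices": [{"voice_id": "21m00Tcm4TlvDq8ikWAM", "name": "Rachel"}]}

print(find_voice(sample, "21m00Tcm4TlvDq8ikWAM"))
print(suggest_case_mismatch(sample, "21m00tcm4tlvdq8ikwam"))
```

A non-`None` result from `suggest_case_mismatch` usually means the ID was lowercased or retyped somewhere in your pipeline.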
Model Compatibility:
API Key Issues:
  • Verify API key has TTS permissions
  • Check key isn't expired or revoked
  • Test with a simple curl request first
Common Causes & Solutions (Deepgram):
WebSocket Connection:
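// Test WebSocket connection (Node.js with the "ws" package; the browser
// WebSocket API does not accept custom headers)
const WebSocket = require('ws');

const ws = new WebSocket(
  'wss://api.deepgram.com/v1/speak?model=aura-asteria-en',
  { headers: { 'Authorization': 'Token YOUR_API_KEY' }}
);

ws.on('open', () => {
  console.log('Connection established');
  ws.close();
});

ws.on('error', (error) => {
  console.log('Connection failed:', error);
});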
Audio Format Issues:
  • Ensure your system supports the requested format
  • Try µ-law for phone systems: encoding=mulaw&sample_rate=8000
  • Use linear16 for web: encoding=linear16&sample_rate=24000
Voice Model Issues:
  • Use correct voice format: aura-asteria-en not asteria
  • Verify model exists: aura-2 vs aura
  • Check Deepgram voice list
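The format and model rules above can be combined into one small helper that assembles the request URL. This is a sketch only: the base URL and the `model`, `encoding` and `sample_rate` parameters come from the examples in this guide, while the function name `build_speak_url` and the `aura-` prefix check are assumptions made for illustration.

```python
from urllib.parse import urlencode

# Endpoint taken from the WebSocket example in this guide
BASE_URL = "wss://api.deepgram.com/v1/speak"


def build_speak_url(model, encoding, sample_rate):
    """Assemble the speak URL with model, encoding and sample_rate query parameters."""
    if not model.startswith("aura-"):
        # Rule of thumb from this guide: use aura-asteria-en, not a bare "asteria"
        raise ValueError(f"expected a full model name like 'aura-asteria-en', got {model!r}")
    query = urlencode({"model": model, "encoding": encoding, "sample_rate": sample_rate})
    return f"{BASE_URL}?{query}"


# Phone-system settings from the list above
print(build_speak_url("aura-asteria-en", "mulaw", 8000))
# prints wss://api.deepgram.com/v1/speak?model=aura-asteria-en&encoding=mulaw&sample_rate=8000
```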
Common Causes & Solutions (Inworld):
Bearer Token:
# Test authentication
curl -X GET "https://api.inworld.ai/v1/voices" \
  -H "Authorization: Bearer YOUR_TOKEN"
Language/Voice Compatibility:
  • Verify voice supports selected language
  • Check language code format: en not english
  • Use language matrix
Model Selection:
  • Try inworld-tts-1 before inworld-tts-1-max
  • Ensure model supports your voice
  • Check model compatibility
Common Causes & Solutions (Resemble):
WebSocket Requirements:
  • Business plan required for WebSocket streaming
  • Check plan status in Resemble dashboard
  • Fallback to REST API if needed
UUID Format:
{
  "project_uuid": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
  "voice_uuid": "yyyyyyyy-yyyy-yyyy-yyyy-yyyyyyyyyyyy"
}
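A malformed identifier can be caught before the request is sent. The sketch below assumes both identifiers use the hyphenated 8-4-4-4-12 hexadecimal layout shown in the placeholder above; if your dashboard shows a different identifier format, this check does not apply. The function name `is_valid_uuid` is hypothetical.

```python
import uuid


def is_valid_uuid(value):
    """Return True if value is a hyphenated 8-4-4-4-12 hexadecimal UUID string."""
    try:
        parsed = uuid.UUID(value)
    except (ValueError, AttributeError, TypeError):
        return False
    # uuid.UUID also accepts braces and un-hyphenated hex, so compare canonical forms
    return str(parsed) == value.lower()


print(is_valid_uuid("123e4567-e89b-12d3-a456-426614174000"))
print(is_valid_uuid("xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"))
```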
Voice Training Status:
  • Ensure custom voice training is complete
  • Check voice status in Resemble dashboard
  • Wait for training completion before using

Audio Quality Issues

ElevenLabs Solutions:
{
  "stability": 0.5,           // Try 0.4-0.6 range
  "similarity_boost": 0.75,   // Optimal setting
  "style": 0.0,              // Keep low for natural sound
  "use_speaker_boost": true   // Always enable
}
Inworld Solutions:
  • Reduce emotional markup intensity
  • Try different voice with your content
  • Switch from TTS-1-Max to TTS-1 for stability
Deepgram Solutions:
  • Use Aura-2 instead of original Aura
  • Ensure proper audio encoding for your system
  • Check sample rate matches playback system
General Solutions:
  • Test with shorter text samples
  • Remove special characters from input text
  • Verify network stability during generation
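Removing special characters can be sketched as a small filter applied before synthesis. Which characters count as "special" is an assumption here: this version keeps letters, digits, whitespace and common punctuation and replaces everything else with a space. The function name `clean_tts_text` is hypothetical.

```python
import re


def clean_tts_text(text):
    """Replace unusual symbols with spaces, then collapse repeated whitespace."""
    # Keep word characters, whitespace and basic punctuation; drop the rest (assumed policy)
    text = re.sub(r"[^\w\s.,!?;:'\"()-]", " ", text)
    return re.sub(r"\s+", " ", text).strip()


print(clean_tts_text("Price: $5 *now*"))
# prints Price: 5 now
```

If a symbol carries meaning in your content (for example currency signs), map it to words before calling the filter rather than dropping it.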
Text Preprocessing:
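import re

def fix_pronunciations(text):
    # Spell out acronyms that TTS engines tend to mispronounce
    fixes = {
        "API": "A P I",
        "HTTP": "H T T P",
        "OAuth": "O Auth",
        "UUID": "U U I D",
        "AWS": "A W S",
        "URL": "U R L"
    }

    for term, pronunciation in fixes.items():
        # \b limits the match to whole words, so "RAPID" or "CURL" stay untouched
        text = re.sub(rf"\b{re.escape(term)}\b", pronunciation, text)
    return text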
Provider-Specific:
  • ElevenLabs: Use SSML for pronunciation control
  • Inworld: Leverage phonetic variations in training
  • Deepgram: English-optimized, fewer pronunciation issues
  • Resemble: Train custom voice with problematic words
Stability Optimization:
  • ElevenLabs: Increase stability to 0.6-0.7
  • Inworld: Use TTS-1 instead of TTS-1-Max
  • Resemble: Retrain voice with more consistent samples
Network Optimization:
  • Use WebSocket connections for streaming providers
  • Implement connection keepalive
  • Add retry logic for failed chunks
  • Monitor network latency and jitter
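Monitoring latency and jitter can start from a list of per-request timings. The sketch below is one simple way to summarize them, defining jitter as the mean absolute difference between consecutive samples; that definition and the function name `latency_stats` are assumptions, not something a provider prescribes.

```python
import statistics


def latency_stats(samples_ms):
    """Return the mean latency and the jitter (mean absolute change between consecutive samples)."""
    if len(samples_ms) < 2:
        raise ValueError("need at least two samples")
    diffs = [abs(b - a) for a, b in zip(samples_ms, samples_ms[1:])]
    return {
        "mean_ms": statistics.mean(samples_ms),
        "jitter_ms": statistics.mean(diffs),
    }


print(latency_stats([100, 120, 100, 120]))
```

A low mean with high jitter tends to produce choppy playback even when average latency looks acceptable.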

Latency Problems

⚡ Speed Optimization

Optimize TTS response times across all providers.
Model Selection:
{
  "model": "eleven_flash_v2_5",  // Fastest model
  "latency": 1,                  // Optimize setting 
  "stability": 0.5,              // Don't go too low
  "use_speaker_boost": true      // Maintain quality
}
Best Practices:
  • Use Flash v2.5 for phone calls (~75ms)
  • Keep text chunks under 100 characters
  • Avoid complex punctuation and formatting
  • Use WebSocket streaming for real-time apps
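Keeping chunks under 100 characters can be done with the standard library. This is a minimal sketch using `textwrap.wrap`; the function name `chunk_text` and the default of 99 characters are illustrative choices.

```python
import textwrap


def chunk_text(text, max_chars=99):
    """Split text at word boundaries into pieces of at most max_chars characters."""
    # break_long_words=False keeps words intact; a single word longer than
    # max_chars is therefore emitted as an oversized chunk of its own
    return textwrap.wrap(
        text, width=max_chars, break_long_words=False, break_on_hyphens=False
    )


for piece in chunk_text("Thanks for calling. " * 12):
    print(len(piece), piece)
```

Note that `textwrap.wrap` normalizes whitespace, so newlines in the input become spaces.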
Optimal Configuration:
{
  "model": "aura-2-asteria-en",
  "encoding": "mulaw",
  "sample_rate": 8000
}
Speed Tips:
  • Already fastest provider (~75ms)
  • Use µ-law encoding for phone systems
  • Keep WebSocket connections alive
  • Send text in 20-50 word chunks
Text Optimization:
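import re

def split_into_chunks(text, max_words=30):
    # Group whitespace-separated words into chunks of at most max_words
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]

def optimize_text_for_speed(text):
    # Collapse runs of periods ("..", "...") into a single period
    text = re.sub(r'\.{2,}', '.', text)

    # Break into chunks of at most 30 words
    chunks = split_into_chunks(text, max_words=30)

    # Trim stray whitespace and drop empty chunks
    return [chunk.strip() for chunk in chunks if chunk.strip()]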
Connection Optimization:
  • Reuse connections where possible
  • Implement connection pooling
  • Use regional endpoints when available
  • Monitor and retry failed requests quickly

API and Authentication Errors

Common Causes:
  • Expired or invalid API key
  • Incorrect authentication header format
  • Key doesn't have required permissions
Solutions by Provider:
ElevenLabs:
# Correct header format
curl -H "xi-api-key: YOUR_API_KEY"
Deepgram:
# Correct header format  
curl -H "Authorization: Token YOUR_API_KEY"
Inworld:
# Correct header format
curl -H "Authorization: Bearer YOUR_TOKEN"
Resemble:
# Correct header format
curl -H "Authorization: Bearer YOUR_API_KEY"
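The four header formats above can be kept in one lookup table so that a provider switch cannot silently reuse the wrong scheme. This is a sketch built only from the formats listed in this section; the names `AUTH_FORMATS` and `auth_headers` are hypothetical.

```python
# Header name and value template per provider, as listed in this section
AUTH_FORMATS = {
    "elevenlabs": ("xi-api-key", "{}"),
    "deepgram": ("Authorization", "Token {}"),
    "inworld": ("Authorization", "Bearer {}"),
    "resemble": ("Authorization", "Bearer {}"),
}


def auth_headers(provider, credential):
    """Return the authentication header dict for the given provider."""
    try:
        name, template = AUTH_FORMATS[provider.lower()]
    except KeyError:
        raise ValueError(f"unknown provider: {provider!r}") from None
    return {name: template.format(credential)}


print(auth_headers("deepgram", "YOUR_API_KEY"))
# prints {'Authorization': 'Token YOUR_API_KEY'}
```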
Common Causes:
  • Plan limitations (voice access, features)
  • Usage quota exceeded
  • Geographic restrictions
Solutions:
  • Check plan features and upgrade if needed
  • Verify voice is available on your plan
  • Review usage dashboard for quota limits
  • Contact provider support for restrictions
Rate Limit Solutions:
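import time
import random

class RateLimitError(Exception):
    """Placeholder: substitute the rate-limit (HTTP 429) exception your client raises."""

def retry_with_backoff(func, max_retries=3):
    for attempt in range(max_retries):
        try:
            return func()
        except RateLimitError:
            # Out of attempts: surface the error to the caller
            if attempt == max_retries - 1:
                raise

            # Exponential backoff with jitter: 1s, 2s, 4s, ... plus up to 1s of random delay
            delay = (2 ** attempt) + random.uniform(0, 1)
            time.sleep(delay)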
Prevention:
  • Implement proper rate limiting in your code
  • Use connection pooling and queuing
  • Distribute requests across time
  • Consider upgrading to higher tier plans
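Client-side rate limiting and spreading requests over time can be as simple as enforcing a minimum gap between calls. The following is a minimal single-threaded sketch; the class name `MinIntervalLimiter` is hypothetical, and the right interval depends on your plan's limits, which this guide does not state.

```python
import time


class MinIntervalLimiter:
    """Space calls at least `interval` seconds apart (not thread-safe)."""

    def __init__(self, interval, clock=time.monotonic, sleep=time.sleep):
        self.interval = interval
        self.clock = clock      # injectable for testing
        self.sleep = sleep      # injectable for testing
        self._next_allowed = None

    def wait(self):
        """Block until the next call is allowed, then reserve the following slot."""
        now = self.clock()
        if self._next_allowed is not None and now < self._next_allowed:
            self.sleep(self._next_allowed - now)
            now = self._next_allowed
        self._next_allowed = now + self.interval


limiter = MinIntervalLimiter(interval=0.01)
for _ in range(3):
    limiter.wait()  # call before each TTS request
```

Call `wait()` immediately before each request; combined with retry and backoff, this keeps bursts from tripping the provider's limit in the first place.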

Provider-Specific Issues

  • ElevenLabs
  • Deepgram
  • Inworld
  • Resemble
Training Issues:
  • Upload 1-25 minutes of clear audio
  • Use consistent speaker and environment
  • Include diverse sentence types
  • Wait for full training completion
Usage Issues:
  • Use correct voice ID from dashboard
  • Ensure plan supports voice cloning
  • Try different similarity_boost values
  • Check voice model compatibility
Language Detection:
  • Explicitly set language parameter
  • Use models that support target language
  • Test with native speakers
  • Avoid mixing languages in single request
Model Compatibility:
{
  "model": "eleven_v3",        // Best multilingual
  "language": "es",            // Explicit language
  "text": "Hola mundo"
}

Emergency Troubleshooting

🚨 When Everything Breaks

Quick recovery strategies for critical TTS failures.

Fallback Strategy Implementation

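import logging

logger = logging.getLogger(__name__)

class TTSWithFallback:
    def __init__(self):
        self.providers = [
            {"name": "elevenlabs", "priority": 1},
            {"name": "deepgram", "priority": 2},
            {"name": "inworld", "priority": 3}
        ]

    async def synthesize_with_fallback(self, text):
        # Try providers in ascending priority order
        for provider in sorted(self.providers, key=lambda x: x["priority"]):
            try:
                return await self.try_provider(provider["name"], text)
            except Exception as e:
                logger.warning(f"{provider['name']} failed: {e}")
                continue

        # All providers failed - use local fallback
        return await self.local_tts_fallback(text)

    async def try_provider(self, name, text):
        # Placeholder: call the named provider's API and return the audio
        raise NotImplementedError

    async def local_tts_fallback(self, text):
        # Placeholder: synthesize with an on-device engine
        raise NotImplementedError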

Health Check Implementation

import time

async def check_provider_health():
    """Monitor TTS provider health and switch if needed"""
    # test_provider_connection is a placeholder: supply a coroutine that makes
    # a cheap authenticated request to the provider and raises on failure
    health_status = {}

    for provider in ["elevenlabs", "deepgram", "inworld", "resemble"]:
        try:
            # Monotonic clock for the duration, wall clock for the timestamp
            start_time = time.monotonic()
            await test_provider_connection(provider)
            latency = time.monotonic() - start_time

            health_status[provider] = {
                "status": "healthy",
                "latency": latency,
                "last_check": time.time()
            }
        except Exception as e:
            health_status[provider] = {
                "status": "unhealthy",
                "error": str(e),
                "last_check": time.time()
            }

    return health_status
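
The health check returns a status dictionary but leaves the switching decision open. One possible policy, sketched below, is to pick the healthy provider with the lowest measured latency and break ties by a preferred order. The function name `pick_provider` and the sample values are hypothetical; the dictionary shape matches the one built above.

```python
def pick_provider(health_status, preferred_order):
    """Return the healthy provider with the lowest latency, or None if none is healthy."""
    healthy = [
        name for name in preferred_order
        if health_status.get(name, {}).get("status") == "healthy"
    ]
    if not healthy:
        return None
    # min() keeps the first item on ties, so preferred_order acts as the tie-breaker
    return min(healthy, key=lambda name: health_status[name]["latency"])


# Made-up sample in the shape produced by the health check
status = {
    "elevenlabs": {"status": "unhealthy", "error": "timeout", "last_check": 0.0},
    "deepgram": {"status": "healthy", "latency": 0.08, "last_check": 0.0},
    "inworld": {"status": "healthy", "latency": 0.12, "last_check": 0.0},
}
print(pick_provider(status, ["elevenlabs", "deepgram", "inworld"]))
# prints deepgram
```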

Getting Help

📞 Support Resources

When you need additional help beyond this troubleshooting guide.

💡 Still Having Issues?

If this guide didn't solve your problem, check our Best Practices guide or reach out to our community for help!