Inworld.ai TTS

🎭 Inworld.ai: AI-Powered Expression

Advanced AI-driven TTS with emotional markup, context awareness, and support for 11 languages. Perfect for gaming, entertainment, and expressive customer interactions.

Quick Setup

Get API Key

Visit Inworld Studio and create an account
Navigate to Integrations → API Keys
Generate a new API key for TTS
Copy your Bearer token

Configure in Burki

Go to AI Configuration → TTS tab
Select Inworld.ai as provider
Paste your Bearer token in the TTS API Key field

Choose Model & Voice

Select TTS-1 or TTS-1-Max model and your preferred voice

Available Models

🎪 TTS-1

~200ms latency
Flagship model with realistic, context-aware synthesis
Languages: 11 supported languages
Best for: Production applications, customer service

🔬 TTS-1-Max

~250ms latency
Larger, more expressive model (experimental)
Languages: 11 supported languages
Best for: Creative content, gaming, entertainment

Multilingual Voice Library

English

Hades

Deep and commanding
Perfect for authoritative characters
Voice ID: Hades

Alex

Clear and natural
Great for professional applications
Voice ID: Alex

Ashley

Warm and friendly
Ideal for customer service
Voice ID: Ashley

Aria

Professional and articulate
Perfect for business communications
Voice ID: Aria

Emotional Markup System

🎭 Express Emotions in Speech

Inworld’s unique emotional markup allows you to add feelings and speaking styles directly in your text.

Emotional Tags

Basic Emotions

[happy] I'm excited to help you today!
[sad] I'm sorry to hear about that issue.
[angry] This is completely unacceptable!
[surprised] Wow, I didn't expect that!
[fearful] Please be careful with that.
[disgusted] That's not what I ordered.

Available emotions: happy, sad, angry, surprised, fearful, disgusted, neutral
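
Before sending text for synthesis, it can help to check tags against the list above. The following is a minimal sketch, not part of any Inworld SDK: the helper name `find_unknown_tags` is hypothetical, and the set only covers the basic emotions listed here, so speaking-style tags would need to be added to it. Matching is case-sensitive, in line with the troubleshooting advice later on this page.

```python
import re

# Emotion names copied from the "Available emotions" list above.
AVAILABLE_EMOTIONS = {
    "happy", "sad", "angry", "surprised", "fearful", "disgusted", "neutral",
}

# Matches any [tag] that contains no nested brackets.
TAG_PATTERN = re.compile(r"\[([^\[\]]+)\]")


def find_unknown_tags(text):
    """Return bracketed tags in text that are not in AVAILABLE_EMOTIONS."""
    return [tag for tag in TAG_PATTERN.findall(text) if tag not in AVAILABLE_EMOTIONS]


print(find_unknown_tags("[happy] Hello! [Happy] Hi again. [joyful] Welcome."))
```

An empty result means every tag in the text is one of the basic emotions.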

Zero-Shot Voice Cloning

🎯 Custom Voice Creation

Create custom voices without training data. Just provide a voice ID and Inworld handles the rest.

Enable Custom Voice

In your assistant configuration, select “Custom” voice option

Enter Voice ID

Provide your custom voice identifier in the Custom Voice ID field

Test & Refine

Test with sample text and adjust based on results

Language Support & Quality

🌍 Production-Ready Languages

Inworld provides native-quality voices for multiple languages with varying production readiness.

Language	Status	Voices Available	Quality Rating
English	🟢 Production	4+ voices	⭐⭐⭐⭐⭐
Spanish	🟢 Production	4+ voices	⭐⭐⭐⭐⭐
French	🟢 Production	4+ voices	⭐⭐⭐⭐
German	🟡 Beta	2+ voices	⭐⭐⭐⭐
Chinese	🟡 Beta	4+ voices	⭐⭐⭐⭐
Japanese	🟡 Beta	2+ voices	⭐⭐⭐
Italian	🟡 Beta	2+ voices	⭐⭐⭐
Portuguese	🟡 Beta	2+ voices	⭐⭐⭐
Dutch	🟡 Beta	2+ voices	⭐⭐⭐
Korean	🟡 Beta	2+ voices	⭐⭐⭐
Polish	🟡 Beta	2+ voices	⭐⭐⭐
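
If an application should only offer languages marked as Production, the table above can be encoded as a lookup. This is an illustrative sketch: the dictionary keys are the language names from the table (not API language codes), and both function names are hypothetical.

```python
# Status values copied from the language table above.
LANGUAGE_STATUS = {
    "English": "Production",
    "Spanish": "Production",
    "French": "Production",
    "German": "Beta",
    "Chinese": "Beta",
    "Japanese": "Beta",
    "Italian": "Beta",
    "Portuguese": "Beta",
    "Dutch": "Beta",
    "Korean": "Beta",
    "Polish": "Beta",
}


def is_production_ready(language):
    """True only for languages the table marks as Production."""
    return LANGUAGE_STATUS.get(language) == "Production"


def production_languages():
    """Alphabetical list of Production languages."""
    return sorted(name for name, status in LANGUAGE_STATUS.items() if status == "Production")


print(production_languages())
```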

API Integration

import requests

def synthesize_with_emotions(text, voice_id="Ashley", language="en"):
    url = "https://api.inworld.ai/v1/text-to-speech/stream"

    headers = {
        # Replace the placeholder with your own Bearer token
        "Authorization": "Bearer YOUR_INWORLD_API_KEY",
        "Content-Type": "application/json"
    }

    data = {
        "text": text,
        "voice_id": voice_id,
        "model": "inworld-tts-1",
        "language": language,
        "output_format": "wav",
        "sample_rate": 8000  # 8 kHz, suitable for phone audio
    }

    # stream=True lets the caller consume audio as it arrives
    response = requests.post(url, json=data, headers=headers, stream=True, timeout=30)
    response.raise_for_status()  # fail early on authentication or request errors
    return response

# Example with emotional markup
text_with_emotion = "[happy] Welcome to our service! [excited] How can I help you today?"
response = synthesize_with_emotions(text_with_emotion, "Ashley", "en")
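
The function above returns the streaming response without consuming it. One way to store the audio is sketched below. It assumes the response body is raw audio bytes, which this page does not confirm; if the API wraps audio in another envelope, the chunks would need decoding first. The helper name `save_audio_chunks` is hypothetical.

```python
def save_audio_chunks(chunks, path):
    """Write an iterable of byte chunks to path and return the bytes written."""
    total = 0
    with open(path, "wb") as audio_file:
        for chunk in chunks:
            if chunk:  # skip empty keep-alive chunks
                audio_file.write(chunk)
                total += len(chunk)
    return total


# Usage with the response from synthesize_with_emotions:
# save_audio_chunks(response.iter_content(chunk_size=4096), "greeting.wav")
```

Taking any iterable of bytes, rather than the response object itself, keeps the helper easy to test without a network call.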

Use Case Examples

Customer Support

# Empathetic customer service
[understanding] I completely understand your frustration. [reassuring] Let me help you resolve this issue right away. [confident] I'll have this sorted out for you in just a moment.

Best voices: Ashley (EN), Lupita (ES), Hélène (FR)
Model: TTS-1 for consistency

Configuration Examples

🎛️ Optimal Settings

Recommended configurations for different use cases.

Phone Call Setup

{
  "model": "inworld-tts-1",
  "voice_id": "Ashley",
  "language": "en",
  "output_format": "wav",
  "sample_rate": 8000,
  "text": "[friendly] Thank you for calling! [helpful] How may I assist you today?"
}

Features:

Phone-compatible audio format
Friendly, professional tone
Emotional warmth in greeting

Multilingual Application

{
  "model": "inworld-tts-1",
  "voice_id": "Diego",
  "language": "es",
  "output_format": "wav",
  "sample_rate": 16000,
  "text": "[amigable] ¡Bienvenido! [servicial] ¿En qué puedo ayudarte hoy?"
}

Features:

Native Spanish voice
Culturally appropriate emotions
Higher quality audio format

Gaming/Interactive

{
  "model": "inworld-tts-1-max",
  "voice_id": "Hades",
  "language": "en",
  "output_format": "wav",
  "sample_rate": 24000,
  "text": "[menacing] You dare enter my domain? [angry] Prepare to face my wrath!"
}

Features:

Maximum expression model
Character-appropriate voice
High-quality audio for immersion
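
The three configurations above differ mainly in model, voice, and sample rate. A small builder can keep those choices in one place. This is a sketch under assumptions: the `build_payload` name and the use-case keys are hypothetical, and the field names simply mirror the JSON examples above.

```python
# Sample rates taken from the three configurations above.
SAMPLE_RATES = {"phone": 8000, "multilingual": 16000, "gaming": 24000}


def build_payload(text, voice_id, language="en", use_case="phone", model="inworld-tts-1"):
    """Build a request body shaped like the configuration examples above."""
    if use_case not in SAMPLE_RATES:
        raise ValueError(f"unknown use case: {use_case!r}")
    return {
        "model": model,
        "voice_id": voice_id,
        "language": language,
        "output_format": "wav",
        "sample_rate": SAMPLE_RATES[use_case],
        "text": text,
    }


payload = build_payload(
    "[menacing] You dare enter my domain?",
    voice_id="Hades",
    use_case="gaming",
    model="inworld-tts-1-max",
)
print(payload["sample_rate"])
```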

Best Practices

🎯 Maximize Inworld's Emotional AI

Get the most out of Inworld’s unique features with these proven strategies.

Emotional Markup Guidelines

Do’s:

Use emotions that match the content context
Place tags at natural speech boundaries
Mix emotions sparingly for realistic conversations
Test different voices with the same emotions

Don’ts:

Overuse emotional tags (max 2-3 per sentence)
Use conflicting emotions close together
Rely only on markup - the text itself should convey meaning
Mix languages and emotions in complex ways
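
The limit on tags per sentence can be checked automatically before text is sent. The sketch below is illustrative only: it uses a naive sentence splitter (punctuation followed by whitespace), treats 3 as the upper bound from the guideline above, and the function name `overtagged_sentences` is hypothetical.

```python
import re

MAX_TAGS_PER_SENTENCE = 3  # upper end of the "2-3 per sentence" guideline


def overtagged_sentences(text, limit=MAX_TAGS_PER_SENTENCE):
    """Return the sentences that contain more than `limit` bracketed tags."""
    # Naive split: sentence-ending punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [s for s in sentences if len(re.findall(r"\[[^\[\]]+\]", s)) > limit]


sample = "[happy] Hi there! [sad] [angry] [happy] [neutral] This one has too many."
print(overtagged_sentences(sample))
```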

Voice Selection Strategy

For Business Applications:

English: Ashley, Alex, Aria
Spanish: Lupita, Rafael
French: Hélène, Mathieu

For Creative Content:

Dramatic: Hades, Aria
Friendly: Ashley, Diego
Professional: Alex, Johanna

For Gaming:

Heroes: Aria, Alex
Villains: Hades
NPCs: Ashley, Diego

Performance Optimization

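import re

def split_into_sentences(text):
    # Naive splitter: break after '.', '!' or '?' followed by whitespace
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def break_long_sentence(sentence, max_words=50):
    # Simple fallback: cut into chunks of at most max_words words.
    # Swap in clause-aware splitting to break at more natural points.
    words = sentence.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]

# Efficient text processing
def optimize_for_inworld(text, max_words=50):
    # Keep sentences at or under 50 words for best latency
    optimized = []
    for sentence in split_into_sentences(text):
        if len(sentence.split()) > max_words:
            optimized.extend(break_long_sentence(sentence, max_words))
        else:
            optimized.append(sentence)
    return optimized

# Cache voice models for repeated use
voice_cache = {}

def load_voice_model(voice_id, language):
    # Placeholder so the example runs; replace with your own voice setup logic
    return {"voice_id": voice_id, "language": language}

def get_cached_voice(voice_id, language):
    cache_key = f"{voice_id}_{language}"
    if cache_key not in voice_cache:
        voice_cache[cache_key] = load_voice_model(voice_id, language)
    return voice_cache[cache_key]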

Pricing

💰 Flexible Pricing Model

Pay-per-character with volume discounts and free tier for testing.

Plan	Characters/Month	Price	Features
Free	25,000	$0	All voices, emotional markup
Starter	100,000	$9	Priority processing
Professional	500,000	$29	Custom voice support
Enterprise	Custom	Custom	Dedicated support, SLA
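
To see which plan a given monthly volume falls into, the table above can be turned into a lookup. This sketch treats each allowance as a hard cap and ignores volume discounts and any overage billing, since this page does not describe them. The function name `cheapest_plan` is hypothetical.

```python
# (plan name, characters per month, monthly price in USD) from the table above.
PLANS = [
    ("Free", 25_000, 0),
    ("Starter", 100_000, 9),
    ("Professional", 500_000, 29),
]


def cheapest_plan(characters_per_month):
    """Return (plan, price) for the smallest plan that covers the volume.

    Volumes above the Professional allowance map to Enterprise, whose
    price is custom and therefore returned as None.
    """
    if characters_per_month < 0:
        raise ValueError("characters_per_month must be non-negative")
    for name, allowance, price in PLANS:
        if characters_per_month <= allowance:
            return name, price
    return "Enterprise", None


print(cheapest_plan(80_000))
```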

Troubleshooting

Emotional Markup Not Working

Problem: Emotions don’t seem to affect the voice
Solutions:

Verify emotion tag spelling (case-sensitive)
Check if the voice supports the specific emotion
Try with TTS-1-Max model for stronger expression
Ensure tags are properly formatted: [emotion]

Language Detection Issues

Problem: Wrong language pronunciation
Solutions:

Explicitly set language parameter in API call
Use native voices for each language
Avoid mixing languages in single requests
Verify voice ID supports the target language

Custom Voice Problems

Problem: Custom voice ID not working
Solutions:

Verify custom voice ID with Inworld support
Check if voice is properly trained and available
Use fallback to standard voices during setup
Contact support for voice activation status
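
The fallback suggestion above can be sketched as a thin wrapper around whatever function performs synthesis. Everything here is an assumption for illustration: `synthesize` stands for any callable taking `(text, voice_id)` that raises on failure, and `synthesize_with_fallback` is a hypothetical name, not an Inworld API.

```python
def synthesize_with_fallback(synthesize, text, custom_voice_id, fallback_voice_id="Ashley"):
    """Try the custom voice first; on any error, retry with a standard voice.

    Returns a (result, voice_id_used) tuple so callers can log which voice ran.
    """
    try:
        return synthesize(text, custom_voice_id), custom_voice_id
    except Exception:
        # Custom voice unavailable or rejected: use the standard voice instead.
        return synthesize(text, fallback_voice_id), fallback_voice_id
```

Returning the voice that was actually used makes it easy to notice when the custom voice is still not active.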

Migration Guide

From Other Providers

Key Advantages of Switching:

Emotional markup for better user engagement
Native multilingual support (11 languages)
Zero-shot voice cloning capabilities
AI-powered context awareness

Migration Steps:

Map existing voice preferences to Inworld voices
Add emotional markup to enhance user experience
Test multilingual capabilities if applicable
Optimize for Inworld’s strengths (emotions, languages)

🎭 Ready to Add Emotion to Your AI?

Configure Inworld.ai in your assistant settings and start creating more engaging, expressive conversations!
