Voice Cloning - Burki Voice AI Docs

Overview

Burki Voice AI’s voice cloning feature enables you to:

Upload Voice Samples: Upload high-quality audio recordings to create voice models
Multi-Provider Support: Use ElevenLabs or Resemble AI today, with additional voice cloning providers planned
Instant Voice Creation: Generate cloned voices ready for immediate use
Voice Management: Organize, test, and manage your custom voices
Usage Analytics: Track voice usage for billing and optimization

🎙️ Voice Sample Upload

Upload audio samples with validation and processing

🤖 AI Voice Training

Provider-powered voice training with quality optimization

📊 Usage Analytics

Track synthesis usage and voice performance

🔧 Easy Integration

Seamless integration with existing assistant configurations

Supported Providers

ElevenLabs

Instant Voice Cloning: Create voices from single audio samples
High Quality: Professional-grade voice synthesis
Multiple Languages: Support for 29+ languages
Quick Processing: Voices ready in seconds

Resemble AI

Professional Training: Advanced voice training algorithms
Custom Models: Highly personalized voice characteristics
Unlimited Voices: Create as many voices as needed
Enterprise Features: Advanced customization options

Future Providers

Inworld AI: Coming soon with emotional voice cloning
OpenAI: Voice cloning capabilities when available

Voice Sample Requirements

Audio Quality Guidelines

File Format Requirements

Supported Formats:

MP3 (recommended)
WAV (highest quality)
FLAC (lossless)
M4A/AAC
OGG

Technical Specifications:

Sample Rate: 22kHz or higher
Bit Rate: 128kbps minimum
Channels: Mono preferred, stereo acceptable
File Size: Maximum 50MB

Recording Guidelines

Duration Requirements:

Minimum: 10 seconds of clear speech
Recommended: 30-60 seconds for better quality
Maximum: 10 minutes (longer samples may not improve quality)
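
The format, size, and duration limits above can be checked on the client before uploading, which avoids a round trip for files that would be rejected anyway. The following is a minimal sketch, not Burki's actual validation logic: the function name, the input fields, and the reading of 50MB as 50 * 1024 * 1024 bytes are all assumptions made for illustration.

```javascript
// Limits copied from the requirements above. Names and structure are illustrative only.
const SUPPORTED_EXTENSIONS = ['mp3', 'wav', 'flac', 'm4a', 'aac', 'ogg'];
const MAX_FILE_BYTES = 50 * 1024 * 1024; // assumes 1MB = 1024 * 1024 bytes
const MIN_DURATION_SECONDS = 10;
const MAX_DURATION_SECONDS = 10 * 60;

// Returns a list of human-readable problems; an empty list means the sample passes.
function validateSample({ fileName, sizeBytes, durationSeconds }) {
  const problems = [];
  const dot = fileName.lastIndexOf('.');
  const extension = dot === -1 ? '' : fileName.slice(dot + 1).toLowerCase();

  if (!SUPPORTED_EXTENSIONS.includes(extension)) {
    problems.push(`Unsupported format: "${extension || 'none'}"`);
  }
  if (sizeBytes > MAX_FILE_BYTES) {
    problems.push('File is larger than 50MB');
  }
  if (durationSeconds < MIN_DURATION_SECONDS) {
    problems.push('Recording is shorter than 10 seconds');
  }
  if (durationSeconds > MAX_DURATION_SECONDS) {
    problems.push('Recording is longer than 10 minutes');
  }
  return problems;
}
```

In a browser, `fileName` and `sizeBytes` would typically come from a `File` object, and the duration from an audio element's metadata. The server-side validation remains the source of truth.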

Content Guidelines:

Clear Speech: No background noise or music
Natural Tone: Conversational, not monotone
Consistent Volume: Steady audio levels throughout
Single Speaker: Only the target voice in the recording

Quality Tips

For Best Results:

Environment: Record in a quiet room with soft furnishings
Microphone: Use a good-quality microphone positioned 6-12 inches from your mouth
Content: Read varied sentences with some range in tone and emotion
Consistency: Maintain the same speaking style throughout
Format: Save in WAV format for highest quality

Getting Started

Step 1: Upload Voice Sample

Navigate to your assistant’s configuration and open the Voice Cloning section:

Upload Audio File: Drag and drop or click to select your audio file
Add Metadata: Provide a name, description, and tags
Validation: System automatically validates audio quality
Processing: File is uploaded and prepared for cloning

Example Upload

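// audioFile is a File or Blob, for example from an <input type="file"> element
const formData = new FormData();
formData.append('file', audioFile);
formData.append('name', 'Professional Voice');
formData.append('description', 'Clear, professional speaking voice');
formData.append('tags', 'professional, clear, business');

// Do not set Content-Type here; fetch adds the multipart boundary for FormData bodies
const response = await fetch('https://api.burki.dev/assistants/123/voice-samples/upload', {
  method: 'POST',
  body: formData,
  headers: {
    'Authorization': 'Bearer YOUR_API_KEY'
  }
});
if (!response.ok) {
  throw new Error(`Upload failed with status ${response.status}`);
}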

Step 2: Create Cloned Voice

Once your sample is uploaded, create a cloned voice:

Select Provider: Choose ElevenLabs or Resemble AI
Configure Options: Set voice name, language, and quality settings
Initiate Cloning: Start the voice training process
Monitor Progress: Track cloning status in real time

Example Voice Creation

const response = await fetch('https://api.burki.dev/assistants/123/cloned-voices/create', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'Authorization': 'Bearer YOUR_API_KEY'
  },
  body: JSON.stringify({
    voice_sample_id: 456, // ID of the voice sample uploaded in Step 1
    provider: 'elevenlabs',
    name: 'Custom Professional Voice',
    language: 'en',
    enhance_quality: true
  })
});
if (!response.ok) {
  throw new Error(`Voice creation failed with status ${response.status}`);
}
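
Because cloning can take anywhere from seconds to minutes depending on the provider, a client usually needs to wait for the voice to become usable. One way to sketch this is to poll the list endpoint shown under API Integration. This page does not document the response shape, so the `id` and `status` fields and the `'ready'` and `'failed'` values below are assumptions, as is the idea that the endpoint returns a JSON array.

```javascript
// Polls until the voice reports a ready status, fails, or the attempt budget runs out.
// ASSUMPTION: each voice object has `id` and `status` fields with 'ready'/'failed' values.
async function waitForVoice(fetchVoices, voiceId, { intervalMs = 2000, maxAttempts = 30 } = {}) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const voices = await fetchVoices();
    const voice = voices.find((candidate) => candidate.id === voiceId);
    if (voice && voice.status === 'ready') return voice;
    if (voice && voice.status === 'failed') {
      throw new Error(`Cloning failed for voice ${voiceId}`);
    }
    if (attempt < maxAttempts) {
      await new Promise((resolve) => setTimeout(resolve, intervalMs));
    }
  }
  throw new Error(`Voice ${voiceId} was not ready after ${maxAttempts} attempts`);
}

// Example fetcher built on the list endpoint from this page.
// ASSUMPTION: the endpoint responds with a JSON array of voice objects.
const fetchVoicesFromApi = async () => {
  const response = await fetch('https://api.burki.dev/assistants/123/cloned-voices', {
    headers: { Authorization: 'Bearer YOUR_API_KEY' },
  });
  if (!response.ok) throw new Error(`List request failed with status ${response.status}`);
  return response.json();
};
```

Calling `waitForVoice(fetchVoicesFromApi, 789)` would then resolve with the voice once it is ready. Passing the fetcher as an argument keeps the polling logic easy to test without network access.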

Step 3: Use Cloned Voice

Once processing is complete, assign the voice to your assistant:

Voice Selection: Choose from your cloned voices
Testing: Preview the voice with sample text
Assignment: Set as the assistant’s default voice
Go Live: Start using the voice in live calls

Voice Management

Voice Library

Organizing Voices

Voice Categories:

Brand Voices: Official company voices
Character Voices: Specific personas or characters
Language Variants: Same voice in different languages
Seasonal/Campaign: Temporary or promotional voices

Tagging System:

Use consistent tags for easy filtering
Include language, gender, style descriptors
Add use case tags (customer service, sales, etc.)
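
Consistent tags are easier to maintain when they are generated rather than typed by hand. The helper below is a small sketch of that idea. It produces the same comma-separated string format used in the upload example, but the function name, the input fields, and the lowercase, hyphenated style are illustrative choices, not part of the Burki API.

```javascript
// Builds a normalized, comma-separated tag string from a few descriptors.
// ASSUMPTION: field names and the normalization rules are illustrative conventions.
function buildTags({ language, gender, style, useCase }) {
  return [language, gender, style, useCase]
    .filter((value) => typeof value === 'string' && value.trim() !== '')
    .map((value) => value.trim().toLowerCase().replace(/\s+/g, '-'))
    .join(', ');
}

// Example input for a support voice; the result can be passed as the 'tags' form field.
const supportVoiceTags = buildTags({
  language: 'en',
  gender: 'Female',
  style: 'Warm',
  useCase: 'Customer Service',
});
```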

Voice Analytics

Usage Tracking:

Synthesis Count: Number of times voice was used
Duration Metrics: Total audio generated
Cost Tracking: Provider usage and billing
Performance: Quality scores and user feedback

Optimization Insights:

Most/least used voices
Cost per synthesis by provider
Quality trends over time
User preference patterns
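
As an illustration of the "cost per synthesis by provider" insight, the sketch below aggregates a list of usage records. The record shape (`provider` and `cost` per synthesis call) is hypothetical. The analytics dashboard may expose this data in a different form.

```javascript
// Groups usage records by provider and computes the average cost per synthesis.
// ASSUMPTION: each record looks like { provider: string, cost: number }.
function costPerSynthesisByProvider(records) {
  const totals = new Map();
  for (const { provider, cost } of records) {
    const entry = totals.get(provider) || { count: 0, totalCost: 0 };
    entry.count += 1;
    entry.totalCost += cost;
    totals.set(provider, entry);
  }

  const summary = {};
  for (const [provider, { count, totalCost }] of totals) {
    summary[provider] = { count, totalCost, costPerSynthesis: totalCost / count };
  }
  return summary;
}
```

The same grouping approach works for the other insights, for example counting records per voice to find the most and least used voices.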

Voice Testing

Test your cloned voices before deployment:

Text-to-Speech Preview: Enter sample text to hear the voice
Quality Assessment: Evaluate clarity, naturalness, and accuracy
Comparison Testing: Compare with original samples and other voices
A/B Testing: Test different voices with real users
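
For A/B testing, each caller should hear the same voice on every interaction so that feedback can be attributed to one variant. A common way to do this is to hash a stable user identifier into a bucket. The sketch below uses an FNV-1a-style hash over the string's UTF-16 code units. The function names and the idea of keying on a user ID are assumptions for illustration, and Burki may offer its own mechanism.

```javascript
// FNV-1a-style 32-bit hash over UTF-16 code units; returns an unsigned integer.
function hashString(text) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash;
}

// Deterministically maps a user to one of the candidate voices.
// The same userId and voice list always produce the same voice.
function assignVoice(userId, voiceIds) {
  if (voiceIds.length === 0) throw new Error('At least one voice is required');
  return voiceIds[hashString(String(userId)) % voiceIds.length];
}
```

Note that changing the order or length of `voiceIds` reshuffles assignments, so the list should stay fixed for the duration of a test.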

API Integration

Upload Voice Sample

curl -X POST "https://api.burki.dev/assistants/123/voice-samples/upload" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F "file=@voice_sample.wav" \
  -F "name=Professional Voice" \
  -F "description=Clear professional speaking voice"

Create Cloned Voice

curl -X POST "https://api.burki.dev/assistants/123/cloned-voices/create" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "voice_sample_id": 456,
    "provider": "elevenlabs",
    "name": "Custom Professional Voice",
    "language": "en",
    "enhance_quality": true
  }'

List Cloned Voices

curl -X GET "https://api.burki.dev/assistants/123/cloned-voices" \
  -H "Authorization: Bearer YOUR_API_KEY"

Best Practices

Recording Quality

Professional Recording Setup

Equipment Recommendations:

Microphone: USB condenser microphone (Audio-Technica AT2020USB+)
Environment: Quiet room with minimal echo
Software: Audacity, GarageBand, or professional DAW
Monitoring: Use headphones to monitor audio quality

Recording Techniques:

Consistent Distance: Maintain 6-12 inches from microphone
Proper Levels: Keep audio peaks between -12dB and -6dB
Room Treatment: Use blankets or acoustic foam to reduce echo
Multiple Takes: Record several versions and choose the best
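
The level guideline above can be checked numerically. For samples normalized to the range [-1, 1], the peak level in decibels relative to full scale is 20 * log10 of the largest absolute sample value. The sketch below assumes the -12dB to -6dB target refers to dBFS and that decoded samples are already available as an array of floats. Both are assumptions, since this page does not specify them.

```javascript
// Returns the peak level in dBFS for samples normalized to [-1, 1].
// Silence (all zeros) yields -Infinity.
function peakDbfs(samples) {
  let peak = 0;
  for (const sample of samples) {
    const magnitude = Math.abs(sample);
    if (magnitude > peak) peak = magnitude;
  }
  return peak === 0 ? -Infinity : 20 * Math.log10(peak);
}

// Checks whether the loudest peak falls inside the recommended window.
function isPeakInTargetRange(samples, minDb = -12, maxDb = -6) {
  const peak = peakDbfs(samples);
  return peak >= minDb && peak <= maxDb;
}
```

For example, a recording whose loudest sample is 0.5 peaks at roughly -6.02 dBFS, which sits just inside the window, while a sample at 1.0 peaks at 0 dBFS and is likely to clip.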

Content Selection

Ideal Voice Sample Content:

Varied Sentences: Different sentence structures and lengths
Emotional Range: Include slight variations in tone
Natural Speech: Conversational delivery rather than a read-aloud tone
Complete Thoughts: Full sentences with natural pauses

What to Avoid:

Background noise or music
Multiple speakers
Heavy accents (unless desired)
Monotone or robotic delivery
Incomplete sentences or stuttering

Voice Management

Naming Convention: Use descriptive, consistent names
Version Control: Keep track of voice iterations and improvements
Usage Documentation: Document which voices work best for different scenarios
Regular Testing: Periodically test voice quality and user satisfaction
Cost Monitoring: Track usage and costs across different providers

Security and Privacy

Privacy Considerations: Voice cloning involves processing personal audio data. Ensure you have proper consent and follow privacy regulations when using voice samples.

Consent: Always obtain explicit consent before using someone’s voice
Data Protection: Store voice samples securely and follow GDPR/CCPA requirements
Access Control: Limit who can create and manage cloned voices
Audit Trail: Keep logs of voice creation and usage
Retention Policy: Define how long voice samples and models are stored
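
A retention policy is easier to enforce when it is expressed as a check that can run on a schedule. The sketch below finds samples older than a chosen retention window. The `createdAt` field and the function name are hypothetical, and the appropriate retention period depends on your own legal and contractual requirements.

```javascript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Returns the samples created before the retention cutoff.
// ASSUMPTION: each sample has a `createdAt` value that `new Date()` can parse.
function findExpiredSamples(samples, retentionDays, now = new Date()) {
  const cutoff = now.getTime() - retentionDays * MS_PER_DAY;
  return samples.filter((sample) => new Date(sample.createdAt).getTime() < cutoff);
}
```

The returned list can then feed a deletion job, with each deletion recorded in the audit trail.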

Troubleshooting

Common Issues

Upload Problems

File Upload Fails:

Check that the file format is supported (MP3, WAV, FLAC, M4A/AAC, OGG)
Ensure the file size is under 50MB
Verify that the audio duration is between 10 seconds and 10 minutes
Check that your internet connection is stable

Audio Quality Issues:

Use a higher sample rate (22kHz+) and bit rate (128kbps+)
Remove background noise using audio editing software
Re-record in a quieter environment
Check microphone positioning and levels

Voice Creation Issues

Cloning Process Fails:

Verify provider API credentials are valid
Check account balance with voice cloning provider
Ensure voice sample meets provider requirements
Contact provider support for specific error messages

Poor Voice Quality:

Use higher quality source audio
Try a different provider (ElevenLabs or Resemble AI)
Experiment with quality enhancement settings
Consider recording new samples with better equipment

Performance Issues

Slow Processing:

Provider processing times vary (ElevenLabs: seconds, Resemble AI: minutes)
Check provider status pages for service issues
Large files take longer to process
Peak usage times may cause delays

High Costs:

Monitor usage through analytics dashboard
Set usage limits and alerts
Compare provider pricing for your use case
Optimize voice selection for cost efficiency

Provider Comparison

Feature	ElevenLabs	Resemble AI	Future Providers (Coming Soon)
Processing Time	Seconds	Minutes	Varies
Quality	Excellent	Excellent	TBD
Languages	29+	English+	TBD
Cost Model	Per character	Per synthesis	TBD
Sample Requirements	30s+	60s+	TBD
Instant Preview	✅	❌	TBD
Emotional Control	Basic	Advanced	TBD
Enterprise Features	Limited	Full	TBD
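
The two cost models in the table scale differently: per-character pricing grows with the length of each utterance, while per-synthesis pricing grows only with the number of requests. The sketch below compares them for a given workload. All prices in it are placeholders chosen for the arithmetic, not actual provider rates, so check each provider's current pricing before relying on the result.

```javascript
// Estimates monthly cost under both pricing models for the same workload.
// ASSUMPTION: prices are placeholder inputs supplied by the caller.
function estimateMonthlyCost({ synthesesPerMonth, averageCharacters, pricePerCharacter, pricePerSynthesis }) {
  return {
    perCharacterModel: synthesesPerMonth * averageCharacters * pricePerCharacter,
    perSynthesisModel: synthesesPerMonth * pricePerSynthesis,
  };
}

// Example workload: 1,000 syntheses per month averaging 200 characters each.
const monthlyEstimate = estimateMonthlyCost({
  synthesesPerMonth: 1000,
  averageCharacters: 200,
  pricePerCharacter: 0.0001, // placeholder price
  pricePerSynthesis: 0.03,   // placeholder price
});
```

With these placeholder prices the per-character model comes to about 20 and the per-synthesis model to about 30 in whatever currency the prices use. Longer average utterances shift the balance toward per-synthesis pricing.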

Use Cases

Customer Service

Consistent Brand Voice: Maintain brand identity across all interactions
Multilingual Support: Create voices in different languages for global support
Personality Matching: Match voice characteristics to brand personality

Sales and Marketing

Campaign Voices: Create specific voices for marketing campaigns
Regional Variants: Adapt voices for different geographical markets
Seasonal Adjustments: Modify voice characteristics for holidays or events

Entertainment and Media

Character Voices: Create unique voices for virtual characters
Narrator Voices: Professional voices for content narration
Interactive Experiences: Engaging voices for games and interactive media

Enterprise Applications

Executive Voices: Clone executive voices for consistent communication
Training Systems: Consistent voices for e-learning and training
Brand Ambassadors: Virtual representatives with authentic brand voices

Getting Help

📖 Documentation

Complete TTS provider documentation

🎛️ Voice Tuning

Advanced voice configuration guide

💬 Community Support

Get help from the community

🎧 Technical Support

Contact our support team
