Overview

Burki Voice AI’s voice cloning feature enables you to:
  • Upload Voice Samples: Upload high-quality audio recordings to create voice models
  • Multi-Provider Support: Use ElevenLabs, Resemble AI, and other providers that support voice cloning
  • Instant Voice Creation: Generate cloned voices ready for immediate use
  • Voice Management: Organize, test, and manage your custom voices
  • Usage Analytics: Track voice usage for billing and optimization

🎙️ Voice Sample Upload

Upload audio samples with validation and processing

🤖 AI Voice Training

Provider-powered voice training with quality optimization

📊 Usage Analytics

Track synthesis usage and voice performance

🔧 Easy Integration

Seamless integration with existing assistant configurations

Supported Providers

ElevenLabs

  • Instant Voice Cloning: Create voices from single audio samples
  • High Quality: Professional-grade voice synthesis
  • Multiple Languages: Support for 29+ languages
  • Quick Processing: Voices ready in seconds

Resemble AI

  • Professional Training: Advanced voice training algorithms
  • Custom Models: Highly personalized voice characteristics
  • Unlimited Voices: Create as many voices as needed
  • Enterprise Features: Advanced customization options

Future Providers

  • Inworld AI: Coming soon with emotional voice cloning
  • OpenAI: Voice cloning capabilities when available

Voice Sample Requirements

Audio Quality Guidelines

Supported Formats:
  • MP3 (recommended)
  • WAV (highest quality)
  • FLAC (lossless)
  • M4A/AAC
  • OGG
Technical Specifications:
  • Sample Rate: 22 kHz or higher
  • Bit Rate: 128 kbps minimum (applies to compressed formats such as MP3, AAC, and OGG)
  • Channels: Mono preferred, stereo acceptable
  • File Size: Maximum 50 MB
Duration Requirements:
  • Minimum: 10 seconds of clear speech
  • Recommended: 30-60 seconds for better quality
  • Maximum: 10 minutes (longer samples may not improve quality)
Content Guidelines:
  • Clear Speech: No background noise or music
  • Natural Tone: Conversational, not monotone
  • Consistent Volume: Steady audio levels throughout
  • Single Speaker: Only the target voice in the recording
For Best Results:
  1. Environment: Record in a quiet room with soft furnishings
  2. Microphone: Use a quality microphone placed 6-12 inches from your mouth
  3. Content: Read varied sentences with different emotions
  4. Consistency: Maintain the same speaking style throughout
  5. Format: Save in WAV format for highest quality
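
The limits above can be checked on the client before uploading. The following is a minimal sketch, not the platform's own validator: the `validateVoiceSample` function and the shape of its input object (`fileName`, `sizeBytes`, `durationSeconds`, `sampleRateHz`) are assumptions made for illustration, and 50 MB is assumed to mean binary megabytes.

```javascript
// Limits copied from the requirements above; the input object shape is hypothetical.
const SUPPORTED_FORMATS = ['mp3', 'wav', 'flac', 'm4a', 'aac', 'ogg'];
const MAX_FILE_BYTES = 50 * 1024 * 1024; // 50 MB, assuming binary megabytes
const MIN_DURATION_SECONDS = 10;
const MAX_DURATION_SECONDS = 10 * 60;
const MIN_SAMPLE_RATE_HZ = 22000;

function validateVoiceSample(sample) {
  const problems = [];
  const extension = sample.fileName.split('.').pop().toLowerCase();

  if (!SUPPORTED_FORMATS.includes(extension)) {
    problems.push(`Unsupported format: .${extension}`);
  }
  if (sample.sizeBytes > MAX_FILE_BYTES) {
    problems.push('File is larger than 50 MB');
  }
  if (sample.durationSeconds < MIN_DURATION_SECONDS) {
    problems.push('Sample is shorter than 10 seconds');
  }
  if (sample.durationSeconds > MAX_DURATION_SECONDS) {
    problems.push('Sample is longer than 10 minutes');
  }
  if (sample.sampleRateHz < MIN_SAMPLE_RATE_HZ) {
    problems.push('Sample rate is below 22 kHz');
  }
  // Collect every problem instead of stopping at the first one
  return { valid: problems.length === 0, problems };
}
```

Returning the full list of problems lets a user fix everything in one pass rather than re-uploading after each rejection.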

Getting Started

Step 1: Upload Voice Sample

Navigate to your assistant’s configuration and open the Voice Cloning section:
  1. Upload Audio File: Drag and drop or click to select your audio file
  2. Add Metadata: Provide a name, description, and tags
  3. Validation: The system automatically validates audio quality
  4. Processing: The file is uploaded and prepared for cloning
Example Upload
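// audioFile is a File or Blob, for example from an <input type="file"> element
const formData = new FormData();
formData.append('file', audioFile);
formData.append('name', 'Professional Voice');
formData.append('description', 'Clear, professional speaking voice');
formData.append('tags', 'professional, clear, business');

// Do not set Content-Type manually; fetch adds the multipart boundary for FormData
const response = await fetch('/assistants/123/voice-samples/upload', {
  method: 'POST',
  body: formData,
  headers: {
    'Authorization': 'Bearer YOUR_API_KEY'
  }
});

if (!response.ok) {
  throw new Error(`Upload failed with HTTP ${response.status}`);
}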

Step 2: Create Cloned Voice

Once your sample is uploaded, create a cloned voice:
  1. Select Provider: Choose ElevenLabs or Resemble AI
  2. Configure Options: Set voice name, language, and quality settings
  3. Initiate Cloning: Start the voice training process
  4. Monitor Progress: Track cloning status in real-time
Example Voice Creation
// voice_sample_id refers to a sample uploaded in Step 1
const response = await fetch('/assistants/123/cloned-voices/create', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'Authorization': 'Bearer YOUR_API_KEY'
  },
  body: JSON.stringify({
    voice_sample_id: 456,
    provider: 'elevenlabs',
    name: 'Custom Professional Voice',
    language: 'en',
    enhance_quality: true
  })
});

if (!response.ok) {
  throw new Error(`Voice creation failed with HTTP ${response.status}`);
}
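
The "Monitor Progress" step could be handled by polling until the voice is ready. This page does not document a status endpoint or a response shape, so the sketch below rests on assumptions: `listVoices` is any function you supply that returns an array of voice objects, and the `id` and `status` fields with the values `'ready'` and `'failed'` are hypothetical.

```javascript
// Hypothetical: assumes each voice object has `id` and `status` fields and that
// `status` eventually becomes 'ready' or 'failed'. The docs do not confirm this.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function waitForClonedVoice(listVoices, voiceId, options = {}) {
  const { intervalMs = 2000, maxAttempts = 30 } = options;

  for (let attempt = 1; attempt <= maxAttempts; attempt += 1) {
    const voices = await listVoices();
    const voice = voices.find((v) => v.id === voiceId);

    if (voice && voice.status === 'ready') return voice;
    if (voice && voice.status === 'failed') {
      throw new Error(`Cloning failed for voice ${voiceId}`);
    }
    // Wait before the next poll, but not after the final attempt
    if (attempt < maxAttempts) await sleep(intervalMs);
  }
  throw new Error(`Voice ${voiceId} not ready after ${maxAttempts} attempts`);
}
```

Capping the number of attempts keeps a stalled cloning job from blocking the caller indefinitely. A longer interval may suit providers that take minutes rather than seconds.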

Step 3: Use Cloned Voice

Once processing is complete, assign the voice to your assistant:
  1. Voice Selection: Choose from your cloned voices
  2. Testing: Preview the voice with sample text
  3. Assignment: Set as the assistant’s default voice
  4. Go Live: Start using the voice in live calls

Voice Management

Voice Library

Voice Categories:
  • Brand Voices: Official company voices
  • Character Voices: Specific personas or characters
  • Language Variants: Same voice in different languages
  • Seasonal/Campaign: Temporary or promotional voices
Tagging System:
  • Use consistent tags for easy filtering
  • Include language, gender, style descriptors
  • Add use case tags (customer service, sales, etc.)
Usage Tracking:
  • Synthesis Count: Number of times voice was used
  • Duration Metrics: Total audio generated
  • Cost Tracking: Provider usage and billing
  • Performance: Quality scores and user feedback
Optimization Insights:
  • Most/least used voices
  • Cost per synthesis by provider
  • Quality trends over time
  • User preference patterns
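
If usage data is exported, the most and least used voices and the cost per synthesis by provider can be derived with a small aggregation. This is only a sketch: the record shape (`voiceName`, `provider`, `syntheses`, `costUsd`) is hypothetical and does not reflect a documented export format.

```javascript
// Hypothetical record shape: { voiceName, provider, syntheses, costUsd }.
function summarizeVoiceUsage(records) {
  // Sort a copy so the caller's array is left untouched
  const byUsage = [...records].sort((a, b) => b.syntheses - a.syntheses);

  // Accumulate totals per provider to derive cost per synthesis
  const totals = {};
  for (const { provider, syntheses, costUsd } of records) {
    totals[provider] = totals[provider] || { syntheses: 0, costUsd: 0 };
    totals[provider].syntheses += syntheses;
    totals[provider].costUsd += costUsd;
  }

  const costPerSynthesis = {};
  for (const [provider, total] of Object.entries(totals)) {
    costPerSynthesis[provider] = total.syntheses > 0 ? total.costUsd / total.syntheses : 0;
  }

  return {
    mostUsed: byUsage.length ? byUsage[0].voiceName : null,
    leastUsed: byUsage.length ? byUsage[byUsage.length - 1].voiceName : null,
    costPerSynthesis,
  };
}
```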

Voice Testing

Test your cloned voices before deployment:
  1. Text-to-Speech Preview: Enter sample text to hear the voice
  2. Quality Assessment: Evaluate clarity, naturalness, and accuracy
  3. Comparison Testing: Compare with original samples and other voices
  4. A/B Testing: Test different voices with real users
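
For A/B testing, each caller should hear the same voice on every call so that feedback is comparable. One way to do this is deterministic bucketing on a stable user identifier. The sketch below is an illustration only: the function names, user IDs, and voice names are placeholders, and the hash is an FNV-1a style hash over UTF-16 code units rather than anything the platform provides.

```javascript
// Deterministic A/B bucketing: the same user ID always maps to the same voice.
// Uses a 32-bit FNV-1a style hash over UTF-16 code units.
function hashString(text) {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < text.length; i += 1) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193); // multiply by the FNV prime in 32-bit space
  }
  return hash >>> 0; // force an unsigned 32-bit result
}

function assignVoiceVariant(userId, variants) {
  if (variants.length === 0) throw new Error('At least one variant is required');
  return variants[hashString(String(userId)) % variants.length];
}

// Usage: placeholder voice names
const variant = assignVoiceVariant('user-42', ['Brand Voice A', 'Brand Voice B']);
```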

API Integration

Upload Voice Sample

curl -X POST "https://api.burki.dev/assistants/123/voice-samples/upload" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F "file=@voice_sample.wav" \
  -F "name=Professional Voice" \
  -F "description=Clear professional speaking voice"

Create Cloned Voice

curl -X POST "https://api.burki.dev/assistants/123/cloned-voices/create" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "voice_sample_id": 456,
    "provider": "elevenlabs",
    "name": "Custom Professional Voice",
    "language": "en",
    "enhance_quality": true
  }'

List Cloned Voices

curl -X GET "https://api.burki.dev/assistants/123/cloned-voices" \
  -H "Authorization: Bearer YOUR_API_KEY"
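
The same list request can be made from JavaScript. This is a sketch under stated assumptions: the base URL and path come from the curl example above, the response is assumed to be JSON, and `listClonedVoices` with its injectable `fetchImpl` parameter is a hypothetical helper, not part of an official SDK.

```javascript
// Base URL and path are taken from the curl examples above.
// The response is assumed to be JSON; its exact shape is not documented here.
const BURKI_API_BASE = 'https://api.burki.dev';

async function listClonedVoices(assistantId, apiKey, fetchImpl = fetch) {
  const url = `${BURKI_API_BASE}/assistants/${encodeURIComponent(assistantId)}/cloned-voices`;
  const response = await fetchImpl(url, {
    method: 'GET',
    headers: { Authorization: `Bearer ${apiKey}` },
  });
  if (!response.ok) {
    throw new Error(`Listing cloned voices failed with HTTP ${response.status}`);
  }
  return response.json();
}
```

Passing `fetchImpl` explicitly makes the helper easy to exercise in tests without network access. In normal use the global `fetch` is used.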

Best Practices

Recording Quality

Equipment Recommendations:
  • Microphone: USB condenser microphone (Audio-Technica AT2020USB+)
  • Environment: Quiet room with minimal echo
  • Software: Audacity, GarageBand, or professional DAW
  • Monitoring: Use headphones to monitor audio quality
Recording Techniques:
  1. Consistent Distance: Stay 6-12 inches from the microphone
  2. Proper Levels: Keep audio peaks between -12 dB and -6 dB
  3. Room Treatment: Use blankets or acoustic foam to reduce echo
  4. Multiple Takes: Record several versions and choose the best
Ideal Voice Sample Content:
  • Varied Sentences: Different sentence structures and lengths
  • Emotional Range: Include slight variations in tone
  • Natural Speech: Conversational delivery rather than a read-aloud tone
  • Complete Thoughts: Full sentences with natural pauses
What to Avoid:
  • Background noise or music
  • Multiple speakers
  • Heavy accents (unless desired)
  • Monotone or robotic delivery
  • Incomplete sentences or stuttering
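
The peak-level guideline under Recording Techniques (-12 dB to -6 dB) can be checked programmatically once the audio is decoded. The sketch below assumes the guideline refers to dBFS (decibels relative to full scale) and that samples are floats normalized to the range [-1, 1], as in a Web Audio `Float32Array`. Both function names are hypothetical.

```javascript
// Peak level in dBFS for samples normalized to the range [-1, 1].
function peakDbfs(samples) {
  let peak = 0;
  for (const sample of samples) {
    peak = Math.max(peak, Math.abs(sample));
  }
  // Digital silence has no finite level
  return peak === 0 ? -Infinity : 20 * Math.log10(peak);
}

// Assumes the -12 dB to -6 dB guideline is expressed in dBFS.
function isPeakInRecommendedRange(samples, minDb = -12, maxDb = -6) {
  const db = peakDbfs(samples);
  return db >= minDb && db <= maxDb;
}
```

A peak of 0.5 corresponds to roughly -6 dBFS, so a recording peaking between about 0.25 and 0.5 of full scale falls inside the recommended window.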

Voice Management

  1. Naming Convention: Use descriptive, consistent names
  2. Version Control: Keep track of voice iterations and improvements
  3. Usage Documentation: Document which voices work best for different scenarios
  4. Regular Testing: Periodically test voice quality and user satisfaction
  5. Cost Monitoring: Track usage and costs across different providers

Security and Privacy

Privacy Considerations: Voice cloning involves processing personal audio data. Ensure you have proper consent and follow privacy regulations when using voice samples.
  • Consent: Always obtain explicit consent before using someone’s voice
  • Data Protection: Store voice samples securely and follow GDPR/CCPA requirements
  • Access Control: Limit who can create and manage cloned voices
  • Audit Trail: Keep logs of voice creation and usage
  • Retention Policy: Define how long voice samples and models are stored

Troubleshooting

Common Issues

File Upload Fails:
  • Check that the file format is supported (MP3, WAV, FLAC, M4A/AAC, OGG)
  • Ensure the file size is under 50 MB
  • Verify that the audio duration is between 10 seconds and 10 minutes
  • Check internet connection stability
Audio Quality Issues:
  • Use a higher sample rate (22 kHz+) and bit rate (128 kbps+)
  • Remove background noise using audio editing software
  • Re-record in a quieter environment
  • Check microphone positioning and levels
Cloning Process Fails:
  • Verify provider API credentials are valid
  • Check account balance with voice cloning provider
  • Ensure voice sample meets provider requirements
  • Contact provider support for specific error messages
Poor Voice Quality:
  • Use higher quality source audio
  • Try a different provider (ElevenLabs vs. Resemble AI)
  • Experiment with quality enhancement settings
  • Consider recording new samples with better equipment
Slow Processing:
  • Provider processing times vary (ElevenLabs: seconds, Resemble AI: minutes)
  • Check provider status pages for service issues
  • Large files take longer to process
  • Peak usage times may cause delays
High Costs:
  • Monitor usage through analytics dashboard
  • Set usage limits and alerts
  • Compare provider pricing for your use case
  • Optimize voice selection for cost efficiency

Provider Comparison

Feature             | ElevenLabs    | Resemble AI   | Coming Soon
Processing Time     | Seconds       | Minutes       | Varies
Quality             | Excellent     | Excellent     | TBD
Languages           | 29+           | English+      | TBD
Cost Model          | Per character | Per synthesis | TBD
Sample Requirements | 30s+          | 60s+          | TBD
Instant Preview     | ✅            | ❌            | TBD
Emotional Control   | Basic         | Advanced      | TBD
Enterprise Features | Limited       | Full          | TBD
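
The comparison can also be encoded as data to shortlist providers for a given sample. This is an illustrative sketch only: the values are transcribed from the table above, while the field names, the `findProviders` helper, and the `'resemble'` identifier are assumptions (only `'elevenlabs'` appears as a provider value elsewhere on this page).

```javascript
// Values transcribed from the comparison table; field names are made up for this sketch.
// 'resemble' is a placeholder identifier, not a confirmed API value.
const PROVIDERS = [
  { id: 'elevenlabs', processing: 'seconds', minSampleSeconds: 30, instantPreview: true, emotionalControl: 'basic' },
  { id: 'resemble', processing: 'minutes', minSampleSeconds: 60, instantPreview: false, emotionalControl: 'advanced' },
];

function findProviders({ sampleSeconds, needsInstantPreview = false, needsAdvancedEmotion = false }) {
  return PROVIDERS
    .filter((p) => sampleSeconds >= p.minSampleSeconds)
    .filter((p) => !needsInstantPreview || p.instantPreview)
    .filter((p) => !needsAdvancedEmotion || p.emotionalControl === 'advanced')
    .map((p) => p.id);
}
```

For example, a 45-second sample would only match the provider whose listed sample requirement is 30s+.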

Use Cases

Customer Service

  • Consistent Brand Voice: Maintain brand identity across all interactions
  • Multilingual Support: Create voices in different languages for global support
  • Personality Matching: Match voice characteristics to brand personality

Sales and Marketing

  • Campaign Voices: Create specific voices for marketing campaigns
  • Regional Variants: Adapt voices for different geographical markets
  • Seasonal Adjustments: Modify voice characteristics for holidays or events

Entertainment and Media

  • Character Voices: Create unique voices for virtual characters
  • Narrator Voices: Professional voices for content narration
  • Interactive Experiences: Engaging voices for games and interactive media

Enterprise Applications

  • Executive Voices: Clone executive voices for consistent communication
  • Training Systems: Consistent voices for e-learning and training
  • Brand Ambassadors: Virtual representatives with authentic brand voices

Getting Help