What is an Assistant?
An Assistant is a virtual agent that can answer calls, talk to users, and perform tasks using AI. Each assistant has its own configuration for language, voice, hearing, and integrations.- You can create multiple assistants for different use cases (e.g., sales, support, info lines).
- Each assistant can have its own settings, phone number, and knowledge base.
Multi-Tenancy & Modularity
- Multi-Tenancy: Manage multiple organizations and assistants from a single platform. Great for agencies, teams, or businesses with many brands.
- Modularity: Swap out any AI provider (LLM, TTS, STT) at any time. Mix and match to get the best results for your needs.
Why Modularity Matters
Why Modularity Matters
- Try new AI models as they launch—no lock-in!
- Use fallback providers for reliability
- Optimize for speed, cost, or quality as needed
How a Call Flows Through the System
- STT: Converts the caller’s speech to text (e.g., Deepgram)
- LLM: Understands and generates a response (e.g., OpenAI, Claude, Gemini, etc.)
- TTS: Converts the AI’s response back to speech (e.g., ElevenLabs, Deepgram, Inworld)
- Fallbacks: If the main LLM fails, the system automatically tries your backup providers
Configuration Sections Overview
- Basic Info: Name, description, and branding
- AI Configuration: Pick your LLM, TTS, and STT providers and models
- RAG (Knowledge Base): Add custom knowledge for smarter answers
- Call Management: Control call flow, interruptions, and recordings
- Integrations: Connect Twilio, Deepgram, and more
- Advanced: Fine-tune settings for power users
Architecture
Burki Voice AI is designed with modularity, scalability, and extensibility in mind. This page provides a high-level overview of the system architecture and codebase structure.System Overview
Folder Structure
Component Interaction
- Web Dashboard: Allows users to manage assistants, view analytics, and configure settings via a modern UI.
- API Layer: Handles all RESTful requests, authentication, and business logic.
- Telephony (Twilio): Receives and manages phone calls, streams audio to the backend.
- Speech-to-Text (Deepgram): Transcribes incoming audio in real time.
- LLM (OpenAI, etc.): Generates intelligent responses based on transcribed text.
- Text-to-Speech (ElevenLabs, etc.): Converts responses to natural-sounding speech.
- Audio Denoising: Cleans up audio streams for better recognition and user experience.
- Database: Stores user, assistant, call, and analytics data.
- S3 Storage: Stores call recordings and large audio files.