The problem
Most agent frameworks go quiet while working. The user sends a request, the agent runs tools for 30 seconds, then responds. During a voice call, this creates awkward silence. Real human assistants don’t work this way: they acknowledge your request, keep the conversation going, provide progress updates, and weave results in naturally when they arrive.

Two brains, one conversation
Unity splits voice interactions into two processes:

- Slow brain — the ConversationManager. Sees the full picture: all conversations across channels, notifications, in-flight actions, memory. Makes deliberate decisions about what to do. Runs in the main process.
- Fast brain — a real-time voice agent on LiveKit, running as a separate subprocess. Sub-second latency. Handles the conversation autonomously — listening, responding, managing turn-taking.

They communicate over IPC. When the slow brain finishes a task or wants to guide the conversation, it sends the fast brain a notification with one of three modes:

| Mode | Behavior |
|---|---|
| SPEAK | “Say exactly this” — bypasses the fast brain’s LLM entirely, directly synthesizing speech |
| NOTIFY | “Here’s some context, decide what to do with it” — the fast brain’s LLM decides how to weave it in |
| BLOCK | Nothing — the fast brain continues on its own |
How it plays out
- User says: “Can you research flights to Tokyo for next week?”
- Fast brain immediately responds: “Sure, let me look into that for you.”
- Slow brain starts `actor.act("Research flights to Tokyo...")` → returns a steerable handle
- While the Actor runs (querying web, comparing prices), the fast brain continues the conversation normally
- Slow brain sends NOTIFY: “Found 3 direct flights, cheapest is ¥85,000 on ANA”
- Fast brain weaves it in naturally: “I’ve found a few options — the best deal looks like an ANA direct flight for about 85,000 yen.”
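The slow-brain side of this flow can be sketched with standard-library stand-ins. Here `act` is a hypothetical substitute for `actor.act` (a background thread playing the role of the steerable handle), and a plain `queue.Queue` stands in for the IPC channel to the fast brain; the real implementation lives in `conversation_manager.py` and `brain_action_tools.py`.

```python
import queue
import threading
from dataclasses import dataclass


@dataclass
class Notification:
    mode: str  # "SPEAK" | "NOTIFY" | "BLOCK"
    text: str


def act(task: str, on_result) -> threading.Thread:
    """Stand-in for actor.act(): run the task in the background and
    return a handle (here, just the thread) the slow brain can track."""
    def run():
        # ... querying web, comparing prices ...
        on_result("Found 3 direct flights, cheapest is ¥85,000 on ANA")
    t = threading.Thread(target=run)
    t.start()
    return t


# Stand-in for the IPC channel from slow brain to fast brain.
ipc_to_fast_brain: queue.Queue = queue.Queue()

# Slow brain: kick off the research and keep the steerable handle.
handle = act(
    "Research flights to Tokyo for next week",
    on_result=lambda summary: ipc_to_fast_brain.put(Notification("NOTIFY", summary)),
)

# The fast brain keeps the conversation going meanwhile; once the
# result lands, it receives the NOTIFY and weaves it into its next turn.
handle.join()
note = ipc_to_fast_brain.get()
```

Because the handle is returned immediately, the slow brain's decision loop stays free to steer the action or start others while the fast brain talks.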
Speech urgency
Not all slow brain outputs are equal. A progress update on a background task can wait for a natural pause in conversation. But if the user asks “what did you find?” and the results just came in, the fast brain needs to respond immediately. A speech urgency evaluator runs on each slow brain notification. It can preempt the current fast brain turn if the notification is urgent enough — for example, if the user just asked a question that the notification directly answers.

Concurrent actions during voice
The ConversationManager tracks all running actions.

Where to start reading
| File | What’s there |
|---|---|
| unity/conversation_manager/conversation_manager.py | Dual-brain orchestration, in-flight actions |
| unity/conversation_manager/domains/brain.py | Slow brain decision loop |
| unity/conversation_manager/domains/brain_action_tools.py | How the brain starts, steers, and tracks concurrent work |
| unity/conversation_manager/medium_scripts/call.py | Fast brain (voice agent) implementation |
| unity/conversation_manager/prompt_builders.py | Voice agent prompt construction |
