The problem

Most agent frameworks go quiet while working. The user sends a request, the agent runs tools for 30 seconds, then responds. During a voice call, this creates awkward silence. Real human assistants don’t work this way. They acknowledge your request, keep the conversation going, provide progress updates, and weave results in naturally when they arrive.

Two brains, one conversation

Unity splits voice interactions into two processes:

  - Slow brain — the ConversationManager. Sees the full picture: all conversations across channels, notifications, in-flight actions, memory. Makes deliberate decisions about what to do. Runs in the main process.
  - Fast brain — a real-time voice agent on LiveKit, running as a separate subprocess. Sub-second latency. Handles the conversation autonomously — listening, responding, managing turn-taking.

They communicate over IPC. When the slow brain finishes a task or wants to guide the conversation, it sends the fast brain a notification with one of three modes:
Mode     Behavior
SPEAK    “Say exactly this” — bypasses the fast brain’s LLM entirely, directly synthesizing speech
NOTIFY   “Here’s some context, decide what to do with it” — the fast brain’s LLM decides how to weave it in
BLOCK    Nothing — the fast brain continues on its own
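The three modes reduce to a small dispatch on the fast brain's side. This is a minimal sketch of that shape — the enum, message type, and `handle_notification` helper are hypothetical names, not Unity's actual API:

```python
from dataclasses import dataclass
from enum import Enum, auto


class NotifyMode(Enum):
    """Hypothetical names for the three notification modes."""
    SPEAK = auto()   # synthesize this text verbatim, skipping the fast brain's LLM
    NOTIFY = auto()  # hand the text to the fast brain's LLM as extra context
    BLOCK = auto()   # no-op: the fast brain continues on its own


@dataclass
class BrainNotification:
    mode: NotifyMode
    text: str = ""


def handle_notification(note: BrainNotification, tts, llm_context: list) -> None:
    """Sketch of how the fast brain might dispatch on mode."""
    if note.mode is NotifyMode.SPEAK:
        tts.say(note.text)             # spoken verbatim, no LLM in the loop
    elif note.mode is NotifyMode.NOTIFY:
        llm_context.append(note.text)  # the LLM decides how to weave it in
    # BLOCK: deliberately do nothing
```

The key design point is that SPEAK and NOTIFY differ only in whether the fast brain's LLM sits between the slow brain's text and the synthesized audio.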

How it plays out

  1. User says: “Can you research flights to Tokyo for next week?”
  2. Fast brain immediately responds: “Sure, let me look into that for you.”
  3. Slow brain starts actor.act("Research flights to Tokyo...") → returns a steerable handle
  4. While the Actor runs (querying web, comparing prices), the fast brain continues the conversation normally
  5. Slow brain sends NOTIFY: “Found 3 direct flights, cheapest is ¥85,000 on ANA”
  6. Fast brain weaves it in naturally: “I’ve found a few options — the best deal looks like an ANA direct flight for about 85,000 yen.”
The user never waits in silence. The conversation is alive throughout.
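The timeline above can be modeled with a background worker and a notification queue. This is a toy simulation of the control flow, not Unity's implementation — the real slow brain calls `actor.act(...)` and sends notifications over IPC, which a thread and a `queue.Queue` stand in for here:

```python
import queue
import threading
import time


def research_flights() -> str:
    """Stand-in for the Actor's long-running work (web queries, price comparison)."""
    time.sleep(0.05)
    return "3 direct flights, cheapest ¥85,000 on ANA"


def slow_brain(notifications: queue.Queue) -> threading.Thread:
    """Kick off the action in the background; NOTIFY the fast brain when it finishes."""
    def work():
        result = research_flights()
        notifications.put(("NOTIFY", f"Found {result}"))
    worker = threading.Thread(target=work)
    worker.start()
    return worker


# Fast brain's side: acknowledge at once, keep talking, weave the result in later.
notifications: queue.Queue = queue.Queue()
transcript = ["Sure, let me look into that for you."]  # step 2: instant acknowledgement
worker = slow_brain(notifications)                     # step 3: action starts
transcript.append("(conversation continues...)")       # step 4: no silence
mode, text = notifications.get()                       # step 5: NOTIFY arrives
transcript.append(text)                                # step 6: woven into speech
worker.join()
```

The point of the shape: the acknowledgement and the ongoing conversation never block on the action's result.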

Speech urgency

Not all slow brain outputs are equal. A progress update on a background task can wait for a natural pause in conversation. But if the user asks “what did you find?” and the results just came in, the fast brain needs to respond immediately. A speech urgency evaluator runs on each slow brain notification. It can preempt the current fast brain turn if the notification is urgent enough — for example, if the user just asked a question that the notification directly answers.
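As a rough illustration of the preemption decision, here is a toy lexical-overlap heuristic — the real evaluator is presumably model-based, and `should_preempt` with its threshold is an invented example, not Unity's evaluator:

```python
def should_preempt(notification: str,
                   last_user_utterance: str,
                   threshold: float = 0.5) -> bool:
    """Toy urgency check: preempt the current fast brain turn when the
    user's latest question shares enough content words with the
    incoming notification (i.e. the notification answers the question)."""
    stopwords = {"the", "a", "an", "did", "you", "what", "is", "are", "to", "for"}
    question = {w.strip("?.,!").lower() for w in last_user_utterance.split()} - stopwords
    note = {w.strip("?.,!").lower() for w in notification.split()} - stopwords
    if not question:
        return False  # nothing pending from the user; wait for a natural pause
    overlap = len(question & note) / len(question)
    return overlap >= threshold
```

So a notification about flight results preempts right after “what did you find?”, while an unrelated progress update waits its turn.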

Concurrent actions during voice

The ConversationManager tracks all running actions:
┌─ In-Flight Actions ─────────────────────────────────┐
│                                                      │
│  [0] research_flights  ██████████░░░  In progress    │
│      → ask, interject, stop, pause                   │
│                                                      │
│  [1] draft_summary     ████████████░  In progress    │
│      → ask, interject, stop, pause                   │
│                                                      │
└──────────────────────────────────────────────────────┘
Each action gets its own dynamically generated steering tools. During a voice call, the user can say “how’s the flight search going?” or “stop the summary, I’ll do that myself” — and only the targeted action is affected. The slow brain routes the voice input to the correct action’s handle.
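A minimal sketch of the per-action tool generation, assuming each handle exposes the ask/interject/stop/pause operations shown in the panel above (the `make_steering_tools` factory and the index-suffixed tool names are illustrative, not Unity's actual naming):

```python
def make_steering_tools(index: int, handle) -> dict:
    """Generate steering tools for one in-flight action. Names are
    namespaced by the action's index so that 'stop the summary' resolves
    to exactly one handle and leaves the others untouched."""
    return {
        f"ask_{index}": handle.ask,              # "how's the flight search going?"
        f"interject_{index}": handle.interject,  # inject guidance mid-run
        f"stop_{index}": handle.stop,            # "stop the summary"
        f"pause_{index}": handle.pause,
    }
```

Merging the per-action dicts gives the slow brain one flat tool namespace in which every tool targets exactly one action.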

Where to start reading

File                                                      What’s there
unity/conversation_manager/conversation_manager.py        Dual-brain orchestration, in-flight actions
unity/conversation_manager/domains/brain.py               Slow brain decision loop
unity/conversation_manager/domains/brain_action_tools.py  How the brain starts, steers, and tracks concurrent work
unity/conversation_manager/medium_scripts/call.py         Fast brain (voice agent) implementation
unity/conversation_manager/prompt_builders.py             Voice agent prompt construction