The problem with JSON tool menus
Standard agent frameworks give the LLM a list of JSON-schema tools. Each turn, the model picks one (or a few), calls them, reads the results, and picks the next. For simple tasks this works. For complex tasks that require composing several operations (look up contacts, query knowledge, send communications, schedule follow-ups), it creates problems:
- Combinatorial explosion: 5 sequential tool calls mean 5 round-trips, and the model re-reads everything on each one
- No variables: results from step 1 can’t be referenced in step 3 except by re-stating them in natural language
- No control flow: loops, conditionals, and error handling require the model to simulate them turn-by-turn
- Context bloat: each tool call and result gets appended to the conversation, consuming the context window
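To make the round-trip and context-bloat points concrete, here is a toy model of a JSON tool-menu loop. It assumes exactly one tool call per LLM turn, counts messages rather than tokens, and uses made-up tool names; the ToolMenuLoop class is hypothetical and not taken from any real framework.

```python
from dataclasses import dataclass, field


@dataclass
class ToolMenuLoop:
    """Hypothetical model of a tool-menu agent: one tool call per LLM turn."""

    messages: list[str] = field(default_factory=lambda: ["user: request"])
    messages_read: int = 0  # total messages the model re-reads across all turns
    llm_calls: int = 0

    def llm_turn(self) -> None:
        # Every LLM call re-reads the entire conversation so far.
        self.llm_calls += 1
        self.messages_read += len(self.messages)

    def run(self, tool_names: list[str]) -> None:
        for name in tool_names:
            self.llm_turn()                          # model picks the next tool
            self.messages.append(f"call: {name}")    # the call is appended to context
            self.messages.append(f"result: {name}")  # and so is its result
        self.llm_turn()  # final turn to compose the answer


loop = ToolMenuLoop()
loop.run(["contacts", "knowledge", "email", "sms", "tasks"])
print(loop.llm_calls, len(loop.messages), loop.messages_read)  # 6 11 36
```

Under these assumptions, n tool calls cost n + 1 LLM invocations, and the total reading work is (n + 1)^2 messages, so it grows quadratically with the number of steps.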
CodeAct: programs as plans
Unity’s Actor doesn’t pick from a tool menu. It writes a Python program that runs in a sandbox with the primitives.* API available: the same typed interfaces the rest of the system uses. One program per turn, with variables, loops, and real control flow.
The term “CodeAct” comes from the ICML 2024 paper “Executable Code Actions Elicit Better LLM Agents”. Unity’s implementation extends the concept with steerable handles: each await primitives.X.method(...) call returns a SteerableToolHandle that can be steered from the outer loop.
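A minimal sketch of the CodeAct idea: the model emits one program as text, and a runner executes it with a primitives namespace in scope. FakeContacts, FakeTasks, the plan() entry point, and run_plan are hypothetical stand-ins, not Unity's API. The stubs return plain values rather than SteerableToolHandles, and exec over a plain dict is not a real security sandbox.

```python
import asyncio
import textwrap
from types import SimpleNamespace


class FakeContacts:
    """Hypothetical stub for primitives.contacts; a real one would run an LLM tool loop."""

    async def ask(self, question: str) -> list[str]:
        return ["Alice", "Bob"]  # canned answer regardless of the question


class FakeTasks:
    """Hypothetical stub for primitives.tasks that records what it was asked to create."""

    def __init__(self) -> None:
        self.created: list[str] = []

    async def update(self, instruction: str) -> str:
        self.created.append(instruction)
        return "ok"


primitives = SimpleNamespace(contacts=FakeContacts(), tasks=FakeTasks())

# What a model might emit in a single turn: a variable, a loop, and a conditional.
PLAN = textwrap.dedent('''
    async def plan():
        people = await primitives.contacts.ask("Who attended the Q3 review?")
        for name in people:
            if name != "Bob":
                await primitives.tasks.update(f"Create a follow-up task for {name}")
        return len(people)
''')


def run_plan(source: str) -> object:
    # Globals start with only `primitives` (Python adds __builtins__).
    # exec() is NOT a security boundary; a real sandbox needs proper isolation.
    scope = {"primitives": primitives}
    exec(source, scope)  # defines plan() inside scope
    return asyncio.run(scope["plan"]())


print(run_plan(PLAN))            # 2
print(primitives.tasks.created)  # ['Create a follow-up task for Alice']
```

The whole plan is produced in one turn, and the intermediate result (people) lives in a variable instead of being restated in the conversation.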
The primitives API
The Actor’s sandbox exposes a primitives namespace that maps to real manager APIs:
| Namespace | Maps to | Examples |
|---|---|---|
| primitives.contacts | ContactManager | .ask("Who is Alice?"), .update("Add Alice's phone: ...") |
| primitives.knowledge | KnowledgeManager | .ask("What's the Q3 revenue?"), .update("Record that ...") |
| primitives.tasks | TaskScheduler | .ask("What's pending?"), .update("Create a task to ..."), .execute("Run the weekly report") |
| primitives.transcripts | TranscriptManager | .ask("What did we discuss yesterday?") |
| primitives.web | WebSearcher | .ask("What's the weather in London?") |
| primitives.files | FileManager | .ask("Summarize the attached PDF"), .parse("Extract tables from ...") |
| primitives.secrets | SecretManager | .ask("Do I have a Gmail token?"), .update("Store this API key") |
| primitives.data | DataManager | Low-level filter, search, reduce, join operations |
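One way to picture this table is a registry that copies a narrow, allow-listed set of verbs from each manager into its namespace. The SURFACE dict, StubManager, and build_primitives below are hypothetical and cover only a subset of the namespaces; how Unity actually assembles the surface lives in the files listed under "Where to start reading" and is not reproduced here.

```python
import asyncio
from types import SimpleNamespace

# Hypothetical allow-list: which public verbs each namespace exposes (subset of the table).
SURFACE: dict[str, tuple[str, ...]] = {
    "contacts": ("ask", "update"),
    "knowledge": ("ask", "update"),
    "tasks": ("ask", "update", "execute"),
    "transcripts": ("ask",),
    "web": ("ask",),
}


class StubManager:
    """Hypothetical manager stub: each verb echoes the call instead of running an LLM loop."""

    def __init__(self, name: str) -> None:
        self.name = name

    async def ask(self, text: str) -> str:
        return f"{self.name}.ask({text!r})"

    async def update(self, text: str) -> str:
        return f"{self.name}.update({text!r})"

    async def execute(self, text: str) -> str:
        return f"{self.name}.execute({text!r})"

    def reset_database(self) -> None:
        """Internal method; build_primitives does not copy it into any namespace."""


def build_primitives(managers: dict[str, StubManager]) -> SimpleNamespace:
    # One sub-namespace per manager, holding only the allow-listed bound methods.
    return SimpleNamespace(**{
        ns: SimpleNamespace(**{verb: getattr(managers[ns], verb) for verb in verbs})
        for ns, verbs in SURFACE.items()
    })


primitives = build_primitives({ns: StubManager(ns) for ns in SURFACE})
print(hasattr(primitives.tasks, "execute"))            # True
print(hasattr(primitives.web, "update"))               # False
print(hasattr(primitives.contacts, "reset_database"))  # False
print(asyncio.run(primitives.web.ask("weather in London")))  # web.ask('weather in London')
```

This only shapes the API the generated code sees; bound methods still expose their manager through __self__, so it is a convenience for the model, not an isolation mechanism.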
Returning handles for steering
When the Actor’s code calls a primitive, the result is a SteerableToolHandle. The Actor can either:
- Await the result for immediate use in the next line of code
- Return the handle as the last expression, handing steering control back to the ConversationManager
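The two options can be sketched with asyncio. ToyHandle stands in for SteerableToolHandle, and its interject() and result() methods, the ask() primitive, and outer_loop() are hypothetical illustrations, not Unity's actual interface. The sketch only shows the control flow: awaiting a handle consumes its result inside the program, while returning it leaves the work running so an outer loop can steer it.

```python
import asyncio
from typing import Any, Awaitable, Callable


class ToyHandle:
    """Hypothetical stand-in for a SteerableToolHandle: a running task plus a steering inbox."""

    def __init__(self, work: Callable[["ToyHandle"], Awaitable[str]]) -> None:
        self.steering: list[str] = []
        # Schedule the work right away on the current event loop.
        self._task = asyncio.create_task(work(self))

    def interject(self, message: str) -> None:
        # Steering from the outer loop; the work reads it when it next checks the inbox.
        self.steering.append(message)

    async def result(self) -> str:
        return await self._task


CANNED = {"Who is Alice?": "Alice Smith"}  # fake knowledge for the stub primitive


async def _answer(handle: ToyHandle, question: str) -> str:
    await asyncio.sleep(0.01)  # simulate a longer inner tool loop
    answer = CANNED.get(question, f"done: {question}")
    if handle.steering:
        answer += f" [steered: {handle.steering[-1]}]"
    return answer


def ask(question: str) -> ToyHandle:
    """Hypothetical primitive: returns a handle immediately instead of a result."""
    return ToyHandle(lambda h: _answer(h, question))


async def actor_program() -> Any:
    # Option 1: await the result and use it on the next line.
    name = await ask("Who is Alice?").result()
    # Option 2: return the handle as the last expression, leaving the work in flight.
    return ask(f"Draft an email to {name}")


async def outer_loop() -> Any:
    outcome = await actor_program()
    if isinstance(outcome, ToyHandle):
        outcome.interject("keep it short")  # steer the in-flight work
        return await outcome.result()
    return outcome


print(asyncio.run(outer_loop()))  # done: Draft an email to Alice Smith [steered: keep it short]
```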
How it compares
vs. HermesAgent’s PTC (Programmatic Tool Calling)
HermesAgent has execute_code, which runs a Python script that calls tools via RPC. The goal is similar: batch many tool steps into one LLM turn to reduce context growth. But PTC is primarily an optimization: the script calls generic tools (file operations, shell commands), and only the script’s stdout goes into context.
CodeAct in Unity is architecturally different: the program calls typed domain primitives (primitives.contacts.ask, primitives.knowledge.update) that each spawn their own steerable LLM tool loop. It’s not batching generic tool calls — it’s composing domain-specific intelligence.
vs. LangGraph code nodes
LangGraph lets you write Python in graph nodes, but those nodes are statically defined at graph construction time. The LLM doesn’t write the graph; a human does. CodeAct lets the LLM write the program dynamically in response to each request.
Where to start reading
| File | What’s there |
|---|---|
| unity/actor/code_act_actor.py | The CodeActActor: plan generation, sandbox, execution |
| unity/actor/environments/state_managers.py | How primitives are wired into the sandbox |
| unity/function_manager/primitives/registry.py | How the typed Primitives API surface is assembled |
| unity/actor/base.py | Abstract Actor interface |
