
The problem with JSON tool menus

Standard agent frameworks give the LLM a list of JSON-schema tools. Each turn, the model picks one (or a few), calls them, reads the results, and picks the next. For simple tasks this works. For complex tasks that require composing several operations — look up contacts, query knowledge, send communications, schedule follow-ups — it creates problems:
  • Combinatorial explosion: five sequential tool calls mean five round-trips, with the model re-reading the entire conversation each time
  • No variables: results from step 1 can’t be referenced in step 3 except by re-stating them in natural language
  • No control flow: loops, conditionals, and error handling require the model to simulate them turn-by-turn
  • Context bloat: each tool call and result gets appended to the conversation, consuming the context window
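The context-bloat point is easy to see in miniature. The sketch below models a turn-by-turn JSON tool loop; the tool names and message format are illustrative, not any particular framework's API:

```python
# Illustrative sketch of a turn-by-turn JSON tool loop showing how the
# conversation grows with every call (all tool names are made up).
import json

def run_tool(name: str, args: dict) -> str:
    # Stand-in for a real tool dispatcher.
    return f"result of {name}({json.dumps(args)})"

context = []  # the conversation the model re-reads each turn

for tool, args in [
    ("lookup_contacts", {"query": "Henderson project"}),
    ("query_knowledge", {"query": "last activity"}),
    ("send_email", {"to": "alice@example.com"}),
]:
    result = run_tool(tool, args)
    # Both the call and its result are appended verbatim, so the
    # model's input grows linearly with the number of steps.
    context.append({"role": "assistant", "tool": tool, "args": args})
    context.append({"role": "tool", "content": result})

print(len(context))  # 6 entries after 3 calls: two per round-trip
```

Three calls already produce six context entries, and the intermediate results only survive as text the model must restate.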

CodeAct: programs as plans

Unity’s Actor doesn’t pick from a tool menu. It writes Python:
contacts = await primitives.contacts.ask(
    "Who was involved in the Henderson project?"
)
for contact in contacts:
    history = await primitives.knowledge.ask(
        f"What was {contact} last working on?"
    )
    await primitives.contacts.update(
        f"Send {contact} a catch-up email referencing {history}"
    )
This runs in a sandboxed execution session with the full primitives.* API available — the same typed interfaces the rest of the system uses. One program per turn, with variables, loops, and real control flow. The term “CodeAct” comes from the ICML 2024 paper “Executable Code Actions Elicit Better LLM Agents”. Unity’s implementation extends the concept with steerable handles: each await primitives.X.method(...) call returns a SteerableToolHandle that can be steered from the outer loop.
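A minimal sketch of what an awaitable, steerable handle could look like, assuming an asyncio-based sandbox. The class body here is illustrative, not Unity's actual SteerableToolHandle:

```python
# Hypothetical sketch of a steerable, awaitable handle; the steer()
# method and message list are assumptions for illustration.
import asyncio

class SteerableToolHandle:
    def __init__(self, coro):
        self._task = asyncio.ensure_future(coro)
        self.messages = []  # steering messages injected from outside

    def steer(self, message: str) -> None:
        # The outer loop (e.g. a ConversationManager) can interject
        # while the underlying operation is still running.
        self.messages.append(message)

    def __await__(self):
        # Awaiting the handle yields the operation's final result.
        return self._task.__await__()

async def slow_operation():
    await asyncio.sleep(0)
    return "done"

async def main():
    handle = SteerableToolHandle(slow_operation())
    handle.steer("prioritize recent contacts")  # injected mid-flight
    return await handle

result = asyncio.run(main())
print(result)  # → done
```

The key property is that the handle is both awaitable (for the Actor's own code) and externally addressable (for the outer loop), which is what makes the two usage options below possible.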

The primitives API

The Actor’s sandbox exposes a primitives namespace that maps to real manager APIs:
  • primitives.contacts → ContactManager: .ask("Who is Alice?"), .update("Add Alice's phone: ...")
  • primitives.knowledge → KnowledgeManager: .ask("What's the Q3 revenue?"), .update("Record that ...")
  • primitives.tasks → TaskScheduler: .ask("What's pending?"), .update("Create a task to ..."), .execute("Run the weekly report")
  • primitives.transcripts → TranscriptManager: .ask("What did we discuss yesterday?")
  • primitives.web → WebSearcher: .ask("What's the weather in London?")
  • primitives.files → FileManager: .ask("Summarize the attached PDF"), .parse("Extract tables from ...")
  • primitives.secrets → SecretManager: .ask("Do I have a Gmail token?"), .update("Store this API key")
  • primitives.data → DataManager: low-level filter, search, reduce, join operations
Each of these is a natural-language interface: the argument is plain English, not a structured query, and the manager's internal LLM tool loop figures out how to execute it.
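One way such a natural-language primitive might be wired is shown below, with the manager's internal LLM tool loop faked by a dictionary lookup. All class names here are stand-ins, not Unity's real managers:

```python
# Illustrative sketch: a primitive that forwards plain-English requests
# to a manager's internal tool loop (faked here with a dict lookup).
import asyncio

class ContactManager:
    def __init__(self):
        self._records = {"Alice": "alice@example.com"}

    async def run_tool_loop(self, request: str) -> str:
        # In the real system an internal LLM decides which operations
        # to run; here we simulate it with a single name match.
        for name, email in self._records.items():
            if name in request:
                return email
        return "unknown"

class Primitive:
    """Exposes .ask(text) over a manager, no structured schema needed."""
    def __init__(self, manager):
        self._manager = manager

    async def ask(self, request: str) -> str:
        return await self._manager.run_tool_loop(request)

async def main():
    contacts = Primitive(ContactManager())
    return await contacts.ask("What's Alice's email?")

email = asyncio.run(main())
print(email)  # → alice@example.com
```

The design choice worth noting: because the interface is a string, the Actor's generated program never needs to know each manager's internal schema, only which namespace to address.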

Returning handles for steering

When the Actor’s code calls a primitive, the result is a SteerableToolHandle. The Actor can either:
  • Await the result for immediate use in the next line of code
  • Return the handle as the last expression, handing steering control back to the ConversationManager
# Option 1: Await (Actor uses the result immediately)
answer = await primitives.contacts.ask("What's Alice's email?")
await primitives.contacts.update(f"Send Alice at {answer} a reminder")

# Option 2: Return handle (user can steer a long-running operation)
return await primitives.tasks.execute("Generate the quarterly report")
Option 2 is preferred for long-running operations — it lets the user pause, interject, or redirect the task through the ConversationManager while it’s running.
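A rough sketch of why returning the handle matters, modeling the handle as a plain asyncio.Task. The pause/interject mechanics are assumed for illustration, not taken from Unity's code:

```python
# Illustrative sketch: because the Actor returned a handle rather than
# a finished value, the outer loop keeps control while the work runs.
import asyncio

async def long_running_task():
    await asyncio.sleep(0.05)
    return "report ready"

async def outer_loop():
    # Modeling the returned handle as an asyncio.Task.
    handle = asyncio.create_task(long_running_task())
    interjections = []
    while not handle.done():
        # Here a ConversationManager could process user input, pause,
        # or cancel the operation mid-flight.
        interjections.append("user can interject here")
        await asyncio.sleep(0.02)
    return await handle, len(interjections) > 0

result, steered = asyncio.run(outer_loop())
print(result)  # → report ready
```

Had the Actor awaited the operation itself (Option 1), control would only return after completion, with no window for steering.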

How it compares

vs. HermesAgent’s PTC (Programmatic Tool Calling)

HermesAgent has an execute_code tool that runs a Python script calling tools via RPC. The goal is similar — batch many tool steps into one LLM turn to reduce context growth. But PTC is primarily an optimization: the script calls generic tools (file operations, shell commands), and only the script’s stdout enters the context. CodeAct in Unity is architecturally different: the program calls typed domain primitives (primitives.contacts.ask, primitives.knowledge.update), each of which spawns its own steerable LLM tool loop. It’s not batching generic tool calls — it’s composing domain-specific intelligence.

vs. LangGraph code nodes

LangGraph lets you write Python in graph nodes, but those nodes are statically defined at graph construction time. The LLM doesn’t write the graph — a human does. CodeAct lets the LLM write the program dynamically in response to each request.

Where to start reading

  • unity/actor/code_act_actor.py — the CodeActActor: plan generation, sandbox, execution
  • unity/actor/environments/state_managers.py — how primitives are wired into the sandbox
  • unity/function_manager/primitives/registry.py — how the typed Primitives API surface is assembled
  • unity/actor/base.py — abstract Actor interface