Also known as: multi-agent orchestration, agent routing, supervisor agent
TL;DR
Agent orchestration is the routing layer that decides which agent or model handles each step. The dominant patterns are workflow orchestration (a deterministic graph of agents) and autonomous orchestration (a supervisor delegating to sub-agents).
Agent orchestration is the layer that decides which agent or model handles which step of a task. A single LLM in a single loop scales only so far — once tasks span domains (retrieval, code generation, image analysis, billing actions), you want different agents handling different parts. Orchestration is the meta-controller: routing, delegating, sequencing, joining results.
Two dominant patterns
The field has converged on two shapes, and most production systems run one wrapped around the other.
Workflow orchestration is a deterministic graph of agents. The developer writes the graph: classifier agent first, then if intent is “billing” route to the billing agent, if intent is “support” route to the RAG agent, then a faithfulness-check agent before responding. Frameworks like LangGraph, Temporal, and Inngest provide the scaffolding. Failures are localized; per-node latency is bounded; the trace is interpretable.
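The triage graph described above can be sketched in plain Python, without any framework. This is a minimal illustration, not how LangGraph, Temporal, or Inngest express it: every function name here (classify_intent, billing_agent, rag_agent, faithfulness_check, run_workflow) is a hypothetical stub, and the keyword classifier stands in for a real classifier agent.

```python
from typing import Callable, Dict

# Hypothetical stub agents; a real system would call a model or service here.
def classify_intent(message: str) -> str:
    """Toy classifier: returns 'billing' or 'support' from keywords."""
    text = message.lower()
    return "billing" if ("invoice" in text or "refund" in text) else "support"

def billing_agent(message: str) -> str:
    return f"[billing] handled: {message}"

def rag_agent(message: str) -> str:
    return f"[support] answer for: {message}"

def faithfulness_check(draft: str) -> bool:
    """Placeholder: a real check would compare the draft against its sources."""
    return len(draft) > 0

# The graph's edges are plain data written by the developer: intent -> agent.
ROUTES: Dict[str, Callable[[str], str]] = {
    "billing": billing_agent,
    "support": rag_agent,
}

def run_workflow(message: str) -> str:
    intent = classify_intent(message)   # node 1: classify
    draft = ROUTES[intent](message)     # node 2: route to a specialist
    if not faithfulness_check(draft):   # node 3: gate before responding
        return "Sorry, I could not produce a reliable answer."
    return draft
```

Because the edges are data, each node can be timed, tested, and traced in isolation, which is where the localized failures and interpretable traces come from.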
Autonomous orchestration puts a supervisor LLM at the top with sub-agents exposed as tools. The supervisor reads the user request, decides which sub-agent to call, observes the result, and decides the next call. The orchestration logic lives in the supervisor’s prompt and weights, not in code. AutoGen’s GroupChat is a reference implementation; Anthropic’s “Computer Use” runs the same model-driven loop, though with a single agent driving desktop tools rather than sub-agents.
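The decide, call, observe cycle can be sketched as a short loop. This is an assumption-laden illustration: the decide callback stands in for the supervisor LLM, and run_supervisor, scripted_decide, and the two lambda sub-agents are hypothetical names invented for this example.

```python
from typing import Callable, Dict, List, Optional, Tuple

SubAgent = Callable[[str], str]
# A decision is (sub_agent_name, input), or None when the supervisor is done.
Decision = Optional[Tuple[str, str]]
History = List[Tuple[str, str]]  # (sub-agent name, result) pairs

def run_supervisor(
    request: str,
    sub_agents: Dict[str, SubAgent],
    decide: Callable[[str, History], Decision],
    max_steps: int = 5,
) -> History:
    """Loop: decide, call the chosen sub-agent, observe, decide again."""
    history: History = []
    for _ in range(max_steps):          # hard cap so the loop always ends
        decision = decide(request, history)
        if decision is None:
            break
        name, agent_input = decision
        result = sub_agents[name](agent_input)
        history.append((name, result))
    return history

# Hypothetical stand-in for the supervisor LLM: search, summarize, stop.
def scripted_decide(request: str, history: History) -> Decision:
    if not history:
        return ("search", request)
    if len(history) == 1:
        return ("summarize", history[-1][1])
    return None

agents: Dict[str, SubAgent] = {
    "search": lambda q: f"results for {q}",
    "summarize": lambda text: f"summary of {text}",
}
trace = run_supervisor("agent orchestration", agents, scripted_decide)
```

Note that no edge between "search" and "summarize" exists in code; the sequence appears only in the trace, which is the sense in which the model builds the graph at runtime.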
Workflows are graphs you wrote. Autonomous orchestration is a graph the model builds at runtime. Most production systems use a workflow at the outer layer and autonomy at the leaves.
When to pick which
Pick workflow orchestration when the task space is enumerable. Customer support has a finite list of intents; a triage workflow that routes to specialized agents per intent is more reliable, cheaper, and easier to evaluate than a supervisor figuring it out per call.
Pick autonomous orchestration when the task space is open. A coding agent that may need to read files, run tests, search the web, edit, and re-run cannot be hardcoded into a graph — there are too many edge transitions. A supervisor LLM with a small tool palette wins.
The catalog of orchestration patterns (coordinator + specialists, sequential pipeline, debate, hierarchical decomposition, map-reduce over data) lives in multi-agent systems. The orchestration question this article answers is narrower: once you’ve picked a pattern, how does the routing actually work?
Decision logic for the supervisor
The supervisor’s job is identical to a single agent picking a tool; the only difference is that the “tools” are other agents. Each sub-agent gets registered with a name, a one-paragraph capability description, and an input schema. The supervisor selects via function calling, passes the input, awaits the result, and decides the next move.
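A registry along these lines might look like the sketch below. The tool-definition shape (name, description, input_schema as JSON Schema) is a generic assumption modeled on common function-calling formats, not any one vendor's exact API; SubAgentSpec, to_tool_definitions, and dispatch are hypothetical names, and the required-field check is a deliberately minimal stand-in for full schema validation.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

@dataclass(frozen=True)
class SubAgentSpec:
    name: str
    description: str                      # one-paragraph capability summary
    input_schema: Dict[str, Any]          # JSON Schema describing the input
    handler: Callable[[Dict[str, Any]], str]

def to_tool_definitions(specs: List[SubAgentSpec]) -> List[Dict[str, Any]]:
    """Render the registry as tool definitions the supervisor model can see."""
    return [
        {"name": s.name, "description": s.description, "input_schema": s.input_schema}
        for s in specs
    ]

def dispatch(specs: List[SubAgentSpec], call: Dict[str, Any]) -> str:
    """Run the sub-agent named in a call shaped like {'name': ..., 'input': {...}}."""
    by_name = {s.name: s for s in specs}
    spec = by_name[call["name"]]
    # Minimal check: only required keys, not full JSON Schema validation.
    missing = [k for k in spec.input_schema.get("required", []) if k not in call["input"]]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    return spec.handler(call["input"])

specs = [
    SubAgentSpec(
        name="billing",
        description="Looks up account balances. Input: an account_id string.",
        input_schema={
            "type": "object",
            "properties": {"account_id": {"type": "string"}},
            "required": ["account_id"],
        },
        handler=lambda args: f"balance for {args['account_id']}",
    )
]
tools = to_tool_definitions(specs)
reply = dispatch(specs, {"name": "billing", "input": {"account_id": "A1"}})
```

The handler is kept out of the rendered definitions on purpose: the supervisor only ever sees the name, description, and schema.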
Three levers improve routing accuracy. First, sub-agent descriptions: they are the supervisor’s only ground truth about what each sub-agent does, so treat them like docstrings that are specific, terse, and include input/output examples. Second, the supervisor’s system prompt: explicit guidance like “for any question about X, prefer agent Y” cuts misrouting more than expected. Third, a small specialized router: for high-traffic systems, a fine-tuned classifier in front of the supervisor is faster, cheaper, and more accurate than the supervisor LLM doing the routing itself. Route with a small model, then delegate to a small or large model per node.
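The router-in-front arrangement can be sketched as a confidence gate: the small classifier routes when it is sure, and the supervisor takes everything else. The names toy_classifier, supervisor_stub, and route, the agent labels, and the 0.8 threshold are all assumptions for illustration; a real router would be a fine-tuned model returning a calibrated score.

```python
from typing import Callable, Tuple

def toy_classifier(message: str) -> Tuple[str, float]:
    """Hypothetical stand-in for a fine-tuned router: (agent name, confidence)."""
    text = message.lower()
    if "refund" in text or "invoice" in text:
        return "billing_agent", 0.95
    if "error" in text or "crash" in text:
        return "support_agent", 0.90
    return "support_agent", 0.40  # low confidence: defer to the supervisor

def supervisor_stub(message: str) -> str:
    """Hypothetical stand-in for the supervisor LLM's routing decision."""
    return "general_agent"

def route(
    message: str,
    classify: Callable[[str], Tuple[str, float]],
    fallback: Callable[[str], str],
    threshold: float = 0.8,
) -> Tuple[str, str]:
    """Return (chosen agent, who decided)."""
    agent, confidence = classify(message)
    if confidence >= threshold:
        return agent, "router"               # cheap path for confident cases
    return fallback(message), "supervisor"   # expensive path for the rest
```

For example, route("Where is my refund?", toy_classifier, supervisor_stub) takes the cheap path, while an off-topic message falls through to the supervisor.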
A common production move: start with autonomous routing, log every supervisor decision and outcome, then promote the dominant pathways into hardcoded workflow edges. The supervisor handles the long tail; the workflow handles the head of the distribution.
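One way to find the pathways worth promoting is to aggregate the decision log. The sketch below assumes a hypothetical log format of (intent, chosen agent, succeeded) tuples and hypothetical thresholds (at least 20% of traffic, at least 90% success); both would need tuning against real data.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

LogEntry = Tuple[str, str, bool]  # (intent, chosen_agent, succeeded)

def pathways_to_promote(
    log: Iterable[LogEntry],
    min_share: float = 0.2,
    min_success: float = 0.9,
) -> Dict[str, str]:
    """Return intent -> agent edges that are both frequent and reliable."""
    entries = list(log)
    totals = Counter((intent, agent) for intent, agent, _ in entries)
    wins = Counter((intent, agent) for intent, agent, ok in entries if ok)
    promoted: Dict[str, str] = {}
    # most_common() visits the busiest pathways first.
    for (intent, agent), count in totals.most_common():
        share = count / len(entries)
        success = wins[(intent, agent)] / count
        if share >= min_share and success >= min_success:
            promoted.setdefault(intent, agent)  # keep the busiest agent per intent
    return promoted

example_log = (
    [("billing", "billing_agent", True)] * 6
    + [("support", "rag_agent", True)] * 2
    + [("support", "rag_agent", False)]
    + [("other", "search_agent", True)]
)
edges = pathways_to_promote(example_log)
```

In this made-up log only the billing pathway qualifies: support is frequent but fails too often, and the remaining intent is too rare, so both stay with the supervisor.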
What goes wrong
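Compounding errors across hops. Two hops at 95% routing accuracy give roughly 90% end-to-end (0.95 × 0.95 ≈ 0.90), and every extra hop multiplies in another factor. Keep delegation chains short and the supervisor’s branching factor low.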
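The arithmetic behind compounding can be made explicit. The sketch assumes every hop is routed correctly with the same probability and that hops fail independently, which is a simplification; end_to_end_accuracy is a hypothetical helper name.

```python
def end_to_end_accuracy(per_hop_accuracy: float, hops: int) -> float:
    """Probability that every hop routes correctly, assuming independent hops."""
    if not 0.0 <= per_hop_accuracy <= 1.0:
        raise ValueError("per_hop_accuracy must be between 0 and 1")
    if hops < 0:
        raise ValueError("hops must be non-negative")
    return per_hop_accuracy ** hops

two_hops = end_to_end_accuracy(0.95, 2)  # 0.95 * 0.95 = 0.9025
```

Under these assumptions, accuracy falls geometrically with depth, so removing a hop helps more than a small improvement to any single router.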
Context fragmentation. Sub-agents see only what the supervisor passes. Information lost in the handoff is gone. Either pass full context (expensive) or design typed handoff schemas.
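A typed handoff schema can be as small as a dataclass that names what must survive the handoff. The fields below (task, user_goal, constraints, relevant_facts) are hypothetical choices for illustration; the right set depends on what the sub-agents actually need.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List

@dataclass(frozen=True)
class Handoff:
    task: str                                   # what the sub-agent must do
    user_goal: str                              # why the user asked, in one line
    constraints: List[str] = field(default_factory=list)
    relevant_facts: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Fail loudly at the boundary instead of losing context silently.
        if not self.task.strip():
            raise ValueError("task must not be empty")
        if not self.user_goal.strip():
            raise ValueError("user_goal must not be empty")

def render_for_sub_agent(handoff: Handoff) -> str:
    """Serialize the handoff as JSON to place in the sub-agent's input."""
    return json.dumps(asdict(handoff), indent=2)

handoff = Handoff(
    task="Issue a refund for order 1042",
    user_goal="Customer was double-charged and wants the duplicate refunded",
    constraints=["Do not refund more than the duplicate charge"],
    relevant_facts=["Order 1042 was charged twice on the same day"],
)
payload = render_for_sub_agent(handoff)
```

The schema costs far fewer tokens than the full transcript, and a missing required field raises an error at construction time rather than surfacing later as a confused sub-agent.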
Loops between agents. A and B keep calling each other. Always cap inter-agent depth.
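A depth cap is a counter threaded through every delegation. In this sketch each agent is a hypothetical function returning (next agent or None, reply); delegate, DelegationDepthExceeded, and the default cap of 4 are assumptions for illustration.

```python
from typing import Callable, Dict, Optional, Tuple

# Each agent returns (name of the next agent or None, its reply).
Agent = Callable[[str], Tuple[Optional[str], str]]

class DelegationDepthExceeded(RuntimeError):
    pass

def delegate(
    agents: Dict[str, Agent],
    name: str,
    message: str,
    depth: int = 0,
    max_depth: int = 4,
) -> str:
    """Follow hand-offs between agents, refusing to go deeper than max_depth."""
    if depth >= max_depth:
        raise DelegationDepthExceeded(f"stopped at depth {depth} in agent {name!r}")
    next_name, reply = agents[name](message)
    if next_name is None:
        return reply
    return delegate(agents, next_name, reply, depth + 1, max_depth)

# Two hypothetical agents that keep handing work back to each other.
ping_pong: Dict[str, Agent] = {
    "A": lambda msg: ("B", msg + " -> A"),
    "B": lambda msg: ("A", msg + " -> B"),
}
# A well-behaved pair: A delegates once, B finishes.
well_behaved: Dict[str, Agent] = {
    "A": lambda msg: ("B", msg + " -> A"),
    "B": lambda msg: (None, msg + " -> B done"),
}
```

With ping_pong the call raises DelegationDepthExceeded instead of looping forever; with well_behaved it returns after two hops.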
Cost stacking. Every supervisor turn calls a frontier LLM; every sub-agent turn calls another. A naive multi-agent system can cost 5-10× a single agent for marginal quality gains.
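A rough way to see the stacking is to count model calls. The sketch assumes the supervisor makes one call per delegation plus one final call to answer, and that every delegated sub-agent runs a fixed number of turns; it counts calls only and ignores token counts and per-model prices, which drive the actual bill. count_llm_calls is a hypothetical helper.

```python
def count_llm_calls(delegations: int, turns_per_sub_agent: int) -> int:
    """Total model calls for one request under the assumptions stated above."""
    if delegations < 0 or turns_per_sub_agent < 0:
        raise ValueError("counts must be non-negative")
    supervisor_calls = delegations + 1            # one per delegation, plus the final answer
    sub_agent_calls = delegations * turns_per_sub_agent
    return supervisor_calls + sub_agent_calls

# Illustrative numbers only: 3 delegations, each sub-agent taking 2 turns.
total = count_llm_calls(delegations=3, turns_per_sub_agent=2)  # 4 + 6 = 10 calls
```

Logging this count per request is a cheap first signal for whether a pathway should be collapsed into a single agent or a hardcoded workflow edge.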
Go further
Workflow vs autonomous — which should I start with?
Workflow. Always. A deterministic graph of specialized agents fails predictably, debugs cleanly, and gives you per-step latency budgets. Autonomous orchestration looks more impressive in demos but compounds errors across delegation hops and is hard to evaluate. Move to autonomy only when the task space is too varied to enumerate.
How do supervisor agents pick which sub-agent to call?
The same way a single agent picks a tool — function calling against a registry of sub-agent descriptions. Each sub-agent is presented as a tool with a name, capability summary, and input schema. The supervisor's prompt and the quality of those descriptions determine routing accuracy.
Why route across multiple agents or models instead of using one?
Cost, latency, and specialization. Routing a code-edit task to a coding-specialized model and a customer-tone task to a general chat model is cheaper and often better than running both through one frontier LLM. The orchestration layer is what makes a constellation of specialists usable.