Planning and Decomposition

Also known as: task decomposition, planning, subgoal decomposition, plan-and-execute

TL;DR

Planning and decomposition is the agent pattern of breaking a complex goal into ordered sub-tasks and executing them, instead of trying to one-shot the whole thing.

Planning is the agent capability of taking a vague high-level goal — “draft a competitive analysis of these three companies” — and producing a concrete ordered sequence of steps that, executed, produce the result. Decomposition is the breaking-up half: turning one fuzzy task into N small tractable sub-tasks. Together, they’re how an agent reasons about what to do before it starts doing it.

[Figure: Planning · Decomposition. A vague goal becomes a DAG of executable leaves. Depth 0 (goal): “draft a competitive analysis of three companies.” Depth 1 (plan): Step 1, profile each company (3 companies × 2 sources); Step 2, compare on four axes (price, scale, GTM, moat); Step 3, write the brief (outline → sections → polish). Depth 2 (execute): Acme financials via search_web(), Acme press via search_news(), Beta financials via search_web(), define axes via plan(), pull metrics via fetch_kpi(), score and rank via rank(), draft outline via draft(), fill sections via write(), with a cross-dependency between branches. Leaf criterion: each leaf maps to a single tool call, with no further planning required.]
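One way to make that structure concrete is a small tree of steps in which only leaves carry a tool. This is a minimal sketch under stated assumptions: the `Step` class, its fields, and both helper functions are hypothetical; the leaf tool names are the labels from the figure; and the dependency edges are invented for illustration, since the figure only shows that a cross-dependency exists. The check encodes the leaf criterion: any leaf without a single tool call still needs decomposing.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Step:
    """A plan node (hypothetical schema). Leaves carry a tool; inner nodes carry children."""
    id: str
    description: str
    tool: Optional[str] = None                           # leaves only: exactly one tool call
    depends_on: List[str] = field(default_factory=list)  # ids whose output this step needs
    children: List["Step"] = field(default_factory=list)

def leaves(step: Step) -> List[Step]:
    """Executable leaves, collected depth-first."""
    if not step.children:
        return [step]
    return [leaf for child in step.children for leaf in leaves(child)]

def needs_more_planning(goal: Step) -> List[str]:
    """Ids of leaves that violate the leaf criterion (no single tool call yet)."""
    return [leaf.id for leaf in leaves(goal) if leaf.tool is None]

goal = Step("goal", "Draft a competitive analysis of three companies", children=[
    Step("profile", "Profile each company", children=[
        Step("acme_fin", "Acme financials", tool="search_web"),
        Step("acme_press", "Acme press", tool="search_news"),
        Step("beta_fin", "Beta financials", tool="search_web"),
    ]),
    Step("compare", "Compare on four axes", children=[
        Step("axes", "Define axes", tool="plan"),
        Step("metrics", "Pull metrics", tool="fetch_kpi", depends_on=["acme_fin", "beta_fin"]),
        Step("rank", "Score and rank", tool="rank", depends_on=["axes", "metrics"]),
    ]),
    Step("brief", "Write the brief", children=[
        Step("outline", "Draft outline", tool="draft", depends_on=["rank"]),
        Step("sections", "Fill sections", tool="write", depends_on=["outline"]),
        Step("polish", "Polish the brief", depends_on=["sections"]),  # no tool bound yet
    ]),
])

print(needs_more_planning(goal))
```

Only the `polish` leaf is flagged: “polish the brief” is still a goal rather than a tool call, so a planner would either decompose it further or bind it to a tool.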

Why decomposition is the reasoning kernel

LLMs are good at single-step reasoning over a small surface. They are bad at one-shotting a 10-step task with branching dependencies. Decomposition exploits the first to overcome the second: each sub-task is small enough that the model handles it reliably, and the plan is itself the reasoning artifact that carries the trajectory.

This is closely related to chain-of-thought (CoT) prompting, but where CoT is about exposing reasoning within one model call, planning is about structuring reasoning across multiple model calls and tool invocations.

Common planning shapes

  • Linear plan. N ordered steps; each depends on the previous. Simplest, easiest to debug, can’t parallelize.
  • DAG plan. Steps have explicit dependencies; independent branches run in parallel. Faster wall-clock but harder to construct and recover from.
  • Recursive decomposition. Each sub-task can itself be decomposed. Powerful for open-ended goals (research, codebase changes); unbounded if not budgeted.
  • Tree search. Multiple candidate plans are explored in parallel and the best one is selected. Expensive but more robust on hard problems.
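The practical difference between a linear plan and a DAG plan shows up in scheduling. As a sketch, assuming the plan arrives as a mapping from step id to the ids it depends on (the step names are hypothetical), Python's standard-library `graphlib.TopologicalSorter` can group a DAG into waves whose members could all run in parallel; a linear plan degenerates to one step per wave.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

def execution_waves(deps: Dict[str, Set[str]]) -> List[List[str]]:
    """Group plan steps into waves. Every step in a wave has all of its
    dependencies satisfied by earlier waves, so a wave can run in parallel."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()                        # raises graphlib.CycleError on a cyclic plan
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only to make output deterministic
        waves.append(ready)
        sorter.done(*ready)                 # treat the whole wave as finished
    return waves

# Hypothetical DAG plan: step id -> ids of the steps whose output it needs.
dag_plan = {
    "acme_fin": set(),
    "beta_fin": set(),
    "define_axes": set(),
    "pull_metrics": {"acme_fin", "beta_fin"},
    "score_rank": {"define_axes", "pull_metrics"},
    "draft_outline": {"score_rank"},
}

# The same six steps forced into a linear chain: each depends on the previous one.
order = list(dag_plan)
linear_plan = {step: ({order[i - 1]} if i else set()) for i, step in enumerate(order)}

print(execution_waves(dag_plan))
print(len(execution_waves(linear_plan)))
```

Six steps collapse into four waves in the DAG version, because the two financial lookups and the axis definition do not depend on each other; the linear version needs six. As a side benefit, `prepare()` raises `graphlib.CycleError` if a planner emits a cycle, which is a cheap sanity check on model-generated plans.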

Plan-then-execute vs interleaved

Two execution shapes dominate:

  • Plan-then-execute — the model writes the whole plan, then a separate loop executes each step. Legible, auditable, parallelizable. Brittle when the environment surprises the plan.
  • Interleaved (ReAct-style) — the model emits think → act → observe → think → act one step at a time. Adapts naturally to surprises. Harder to inspect; risks step-level myopia (chasing the immediate gradient instead of the goal).
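The contrast can be shown with a toy sketch of both shapes on a made-up goal: find which of two companies has the higher revenue. Everything here is hypothetical: `scripted_policy` stands in for the model (a real agent would call an LLM at that point), and the two tools are dictionary lookups. The plan-then-execute runner commits to its steps before seeing any observation; the interleaved loop feeds each observation into the next decision, so it can route around a lookup that comes back empty.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical tools: two data sources with different coverage.
WEB = {"Acme": 120}
FILINGS = {"Acme": 120, "Beta": 95}
TOOLS: Dict[str, Callable[[str], Optional[int]]] = {
    "search_web": WEB.get,
    "search_filings": FILINGS.get,
}

def plan_then_execute(plan: List[Tuple[str, str]]) -> list:
    """Run a plan fixed upfront; no step can react to an earlier observation."""
    return [TOOLS[tool](arg) for tool, arg in plan]

def scripted_policy(goal, history):
    """Stand-in for the model: think, then pick the next (tool, arg) or finish."""
    found = {arg: obs for _, _, arg, obs in history if obs is not None}
    for company in goal:
        if company not in found:
            tried_web = any(t == "search_web" and a == company for _, t, a, _ in history)
            tool = "search_filings" if tried_web else "search_web"
            return f"need revenue for {company}", tool, company
    return "all revenues found", "finish", max(found, key=found.get)

def react_loop(goal, policy, max_steps=10):
    """Interleaved think -> act -> observe loop with a hard step budget."""
    history = []  # (thought, tool, arg, observation) tuples
    for _ in range(max_steps):
        thought, tool, arg = policy(goal, history)         # think
        if tool == "finish":
            return arg, history
        observation = TOOLS[tool](arg)                     # act
        history.append((thought, tool, arg, observation))  # observe
    raise RuntimeError("step budget exhausted")

print(plan_then_execute([("search_web", "Acme"), ("search_web", "Beta")]))
answer, trace = react_loop(["Acme", "Beta"], scripted_policy)
print(answer, len(trace))
```

The fixed plan returns `[120, None]` and has no way to notice the gap; the interleaved loop sees the empty result for Beta, falls back to the second source, and finishes in three tool calls.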

Most capable production agents use a hybrid: a coarse plan upfront, with checkpoints that can replan if execution goes off the rails.
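A minimal sketch of that hybrid, under loudly stated assumptions: `make_plan`, `execute`, and `check` are hypothetical callables (in a real agent the first would be a model call and the last a verifier), and the toy data sources are invented. The coarse plan is produced once; after every step a checkpoint either accepts the result or asks the planner to rewrite the remaining steps from the current state, with a cap on the number of replans.

```python
from typing import Callable, List, Optional, Tuple

PlanStep = Tuple[str, str]  # (source, company): a hypothetical step format

def run_hybrid(goal: List[str],
               make_plan: Callable[..., List[PlanStep]],
               execute: Callable[[PlanStep], Optional[int]],
               check: Callable[[PlanStep, Optional[int]], bool],
               max_replans: int = 2) -> List[Tuple[PlanStep, int]]:
    """Coarse plan upfront, checkpoint after each step, replan on failure."""
    done: List[Tuple[PlanStep, int]] = []
    plan = make_plan(goal, done)
    replans = 0
    while plan:
        step = plan.pop(0)
        result = execute(step)
        if check(step, result):  # checkpoint passed: keep following the plan
            done.append((step, result))
            continue
        if replans == max_replans:
            raise RuntimeError(f"giving up on {step} after {replans} replans")
        replans += 1
        plan = make_plan(goal, done, failed=step)  # rewrite the remaining steps

    return done

SOURCES = {("web", "Acme"): 120, ("filings", "Beta"): 95}

def make_plan(goal, done, failed=None):
    """Toy planner: cover companies not yet done; reroute a failed one to filings."""
    have = {step[1] for step, _ in done}
    def source(company):
        return "filings" if failed and failed[1] == company else "web"
    return [(source(c), c) for c in goal if c not in have]

def execute(step):
    return SOURCES.get(step)

def check(step, result):
    return result is not None

print(run_hybrid(["Acme", "Beta"], make_plan, execute, check))
```

The replan budget is the important design choice: without it, a step that can never succeed (a company missing from every source) would loop forever instead of surfacing as an error.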

Treat each step as an independent Bernoulli trial with success probability $p$. The probability that the whole 10-step plan succeeds is $p^{10}$. At $p = 0.95$ that's 0.60; at $p = 0.90$ it's 0.35; at $p = 0.80$ it's 0.11. The math is brutal: a step accuracy you'd ship as “great” in a single-call benchmark (95%) produces a multi-step agent that fails about 40% of the time. There are two ways out: (1) raise per-step accuracy via better tools, grounding, or specialized models, since $0.99^{10} \approx 0.90$ and $0.999^{10} \approx 0.99$; or (2) introduce error correction so that step failures no longer compound independently: a reflection or verification step that catches (and repairs) 90% of a step's errors effectively turns $p$ into $1 - 0.1\,(1 - p)$ for that step, so 0.95 becomes 0.995. This is why the trillion-dollar agent question isn't “can we plan?” but “can we recover when a step fails?”
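The arithmetic is easy to reproduce. This sketch keeps the paragraph's assumption that steps fail independently, and adds one more: every error a verifier catches is then repaired (for example by a successful retry). Under that assumption a check with catch rate $c$ turns per-step success $p$ into $1 - (1 - c)(1 - p)$.

```python
def plan_success(p: float, steps: int = 10) -> float:
    """Probability that `steps` independent steps, each succeeding with p, all succeed."""
    return p ** steps

def with_verification(p: float, catch_rate: float = 0.9) -> float:
    """Effective per-step success if a check catches `catch_rate` of failures
    and (assumption) each caught failure is repaired."""
    return 1 - (1 - catch_rate) * (1 - p)

for p in (0.95, 0.90, 0.80):
    raw = plan_success(p)
    verified = plan_success(with_verification(p))
    print(f"p={p:.2f}  10-step success: {raw:.2f} raw, {verified:.2f} with verification")
```

The raw column reproduces the 0.60, 0.35, and 0.11 figures; with a 90% catch-and-repair check on every step, the same three plans succeed roughly 0.95, 0.90, and 0.82 of the time.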

What makes planning fail

  • Plans grounded in the wrong context. If the model didn’t retrieve enough about the actual problem before planning, the plan will be confidently wrong.
  • Tool naivety. A plan assumes a tool exists when it doesn't, or that it behaves differently than it actually does. Mitigated by accurate tool descriptions and by rehearsing tool semantics.
  • Dependency invention. The model adds dependencies that aren’t real (“step 4 needs step 2’s output” when actually step 4 only needs step 1), serializing what could parallelize.
  • Plan-execution drift. The executor wanders from the plan one step at a time and never replans, so the trajectory silently diverges from the goal.
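Two of these failure modes can be caught mechanically before execution. The sketch below assumes a hypothetical plan format in which each step names a tool, lists `depends_on` ids, and refers to another step's output as `$step_id` inside an argument string. It flags tools missing from the registry (tool naivety) and declared dependencies that no argument actually consumes (dependency invention); neither check is exhaustive, and the tool names are illustrative.

```python
import re
from typing import Dict, List, Set

# Hypothetical convention: "$s1" inside an argument means "use step s1's output".
REF = re.compile(r"\$(\w+)")

def lint_plan(plan: List[Dict], known_tools: Set[str]) -> List[str]:
    """Return human-readable problems found in a plan before it runs."""
    problems = []
    ids = {step["id"] for step in plan}
    for step in plan:
        if step["tool"] not in known_tools:  # tool naivety
            problems.append(f"{step['id']}: unknown tool {step['tool']!r}")
        used = set(REF.findall(" ".join(step["args"].values())))
        for dep in step["depends_on"]:       # dependency invention
            if dep not in used:
                problems.append(f"{step['id']}: declared dependency on {dep!r} is never used")
        for ref in sorted(used - ids):       # reference to a step that doesn't exist
            problems.append(f"{step['id']}: references missing step {ref!r}")
    return problems

KNOWN_TOOLS = {"search_web", "write"}
plan = [
    {"id": "s1", "tool": "search_web", "args": {"q": "Acme revenue"}, "depends_on": []},
    {"id": "s2", "tool": "search_web", "args": {"q": "Beta revenue"}, "depends_on": ["s1"]},
    {"id": "s3", "tool": "summarize_pdf", "args": {"text": "$s1"}, "depends_on": ["s1"]},
]

for problem in lint_plan(plan, KNOWN_TOOLS):
    print(problem)
```

Dropping the invented `s2 → s1` edge lets the two searches run in parallel, and the unknown `summarize_pdf` is caught before the executor ever tries to call it.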

Planning is most valuable when the task is novel and ill-structured. For repeated workflows, hardcoded pipelines typically beat LLM planning on speed, cost, and reliability.

The right amount of LLM planning is the smallest amount that handles the variation in your tasks. Everything else should be code.

Go further

Plan-then-execute or interleaved?

Plan-then-execute writes the full plan upfront, then walks it. Interleaved (ReAct-style) plans one step, executes, observes, plans the next. Plan-then-execute is more legible and parallelizable; interleaved adapts to surprises. Most production agents use a hybrid — high-level plan upfront, step-level decisions interleaved.

Why do agents fail at multi-step tasks even when each step is easy?

Error compounding. Per-step accuracy of 95% becomes 60% over 10 steps. Reliable multi-step execution requires either a much higher per-step accuracy (better tools, better grounding, better models) or active error correction via reflection between steps.

Should I let the LLM plan, or hardcode the plan?

Hardcode when the task structure is fixed and well-known — you'll get faster, cheaper, more reliable execution. Let the LLM plan when the structure depends on the user input or the environment. The middle ground — templated plans with model-filled-in slots — is often the right answer in practice.
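A templated plan can be as small as a hardcoded list of steps with named slots. In this sketch the step order and tools are fixed in code, and only the slot values would come from a model; `fill_slots` is a hypothetical stand-in for a structured-output model call and simply returns fixed values so the example runs offline. Python's `string.Template.substitute` raises `KeyError` if a required slot is missing, which doubles as validation of the model's output.

```python
from string import Template
from typing import Dict, List, Tuple

# Fixed workflow skeleton: tools and order are hardcoded; only $-slots vary.
REPORT_TEMPLATE: List[Tuple[str, Template]] = [
    ("search_web", Template("$company annual revenue")),
    ("search_news", Template("$company news from the last $months months")),
    ("write", Template("brief on $company for a $audience audience")),
]

def fill_slots(request: str) -> Dict[str, str]:
    """Hypothetical stand-in for a model call that extracts slot values from
    the request. Hardcoded here; a real version would parse `request`."""
    return {"company": "Acme", "months": "6", "audience": "technical"}

def instantiate(template: List[Tuple[str, Template]], slots: Dict[str, str]) -> List[Tuple[str, str]]:
    """Turn the skeleton into concrete (tool, argument) steps.
    substitute() raises KeyError if the model left a required slot empty."""
    return [(tool, text.substitute(slots)) for tool, text in template]

for tool, arg in instantiate(REPORT_TEMPLATE, fill_slots("Brief me on Acme")):
    print(tool, "->", arg)
```

The model only ever decides what goes in the slots, so a bad model output can produce a wrong query but never a wrong workflow.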

ZeroEntropy