ReAct Prompting

Also known as: ReAct, reasoning and acting, thought-action-observation loop

TL;DR

ReAct (Reasoning + Acting) is the prompting pattern where the model alternates between thoughts, tool actions, and observations in a loop. It's the foundational structure behind nearly every modern LLM agent.

ReAct (Reasoning + Acting) is the prompting pattern that turns a one-shot LLM call into something that takes actions in the world. The original ReAct paper (Yao et al., 2022) introduced a simple template: the model alternates between Thoughts (free-text reasoning), Actions (tool calls), and Observations (tool results), running in a loop until it produces a final answer. Nearly every modern agent (coding assistants, research agents, browser-using agents) is a ReAct descendant.

[Figure: Thought · Action · Observation. "Reasoning, grounded one step at a time." A worked three-cycle trace for the question "Compare the populations of Paris and Tokyo": two search calls (Paris ~2.1 million, Tokyo ~13.96 million, 2024 estimates), then calc(13.96 / 2.1) returning 6.65, ending in the answer that Tokyo is roughly 6.6× the size of Paris. Each cycle: reason, call a tool, read the result.]

The template

Question: <user query>

Thought: I need to find the population of Paris in 2024.
Action: search("population of Paris 2024")
Observation: As of 2024 estimates, Paris has approximately 2.1 million...

Thought: Now I need the population of Tokyo in 2024 to compare.
Action: search("population of Tokyo 2024")
Observation: Tokyo's 2024 population is roughly 13.96 million...

Thought: Tokyo is much larger. I have the answer.
Final Answer: Tokyo's population is roughly 6.6x Paris's in 2024.

The model emits the Thought and Action; the runtime executes the Action (calling a real search API), captures the result as an Observation, appends it to the prompt, and re-invokes the model. The model continues until it emits a Final Answer.
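The loop described above can be sketched in a few lines. This is a minimal illustration, not a production harness: `scripted_model` stands in for a real LLM call by replaying the template's turns, `search` is a canned lookup rather than a real search API, and the names `react_loop`, `TOOLS`, `ACTION_RE`, and `FINAL_RE` are all assumptions introduced here.

```python
import re
from typing import Callable, Dict

# Hypothetical tool: a canned lookup standing in for a real search API.
def search(query: str) -> str:
    canned = {
        "population of Paris 2024": "As of 2024 estimates, Paris has approximately 2.1 million residents.",
        "population of Tokyo 2024": "Tokyo's 2024 population is roughly 13.96 million.",
    }
    return canned.get(query, "no results")

TOOLS: Dict[str, Callable[[str], str]] = {"search": search}

# Matches lines such as: Action: search("population of Paris 2024")
ACTION_RE = re.compile(r'^Action:\s*(\w+)\("(.*)"\)\s*$', re.MULTILINE)
FINAL_RE = re.compile(r"^Final Answer:\s*(.+)$", re.MULTILINE)

def scripted_model(prompt: str) -> str:
    """Stand-in for an LLM: picks the next turn by counting Observations so far."""
    turns = [
        "Thought: I need to find the population of Paris in 2024.\n"
        'Action: search("population of Paris 2024")',
        "Thought: Now I need the population of Tokyo in 2024 to compare.\n"
        'Action: search("population of Tokyo 2024")',
        "Thought: Tokyo is much larger. I have the answer.\n"
        "Final Answer: Tokyo's population is roughly 6.6x Paris's in 2024.",
    ]
    return turns[prompt.count("Observation:")]

def react_loop(question: str, model: Callable[[str], str], max_steps: int = 5) -> str:
    prompt = f"Question: {question}\n"
    for _ in range(max_steps):
        output = model(prompt)          # model emits Thought + Action (or Final Answer)
        prompt += output + "\n"
        final = FINAL_RE.search(output)
        if final:
            return final.group(1).strip()
        action = ACTION_RE.search(output)
        if action is None:
            raise ValueError(f"could not parse an Action from: {output!r}")
        name, arg = action.groups()
        tool = TOOLS.get(name)
        observation = tool(arg) if tool else f"unknown tool: {name}"
        # Append the tool result so the next model call can read it.
        prompt += f"Observation: {observation}\n\n"
    raise RuntimeError("no Final Answer within max_steps")

answer = react_loop("Compare the populations of Paris and Tokyo.", scripted_model)
print(answer)
```

The `max_steps` cap is a design choice of this sketch: without it, a model that never emits a Final Answer would loop forever.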

Why it dominates as the agent template

Pure chain-of-thought is closed-loop: the model reasons against its own training memory and can hallucinate intermediate facts. ReAct opens the loop by inserting real observations between reasoning steps, so every fact gets grounded by an external check before the next thought builds on it. This sharply reduces the error surface on any task that requires real-world information (search, retrieval, code execution, database queries).

It also gives the model a graceful failure mode: if a tool call returns “no results”, the model can adjust its strategy in the next Thought, instead of fabricating a plausible-looking answer.
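For this failure mode to work, the harness has to cooperate: a failed or empty tool call needs to come back as an Observation the model can read, not as a crash. A minimal sketch of one way to do that follows; `run_tool` and `flaky_search` are hypothetical names introduced for illustration.

```python
from typing import Callable, Dict

def flaky_search(query: str) -> str:
    """Hypothetical tool: raises on empty input, finds nothing for some queries."""
    if not query.strip():
        raise ValueError("empty query")
    if "Atlantis" in query:
        return ""
    return f"stub result for {query!r}"

TOOLS: Dict[str, Callable[[str], str]] = {"search": flaky_search}

def run_tool(tools: Dict[str, Callable[[str], str]], name: str, arg: str) -> str:
    """Run a tool and always return text that can be appended as an Observation."""
    tool = tools.get(name)
    if tool is None:
        return f"error: unknown tool {name!r}; available: {sorted(tools)}"
    try:
        result = tool(arg)
    except Exception as exc:
        # Surface the failure to the model instead of aborting the loop.
        return f"error: {type(exc).__name__}: {exc}"
    return result if result.strip() else "no results"

print(run_tool(TOOLS, "search", "population of Atlantis"))
print(run_tool(TOOLS, "search", ""))
print(run_tool(TOOLS, "lookup", "x"))
```

Because every outcome becomes a string, the next Thought can react to "no results" or an error message the same way it reacts to a successful result.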

A pure chain-of-thought trace is a stochastic walk through the model’s parametric memory. Every intermediate fact is something the model “thinks it remembers,” and the further into a multi-step reasoning chain you go, the more the errors from imprecise recall compound. ReAct breaks this chain by making each Observation an external check on the model’s claim. The model proposes “Paris has population 2.1 million,” the search tool returns “2.1 million as of 2024 estimates,” and the next reasoning step builds on a verified fact rather than a recalled approximation. Mathematically: if intermediate-fact accuracy in pure CoT is $p$, then ReAct replaces each $p$ with $p_{\text{tool}} \approx 1$ for any tool that’s actually accurate (search, calculator, code execution). The hallucination surface area shrinks because integrating external truth at every step keeps drift from compounding. The catch: ReAct can’t fix bad action selection. It only fixes reliance on unverified recall, so a wrong tool choice or a badly phrased query still yields an unhelpful Observation.
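The compounding argument can be made concrete with a small worked example. It rests on simplifying assumptions that are not in the source: an n-step chain is correct only if every intermediate fact is correct, and errors at different steps are independent. Here $p$ is the per-fact accuracy of recall, $p_{\text{tool}}$ is the per-fact accuracy of a tool, and the numbers are purely illustrative.

```latex
\[
P_{\text{CoT}} = p^{\,n},
\qquad
P_{\text{ReAct}} = p_{\text{tool}}^{\,n}
\]
\[
p = 0.9,\quad p_{\text{tool}} = 0.99,\quad n = 5:
\qquad
P_{\text{CoT}} = 0.9^{5} \approx 0.59,
\qquad
P_{\text{ReAct}} = 0.99^{5} \approx 0.95
\]
```

Under these assumptions, a modest per-step gain turns into a large end-to-end gain as the chain gets longer. The estimate also presumes the right action is chosen at every step, which is exactly the part ReAct does not guarantee.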

Modern incarnations

The original ReAct used free-text Action lines that a harness parsed with regex. That worked, but parsing failures were a major source of agent breakage. Modern stacks use function calling / structured tool use, where the model emits Actions in a dedicated structured slot the API guarantees parses cleanly. The semantics are identical (Thought, Action, Observation, loop), but reliability is dramatically higher because providers post-train models specifically on the tool-call format.
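To see why regex parsing was brittle, consider a sketch of a free-text Action parser. The pattern and the name `parse_action` are assumptions made for illustration, not the original paper's harness; the point is only that small formatting drift in the model's output is enough to break the match.

```python
import re
from typing import Optional, Tuple

# Expects exactly: Action: tool_name("argument")
ACTION_RE = re.compile(r'^Action:\s*(\w+)\("([^"]*)"\)\s*$', re.MULTILINE)

def parse_action(text: str) -> Optional[Tuple[str, str]]:
    """Return (tool_name, argument), or None when the line does not match."""
    match = ACTION_RE.search(text)
    return (match.group(1), match.group(2)) if match else None

well_formed = 'Thought: I need Paris.\nAction: search("population of Paris 2024")'
single_quotes = "Thought: I need Paris.\nAction: search('population of Paris 2024')"
extra_prose = 'Thought: I need Paris.\nAction: I will call search("population of Paris 2024")'

print(parse_action(well_formed))    # ('search', 'population of Paris 2024')
print(parse_action(single_quotes))  # None: quote style drifted
print(parse_action(extra_prose))    # None: prose leaked into the Action line
```

Loosening the pattern to accept more variants helps only until the next variant appears, which is the motivation for moving Actions into a structured slot.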

The broader agent loop generalizes ReAct further, adding things like reflection, planning (decomposition into subgoals), and multi-agent coordination. But the inner core (emit thought, take action, observe, repeat) is still ReAct.

Every modern agent — coding, research, browser-using, customer-support — is a ReAct loop with better tools and better post-training. The template is the field’s most durable abstraction.

Go further

What does a single ReAct step actually look like?

Three structured fields: Thought (free-text reasoning about what to do next), Action (the tool to call, often as a JSON object or DSL line), Observation (the tool's response, fed back into the prompt). The model emits Thought + Action, the harness executes the Action and appends the Observation, then the model continues. Every modern agent framework — LangChain agents, function-calling loops, coding agents — is a variant of this template.
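As a rough illustration of those three fields, a step can be modelled as a small record that the model half-fills and the harness completes. The class name `ReActStep` and the helper `render_transcript` are hypothetical, chosen for this sketch only.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class ReActStep:
    thought: str            # emitted by the model
    action: str             # emitted by the model, e.g. a DSL line
    observation: str = ""   # filled in by the harness after the tool runs

    def render(self) -> str:
        lines = [f"Thought: {self.thought}", f"Action: {self.action}"]
        if self.observation:
            lines.append(f"Observation: {self.observation}")
        return "\n".join(lines)

def render_transcript(question: str, steps: List[ReActStep]) -> str:
    """Rebuild the prompt text that is fed back to the model."""
    return "\n\n".join([f"Question: {question}"] + [s.render() for s in steps])

step = ReActStep(
    thought="I need to find the population of Paris in 2024.",
    action='search("population of Paris 2024")',
)
before = step.render()
step.observation = "Paris has approximately 2.1 million residents."
print(render_transcript("Compare the populations of Paris and Tokyo.", [step]))
```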

Why does ReAct work better than pure chain-of-thought for agent tasks?

CoT is purely internal — the model can't check its reasoning against the world. ReAct lets each step ground itself in an Observation: the model proposes a calculation, calls a calculator, sees the actual result, and adjusts. This collapses the hallucination surface area dramatically on tasks where intermediate facts matter (web search, code execution, database queries).

How is modern function calling different from the original ReAct?

Mostly cosmetic. The original ReAct prompted free-text Action lines that the harness parsed; modern providers expose 'tool use' or 'function calling' as a first-class API, where the model emits structured tool calls in dedicated message slots. The semantics are identical — Thought, Action, Observation, repeat — but the parsing is reliable and the post-training is explicit, so it works much better.
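The structured variant can be sketched as follows. The JSON shape used here (`tool`, `arguments`) and the returned message fields are hypothetical and do not reproduce any specific provider's schema; the sketch only shows that the harness reads fields from parsed JSON instead of pattern-matching free text.

```python
import json
from typing import Callable, Dict

def search(query: str) -> str:
    """Hypothetical tool: returns a stub string instead of calling a real API."""
    return f"stub result for {query!r}"

TOOLS: Dict[str, Callable[..., str]] = {"search": search}

def dispatch(raw_call: str) -> Dict[str, str]:
    """Execute one structured tool call and wrap the result as an Observation message."""
    call = json.loads(raw_call)        # fails loudly on malformed JSON
    name = call["tool"]
    arguments = call["arguments"]      # keyword arguments for the tool
    result = TOOLS[name](**arguments)
    return {"role": "tool", "name": name, "content": result}

raw = '{"tool": "search", "arguments": {"query": "population of Tokyo 2024"}}'
print(dispatch(raw))
```

The Thought, Action, Observation cycle is unchanged; only the Action's encoding moved from a text line to a typed payload.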
