Approximately, with three important wrinkles. First, the chains in trained reasoning models are typically much longer than what human-written CoT prompts elicit: thousands of tokens of internal deliberation per response. During RL post-training, long chains earned higher reward on the training distribution, so the model learned to generate them.
Second, the chains often look different from human-style reasoning. They include backtracking (“wait, that doesn’t work, let me try…”), self-checking, exploration of multiple approaches, and recovery from dead ends. This isn’t an accident of style; it is trained in. The RL reward favored chains that landed on correct answers, and backtracking and self-checking made correct answers more likely, so those behaviors were reinforced. The model learned that “exploring then committing” beats “committing then defending”.
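To make the reward mechanism concrete, here is a minimal sketch of an outcome-only reward. It assumes, hypothetically, that each sampled response ends with a line of the form `Final answer: …` and that a reference answer is available for comparison. The function names and answer format are illustrative, not any provider’s actual training code. The point it shows: the reward reads only the final answer, never the chain, so a chain that backtracks out of a mistake scores the same as a clean one as long as it lands correctly.

```python
import re
from typing import Optional

# Hypothetical answer format: a line "Final answer: <value>".
# `.` does not match newlines, so the group captures the rest of that line only.
FINAL_ANSWER = re.compile(r"Final answer:(.*)")


def extract_final_answer(response: str) -> Optional[str]:
    """Return the value on the last 'Final answer:' line, or None if absent or empty."""
    matches = FINAL_ANSWER.findall(response)
    if not matches:
        return None
    answer = matches[-1].strip()
    return answer or None


def outcome_reward(response: str, reference: str) -> float:
    """Return 1.0 if the final answer matches the reference exactly, else 0.0.

    The reasoning before the final answer is never inspected.
    """
    answer = extract_final_answer(response)
    return 1.0 if answer is not None and answer == reference else 0.0


if __name__ == "__main__":
    # Commits to a wrong step and never revisits it.
    committed = "17 * 3 = 41.\nFinal answer: 41"
    # Backtracks, corrects itself, and checks the result.
    backtracking = (
        "17 * 3 = 41. Wait, that doesn't work: 17 * 3 = 51.\n"
        "Check: 51 / 3 = 17.\n"
        "Final answer: 51"
    )
    print(outcome_reward(committed, "51"))     # 0.0
    print(outcome_reward(backtracking, "51"))  # 1.0
```

Nothing in this reward mentions backtracking; the behavior is favored only because, across many sampled chains, it raises the rate of correct final answers.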
The third wrinkle is opacity. Many reasoning models hide the chain from the user; you see the final answer plus, at most, a summary of the reasoning. This is partly a product decision (raw chains are noisy and hard to read) and partly a competitive one (the chain is a window into the model’s training, which providers want to obscure). For non-reasoning models, explicit CoT prompting recovers a substantial fraction of the gain with no additional training, but the chain stays visible and its depth is bounded by what the prompt can elicit.