Topic · 16 concepts

Prompting

How you talk to an LLM, and when you stop.

Prompt engineering is the discipline of writing inputs that produce reliable outputs. This topic covers patterns that work (system prompts, few-shot examples, chain-of-thought), patterns that look right but break, and the cases where prompting hits its ceiling and fine-tuning takes over. The concepts below cover both the practical craft and the structural limits that make prompts a brittle dependency once your task scales past exploration.

Chain-of-Thought

Chain-of-thought (CoT) prompting asks the model to produce intermediate reasoning steps before its final answer. The intermediate tokens act as a scratchpad.
Constrained Decoding

Constrained decoding restricts an LLM's next-token distribution to only tokens that keep the partial output valid against a grammar or schema.
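
As a rough illustration of the masking step, the sketch below sets the logits of disallowed tokens to negative infinity before the softmax, so only valid tokens keep any probability. The token names, the scores, and the `allowed` set are made up for the example; a real implementation would derive the allowed set from a grammar or schema state at every step.

```python
import math

def mask_logits(logits, allowed):
    """Give disallowed tokens a logit of -inf so they end up with zero probability."""
    return {tok: (score if tok in allowed else -math.inf)
            for tok, score in logits.items()}

def softmax(logits):
    """Numerically stable softmax over a dict of token -> logit.

    Assumes at least one token has a finite logit.
    """
    m = max(logits.values())
    exps = {tok: math.exp(score - m) for tok, score in logits.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}

# Hypothetical raw scores for the next token.
logits = {"{": 1.0, "hello": 3.0, "[": 0.5, "}": 0.2}
# Hypothetical constraint: the output must start a JSON object or array.
allowed = {"{", "["}

probs = softmax(mask_logits(logits, allowed))
```

Without the mask, `hello` would be the most likely token; with it, all probability mass is split between `{` and `[`.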
Few-Shot Prompting

Few-shot prompting is the technique of including a handful of input/output examples (typically 2-5) in the prompt to demonstrate the desired behavior. It often works much better than describing the rule in words because the model picks up on format and edge cases directly from the examples.
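
One way to assemble such a prompt is sketched below. The `Input:`/`Output:` labels and the function name `build_few_shot_prompt` are assumptions chosen for illustration; any consistent format should work, as long as the final example is left open for the model to complete.

```python
def build_few_shot_prompt(examples, query, instruction=""):
    """Join (input, output) example pairs and leave the last output blank."""
    parts = [instruction] if instruction else []
    for example_input, example_output in examples:
        parts.append(f"Input: {example_input}\nOutput: {example_output}")
    # The trailing "Output:" is what the model is expected to continue.
    parts.append(f"Input: {query}\nOutput:")
    return "\n\n".join(parts)

# Hypothetical sentiment-labeling examples.
examples = [
    ("The battery died after an hour.", "negative"),
    ("Setup took thirty seconds. Love it.", "positive"),
    ("It arrived on Tuesday.", "neutral"),
]
prompt = build_few_shot_prompt(
    examples,
    "The screen cracked on day one.",
    instruction="Label the sentiment of each input.",
)
```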
In-Context Learning

In-context learning (ICL) is the empirical phenomenon that LLMs can adapt to new tasks from examples in the prompt — without any weight updates.
Prompt Caching

Prompt caching is the API-side feature that lets you reuse a model provider's KV cache for a stable prompt prefix across requests. You mark the prefix as cacheable, and the provider keeps its KV cache warm.
Prompt Engineering

Prompt engineering is the practice of writing inputs to an LLM that reliably produce the outputs you want. It includes structure (system prompts, few-shot examples) and reasoning patterns (chain-of-thought).
Prompt Injection

Prompt injection is adversarial input that hijacks an LLM's instruction-following — making the model treat attacker text as if it came from the developer.
Prompt Template

A prompt template is a parameterized, reusable prompt — variables filled in at runtime from request data. In production systems, prompt templates are first-class artifacts: versioned, tested, and A/B-deployed like any other code.
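
A minimal sketch of a template, using Python's standard `string.Template`. The template text, the `SUMMARIZE_V1` name, and the variable names are hypothetical; the version suffix only hints at how a template might be tracked alongside code.

```python
from string import Template

# Hypothetical versioned template; variables are filled in per request.
SUMMARIZE_V1 = Template(
    "Summarize the following $doc_type in $max_sentences sentences:\n\n$text"
)

def render(template, **variables):
    """Fill in a template; substitute() raises KeyError if a variable is missing."""
    return template.substitute(**variables)

prompt = render(
    SUMMARIZE_V1,
    doc_type="support ticket",
    max_sentences=2,
    text="My order never arrived and nobody has replied to my emails.",
)
```

Using `substitute` rather than `safe_substitute` means a missing variable fails loudly at render time instead of sending a half-filled prompt to the model.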
ReAct Prompting

ReAct (Reasoning + Acting) is the prompting pattern where the model alternates between thoughts, tool actions, and observations in a loop. It's the foundational structure behind nearly every modern LLM agent.
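
The loop can be sketched as follows. This is a toy under stated assumptions: the `Thought:` / `Action:` / `Final Answer:` line prefixes are one common convention rather than a fixed standard, the `add` tool is invented, and the model is replaced by a scripted stand-in so the example runs without any API.

```python
def react_loop(model, tools, question, max_steps=5):
    """Alternate model steps and tool observations until a final answer appears."""
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        step = model(transcript)          # model sees everything so far
        transcript += step + "\n"
        if step.startswith("Final Answer:"):
            return step[len("Final Answer:"):].strip(), transcript
        if step.startswith("Action:"):
            # Assumed action format: "Action: <tool_name> <argument string>"
            name, _, arg = step[len("Action:"):].strip().partition(" ")
            observation = tools[name](arg)
            transcript += f"Observation: {observation}\n"
    return None, transcript               # step budget exhausted

def make_scripted_model(steps):
    """Stand-in for an LLM call: replays a fixed list of steps."""
    iterator = iter(steps)
    return lambda transcript: next(iterator)

# Hypothetical tool: adds whitespace-separated integers.
tools = {"add": lambda arg: sum(int(x) for x in arg.split())}

model = make_scripted_model([
    "Thought: I should add the two numbers.",
    "Action: add 2 3",
    "Final Answer: 5",
])
answer, transcript = react_loop(model, tools, "What is 2 + 3?")
```

The `max_steps` cap matters in practice: without it, a model that never emits a final answer would loop indefinitely.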
Self-Consistency

Self-consistency samples N independent chain-of-thought reasoning paths and majority-votes the final answer. It is one of the simplest test-time-compute techniques: it needs no extra training, only N times the sampling cost.
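
The voting step is small enough to sketch directly. Here `sample_answer` is a hypothetical callable standing in for "sample one reasoning path at nonzero temperature and extract its final answer"; the scripted answers below replace real model calls.

```python
from collections import Counter

def self_consistency(sample_answer, n):
    """Draw n answers and return the most common one with its vote share."""
    answers = [sample_answer() for _ in range(n)]
    winner, votes = Counter(answers).most_common(1)[0]
    return winner, votes / n

# Scripted stand-in for five sampled reasoning paths.
scripted = iter(["42", "41", "42", "42", "7"])
winner, share = self_consistency(lambda: next(scripted), n=5)
```

The vote share doubles as a rough confidence signal: a winner with a bare plurality is worth treating with more suspicion than one that nearly every path agrees on.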
Structured Output

Structured output is the practice of forcing an LLM to produce machine-parseable output — JSON, XML, or any schema-conforming format — instead of free-form text.
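
On the consuming side, structured output is usually paired with a validation step, since a prompt alone does not guarantee conformance. A minimal sketch, assuming JSON output and a hypothetical `required` mapping of field names to Python types (a real system would more likely use a full schema validator):

```python
import json

def parse_structured(raw, required):
    """Parse model output as a JSON object and check required fields and types.

    Raises ValueError on malformed JSON (json.JSONDecodeError is a subclass),
    on a non-object top level, and on missing or mistyped fields.
    """
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object at the top level")
    for field, expected_type in required.items():
        if field not in data:
            raise ValueError(f"missing field: {field}")
        if not isinstance(data[field], expected_type):
            raise ValueError(f"wrong type for field: {field}")
    return data

# Hypothetical model output and schema.
raw_output = '{"sentiment": "negative", "confidence": 0.91}'
result = parse_structured(raw_output, {"sentiment": str, "confidence": float})
```

A caller can catch `ValueError` and retry the request or fall back, rather than letting a malformed response propagate downstream.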
System Prompt

The system prompt is the privileged instruction channel — separate from user input — that sets a model's overall behavior, persona, and constraints.
Temperature Sampling

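Temperature is a scalar that divides the logits before softmax, controlling how peaked or flat the next-token distribution is. Temperature 0 is treated as greedy decoding (always pick the argmax), since dividing by zero is undefined and the distribution collapses onto the top token as temperature approaches 0; higher temperatures sample more diversely.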
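
The effect is easy to see numerically. The sketch below applies temperature to a small, made-up logit vector; the special case for temperature 0 reflects the usual convention of returning a one-hot distribution on the argmax rather than dividing by zero.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Return next-token probabilities after dividing logits by temperature."""
    if temperature == 0:
        # Greedy decoding: all probability on the highest-scoring token.
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    m = max(scaled)                       # subtract the max for stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [1.0, 2.0, 3.0]                  # hypothetical scores
greedy = softmax_with_temperature(logits, 0)
sharp = softmax_with_temperature(logits, 0.5)
neutral = softmax_with_temperature(logits, 1.0)
flat = softmax_with_temperature(logits, 2.0)
```

Comparing the top token's probability across `sharp`, `neutral`, and `flat` shows it shrinking as temperature rises, while the ranking of tokens stays the same.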
Top-p (Nucleus) Sampling

Top-p sampling restricts each step's sampling to the smallest set of tokens whose cumulative probability is at least p. Unlike top-k's fixed cutoff, the nucleus adapts to the distribution's shape.
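
A minimal sketch of the filtering step, using hand-picked probabilities to show how the nucleus size changes with the shape of the distribution. The function name and the example distributions are illustrative only; sampling would then draw from the renormalized result.

```python
def top_p_filter(probs, p):
    """Keep the smallest set of top tokens whose cumulative probability is >= p."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    nucleus, cumulative = [], 0.0
    for token, prob in ranked:
        nucleus.append((token, prob))
        cumulative += prob
        if cumulative >= p:
            break
    # Renormalize so the kept probabilities sum to 1.
    total = sum(prob for _, prob in nucleus)
    return {token: prob / total for token, prob in nucleus}

# A peaked distribution: one token already covers the threshold.
peaked = {"a": 0.9, "b": 0.05, "c": 0.03, "d": 0.02}
# A flat distribution: several tokens are needed to reach the same threshold.
flat = {"a": 0.25, "b": 0.25, "c": 0.25, "d": 0.25}

peaked_nucleus = top_p_filter(peaked, 0.7)
flat_nucleus = top_p_filter(flat, 0.7)
```

With the same `p`, the peaked distribution keeps a single token while the flat one keeps three, which is the adaptivity that a fixed top-k cutoff lacks.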
Tree-of-Thought

Tree-of-thought (ToT) generalizes chain-of-thought by exploring a search tree of reasoning paths instead of a single linear chain. The model branches into multiple candidate next steps, evaluates them, and backtracks when a branch goes wrong.
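
The branch, evaluate, and backtrack cycle can be sketched as a depth-first search. This is a toy under stated assumptions: in a real ToT setup the `propose` and `is_promising` callables would be LLM calls, whereas here they are plain functions over a made-up arithmetic puzzle (pick steps of 5 or 3 that sum to exactly 6), and depth-first search is only one of several possible search strategies.

```python
def tree_of_thought(state, propose, is_promising, is_goal, depth=0, max_depth=4):
    """Depth-first search over reasoning states with pruning and backtracking."""
    if is_goal(state):
        return state
    if depth == max_depth:
        return None
    for candidate in propose(state):      # branch into candidate next steps
        if not is_promising(candidate):   # evaluate and prune weak branches
            continue
        result = tree_of_thought(
            candidate, propose, is_promising, is_goal, depth + 1, max_depth
        )
        if result is not None:
            return result
    return None                           # dead end: caller backtracks

TARGET = 6                                # hypothetical puzzle target

def propose(state):
    return [state + (5,), state + (3,)]   # try the larger step first

def is_promising(state):
    return sum(state) <= TARGET           # overshooting can never recover

def is_goal(state):
    return sum(state) == TARGET

solution = tree_of_thought((), propose, is_promising, is_goal)
```

The search first commits to a step of 5, finds that every continuation overshoots, backtracks, and then succeeds with two steps of 3. A single linear chain that started with 5 would have had no way to recover.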
Zero-Shot Prompting

Zero-shot prompting is asking a model to do a task with no examples — only a description of what you want. It works surprisingly well on common tasks because the model's training distribution already contains analogous patterns.