Prompting
How you talk to an LLM, and when you stop.
Prompt engineering is the discipline of writing inputs that produce reliable outputs. This section covers patterns that work (system prompts, few-shot examples, chain-of-thought), patterns that look right but break, and the cases where prompting hits its ceiling and fine-tuning takes over. The concepts below span both the practical craft and the structural limits that make prompts a brittle dependency once your task scales past exploration.
- Chain-of-Thought
Chain-of-thought (CoT) prompting asks the model to produce intermediate reasoning steps before its final answer. The intermediate tokens act as a scratchpad.
- Constrained Decoding
Constrained decoding restricts an LLM's next-token distribution to only tokens that keep the partial output valid against a grammar or schema.
- Few-Shot Prompting
Few-shot prompting is the technique of including a handful of input/output examples (typically 2-5) in the prompt to demonstrate the desired behavior. It often works better than describing the rule in words because the model picks up on format and edge cases directly from the examples.
- In-Context Learning
In-context learning (ICL) is the empirical phenomenon that LLMs can adapt to new tasks from examples in the prompt — without any weight updates.
- Prompt Caching
Prompt caching is the API-side feature that lets you reuse a model provider's KV cache for a stable prompt prefix across requests. You mark the prefix as cacheable, and the provider keeps its KV cache warm.
- Prompt Engineering
Prompt engineering is the practice of writing inputs to an LLM that reliably produce the outputs you want. It includes structure (system prompts, few-shot examples) and reasoning patterns (chain-of-thought).
- Prompt Injection
Prompt injection is adversarial input that hijacks an LLM's instruction-following — making the model treat attacker text as if it came from the developer.
- Prompt Template
A prompt template is a parameterized, reusable prompt — variables filled in at runtime from request data. In production systems, prompt templates are first-class artifacts: versioned, tested, and A/B-deployed like any other code.
- ReAct Prompting
ReAct (Reasoning + Acting) is the prompting pattern where the model alternates between thoughts, tool actions, and observations in a loop. It's the foundational structure behind nearly every modern LLM agent.
- Self-Consistency
Self-consistency samples N independent chain-of-thought reasoning paths and majority-votes the final answer. It is one of the simplest test-time-compute techniques: it needs no extra training, only the cost of N samples instead of one.
- Structured Output
Structured output is the practice of forcing an LLM to produce machine-parseable output — JSON, XML, or any schema-conforming format — instead of free-form text.
- System Prompt
The system prompt is the privileged instruction channel — separate from user input — that sets a model's overall behavior, persona, and constraints.
- Temperature Sampling
Temperature is a scalar that divides the logits before softmax, controlling how peaked or flat the next-token distribution is. Temperature 0 is conventionally treated as greedy decoding (always pick the argmax), since dividing by zero is undefined; higher temperatures flatten the distribution and sample more diversely.
- Top-p (Nucleus) Sampling
Top-p sampling restricts each step's sampling to the smallest set of tokens whose cumulative probability is at least p. Unlike top-k's fixed cutoff, the nucleus adapts to the distribution's shape.
- Tree-of-Thought
Tree-of-thought (ToT) generalizes chain-of-thought by exploring a search tree of reasoning paths instead of a single linear chain. The model branches into multiple candidate next steps, evaluates them, and backtracks when a branch goes wrong.
- Zero-Shot Prompting
Zero-shot prompting is asking a model to do a task with no examples — only a description of what you want. It works surprisingly well on common tasks because the model's training distribution already contains analogous patterns.
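The temperature and top-p entries above describe two small transformations of the next-token distribution, and a short sketch may make them concrete. The Python below is illustrative only: the function names (`softmax_with_temperature`, `top_p_filter`, `sample_token`) are hypothetical, the logits are plain lists of floats, and real inference stacks apply the same steps to tensors, usually with their own tie-breaking and ordering conventions.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Divide logits by the temperature, then apply softmax (temperature > 0)."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs, p):
    """Keep the smallest set of tokens whose cumulative probability is >= p.

    Returns a dict {token_index: renormalized_probability}.
    """
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}


def sample_token(logits, temperature=1.0, top_p=1.0, rng=random):
    """Sample one token index using temperature scaling, then nucleus filtering."""
    if temperature == 0:
        # Assumption: temperature 0 is handled as greedy decoding (argmax).
        return max(range(len(logits)), key=lambda i: logits[i])
    probs = softmax_with_temperature(logits, temperature)
    nucleus = top_p_filter(probs, top_p)
    indices = list(nucleus)
    weights = [nucleus[i] for i in indices]
    return rng.choices(indices, weights=weights, k=1)[0]
```

Calling `sample_token(logits, temperature=0.7, top_p=0.9)` on a list of logits returns a single token index. Lower temperatures concentrate mass on the top token, while a smaller `top_p` shrinks the candidate set, and that set grows or shrinks with the shape of the distribution rather than being a fixed size as in top-k.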
