Flow Matching

Also known as: rectified flow, conditional flow matching, CFM, flow-based generation, stochastic interpolant

TL;DR

A generative-modeling objective that learns a continuous vector field transporting noise to data along straight or curved probability paths. Generalizes and often replaces diffusion: simpler training, faster sampling, and the substrate behind SD3, Flux, and Veo.

Flow matching is a generative-modeling framework that learns a continuous vector field transporting samples from a simple prior (Gaussian noise) to a complex target distribution (data) along a chosen probability path. Training is regression of the model’s predicted velocity against the path’s true velocity; sampling is numerical ODE integration of the learned field. It is the framework that displaced diffusion at the top of the open-weight image-generation stack in 2024 and currently powers SD3, Flux, Veo, and Movie Gen.

The objective

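Pick a probability path connecting noise $x_0 \sim \mathcal{N}(0, I)$ to data $x_1 \sim p_{\text{data}}$. The rectified-flow choice is the simplest one: a straight line.

$$x_t = (1 - t)\,x_0 + t\,x_1, \qquad t \in [0, 1]$$

The instantaneous velocity along this path is $\frac{dx_t}{dt} = x_1 - x_0$, a constant by construction. The model $v_\theta$ is trained to regress this velocity:

$$\mathcal{L}(\theta) = \mathbb{E}_{t,\,x_0,\,x_1}\,\big\| v_\theta(x_t, t) - (x_1 - x_0) \big\|^2$$

Sample $t \sim \mathcal{U}[0, 1]$, sample a noise-data pair, interpolate, regress. That’s the training loop. There is no separate noise schedule, no variational bound, no score-matching identity: just supervised regression on a velocity field, optimized with gradient descent.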
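The loop described above (sample a time, sample a noise-data pair, interpolate, regress) can be sketched in a few lines of standard-library Python. This is a toy under stated assumptions, not a production recipe: the data are one-dimensional, the "data distribution" N(3, 0.5²) is a stand-in, the velocity model is linear in four hand-picked features rather than a neural network, and the names `features`, `predict`, `sample_pair`, `train`, and `mean_loss` are hypothetical. Noise is written `x0`, data `x1`, and the straight path is `xt = (1 - t) * x0 + t * x1`.

```python
import random

def features(x, t):
    """Hand-picked features; a real model would be a neural network."""
    return [1.0, x, t, x * t]

def predict(w, x, t):
    """Toy velocity model v_w(x, t): linear in the features above."""
    return sum(wi * fi for wi, fi in zip(w, features(x, t)))

def sample_pair(rng):
    """One (noise, data) pair. The data law N(3, 0.5^2) is an assumption."""
    x0 = rng.gauss(0.0, 1.0)   # noise sample
    x1 = rng.gauss(3.0, 0.5)   # stand-in for a data sample
    return x0, x1

def train(steps=20000, lr=0.01, seed=0):
    """Plain SGD on the squared velocity-regression error."""
    rng = random.Random(seed)
    w = [0.0, 0.0, 0.0, 0.0]
    for _ in range(steps):
        t = rng.random()                  # t drawn uniformly from [0, 1)
        x0, x1 = sample_pair(rng)
        xt = (1.0 - t) * x0 + t * x1      # point on the straight path
        target = x1 - x0                  # constant velocity of that path
        err = predict(w, xt, t) - target
        # The gradient of err**2 with respect to w is 2 * err * features.
        w = [wi - lr * 2.0 * err * fi for wi, fi in zip(w, features(xt, t))]
    return w

def mean_loss(w, n=5000, seed=1):
    """Monte Carlo estimate of the regression loss on fresh samples."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        t = rng.random()
        x0, x1 = sample_pair(rng)
        xt = (1.0 - t) * x0 + t * x1
        total += (predict(w, xt, t) - (x1 - x0)) ** 2
    return total / n
```

Calling `train()` and comparing `mean_loss` on the returned weights against `mean_loss([0.0] * 4)` shows the regression error dropping. The loss does not reach zero even for a perfect model, because many different pairs pass through the same `(xt, t)` and the model can only predict their average velocity.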

Why this is different from diffusion

Diffusion learns the score of a stochastic process; flow matching learns the velocity of a deterministic interpolation. Diffusion samples via SDE integration with Brownian noise injected at every step (see Brownian motion for the upstream stochastic-calculus framing); flow matching samples via ODE integration with no stochasticity at all.

Flow matching is what diffusion looks like when you stop pretending the forward process has to be stochastic. The objective is plain regression, the sampler is plain Euler, and the math you’d need to defend at a stats seminar fits on a postcard.

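The training objective is a clean regression. The sampler is a first-order ODE solver. With straight paths, generation runs in 1-4 steps versus 25-50 for DDPM. Since each step is one network evaluation, that is roughly 6-50× fewer evaluations on the same hardware, depending on which step counts are compared.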
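A minimal sketch of the sampler side, assuming a scalar state and a velocity function with the hypothetical signature `velocity(x, t)`. It integrates from t = 0 (noise) to t = 1 (data) with forward Euler and compares two hand-written fields: a constant one, which is the idealized straight-path case for a single fixed pair, and a curved one (dx/dt = x) whose exact endpoint is known. A learned field is only approximately constant along trajectories, so the one-step exactness here is the best case rather than a guarantee.

```python
import math

def euler_integrate(velocity, x, steps):
    """Integrate dx/dt = velocity(x, t) from t = 0 to t = 1 with forward Euler."""
    dt = 1.0 / steps
    for i in range(steps):
        t = i * dt
        x = x + dt * velocity(x, t)
    return x

# Idealized straight path between one noise value and one data value.
x0, x1 = -0.7, 2.5

def straight(x, t):
    return x1 - x0   # constant velocity: the same everywhere on the path

def curved(x, t):
    return x         # dx/dt = x, whose exact solution is x(1) = e * x(0)

# One Euler step recovers the endpoint of the constant-velocity field.
one_step = euler_integrate(straight, x0, steps=1)

# The curved field needs many more steps to approach its exact endpoint.
errors = {n: abs(euler_integrate(curved, 1.0, n) - math.e) for n in (1, 4, 64)}
```

With the constant field, `one_step` matches `x1` up to floating-point rounding. With the curved field, the entries of `errors` shrink only gradually as the step count grows, which is the cost that curvature imposes on a first-order solver.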

Path choices

The framework is parametric in the path. Different choices yield different methods, all sharing the same regression machinery:

Flow-matching path families

Linear / rectified flow (Liu et al. 2022, Lipman et al. 2023) — straight lines from noise to data. Constant velocity, optimal for Euler integration. The dominant choice in 2024-25 production models.
Diffusion-bridge paths — variance-preserving or variance-exploding schedules that recover the DDPM objective as a special case. Useful as a theoretical bridge; rarely the production choice.
Optimal transport paths — straight paths conditional on a specific (noise, data) coupling chosen to minimize average path length. The “ideal” path; harder to train because the coupling itself has to be approximated.
Stochastic interpolants (Albergo and Vanden-Eijnden, 2023) — a generalization framework that subsumes both diffusion and flow matching by allowing arbitrary deterministic + stochastic decompositions of the path.
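The shared machinery across these families can be sketched as a pair of coefficient schedules. The code below assumes the deterministic interpolant form x_t = alpha(t) * x0 + beta(t) * x1, whose regression target is alpha'(t) * x0 + beta'(t) * x1. The `Path` class and the names `LINEAR` and `TRIG` are hypothetical. The trigonometric schedule is only an illustrative variance-preserving-style path (alpha² + beta² = 1), not the exact DDPM schedule, and the sketch leaves out the optimal-transport coupling and the extra noise term that stochastic interpolants allow.

```python
import math
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class Path:
    """Interpolant x_t = alpha(t) * x0 + beta(t) * x1 plus its time derivatives."""
    alpha: Callable[[float], float]
    beta: Callable[[float], float]
    d_alpha: Callable[[float], float]
    d_beta: Callable[[float], float]

    def point(self, x0, x1, t):
        """Position on the path at time t."""
        return self.alpha(t) * x0 + self.beta(t) * x1

    def velocity(self, x0, x1, t):
        """Regression target: the time derivative of point()."""
        return self.d_alpha(t) * x0 + self.d_beta(t) * x1

# Straight line from noise to data: the velocity is x1 - x0 for every t.
LINEAR = Path(lambda t: 1.0 - t, lambda t: t, lambda t: -1.0, lambda t: 1.0)

# Curved, variance-preserving-style path (illustrative assumption).
HALF_PI = math.pi / 2.0
TRIG = Path(
    lambda t: math.cos(HALF_PI * t),
    lambda t: math.sin(HALF_PI * t),
    lambda t: -HALF_PI * math.sin(HALF_PI * t),
    lambda t: HALF_PI * math.cos(HALF_PI * t),
)
```

Swapping `LINEAR` for `TRIG` changes only the interpolation and the target; the regression loop itself stays the same, which is the sense in which the framework is parametric in the path.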

Connection to Brownian motion and diffusion

Diffusion’s reverse SDE has a deterministic equivalent — the probability-flow ODE — that traces the same marginal distributions at every timestep without any noise injection. That ODE is a flow-matching solution. Flow matching just trains it directly rather than going through the SDE detour and then deriving the ODE from Anderson’s theorem. The upstream stochastic-calculus machinery built on Brownian motion is unnecessary if you only want the deterministic transport — you can write down the training objective without ever invoking an SDE.
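For readers who want the relationship written out, the standard score-based formulation can be sketched as follows. The symbols are assumptions not used elsewhere in this article: $f$ is the drift, $g$ the diffusion coefficient, $w$ and $\bar{w}$ forward and reverse Brownian motions, and $p_t$ the marginal density at time $t$. Time here runs from data to noise, the opposite of the noise-to-data convention used above.

```latex
% Forward (noising) SDE
dx = f(x, t)\,dt + g(t)\,dw

% Reverse-time SDE (Anderson)
dx = \big[ f(x, t) - g(t)^2 \nabla_x \log p_t(x) \big]\,dt + g(t)\,d\bar{w}

% Probability-flow ODE: same marginals p_t, no noise term
\frac{dx}{dt} = f(x, t) - \tfrac{1}{2}\, g(t)^2 \nabla_x \log p_t(x)
```

The right-hand side of the last line is a deterministic velocity field. Flow matching regresses a field of this kind directly, without estimating the score $\nabla_x \log p_t$ as an intermediate step.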

A 25-step DDPM image at 1024×1024 takes ~2-4 seconds on an A100. A 4-step rectified-flow generation at the same resolution takes ~300-500 ms. For text-to-image at production scale — millions of queries per day, billions per month — that 6-10× latency cut is the difference between flow matching being a curiosity and being the default. Black Forest Labs’ Flux Schnell variant runs in 1-4 steps and was designed explicitly around this constraint: same architecture as Flux Dev, distilled to a few-step rectified-flow regime. The economics show up at every layer of the stack — GPU utilization, batch sizing, latency budgets, end-user perceived quality.
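A quick back-of-the-envelope check of those figures, using only the numbers quoted in the paragraph above (the variable names are illustrative). It suggests the per-step cost is roughly the same in both regimes, so the saving comes almost entirely from the step count.

```python
# Figures quoted in the text: (low, high) wall-clock seconds per image.
ddpm_steps, ddpm_seconds = 25, (2.0, 4.0)
flow_steps, flow_seconds = 4, (0.3, 0.5)

# Seconds per network evaluation in each regime.
ddpm_per_step = tuple(s / ddpm_steps for s in ddpm_seconds)   # about 0.08 to 0.16
flow_per_step = tuple(s / flow_steps for s in flow_seconds)   # about 0.075 to 0.125

# Speedup implied by step count alone, and by the extreme wall-clock pairs.
step_ratio = ddpm_steps / flow_steps                # 6.25
speedup_low = ddpm_seconds[0] / flow_seconds[1]     # fastest DDPM vs slowest flow
speedup_high = ddpm_seconds[1] / flow_seconds[0]    # slowest DDPM vs fastest flow
```

The extremes work out to roughly 4× and 13×, so the quoted 6-10× range sits in the middle of what the raw timings allow, and the step ratio of 6.25 lands at its lower end.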

The framework is modality-agnostic. Veo (Google’s video model), CAT3D (multi-view 3D), and Movie Gen (Meta) all train flow-matching objectives over their respective tensor spaces: temporal latents for video, multi-view latents for 3D. The vector field operates over whatever tensor shape you pick; the path interpolation generalizes uniformly because it’s just a convex combination on Euclidean space. The architectural choices around conditioning (text injection, view-pose conditioning, temporal attention) are inherited from the diffusion era largely unchanged: a typical SD3-style diffusion transformer (DiT) backbone with CLIP and T5 text encoders. What flow matching changes is the objective and the sampler, not the rest of the stack.
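The shape-independence of the interpolation can be illustrated with a small sketch. Nested Python lists stand in for tensors here, and `interpolate` and `velocity` are hypothetical helper names; a real pipeline would use an array library, where the same expression broadcasts over any shape.

```python
def interpolate(x0, x1, t):
    """Convex combination (1 - t) * x0 + t * x1 over arbitrarily nested lists."""
    if isinstance(x0, (list, tuple)):
        if len(x0) != len(x1):
            raise ValueError("shape mismatch between noise and data tensors")
        return [interpolate(a, b, t) for a, b in zip(x0, x1)]
    return (1.0 - t) * x0 + t * x1

def velocity(x0, x1):
    """Straight-path regression target x1 - x0, element by element."""
    if isinstance(x0, (list, tuple)):
        return [velocity(a, b) for a, b in zip(x0, x1)]
    return x1 - x0

# An "image-like" 1x2 tensor and a "video-like" 2x1x2 tensor use the same code.
image_mid = interpolate([[0.0, 2.0]], [[4.0, 6.0]], 0.5)
video_mid = interpolate([[[0.0, 2.0]], [[1.0, 1.0]]],
                        [[[4.0, 6.0]], [[3.0, 5.0]]], 0.5)
```

Nothing in either function refers to height, width, frames, or views, so the objective needs no modality-specific changes; those concerns live in the backbone and the conditioning.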

The practical upshot: in 2024-25, “we trained a diffusion model” almost always means “we trained a rectified-flow model with a DiT backbone.” The vocabulary lags the practice. The terminology will likely converge over the next 18 months on flow matching as the umbrella name, with diffusion as the historical predecessor.

Go further

How is flow matching different from diffusion?

Diffusion learns to predict noise (or score) along an SDE that gradually corrupts data. Flow matching learns a deterministic vector field along arbitrary probability paths, typically straight lines from noise to data. Same generative power, simpler training objective (least-squares regression on velocity), and much faster sampling.

Why are straight paths better than curved ones?

Straight paths (rectified flow, Liu et al. 2022) discretize with far fewer ODE steps, often 1-4 steps versus 25-50 for DDPM. The intuition: a straight path has constant velocity, so first-order Euler integration is exact in the limit of perfect velocity prediction. Curved paths force the solver to take small steps to track curvature.

What models actually use flow matching now?

Stable Diffusion 3 (rectified flow transformer), Flux (Black Forest Labs), Veo (Google), and Movie Gen (Meta). The 2024-25 shift from diffusion to flow matching was driven by the sampling-speed advantage at production scale — when you serve millions of images a day, a 10x latency cut on inference is the entire business case.
