Also known as: Wiener process, BM, standard Brownian motion
TL;DR
A continuous-time stochastic process with independent Gaussian increments. It is the continuous-state, continuous-time limit of a random walk, and the foundational object for stochastic calculus, diffusion models, and the noise term in stochastic differential equations (SDEs).
Brownian motion (also called the Wiener process) $W_t$ is a continuous-time stochastic process with three defining properties. It starts at zero ($W_0 = 0$). Its increments are independent: the change $W_t - W_s$ from time $s$ to time $t > s$ is independent of the path up to time $s$. And those increments are Gaussian with mean zero and variance equal to the elapsed time:
$$W_t - W_s \sim \mathcal{N}(0,\, t - s), \qquad 0 \le s < t.$$
Equivalently, it is the continuous-time, continuous-state limit of a symmetric random walk. Take an unbiased coin-flip walk with time step $\Delta t$ and space step $\sqrt{\Delta t}$, then let $\Delta t \to 0$. What remains, in distribution, is Brownian motion (Donsker's invariance principle).
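The sketch below illustrates this limit. It assumes NumPy, a horizon $T = 1$, and a hypothetical helper `walk_endpoint`. To keep it fast, only the endpoint of each walk is sampled, using the fact that the number of up-moves is binomial. Every walk has endpoint variance exactly $n \cdot \Delta t = 1$. What changes as $n$ grows is the shape of the distribution, which approaches the standard Gaussian. The comparison uses $P(W_1 \le 1) \approx 0.8413$.

```python
import math
import numpy as np

# Sketch: an unbiased coin-flip walk with time step dt = T / n_steps and
# space step sqrt(dt), observed at time T = 1. As n_steps grows, the endpoint
# should look more and more like W_1 ~ N(0, 1).
rng = np.random.default_rng(0)

def walk_endpoint(n_steps: int, n_paths: int, T: float = 1.0) -> np.ndarray:
    """Endpoint at time T of n_paths rescaled +/-1 random walks (hypothetical helper)."""
    dt = T / n_steps
    # The number of "up" flips among n_steps fair flips is Binomial(n_steps, 1/2);
    # the endpoint is (#up - #down) * sqrt(dt) = (2 * #up - n_steps) * sqrt(dt).
    ups = rng.binomial(n_steps, 0.5, size=n_paths)
    return (2 * ups - n_steps) * math.sqrt(dt)

phi_1 = 0.5 * (1.0 + math.erf(1.0 / math.sqrt(2.0)))  # P(Z <= 1) for Z ~ N(0, 1)

results = {}
for n_steps in (10, 100, 10_000):
    w1 = walk_endpoint(n_steps, n_paths=200_000)
    results[n_steps] = (float(w1.var()), float(np.mean(w1 <= 1.0)))
    var, p = results[n_steps]
    print(f"n_steps={n_steps:>6}: var={var:.3f}, P(endpoint <= 1)={p:.4f} (Gaussian: {phi_1:.4f})")
```

For small `n_steps` the endpoint takes only a few discrete values, so the probability is visibly off. By 10,000 steps it is close to the Gaussian value.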
Brownian motion is what you get when you take diffusion to its mathematical limit: independent Gaussian kicks, accumulated continuously. The continuous-time stochastic models used in modern ML (diffusion models, score-based generative models, neural SDEs) use it as their noise source.
Defining properties
The three properties below determine all finite-dimensional distributions of standard Brownian motion, so they characterize it uniquely in law (up to a choice of probability space):
The three defining properties
Starts at zero. $W_0 = 0$ almost surely.
Independent increments. For any $0 \le s < t$, the increment $W_t - W_s$ is independent of the past $\{W_u : u \le s\}$. This is what makes the process Markovian.
Gaussian increments. $W_t - W_s \sim \mathcal{N}(0,\, t - s)$. The variance scales linearly with elapsed time, so the standard deviation grows as $\sqrt{t - s}$.
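The three properties translate directly into a simulation recipe: start at zero and add independent $\mathcal{N}(0, \Delta t)$ increments. The sketch below assumes NumPy and an arbitrary grid ($T = 1$, 250 steps). It checks the variance of one increment and the correlation between increments over disjoint intervals. The increments are jointly Gaussian, so zero correlation is equivalent to independence. That makes the correlation check meaningful, though it is only a sample estimate.

```python
import numpy as np

# Sketch: simulate standard Brownian motion on a uniform grid by cumulatively
# summing independent N(0, dt) increments, then check the defining properties.
rng = np.random.default_rng(1)

T, n_steps, n_paths = 1.0, 250, 10_000
dt = T / n_steps
t = np.linspace(0.0, T, n_steps + 1)          # grid 0, dt, 2*dt, ..., T

dW = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))                   # increments, variance dt
W = np.concatenate([np.zeros((n_paths, 1)), np.cumsum(dW, axis=1)], axis=1)  # prepend W_0 = 0

i_s, i_t, i_u = 50, 150, 250                  # grid indices of s = 0.2, t = 0.6, u = 1.0
inc_st = W[:, i_t] - W[:, i_s]                # W_t - W_s
inc_tu = W[:, i_u] - W[:, i_t]                # W_u - W_t, a disjoint interval

var_st = inc_st.var()
corr = np.corrcoef(inc_st, inc_tu)[0, 1]
print("W_0 == 0 on every path:", bool(np.all(W[:, 0] == 0.0)))
print(f"Var(W_t - W_s) = {var_st:.4f}, expected t - s = {t[i_t] - t[i_s]:.4f}")
print(f"corr(W_t - W_s, W_u - W_t) = {corr:.4f}, expected 0")
```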
A fourth property, continuity of sample paths, is usually stated explicitly as part of the definition. The first three properties fix only the finite-dimensional distributions; the Kolmogorov continuity theorem guarantees that a version with continuous paths exists. Brownian paths are continuous functions of $t$ with probability $1$, but they are also nowhere differentiable: at every $t$, the difference quotients $(W_{t+h} - W_t)/h$ fail to converge as $h \to 0$. This pathological roughness is what breaks ordinary calculus and forces the development of stochastic calculus (Itô calculus, Stratonovich calculus).
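One way to see this roughness numerically is to refine the grid on a single simulated path. The sketch assumes NumPy and $T = 1$. The sum of squared increments (the quadratic variation) settles near $T$, while the sum of absolute increments (the total variation) keeps growing, roughly like $\sqrt{n}$. A continuously differentiable path would instead have quadratic variation tending to $0$ and finite total variation.

```python
import numpy as np

# Sketch: refine the grid on ONE simulated path and compare
#   quadratic variation  sum (W_{k+1} - W_k)^2  -> T
#   total variation      sum |W_{k+1} - W_k|    grows without bound (about sqrt(n))
rng = np.random.default_rng(2)

T = 1.0
n_fine = 2**20
dW_fine = rng.normal(0.0, np.sqrt(T / n_fine), size=n_fine)
W = np.concatenate([[0.0], np.cumsum(dW_fine)])   # one path on the finest grid

quad_var, total_var = [], []
for stride in (2**14, 2**10, 2**6, 1):            # coarse -> fine views of the same path
    incs = np.diff(W[::stride])                    # increments on a grid of n_fine // stride steps
    quad_var.append(float(np.sum(incs**2)))
    total_var.append(float(np.sum(np.abs(incs))))
    print(f"n={n_fine // stride:>8}: quadratic variation={quad_var[-1]:.4f}, "
          f"total variation={total_var[-1]:.1f}")
```

Because each grid refines the previous one on the same path, the triangle inequality guarantees that the total variation can only increase from one row to the next.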
The $\sqrt{t}$ scaling
The single most important quantitative feature of Brownian motion is the $\sqrt{t}$ scaling of its standard deviation. Physicists call this diffusive scaling, as opposed to ballistic scaling, where displacement grows linearly with time. It has practical consequences:
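A diffusive search needs time of order $L^2$ to move a distance $L$, versus time of order $L$ for ballistic motion. Exploring a $d$-dimensional region of side $L$ at unit resolution means visiting on the order of $L^d$ cells, a count that grows exponentially with $d$, and a diffusive walker keeps revisiting cells it has already seen. This is one reason undirected random search scales poorly in high dimensions.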
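Brownian motion in the 2D plane is recurrent: it returns arbitrarily close to its start infinitely often. In 3D and higher it is transient: $|W_t| \to \infty$ almost surely. Heuristically, the dimension matters because the probability of being within a fixed distance of the start at time $t$ decays like $t^{-d/2}$. The expected total time spent near the start, proportional to $\int_1^\infty t^{-d/2}\,dt$, is therefore infinite for $d \le 2$ and finite for $d \ge 3$.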
Diffusion-model noise schedules choose how fast noise is added; the $\sqrt{t}$ accumulation is what makes the forward SDE eventually destroy all signal.
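The recurrence/transience contrast can be probed with a simple random walk on the integer lattice, used here as a discrete stand-in for Brownian motion. By Pólya's theorem, this walk returns to its start with probability 1 in one and two dimensions, but only with probability about 0.34 in three. The sketch assumes NumPy and a finite horizon of 1,000 steps, so it can show the contrast but not the limits. As the horizon grows, the 2D fraction keeps creeping toward 1, while the 3D fraction can never exceed about 0.34.

```python
import numpy as np

# Sketch: simple random walk on the integer lattice Z^d as a discrete stand-in
# for Brownian motion. Each step moves +/-1 along one uniformly chosen axis.
rng = np.random.default_rng(3)

def return_fraction(d: int, n_walks: int = 4000, n_steps: int = 1000) -> float:
    """Fraction of walks that revisit the origin at least once within n_steps."""
    pos = np.zeros((n_walks, d), dtype=np.int64)
    returned = np.zeros(n_walks, dtype=bool)
    rows = np.arange(n_walks)
    for _ in range(n_steps):
        axis = rng.integers(0, d, size=n_walks)            # coordinate to move
        step = 2 * rng.integers(0, 2, size=n_walks) - 1    # -1 or +1
        pos[rows, axis] += step
        returned |= np.all(pos == 0, axis=1)               # back at the origin?
    return float(returned.mean())

frac_2d = return_fraction(2)
frac_3d = return_fraction(3)
print(f"returned within 1000 steps: 2D {frac_2d:.3f}, 3D {frac_3d:.3f}")
```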
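Brownian motion is a Markov process: given the present, the future is independent of the past. This follows directly from the independent-increments axiom. The transition density is Gaussian:
$$p(x, t \mid y, s) = \frac{1}{\sqrt{2\pi (t - s)}} \exp\!\left(-\frac{(x - y)^2}{2 (t - s)}\right), \qquad s < t.$$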
This is the heat kernel. As a function of $x$ and $t$, it solves the heat (diffusion) equation $\partial_t p = \tfrac{1}{2}\,\partial_x^2 p$, starting from a point mass at $y$. The connection runs deep. Brownian motion is the stochastic process whose probability density evolves according to the heat equation, and conversely, the heat equation describes how any initial distribution is smoothed out under Brownian-motion dynamics. See Markov chain for the discrete-time analog; Brownian motion is the continuous-time, continuous-state generalization.
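Both facts can be checked numerically in a few lines. The sketch below assumes NumPy. `heat_kernel` is a hypothetical helper that writes the transition density in terms of the elapsed time $\tau = t - s$. The first check uses finite differences to confirm that the density solves $\partial_\tau p = \tfrac{1}{2}\partial_x^2 p$. The second checks the Chapman-Kolmogorov identity, which is the Markov property expressed in terms of densities.

```python
import numpy as np

def heat_kernel(x, y, tau):
    """Transition density of standard Brownian motion: density of W_{s+tau} = x given W_s = y."""
    return np.exp(-((x - y) ** 2) / (2.0 * tau)) / np.sqrt(2.0 * np.pi * tau)

# 1) PDE check: p(x, tau) = heat_kernel(x, y, tau) should satisfy dp/dtau = (1/2) d^2p/dx^2.
x = np.linspace(-3.0, 3.0, 61)
y, tau = 0.0, 0.5
k, h = 1e-5, 1e-3                                   # finite-difference steps in tau and x
dp_dtau = (heat_kernel(x, y, tau + k) - heat_kernel(x, y, tau - k)) / (2 * k)
d2p_dx2 = (heat_kernel(x + h, y, tau) - 2 * heat_kernel(x, y, tau)
           + heat_kernel(x - h, y, tau)) / h**2
pde_residual = float(np.max(np.abs(dp_dtau - 0.5 * d2p_dx2)))

# 2) Chapman-Kolmogorov check: go 0 -> z in time t1, then z -> x0 in time t2,
#    integrate over the intermediate point z; this should equal 0 -> x0 in time t1 + t2.
z = np.linspace(-10.0, 10.0, 4001)
dz = z[1] - z[0]
x0, t1, t2 = 0.7, 0.4, 0.3
two_step = float(np.sum(heat_kernel(z, 0.0, t1) * heat_kernel(x0, z, t2)) * dz)
one_step = float(heat_kernel(x0, 0.0, t1 + t2))

print(f"max PDE residual: {pde_residual:.2e}")
print(f"Chapman-Kolmogorov: {two_step:.10f} vs {one_step:.10f}")
```

The factor $\tfrac{1}{2}$ in the PDE comes from the unit-variance-per-unit-time normalization of standard Brownian motion. Dropping it makes the residual large.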
Why this matters in modern ML
Brownian motion shows up in three places in the modern stack:
Diffusion models. The forward corruption process, which gradually adds Gaussian noise to clean data, is a discretization of an SDE driven by Brownian motion. The reverse process learned by the model is the time-reversed SDE. Because the noise is Brownian, the forward transitions are Gaussian (in closed form for the usual linear-drift SDEs). The reverse-time SDE also depends on the data distribution only through its score.
Neural SDEs. A growing family of generative and dynamics models replaces deterministic neural ODEs with stochastic counterparts. The randomness is typically Brownian. Up to a linear drift and scaling, Brownian motion is the only process with continuous paths and stationary independent increments.
Stochastic gradient descent analysis. For small learning rates, SGD with mini-batch noise can be approximated by a Langevin-type SDE: gradient flow plus a Brownian noise term whose covariance reflects the mini-batch gradient noise. Recent theory uses this approximation to study SGD's implicit regularization, mixing time, and generalization.
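To make the Langevin connection concrete, here is a minimal Euler-Maruyama sketch of an overdamped Langevin SDE. It assumes NumPy, a toy quadratic potential $U(x) = (x - 1)^2/2$, and an arbitrary inverse temperature $\beta = 4$. Each update is a gradient step plus a Gaussian kick, the same shape as a noisy gradient update. This is a Langevin sampler, not SGD itself: real mini-batch noise need not be isotropic or Gaussian.

```python
import numpy as np

# Sketch: Euler-Maruyama discretization of the overdamped Langevin SDE
#   dX_t = -U'(X_t) dt + sqrt(2 / beta) dW_t,
# whose stationary density is proportional to exp(-beta * U(x)).
# Assumed toy potential: U(x) = (x - 1)^2 / 2, so the target is N(1, 1 / beta).
rng = np.random.default_rng(4)

def grad_U(x):
    return x - 1.0

beta, dt, n_steps, n_chains = 4.0, 0.01, 2000, 20_000
x = np.full(n_chains, -3.0)                       # all chains start far from the mode
for _ in range(n_steps):
    # gradient step + Gaussian kick with variance 2 * dt / beta (a rescaled Brownian increment)
    x = x - grad_U(x) * dt + np.sqrt(2.0 * dt / beta) * rng.normal(size=n_chains)

print(f"mean {x.mean():.3f} (target 1.0), variance {x.var():.4f} (target {1 / beta:.4f})")
```

With a small step size, the chains forget their starting point. Their empirical mean and variance settle near the stationary values $1$ and $1/\beta$. For this quadratic potential the Euler discretization inflates the stationary variance by a factor $1/(1 - \Delta t/2)$, about $0.5\%$ here.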
Geometric Brownian motion and other variants
Standard Brownian motion isn’t the only useful process — variations show up across applied math:
Geometric Brownian motion: $S_t = S_0 \exp\big((\mu - \tfrac{1}{2}\sigma^2)\,t + \sigma W_t\big)$, the solution of $dS_t = \mu S_t\,dt + \sigma S_t\,dW_t$. This is the multiplicative version, and the classical model for stock prices and other strictly positive quantities.
Brownian bridge: Brownian motion conditioned to end at a specified value, e.g. $W_T = b$. Used in interpolation, score-based generation between two distributions, and Bayesian time-series imputation.
Ornstein-Uhlenbeck process: Brownian motion with mean-reverting drift, $dX_t = -\theta X_t\,dt + \sigma\,dW_t$ with $\theta > 0$. The continuous-time analog of an AR(1) model; appears in Langevin dynamics and elsewhere.
Fractional Brownian motion: relaxes the independent-increments axiom. Its increments are correlated, with long memory when the Hurst parameter satisfies $H > 1/2$. Used in network traffic and finance; rarely in ML.
Each of these is built from standard Brownian motion as the noise primitive. You change the drift (Ornstein-Uhlenbeck), condition on an endpoint (bridge), exponentiate (geometric), or integrate against a memory kernel (fractional). The underlying machinery is the same.
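The sketch below shows three of these variants concretely. It assumes NumPy and arbitrary parameter values. Geometric Brownian motion comes from its closed form $S_t = S_0 \exp((\mu - \sigma^2/2)t + \sigma W_t)$. A Brownian bridge pinned to $0$ at both ends comes from $B_t = W_t - (t/T)\,W_T$. An Ornstein-Uhlenbeck process $dX_t = -\theta X_t\,dt + \sigma\,dW_t$ comes from its exact grid update, which is literally an AR(1) recursion.

```python
import numpy as np

# Sketch: three variants of Brownian motion. GBM and the bridge reuse the same
# simulated Brownian paths; the OU process is sampled with its own exact Gaussian update.
# Parameter values (mu, sigma, S0, theta, sigma_ou) are arbitrary choices for illustration.
rng = np.random.default_rng(5)

T, n_steps, n_paths = 1.0, 100, 10_000
dt = T / n_steps
t = np.linspace(0.0, T, n_steps + 1)
dW = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))
W = np.concatenate([np.zeros((n_paths, 1)), np.cumsum(dW, axis=1)], axis=1)

# Geometric Brownian motion from its closed form; E[S_T] = S0 * exp(mu * T).
mu, sigma, S0 = 0.05, 0.2, 100.0
S = S0 * np.exp((mu - 0.5 * sigma**2) * t + sigma * W)

# Brownian bridge pinned to 0 at both ends: B_t = W_t - (t / T) * W_T.
# Its variance is t * (T - t) / T, largest at t = T / 2.
B = W - (t / T) * W[:, -1:]

# Ornstein-Uhlenbeck dX = -theta * X dt + sigma_ou dW, sampled exactly on the grid.
# The exact update is an AR(1) recursion with coefficient a = exp(-theta * dt).
theta, sigma_ou = 2.0, 0.5
a = np.exp(-theta * dt)
step_sd = sigma_ou * np.sqrt((1.0 - a**2) / (2.0 * theta))
X = np.empty((n_paths, n_steps + 1))
X[:, 0] = rng.normal(0.0, sigma_ou / np.sqrt(2.0 * theta), size=n_paths)  # stationary start
for k in range(n_steps):
    X[:, k + 1] = a * X[:, k] + step_sd * rng.normal(size=n_paths)

mid = n_steps // 2
print(f"GBM mean at T: {S[:, -1].mean():.2f} (exact {S0 * np.exp(mu * T):.2f})")
print(f"bridge: max |B_T| = {np.abs(B[:, -1]).max():.1e}, Var(B at T/2) = {B[:, mid].var():.4f} (exact {T / 4:.4f})")
print(f"OU lag-1 autocorrelation: {np.corrcoef(X[:, mid], X[:, mid + 1])[0, 1]:.4f} (AR(1) coefficient {a:.4f})")
```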
Go further
Why are Brownian paths continuous but nowhere differentiable?
Heuristically, continuity follows from increments shrinking like $\sqrt{\Delta t}$: the process cannot jump. Non-differentiability follows from the same scaling. The difference quotient $(W_{t+h} - W_t)/h$ has standard deviation $1/\sqrt{h}$, which blows up as $h \to 0$. So the path is everywhere continuous and everywhere jagged; the graph of one-dimensional Brownian motion is a fractal of Hausdorff dimension $3/2$. This is what makes ordinary calculus inapplicable and forces the development of stochastic calculus.
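The $1/\sqrt{h}$ heuristic can be checked on a single simulated path. This sketch assumes NumPy and a grid of $2^{20}$ steps on $[0, 1]$. It computes every difference quotient at three step sizes and rescales their spread by $\sqrt{h}$. A value near $1$ at every $h$ is the $1/\sqrt{h}$ blow-up. Because the grid limits how small $h$ can go, this illustrates the scaling rather than proving non-differentiability.

```python
import numpy as np

# Sketch: difference quotients of one simulated path at shrinking step sizes h.
# If W were differentiable, (W_{t+h} - W_t) / h would settle down as h -> 0;
# instead its standard deviation grows like 1 / sqrt(h).
rng = np.random.default_rng(6)

T, n_fine = 1.0, 2**20
dt = T / n_fine
W = np.concatenate([[0.0], np.cumsum(rng.normal(0.0, np.sqrt(dt), size=n_fine))])

scaled_sd = {}
for lag in (256, 16, 1):                      # h = 256*dt, 16*dt, dt
    h = lag * dt
    quotients = (W[lag:] - W[:-lag]) / h      # all difference quotients with step h
    scaled_sd[h] = float(quotients.std() * np.sqrt(h))   # close to 1 if sd ~ 1/sqrt(h)
    print(f"h={h:.2e}: sd of quotient={quotients.std():9.1f}, sd * sqrt(h)={scaled_sd[h]:.3f}")
```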
What's the connection between Brownian motion and diffusion models?
Modern diffusion models (denoising diffusion, score-based generation) define a forward process that gradually corrupts data with Gaussian noise. Concretely, this is a discretization of an SDE driven by Brownian motion: $dX_t = f(X_t, t)\,dt + g(t)\,dW_t$. Training a diffusion model means learning to reverse this SDE. By Anderson's theorem, the reverse-time SDE is $dX_t = \big[f(X_t, t) - g(t)^2\,\nabla_x \log p_t(X_t)\big]\,dt + g(t)\,d\bar W_t$, run backward in time, so reversing it requires estimating $\nabla_x \log p_t(x)$, the score. Brownian motion is the mathematical substrate; the score function is the learnable part.
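Here is a minimal sketch of reverse-time sampling. It assumes NumPy, one-dimensional Gaussian "data" $\mathcal{N}(m, s^2)$, and a variance-exploding forward SDE with $f = 0$ and constant $g$. Under these assumptions every marginal is $p_t = \mathcal{N}(m, s^2 + g^2 t)$, so the score has a closed form. In a real diffusion model, `score` (a hypothetical name here) is a trained network, and sampling starts from a simple prior rather than the exact terminal marginal used below.

```python
import numpy as np

# Toy setup (all values are assumptions for illustration):
# data ~ N(m, s^2); forward SDE dX = g dW, so p_t = N(m, s^2 + g^2 t).
rng = np.random.default_rng(7)
m, s, g, T = 2.0, 0.5, 2.0, 1.0
n_steps, n_samples = 1000, 20_000
dt = T / n_steps

def score(x, t):
    """Exact score d/dx log p_t(x) for p_t = N(m, s^2 + g^2 t); a network would learn this."""
    return -(x - m) / (s**2 + g**2 * t)

# Start from the exact terminal marginal p_T.
x = rng.normal(m, np.sqrt(s**2 + g**2 * T), size=n_samples)

# Reverse-time SDE dX = [f - g^2 * score] dt + g dW_bar with f = 0, run from t = T to t = 0.
# Each Euler-Maruyama step moves backward by dt, which flips the sign of the drift term.
for k in range(n_steps):
    t = T - k * dt
    x = x + g**2 * score(x, t) * dt + g * np.sqrt(dt) * rng.normal(size=n_samples)

print(f"sample mean {x.mean():.3f} (data {m}), sample sd {x.std():.3f} (data {s})")
```

The samples start with variance $s^2 + g^2 T = 4.25$, and the score-driven drift contracts them back to the data distribution. An inaccurate score would show up directly as the wrong mean or spread.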
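Why does the standard deviation grow as $\sqrt{t}$?
Because the increments are independent. Adding $n$ independent Gaussians with variance $\Delta t$ each gives a Gaussian with variance $n\,\Delta t$, since variances of independent variables add. Splitting $[0, t]$ into $n$ steps of length $\Delta t = t/n$ therefore gives $\operatorname{Var}(W_t) = n\,\Delta t = t$. Rescaling time preserves this structure: $W_{ct} \overset{d}{=} \sqrt{c}\,W_t$, and the process at time $t$ is Gaussian with mean $0$ and standard deviation $\sqrt{t}$. The $\sqrt{t}$ scaling is the signature of a diffusive (not ballistic) process.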
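Self-similarity, $W_{ct} \overset{d}{=} \sqrt{c}\,W_t$ as processes, can be verified directly against the defining properties. For fixed $c > 0$, define $\tilde W_t = c^{-1/2} W_{ct}$:

```latex
\begin{aligned}
\tilde W_0 &= c^{-1/2} W_0 = 0, \\
\tilde W_t - \tilde W_s &= c^{-1/2}\,(W_{ct} - W_{cs}) \ \text{ is independent of } \{W_{cu} : u \le s\} = \{\sqrt{c}\,\tilde W_u : u \le s\}, \\
\tilde W_t - \tilde W_s &\sim \mathcal{N}\!\left(0,\; c^{-1}(ct - cs)\right) = \mathcal{N}(0,\; t - s).
\end{aligned}
```

The time change also preserves continuity of paths, so $\tilde W$ is again standard Brownian motion. Taking $c = t$ and evaluating at time $1$ gives $W_t \overset{d}{=} \sqrt{t}\,W_1$, whose standard deviation is $\sqrt{t}$.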