2-norm (Euclidean Length)

Also known as: L2 norm, Euclidean norm, vector magnitude, vector length

TL;DR

The 2-norm of a vector is its Euclidean length — the square root of the sum of squared components. Normalizing a vector to 2-norm = 1 makes it a unit vector.

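The 2-norm of a vector $v = (v_1, \dots, v_n)$ is:

$$\|v\|_2 = \sqrt{v_1^2 + v_2^2 + \cdots + v_n^2} = \sqrt{\sum_{i=1}^{n} v_i^2}$$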

Geometrically, it’s the straight-line distance from the origin to the point the vector represents. The “2” is because each component is raised to the second power; there are also 1-norms (sum of absolute values) and ∞-norms (max absolute value). In retrieval, “norm” means 2-norm unless stated otherwise.
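The definition translates directly into code. Here is a minimal sketch with NumPy, using an arbitrary example vector. It computes the 2-norm by hand and checks it, along with the 1-norm and ∞-norm, against `numpy.linalg.norm`:

```python
import numpy as np

v = np.array([3.0, -4.0, 12.0])  # arbitrary example vector

# 2-norm: square root of the sum of squared components
l2 = np.sqrt(np.sum(v ** 2))

# The other norms mentioned above, for comparison
l1 = np.sum(np.abs(v))    # 1-norm: sum of absolute values
linf = np.max(np.abs(v))  # inf-norm: largest absolute value

# NumPy's built-in agrees (ord=2 is the default for 1-D arrays)
assert np.isclose(l2, np.linalg.norm(v))
assert np.isclose(l1, np.linalg.norm(v, ord=1))
assert np.isclose(linf, np.linalg.norm(v, ord=np.inf))

print(l2, l1, linf)  # 13.0 19.0 12.0
```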

Why it matters for embeddings

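Most embedding models output vectors with varying magnitudes. To compare two embeddings via cosine similarity,

$$\cos(a, b) = \frac{a \cdot b}{\|a\|_2 \, \|b\|_2},$$

you have to divide by both magnitudes; that's the $\|a\|_2 \, \|b\|_2$ in the denominator. To skip that division at query time, embedding indexes typically normalize all stored vectors to unit length ($\|v\|_2 = 1$) once, offline. Then cosine similarity reduces to a plain dot product, which is just a SIMD-friendly fused multiply-add per dimension. The query's own magnitude is the same for every candidate in a search, so it doesn't change the ranking.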
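A minimal sketch of the pre-normalization trick. It assumes random Gaussian vectors as hypothetical stand-ins for real embeddings (1,000 stored vectors, 384 dimensions, both chosen arbitrarily). It shows that the dot product over unit vectors matches cosine similarity computed the long way:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for real embeddings: 1,000 stored vectors, one query
corpus = rng.normal(size=(1000, 384))
query = rng.normal(size=384)

# Cosine similarity computed directly: divide each dot product by both magnitudes
cosine = (corpus @ query) / (np.linalg.norm(corpus, axis=1) * np.linalg.norm(query))

# Offline, once: scale every stored vector to unit length
corpus_unit = corpus / np.linalg.norm(corpus, axis=1, keepdims=True)

# Query time: normalize the query once, then score with a plain matrix-vector product
query_unit = query / np.linalg.norm(query)
dot = corpus_unit @ query_unit

assert np.allclose(cosine, dot)
top10 = np.argsort(-dot)[:10]
assert np.array_equal(top10, np.argsort(-cosine)[:10])
```

The per-pair work at query time is now one dot product. All divisions happened once per stored vector (offline) and once per query.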

The unit vector

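A vector with $\|v\|_2 = 1$ is called a unit vector. To normalize, divide the vector by its 2-norm:

import numpy
v_unit = v / numpy.linalg.norm(v)  # yields NaN for the all-zero vector, which has no direction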

After normalization, every embedding lives on the surface of a unit hypersphere. Distance and angle become equivalent, and direction is the only signal that matters.
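In practice you usually normalize a whole batch of embeddings at once. Here is a minimal sketch, assuming one vector per row. `normalize_rows` and its `eps` threshold are hypothetical names chosen for illustration, not part of any library. It also handles the zero-vector case that the one-line version turns into NaN:

```python
import numpy as np


def normalize_rows(x: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Hypothetical helper: scale each row of x to unit 2-norm.

    Rows with norm <= eps (e.g. the all-zero vector, which has no direction)
    are returned unchanged instead of becoming NaN.
    """
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    # np.maximum keeps the divisor nonzero; np.where then restores the tiny rows
    return np.where(norms > eps, x / np.maximum(norms, eps), x)


embeddings = np.array([[3.0, 4.0], [0.0, 0.0], [1.0, 1.0]])
unit = normalize_rows(embeddings)

assert np.allclose(unit[0], [0.6, 0.8])  # (3, 4) has norm 5
assert np.all(unit[1] == 0.0)            # zero vector left as-is
assert np.allclose(np.linalg.norm(unit[[0, 2]], axis=1), 1.0)
```

Leaving zero rows untouched is one design choice among several. Dropping them or raising an error are equally reasonable, depending on whether an all-zero embedding signals a bug upstream.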

When magnitude actually matters

Most production embedding pipelines unit-normalize and lose magnitude information. But some training schemes deliberately encode signal in magnitude — for example, attaching a confidence-like quantity to longer or more salient documents. If you’re using a model that does this, normalizing destroys that signal. Check the model’s documentation before normalizing.

Picture a vector $v$ of magnitude $r = \|v\|_2$ pointing in some direction $\hat{u}$, so that $v = r\hat{u}$. Normalizing maps it to $v / r = \hat{u}$: same direction, magnitude exactly 1. The full set of unit vectors forms a sphere of radius 1 in your embedding's dimensionality. Once you normalize, every embedding lives on that sphere.
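Here is a small sketch of the decomposition $v = r\hat{u}$ with NumPy, using an arbitrary example vector. It also shows why normalizing is lossy. A vector and any positive multiple of it land on the same unit vector, so the magnitude cannot be read back from $\hat{u}$:

```python
import numpy as np

v = np.array([2.0, -1.0, 2.0])  # arbitrary example vector

r = np.linalg.norm(v)  # magnitude: sqrt(4 + 1 + 4) = 3
u_hat = v / r          # direction: a unit vector

assert np.isclose(r, 3.0)
assert np.isclose(np.linalg.norm(u_hat), 1.0)
assert np.allclose(r * u_hat, v)  # magnitude times direction recovers v

# Vectors differing only in magnitude normalize to the same point on the sphere
w = 10.0 * v
assert np.allclose(w / np.linalg.norm(w), u_hat)
```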

That has two consequences. First, distance and angle become equivalent. For unit vectors $a$ and $b$ separated by angle $\theta$, the squared Euclidean distance is $\|a - b\|_2^2 = \|a\|_2^2 + \|b\|_2^2 - 2\,a \cdot b = 2 - 2\cos\theta$, a strictly increasing function of the angle on $[0, \pi]$. So sorting by Euclidean distance (ascending), by cosine similarity (descending), or by angle (ascending) produces the same ranking. Second, the dynamic range of the dot product is bounded: it lies between -1 (opposite directions) and +1 (same direction). This bounded range is what makes downstream calibration and quantization tractable.
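Here is a quick numerical check of both the identity and the ranking claim. It assumes random unit vectors as hypothetical stand-ins for embeddings (50 candidates in 64 dimensions, chosen arbitrarily):

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical unit-normalized query and candidates
q = rng.normal(size=64)
q /= np.linalg.norm(q)
docs = rng.normal(size=(50, 64))
docs /= np.linalg.norm(docs, axis=1, keepdims=True)

cos = docs @ q                              # cosine = dot product for unit vectors
sq_dist = np.sum((docs - q) ** 2, axis=1)   # squared Euclidean distance
angle = np.arccos(np.clip(cos, -1.0, 1.0))  # angle in radians (clip guards rounding)

# The identity ||a - b||^2 = 2 - 2 cos(theta)
assert np.allclose(sq_dist, 2.0 - 2.0 * cos)

# All three orderings agree: highest cosine = smallest distance = smallest angle
by_cos = np.argsort(-cos)
assert np.array_equal(by_cos, np.argsort(sq_dist))
assert np.array_equal(by_cos, np.argsort(angle))
```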

The cost is permanent: once you normalize, the original magnitude is gone. If your pipeline stores only normalized vectors and you later realize you wanted the magnitude (for length-based filtering, for confidence weighting, for re-ranking by salience), you have to re-encode the corpus.

Go further

Why do most embedding indexes pre-normalize to unit length?

It collapses cosine similarity into a plain dot product — no division needed at query time. That's the inner loop of every dense retrieval system, so saving the divide per pair compounds fast over billions of comparisons.

When is normalizing actually a bad idea?

When the model deliberately encodes signal in magnitude — for instance, training schemes that map document salience or confidence to vector length. Normalizing those embeddings throws away signal you paid to put in. Always check the model card.

How does this interact with quantization?

Unit-normalized vectors quantize more cleanly because every component of a unit vector lies in [-1, 1], so all vectors share one known range. On a normalized index, int8 and even 1-bit quantization can recover most of the recall, which is why many quantization recipes assume normalized inputs.
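To make the range argument concrete, here is a minimal sketch of symmetric int8 scalar quantization with one fixed scale of 127. The single scale works for every unit vector because no component exceeds 1 in absolute value. This is a generic illustration on random stand-in vectors (200 vectors, 256 dimensions, chosen arbitrarily). It is not the recipe of any particular library and says nothing about recall on real data:

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical unit-normalized embeddings
x = rng.normal(size=(200, 256))
x /= np.linalg.norm(x, axis=1, keepdims=True)
q = x[0]  # use the first vector as the query

# Components lie in [-1, 1], so one fixed scale maps every vector into
# [-127, 127] with no per-vector calibration
SCALE = 127.0
x_q = np.round(x * SCALE).astype(np.int8)
q_q = x_q[0]

# Integer dot products (accumulated in int32 to avoid overflow), then rescaled
approx = (x_q.astype(np.int32) @ q_q.astype(np.int32)) / SCALE**2
exact = x @ q

print("max abs score error:", np.max(np.abs(approx - exact)))
```

Without normalization, components have no fixed bound. The scale would then have to be chosen per vector or per dataset, and outliers would waste most of the int8 range.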

ZeroEntropy