Singular Value Decomposition (SVD)

Also known as: singular value decomposition, thin SVD, truncated SVD, compact SVD, Eckart-Young

TL;DR

$M = U \Sigma V^\top$: every real matrix decomposes into a rotation, an axis-aligned stretch, and another rotation. It is among the most widely used matrix factorizations in ML: it powers PCA, LoRA, low-rank attention, embedding quantization, SSA, and the spectral analysis of any linear map.

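Every real matrix $M \in \mathbb{R}^{m \times n}$ admits a factorization

$$M = U \Sigma V^\top$$

where $U \in \mathbb{R}^{m \times m}$ and $V \in \mathbb{R}^{n \times n}$ are orthogonal and $\Sigma \in \mathbb{R}^{m \times n}$ is diagonal with non-negative entries $\sigma_1 \ge \sigma_2 \ge \dots \ge \sigma_{\min(m,n)} \ge 0$. The $\sigma_i$ are the singular values, the columns of $U$ the left singular vectors, and the columns of $V$ the right singular vectors. Existence is unconditional: symmetry, full rank, and squareness are not required, which makes SVD the universal factorization for linear maps between Euclidean spaces.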
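The definition can be checked numerically. The following is a minimal sketch using NumPy's `numpy.linalg.svd`; the matrix size, the seed, and the variable names are illustrative assumptions, not part of the original text.

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((5, 3))      # arbitrary rectangular matrix, m=5, n=3

# full_matrices=True returns square U (5x5) and Vt (3x3);
# s is a 1-D array holding the singular values in descending order
U, s, Vt = np.linalg.svd(M, full_matrices=True)

# Rebuild the rectangular 5x3 "diagonal" Sigma from the vector s
Sigma = np.zeros(M.shape)
Sigma[:len(s), :len(s)] = np.diag(s)

orthogonal_U = np.allclose(U.T @ U, np.eye(5))
orthogonal_V = np.allclose(Vt @ Vt.T, np.eye(3))
reconstructs = np.allclose(U @ Sigma @ Vt, M)
sorted_nonneg = bool(np.all(s[:-1] >= s[1:]) and np.all(s >= 0))

print(orthogonal_U, orthogonal_V, reconstructs, sorted_nonneg)
```

Note that NumPy returns $V^\top$ (here `Vt`) rather than $V$, and the singular values as a vector rather than as the matrix $\Sigma$.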

Geometric reading

A linear map $M$ sends the unit sphere of its domain to a hyperellipsoid in its codomain. SVD names the three operations that compose the map: $V^\top$ rotates the domain to align the input axes with the principal axes of the hyperellipsoid; $\Sigma$ stretches each axis by $\sigma_i$ (zeroing out directions outside the rank); $U$ rotates the result into its final orientation. Every linear map is, up to choice of basis, a diagonal stretch.

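SVD decomposes any linear map into rotation, axis-aligned stretch, and rotation. Singular values are the stretch factors; singular vectors are the axes. The singular values are uniquely determined, while the singular vectors are unique only up to sign (and up to a rotation within the subspace of any repeated singular value).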
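To make the geometric picture concrete, here is a small sketch that pushes the unit circle through a 2x2 map and compares the longest and shortest image vectors against the singular values. The example matrix and the sampling resolution are arbitrary choices made for illustration.

```python
import numpy as np

M = np.array([[3.0, 1.0],
              [1.0, 2.0]])
U, s, Vt = np.linalg.svd(M)

# Each right singular vector v_i (row i of Vt) is sent to sigma_i * u_i
maps_axes = all(np.allclose(M @ Vt[i], s[i] * U[:, i]) for i in range(2))

# Sample the unit circle and measure how far each point is sent
theta = np.linspace(0.0, 2.0 * np.pi, 10_000)
circle = np.stack([np.cos(theta), np.sin(theta)])   # shape (2, 10000)
lengths = np.linalg.norm(M @ circle, axis=0)

longest, shortest = lengths.max(), lengths.min()
print(longest, s[0])    # longest semi-axis of the ellipse vs. sigma_1
print(shortest, s[1])   # shortest semi-axis of the ellipse vs. sigma_2
```

The sampled extremes only approximate the semi-axes, since the circle is discretized; the agreement tightens as the number of samples grows.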

Relation to eigendecomposition

The two Gram matrices satisfy

$$M^\top M = V \Sigma^\top \Sigma V^\top, \qquad M M^\top = U \Sigma \Sigma^\top U^\top$$

so right singular vectors are eigenvectors of $M^\top M$, left singular vectors are eigenvectors of $M M^\top$, and singular values are the non-negative square roots of the shared eigenvalues (the two spectra differ only in how many zeros they contain). SVD therefore generalizes eigendecomposition to non-square and non-symmetric matrices and drops the requirement that eigenvalues be real or that a full eigenbasis exist.
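The link to the Gram matrix can be verified directly. This sketch assumes a random Gaussian matrix (so the singular values are distinct and the eigenvectors are determined up to sign); it forms $M^\top M$ explicitly only for the sake of the comparison.

```python
import numpy as np

rng = np.random.default_rng(1)
M = rng.standard_normal((6, 4))
U, s, Vt = np.linalg.svd(M, full_matrices=False)

# eigh handles symmetric matrices and returns eigenvalues in ascending order
evals, evecs = np.linalg.eigh(M.T @ M)
evals, evecs = evals[::-1], evecs[:, ::-1]      # reorder to descending

# Singular values are the square roots of the Gram eigenvalues
eigs_match = np.allclose(np.sqrt(np.clip(evals, 0.0, None)), s)

# Eigenvectors agree with right singular vectors up to sign:
# the absolute inner product of matching columns should be 1
alignment = np.abs(np.sum(evecs * Vt.T, axis=0))
vecs_match = np.allclose(alignment, 1.0)

print(eigs_match, vecs_match)
```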

Truncated SVD and Eckart-Young

Keeping only the top-$k$ singular triples gives the rank-$k$ approximation

$$M_k = \sum_{i=1}^{k} \sigma_i u_i v_i^\top = U_k \Sigma_k V_k^\top$$

Eckart-Young: $M_k$ is the closest rank-$k$ matrix to $M$ in both the Frobenius and spectral norms. No other matrix of rank at most $k$ matches $M$ more closely under either norm. This is the extremal property that justifies every low-rank method in ML: the singular spectrum names the cheapest way to throw away information.
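A short numerical illustration of the theorem, assuming a random test matrix and a random rank-$k$ competitor (both hypothetical). It relies on the standard closed forms for the truncation error: the spectral error equals $\sigma_{k+1}$ and the Frobenius error equals $\sqrt{\sum_{i>k} \sigma_i^2}$.

```python
import numpy as np

rng = np.random.default_rng(2)
M = rng.standard_normal((8, 6))
U, s, Vt = np.linalg.svd(M, full_matrices=False)

k = 2
M_k = (U[:, :k] * s[:k]) @ Vt[:k]               # rank-k truncation U_k Sigma_k V_k^T

spectral_err = np.linalg.norm(M - M_k, 2)       # should equal sigma_{k+1}
frobenius_err = np.linalg.norm(M - M_k, "fro")  # should equal sqrt(sum of tail sigma^2)

# Any other rank-k matrix, e.g. a random one, can do no better
B = rng.standard_normal((8, k)) @ rng.standard_normal((k, 6))
competitor_err = np.linalg.norm(M - B, "fro")

print(spectral_err, s[k])
print(frobenius_err, np.sqrt(np.sum(s[k:] ** 2)))
print(competitor_err >= frobenius_err)
```

A single random competitor is only a spot check, not a proof; the theorem guarantees the inequality for every matrix of rank at most $k$.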

Where SVD anchors in ML

Most matrix-spectrum machinery in modern ML is SVD with a wrapper around it.

SVD throughout the stack

Principal Component Analysis. PCA is the SVD of the centered data matrix $X$: right singular vectors are the principal components, and $\sigma_i^2 / (N - 1)$ (for $N$ samples) are the explained variances.
LoRA / PEFT. A low-rank adapter writes $\Delta W = BA$ with rank $r \ll \min(d, k)$ for a weight matrix $W \in \mathbb{R}^{d \times k}$. Eckart-Young justifies the parameterization: the best rank-$r$ approximation of any weight update is its truncated SVD.
Matryoshka representation learning. Training an embedding so any prefix of its dimensions stays useful is a learned analogue of SVD truncation: the prefix behaves like the top singular directions.
Embedding quantization. Rotating into the singular-vector basis before quantizing concentrates variance into a few coordinates and lowers error per bit.
Singular Spectrum Analysis. SSA decomposes a time series by SVD of its Hankel trajectory matrix; trend, oscillation, and noise are read off the singular spectrum.
Spectral norm and condition number. $\|M\|_2 = \sigma_1$ and $\kappa(M) = \sigma_1 / \sigma_{\min}$: optimization stability quantities are direct SVD readouts.
Mechanistic interpretability. Low-rank approximations of attention and MLP weights expose “circuits”: concentrated directions in singular-vector space tied to interpretable features.
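Two of the entries above (PCA, and the spectral norm and condition number) reduce to a few lines of NumPy. The sketch below uses synthetic data and assumes the unbiased $N - 1$ normalization for variances, which is what `numpy.cov` uses by default; the data shape and variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)
# Synthetic data: 200 samples, 5 correlated features
X = rng.standard_normal((200, 5)) @ rng.standard_normal((5, 5))
Xc = X - X.mean(axis=0)                        # center each feature

U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
components = Vt                                # rows are principal directions
explained_var = s**2 / (Xc.shape[0] - 1)       # variance along each direction

# Cross-check against the eigenvalues of the sample covariance matrix
cov_eigs = np.linalg.eigvalsh(np.cov(Xc, rowvar=False))[::-1]

spectral_norm = s[0]                           # largest stretch factor
condition_number = s[0] / s[-1]                # ratio of largest to smallest

print(np.allclose(explained_var, cov_eigs))
print(spectral_norm, condition_number)
```

Working from the SVD of `Xc` rather than from the covariance matrix avoids squaring the condition number, which is the precision concern raised for production routines.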

Computing the SVD

The classical dense algorithm is Golub-Reinsch: bidiagonalize $M$ via Householder reflections, then iterate implicit-shifted QR until the off-diagonals vanish, at $O(mn^2)$ cost for $m \ge n$. For the top $k$ only, randomized SVD (Halko-Martinsson-Tropp, 2011) sketches $M$ against a Gaussian test matrix and takes the SVD of the small sketch in $O(mnk)$, with provable error bounds that improve as the spectrum decays faster. Subspace iteration (the power method generalized) is cheapest for the leading singular pair and powers most spectral-norm estimators. Production routines (numpy.linalg.svd, scikit-learn's TruncatedSVD) avoid forming $M^\top M$ explicitly, because squaring the condition number destroys precision.
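The randomized approach can be sketched in a few lines. This is an illustrative, untuned version of the Gaussian-sketch idea, not a reference implementation: the function name `randomized_svd`, the `oversample` and `n_power_iter` parameters, and their defaults are assumptions made here, and the power iterations are a refinement commonly paired with the basic sketch.

```python
import numpy as np

def randomized_svd(M, k, oversample=10, n_power_iter=2, seed=0):
    """Approximate top-k SVD via a Gaussian sketch (illustrative only)."""
    rng = np.random.default_rng(seed)
    m, n = M.shape
    ell = min(k + oversample, n)               # sketch width
    Omega = rng.standard_normal((n, ell))      # Gaussian test matrix
    Q, _ = np.linalg.qr(M @ Omega)             # orthonormal basis for the sketch
    for _ in range(n_power_iter):              # power iterations sharpen the subspace
        Q, _ = np.linalg.qr(M.T @ Q)
        Q, _ = np.linalg.qr(M @ Q)
    B = Q.T @ M                                # small (ell x n) projected matrix
    Ub, s, Vt = np.linalg.svd(B, full_matrices=False)
    return (Q @ Ub)[:, :k], s[:k], Vt[:k]

# Test matrix with a known, geometrically decaying spectrum
rng = np.random.default_rng(4)
U0, _ = np.linalg.qr(rng.standard_normal((300, 50)))
V0, _ = np.linalg.qr(rng.standard_normal((100, 50)))
true_s = 2.0 ** -np.arange(50)
M = (U0 * true_s) @ V0.T

U, s, Vt = randomized_svd(M, k=5)
rel_err = np.max(np.abs(s - true_s[:5]) / true_s[:5])
print(rel_err)
```

The expensive operations are products of $M$ with thin matrices, which is why the cost scales with $k$ rather than with the full rank. On slowly decaying spectra the accuracy degrades, and more oversampling or power iterations would be needed.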

Among factorizations $M = X D Y^\top$ with $D$ diagonal, only SVD has rank-$k$ truncation minimize $\|M - M_k\|$ in every unitarily invariant norm at once. The geometric counterpart is the polar decomposition $M = QP$: diagonalizing the symmetric positive semi-definite stretch $P = V \Sigma V^\top$ and absorbing $Q$ into $U = QV$ recovers $M = U \Sigma V^\top$. SVD diagonalizes the stretch part of a linear map after rotation has been factored out.
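The relation between the polar decomposition and the SVD can be checked on a small example. This sketch assumes a square matrix so that the rotation factor is a genuinely orthogonal matrix; the names `Q` and `P` follow the usual convention for the polar factors.

```python
import numpy as np

rng = np.random.default_rng(5)
M = rng.standard_normal((4, 4))            # square case keeps Q orthogonal
U, s, Vt = np.linalg.svd(M)

Q = U @ Vt                                 # orthogonal (rotation/reflection) factor
P = Vt.T @ np.diag(s) @ Vt                 # symmetric PSD stretch, P = V Sigma V^T

is_polar = np.allclose(Q @ P, M)
q_orthogonal = np.allclose(Q.T @ Q, np.eye(4))
p_symmetric = np.allclose(P, P.T)
p_psd = bool(np.all(np.linalg.eigvalsh(P) >= -1e-12))
u_recovered = np.allclose(Q @ Vt.T, U)     # absorbing Q into V gives back U

print(is_polar, q_orthogonal, p_symmetric, p_psd, u_recovered)
```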

Go further

What does the Eckart-Young theorem say, and why does it matter for ML?

Truncating the SVD to the top-$k$ singular values gives the best rank-$k$ approximation under both the Frobenius and spectral norms; no other rank-$k$ matrix can match the original more closely. This extremal property is the theoretical guarantee underneath LoRA, MRL prefix truncation, and SSA's low-rank reconstruction. When a method works by keeping a few directions and discarding the rest, the reason it works is Eckart-Young.

What's the difference between full, thin, and truncated SVD?

Full SVD keeps all $\min(m, n)$ singular values and the full $U \in \mathbb{R}^{m \times m}$, $V \in \mathbb{R}^{n \times n}$. Thin (or compact) SVD drops trailing zeros and the corresponding columns of $U$ (or of $V$ when $n > m$), which makes it the natural form for any rectangular $M$. Truncated SVD keeps only the top-$k$ singular triples and is the form that matters most in practice for ML: low-rank applications are built on it.
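The three forms differ mainly in the shapes of the factors. A minimal sketch with NumPy, where the `full_matrices` flag of `numpy.linalg.svd` switches between full and thin; the truncated form is obtained here by slicing, and the matrix size and `k` are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(6)
M = rng.standard_normal((6, 3))

# Full: U is 6x6, Vt is 3x3, and s has min(m, n) = 3 entries
U_full, s, Vt_full = np.linalg.svd(M, full_matrices=True)

# Thin: U shrinks to 6x3; the dropped columns only ever multiplied zeros
U_thin, _, Vt_thin = np.linalg.svd(M, full_matrices=False)

# Truncated: keep only the top-k singular triples
k = 2
U_k, s_k, Vt_k = U_thin[:, :k], s[:k], Vt_thin[:k]

thin_exact = np.allclose((U_thin * s) @ Vt_thin, M)      # thin SVD is lossless
trunc_exact = np.allclose((U_k * s_k) @ Vt_k, M)         # truncation is lossy here

print(U_full.shape, U_thin.shape, U_k.shape)
print(thin_exact, trunc_exact)
```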

How does SVD relate to eigendecomposition?

SVD generalizes eigendecomposition to arbitrary rectangular and non-symmetric matrices. The singular values of $M$ are the (non-negative) square roots of the eigenvalues of $M^\top M$ (or equivalently $M M^\top$), and the right singular vectors are the eigenvectors of $M^\top M$. For a symmetric positive semi-definite $M$, the SVD and eigendecomposition coincide up to sign.
