Singular Spectrum Analysis (SSA)

Also known as: Singular Spectrum Analysis, SSA, MSSA, Multivariate SSA

TL;DR

SSA is the time-series analog of PCA. Embed a 1-D series into a Hankel trajectory matrix, SVD it, group eigentriples into trend / oscillatory / noise components, and reconstruct.

Singular Spectrum Analysis is the application of principal component analysis (PCA) to a delay-embedded time series. You take a 1-D series, slide a window of length $L$ across it to form a trajectory matrix, SVD that matrix, group the resulting eigentriples into interpretable components (trend, oscillations, noise), and reconstruct each group back into a 1-D signal; the reconstructed signals sum to the original. The decomposition is nonparametric (no assumed model, no parametric seasonality, no stationarity requirement) and produces components you can interpret by eye.

[Figure: Singular Spectrum Analysis, peeling a series into trend, oscillations, and noise. Panels: observed x(t) = trend + osc + noise; trend (smooth, monotonic); oscillation 1 (period ≈ 50); oscillation 2 (period ≈ 17); noise residual (≈ N(0, 0.15²)); each followed by the remainder after removal. Inset: singular value spectrum σ₁ ≥ σ₂ ≥ … ≥ σ_L over eigentriple index i, grouped as trend (1), oscillation 1 (2), oscillation 2 (2), noise (27).]

The four steps

Given a series $x_1, \dots, x_N$ of length $N$ and a window length $L$ (with $1 < L < N$, typically $L \le N/2$):

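  1. Embed. Form the $L \times K$ trajectory matrix $\mathbf{X} = [X_1 : X_2 : \dots : X_K]$, where $K = N - L + 1$ and column $j$ is $X_j = (x_j, x_{j+1}, \dots, x_{j+L-1})^\top$. This is a Hankel matrix: constant along anti-diagonals.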

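  2. Decompose. SVD: $\mathbf{X} = \sum_{i=1}^{L} \sigma_i U_i V_i^\top$. Each triple $(\sigma_i, U_i, V_i)$ is an eigentriple, and $\sigma_1 \ge \sigma_2 \ge \dots \ge \sigma_L \ge 0$.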

  3. Group. Partition the indices $\{1, \dots, L\}$ into disjoint groups $I_1, \dots, I_m$ based on inspection of the singular values, eigenvectors, and pairings: a single large $\sigma_i$ with a smooth $U_i$ is a trend component; pairs of close singular values whose eigenvectors are 90° phase-shifted sinusoids are oscillatory components; the long flat tail of small $\sigma_i$'s is noise.

  4. Reconstruct. For each group $I$, compute $\mathbf{X}_I = \sum_{i \in I} \sigma_i U_i V_i^\top$ and diagonally average it back into a 1-D series of length $N$. Because the groups partition all indices, the reconstructed series sum to the original.

SSA = embed into trajectory matrix → SVD → group eigentriples → diagonally average back. Four lines of numpy. No model fitting, no hyperparameter tuning beyond the window length.
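The four steps map almost one-to-one onto NumPy calls. Below is a minimal sketch, not a reference implementation: `diagonal_average` and `ssa` are hypothetical helper names, indices are 0-based, and `sliding_window_view` assumes NumPy 1.20 or newer. The synthetic series (exponential trend, two sinusoids, Gaussian noise) and the grouping with window 32 are illustrative assumptions.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def diagonal_average(Y):
    """Average each anti-diagonal of an L x K matrix into a series of length L + K - 1."""
    L, K = Y.shape
    sums = np.zeros(L + K - 1)
    counts = np.zeros(L + K - 1)
    for i in range(L):
        # Row i of Y holds entries on anti-diagonals i, i + 1, ..., i + K - 1.
        sums[i:i + K] += Y[i]
        counts[i:i + K] += 1
    return sums / counts


def ssa(x, L, groups):
    """Basic SSA sketch: embed, SVD, group, diagonally average.

    groups: list of lists of 0-based eigentriple indices.
    Returns one reconstructed series of length N per group, plus the singular values.
    """
    x = np.asarray(x, dtype=float)
    X = sliding_window_view(x, L).T                    # 1. embed: L x K, column j = x[j:j+L]
    U, s, Vt = np.linalg.svd(X, full_matrices=False)   # 2. decompose
    components = []
    for group in groups:
        X_I = (U[:, group] * s[group]) @ Vt[group]     # 3. group: sum of rank-one terms
        components.append(diagonal_average(X_I))       # 4. reconstruct
    return components, s


# Synthetic example: exponential trend + two sinusoids + Gaussian noise.
rng = np.random.default_rng(0)
t = np.arange(200)
x = (2.0 * np.exp(0.005 * t)
     + np.sin(2 * np.pi * t / 50)
     + 0.5 * np.sin(2 * np.pi * t / 17)
     + 0.15 * rng.standard_normal(t.size))

L = 32
# Illustrative grouping (trend, pair, pair, tail); confirm it against s and U on real data.
groups = [[0], [1, 2], [3, 4], list(range(5, L))]
(trend, osc1, osc2, noise), s = ssa(x, L, groups)
```

On real data, pick the groups after looking at `s` and the columns of `U`. Whatever grouping you choose, as long as the groups cover every index, the reconstructions add back up to `x`.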

Why it works

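The trajectory matrix has a deep structural property: any time series that's a sum of exponentials, sinusoids, or polynomial-times-sinusoid components has a Hankel matrix of rank exactly $r$ in the noiseless case, where $r$ depends on the components rather than on $N$: each real exponential contributes 1, each sinusoid 2, and each degree-$k$ polynomial $k + 1$ (provided $L$ and $K$ are both at least $r$). The SVD finds that rank, and the corresponding singular vectors are linear combinations of the underlying components.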
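A quick numerical check of the rank claim, assuming NumPy. `numerical_rank` is a hypothetical helper that counts singular values above a relative threshold of 1e-9, which stands in for "nonzero" in floating point. An exponential should give rank 1, while a sinusoid, a straight line, and a damped sinusoid should each give rank 2. The sum of the last three should give 6.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def numerical_rank(x, L, rel_tol=1e-9):
    """Count trajectory-matrix singular values above rel_tol * sigma_1."""
    X = sliding_window_view(np.asarray(x, dtype=float), L).T
    s = np.linalg.svd(X, compute_uv=False)
    return int(np.sum(s > rel_tol * s[0]))


N, L = 200, 60
t = np.arange(N)
exponential = 0.99 ** t                             # one real exponential
sinusoid = np.sin(2 * np.pi * t / 17 + 0.3)         # one sinusoid
linear = 1.0 + 0.05 * t                             # degree-1 polynomial
damped = 0.98 ** t * np.cos(2 * np.pi * t / 9)      # exponentially damped sinusoid

ranks = {
    "exponential": numerical_rank(exponential, L),
    "sinusoid": numerical_rank(sinusoid, L),
    "linear": numerical_rank(linear, L),
    "damped": numerical_rank(damped, L),
    "sum": numerical_rank(sinusoid + linear + damped, L),
}
print(ranks)
```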

Pure sinusoids show up as pairs of singular values that are nearly equal, with eigenvectors that are sine-cosine pairs at the same frequency. That's the visual signature you look for in the scree plot: a step where two adjacent $\sigma_i$'s are essentially identical, separated from their neighbors. Trend components, by contrast, show up as a single large $\sigma_i$ with a slowly varying eigenvector.

After grouping, $\mathbf{X}_I$ is generally no longer a Hankel matrix, because the rank-one SVD terms don't individually respect the Hankel constraint. To get back to a 1-D series, the standard recipe is diagonal averaging: replace each anti-diagonal of $\mathbf{X}_I$ with its mean, then read the result back out as a series of length $N$. This is the orthogonal projection of $\mathbf{X}_I$ onto the space of Hankel matrices in Frobenius norm, so it's the optimal Hankel approximation to your grouped reconstruction. Because diagonal averaging is linear, and the $\mathbf{X}_I$ for disjoint groups covering all indices sum to $\mathbf{X}$ (which is already Hankel), the reconstructed components sum exactly to the original series. That is the property that lets SSA be used as a strict decomposition rather than an approximation.
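The projection claim is easy to verify numerically. This sketch assumes NumPy, and `diagonal_average` and `hankel_from_series` are hypothetical helpers. A random matrix stands in for a grouped $\mathbf{X}_I$. After diagonal averaging, the residual is Frobenius-orthogonal to an arbitrary Hankel matrix, so by Pythagoras no Hankel matrix is closer. The linearity of the averaging is what makes disjoint groups sum back to the original.

```python
import numpy as np


def diagonal_average(Y):
    """Average each anti-diagonal of an L x K matrix into a series of length L + K - 1."""
    L, K = Y.shape
    sums = np.zeros(L + K - 1)
    counts = np.zeros(L + K - 1)
    for i in range(L):
        sums[i:i + K] += Y[i]
        counts[i:i + K] += 1
    return sums / counts


def hankel_from_series(y, L):
    """L x K Hankel matrix with entry (i, j) = y[i + j]."""
    K = len(y) - L + 1
    return np.array([y[i:i + K] for i in range(L)])


rng = np.random.default_rng(0)
L, K = 8, 13
Y = rng.standard_normal((L, K))                            # stand-in for a non-Hankel X_I
H = hankel_from_series(diagonal_average(Y), L)             # its diagonal-averaged projection
G = hankel_from_series(rng.standard_normal(L + K - 1), L)  # any other Hankel matrix

# The residual Y - H is Frobenius-orthogonal to every Hankel matrix ...
inner = np.sum((Y - H) * G)
# ... so H is at least as close to Y as G is (Pythagoras).
closer = np.linalg.norm(Y - H) <= np.linalg.norm(Y - G)
# Diagonal averaging is linear, so reconstructions of disjoint groups add up.
A, B = rng.standard_normal((2, L, K))
linear = np.allclose(diagonal_average(A + B), diagonal_average(A) + diagonal_average(B))
print(inner, closer, linear)
```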

MSSA: the multivariate generalization

Multivariate SSA (MSSA) extends the same recipe to multiple time series that share latent components. Stack the trajectory matrices for each series vertically, SVD the combined block, group, reconstruct each series via diagonal averaging on its block. The benefit: shared oscillations show up as singular vectors that span all series simultaneously, so a 12-month seasonality common to GDP, retail sales, and unemployment is recovered as a single component instead of three independently noisy ones.
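A minimal MSSA sketch that follows the vertical-stacking recipe above, assuming NumPy. `mssa` is a hypothetical helper, and the three-series panel with a shared 12-step cycle is made up for illustration. Some MSSA formulations stack the trajectory blocks horizontally instead; the reconstruction logic is analogous.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def diagonal_average(Y):
    """Average each anti-diagonal of an L x K matrix into a series of length L + K - 1."""
    L, K = Y.shape
    sums = np.zeros(L + K - 1)
    counts = np.zeros(L + K - 1)
    for i in range(L):
        sums[i:i + K] += Y[i]
        counts[i:i + K] += 1
    return sums / counts


def mssa(series, L):
    """MSSA sketch with vertically stacked trajectory matrices.

    series: array of shape (M, N). Returns rec of shape (n_triples, M, N), where
    rec[k, m] is eigentriple k reconstructed for series m, plus the singular values.
    """
    series = np.asarray(series, dtype=float)
    M, N = series.shape
    # Each series contributes an L x K block; stacking gives an (M * L) x K matrix.
    X = np.vstack([sliding_window_view(x, L).T for x in series])
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    rec = np.empty((len(s), M, N))
    for k in range(len(s)):
        X_k = s[k] * np.outer(U[:, k], Vt[k])
        for m in range(M):
            # Diagonal-average only the block of rows that belongs to series m.
            rec[k, m] = diagonal_average(X_k[m * L:(m + 1) * L])
    return rec, s


# Hypothetical panel: three metrics sharing a 12-step cycle with different amplitudes.
rng = np.random.default_rng(1)
N, L = 240, 36
t = np.arange(N)
season = np.sin(2 * np.pi * t / 12)
amplitudes = np.array([1.0, 2.0, 0.5])
panel = amplitudes[:, None] * season + 0.1 * rng.standard_normal((3, N))

rec, s = mssa(panel, L)
shared = rec[0] + rec[1]                      # leading pair, reconstructed for every series
pair_share = (s[:2] ** 2).sum() / (s ** 2).sum()
```

The shared cycle has rank 2 across the whole stacked matrix. It therefore lands in the leading pair for all three series at once, scaled to each series' amplitude.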

For ML, MSSA is the right tool when you have a panel of related metrics — eval-loss curves across N training runs, latency at the p50/p90/p99/p999 percentiles, throughput across K replicas — and you want to separate the shared dynamics from the per-series noise.

Where SSA earns its keep in ML (and why nobody uses it)

Most ML practitioners reach for an LSTM, an ARIMA model, or a moving average when faced with a time series. On tasks where what you actually want is decomposition rather than forecasting, SSA quietly outperforms all three:

Underused SSA applications
  • Smoothing pretraining loss curves. A loss curve has three components: a smooth descent (trend), a high-frequency wobble from gradient noise (noise), and sometimes a periodic component from the data sampler or learning-rate schedule (oscillation). SSA separates all three in one shot, with no smoothing kernel or bandwidth to pick; the embedding window $L$ is the only knob.
  • Training dynamics analysis. Group-norm trajectories, gradient-norm spikes, attention-entropy curves — all benefit from SSA’s ability to peel off the trend and surface the oscillatory or transient components that actually carry the signal.
  • Deployment metric monitoring. Latency series, request-rate series, and error-rate series usually have a strong daily/weekly seasonality plus drift. SSA decomposes these without you having to specify the seasonality period — useful when the period itself drifts (DST, traffic regime changes).
  • Regime-change detection. Track the top SSA components over rolling windows. A regime change shows up as a sudden change in the eigenvector orientation or in the explained variance of each component, often before any aggregate metric has moved.
  • Eval benchmark stability. SSA on a long history of eval scores separates real model improvement from periodic eval-set variance and from one-off contamination spikes.
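A sketch of the loss-curve use case, assuming NumPy. The curve is synthetic (power-law decay, a 100-step periodic artifact, Gaussian noise). Keeping the leading four eigentriples as "signal" is an assumption; on real data you would read that number off the scree. `ssa_reconstruct` is a hypothetical helper. Comparing against the known clean curve shows how much of the gradient noise the low-rank reconstruction removes.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def diagonal_average(Y):
    """Average each anti-diagonal of an L x K matrix into a series of length L + K - 1."""
    L, K = Y.shape
    sums = np.zeros(L + K - 1)
    counts = np.zeros(L + K - 1)
    for i in range(L):
        sums[i:i + K] += Y[i]
        counts[i:i + K] += 1
    return sums / counts


def ssa_reconstruct(x, L, indices):
    """Reconstruct x from the eigentriples in `indices` (0-based)."""
    X = sliding_window_view(np.asarray(x, dtype=float), L).T
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    X_I = (U[:, indices] * s[indices]) @ Vt[indices]
    return diagonal_average(X_I), s


# Hypothetical loss curve: power-law decay + periodic sampler artifact + gradient noise.
rng = np.random.default_rng(0)
steps = np.arange(1000)
clean = 2.0 + 3.0 / np.sqrt(steps + 50) + 0.05 * np.sin(2 * np.pi * steps / 100)
loss = clean + 0.05 * rng.standard_normal(steps.size)

# Keeping 4 leading eigentriples is an assumption; read the cutoff off the scree of s.
smoothed, s = ssa_reconstruct(loss, L=200, indices=list(range(4)))

rmse_raw = np.sqrt(np.mean((loss - clean) ** 2))
rmse_ssa = np.sqrt(np.mean((smoothed - clean) ** 2))
print(f"raw RMSE {rmse_raw:.4f}, SSA RMSE {rmse_ssa:.4f}")
```

Splitting the kept signal further into trend and oscillation is the same grouping step as before: look for the near-equal pair among the leading singular values.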

The reason SSA is underused isn’t that it’s bad — it’s that the canonical references (Broomhead-King 1986, Vautard-Ghil 1989, Golyandina’s textbook) are in the geophysics and signal-processing literature, not the ML reading list. The technique never made the jump.

What SSA isn’t

SSA is a decomposition, not a forecasting model in itself. The grouped components are descriptive: to forecast, you separately fit a recurrence to each component (linear-recurrence forecasting is the canonical SSA-companion technique) and sum the per-component forecasts. That works well for trend and oscillatory components, less well for noise. For full forecasting accuracy on complex series, modern deep models (e.g., the Temporal Fusion Transformer) usually win, but they don't decompose, and decomposition is often what you actually wanted.
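Linear-recurrence forecasting can be sketched in a few lines, assuming NumPy. This follows the standard recurrent SSA construction. Let $\pi_i$ be the last coordinate of $U_i$ and $\nu^2 = \sum_{i \in I} \pi_i^2 < 1$. The recurrence coefficients are then $R = \frac{1}{1 - \nu^2} \sum_{i \in I} \pi_i U_i^{\nabla}$, where $U_i^{\nabla}$ drops the last coordinate, and each new value is $R$ dotted with the previous $L - 1$ values. `ssa_lrr_forecast` is a hypothetical helper. Calling it with one group's `indices` forecasts that group, so per-component forecasts can be summed as described above. The test series is noiseless, so the forecast should track the true continuation almost exactly.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def diagonal_average(Y):
    """Average each anti-diagonal of an L x K matrix into a series of length L + K - 1."""
    L, K = Y.shape
    sums = np.zeros(L + K - 1)
    counts = np.zeros(L + K - 1)
    for i in range(L):
        sums[i:i + K] += Y[i]
        counts[i:i + K] += 1
    return sums / counts


def ssa_lrr_forecast(x, L, indices, h):
    """Recurrent SSA forecast sketch for the group of eigentriples in `indices`."""
    x = np.asarray(x, dtype=float)
    X = sliding_window_view(x, L).T
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    U_I = U[:, indices]
    # Reconstructed group series (diagonal averaging of the grouped matrix).
    recon = diagonal_average((U_I * s[indices]) @ Vt[indices])
    pi = U_I[-1]                       # last coordinate of each eigenvector
    nu2 = float(pi @ pi)               # verticality coefficient; the LRR needs nu2 < 1
    if nu2 >= 1.0:
        raise ValueError("e_L lies in the group subspace; no recurrence exists")
    # R = (a_{L-1}, ..., a_1): weights on the previous L - 1 values, oldest first.
    R = (U_I[:-1] @ pi) / (1.0 - nu2)
    y = list(recon)
    for _ in range(h):
        y.append(float(R @ np.asarray(y[-(L - 1):])))
    return np.asarray(y[len(x):])


N, L, h = 150, 50, 30
t = np.arange(N + h)
truth = 0.01 * t + np.cos(2 * np.pi * t / 17)       # rank 2 + rank 2 = rank 4
forecast = ssa_lrr_forecast(truth[:N], L, indices=[0, 1, 2, 3], h=h)
max_err = np.max(np.abs(forecast - truth[N:]))
print(max_err)
```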

SSA is also blind to causality. It separates components by their statistical signature in the trajectory matrix, not by which exogenous variable drove which component. Pair it with a causal-inference toolkit if you need to attribute oscillations to specific upstream events.

Go further

Why hasn't SSA crossed over from signal processing into ML curricula?

Mostly historical accident. SSA was developed in the geophysics and signal-processing literature in the 1980s, and the canonical references (Broomhead, King, Vautard, Ghil) sit outside the ML reading list. ML grew up in a world dominated by ARIMA and later deep models for time series; SSA's nonparametric, decomposition-first framing didn't fit either camp. The result is that an entire generation of ML practitioners reaches for LSTMs to do work SSA does in a few lines of numpy.

How is SSA different from a Fourier or wavelet decomposition?

Fourier assumes the series is a sum of pure sinusoids at fixed frequencies. SSA discovers oscillatory components from the data itself: they can be amplitude-modulated, phase-shifted, or non-harmonic, and SSA still separates them. Wavelets handle non-stationarity better than Fourier but require choosing a basis; SSA picks its own basis via SVD. The cost is interpretability: SSA components don't come with a frequency label, so you have to inspect them.
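One way to attach a frequency label to an oscillatory pair is sketched below, under the assumption that the pair is cleanly separated. Treat the two eigenvectors as the real and imaginary parts of a rotating complex signal and measure the angle it advances per sample. `pair_period` is a hypothetical helper. Taking the median makes it somewhat robust to mild leakage between components. Here $N$ and $L$ are chosen so that both periods (50 and 20) fit whole cycles into the window, which makes the separation exact.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def pair_period(u1, u2):
    """Estimate the period of an oscillatory eigenvector pair from its phase rotation."""
    z = u1 + 1j * u2
    # Phase advanced between consecutive samples; a clean pair rotates by 2*pi/period.
    step = np.angle(z[1:] * np.conj(z[:-1]))
    return 2 * np.pi / np.abs(np.median(step))


N, L = 399, 200                                   # K = N - L + 1 = 200
t = np.arange(N)
x = 2.0 * np.sin(2 * np.pi * t / 50) + np.sin(2 * np.pi * t / 20 + 1.0)

X = sliding_window_view(x, L).T
U, s, Vt = np.linalg.svd(X, full_matrices=False)
period_1 = pair_period(U[:, 0], U[:, 1])          # stronger oscillation
period_2 = pair_period(U[:, 2], U[:, 3])          # weaker oscillation
print(period_1, period_2)
```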

When does SSA fail?

Two main failure modes. (1) Window length chosen badly: too short and oscillatory components fragment; too long and the trajectory matrix gets unwieldy and short series can't support it. Rule of thumb: $L \le N/2$ for a series of length $N$, but tune by inspecting the singular value scree. (2) Components that share a singular value can't be separated by SVD alone; you need ESPRIT-style or rotation-based grouping. For most ML use cases (smoothing loss curves, decomposing latency signals), neither failure mode bites.
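The SSA literature has a standard diagnostic for both failure modes: the weighted correlation (w-correlation) between reconstructed components. Weight each time point by how many trajectory-matrix entries it fills, then correlate. Values near zero mean two components are separated. Large values mean they are mixed and probably belong in one group. The sketch assumes NumPy. `elementary_components` and `w_correlation` are hypothetical helpers. The constant-plus-sinusoid series is chosen so that both $L$ and $K$ are multiples of the period, which makes the separation exact.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def diagonal_average(Y):
    """Average each anti-diagonal of an L x K matrix into a series of length L + K - 1."""
    L, K = Y.shape
    sums = np.zeros(L + K - 1)
    counts = np.zeros(L + K - 1)
    for i in range(L):
        sums[i:i + K] += Y[i]
        counts[i:i + K] += 1
    return sums / counts


def elementary_components(x, L):
    """Reconstruct every eigentriple on its own (one series per singular value)."""
    X = sliding_window_view(np.asarray(x, dtype=float), L).T
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    comps = np.array([diagonal_average(s[k] * np.outer(U[:, k], Vt[k]))
                      for k in range(len(s))])
    return comps, s


def w_correlation(comps, L):
    """Weighted correlation matrix between reconstructed series (rows of comps)."""
    N = comps.shape[1]
    K = N - L + 1
    t = np.arange(N)
    # Weight of time t = length of the t-th anti-diagonal of the L x K trajectory matrix.
    w = np.minimum.reduce([t + 1, np.full(N, L), np.full(N, K), N - t])
    G = (comps * w) @ comps.T
    norms = np.sqrt(np.diag(G))
    return G / np.outer(norms, norms)


N, L = 143, 48                                   # K = 96; L and K are multiples of 12
t = np.arange(N)
x = 3.0 + np.sin(2 * np.pi * t / 12 + 0.4)

comps, s = elementary_components(x, L)
W = w_correlation(comps[:3], L)                  # level term, then the sinusoid's pair
print(np.round(W, 3))
```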

ZeroEntropy
The best AI teams build with ZeroEntropy models