Topic · 25 concepts

Performance Engineering

Squeezing throughput, latency, and memory out of GPUs.

Modern AI is an exercise in moving tensors through GPU memory faster than anyone else. The concepts below cover the hardware abstractions (the GPU memory hierarchy of HBM, SRAM, and registers; arithmetic intensity; the roofline model), the parallelism strategies that let a single job span multiple devices (tensor, pipeline, and data parallelism; FSDP), the inference-time optimizations that compound (PagedAttention, kernel fusion, graph compilation, mixed precision), and the metrics that tell you whether you're actually using the silicon (MFU, throughput, tail latency). This is where the engineering depth of an AI team shows.
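To make arithmetic intensity, the roofline model, and MFU concrete, here is a minimal Python sketch. The peak compute and bandwidth figures are hypothetical round numbers, not the specs of any real GPU, and the matmul byte count assumes the idealized case where each matrix is read or written from HBM exactly once. All function names are illustrative.

```python
def arithmetic_intensity(flops: float, bytes_moved: float) -> float:
    """FLOPs performed per byte moved to or from memory."""
    return flops / bytes_moved


def roofline_flops(intensity: float, peak_flops: float, peak_bandwidth: float) -> float:
    """Attainable FLOP/s under the roofline model.

    Performance is capped by whichever is lower: the compute roof
    (peak_flops) or the memory roof (peak_bandwidth * intensity).
    """
    return min(peak_flops, peak_bandwidth * intensity)


def mfu(achieved_flops: float, peak_flops: float) -> float:
    """Model FLOPs utilization: achieved FLOP/s as a fraction of peak."""
    return achieved_flops / peak_flops


def matmul_intensity(n: int, bytes_per_elem: int = 2) -> float:
    """Intensity of an (n x n) @ (n x n) matmul.

    Assumption: 2*n^3 FLOPs, and A, B, C each cross the memory bus once.
    """
    flops = 2 * n**3
    bytes_moved = 3 * n * n * bytes_per_elem
    return arithmetic_intensity(flops, bytes_moved)


# Hypothetical device: 100 TFLOP/s peak compute, 1 TB/s memory bandwidth.
PEAK_FLOPS = 100e12
PEAK_BANDWIDTH = 1e12
RIDGE_POINT = PEAK_FLOPS / PEAK_BANDWIDTH  # FLOPs/byte where the two roofs meet

if __name__ == "__main__":
    for n in (64, 4096):
        ai = matmul_intensity(n)
        attainable = roofline_flops(ai, PEAK_FLOPS, PEAK_BANDWIDTH)
        bound = "memory-bound" if ai < RIDGE_POINT else "compute-bound"
        print(f"n={n}: intensity={ai:.1f} FLOPs/byte, "
              f"attainable={attainable / 1e12:.1f} TFLOP/s ({bound})")
```

Under these assumptions the small matmul sits below the ridge point and is limited by memory bandwidth, while the large one reaches the compute roof. That gap is why techniques such as kernel fusion, which cut memory traffic, tend to matter most for low-intensity operations.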
