Foundational
High Performance Web Sites
by Steve Souders (2007). The work that started modern web-performance engineering as a field. Specifics have aged, but the 14 rules and the central framing, that 80-90% of end-user response time is spent on the frontend, still shape how people think about web performance.
The Every Computer Performance Book
by Bob Wescott (2013). A short, practical, occasionally funny book on solving and avoiding performance problems. Heavy on the planning side — which is where most performance problems actually live.
Performance & systems
Systems Performance: Enterprise and the Cloud, 2nd Edition
by Brendan Gregg (2020). The modern reference for systems performance — Linux internals, BPF, bpftrace, perf, cloud. If you're trying to figure out why a server is slow under load, this is the book.
High Performance Browser Networking
by Ilya Grigorik (2013). Free online at hpbn.co. The reference for how TCP, TLS, HTTP/1, HTTP/2, and mobile networks actually behave under load. Required reading before you blame the network.
The Art of Capacity Planning, 2nd Edition
by Arun Kejariwal and John Allspaw (2017). A practical, measurement-driven approach to capacity planning for cloud-era web operations. Replaces the older Menascé-style theoretical models with what real teams actually do.
Site Reliability Engineering
edited by Betsy Beyer, Chris Jones, Jennifer Petoff, and Niall Richard Murphy (Google, 2016). Free online at sre.google. The book that introduced SLOs and error budgets to the broader industry, which is now the standard framing for "is the site fast enough for users?"
The data layer
High Performance MySQL: Proven Strategies for Operating at Scale, 4th Edition
by Silvia Botros and Jeremy Tinley (2021). The current edition. Covers schema design, InnoDB tuning, replication, and cloud-hosted MySQL (Aurora, Cloud SQL). The database is usually where load tests find their first ceiling.
Designing Data-Intensive Applications, 2nd Edition
by Martin Kleppmann and Chris Riccomini (2026). The modern classic on data systems — how databases, queues, and streams behave under load, and how to choose between them. The second edition refreshes the cloud and streaming coverage.
Database Internals
by Alex Petrov (2019). A deep dive into storage engines (B-trees, LSM-trees), replication, and consensus. Useful when load testing pushes a database past the point where surface-level tuning helps.