Engineering leadership, systems craft, and the human side of building software.

Tracing a Request Through a Distributed System

When something goes wrong in a distributed system, the hardest part isn’t fixing it — it’s understanding what happened. This post walks through a technique for tracing a single request across multiple services.

Welcome to Retry Until Success

Most things worth doing fail the first time. Systems, ideas, habits, teams - they all iterate toward something better, or they don’t. This blog is about the iteration.