Fireside Chat with Jesse Robbins and Kolton Andrus • Failover Conf 2021

Video by Gremlin · 28:23

Kolton Andrus built Gremlin into the chaos engineering company. I started GameDay at Amazon. The fireside chat at his Failover Conf gave the two of us a chance to compare notes on how the discipline developed in the years between those two pieces of work.

At Gremlin’s Failover Conf 2021, Kolton Andrus and I sat down for a fireside chat on GameDay’s origins at Amazon, how deliberate failure injection evolved into the discipline the industry now calls chaos engineering, and where reliability practices needed to go next.

We covered the early days of breaking production systems on purpose, the cultural resistance you hit when you first propose simulating catastrophic failures, and how those exercises changed the way Amazon thought about availability. We traced the shift from ad-hoc failure testing to systematic chaos engineering platforms, and dug into what separates teams that recover well from incidents from those that struggle.

The session closes on the future of SRE: what engineering leaders should prioritize and how the chaos engineering community can continue raising the bar on production resilience.

Topics