---
title: Resilience Engineering
description: Interviews, articles, podcasts, and talks featuring Jesse Robbins, tagged Resilience Engineering.
doc_version: "1.0"
last_updated: 2026-05-31
---

# Resilience Engineering

Interviews, articles, podcasts, and talks featuring Jesse Robbins, tagged Resilience Engineering.

## [Investing in Vibrant Labs: AI Agent Simulation Infrastructure](https://jesserobbins.com/mentions/investing-in-vibrant-labs-ai-agent-simulation.md)

*2025-12-03*

Heavybit announcement of our investment in Vibrant Labs, which builds production-grade simulation and verifier-driven evaluation for long-horizon AI agents.

## [Generative AI in DevOps and Incident Response: What the Experts Actually Think](https://jesserobbins.com/mentions/2023-10-12-generative-ai-devops-incident-response-heavybit.md)

*2023-10-12*

I interviewed Nora Jones, Jeremy Edberg, Mandi Walls, and Brent Chapman on what generative AI actually does in incident response, and where humans have to stay in the loop.

## [Fireside Chat with Jesse Robbins and Kolton Andrus • Failover Conf 2021](https://jesserobbins.com/mentions/fireside-chat-jesse-robbins-kolton-andrus-failover-conf.md)

*2021-04-29*

At Gremlin's Failover Conf 2021, Kolton Andrus and I covered GameDay origins at Amazon, the evolution of chaos engineering, and where reliability practices were headed.

## [Incident Management for Operations (foreword by Jesse Robbins)](https://jesserobbins.com/mentions/incident-management-for-operations-schnepp-vidal-hawley-oreilly.md)

*2017-07-01*

I wrote the foreword to Schnepp, Vidal, and Hawley's O'Reilly book bringing fire-service incident command into IT operations. The lineage runs from my work at Amazon as Master of Disaster through the first Web Ops/Fire Ops summit I convened in 2012.

## [Resilience Engineering: Learning to Embrace Failure](https://jesserobbins.com/mentions/resilience-engineering-learning-embrace-failure-acm-queue.md)

*2012-09-12*

Kripa Krishnan (Google), John Allspaw (Etsy), and I (Amazon) discuss how we built organizations that deliberately trigger failure to get stronger: powering off data centers, running 96-hour disaster simulations, and transforming blame cultures into learning cultures.

## [GameDay: Creating Resiliency Through Destruction](https://jesserobbins.com/mentions/gameday-creating-resiliency-through-destruction-usenix.md)

*2011-12-20*

My USENIX LISA '11 talk on GameDay, the practice of deliberately injecting failures into production to build organizational resilience before real outages happen. I had been running these exercises at Amazon since 2003.

## [MIT Technology Review TR35: Innovators Under 35](https://jesserobbins.com/mentions/tr35-jesse-robbins-technology-review.md)

*2011-01-01*

The MIT Technology Review TR35 listing for 2011, citing my work on web operations, cloud, and resilience engineering at Amazon and Opscode.

## [Five Whys: Try to Learn a Dollar's Worth of Lesson for Every One You Spend in Failure](https://jesserobbins.com/mentions/five-whys-jesse-robbins-quote-venturehacks.md)

*2008-11-17*

Eric Ries quoted me in his Venture Hacks guide to Five Whys: try to learn a dollar's worth of lesson for every dollar spent in failure. The line came from Amazon GameDay practice.

## More pages

- [Page 2](https://jesserobbins.com/topics/resilience-engineering/2.md)
