"Chaos Engineering"

Protocol

An oral history of #hugops: How tech's first responders built a culture of empathy

Protocol's oral history of #hugops traces how tech's first responders built a culture of empathy.

“I've got to change the way that I approach this entirely and make it safe to experiment.”

— Jesse Robbins

O'Reilly Radar

Tim O'Reilly on Why We Started the Velocity Conference

Tim O'Reilly's retrospective on the origins of the Velocity Conference explains why the event was launched and how web operations emerged as a strategic discipline, with Jesse Robbins as co-founder and conference chair.

InfoQ

Jesse Robbins on the Rise of DevOps (InfoQ Interview)

Jesse Robbins talks with InfoQ about how DevOps started, why infrastructure as code changed operations, and what it actually takes to get developers and ops teams working together.

ACM Queue

Resilience Engineering: Learning to Embrace Failure

Jesse Robbins (Amazon), Kripa Krishnan (Google), and John Allspaw (Etsy) discuss how they built organizations that deliberately trigger failure to get stronger: powering off data centers, running 96-hour disaster simulations, and transforming blame cultures into learning cultures.

“You can't choose whether or not you're going to have failures — they are going to happen no matter what — but you can choose in many cases when you're going to learn the lessons.”

— Jesse Robbins

O'Reilly Velocity Conference

Changing Culture & Being a Force for Awesome

Video · 34:28

Jesse Robbins on how to change engineering culture from the inside. Start small, build champions, use metrics to create confidence, and exploit compelling events. The biggest barrier to operational improvement is not technology. It is organizational resistance.

“Don't fight stupid. Focus on where you can make more awesome.”

— Jesse Robbins
▶ YouTube

USENIX

GameDay: Creating Resiliency Through Destruction

Talk · 52:50

In this USENIX LISA'11 talk, Jesse Robbins explains GameDay: deliberately injecting failures into production systems to build organizational resilience before real outages happen.

“You don't choose the moment, the moment chooses you. You only choose how prepared you are when it does.”

— Jesse Robbins
▶ YouTube

The Register

Ex-Amazon 'Master of Disaster' Animates Server Chef

The Register profiles Jesse Robbins as Amazon's former 'Master of Disaster' and covers his co-founding of Opscode and the launch of Chef, tracing the line from his reliability engineering work at Amazon to the infrastructure-as-code movement.

O'Reilly Radar

Understanding Web Operations Culture (Part 1)

Jesse Robbins draws on his firefighting background to define web operations culture — the mindset, habits, and discipline that separate teams who handle incidents well from those who don't.

“You don't choose the moment, the moment chooses you. You only choose how prepared you are when it does.”

— Fire Chief Mike Burtch