"Chaos Engineering"
An oral history of #hugops: How tech's first responders built a culture of empathy
Protocol's oral history of
“I've got to change the way that I approach this entirely and make it safe to experiment.”
Tim O'Reilly on Why We Started the Velocity Conference
Tim O'Reilly's retrospective on the origins of the Velocity Conference explains why the event was launched and how web operations emerged as a strategic discipline. Jesse Robbins was a co-founder and the conference chair.
Jesse Robbins on the Rise of DevOps (InfoQ Interview)
Jesse Robbins talks with InfoQ about how DevOps started, why infrastructure as code changed operations, and what it actually takes to get developers and ops teams working together.
Resilience Engineering: Learning to Embrace Failure
Jesse Robbins (Amazon), Kripa Krishnan (Google), and John Allspaw (Etsy) discuss how they built organizations that deliberately trigger failure to get stronger: powering off data centers, running 96-hour disaster simulations, and transforming blame cultures into learning cultures.
“You can't choose whether or not you're going to have failures — they are going to happen no matter what — but you can choose in many cases when you're going to learn the lessons.”
Changing Culture & Being a Force for Awesome
Jesse Robbins on how to change engineering culture from the inside. Start small, build champions, use metrics to create confidence, and exploit compelling events. The biggest barrier to operational improvement is not technology. It is organizational resistance.
“Don't fight stupid. Focus on where you can make more awesome.”
GameDay: Creating Resiliency Through Destruction
In this USENIX LISA'11 talk, Jesse Robbins explains GameDay: deliberately injecting failures into production systems to build organizational resilience before real outages happen.
“You don't choose the moment, the moment chooses you. You only choose how prepared you are when it does.”
Ex-Amazon 'Master of Disaster' Animates Server Chef
The Register profiles Jesse Robbins as Amazon's former 'Master of Disaster' and covers his co-founding of Opscode and the launch of Chef, tracing the line from his reliability engineering work at Amazon to the infrastructure-as-code movement.
Understanding Web Operations Culture (Part 1)
Jesse Robbins draws on his firefighting background to define web operations culture: the mindset, habits, and discipline that separate teams that handle incidents well from those that don't.
“You don't choose the moment, the moment chooses you. You only choose how prepared you are when it does.”