Meet 2011 TR35 Winner Jesse Robbins

MIT Technology Review · December 2, 2011 · Video · 04:16

The original 2011 MIT Technology Review TR35 honoree page. Portrait by John Keatley. (MIT Technology Review)

MIT Technology Review interviewed me as a 2011 TR35 honoree, recognizing my work on web operations, infrastructure automation, and reliability at Opscode.

Watch at MIT Technology Review

MIT Technology Review filmed this short interview as part of the 2011 TR35 recognition. In it I walk from the fire service to Amazon to Opscode in four minutes: failure is inevitable at scale, resilience is a function of people and culture, and Chef was the open-source tool we built so other teams could operate at that scale without holding their tools as guarded secrets. Velocity, Opscode, and the Chef community were already underway by the time this video was filmed.

From the MIT Technology Review TR35 honoree page

Category: Internet & web
Year Honored: 2011
Organization: Opscode
Region: Global
Focus: Fault-tolerant online infrastructure

Biography

Jesse Robbins applied for two jobs in 2001: a Seattle bus driver position and a backup systems engineer role at Amazon.com. Amazon’s offer came first, beginning a decade of work on how web companies operate complex server and software networks at scale.

Drawing from his background as a volunteer firefighter, Robbins brought crisis management principles to infrastructure design. He recognized that massive global operations inevitably experience failures and built systems to withstand them safely. Rather than preventing failures, he made Amazon resilient to them through architectural fault tolerance and live operational drills that tested teams by temporarily taking entire data centers offline, without affecting customer experience.

After leaving Amazon in 2006, Robbins shared his methodologies through blogging. In 2007, he cofounded Velocity, now an annual conference where major competitors openly discuss infrastructure management.

Robbins cofounded Opscode in 2008. The company’s flagship product, Chef, is an open-source framework for cloud-based infrastructure automation. One notable application involved scientists using Chef to deploy a 10,000-processor supercomputing cluster in 45 minutes on Amazon’s cloud, completing complex protein-binding research in eight hours, then shutting down operations, all at a fraction of traditional supercomputing costs.

Full Transcript (AI-generated)

Hi, I'm Jesse Robbins. I'm co-founder of Opscode, the leader in cloud infrastructure automation, and part of a large community of people that build and operate the websites that we all depend on every day. Which may seem like very strange work for a firefighter, but I promise you it isn't.

In 2001 I joined Amazon, really as a day job while I was making a transition to the fire service. I ended up with the title of Master of Disaster, which sort of foreshadows what I'm going to talk about. We found at Amazon, and most websites at any significant scale find, that as complexity and operating size increase, so do the number of failures and outages. And this is on the test: failure happens.

So in the past decade, the conventional wisdom was: spend a whole lot more money to make the systems more reliable. We found that instead, focusing on resiliency of both technology and operational culture is the way that you build successful websites. And I realized as we were discovering this that my experiences in the fire department directly correlated to building sites at scale.

So with some support of Amazon executives, I began turning Amazon into something of a fire department. I began training software developers using fire-department-style incident management techniques and doing things like fire drills, a program we called GameDay, which gave people an opportunity to learn how to deal with failures at scale, learn from those in stressful situations, and eventually work up to full-scale exercises where we were able to turn off data centers with no notice to developers and no impact to customers. As a result of this operational culture (many people know that Amazon's success is a direct result of its operating culture), it was a privilege to be able to contribute something unique from my background to that.

These evolved into safety standards and building codes, much like the things that protect you in this room right now: technical controls that allow people to be safe even though the environment is significant and complex.

I left in 2006, and I realized that the skills and operating capabilities that we had at Amazon were really necessary for any organization that depended on the web. There was no community, no culture, no way to share this. And so I, and a group of other crazy people, founded a conference called the Velocity Web Performance and Operations Conference, which is now in its fifth year and teaches about 2,000 people every year how to operate and succeed.

One of the things that we learned from that conference was that there was both a cultural component and a tool component that was required to succeed. All the big companies had built their own tools over the years, but they had held them as very tightly guarded secret sauce, which left every other startup in a really bad position: poorly experienced and not equipped to be able to operate at the scale that they wanted to. And so we founded Opscode.

Opscode does cloud infrastructure automation. We provide a tool called Chef, which is an open-source framework for systems integration. It's like a little sysadmin robot. It uses recipes and cookbooks, which are very easy to share, which allows people that are new to operating at scale to stand on the shoulders of other giants and then contribute back themselves.

Chef is one of the most successful open-source projects in infrastructure history: over 450 contributors, and it is used by over 6,000 organizations, including many very relevant to other speakers today. We're hiring, and if you know people that care about infrastructure, send 'em our way. I'm Jesse Robbins, thank you very much.

Also Mentioned

Jesse Robbins

Cofounded Chef and the DevOps movement. Created GameDay at Amazon. Investor in AI developer tools.

MIT Technology Review

Topics

Awards · DevOps · Chef · Cloud Infrastructure

More Mentions

GameDay: Creating Resiliency Through Destruction

December 20, 2011 · Talk · 52:50

My USENIX LISA'11 talk on GameDay: deliberately inject failures into production to build organizational resilience before real outages happen. I had been running these exercises at Amazon since 2003.

“You don't choose the moment, the moment chooses you. You only choose how prepared you are when it does.”

— Jesse Robbins

The Chef, the Puppet, and the Sexy IT Admin

October 26, 2011

Wired Enterprise covered the rivalry between Chef and Puppet as infrastructure automation went mainstream, placing Jesse Robbins and Opscode at the center of the industry's shift to infrastructure as code.