Changing Culture & Being a Force for Awesome

O'Reilly Velocity Conference by Jesse Robbins · Video · 34:28

"Don't fight stupid. Focus on where you can make more awesome."

— Jesse Robbins

Jesse Robbins lays out his framework for changing engineering culture from the inside: start small, create champions, use metrics to build confidence, celebrate successes, and exploit compelling events. Drawing on his experiences at Amazon and as co-founder of Chef, he argues that the biggest barrier to operational improvement isn't technology — it's organizational resistance — and shares specific hacks for overcoming it.

At the O’Reilly Velocity Conference in 2012, Jesse Robbins — co-founder of the conference itself and of Chef (Opscode) — delivers a practitioner’s guide to changing engineering culture from the inside. The talk distills years of hard-won lessons from Amazon, Chef, and the broader DevOps community into a repeatable five-step framework.

The Framework

Jesse’s model for culture change has five steps, each building on the last:

  1. Start small. Pick the smallest possible project with receptive people. Call it an experiment. Don’t trigger the organizational immune system.
  2. Create champions. Get your boss on board first. Then spread credit as widely as possible — let others feel ownership of the change.
  3. Use metrics to build confidence. Find a number that supports your change (time from commit to deploy, cost of an outage) and use it ruthlessly to build the business case.
  4. Celebrate successes. Tell the story with data. Be positive about people. Leave room for resistors to come around without losing face.
  5. Exploit compelling events. When the site goes down or a compliance mandate lands, use that moment to push for the change you’ve been building toward.

The Origin: Amazon and GameDay

Jesse illustrates the framework with its first major application: Amazon’s availability program. There he created GameDay, exercises in which teams deliberately injected large-scale faults into critical infrastructure, up to and including powering off data centers. The trick was starting small: he began with the smallest groups of receptive developers and with achievable exercises, building trust and competency before scaling up to full-scale disaster simulations.

“Basically GameDay is an exercise where we injected large-scale faults into critical components of the infrastructure — in some cases pressing the big red button.”

Permission and the Katrina Lesson

Jesse closes with a story from his deployment as a task force leader during Hurricane Katrina. The New Waveland Cafe, a volunteer kitchen staffed by anarchists, was serving hot meals to thousands of people, but FEMA kept trying to shut it down because no one was “in charge.” The solution, proposed by someone at one of the major emergency management agencies, was to make every volunteer a “site director.” When FEMA asked who was in charge, someone would answer “I’m a site director,” and FEMA would deliver supplies. The cafe eventually served about 20,000 people a day.

The lesson: “Most of the time when people are saying no, what they’re really saying is, I don’t know how to say yes.” Jesse applied this at Amazon by typing “Master of Disaster” into a form as his job title — and it stuck.

Jesse’s Rule

The talk’s through-line is a single principle Jesse calls his “rule of happiness and survival”:

“Don’t fight stupid. Focus on where you can make more awesome.”

Full Transcript (AI-generated)
So this is the first time in four years that I have had stage fright, and it's at my own conference. It's actually really terrifying. I'm really glad to be here today. I'm a co-founder of Opscode, and obviously I helped get this conference started with a lot of other people. This is going to be a talk about culture and culture hacking, and Opscode and Velocity are probably my two biggest culture hacks. But I'm going to focus a lot on what you need in order to be productive hacking the cultures that you're going to go back to when this conference is over.

Just a couple of quick things. Hopefully one of these three slides will sound familiar. Either you're this girl: "Worked fine in dev, ops problem now." Or you're this sysadmin (I was that sysadmin). Or you're this dev or business person who is making Steve very sad. The interesting thing about being here together is that this is an event where, when we talk about the various components that make up performance and operations, we talk a lot about technology. But, as Allspaw and I will both say, probably to an absurd level, it's really culture. So you've already spent a day, and you'll have another half day, of getting this injected into you. This is the original DevOps love poster that they put up. You will have learned all of Steve's 28 rules. How many know all of Steve's 28 rules? Raise your hands. What about the four secret ones that he can't ever tell you? He's sworn to secrecy. Ooh, you have to ask him about that. You will definitely have automated all of the things. I hope you use Chef, but you might use something else to do it. How many here believe in metrics at this point? Raise your hands. Okay, put your hands down. Now, who doesn't believe in metrics? Who thinks that's a stupid idea, that you shouldn't measure stuff? Anyone? Okay, you. So you'll go back and you'll be like, "Oh man, our metrics totally suck right now, but we can use Ganglia, we can use Theo's stuff with Circonus, it is going to be incredible." Some of you are going to begin continuously deploying code before it is even written. Forget 10 deploys a day; it's continuous, meaning it just comes from the future. You will deploy Allspaw's code to Etsy before they even think of it. That's how cool you're going to be. If you've ever heard any of my people talk, you're going to think, "You know what? We're not even afraid of a GameDay. I will press that button. I will receive the candy." That thing does not dispense candy, by the way. It's actually an emergency power-off button. Some people are confused by that. It's weird, I don't know.

The most important thing you should take away from Velocity is an idea: we've kind of vectored toward the right shape, the right culture, for what is effective to survive and thrive on the web. In many ways it's a function of organizational constraints. In the same way that a bird's wing is optimized for flight and a bat's wing is optimized for flight, we know generally what the operations culture should be. You're going to have this deeply inside you, you're going to be super pumped, and then you are going to go back to the office, which will suck. Unless you work for Opscode, in which case it's awesome. So you're going to go back knowing stuff that is going to change your life and everybody else's life, and the very first thing you're going to want to do is completely shake everyone up: "We're doing it all wrong, and we've got to make all these changes immediately." And then a little while later you're going to send me this note. I get about 50 of these every conference cycle. It's basically, "I talked to people and they said there's absolutely no way this would work," or "I tried to implement it and now I'm on some kind of personal improvement plan," or "There was a big outage, I don't know what happened, the compliance people got involved."

The kind of sucky part is that changing culture actually takes time. I've gotten good at hacking culture mostly because I've made some really, really stupid mistakes, which I'm going to tell you about. The biggest was a belief that, hey, we're engineers, we're operators, we're people who care about infrastructure, and the desire to just rip it all out is super fun. You don't want to be stuck with the cruft. Well, actually, sorry, anyone: do you like being stuck with cruft? Mr. No-Metrics over there, I've got my eye on you. In my career over the past decade, I've had a history of choosing battles extremely poorly. It's almost weapons-grade. I was the guy who always said no to the cool new stuff, and then once I got excited about actually saying yes to things, I pretty consistently fought over the stupidest things imaginable. One of my favorite things that I tried to do was kill EC2 in its infancy, because I was an ops guy and, as I perceived it, it was a waste of resources and a security threat. So I've been the Dr. No guy, but I've also fought every single one of the stupid large organizational battles you can fight, and lost almost all of them. And one day I realized Jesse's rule of happiness and survival, which is: do not fight stupid; focus on where you can make more awesome.

When I say that and think about you going back: for those of you in organizations that are making change quickly right now, it's great. For those of you who aren't and are unhappy, the job boards are overflowing, and the interesting thing to know is that you don't need to be stuck somewhere where you're fighting stupid. There is plenty of room for more awesome for every single one of you, so just keep that in mind. Don't all go back and quit your jobs, but we are in the middle of this massive change, and it makes all of our lives better. So just remember that.

Here is how you actually change culture effectively. The first, and again this irritates the crap out of me, is that you start small. You start at the least common, most-likely-to-succeed denominator. I'm going to go through this as a list, give you some examples, and then give you the hacks for them.

The second thing, which is particularly hard for more hardcore engineers who do not socialize well, is that you need to create champions. When you're pushing these changes, if you're trying to talk about how awesome what Etsy's doing is, or any of the new things you've learned about that you want to import into your environments, it's going to need to come from more than you. You don't want to be the one person trying to be the mascot for it. I was the mascot for availability in one of my jobs, and it was a terrible thing to do to myself. What you gain power from is getting a lot of people excited about what you're seeing. That means getting them to see a world where they're using whatever new cool thing you want to implement and their life is better: they're feeling better, being more popular, getting raises, and all kinds of other stuff.

Using metrics to build confidence. One of the things we did early on with Velocity was make sure that lots of the large companies published useful data that you can take back to your executives to build a case for why you should be able to do something. If you go back through, you'll see things like the cost of an outage, or the cost of a millisecond of latency, and you can use those to build cases. The Shopzilla example showed a huge improvement in revenue as a result of improving front-end performance. So you're going to need to build a language of business metrics, and Mandi Walls is going to talk about this in some detail later today. She's an MBA; I'm a firefighter; your mileage may vary. But you want to use metrics to prove your case and, more importantly, to allow others to subscribe to it.

You want to celebrate successes, and I'll talk a little bit about this. And you want to do one thing that people get a little weird about when I say it: you want to exploit compelling events. When the site goes down, everything is broken for a really long time, and everyone's yelling at each other, you have a unique opportunity to change your organization. The good news is that those kinds of things happen all the time, so you have so many chances to say, "You know what we really need is a new incident management program," or better metrics, or "I think we should try Ganglia out," or whatever it is. Exploiting compelling events is a super trick, which we'll go into.

Where I first applied this was during my time at Amazon, where I worked on a project called the availability program and created something called GameDay. How many of you have heard of GameDay, just the phrase? Okay, some. It's been spreading more and more; Netflix has a version of this, Chaos Monkey, which is pretty cool. Basically, GameDay is an exercise where we injected large-scale faults into critical components of the infrastructure, in some cases pressing the big red button, which is pretty fun. How many of you have had a major data center failure, by the way? I love this question. Raise your hands. Yeah, okay, basically everyone. It's part of a larger discipline; this type of work isn't new to us. But it's definitely something that scares the living crap out of every single person you talk to. You say, "Hey guys, we know we want to be resilient to a single data center failure, or multiple data center failures, and to get there, right now we're going to fail one. Just a little bit. We're going to light a little fire and see what happens, see how people work and perform."

If you had started with the full-scale exercise, if this little kid had been pushing against a full-scale fire hose on his first day (they push back with about 90 pounds of force), he'd be flying around, it'd be a big disaster, it would make the news, and no one would be happy about that. So you've got to start small, with something achievable. In my case, with that program, that meant starting with the smallest groups of developers who were receptive to the ideas and who probably weren't going to destroy everything when we ran the exercise. As you get some early successes, you want to build on trust and safety. And this little girl is such a badass; that person is going to be a firefighter someday. When you run these little structured exercises, you start to build a competency and you're able to demonstrate your value to people. It doesn't matter what the program is: whether you're moving to continuous integration and continuous deployment, finally implementing source control on your infrastructure environments, doing that big JavaScript refactor you said you were going to do four years ago, or Stubbornella is yelling at you because your CSS actually makes her cry. Whatever that project is, you want to start small.

You build up, you get some early successes, and then you begin creating these champions, as I described. People like being smart, being in the know, having special knowledge, and generally kicking ass and getting things done. What I found is that when you can get people excited about what's going on, and show that what you're doing makes a measurable impact, you can start to spread it virally through their excitement. These kids go back to kindergarten and say, "You know what? I played with the fire engine and I did all the things." They're evangelizing how cool firefighting is to other people, in the same exact way that every single one of the developers I worked with went back and said, "You know what I really love? Availability engineering, and doing these resiliency exercises." They did not do this, by the way; that's a lie. But to some extent they did.

Then you move up to a little more training; you raise the bar. You say, "We've gotten you to the place where you kind of know what you're doing, so now we're going to run full-scale exercises. We're actually going to burn a house down. We're going to take a data center down. We're going to push something into production slightly faster than we have before." And you begin measuring that and seeing the impacts at a team level, where you're able to compare one team to another over time and see the deltas in performance. This is a heat map of a city showing where fire engines did not meet their response-time obligations. It's pretty clear where you don't want to live, and pretty clear where the problems are. You can do the same thing when you're presenting a case inside a business about what you want to do. You say, "Look, when we had people who went through this program, we had way faster response times and much shorter MTTR, and people were happier because they could deploy stuff at a regular interval."

You want to celebrate those successes. This is me; you can see me there. This is after a shift where we put a fire out. It's pretty fun. What's interesting is that positivity ends up being a viral adoption tool. Colin Powell said that optimism is a force multiplier, and it absolutely is. We're seeing this with cloud adoption right now: "Oh, you mean I can just type something in, get an instance, and deploy it?" That makes people's lives so much better so quickly that it's impossible to repress, and suddenly there's a big buzz spreading around it. The reason it feels so good is that they had that early success and they keep getting more power.

And then finally you can exploit those compelling events to do the hard work. Most of you probably don't know this, but the way we got fire sprinklers is that lots of people died. There was all kinds of resistance to putting sprinklers into buildings, but finally it got bad enough that a national standards body was put together, and people were willing to spend the money, spend the time, and do all the building and everything else needed to make things safer. They couldn't do it before that terrible compelling event, but they were willing to afterward. So when you're looking to make larger changes, or looking for those moments, this is how you do it. This is how you do a big scary program like powering off data centers. This is what gives you the cultural currency to make a big change. So, just to review: start small, create champions, use metrics, celebrate your successes, and exploit compelling events. Here are the hacks.

Starting small. The reason doing a very small project works is that it isn't a threat to the establishment within your organization. It's easy to ignore, easy to pass off; it's under the Somebody Else's Problem field, if you're a Douglas Adams fan. When you're first building this, you're super excited, and you encounter that first person who wants to do battle with you. They're like, "No, we totally can't ever do that because of compliance," or "We've got this weird Sarbanes-Oxley requirement," or PCI DSS, or "Our security needs are so unique that we could never use JavaScript in that way," or whatever it is. So the trick is to just call it an experiment. Minimize. Now, that's not what you actually want, because you know you're going to be running everything in production, 100%, within months. But just say, "No, no, no, it's just a short-term experiment." Don't tell them the truth. Because it is an experiment: it might not work. It'll work. So that's how you minimize the risk to the people who are going to do battle with you.

Creating champions. This is an area I kind of suck at. One of the first things I'm terrible at, and I imagine most of you are, is that you like getting heads-down and you do a bad job of getting your boss on board with what you're doing. Your first champion should be your boss. You want to say, "Hey, we're going to fix this thing. It's been broken forever. It's going to be great. Here's what I need to do it. I'm going to take a little bit of risk." You want to get them on board so that they can represent you and be that first champion as you begin delivering. At Amazon I was lucky: I had Werner as my executive sponsor for one of my projects, and he was awesome, because he would come in (he's a giant) and basically say, "You have to do what Jesse says." I loved that, and I try to do that now as an executive at my own company when I'm supporting people in the projects they're trying to do. The interesting thing here is that it's easy to forget this, and easy to have that weird antagonistic relationship where your boss is like, "Well, what's it going to do?" You've got to flip them, and if you're not able to flip them: don't fight stupid, make more awesome, go somewhere else.

This one's a little trickier: give everyone else the credit at this stage. If you get a developer up and running with continuous integration and deployment, or you implement some vastly improved front-end library, the best thing you can possibly do is spread the love as far away from you as possible, so that as many people as possible are like, "You know what, I totally did that. It was awesome, and I want to do it again and again." At Opscode we do this really clearly with our community. It's one of the best ways of activating people. We're always talking about what everyone else is doing; we make our contributions and try to be quite humble about them. This has been a huge force multiplier for us, and it will work for you every time.

The last thing: while you're giving everyone the credit, give out special status. Google totally nailed this early on with the SREs. They gave out bomber jackets and other special coats, and they had Velcro patches. Do anything you can to make people stand out because they're part of your program. It's funny how little effort it takes to make them part of your tribe, on your team and consistently advocating for you. It's a super hack. Make sure that people with special status brag about it, but also maintain an air of exclusivity. It's not a program that's open to everyone: "It's only a pilot," blah blah blah. Create a little scarcity early on. It's the best marketing you'll ever do for whatever it is you're trying to do inside your organization.

Metrics. Let me tell you the first thing not to do. I have a terrible history of this: I love emailing metrics decks out to people without context, because I'm like, "Look, I saw a thing, it's super awesome." That is the way to lose your champions right away, particularly when it's the "Impending Doom" deck, which is what I would title things like that. I still do that from time to time. The thing to understand is that humans really need numbers to glom on to and compare with other numbers. It makes people feel really safe. So find a number that makes sense: mean time to deployment; uptime, if that makes sense (it probably doesn't at this stage); the time taken for a developer to go from typing commit to deployment, because that's lost money right there. Code that's written and not deployed is wasted money. Find a number that supports your change, and then use it ruthlessly. First you're going to show value. You're going to say, "Look, this cultural change or technical change we made has incredible value. Here's what it does." And then later it's going to end up being used as a weapon, so you should anticipate that. You'll have one half of the organization using the new thing, deploying software in 6 minutes, or getting sub-4-second, sub-2-second, or sub-1-second first page load times. Then you'll have a moment where they're the superstars and anybody who doesn't do that is a jackass. Just anticipate that that's going to come, and be prepared to use it ruthlessly.

The last thing is: tell your story with data. I'm a big fan of Hans Rosling. You have to narrate the data so people can understand it. When you shave a couple of milliseconds off a load time, or do some other really powerful, transformative thing for user experience, don't just say, "We shaved 7 seconds off." Narrate a couple of sessions, show lots of different graphs, and make it a nice printed artifact you can hand around. Again, this is how you get that currency where people will believe in you, because they can look at it and go, "Wow, that's amazing. How did you do that?" And the answer is, "Well, for one thing, we're actually able to deploy the code that we write," or "We're actually able to make changes," instead of "You can start working on this in six months," or "Maybe you'll make this change." A story told with data provides a truly compelling way to make people see the light, and it gives them something to hold on to that makes them feel safe. That is the most important thing you're going to need as this spreads out and you run into the resistors who show up saying, "No way, we can never do that." Well, here's the story. What's your story? "No"? Is that your story? Is that how you want to be remembered for your contributions here, that you said no a lot? I hope not. Oh, funny story: at Amazon I said no so much that I signed the launch posters with "No" and then a little squiggle, which is my signature. So if you're ever over there interviewing, or you work at Amazon, and you see an older launch poster with a big "No" on it, that's me. Don't be that guy.

Celebrating successes. This is about telling that powerful story. You really want to pull people in. You want to say, "I can't believe how much better it was once we used this particular technology." At Opscode we do a lot of this with our customer case studies, predominantly so that we can help people see the value of what we're doing. In your case, if you need help, email me; I'm happy to help you craft that internally. What we're doing as a community right now is really nothing but a lot of storytelling. We're talking about what's happening. A lot of the tools that are coming out are great, but it's all culture first. So be prepared to tell a powerful story, and use other people's stories too: all the videos that are available, and everything else. Always be positive about people and about how they overcame the problem. People are always good, even if they're terrible. Do not attack people individually, as tempting as it is, because what happens is you alienate the people who would be able to come and help you. It should not be about the people who created the problem. They had a reason for creating it: there was something in that system, some constraint they were trying to live with, just like you're now trying to overcome a constraint. But feel free to attack the problem itself and the underlying constraint, or at least ask why you care about that at all.

The last thing on celebrating successes, and this is frustrating when you've been pushing an agenda for a long time and trying to move a large body of people over to your side: you have to leave room for people to come around. I found early in my career, trying to make big changes, that I'd be pretty mad. You'd be arguing with someone for a year about why you should do something differently, and at the end of it there was so much weirdness between you that it became hard for them to actually admit that you were right. The best possible outcome, when you take the things you're learning here and export them to the larger world, is that people can flip to your side without even knowing it happened. It's just the simplest, most obvious, clear thing; it's the right thing by default. And that is truly winning. There are great books called Crucial Conversations and Crucial Confrontations. They're woo-woo in some ways, but I recommend them. They talk a lot about how to discuss this sort of conflict with people. But don't fight stupid, and don't create it by needing to be right all the time. Give that credit away early.

So the last thing is compelling events. I said before: just wait, and you will have an opportunity to create a compelling event, or sorry, to exploit a compelling event. Well, I'll get there in a second. You beat me to it, man. All right, surprise is ruined. Stuff breaks all the time, and you can use it. But more importantly, there are all these things that are used against us all the time to say no, to block things, to do really wasteful work. The best possible use of an internal compliance mandate that you're all going to suffer through is to say, "Hey, you know what, we could do that while we do continuous integration and deployment," or "We could do that while we improve the front-end user experience." I know a lot of people who have been extremely successful in subverting what should be a terrible process and turning it into a great opportunity for change. Cloud has provided this. How many of your CEOs and CIOs are like, "Cloud, now"? Raise your hands. Oh wow, a blessed few of you. I love it. Okay, anyway: these types of migrations, seasonal scaling, whatever it is, provide great opportunities to say, "You know what, let's do it a little bit differently this time." You should use those to the best of your ability. When it comes, it's not "I told you so." When the dev pipeline finally breaks down and you're like, "You know what, you should have been using Git instead of that other horrible thing," don't do that. I mean, you can do that with your friends and be like, "I told them so," but honestly the most powerful thing is just asking, "What do we do now?" and, again, leaving them room to come around to you. You'll find that people just glom on and say, "You know what, I love that. I want things to be better too." Almost nobody inside large organizations actually wants things to be as shitty as they can be. Almost nobody. There are some people who totally love it; I don't understand those people, but I do encounter them from time to time. The last thing to understand about compelling events: this is one of Jessica Hagy's graphs. Opportunity increases with the level of upheaval. The bigger the outage, the bigger the thing that broke, the better chance you have of making sweeping changes inside the organization, because people are suddenly receptive to them in a way they never would have been before.

I'm going to talk about one last thing: permission. I'm a get-forgiveness-not-permission kind of guy, and most of you, if you're here, are probably somewhat in the same boat. I had a profound lesson about this that really shaped the way I approach organizational systems and dynamics. During Hurricane Katrina, I was deployed as a task force leader. That's another very long story, but there was a fascinating case study called the New Waveland Cafe. The New Waveland Cafe was staffed by anarchists, and FEMA desperately wanted... So let's call FEMA the enterprise, and the anarchists maybe the Velocity community. FEMA was like, "Well, who is in charge here?" And the anarchists would literally yell, all at the same time, "Nobody!" or "No one's in charge!" So then FEMA would be like, "You have to leave. We've got to set up a shelter," completely ignoring that there were thousands of people being served hot meals right there in front of them. It's one of the craziest things ever, but their process, their permission system, and everything else said that the cafe had to be within the system, conforming, and the anarchists refused to be part of any system that gave a single individual more authority than another. It was one of the craziest things I've ever seen. It was really frustrating, and then this guy with one of the major emergency management agencies simply said, "Great, let's make them all site directors." So when FEMA would come and ask, "Who's in charge here?" someone would say, "I'm a site director," and FEMA would give them supplies and materials. It worked incredibly well. They ended up serving about 20,000 people a day. It was amazing. It's a unique story, but there's a lesson here. This is a picture of a very tiny fragment of it. Those are FEMA-provided medical supplies, and a supermarket they built that was free, because they didn't believe in money. So it was a little weird there, but as long as FEMA kept bringing in stuff, it worked great. It was really cool.

The interesting lesson for me was that most of the time, when people are saying no, what they're really saying is, "I don't know how to say yes." And you find that when people have some reason or belief where they're trying to do the right thing, oftentimes you can hack it just by finding a way to do something slightly different. One example, and then I'm out of time: most companies have a wiki or an internal documentation tool. I find that simply documenting your authority in that tool is a great way of granting yourself authority over a particular project. I recommend titles like "czar," or even "Master of Disaster," which is what my business card read at Amazon. It was appropriate, and it gave me a lot of leeway, but I just typed it into a form one day and it stuck. But the point here is that the permission you need to go out and actually do crazy awesome stuff in your organizations, and overcome a lot of the stupid, is really just going to happen because you will it to be so. And the occasional obstructions you run into are usually overcomable with a little creative engineering and maybe a badge that says "site director." That's what it means to not fight stupid and to make more awesome. Thank you very much. It's been a pleasure to be here.