DevOps is dead? Nope, it is maturing ft. Jesse Robbins

Rob Zuber

The Confident Commit by Rob Zuber · Video · 37:57

"Organizations evolve like cities. You start with a few shacks in the woods. Eventually you have enough at stake that you need building codes, fire codes, a fire department, and someone who actually tests the sprinklers."

— Jesse Robbins

DevOps is not dead. It's maturing. Platform engineering is the next layer of the same idea, not a replacement for it. My conversation with Rob Zuber on what's actually changing and what isn't.

Rob Zuber hosted me on The Confident Commit in the middle of the "DevOps is dead" content marketing push. A lot of people asked me to weigh in on that narrative and I had mostly been rolling my eyes at it. This was the conversation where I said out loud what I actually think.

DevOps is not dead. It's maturing. Platform engineering is what happens when an organization gets big enough that some components of the stack deserve their own dedicated team. I've run platform teams my entire career. We did it inside Amazon and then started externalizing the components as web services. The shape of that is normal and good. What's not good is the clickbait around it, which takes a movement built on getting people to work together better and attacks them personally by declaring them dead. I find that shameful. Every CTO and CIO I know who is actually doing this work is turned off by it.

Rob and I also got into what's actually changing under the hood. Observability is still in its early days, and better instrumentation lets us build more complex distributed systems, which then drives the next round of how teams organize around the pieces. AI is going to change things, but the recurring promise of fully autonomic, self-healing systems is the self-driving car of our industry. Every advance in compute follows Jevons paradox. We make a thing cheaper and easier to consume, and we consume more of it. Cheap S3 and EC2 let us take on harder problems on top of the easier base.

The frame I keep coming back to is that organizations evolve like cities. You start with a few shacks in the woods. Eventually you have enough at stake that you need building codes, fire codes, a fire department, and someone who actually tests the sprinklers. The DevOps movement is the building-codes era of how we ship software. Platform engineering is the fire department. They're the same project.

Full Transcript (AI-generated)
Hello and welcome to The Confident Commit, the podcast for anyone who wants to join the conversation on how to deliver software better and faster. [Music] You're listening to season three, where we're all focused on teams right now. I'm your host, Rob Zuber, CTO of CircleCI, the industry leader for all things CI and CD, and today I'm joined by Jesse Robbins, general partner at Heavybit, but a lot more than that: DevOps party starter, industry luminary, founder of Chef, and so many other things. I'll let Jesse introduce himself, but first of all, thanks so much for joining me, Jesse. I'm excited to have you here.

I am super happy to be here. My history: I got my start really at Amazon. I was actually leaving the tech industry when I joined Amazon in 2001. I was going to go be a firefighter, and Amazon had an enterprise backup architect job open, so I joined that, got the title of Master of Disaster, and started working on availability and getting site reliability up to what I would think of as modern standards. Along the way I realized, starting about three years in, that we needed to shift fundamentally to a model of empowering people to work faster, sure, and do more, but with a lot of people on call. Originated the sort of PagerDuty concept there; that tool was built by a different team internally, but I ended up running large parts of that and creating incident management and Game Day, which is now chaos engineering. Then I left Amazon in 2006 and I went to Tim O'Reilly, and we pitched an idea of there being this new way that large, successful organizations build and work. We were calling it web ops at the time. We started the Velocity Conference, and that's where basically the organizing ideas for me, and a lot of other people contributing a tremendous amount, began, including that very first session that was "Dev heart Ops," Dev and Ops love, and that is the beginning, that very
first Velocity Conference, of the DevOps movement, which is a big, inclusive, amorphous movement around the idea of people working together rather than in silos in order to build faster, better, and stronger. That, along with my company Chef, which I was fortunate enough to co-found with Adam Jacob and some other folks who are just incredible, we really sort of started the party, is the way that I like to look at that. So I've had my hand on and been involved with the primary software engineering and architectural movements and their shift over the last almost 20 years of my life. And it's pretty cool to now be an investor, where I basically get to work with the smartest people who are building new things in new ways. That's what we do at Heavybit. We're a group of founders who have all built infrastructure, cloud, PaaS, et cetera, things. We like to say we help people turn visions into movements, products into platforms, and contributors into large and thriving communities. So that's what we do all day. And it was pretty cool to get to work with CircleCI at the very earliest days. We actually shared desk space together with the original team, and helped shape and drive a lot of how that evolved from the very earliest days.

Well, we obviously appreciate that and still do, and are still close to many of the other, I guess, members of the Heavybit community. So much has come out of that for us, and since then amazing companies have participated, so quite a ride. One of the things we want to talk about today is our current CircleCI State of Software Delivery report, and there are some interesting takeaways there. But I'm actually going to jump to the end, just based on a
couple things you said there, like the "Dev heart Ops." I'm going to start with that, and kind of the original notion. One concept that I think about a lot is that the transitions that I rode through I understand much better than the ones that happened before my time, if that makes sense. So I was part of dev when ops was our enemy, you know what I mean, versus now. You devs were breaking everything, right? All I needed to do was slow you down so that the site would stay up. So a real, concrete understanding, a true belief: I see how things are better because I was there in the before. I would say that about lots of different things. And so what I'm curious about, you're nodding and it feels like that's resonating, but then too, how people talk about DevOps today. At some point we lose ownership, if you will. You get things like DevOps teams versus devs and ops working together. But also there's been this recent trend, we talk in the report a lot about it. We're going to go right there. Well, let's get into it, this is exciting. So there's this recent trend of saying DevOps is dead because now we have this new way, we have platform teams, whatever. I'm curious for your take. Is this an evolution of the thing we've been doing all along, or is there really something that we're fundamentally changing our perspective on?

All right, so many people have asked me to weigh in on this, and I have really just mostly rolled my eyes when it started. So let me start by saying that Dev and Ops and DevSecOps, the idea of having teams working together around common purposes, whether you're using tools or anything else, that idea is fundamentally a human practice, and there are things that we've all built and tried to implement that make that easier to do well. But there's this natural evolution of the
way that services get delivered, where you start at a certain scale. You're a small team, small organization, and you have multiple teams maintaining multiple components of a stack, a platform if you will. Over time, as you get bigger and more sophisticated, it becomes really obvious that there are certain infrastructural or software stack or other components that are part of that platform that need their own dedicated team, their own specialists, et cetera. And so I have managed and organized platform engineering teams throughout my career. We did this internally at Amazon, and then started externalizing that by offering web service components externally. So the idea itself is very straightforward. It's welcome, it's normal, it is literally the evolution of a maturing model as organizations get larger. But even at small startup scale you have people running and building and maintaining the platform components on which you have customers that consume those. That is normal and great. And it is coupled with a very irritating content marketing strategy driven by a single startup whose name I will not name. I truly am sickened by it, and the reason I am sickened by the content marketing strategy, which I will acknowledge has been effective, is that it takes people, it literally takes people and this inclusive, fundamental movement that has been about getting people to work together better and more collaboratively, and it attacks them personally and says that they are dead. I will admit that I have been screen-capping all this garbage that these people have been putting out. It's clickbait, pure and simple. So on the one hand, what I totally agree with and am excited to see is we're now seeing this entire concept mature around DevOps. We're seeing now more people building and
taking ownership of more of their infrastructure stacks or software and delivery stacks. That's all great. That is platform engineering, yes, absolutely, and it's even a trend that Gartner is supporting, et cetera. And then there's this separate part, which is just offensive. It literally is triggering for people who are in their careers trying to build themselves up, trying to make things better. The DevOps movement is still just getting going. It's not done. And then there are these people that are just engaged in terrible content marketing strategies that I'm sure are working for them but are actually poisoning the discussion, because what people want to hear and learn about is, what am I supposed to do in my career? And you've got someone with these repeated little clickbait things popping up. So it's real simple. I think those practices around the "DevOps is dead" stuff are shameful, and every engineering CIO, CTO, et cetera that I know, every customer that I know that actually is doing this, is really turned off by this. So maybe that will affect the business strategy of a company that's pursuing it. But the larger piece is great. Other than that, I think it's awesome that we are seeing the maturing of a platform approach that is, again, the same thing that every large company does. We're just now seeing it start to distribute out. So there you go, you got me to talk about it.

Well, I appreciate that, and, more so, if I'm honest, I appreciate the entire DevOps movement. Speaking of having been in the before times and the after times, it's certainly a thing I would not want to move away from, right? Like that return to silos. Great, right? Yeah, fantastic. No. But like, I'm responsible for something and I can take away this thing that's high cognitive load for you and make it stable and reliable so you can just depend on it? Fantastic. But let's also talk about your needs. I'm
not just gonna go off and build a thing, right? So I think the cultural shift is, when anyone has tried to put that in a box, it's always frustrating, being like, no, this is about changing the fundamental way that we work together, right? And I think that we can stick to that. So speaking of revolution and these made-up revolutions, I do think we have had some very significant changes recently, but they haven't been our evolution of how we build things, but rather the impacts of changes in the outside world, effectively starting from 2020: pandemic, economic upswing, economic downswing, just a lot of forces that have impacted all of us. What we've seen from teams as we look at this from our State of Software Delivery report is teams are getting better and better. They're moving faster, they're performing better through all of that. And I'm curious, is there anything in your mind from these outside impacts that's really driving the way that teams are thinking and performing?

So again, the most important part is that I think it's pretty conclusively proven that teams that operate together, that are aligned around mutual purpose, that are focused on similar metrics and outcomes, perform better, right? This isn't controversial. This is basic management going back 100 years. The most important shifts are understanding that we build these highly complex interdependent systems, and successful organizations are ones that can ship faster and deal with change better. And dealing with change includes existential threats and issues, it includes outages, it includes major business changes. And so the ideas that we started and organized around in the early days of DevOps, the ideas of incident management and chaos engineering and resiliency built into
the framework, and then resiliency and knowledge built into the way people work, are the only way that organizations are going to survive in modern times. The example we used to give is Blockbuster versus Netflix, right? Being able to respond to fundamental changes in the industry in the time that you're able to. I also want to return and just say, the fact is that it's going to be harder during periods of scarcity for people to operate at their best. The organizations that recognize that and continue to invest in and understand how to build high-performing cultures that build great things and keep them running, that is going to be a survival imperative. And so I think what we've been building sticks. I think that there will certainly be some backslides, and I am sad that in the past you really could just go get another job somewhere better, and now it's a little harder when you're in an organization that maybe isn't working out so well, and people have become shitty and siloed and difficult, all the things that are why we started the movement and set of practices in the beginning. Those regressions begin to occur, but you may not be able to just quit your job that day and get another one the next day. But in the end I think that the pattern sticks, and the best organizations rise out of helping great teams do really great work together.

Absolutely. The thing that I take from that is these practices were great in the good times; they're even more valuable in the down times. Us supporting each other to solve the common goal versus throwing things over the wall, this is what really matters. And it's a stress test, right? Were you just paying lip service to that goal, or is that how you
fundamentally built your organization and your culture? And to your point, the ones that survive through these times are going to be the ones where it's truly the belief of the organization. So another thing that we've noticed, or I guess it sticks out for me as we look at our data: people are getting better and better at this every year in terms of shipping velocity. There has to be a point of diminishing returns. At some point we're really great at moving software through the system, saying this is the thing we're going to build, we built it in high quality, we pushed it out. What's the next frontier for those companies that are at that level? There are many that are working their way in and just starting, but for the ones that are really pushing the envelope, where's their next set of challenges in your mind?

So I think of them more as opportunities, right? Right now I think we're still in the early days of what's happening with observability broadly and the tectonic shifts that have happened, where if you've been doing it long enough you're like, well, DTrace had this, Solaris had this, whatever. But the actual idea that you can do full-scale, end-to-end instrumentation is absolutely continuing to explode and be really exciting. What we found is that in the early days, as we were getting CI/CD started and this broader bucket of approaches, we did what we could with what we had, but we didn't really have a lot of great understanding of the system as it was running, right? It's easy to tell in small ways, not big ones. As we get better instrumentation, it enables us to build more complex distributed systems, which allow you to break things into smaller components, build, optimize, and
scale them better. And so where I think we're going is we're going to start to see a step increase in complexity, with more things distributed to the edge, with more shifts in the client model. And so what's going to start to happen is all of the new challenges that'll emerge when it's like, well, we can build this app that runs at four times the speed at a tenth the cost, but it means we now have to instrument a lot more client stuff, we have to operate and scale differently. That's going to then drive the next wave of collaboration and people working together as they build new things. So I think observability drives complexity. Better tooling: the better you understand the machine, the more sophisticated a machine you can build. And then that comes right back into how you're organizing your teams around those individual components.

Got it, that makes a ton of sense. Whenever anyone says things are going to get more complex, I have to say I feel a little nervous, but I'll be thankful for the tooling that we're building to handle it.

So that's the point, that nervousness. You immediately go, I just need to be able to understand it, right? I still need to be able to have time to recover really fast, I need to be able to reduce risk, I have to do all these other elements, right? And the reason that we have gates currently is because we are gated by our ability to understand and manipulate the system. And yeah, that's what drives it, right?

Well, speaking of, I guess, people pushing the front of the envelope, you had flagged this article about Google's recent announcement on their ratio of SREs to software developers, and there's talk of AI and automation and other things that are making this problem go away. I'm curious, is that telling us
something about a trend that we should all be thinking about, or does this feel a little bit, back to the previous question, like a response to just macroeconomics, and maybe moving away from some of the things we've learned?

I don't know of a year where a hundred companies haven't said, we're going to build an autonomic, self-healing, self-managing system that fully does it all. There's been this promise, just like self-driving cars, for a really long time. My personal belief is the idea of the evolved SRE role gets really interesting as organizations get better at operating at larger scale. We did a paper, myself, the Google team, and another team, a couple years ago in ACM Queue, and it was about why we evolved our chaos engineering, scale day, game day testing models. Eventually you get so good at keeping things running that you actually forget how to operate when they don't, and so you have to break things deliberately. I don't know if you can see it in my background behind me, but I have a fire helmet there. It's my fire helmet. We have to drill and train as firefighters routinely in order to understand when things break. When we talk about AI and we talk about these really exciting new models, certainly things are going to change, and I think they're going to change in significant ways. The idea, however, that we're going to build fully autonomic systems that are just like a self-driving car, and you're just going to express what you need and it's going to be automatically resolved, that would be awesome if that happens, because it'll let us work on a new set of more challenging problems. I find it unlikely that that's actually how things are going to play through, only because no other advance in human history has done that,
number one. And number two, humans are really good at building more complicated things the moment we get a thing right. When S3 and EC2 launched, and I was there for that, I was S3's first internal customer at Amazon, it's not like we suddenly said, great, compute is really easy and now it's cheap, and therefore we'll just use exactly the same amount of it as we did before.

Yeah. Good news, hey everyone, it's easier to rack and stack servers now, so there's nothing new to do.

That is not what happened, right? So there's this thing that drives a lot of understanding of how the evolution of compute works, which is Jevons paradox, which basically says the easier you make it to consume a thing, the more you make it a utility, the more people consume it. I suspect it's going to be kind of like that, where we solve one category of problem and we create a bunch of new ones. I would love nothing more than for CircleCI to launch the auto build remediator tool, where every time something fails in the CI/CD pipeline the AI goes in and fixes all of it for you and then delivers it. When you have that, I'm super excited. I'm sure that that's 100% on your roadmap, and there's probably some work to do between that feature releasing itself and maintaining itself and where we are right now.

Yeah, absolutely, absolutely. I think, well, yes, 100%, our willingness, not just CircleCI's but everyone's, to say, oh, that's fantastic, now I have a building block to build something new. To your point, it's human nature. We want to build the new exciting thing, and now we're not limited. I was never excited about flying across the country to install these blade servers at a data center. It was just necessary to allow us to do the next thing.

For those that just have no idea what he's
talking about: instead of using EC2, before, we had to go to a place where the servers live and physically install them, and that was a part of your job as a senior engineer. Yeah, and I stopped having to do that in 2004, so that was good times for me. But man, it was fun in those data center days.

That's right. Well, I always got sick for some reason. I think somewhere between the heat of the machines and the AC coming from the recycled air, it's not fun. Don't try it. But somebody's doing it. It's just that we're not doing it now, right? I'm pretty sure that that Exodus data center probably got rolled into us-east-1 at some point anyway.

Yep. Awesome. So one last thing on the team front that we saw out of this report that I'd love to get your take on before we start to wrap up is this notion of critical mass. You were at Amazon for the, I don't know how many people it actually takes to eat two pizzas, but that concept of, we need a team that's big enough to do the thing. I'm curious if you have any of your own insights in terms of what happens at that size. Is it just total capacity to get past the maintenance work? Is it that you have enough diverse perspectives to really solve all the problems? The velocity of smaller teams relative to the velocity of larger teams is not linear, right? There's a point where it feels like it really tips over and teams start to fly. I have my own insights based on teams I've seen, but I'm curious what you've seen from a larger perspective in the industry.

All right, well, let's address the controversy first. So what you're describing, the Amazon two-pizza team model, is the idea that a small team of developers, product people, et cetera can operate like a startup does inside of a large
organization. And what do they use in order to be able to do that? Is it a platform? Oh man, who makes that? Who makes that platform? Is it people that vend a service to other people, almost like a platform engineering team? So that song is as old as time, right? The interesting thing for me, as a person who makes his living changing the way people work, ultimately that's what I do, whether I'm doing it as an investor and helping make those things, or a founder, or whatever, is that organizations get real slow as decision making slows down, because there are more stakeholders. And so for me, there are ideal team sizes, who do a couple of things pretty consistently. The first one is breaking pieces into the smallest possible layers and pushing complexity up the stack. What you want is lots of small, simple components on which you build and operate a platform, and then you want to make it very easy to understand, manage, and scale that. For organizations to work well, it needs to be really easy to ship software, and it's often easiest to do that by delivering small pieces that then connect together farther up the stack. That's not new. That's a very well understood engineering discipline, and I think it's proven in The Mythical Man-Month and every subsequent engineering treatise. What I'll say is I watch teams and organizations struggle when they accumulate a combination of organizational, technical, and, we'll call it, cultural debt. When things are hard to ship, it usually requires a very large mindset shift to survive, and often we see these calcified organizations that slow down and forget what made them great, get slower and slower until either something breaks or they fail. That's sort of the death spiral, and
it doesn't necessarily have to be technical. It can just be, oh, it's really hard to make product decisions. Any organization that does things slowly is the worst kind of organization. Well, any organization that does things slowly better have a good reason for that slowness. It better be life safety, regulatory. It has to be rooted in a business constraint. Organizations should move slowly and carefully and deliberately when that's how they need to operate to survive and thrive. But organizations in competitive industries that end up building this kind of calcification and decision-making hierarchies and layers and layers of management, and they do that for no reason, or because that was what their hiring budget was, those become inefficient and difficult to get things done in, and people leave or they fail. So I personally believe that the magic happens not with any specific team size, but really just: are you able to ship frequently, easily, and safely, and what do you need to invest in in order to make that faster and wider? Sometimes that's internal platform teams, sometimes that's using external services, changing workflows and practices, and sometimes it's just saying we need to have fewer meetings, right? Why are we having a meeting about a meeting for a meeting? The revolution for me at Amazon began when it was like, we want developers to be able to push code to production, and I'm like, okay, the only way we can do that, and we can start building up the guardrails and test infrastructure and what we called CI/CD back then, but it means you've got to be on call for your own services. And so what became the SRE team was 14 people when I left, 14 for all of that, and that
was because we were then in the business of empowering others and then responding when it was more like a fire department call. So I tend to think things evolve, frankly, the way cities do. You start off with a bunch of shacks in the woods, and slowly you start building a civilization, and eventually you start building enough risk that you need fire codes and building codes and a fire department and people that are going to test the sprinklers, because they're critical life safety elements. And no, you can't build a skyscraper and have people in it if you don't have sprinklers and panic hardware on the doors. So that's kind of how it evolves. It evolves like a city does.

You've established your credibility for the quality of your firefighting metaphors, but I noticed they're definitely liberally included throughout all of that.

The joke is, how do you tell someone's a firefighter? How? Don't worry, they'll tell you. [Laughter]

Love it. Well, to wrap up with our team spotlight, and this could be technology or something else you've done in your life, can you tell us about the best team that you've been on? It can't be the current one that you're working with. And the specific moment where you realized this is a fantastic team. What was it about the team?

During Hurricane Katrina, I was a task force leader on the ground in Mississippi, and I was part of the FEMA response that was not working well. There were these incredible volunteer organizations that were also really problematic because they were outside of the FEMA badge system. And I had this moment where this anarchist collective that had shown up was feeding 5,000 people a day in Waveland, and that included the responders that were helping, and the National Guard started hanging out. This is like a hippie commune type thing. It was a Burning Man style
response. The problem was that FEMA wanted to help them and give them supplies and resources and generators and all kinds of other stuff, but they were a literal anarchist collective. So if you asked them, "Who's in charge here? We need someone to sign for this multi-million dollar piece of equipment that the government would like you to have, and all this food and this airstrip and all this other stuff," they would yell at the same time, "No one's in charge, we're an anarchist collective!" And watching the National Guard try to interact with these anarchists and just try to solve the mission, which is that there are hungry and unhoused people and we need to solve that problem for them...

There was this senior leader within FEMA who was part of the response team, and I remember he's like, "We have all this stuff that we can just give them, and we will triple our daily number of people fed in the middle of this crisis." So he goes and meets with them and says, "What if we make you all directors, every one of you? When the National Guard or whoever shows up, we're just going to make you a director of your thing. We're going to FEMA badge you, and when someone asks you who's in charge, don't say 'No one is, we're anarchists.' Just say, 'Well, I'm a director,' and then you can sign for all this aid and we can help a lot more people."

I watched this entire organization shift. They all said okay, and then suddenly helicopters are flying in and unloading all this stuff, and the Army Corps of Engineers has built a new kitchen for them, with fridges and new bathrooms. Suddenly we went to serving something like 20,000 people a day. It was this huge deal, and all it took was a leader who was willing to relax the part where it was all the silos and "here are all the rules," and just say: cool, I get
you guys want to organize yourselves, and we just need someone to sign for stuff, and if you're not willing to have a single person, I want to empower all of you to do that. It was truly one of the most inspirational moments in my life. There I am, I've got PTSD, completely wrecked, along with the 26-person team that I'm leading, just banging our heads trying to solve this, and it was as simple as, "Look, you're all directors."

When I think about high-performing teams, it's the ones that have that degree of resilience and flexibility and understanding of what the goal is. The goal is not to have a unified command structure that conforms to the FEMA deployment plan. It is to feed and house people, and what do you need to do in order to get that done? And if you think about that in terms of what we do, it's the same thing: how do you achieve the mission, how do you organize and support people in the right way, and how do you do it in a way that you actually get the job done? So that's the best team that I've ever worked with.

Well, that is an amazing story. I'm just going to let it hang there for a second and call it there. Jesse, I knew this was going to be awesome, but it's far more awesome than I even thought it was going to be. Thank you so much for joining. Thanks to everyone for tuning in. If you enjoyed the podcast, share with your network and subscribe on your podcast provider of choice. If you want us to talk to someone else or talk about something else, find us on Twitter at CircleCI. Jesse, thanks again.

Thank you. And hey, if you're building an enterprise infrastructure or developer tools startup and you want to be as big and successful as CircleCI, go ahead and reach out. We're happy...

Yeah, we appreciate you, Heavybit and Jesse, and highly recommend. Awesome, thanks again.

[Music]

Topics