Jason & Eiso talk about Velocity in Software Engineering: what it means, how to measure it, why data and metrics won't magically solve all your problems, and what engineering leaders can do to help their teams go faster.
Engineering velocity is effectively this box in the middle of the mechanical parts that we have largely called CI/CD, with some inputs and outputs. I would expand that box from the moment a ticket gets picked up, the moment we decide to work on something in engineering, 'til it reaches that end user.
Certain organizations say they're agile, they'll say they're nimble, they'll say they're continuous, and what they have is a series of gated waterfalls.
You have to submit the PR, it's gotta get approved, it goes into CI/CD before or after, depending on which of the systems you're running. From there, it gets set to be picked up or is automatically picked up, depending on how sophisticated your system is. Then it goes to the next step, then the next step, then the next step. And each one of those steps is either a manual or automatic approval, but there's some approval process to get this thing going.
The majority of tech companies today, even if they truly wanted to, can't actually automate everything because of compliance reasons. This is the part that we never speak about. If you are actually going through SOC 2 Type II compliance or HIPAA, there are all of a sudden gates that you have to go through. And this means that some things will never become fully automated.
Stepping back, we gotta understand again what the goal is. The goal is to get stuff into customers' hands, and our goal is to get it into the customers' hands as quickly as possible, as safely as possible, with as few bugs as possible. That's actually what we're trying to do. It's not, "we need to review each other's PRs and we need to have it go through the CI this way." Oh man, I could rant about that all day.
Everybody struggles because they assume that the moment we start measuring [velocity] and exposing the data to the engineering org, everybody all of a sudden is magically going to improve. And there's something that everybody misses: every engineering leader I've ever come across, anywhere, is overwhelmed.
This mental model is mentioned by Eiso in the episode as an actionable way for engineering leaders to use metrics to continuously create positive changes in their organization.
You can read the full blog post on this process (which includes a real example), but here's a quick rundown:
1. Identify what you need to improve.
2. Discuss and communicate it with everyone relevant.
3. Decide: someone steps up and makes the call.
4. Align the org on the goal, its north star metric, and the inputs to it.
5. Act: let the initiatives come from engineers.
6. Measure the impact, then repeat.
Want to dig deeper into engineering metrics and not sure where to start? Here are some books, blogs, and newsletters we recommend:
📘 Accelerate: The Science of Lean Software and DevOps
Narrator: Welcome to Developing Leadership, the podcast for engineering leaders where Eiso Kant and Jason Warner share their lessons on the ins and outs of managing software teams. On today's episode, Jason and Eiso go on a deep dive into engineering velocity, from the moment a ticket gets picked up, to when it impacts the end user. Keep listening to learn more about velocity, how to measure it, and what bottlenecks will inevitably emerge as you scale.
The guys also share their views on universal patterns that slow down engineering orgs and offer tips on how to improve the flow of your systems. As always, this episode comes with accompanying show notes, with a deep dive into the main topics, mental models, and key moments from the episode. Find them at developingleadership.co or via the link in the episode description.
Eiso: Hi everyone. Welcome back to another episode of Developing Leadership. Today it's just Jason and me with you, talking about engineering velocity. Jason, what sparked this conversation? Why did we wanna do this topic?
Jason: If I remember correctly, we got a Twitter ask about this one. Obviously it's a topic that you and I can go into in depth. People ask us in various forms, but I think specifically we got a Twitter ask on one of our threads.
Eiso: To be honest, engineering velocity is probably the term I hear the most in my day-to-day. And with a lot of the companies that we work with, our overarching goal ends up being to improve their velocity. Let's maybe start by defining it. How would you define it?
Jason: Oh wow, engineering velocity. It'd be hard for me to keep calling it engineering velocity at the end of the day, given my past 10 years, because I think of it as organizational velocity: the ability of any organization to take an idea and get it all the way out to customers. That's how I start to really define it, because I don't think it's exclusively the domain of engineering. But let's get to engineering velocity real quickly.
Engineering velocity to me is effectively this box in the middle of mechanical parts that we have largely called CI/CD, et cetera, with some inputs and outputs. And inside there is where we're gonna break it down: what is engineering velocity?
Eiso: And to me, I would expand that box even further. To put it very practically: from the moment a ticket gets picked up, the moment we decide to work on something in engineering, 'til it reaches that end customer, that end user.
There's more to it if we get into organizational velocity, from the idea stage, et cetera. But I mean from the time something is ready for an engineering organization to work on and get out, to the moment it actually hits that customer.
Jason: And it's probably good for us to try to constrain this conversation to the engineering side of the fence today, because we can get into the organizational stuff as well. But there's a lot to pick apart just inside the engineering boxes that we talked about. If we wanna get into the other side, that's probably a topic for another day, but there's a lot to unpack on that side of the fence too.
Eiso: I agree. And as you know, my jam is how you help engineering orgs continuously improve, and for many that's their velocity, and it often starts with measuring it. You said something really important: inputs into the system versus outputs. I obviously have a lot of opinions on this, but talk me through some of the things that you believe are really necessary to baseline-understand what your velocity is, and let's go from there.
Jason: What I believe, in various areas of life but specifically applied to engineering, is that the larger the change, the slower it will go, by its very nature. It also increases the risk, slows things down, and increases the likelihood that it doesn't happen on time or to satisfaction. I believe in smaller changes. Now, this is at heart the principle of the continuous delivery ethos, a lot of the agile philosophies, et cetera.
But in practice it's about smaller changes that can be made, at increasing speeds. The inputs into that have varying parts as well, like what does the organization believe? But also, and I'm gonna limit this to engineers at the moment: "Hey, when do you actually commit your code, or when do you put a PR up? What do your systems look like?"
But imagine theoretically, and there's no world in which this works today, but theoretically you had a system where every time you pushed a key on your keyboard, you could release to production on the other end of the pipe instantaneously, if all the tests passed and everything worked, et cetera.
Theoretically that's the perfect engineering system, to a degree. There's no world in which we're anywhere close to that. In fact, on a spectrum of zero to 100, 100 being we have this, we're probably at 10%, to be honest. But that's how I like to think about engineering velocity: how fast can an engineer go from putting something down on their keyboard, pressing a button or two or three, to this thing getting released to production.
Eiso: Yeah, and I think it's a good way of thinking about it. And there's an interesting guest, by the way, we should have on one day: Amjad from Replit, who is essentially taking a completely different angle at trying to build that.
But we're not seeing that in organizations yet. What we're seeing in organizations, in a very practical sense, is that it starts with a ticket being picked up from a backlog, an initial commit being made on someone's local machine. We then end up at some moment with a pull request, either a draft PR or an actual PR opened.
And that's when we start seeing the first set of systems come into play: the human side of things, code review, and then the CI/CD side of things, which is everything from running tests to running checks to potentially deploying to ephemeral environments or staging, before merge even.
And then we get to merge, and then we get to really the more CI/CD path to production. Let's focus maybe on that first part, from an initial commit 'til something gets merged. What are some of the thoughts that you have on velocity there and on improving it?
Jason: Think about what you just described, which is what we think happens in these systems. Now think about some of the ways that other organizations, I wanna say, gate these.
What it looks like, practically speaking, inside of certain organizations is they say they're agile, they'll say they're nimble, they'll say they're continuous, and what they have is a series of gated waterfalls. That's exactly what they have. You have to submit the PR, it's gotta get approved, it goes into CI/CD before or after, depending on which of the systems you're running. From there, it gets set to be picked up or is automatically picked up, depending on how sophisticated your system is. Then it goes to the next step, then the next step, then the next step. And each one of those steps is either a manual or automatic approval, but there's some approval process to get this thing going.
And in many organizations these are still manual, and there's an actual group or set of people who are authorized to approve these things. So let's start there. Let's start looking at which of these things are still manual and what we can do to automate some of them.
Eiso: And what's interesting, 'cause I've seen this now at hundreds of companies at this point, is that there's a couple of things that always come up. First is that the majority of tech companies today, even if they truly wanted to, can't actually automate everything because of compliance reasons. This is the part that we never speak about. But all of a sudden it's like, "Wait a second": if you are actually going through SOC 2 Type II compliance or HIPAA or things like that, there are all of a sudden gates that you have to go through, which means that some of these things will never become fully automated.
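To make that concrete, here's a minimal sketch of what one of those compliance gates can look like inside a deploy script: a human approval that gets recorded before anything ships. The file path, release ID, and `require_approval` helper are all hypothetical, purely for illustration.

```python
# Hypothetical sketch: a deploy step that blocks until a human approval
# is recorded, producing the audit trail a SOC 2 / HIPAA audit asks for.
import json
import sys
from datetime import datetime, timezone

AUDIT_LOG = "deploy_approvals.jsonl"  # hypothetical audit-log location

def require_approval(release_id: str, approver: str) -> None:
    """Record who approved which release, and when; abort the deploy otherwise."""
    answer = input(f"{approver}, approve release {release_id} to production? [y/N] ")
    if answer.strip().lower() != "y":
        sys.exit(f"Release {release_id} not approved; aborting deploy.")
    entry = {
        "release": release_id,
        "approver": approver,
        "approved_at": datetime.now(timezone.utc).isoformat(),
    }
    with open(AUDIT_LOG, "a") as log:
        log.write(json.dumps(entry) + "\n")  # append-only evidence for the auditor

if __name__ == "__main__":
    require_approval(release_id="2024-09-22.1", approver="release-manager")
    print("Approval recorded; continuing with deploy...")
```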
But to me, the thing that always strikes me, and it's one of the main bottlenecks that I see in velocity time and time again, is what we see when we look at the cycle time of different teams, cycle time being from initial commit to production. What you end up seeing is a team that has a cycle time of, say it's a pretty fast-moving team, 36 hours from initial commit to production. Then you look at the different code bases that they're contributing to, and you break down the cycle time for those different code bases, and you see, "Hey, there's 12 hours on this code base, but there's four days on the other code base." And every single time that has to do with ownership and dependency.
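As an illustration of that breakdown, here's a minimal sketch that computes median cycle time per code base; the `merged_prs` data is a stand-in for timestamps you'd pull from your Git host and deploy system.

```python
# Hypothetical sketch: break median cycle time down per repository.
# "Cycle time" here = hours from initial commit to production deploy.
from collections import defaultdict
from statistics import median

# Stand-in data; in practice these come from your Git host / deploy logs.
merged_prs = [
    {"repo": "api", "first_commit_h": 0, "deployed_h": 12},
    {"repo": "api", "first_commit_h": 0, "deployed_h": 10},
    {"repo": "legacy-billing", "first_commit_h": 0, "deployed_h": 96},
    {"repo": "legacy-billing", "first_commit_h": 0, "deployed_h": 90},
]

by_repo = defaultdict(list)
for pr in merged_prs:
    by_repo[pr["repo"]].append(pr["deployed_h"] - pr["first_commit_h"])

for repo, times in sorted(by_repo.items()):
    print(f"{repo}: median cycle time {median(times):.0f}h over {len(times)} PRs")
# A 12h repo sitting next to a 90h+ repo usually points at ownership and
# dependency waits, not at the team itself being slow.
```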
I mean, it's very rare these days that we, as a small team, actually get to fully own all the code bases that we touch, right? Teams are never really more than five, six people at the team level. And it seems that one of the biggest bottlenecks is just waiting for the review, waiting for the merge approval from someone on another team who isn't the person sitting, these days virtually, next to you.
I'm curious, why is that so big? Is this just because we're teaching people that your tasks take priority over everything else, and we're no longer understanding that we're all working towards the same thing? Where do you think this sits?
Jason: Well, wow, that's an interesting question that one probably can't give a fully articulated answer to, but I can give a semi-articulated answer, because I do get this question a lot.
But I get this question in the form of, "Hey, we're a small team, a small founding team. We grew from three or four engineers to 15, 20, 25. All of a sudden our cycle time is terrible. We can't release stuff to production anymore. What do we do?" Okay, dive in. Where's the bottleneck? It seems to be in the review process. That's where it usually creeps in for the very first time.
What happens is people submit their PR for review and immediately go on to their next task. This is a uniform, universal pattern that emerges in these organizations. So then: who's doing the reviews? When are you doing the reviews? All that sort of stuff.
And I don't know if this is industry-wide, but it seems to be a pattern that has happened. I don't know why, but I would suspect it's because if you're growing like that, you're starting to get too busy; you're starting to worry about your own velocity, which impacts the company, which impacts the projects, but you're not thinking it through; it's still one link in an entire chain. One of the things I'd recommend in those cases, as odd as it sounds, is that you dedicate PR time a couple of days a week to each person, each team, and you say, "Hey, we're all just gonna go in and review PRs on Monday afternoons, Tuesday afternoons, Wednesday afternoons, Thursday afternoons, whatever it may be, for like three hours."
And that sounds heavy-handed, and it sounds kind of terrible. But instantaneously, as soon as you implement that, almost every single one of those organizations' velocity increases through the roof, because finally somebody's reviewing these. And you can't soft-shoe it. You can't say, "We would love it if you would all review each other's PRs," because that will last for about a week, two maybe. And then you'll have to do it again.
Eiso: And there's something else in those hyper- or fast-growth kind of organizations that I see time and time again, which I find really fascinating that the data tells us. When they're five or 10 people, it's one other person reviewing the PR.
But now they're growing, and then it becomes two people. And at almost every organization that recently grew to a hundred engineers, all of a sudden you see it's three people on each PR. And if you ask why, it was actually for good reasons. Like, "Hey, we're onboarding new people, we want knowledge sharing, we wanna get more people involved, we wanna get them learning." But what ends up happening is that what was supposed to be a temporary thing, while people were onboarding and were supposed to get in and review some more PRs than usual, all of a sudden becomes the organization's policy: "two people should review" becomes the default for every PR-
... which is complete madness, because that holds true for the single-character change of a typo all of a sudden. We become super dogmatic, and automation doesn't always help here either, to be honest.
Jason: And this goes to what I was saying before too, which is that smaller atomic changes tend to go better. But I resist giving this advice to folks, saying, "Hey, implement a review time," because I don't want that to become law, where all of a sudden you're a thousand people and you say, "Well, we do Tuesday afternoon review sessions." Like, no. We needed to understand the problem we were solving; this should not have been codified into law as the company grows.
Stepping back, we gotta understand again what the goal is. The goal is to get stuff into customers' hands, and our goal is to get it into the customers' hands as quickly as possible, as safely as possible, with as few bugs and as high quality as possible. That's actually what we're trying to do. It's not "we need to review each other's PRs and we need to have it go through the CI this way." Oh man, I could rant about that all day.
Eiso: I mean, dogma is the enemy of our industry. To me, another thing that I see all the time in the data, and this is in almost any organization, is that one person in the org who reviews more than anyone else. If it's a 20-person org, it's one person; if it's a 50- or 100-person org, it's two or three people. But they're an order of magnitude above everyone else in terms of the number of PRs that they review.
And there is usually an engineering leader there who's aware of this but isn't comfortable saying to the most senior person, who's been amazing, who's been doing all of these reviews and contributing to so many great parts of the codebase, "Hey, we need you to stop doing this, because you are the bottleneck now." Right?
They're often the people who are most senior and most respected in the company, and have in many cases been extremely valuable. But now engineering leadership doesn't dare to say, "Hey, you need to stop doing this." And this is also one of those quick fixes that I see time and time again, like your review time during the week-
Assign a person to reviews.
Yeah. Assign a person to reviews, and make sure that it's actually balanced, right? Even if it's round robin or something with some simple rules and automation, so that it isn't just whoever picks it up, but you have a balance in who does it.
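A minimal sketch of that kind of balanced, round-robin assignment; the roster is hypothetical, and a real version would run as a webhook or scheduled job against your Git host.

```python
# Hypothetical sketch: round-robin reviewer assignment that skips the author,
# so review load is spread evenly instead of landing on whoever picks it up.
from itertools import cycle

REVIEWERS = ["alice", "bob", "carol", "dan"]  # hypothetical team roster
_rotation = cycle(REVIEWERS)

def assign_reviewer(pr_author: str) -> str:
    """Pick the next reviewer in rotation, skipping the PR's own author."""
    for _ in range(len(REVIEWERS)):
        candidate = next(_rotation)
        if candidate != pr_author:
            return candidate
    raise ValueError("No eligible reviewer found")

# Example: a day's PRs end up spread across the team, not on one senior person.
for i, author in enumerate(["alice", "bob", "alice", "carol"] * 2):
    print(f"PR #{i + 1} by {author} -> reviewer: {assign_reviewer(author)}")
```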
Jason: I was discussing this with, let's call it, a mid-stage company a while back, and one of the things we were discussing was whether or not to make it so that if you were primary or secondary on call, you weren't coding that week; you were looking at the systems, but maybe you were doing reviews too. The application engineering team had its own on-call, 'cause that's the way this organization had constructed it: infrastructure had their primary and secondary on call, and so did application engineering. I kind of tend to like that approach, for what it's worth.
So if you were on the application engineering side of the world, your on-call for that week or two was to do reviews. And I was like, "Yeah, that's actually a more interesting way to do this" than to assign a singular person and say, "Thank you, Mr. or Mrs. Senior Software Engineer, you are now reviewing PRs as your primary function every day."
And obviously, assigning the on-call person-
Yeah.
... to it doesn't fix the problem either, but it was at least a different approach that didn't box it in the same way. But at the end of the day we're talking about the exact same challenge, which is: how do we make it so that parts of these systems flow better? Effectively, what we have here is this want, and I think GitHub is one of the main contributors to what I'm about to say next. Because GitHub started in a way that attracted open source, everything was fully, 100% async. That's not how we want to operate 90% of the time inside small atomic units, inside companies. We don't wanna be fully async; we kind of wanna be semi-async, particularly in reviews: "Hey, I'm doing this. I'm gonna ping somebody to help me review this and they're gonna..."
But that doesn't work, because if you're fully distributed in the world, I'm gonna interrupt their flow on Slack. So we end up in this weird problem where we say we wanna be fully async, and then what happens is the async cycle times between each one of the various little components increase. And so the overall engineering velocity decreases.
Eiso: At the end of the day it just comes down to, so we set this goal with our customers, which are essentially modern engineering orgs, right, usually between 30 and 500 engineers. A lot of the kinds of companies we talk about often come to us like, "Hey, we wanna measure velocity." And that's great, absolutely, and you have to measure velocity to know where you are. But then it's, "Do you wanna improve velocity?" "Yeah, that's why I wanna measure it." "Okay, great, but what do you do once you've measured it?"
And this is actually where everybody struggles. Everybody struggles because they assume that the moment we start measuring it and exposing the data to the engineering org, everybody all of a sudden is magically going to improve. And there's something that everybody misses: every engineering leader I've ever come across, anywhere, is overwhelmed, right? The priorities and the pressure to ship are so high that unless there is really clear communication and process from, usually, the head honcho of the engineering org to say, "Hey, you're allowed to dedicate 20% of your effort to improving the way you work, to removing frictions in your process, and you're gonna set a goal now for the next quarter to improve velocity or your SLA on quality," it never works, right? So much of this is process-oriented, not even tool-oriented.
Yeah.
Jason: So I think the challenge there is exactly what you said: people are overwhelmed, but at the end of the day what you're effectively saying is X feature needs to be released, right? Or Y product or whatever. And what that materially looks like in practice at the C-level is it's on a spreadsheet somewhere, or a ticketing system, but really it's a spreadsheet, and it says it'll be released in September, September 22nd or whatever it is.
But they don't know that, because the system is so slow, it needs to be in final review three weeks ahead of time because of the way the systems work. They just know it's supposed to go out that day. The pressure on the CTO or the engineering organization, then, is to actually get this thing fully, 100% done September 1 in reality. So it doesn't pop to the top in that way, and it kind of gets hidden to a degree.
And I've been in the CTO seat; you can't fix that. I mean, it's very difficult to go fix that. Exactly: you need more time. You got a production outage? Well, that took precedence. Oh, the sales team might have semi-promised something; the product organization all of a sudden said, "No, this product feature popped to the top now too. Yes, we still need that other thing, but we're gonna deprioritize it for something else." It's a never-ending juggle, and I'm gonna bang on the same drum that I bang on all the time on this podcast, which is that a lot of this stuff comes back to prioritization at the CEO, product, or CTO level. But in reality, at the organizational, the company, the CEO level. And yes, engineering philosophy needs to be on that list.
Eiso: And it isn't every time, but there are dozens of companies we work with where their cycle time dropped, you know, 50% after they started measuring. But it was never the act of only measuring; it was "now we've measured it, and now we're actually goal setting." And goal setting is super interesting in engineering, because it is new to almost every engineering org that I meet. Not goal setting as in "Oh, we have a delivery date on a feature," like you said, the 22nd of September we're getting it out, but actually goal setting for what we wanna improve.
We're setting our goal to reduce our cycle time. We're setting our goal to improve our review process by 50% in wait time for a first review, et cetera. Because this is one thing that I love about engineering organizations: once the goal is set and everybody knows it matters, and that means the CTO and the top level need to say it matters, we're incredibly fast to respond and inventive. You start seeing people change processes, hold meetings, do retrospectives, work on the CI system, because we're now given permission to take away frictions that bothered all of us for a long time anyway, right? No one likes slow velocity.
Jason: At the end of this podcast we should get to a whole bunch of things that folks should look at if they're in this moment. I think one of the keys here, too, is to communicate what a big thing engineering velocity is, so we don't get too focused on exactly one specific area. Though if one area we measure pops way to the top, it's very obvious that we should be talking about that first as a prioritization. But we don't wanna micro-focus before we macro-focus.
Absolutely.
And in that regard too: where do most of these systems really slow down? Well, there's a couple of easy ones that we can talk about. One is CI; CI tends to be a very slow process. The PR review tends to be a very slow process. And then the actual release to production itself; in the chain of systems, those tend to be slow. It's not universal, people are gonna have different things in different systems, but as a kind of macro trend, that's where I've seen them in the past.
Eiso: From what I'm seeing, I fully agree on review. I think CI becomes a problem usually at 100-plus engineers-
Yes.
... which is quite interesting. Below 100 you'll see most CI times still being in the 10, 15, 20, 30 minute range, and 30 minutes is a problem when you have a hundred engineers who are making PRs the entire day; it is absolutely an issue. But compare that to how long things are stuck in review, or how long things are waiting to... So this is another thing: probably the biggest bottleneck I see is from merged to production.
Jason: Yep, merged to production is by far the longest period-
... and particularly-
... when you have stacked changes going to production. It gets really complicated really quickly all of a sudden, and then we have in-depth conversations. That is by far the longest.
Eiso: Our industry is kidding itself that we do continuous deployment. [laughs] For those who are listening, I've spent time with 500-plus engineering orgs over the last years, and I can probably count on both hands the number of engineering orgs I've seen that actually do continuous deployment across their whole engineering org.
There are usually a couple of teams that do continuous deployment, that I do see. But then, is it truly continuous deployment if it's deployed to production but it's not actually getting to the customer, because you're still waiting on another team, et cetera?
Jason: I mean, as an industry we don't do continuous deployment at all. What we do is better deployment than we did five, 10 years ago. But going back to what I said earlier about that theoretical press-something-on-your-keyboard all the way to it being released to production: that would be continuous deployment. And yes, that's a theoretical, almost impossible point to get to. But the point is that if, again, on a spectrum we're like 10% of the way there, we're not doing continuous deployment.
Eiso: But I mean even from merge to production-
No.
... like how many companies do you see that actually go from merge to production? And I don't even mean production meaning the end user gets it; I mean production is fine up 'til the feature flag.
Yeah.
Even there I see very, very few organizations that are actually able to do that. It's incredibly rare.
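A minimal sketch of what "production is fine 'til the feature flag" means in code: the merged code is deployed dark, and the release happens when the flag flips, with no second deploy. The flag name and in-memory flag store are stand-ins for a real flag service.

```python
# Hypothetical sketch: deployment (shipping code) decoupled from release
# (users seeing it) via a feature flag.
FLAGS = {"new_checkout_flow": False}  # stand-in for a real flag service

def flag_enabled(name: str) -> bool:
    return FLAGS.get(name, False)

def legacy_checkout(cart):
    return f"legacy flow for {len(cart)} items"

def new_checkout(cart):
    return f"new flow for {len(cart)} items"

def checkout(cart):
    if flag_enabled("new_checkout_flow"):
        return new_checkout(cart)  # merged and deployed, but dark until the flag flips
    return legacy_checkout(cart)

print(checkout(["book"]))           # legacy flow
FLAGS["new_checkout_flow"] = True   # the "release" happens here, with no deploy
print(checkout(["book"]))           # new flow
```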
Jason: I think everyone has this theoretical model built in their mind of how maybe Facebook or Google operate at this level, and they're like, "Hey, we need to be more like Facebook's infrastructure or Google's infrastructure." But in reality they don't work that way either.
They don't fully have this. I don't want people to kid themselves that they can get to that point. So here's the danger, here's what I want. If you're a CEO listening to this, or possibly a CTO who wants to send this to the CEO: avoid this, which is a very real thing that I've seen. Avoid standing on stage saying we're gonna have continuous delivery with millisecond deployment times, or second, or maybe even sub-hour deployment times. Every engineer in the room is gonna laugh you off that stage.
They know it's an impossibility. You'll lose credibility. It'd be akin to standing on stage and saying, "For the next year we need to have zero seconds of downtime." Every engineer in the room instantaneously knows that you don't know what you're talking about, and you've lost 100% of your credibility.
So instead, talk about improvement, talk about reducing cycle times. Talk about improving to a degree that looks a certain way, and ask to understand a little bit more on that. That said, we absolutely need to push more, better, faster in this way, for organizations to feel that they're not atrophying, getting static and calcified. If I walk into an organization and it talks about cycle time being through the roof, "Hey, I was done on Monday; three Fridays from now it's gonna hit production," that organization feels calcified, and every engineer will tell you that. Product managers will tell you that. Sales might not fully understand what they're saying, but they will tell you something similar.
Eiso: To me, I kind of built up this mental model a couple of years ago related to this notion of continuous improvement, because I think you hit the nail on the head. And we also spoke about this in the last episode we did, related to DORA. It's not about saying, "Hey, the whole org tomorrow is at X hours of cycle time." What matters is: can we get every team and every engineering leader in the organization to know that part of their job is to continuously improve? And for most of them that is velocity. Though it's worth noting: not everywhere, because velocity needs to link to the business, right? If you are dealing with a huge quality issue in your organization-
Yeah.
... then velocity is not where you should be measuring or spending your time. But there's this mental model framework that I've seen: once the engineering leaders know that it's part of their responsibility, and they're given the bandwidth and the permission from the VP of Engineering or CTO to spend time on it, the following always needs to happen. Identify what you need to improve, right? Like we said, which part of your-
Yeah.
... which part of your pipeline is the thing that you need to go for. Then go discuss and communicate it with anyone who's relevant, right? At the team level, that might be all the engineers there; at the director level, that can be different directors at an EMs' meeting. And then make a decision, right? Your ultimate role as a leader is to make decisions; I don't know how often we've said this on this podcast. And how often it goes wrong there. This is the part that still completely blows our mind, because it's like, "Okay, we've identified it, we've discussed it, now we need someone to step up and say we're going to decide."
And the part of decision making that we don't do well in engineering yet, but when it's done well, it just blows my mind how fast organizations start going, is to then actually align. And this, to me, time and time again: you made the decision, you defined the goal, you've set the north star metric that you care about improving, and you've identified the inputs to it. Because cycle time is maybe your north star metric, but the input for you could be your review process or your PR size-
Yeah.
... or it could be lots of things. Then actually make sure there's execution, right? There are initiatives, and initiatives always come from engineers; no director is overhauling the CI system. It always comes from engineers. And then actually measure the impact. It's this identify, discuss, decide, align, act, and measure framework. When you get that loop going, it becomes really powerful.
Jason: One of the things that's important to pull apart in this conversation is that the stage of your company is gonna matter a lot in how you have this conversation. We already used that under-100-engineers level here on this podcast. So let's just say: hey, you're under a hundred engineers, you're gonna look a certain way. You're under 25 engineers, you're gonna look and act a certain way. But if you're over 500 engineers, it's gonna look very different too.
So part of this is understanding which stage of a company you're at, and who's got responsibility for what part of the systems. How are you aligning those systems? How are you incentivizing people to understand what their role in this is, and what their ownership capacity is? But that loop that you talked about is so critically important.
I remember having this one discussion at GitHub, 'cause we were having some cycle time issues; we were also having some velocity issues at one point. And I remember saying to the infrastructure team and the application engineering team two things that were 100% diametrically opposed. I said, "Application engineering team, I'm gonna measure you on how fast and how high quality your things are." Two things together, which can be diametrically opposed, but that's why I bound them together. "Infrastructure team, I want you to create a system that is so incredibly safe, so incredibly stable, and so easy to manage in production." Well, in theory, the infrastructure team was basically given a mandate to slow things down; by that measure, the better system for the infrastructure team to build is something that slows the application engineering team down. And what the application engineering team needs out of the infrastructure team to achieve their goals is way less safety, no guardrails, 100% just release to production, all that sort of stuff.
But the point is they have to work together to achieve the business outcome. So what I told each of the organizations, and I probably could do this better today than I did four years ago when I was first having this conversation, was, "Well, I'm gonna measure you individually, and then I'm gonna measure you together. I'm giving you opposite goals, but you have to achieve them with each other." And that's also where we had an internal platform team emerge, to bind the two of those together.
Eiso: And I think you said something important there: the platform team, right? Right now we have a lot of flavors of this. I was on a call earlier today and heard "engineering enablement team" for the first time; platform-
Yep.
... you know, we have developer experience-
Jason: Engineering ops.
Which, yeah-
Eiso: it's like-
They're all the same, and not always all the same, because platform sometimes is still just the name we give to the SRE side of the house. But I agree: there has to be a dedicated team at some point, and I'm pretty opinionated about this. I think once you know you're gonna be 100 engineers, that's when you need to build it. So if you're 50 and you're expecting to be 100 soon, build it. If you're 75 and slowly hiring towards it, build it.
Jason: I like that. I hadn't thought about it in terms of a number like that, but I'd always said something similar, which is that infrastructure teams emerge out of one person doing all the grunt infrastructure work for a 10-person organization; that's the very first time you ever say the word "infrastructure." Same thing with platform: you've got an infrastructure team now, but that's not cutting it anymore, so you make it a platform team.
Eiso: Yeah. I spend a lot of time with eng ops teams, and they're first of all always incredible, because they're the people whose full role truly is: how do we make everything better for all the other engineers in the org? They understand that that's their number one goal, and I'm always blown away by the people who gravitate to it as well, because it's often the earlier people at the company who are now looking to have an impact. But the number one challenge that every single one of them has is communication: how do we get the rest of the org to align with what we're saying needs to be done? Because they understand, and they're often figuring out, "Hey, if we did it this way, everything would move faster."
And now they need to actually get everybody to move. And this is again where the relationship between the engineering leaders and the eng ops team needs to be very different than it is in many places. It shouldn't be the team that just sits on the side and only speaks to the CTO. It needs to be the team that sits in the leadership conversations, that comes to all the other engineering leaders and says, "Hey, this matters. This is why we need to go after it. This is the data that shows we should be tackling it. Let's go run an experiment, let's go improve." And this is where I see a lot of them struggle. I've seen incredible eng ops teams that end up just building tooling instead of changing process, because they don't feel that they can communicate and get things out.
Jason: So I'm gonna agree with you heavily on this one. I think it's the same challenge we see in other organizations, but it looks different. A good example here is classic quality: quality organizations tend to be clicking through and running scripts and stuff like that. But when you build a quality group that's embedded inside other organizations, and they're engineers as well, you get a different outcome. And so in this case, what classically happens for platform and ops or whatever groups is that they emerge out of infrastructure. Because what happens is you have infrastructure, you're gonna build it up, and their concerns look a little bit different than the application side of the house.
You want application people on this team too. Application people tend to start changing process a little bit more. It's the classic diversity-of-the-team problem: you have to have people from different groups who traditionally have different concerns or ways of thinking about it. And again, it's not universal, but it can be as simple as saying, "All right, you two created this team, you're very senior engineers, you have a passion for this. We need to grab two people similar to you from the application engineering side of the world to go fix the whole thing and own it end to end."
Eiso: I think you said something really important there, because, as you said, infrastructure, while yes massively incentivized to make CI/CD as fast as possible, doesn't usually think about what happens before the CI part, right? And a lot of what you and I spoke about today is even before CI runs.
Yeah.
Right? So I know we have a couple of minutes left here for some practical advice. Let's just go back and forth, you do one, I do one, and throw out some practical tips. I know we don't usually do this, but you brought it up before the episode and I think it makes sense.
Jason: Practically speaking, I think you've just gotta find a way to measure the steps in your process. See what the cycle times are in each one of the steps. The first thing is just understanding what they currently look like. You have intuition; just understand it more in depth and see what's happening there. So find some system that allows you to measure that.
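One rough sketch of that step-by-step measurement, assuming you can export per-PR timestamps from your Git host and deploy logs; the data below is made up.

```python
# Hypothetical sketch: compute how long each pipeline stage takes per PR,
# then look at the median per stage to find the dominant wait.
from datetime import datetime
from statistics import median

prs = [
    {  # one PR's lifecycle as ISO timestamps (stand-in values)
        "opened": "2024-09-02T09:00", "first_review": "2024-09-03T15:00",
        "approved": "2024-09-04T10:00", "merged": "2024-09-04T11:00",
        "deployed": "2024-09-06T16:00",
    },
]

STAGES = [("opened", "first_review"), ("first_review", "approved"),
          ("approved", "merged"), ("merged", "deployed")]

def hours_between(start: str, end: str) -> float:
    delta = datetime.fromisoformat(end) - datetime.fromisoformat(start)
    return delta.total_seconds() / 3600

for start, end in STAGES:
    durations = [hours_between(pr[start], pr[end]) for pr in prs]
    print(f"{start} -> {end}: median {median(durations):.1f}h")
# Whichever stage dominates (often wait-for-first-review or merged -> deployed)
# is the first thing to go after.
```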
Eiso: I mean, I couldn't agree more. It's probably the first time I actually plug Athenian on our podcast, but Jason and I have been speaking about this for years, which led to building a company and a product where I'm the founder. If that's something you're interested in, come have a look. Promise, I won't make it too salesy.
But yeah, after measuring it, to me it becomes goal setting, pure goal setting: have goal setting become part of your organization. So outside of the measuring and goal setting, let's throw out some of the quick wins that we see. What are some of the quick wins that you see, Jason?
Jason: So a quick win, in a fully async organization, could be to literally put time on the calendar for everybody to start reviewing stuff. It could be as simple as: one, invest in some automated review-process tooling for PRs of certain types, which you could flag; and two, say, "Hey, every day we're gonna spend the last hour of the day reviewing everybody's code as much as possible to get through this." In smaller organizations, that's a massive win that can happen.
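A sketch of what flagging PRs of certain types for a lighter review path could look like; the suffix list and size threshold are invented for illustration, and a real version would apply a label via your Git host's API from a webhook or CI step.

```python
# Hypothetical sketch: route PRs to a lightweight review path when they're
# docs-only or tiny, instead of the full multi-reviewer gauntlet.
DOCS_SUFFIXES = (".md", ".rst", ".txt")
TINY_CHANGE_LINES = 10  # hypothetical threshold

def review_label(changed_files: list[str], lines_changed: int) -> str:
    if changed_files and all(f.endswith(DOCS_SUFFIXES) for f in changed_files):
        return "fast-track: docs-only"       # one reviewer is plenty
    if lines_changed <= TINY_CHANGE_LINES:
        return "fast-track: trivial change"  # the one-character typo case
    return "standard review"

print(review_label(["README.md"], 2))     # fast-track: docs-only
print(review_label(["src/app.py"], 3))    # fast-track: trivial change
print(review_label(["src/app.py"], 450))  # standard review
```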
Eiso: I couldn't agree more. I would say the next one is definitely understanding your CI system, because while even a 30-minute run looks innocent, that 30 minutes is getting replicated multiple times a day-
Yeah.
... for all of your engineers. I'll give you an example: looking at the data right now, we're a pretty small org at our company, and our CI suite ran 6,000 times last month, right? And in some companies of reasonable size, that's hundreds of thousands or millions of times. So look into that and understand the biggest bottlenecks. Don't go crazy and try to understand every single test. Just know what the two or three biggest bottlenecks are, and tackle them right now.
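A rough sketch of that bottleneck hunt, assuming your CI provider can export per-job durations; the job names and numbers are made up.

```python
# Hypothetical sketch: rank CI jobs by total minutes burned last month,
# which weighs both how slow a job is and how often it runs.
from collections import defaultdict
from statistics import median

ci_runs = [  # (job name, duration in minutes) across recent runs
    ("lint", 2), ("unit-tests", 9), ("integration-tests", 21), ("build-image", 7),
    ("lint", 2), ("unit-tests", 11), ("integration-tests", 24), ("build-image", 6),
]

totals = defaultdict(list)
for job, minutes in ci_runs:
    totals[job].append(minutes)

ranked = sorted(totals.items(), key=lambda kv: sum(kv[1]), reverse=True)
for job, minutes in ranked[:3]:  # only the top two or three matter right now
    print(f"{job}: {sum(minutes)} total min, median {median(minutes):.0f} min/run")
```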
Jason: Yeah. And funny enough, there are probably cost savings you can have in that too-
Yes.
... all of a sudden. And that's a-
CI isn't cheap. [laughs]
CI's not cheap. I would say another one would be: look at your release to production and understand exactly what's happening there. Intimately understand the process. Intimately understand how it works, where the dangers are. I would also say don't overcomplicate it, because this is where database changes come in, and organizations at a certain size can really get messed up. But in the early days, it's not that difficult to even do database changes as part of the release-to-production process. Don't overcomplicate this. Understand what the happy path is and how to speed up the happy path to production. Take the exceptions for what they are, exceptions, and then work your way through that.
Eiso: Let me throw one more out to top us off here today. I would definitely say look at, and you mentioned it a couple of times, your pull request size.
Yeah.
We have this chart in the product that shows the correlation between cycle time and pull request size. And for every single organization, you just see that triangle sloping down-
Yeah.
... where 500-plus lines is two or three times as slow as between 100 and 500 lines, et cetera.
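A minimal sketch of how a chart like that gets produced: bucket PRs by size and compare median cycle times. The data here is illustrative only.

```python
# Hypothetical sketch: median cycle time per PR-size bucket,
# reproducing the "triangle sloping down" shape.
from statistics import median

prs = [  # (lines changed, cycle time in hours) -- stand-in values
    (40, 10), (80, 14), (120, 20), (300, 30), (600, 70), (900, 95),
]

BUCKETS = [(0, 100), (100, 500), (500, float("inf"))]
for lo, hi in BUCKETS:
    times = [t for size, t in prs if lo <= size < hi]
    if times:
        label = f"{lo}+ lines" if hi == float("inf") else f"{lo}-{hi} lines"
        print(f"{label}: median cycle time {median(times):.0f}h over {len(times)} PRs")
```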
Jason: I'll leave the podcast with this. When I first got to GitHub, I was working with one of our largest customers, and they actually built a system on top of GitHub to automatically, not reject, but suggest that any pull request over 350 lines get broken down. And the reason why was that the data was astronomically different. I think anything under 350 lines, maybe it was 400 lines of code, got accepted 85% of the time within two or three hours of back and forth, and then maybe an update or two. Anything over 400 lines got rejected and never merged; I mean, never merged. So if you think about the dichotomy of that, that's a pretty large savings if you can understand what it means to do smaller PRs.
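A sketch of that "suggest, don't reject" nudge; the `post_comment` function is a placeholder for whatever comment API your Git host offers, and the threshold comes from the anecdote above.

```python
# Hypothetical sketch: comment on oversized PRs from CI, suggesting a split,
# without ever blocking the merge.
SUGGEST_SPLIT_AT = 350  # lines, per the customer's threshold in the anecdote

def post_comment(pr_number: int, body: str) -> None:
    print(f"[PR #{pr_number}] {body}")  # stand-in for a real API call

def check_pr_size(pr_number: int, lines_changed: int) -> None:
    if lines_changed <= SUGGEST_SPLIT_AT:
        return  # small enough; stay out of the way
    post_comment(
        pr_number,
        f"This PR changes {lines_changed} lines (threshold: {SUGGEST_SPLIT_AT}). "
        "Large PRs here historically stall in review; consider splitting it "
        "into smaller, independently reviewable pieces.",
    )

check_pr_size(pr_number=101, lines_changed=820)
```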
Eiso: I couldn't agree more. Jason, this was a fun one. I think this is a topic we're definitely gonna have to get back to because there's a lot more to be said.
Jason: This is gonna be a fun one to keep going into.