We asked engineering leaders for their perspectives on what works when it comes to metrics.
1. Why do you use metrics?
Cody Lee (Engineering Manager, Splice): To continuously improve. Every metric we track is in service of improvement. We track business metrics to determine whether we’re growing a healthy business that provides artists with the ecosystem necessary to improve their craft. We track process metrics to create a development environment that enables progress. We track system health metrics to support our existing user base to ensure we are providing them with an ideal user experience in the midst of change. If a metric does not enable improvement, then by default I don’t want to spend time on it.
Leslie Cohn-Wein (Engineering Manager, Netlify): As a new engineering manager at Netlify, I’ve found that metrics are valuable in helping zero in on what might be getting in the team’s way. Instead of treating them as a way to judge performance, I believe metrics are most useful in bringing to light areas of opportunity. Are the team’s trend lines relatively consistent over time? Can significant fluctuations be explained through human factors, like on-call, vacation, or helping with code review for another team? If not, perhaps there is a blocker due to process, lack of alignment, or misunderstanding that needs deeper attention.
Abi Noda (Developer Experience Expert): First and foremost, I use metrics to help me identify what the biggest bottlenecks are across different teams. Without metrics, I’d be left to make this judgment based on hearsay instead of methodology. The second reason I use metrics is to understand changes over time, particularly as we undergo organizational changes or make investments in specific areas. Without metrics, there would be no way of understanding these changes or confirming that our efforts have led to actual improvement. The final reason I use metrics is they provide a way to communicate issues, opportunities, and successes with the rest of my organization. Without concrete data, it’s difficult to sound credible or convincing.
Jimmy McGill (VP of Engineering, Code Climate): Metrics make it possible for me to be a more informed engineering leader. They allow me to gain a more complete picture of what is happening in our organization, surfacing things I should be aware of, but might not otherwise be seeing or hearing about. Metrics make it possible to draw more accurate conclusions, by providing an objective balance to my own biases. I also use metrics to help create awareness on the engineering team, which is often the most effective way to drive improvement. For example, helping the team see the impact that faster PR reviews have on shipping can motivate team members to keep up their review speed.
2. Give an example of a specific metric that has worked well for your team – and one that hasn’t.
CL: Simple metrics have given me the most success. The team acknowledges that point estimates never satisfy individual contributors, but as a manager, I can use velocity to build probability distributions that effectively convey development uncertainty to stakeholders. Though the engineers don’t value velocity themselves, I can use it to communicate effectively. On the other hand, I’ve tried quantifying mental health and don’t think it’s helpful. Tracking a happiness score in one-on-ones hasn’t yielded more information than what comes up during retrospectives or in conversation. I don’t think a happiness number leads to a conclusion on its own, so I need to look elsewhere to gain insight.
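A minimal sketch of the kind of forecast Cody describes might look like the following: a Monte Carlo simulation that resamples historical sprint velocity to turn a backlog size into a probability distribution over completion dates. The velocity numbers, backlog size, and percentiles are illustrative assumptions, not Splice’s actual data or process.

```python
import random

# Hypothetical historical velocities (story points completed per sprint).
historical_velocity = [21, 18, 25, 19, 23, 17, 22]
remaining_points = 120    # size of the backlog we want to forecast
simulations = 10_000

sprint_counts = []
for _ in range(simulations):
    remaining = remaining_points
    sprints = 0
    while remaining > 0:
        # Resample a past sprint's velocity to model future throughput.
        remaining -= random.choice(historical_velocity)
        sprints += 1
    sprint_counts.append(sprints)

sprint_counts.sort()
# Report percentiles instead of a single point estimate.
for pct in (50, 85, 95):
    idx = int(len(sprint_counts) * pct / 100) - 1
    print(f"{pct}% chance of finishing within {sprint_counts[idx]} sprints")
```

Communicating the 85th or 95th percentile rather than a single date is what lets the distribution, rather than a point estimate, carry the uncertainty to stakeholders.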
LC: On a remote team that spans continents, it’s been useful to track cycle time for pull requests. When this number starts rising, it’s a good signal to check in: are we committing to too much new work each sprint, so code reviews get pushed off? Are disparate team time zones creating a blocker to shipping? A metric that has been less helpful is throughput by type, which was meant to gauge balance between features, bugs, and chores. Caveats abound: feature PRs are often bigger than bugfixes, some PRs cross multiple type boundaries, and labeling the type leaves room for human error.
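As an illustration of the signal Leslie describes, one simple way to compute pull request cycle time is to measure the time from when a PR is opened until it is merged and track the median. The records below are hypothetical; in practice the timestamps would come from your Git host’s API.

```python
from datetime import datetime
from statistics import median

# Hypothetical PR records; in practice these would come from the GitHub/GitLab API.
pull_requests = [
    {"opened": "2023-05-01T09:00", "merged": "2023-05-02T15:30"},
    {"opened": "2023-05-03T11:00", "merged": "2023-05-05T10:00"},
    {"opened": "2023-05-04T08:00", "merged": "2023-05-04T17:45"},
]

def cycle_time_hours(pr):
    # Time from the PR being opened to being merged, in hours.
    opened = datetime.fromisoformat(pr["opened"])
    merged = datetime.fromisoformat(pr["merged"])
    return (merged - opened).total_seconds() / 3600

hours = [cycle_time_hours(pr) for pr in pull_requests]
print(f"median PR cycle time: {median(hours):.1f} hours")
```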
AN: One metric that worked for me at my previous company was NSAT. NSAT stands for “net satisfaction” and is similar to NPS. It’s a survey instrument created by Microsoft to help measure user satisfaction, but it can be used to measure satisfaction with just about anything. It uses a four-point Likert-type scale, and the “net” score is calculated by subtracting the percentage of people who responded “Dissatisfied” from the percentage who responded “Very satisfied”. We used NSAT scores to measure the success of major initiatives focused on improving internal developer tooling and productivity. We captured NSAT scores by regularly sending surveys to developers across the company using Google Forms. One metric that did not work was pull request throughput. Although this is a metric some companies pay attention to, we could draw no meaningful conclusions from the data.
Furthermore, developers at our company quickly pointed out that pull request throughput was misleading without factoring in the size of the changes. Also, different teams have very different code review and release processes, which make it impossible to normalize what “good” pull request throughput looks like across the organization.
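A sketch of the NSAT arithmetic Abi describes, assuming a four-point scale whose labels run from “Very satisfied” to “Dissatisfied” (the exact labels and responses here are illustrative): the net score is the share of “Very satisfied” responses minus the share of “Dissatisfied” responses.

```python
from collections import Counter

# Hypothetical survey responses on a four-point scale.
responses = [
    "Very satisfied", "Somewhat satisfied", "Very satisfied",
    "Somewhat dissatisfied", "Dissatisfied", "Very satisfied",
    "Somewhat satisfied", "Dissatisfied",
]

counts = Counter(responses)
total = len(responses)

pct_very_satisfied = counts["Very satisfied"] / total * 100
pct_dissatisfied = counts["Dissatisfied"] / total * 100

# Net satisfaction: % "Very satisfied" minus % "Dissatisfied".
nsat = pct_very_satisfied - pct_dissatisfied
print(f"NSAT: {nsat:.0f}")
```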
I find the popularity of metrics like throughput and lead time to be fascinating. Anyone who’s been in the trenches of building software knows that shipping great software is not like a factory assembly line. Yet today, engineering organizations try to measure themselves primarily using manufacturing metrics from the mid-20th century. Lean metrics have their place, but they rarely help make your developers happier or more productive.
JM: Cycle Time is a metric that I consistently come back to — it’s a great proxy for engineering speed, and can be a useful high-level look at whether certain key decisions are having the desired impact. If we’re moving slower than we had been, I can then isolate parts of the engineering pipeline and investigate where exactly things are going off track. As for metrics that don’t work, I find that anything too specific can be misleading. If you’re looking at something as granular as lines of code, or the number of deployments in a day, it can be hard to see the forest for the trees.
3. How do you analyze the data collected by your metrics?
CL: For process metrics like throughput, velocity, and cycle time, I watch the trend line and analyze it with other engineering managers. We do a qualitative analysis against our release cycle to determine whether there are any obvious causes, then move to quantitative analysis and propose internal experiments based on that. For example, implementing a pull request review reminder correlated with a reduction in review cycle time. For those quantitative analyses, we’ll usually only go as far as comparing changes to past sprints to see whether they fall within expectations, usually based on standard deviation.
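One way to read “within expectations, usually based on standard deviation” is a simple control-chart-style check: flag the latest sprint only if it falls more than some number of standard deviations from the recent mean. The threshold and numbers below are illustrative assumptions, not Splice’s actual process.

```python
from statistics import mean, stdev

# Hypothetical review cycle times (hours) for the last several sprints.
past_sprints = [30.5, 28.0, 33.2, 29.8, 31.1, 27.6]
current_sprint = 38.4

avg = mean(past_sprints)
sd = stdev(past_sprints)

# Flag the sprint only if it sits outside the expected band (here, 2 standard deviations).
if abs(current_sprint - avg) > 2 * sd:
    print(f"Outside expectations: {current_sprint:.1f}h vs {avg:.1f}h ± {2 * sd:.1f}h")
else:
    print(f"Within expectations: {current_sprint:.1f}h vs {avg:.1f}h ± {2 * sd:.1f}h")
```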
LC: Engineering metrics cannot stand alone; they must go hand-in-hand with context. Before I review our team dashboard each Friday, I make note of what was ‘special’ about the week: was anyone on-call? Attending a conference? Engaged in interviews and hiring? Answering these questions first helps contextualize fluctuations in the metrics while highlighting unexplained changes that might be indicative of a problem. Team metrics are transparent across the company, but never punitive. Our former VP of Engineering, Dalia Havens, made it a point to include organization-wide metrics in our internal ‘engineering week in review’ newsletter to bring attention to team health.
AN: For survey metrics, we mainly used spreadsheets to quickly process results and generate reports that could be published and shared. Capturing telemetry metrics like lead time and pull request flow was a more difficult process: we ingested the data into a data warehouse, then used our existing BI tool to create reports for various teams and organizations.
JM: When I analyze data, I’m looking for trends relative to our baseline, not looking at absolute values. Everything needs to be filtered through the context that I’ve developed as a manager, including what I know about the people on my team and how they work, as well as the problems that they’re solving at the moment. That context is critical, and I don’t draw any conclusions from metrics without it.
4. Which metrics do you use to communicate the impact of your teams to the wider business?
CL: Our team watches a set of business metrics that were selected to represent the health of the business. When we run experiments, a data scientist does a statistical analysis to generate information on what would represent a statistically significant result. Those results are then shared at a regular cadence with the wider business group. Having a strong connection between the data team and product managers results in very easy-to-follow metrics for validating impact. The difficulty then becomes ensuring we maintain the necessary analytics events to connect experiments to results.
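For the kind of experiment read-out Cody mentions, one common approach (a hedged illustration, not necessarily Splice’s method) is a two-proportion z-test comparing a business metric such as conversion between control and treatment groups. All of the numbers below are made up.

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical experiment results: conversions out of users in each group.
control_conversions, control_users = 480, 10_000
treatment_conversions, treatment_users = 540, 10_000

p1 = control_conversions / control_users
p2 = treatment_conversions / treatment_users
pooled = (control_conversions + treatment_conversions) / (control_users + treatment_users)

# Two-proportion z-test for the difference in conversion rates.
se = sqrt(pooled * (1 - pooled) * (1 / control_users + 1 / treatment_users))
z = (p2 - p1) / se
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

print(f"lift: {p2 - p1:+.2%}, z = {z:.2f}, p = {p_value:.3f}")
```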
LC: As a growth team, we’re typically more focused on measuring and communicating our impact through user metrics, not engineering metrics. Growth has taught me to celebrate learning above all. A failed experiment can be as valuable as a successful one as long as it’s provided us with data to better inform future decisions. I like to apply this same thinking to engineering metrics: they can and should fluctuate as teams learn, change, evolve, and grow! That means the numbers themselves don’t reflect our team’s impact; the learning and improvements we’ve made because of them do.
AN: At my previous company, we used NSAT scores as the success metric for our initiatives to improve developer productivity. These metrics were utilized as part of our company’s quarterly OKR process, and served as the “north star” for identifying ways to improve tooling and processes for development teams. Some examples of areas we took action to improve are release infrastructure, development environment, and testing. On specific teams, we primarily tracked progress against OKRs and epics to understand the health of overall projects and initiatives. These data points were reviewed in biweekly or monthly cycles at the team, department, and organization level.
JM: There aren’t any one-size-fits-all metrics for communicating engineering impact — it all depends on the particular goals of your organization. At a basic level, I use Cycle Time to demonstrate that we’re working effectively; infrastructure availability and performance metrics to show that the software we’re shipping is operating at a high level; and usage metrics to understand whether our work is having an impact on the bottom line of the business or the happiness of our users. Again, any time I’m communicating with metrics, I make sure to provide any necessary context so appropriate conclusions can be drawn.