Engineering metrics are both widely useful and widely misused in an engineering manager's toolkit. Used well, metrics give you visibility into how your team is performing, help you identify problems before they become crises, and provide the evidence you need to advocate for resources, push back on unrealistic demands, and demonstrate impact to leadership. Used badly, metrics destroy trust, incentivise gaming, and create a culture of surveillance that drives your best engineers away.
The difference between good and bad metric usage comes down to intent and design. Good metrics measure team-level outcomes, track trends over time rather than snapshots, and are used to inform decisions rather than to judge individuals. Bad metrics measure individual activity, are treated as targets rather than signals, and create perverse incentives that optimise for the metric rather than for the outcome the metric was supposed to represent.
This guide covers the metrics that matter for engineering managers, organised into four categories: DORA metrics for delivery performance, delivery metrics for workflow efficiency, quality metrics for reliability and correctness, and team health metrics for sustainability. It also covers the anti-patterns that turn metrics into weapons and provides practical guidance on building a metrics dashboard that serves your team rather than surveilling it. For frameworks that help you put metrics into action, see our engineering management frameworks guide.
Why Metrics Matter
Using Data to Lead
Without metrics, your understanding of team performance is based on gut feel, anecdotes, and the loudest voices in the room. This is unreliable and biased. You tend to overweight recent events, the problems you see directly, and the feedback of the people who talk to you most. Metrics provide a corrective: they show you patterns that are invisible in day-to-day experience, reveal trends that develop gradually over weeks, and give you an objective basis for decisions that might otherwise be influenced by recency bias or personal relationships.
Leading with data is not about replacing judgement with numbers. It is about combining your qualitative understanding - from 1:1s, observation, and experience - with quantitative signals that either confirm or challenge your assumptions. The best engineering managers use metrics as a starting point for inquiry, not as a substitute for it. A spike in cycle time does not tell you what is wrong; it tells you something has changed, and you need to investigate. That investigation - talking to your team, reviewing recent work, understanding the context - is where the real insight comes from.
Communicating Metrics to Stakeholders
Metrics are also your primary tool for communicating with leadership and stakeholders. When your VP asks "how is the team doing?" a response grounded in data is far more credible than one grounded in feeling. "Our deployment frequency has increased from twice a week to daily over the last quarter, and our change failure rate has dropped from 15% to 4%" paints a clear picture of improvement. It also gives you evidence when you need to push back: "Our cycle time has increased by 40% over the last two sprints because of the unplanned compliance work we absorbed - we need to either deprioritise something or accept that the Q2 roadmap will slip." Communicating clearly with data is a core skill for engineering manager responsibilities.
DORA Metrics
Deployment Frequency
Deployment frequency measures how often your team deploys code to production. It is a proxy for batch size and release confidence: teams that deploy frequently are working in small increments, which reduces risk and enables faster feedback loops. Elite performers deploy multiple times per day. High performers deploy between once per day and once per week. Low performers deploy between once per month and once every six months.
To improve deployment frequency, focus on reducing the friction in your deployment pipeline. Automate testing, automate deployments, implement feature flags so that code can be deployed without being released, and break large features into smaller, independently deployable increments. If your team deploys less than once a week, start by understanding why. Is it a tooling problem, a testing problem, a confidence problem, or a process problem?
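As a minimal sketch, deployment frequency is just a count of production deployments over a window. The deploy log and helper below are hypothetical, assuming you can export deployment timestamps from your CI/CD tool:

```python
from datetime import date

# Hypothetical deploy log: one entry per production deployment.
deploys = [
    date(2024, 3, 4), date(2024, 3, 4), date(2024, 3, 6),
    date(2024, 3, 7), date(2024, 3, 8), date(2024, 3, 11),
]

def deploys_per_week(deploy_dates, weeks):
    """Average production deployments per week over the observed window."""
    return len(deploy_dates) / weeks

freq = deploys_per_week(deploys, weeks=2)  # 6 deploys over 2 weeks -> 3.0
```

Tracking the weekly average rather than raw counts smooths over holidays and release freezes.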
Lead Time for Changes
Lead time for changes measures the time from code commit to code running in production. This captures your entire delivery pipeline: code review, CI/CD, staging validation, and deployment. Elite performers have a lead time of less than one hour. High performers are between one day and one week. Low performers take between one month and six months.
Long lead times indicate bottlenecks in your pipeline. Common culprits include slow code review turnaround (the most common cause), flaky or slow test suites, manual QA gates, infrequent deployment windows, and complex approval processes. Measure where time is being spent and address the biggest bottleneck first.
Change Failure Rate
Change failure rate is the percentage of deployments that result in degraded service and require remediation - a rollback, a hotfix, or a patch. Elite performers keep their change failure rate between 0% and 15%. High performers fall between 16% and 30%. Above 46% is considered low performance.
A high change failure rate suggests gaps in testing, code review quality, or deployment safety mechanisms. Invest in automated testing (unit, integration, and end-to-end), improve code review standards, implement canary deployments or progressive rollouts, and ensure your monitoring can detect problems quickly after deployment. Interestingly, change failure rate does not correlate with deployment frequency - teams that deploy more often do not necessarily fail more often, because smaller changes are easier to validate and easier to roll back.
Time to Restore Service
Time to restore service (also called mean time to recovery or MTTR) measures how quickly your team can recover from a production incident. Elite performers restore service in less than one hour. High performers take less than one day. Low performers take between one week and one month.
Improving time to restore requires investment in observability (monitoring, alerting, and logging), incident response processes (clear runbooks, defined on-call rotations, practised incident management), and deployment safety (the ability to roll back quickly). Run regular incident reviews to identify patterns and invest in the reliability improvements that reduce both incident frequency and recovery time.
Delivery Metrics
Cycle Time
Cycle time measures the elapsed time from when work starts to when it is completed and delivered. Unlike lead time for changes (which focuses on the deployment pipeline), cycle time captures the full lifecycle of a work item, including design, implementation, review, and deployment. It is one of the most useful metrics for identifying process bottlenecks because you can decompose it into stages and see where time is being lost.
A healthy cycle time for a typical engineering team is two to five days for a standard feature ticket. If your average cycle time is longer than two weeks, investigate: are tickets too large, are code reviews too slow, is the team blocked on external dependencies, or is there too much work in progress? Use cycle time trends rather than absolute numbers - a consistent two-week cycle time may be fine for your context, but a cycle time that is increasing sprint over sprint signals a systemic problem.
Throughput
Throughput is the number of work items completed per unit of time (typically per sprint or per week). It is a straightforward measure of how much work the team is finishing. Throughput is most useful when combined with cycle time: a team with high throughput but increasing cycle time is likely starting more work than it is finishing, which eventually leads to a bottleneck.
Work-in-Progress Limits
Work-in-progress (WIP) is not a metric you track so much as a constraint you impose to improve other metrics. Limiting WIP - the number of items actively being worked on simultaneously - reduces context-switching, improves cycle time, and makes bottlenecks visible. If your team of six engineers has 15 items in progress simultaneously, everyone is context-switching constantly and nothing is finishing quickly. Start by setting a WIP limit equal to the number of engineers on the team (one item per person) and adjust from there.
Sprint Velocity and Its Limitations
Sprint velocity (story points completed per sprint) is the most widely used and most widely misused delivery metric in software engineering. Velocity can be useful as a team-level planning tool: if you consistently complete 30-35 story points per sprint, that gives you a reasonable basis for planning the next sprint. But velocity is deeply flawed as a performance metric. Story points are subjective estimates, not objective measures. They vary wildly between teams, making cross-team comparisons meaningless. And when velocity becomes a target, teams inflate estimates to hit the number rather than improving actual delivery.
If you use velocity, use it exclusively as a planning tool and never as a performance measure. Do not compare velocity between teams, do not set velocity targets, and do not report velocity to stakeholders as evidence of productivity. Throughput and cycle time are more reliable indicators of delivery performance.
Quality Metrics
Bug Escape Rate
Bug escape rate measures the number of bugs found in production versus bugs caught before production (in code review, testing, or staging). A high escape rate suggests gaps in your quality gates. Track this over time and look for patterns: are escaped bugs concentrated in a particular area of the codebase, a particular type of change, or work done under particular conditions (rushed deadlines, for example)?
Code Coverage and Why It Is Misleading
Code coverage - the percentage of code exercised by automated tests - is one of the most dangerous metrics because it creates a false sense of security. A codebase with 90% coverage can still have critical bugs if the tests are superficial (testing that functions run without asserting that they produce correct results). Conversely, a codebase with 60% coverage that focuses testing on the most complex and critical paths may be far more reliable.
If you track coverage at all, use it as a floor rather than a target. A coverage floor of 70-80% ensures that new code is generally tested, but do not incentivise teams to chase 100%. The marginal effort required to go from 80% to 100% coverage is enormous and is almost always better spent on other quality investments: better integration tests, improved monitoring, or more thorough code reviews.
Incident Frequency and MTTR
Track both the frequency of production incidents and the mean time to recovery (MTTR). These metrics together tell you about the reliability of your systems and the effectiveness of your incident response. A team with frequent incidents but fast resolution has a different problem (reliability) than a team with rare incidents but slow resolution (incident response maturity). Use incident reviews to drive systematic improvement in both dimensions. For tools to help calculate the ROI of reliability investments, see our ROI calculator.
Team Health Metrics
Engagement Surveys
Regular engagement surveys - whether organisation-wide or team-specific - provide structured insight into how your team is feeling. The key is acting on the results. If surveys consistently surface the same concerns and nothing changes, people stop responding honestly. Share the results with your team, identify the top two or three themes, create a concrete action plan, and report back on progress. Quarterly pulse surveys of five to ten questions are more useful than annual comprehensive surveys because they surface trends early enough to act on them.
Retention Rates
Attrition is a lagging indicator - by the time someone leaves, the problems that drove them away have been festering for months. Track voluntary attrition as a percentage of team size over rolling twelve-month periods. If your team's attrition is significantly above the industry average (roughly 10-15% for software engineering), investigate why. Exit interviews provide some data but are unreliable because people leaving tend to be diplomatic. Stay interviews - asking current team members what keeps them here and what might cause them to leave - are more valuable for prevention.
Developer Experience Scores
Developer experience (DevEx) scores measure how easy or difficult it is for your engineers to do their daily work. Questions cover tooling quality, build times, deployment friction, documentation completeness, and cognitive load. Poor developer experience is a leading indicator of both delivery slowdowns and attrition: engineers who are constantly fighting their tools, waiting for builds, or navigating unclear processes burn out faster and deliver less. Track DevEx quarterly and invest in the improvements your team identifies as highest impact.
Anti-Patterns
Metrics That Destroy Teams
Certain metrics are actively harmful when used to evaluate engineering performance. Lines of code measures volume, not value - the best engineering work often involves deleting code. Individual velocity (story points per engineer) incentivises working alone rather than collaborating, and rewards inflating estimates rather than estimating honestly. Hours worked measures presence, not productivity, and penalises efficient engineers while rewarding those who are slow or unfocused. Number of commits or pull requests incentivises splitting work into artificially small pieces rather than into logically coherent units.
If any of these metrics appear on a dashboard visible to your team, remove them. Their mere presence - even if you say you are not using them to evaluate people - creates anxiety and distorts behaviour.
Goodhart's Law
Goodhart's Law states: "When a measure becomes a target, it ceases to be a good measure." This is the fundamental challenge of engineering metrics. As soon as people know they are being measured on deployment frequency, they will deploy more often - even if that means deploying trivial changes to game the number. As soon as cycle time becomes a target, tickets will be scoped smaller and smaller until the metric looks great but the team is not delivering meaningful work.
The defence against Goodhart's Law is to use metrics as diagnostic tools rather than targets. Track them, discuss them, use them to identify problems and opportunities - but never attach rewards or punishments to hitting specific metric values. When you notice a metric improving, ask "is the underlying reality actually better, or have we just learned to optimise for the metric?" If the answer is the latter, you need different metrics.
Gaming Metrics
Every metric can be gamed, and intelligent engineers will find ways to game them - sometimes consciously, sometimes unconsciously. The solution is not to find un-gameable metrics (they do not exist) but to create a culture where gaming is unnecessary. If your team trusts that metrics are used for learning and improvement rather than for judgement and punishment, the incentive to game disappears. Be explicit about this: "We track these metrics to understand how our systems and processes are performing, not to evaluate individuals. If a metric looks bad, that is a signal that something in our environment needs attention, not that someone is doing a poor job."
Building a Metrics Dashboard
Choosing 3-5 Key Metrics
Start by selecting three to five metrics that cover the dimensions most relevant to your team's current challenges. A balanced set might include: deployment frequency (delivery speed), cycle time (workflow efficiency), change failure rate (quality), and a team health score (sustainability). You do not need to track everything from day one. Start with the metrics you can measure reliably with your existing tools and add more as your measurement infrastructure matures. Our tools collection includes calculators that can help you get started.
Setting Baselines
Before you can improve, you need to know where you are. Spend two to four weeks collecting baseline data for your chosen metrics. Do not try to improve anything during this period - just observe and record. The baseline gives you a reference point for measuring progress and helps you set realistic improvement goals. Without a baseline, you have no way of knowing whether changes you make are actually helping.
Using Trends Not Snapshots
A single data point tells you almost nothing. Metrics are only useful when viewed as trends over time. A cycle time of eight days in a single sprint might be fine (perhaps the team was working on unusually complex features) or it might be a problem (perhaps code reviews are slowing down). You cannot tell from one data point. But if cycle time has increased steadily from four days to eight days over six sprints, that is a clear trend that demands investigation.
Display metrics as time-series charts rather than single numbers. Add annotations for significant events (new team member joined, major release shipped, process change introduced) so you can correlate metric changes with environmental changes. This transforms your dashboard from a scorecard into a diagnostic tool.
Presenting to Leadership
When presenting metrics to leadership, narrate the context rather than just showing a dashboard. Start with the outcome ("the team shipped three major features this quarter"), then use metrics to provide evidence and context ("our deployment frequency increased from weekly to daily, which allowed us to iterate faster on customer feedback"), and finish with what you are investing in next ("we are focusing on reducing our change failure rate, which will improve reliability and reduce the time we spend on incident response"). Leadership cares about outcomes and trajectory, not raw numbers. Your job is to translate engineering metrics into a narrative that connects to business value. For more on this skill, see our engineering manager responsibilities guide.
Related Guides
Frequently Asked Questions
- What are the most important engineering metrics?
- The most important engineering metrics depend on your team's context and current challenges, but a strong starting point for most teams includes four to five metrics that cover different dimensions of engineering health. DORA metrics (deployment frequency, lead time for changes, change failure rate, and time to restore service) provide a well-researched, industry-standard view of delivery performance. Cycle time measures how quickly work moves from start to finish, revealing process bottlenecks. A team health metric - whether from engagement surveys, developer experience scores, or simple 1:1 sentiment tracking - ensures you are not optimising delivery at the expense of sustainability. Avoid picking more than five metrics. When you measure everything, you focus on nothing. Choose metrics that tell a coherent story about your team's ability to deliver value reliably, sustainably, and with quality.
- What are DORA metrics and why do they matter?
- DORA metrics are four key metrics identified by the DevOps Research and Assessment (DORA) team at Google through years of rigorous research across thousands of engineering organisations. The four metrics are: deployment frequency (how often code is deployed to production), lead time for changes (time from code commit to production deployment), change failure rate (percentage of deployments that cause failures requiring rollback or hotfix), and time to restore service (how long it takes to recover from a production incident). These metrics matter because the research conclusively shows they are predictive of both engineering performance and business outcomes. Organisations that score well on all four DORA metrics ship faster, with higher quality, and their teams report higher satisfaction. They also strongly correlate with commercial outcomes like revenue growth and market share. Crucially, they measure system-level performance rather than individual output, which makes them safe to use without creating perverse incentives.
- How do I measure developer productivity without micromanaging?
- The key is to measure team-level outcomes rather than individual activity. Metrics like cycle time, deployment frequency, and sprint throughput tell you how well the team is performing as a system without singling out individuals. Avoid metrics that track individual output: lines of code, number of commits, story points completed per person, or hours logged. These metrics are trivially gameable, do not correlate with actual value delivered, and send a clear signal that you do not trust your team. Instead, use 1:1 conversations to understand individual performance qualitatively. Ask about blockers, challenges, and what support people need. Review code review participation and quality, pull request descriptions, and contributions to team discussions - not as quantitative metrics but as qualitative signals of engagement and impact. The best engineering managers create an environment where productivity happens naturally through clear priorities, minimal interruptions, and well-designed processes, rather than one where productivity is surveilled and measured at the individual level.
- Should I track individual engineer performance metrics?
- No, you should not track quantitative performance metrics at the individual level. Research consistently shows that individual productivity metrics (lines of code, story points, commit frequency, pull request count) are unreliable proxies for actual contribution and create harmful incentive structures. An engineer who writes 50 lines of carefully considered code that solves a complex problem delivers more value than one who writes 500 lines of hasty, unmaintainable code - but the metrics say the opposite. Individual metrics also encourage competition rather than collaboration, discourage activities that are hard to measure but critical (mentoring, code reviews, architecture discussions, documentation), and create anxiety that reduces rather than improves performance. Instead, assess individual performance through qualitative observation: the quality of their technical decisions, their impact on team outcomes, their growth over time, and feedback from peers. Use your career framework and regular 1:1 conversations to track and discuss individual development, not dashboards.
- How often should I review engineering metrics?
- Review your core metrics weekly at a glance and monthly in depth. A weekly check takes five minutes and involves scanning your dashboard for anomalies: has cycle time spiked, has deployment frequency dropped, has the change failure rate increased? These are early warning signs that something has changed in your team's environment and needs attention. Monthly, spend 30 to 60 minutes analysing trends. Look at how metrics have moved over the past four to six weeks and ask why. Has a specific type of work consistently taken longer than expected? Has a process change improved or worsened delivery flow? Use these monthly reviews to decide whether your current priorities are working or whether you need to adjust. Quarterly, do a deeper review that you share with your team and your stakeholders. This is where you connect metrics to outcomes: what did the team deliver, how did the metrics trend, and what does that tell us about where to invest next? Avoid daily metric reviews unless you are actively investigating a specific problem - they create noise and encourage reactive management rather than thoughtful analysis.
Engineering Management Frameworks
Put your metrics into context with proven management frameworks. Learn how to use OKRs, DORA, RACI, and other models to structure your leadership and communicate effectively with stakeholders.
Learn More