
Change Failure Rate: Measuring and Reducing Deployment Failures

Master change failure rate as a DORA stability metric. Learn how to measure, benchmark, and reduce the percentage of deployments that cause production issues.

Last updated: 7 March 2026

Change failure rate measures the percentage of deployments to production that result in degraded service and require remediation. As one of the two DORA stability metrics, it provides critical insight into the quality and reliability of your release process.

What Is Change Failure Rate?

Change failure rate (CFR) is calculated as the number of failed deployments divided by the total number of deployments, expressed as a percentage. A 'failed' deployment is one that results in degraded service requiring remediation such as a rollback, hotfix, emergency patch, or incident response. This metric directly reflects the quality of your release process.
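
As a minimal sketch of the calculation described above, the Python function below divides failed deployments by total deployments and scales the result to a percentage. The function name, and the choice to return 0.0 for a period with no deployments, are assumptions made for illustration rather than part of any DORA definition.

```python
def change_failure_rate(failed_deployments: int, total_deployments: int) -> float:
    """Return CFR as a percentage of total deployments."""
    if failed_deployments < 0 or total_deployments < 0:
        raise ValueError("counts must be non-negative")
    if failed_deployments > total_deployments:
        raise ValueError("failed deployments cannot exceed total deployments")
    if total_deployments == 0:
        # Assumption: report 0.0 for a period with no deployments.
        # A team could equally treat CFR as undefined in this case.
        return 0.0
    return 100.0 * failed_deployments / total_deployments


# Example: 6 failed deployments out of 40 in a month.
print(f"{change_failure_rate(6, 40):.1f}%")  # 15.0%
```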

CFR is one of the two DORA stability metrics, alongside mean time to recovery (MTTR), which DORA reports have called time to restore service. Together, they answer two fundamental questions: how often do things go wrong, and how quickly can we fix them? A team with low CFR and low MTTR has a mature, reliable delivery process that stakeholders can trust.

What makes CFR particularly valuable is that it connects engineering practices directly to user impact. Every failed deployment represents a period of degraded service for your users. By tracking and reducing CFR, you are directly improving the reliability of your product and the experience of your customers.

How to Measure Change Failure Rate

Measuring CFR requires two data points: total deployments and failed deployments. Total deployments should come from your CI/CD pipeline. Failed deployments require correlating deployment events with incidents, rollbacks, or hotfixes in your incident management system.

The trickiest part of measuring CFR is defining what constitutes a failure. Establish clear criteria with your team. Typically, a deployment failure includes any deployment that triggers an incident, requires a rollback, necessitates a hotfix within a defined window (e.g., 24 hours), or causes measurable user impact such as increased error rates or degraded performance.
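
One way to keep such criteria consistent is to encode them as a single predicate that everyone applies. The sketch below is illustrative only: the `Deployment` record, its field names, and the `is_failed` function are hypothetical, and the 24-hour hotfix window is simply the example figure from the paragraph above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

HOTFIX_WINDOW = timedelta(hours=24)  # example window from the text; tune per team


@dataclass
class Deployment:
    """Hypothetical record combining CI/CD and incident data for one deployment."""
    deployed_at: datetime
    triggered_incident: bool = False
    rolled_back: bool = False
    hotfix_at: Optional[datetime] = None  # when a hotfix for this deployment shipped
    user_impact: bool = False  # e.g. increased error rates or degraded performance


def is_failed(d: Deployment, hotfix_window: timedelta = HOTFIX_WINDOW) -> bool:
    """Apply the agreed failure criteria to a single deployment."""
    hotfix_in_window = (
        d.hotfix_at is not None
        and timedelta(0) <= d.hotfix_at - d.deployed_at <= hotfix_window
    )
    return d.triggered_incident or d.rolled_back or hotfix_in_window or d.user_impact


deploy = Deployment(
    deployed_at=datetime(2026, 3, 2, 14, 0),
    hotfix_at=datetime(2026, 3, 3, 9, 30),
)
print(is_failed(deploy))  # True: the hotfix shipped 19.5 hours after the deployment
```

Keeping the rule in one function means a change to the definition, such as widening the hotfix window, is applied to every deployment in the same way.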

  • Define clear, consistent criteria for what constitutes a deployment failure
  • Correlate deployment events from CI/CD with incidents from your incident management tool
  • Include rollbacks, hotfixes, and emergency patches in your failure count
  • Calculate CFR monthly for trend analysis, but track incidents in real time
  • Exclude planned maintenance and expected disruptions from failure counts
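
The correlation and monthly roll-up described above can be sketched as follows. This is a simplified illustration under stated assumptions: the function name `monthly_cfr` is hypothetical, an incident is attributed to the most recent deployment that preceded it, and only if it started within a 24-hour window. Planned maintenance is assumed to have been filtered out of the incident list beforehand.

```python
from bisect import bisect_right
from collections import defaultdict
from datetime import datetime, timedelta


def monthly_cfr(deploy_times, incident_times, window=timedelta(hours=24)):
    """Return {"YYYY-MM": CFR percentage} from deployment and incident timestamps."""
    deploys = sorted(deploy_times)
    failed = set()  # indices of deployments with at least one attributed incident
    for incident in incident_times:
        i = bisect_right(deploys, incident) - 1  # latest deployment at or before the incident
        if i >= 0 and incident - deploys[i] <= window:
            failed.add(i)

    totals = defaultdict(int)
    failures = defaultdict(int)
    for i, deployed_at in enumerate(deploys):
        month = deployed_at.strftime("%Y-%m")
        totals[month] += 1
        if i in failed:
            failures[month] += 1
    return {m: 100.0 * failures[m] / totals[m] for m in sorted(totals)}


deploys = [
    datetime(2026, 1, 5, 10), datetime(2026, 1, 12, 10),
    datetime(2026, 1, 19, 10), datetime(2026, 1, 26, 10),
    datetime(2026, 2, 2, 10), datetime(2026, 2, 9, 10),
]
incidents = [datetime(2026, 1, 12, 11, 30), datetime(2026, 2, 20, 9)]
print(monthly_cfr(deploys, incidents))  # {'2026-01': 25.0, '2026-02': 0.0}
```

The second incident falls well outside the window of any deployment, so it is not counted as a deployment failure. Real attribution is often messier than a time window, which is why the criteria need agreeing with the team first.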

Change Failure Rate Benchmarks

According to DORA research, elite and high performers maintain change failure rates between 0% and 15%. Medium performers typically see rates between 16% and 30%, whilst low performers may experience failure rates of 46% or higher. DORA's published bands have varied between report years, so treat these figures as indicative ranges rather than fixed thresholds. Even so, they point to a substantial quality gap between performance tiers.
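
For dashboards, the bands quoted above can be turned into a simple lookup. The sketch below uses only the figures stated in this section; the function name `cfr_band` is hypothetical, and values that fall between the quoted medium and low bands are deliberately left unclassified rather than guessed at.

```python
def cfr_band(cfr_percent: float) -> str:
    """Map a CFR percentage to the performance band quoted in this section."""
    if not 0 <= cfr_percent <= 100:
        raise ValueError("CFR must be between 0 and 100")
    if cfr_percent <= 15:
        return "elite/high"
    if cfr_percent <= 30:
        # Assumption: fractional values just above 15% are treated as medium.
        return "medium"
    if cfr_percent >= 46:
        return "low"
    # The quoted bands do not cover values above 30% and below 46%.
    return "unclassified"


for value in (8.0, 22.5, 38.0, 50.0):
    print(value, cfr_band(value))
```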

A CFR above 30% indicates systemic issues in your testing, review, and deployment processes. At this level, roughly one in three deployments or more causes problems, which erodes team confidence and stakeholder trust. If your team is in this range, prioritise quality improvement over speed improvement.

Encouragingly, CFR tends to improve alongside deployment frequency. Teams that deploy more frequently typically ship smaller changes, which are easier to review, test, and roll back, and therefore carry less risk. This means that investing in deployment frequency can also improve your change failure rate, creating a virtuous cycle of improvement.

Strategies to Reduce Change Failure Rate

Comprehensive automated testing is one of the most effective ways to reduce CFR. Invest in unit tests, integration tests, and end-to-end tests that cover your critical paths. Ensure tests run automatically in your CI/CD pipeline and block deployments when tests fail. Test coverage alone is not sufficient; test quality and relevance matter more than raw coverage numbers.

Progressive deployment strategies such as canary releases, blue-green deployments, and feature flags allow you to limit the blast radius of any single deployment. By rolling out changes to a small percentage of users first and monitoring for issues, you can catch problems before they affect your entire user base.
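
The monitoring step in a canary release can be sketched as a small decision function that compares the canary's error rate with the baseline's. Everything here is an illustrative assumption: the `Stats` record, the `canary_decision` function, the minimum sample size of 500 requests, and the tolerated increase of one percentage point. Real rollout tooling usually applies more robust statistical checks.

```python
from dataclasses import dataclass


@dataclass
class Stats:
    """Hypothetical request counters collected for one version."""
    requests: int
    errors: int

    @property
    def error_rate(self) -> float:
        return self.errors / self.requests if self.requests else 0.0


def canary_decision(
    baseline: Stats,
    canary: Stats,
    min_requests: int = 500,      # assumed minimum sample before judging the canary
    max_increase: float = 0.01,   # assumed tolerated rise in error rate (1 point)
) -> str:
    """Return "wait", "rollback", or "promote" for the canary version."""
    if canary.requests < min_requests:
        return "wait"
    if canary.error_rate - baseline.error_rate > max_increase:
        return "rollback"
    return "promote"


baseline = Stats(requests=100_000, errors=200)          # 0.2% error rate
print(canary_decision(baseline, Stats(1_000, 30)))      # rollback (3.0% error rate)
print(canary_decision(baseline, Stats(1_000, 3)))       # promote (0.3% error rate)
print(canary_decision(baseline, Stats(100, 0)))         # wait (too few requests)
```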

  • Invest in comprehensive, high-quality automated tests across all testing levels
  • Implement progressive deployment strategies (canary, blue-green) to limit blast radius
  • Conduct thorough code reviews with a focus on edge cases and failure modes
  • Run blameless post-mortems after every failed deployment to identify systemic improvements
  • Use feature flags to decouple deployment from feature activation
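
To illustrate how a feature flag decouples deployment from activation, the sketch below enables a flag for a fixed percentage of users by hashing the flag name and user ID into a stable bucket. The function name `in_rollout` and the flag and user identifiers are hypothetical; dedicated feature-flag services offer the same idea with targeting rules and kill switches on top.

```python
import hashlib


def in_rollout(flag: str, user_id: str, percent: int) -> bool:
    """Return True if this user falls inside the flag's rollout percentage."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    # Hash flag and user together so each flag gets an independent bucketing.
    digest = hashlib.sha256(f"{flag}:{user_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100  # stable bucket in 0..99
    return bucket < percent


# The code path ships dark and is activated gradually by raising the percentage.
users = [f"user-{n}" for n in range(1000)]
enabled = sum(in_rollout("new-checkout", u, 10) for u in users)
print(f"{enabled} of {len(users)} users see the new checkout")
```

Because the bucket depends only on the flag and the user, a user who is included at 10% stays included when the rollout is raised to 50%, and setting the percentage to 0 switches the feature off without a redeployment.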

Building a Culture of Quality

Reducing CFR is not just a technical challenge; it requires a cultural shift. Teams must feel empowered to push back on rushing changes to production without adequate testing. Engineering managers play a crucial role in establishing quality norms and protecting their teams from pressure to ship untested code.

Blameless post-mortems are essential for learning from failures without creating fear. When deployment failures are met with curiosity rather than blame, teams are more willing to report issues quickly and invest time in understanding root causes. This transparency drives systemic improvements that reduce future failures.

Track CFR visibly and celebrate improvements. When your team reduces CFR from 25% to 15%, that is a significant achievement that deserves recognition. Make quality metrics as visible as delivery metrics to reinforce that both are valued equally in your organisation.

Key Takeaways

  • Change failure rate measures the percentage of deployments causing production issues requiring remediation
  • Elite and high performers keep CFR at or below 15%, whilst low performers may reach 46% or higher
  • Automated testing and progressive deployment strategies are the most effective levers for reducing CFR
  • Higher deployment frequency often leads to lower CFR through smaller, less risky changes
  • Building a blameless culture is essential for learning from failures and driving systemic improvement

Frequently Asked Questions

How do we define a 'failed' deployment?

A failed deployment is one that results in degraded service requiring remediation. This includes deployments that trigger incidents, require rollbacks, necessitate hotfixes, or cause measurable user impact such as increased error rates. Define clear criteria with your team and apply them consistently.

Is a 0% change failure rate realistic or desirable?

A sustained 0% CFR is neither realistic nor necessarily desirable. It may indicate that your team is being overly cautious and not deploying frequently enough. A small number of failures, handled quickly, is a natural part of software delivery. Aim for consistently low CFR (under 15%) rather than zero.

How does change failure rate relate to deployment frequency?

Counter-intuitively, teams with higher deployment frequency tend to have lower change failure rates. Frequent deployments involve smaller changes that are easier to test and less risky. This creates a virtuous cycle where deploying more often actually improves reliability.
