Change Failure Rate: Definition, Formula & Benchmarks

Every failed deployment erodes stakeholder trust and costs your team hours of unplanned remediation. Change failure rate puts a number on how often that happens and, more importantly, reveals whether your testing, review, and deployment practices are actually catching problems before users do.

What Is Change Failure Rate?

Change failure rate (CFR) is calculated as the number of failed deployments divided by the total number of deployments, expressed as a percentage. A 'failed' deployment is one that results in degraded service requiring remediation such as a rollback, hotfix, emergency patch, or incident response. This metric directly reflects the quality of your release process.
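Written as a formula, the definition above looks like this. The figures in the worked example are illustrative only and are not drawn from any real dataset.

```latex
\[
\text{CFR} = \frac{\text{failed deployments}}{\text{total deployments}} \times 100\%
\]

% Illustrative numbers only: 6 failed deployments out of 40 in one month
\[
\text{CFR} = \frac{6}{40} \times 100\% = 15\%
\]
```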

CFR is one of the two DORA stability metrics, alongside mean time to recovery (MTTR), which DORA reports as time to restore service. Together, they answer two fundamental questions: how often do things go wrong, and how quickly can we fix them? A team with low CFR and low MTTR has a mature, reliable delivery process that stakeholders can trust.

What makes CFR particularly valuable is that it connects engineering practices directly to user impact. Every failed deployment represents a period of degraded service for your users. By tracking and reducing CFR, you are directly improving the reliability of your product and the experience of your customers.

How to Measure Change Failure Rate

Measuring CFR requires two data points: total deployments and failed deployments. Total deployments should come from your CI/CD pipeline. Failed deployments require correlating deployment events with incidents, rollbacks, or hotfixes in your incident management system.

The trickiest part of measuring CFR is defining what constitutes a failure. Establish clear criteria with your team. Typically, a deployment failure includes any deployment that triggers an incident, requires a rollback, necessitates a hotfix within a defined window (e.g., 24 hours), or causes measurable user impact such as increased error rates or degraded performance.
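One way to correlate deployment events with incidents is to attribute each incident to the most recent deployment that preceded it, provided the two fall within an agreed window. The sketch below assumes you can export plain timestamps from your CI/CD and incident tools; the function name, the sample data, and the 24-hour attribution window are all assumptions for illustration, and time proximity is only a heuristic for causation.

```python
from bisect import bisect_right
from datetime import datetime, timedelta

# Assumed attribution window; agree the real value with your team.
ATTRIBUTION_WINDOW = timedelta(hours=24)

def failed_deployments(deploy_times, incident_times, window=ATTRIBUTION_WINDOW):
    """Return the set of deployment timestamps blamed for at least one incident.

    Each incident is attributed to the most recent deployment at or before it,
    but only if that deployment happened within `window` of the incident.
    """
    deploys = sorted(deploy_times)
    failed = set()
    for incident_at in incident_times:
        i = bisect_right(deploys, incident_at) - 1  # latest deploy <= incident
        if i >= 0 and incident_at - deploys[i] <= window:
            failed.add(deploys[i])
    return failed

# Hypothetical sample data: three deployments, two incidents.
deploys = [datetime(2024, 3, 1, 9), datetime(2024, 3, 2, 9), datetime(2024, 3, 5, 9)]
incidents = [datetime(2024, 3, 2, 11), datetime(2024, 3, 4, 9)]

failed = failed_deployments(deploys, incidents)
cfr = 100 * len(failed) / len(deploys)
print(f"{len(failed)} of {len(deploys)} deployments failed: CFR = {cfr:.1f}%")
# prints "1 of 3 deployments failed: CFR = 33.3%"
```

The second incident is not counted because it opened 48 hours after the nearest preceding deployment. Rollbacks and hotfixes would need the same treatment from their own data sources.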

Define clear, consistent criteria for what constitutes a deployment failure
Correlate deployment events from CI/CD with incidents from your incident management tool
Include rollbacks, hotfixes, and emergency patches in your failure count
Calculate CFR monthly for trend analysis, but track incidents in real time
Exclude planned maintenance and expected disruptions from failure counts
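The checklist above can be sketched as a small monthly roll-up. This is a minimal example under stated assumptions: the `DeploymentRecord` shape and its field names are hypothetical, and planned disruptions are kept in the deployment total but never counted as failures, which is one reasonable reading of the last item.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date

@dataclass
class DeploymentRecord:
    # Hypothetical record shape, not any real tool's schema.
    deployed_on: date
    failed: bool                      # per the team's agreed failure criteria
    planned_disruption: bool = False  # planned maintenance or expected disruption

def monthly_cfr(records):
    """Return {"YYYY-MM": CFR as a percentage} for each month with deployments."""
    totals, failures = Counter(), Counter()
    for r in records:
        month = r.deployed_on.strftime("%Y-%m")
        totals[month] += 1
        # Planned disruptions stay in the total but never count as failures.
        if r.failed and not r.planned_disruption:
            failures[month] += 1
    return {m: 100.0 * failures[m] / totals[m] for m in sorted(totals)}

# Hypothetical sample data.
records = [
    DeploymentRecord(date(2024, 1, 3), failed=False),
    DeploymentRecord(date(2024, 1, 10), failed=True),
    DeploymentRecord(date(2024, 1, 17), failed=True, planned_disruption=True),
    DeploymentRecord(date(2024, 1, 24), failed=False),
    DeploymentRecord(date(2024, 2, 7), failed=False),
    DeploymentRecord(date(2024, 2, 14), failed=False),
]
print(monthly_cfr(records))
# prints {'2024-01': 25.0, '2024-02': 0.0}
```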

Change Failure Rate Benchmarks

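According to DORA research, elite and high performers maintain change failure rates between 0% and 15%. Medium performers typically see rates between 16% and 30%, whilst low performers may experience failure rates of 46% or higher. These bands come from survey answer ranges and shift between report years, which is why they do not tile neatly; treat them as rough guides rather than precise thresholds. Even so, they reveal a dramatic quality gap between performance tiers.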

A CFR above 30% indicates systemic issues in your testing, review, and deployment processes. At this level, roughly one in three deployments or more causes problems, which erodes team confidence and stakeholder trust. If your team is in this range, prioritise quality improvement over speed improvement.

Encouragingly, CFR tends to improve alongside deployment frequency. Teams that deploy more often ship smaller changes, which are easier to test and inherently less risky. Investing in deployment frequency can therefore improve your change failure rate at the same time, creating a virtuous cycle of improvement.

Strategies to Reduce Change Failure Rate

Comprehensive automated testing is one of the most effective ways to reduce CFR. Invest in unit tests, integration tests, and end-to-end tests that cover your critical paths. Ensure tests run automatically in your CI/CD pipeline and block deployments when tests fail. Test coverage alone is not sufficient; test quality and relevance matter more than raw coverage numbers.

Progressive deployment strategies such as canary releases, blue-green deployments, and feature flags allow you to limit the blast radius of any single deployment. By rolling out changes to a small percentage of users first and monitoring for issues, you can catch problems before they affect your entire user base.
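The "roll out to a small percentage and monitor" step can be expressed as a simple gate that compares the canary's error rate with the baseline's. This is a minimal sketch, not a production rollout controller: `CohortStats`, `canary_decision`, and both thresholds are assumptions chosen for illustration.

```python
from dataclasses import dataclass

@dataclass
class CohortStats:
    # Hypothetical summary of traffic served by one version.
    requests: int
    errors: int

    @property
    def error_rate(self) -> float:
        return self.errors / self.requests if self.requests else 0.0

def canary_decision(baseline, canary, min_requests=500, max_increase=0.005):
    """Return "wait", "promote", or "rollback" for a canary release.

    min_requests and max_increase are assumed thresholds; tune them to your
    traffic volume and tolerance for error-rate regressions.
    """
    if canary.requests < min_requests:
        return "wait"      # not enough canary traffic to judge yet
    if canary.error_rate - baseline.error_rate > max_increase:
        return "rollback"  # canary is measurably worse than the baseline
    return "promote"

baseline = CohortStats(requests=10_000, errors=20)              # 0.2% errors
print(canary_decision(baseline, CohortStats(100, 0)))     # prints "wait"
print(canary_decision(baseline, CohortStats(1_000, 3)))   # prints "promote"
print(canary_decision(baseline, CohortStats(1_000, 30)))  # prints "rollback"
```

A rollback taken at the canary stage still counts as a failed deployment under most definitions, but it reaches far fewer users than a full release would.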

Invest in comprehensive, high-quality automated tests across all testing levels
Implement progressive deployment strategies (canary, blue-green) to limit blast radius
Conduct thorough code reviews with a focus on edge cases and failure modes
Run blameless post-mortems after every failed deployment to identify systemic improvements
Use feature flags to decouple deployment from feature activation
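To illustrate how a feature flag decouples deployment from activation, the sketch below ships the code path to everyone but enables it only for a stable percentage of users. The function name, the flag name, and the hashing scheme are assumptions for illustration; a dedicated feature-flag service would normally handle this.

```python
import hashlib

def is_enabled(flag_name: str, user_id: str, rollout_percent: int) -> bool:
    """Deterministically enable a flag for roughly `rollout_percent`% of users.

    Hashing the flag name together with the user ID gives each user a stable
    bucket from 0 to 99, so the same user sees the same behaviour on every
    request, and raising the percentage only ever adds users.
    """
    digest = hashlib.sha256(f"{flag_name}:{user_id}".encode()).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100
    return bucket < rollout_percent

# Hypothetical usage: the new code is deployed, but active for about 10% of users.
if is_enabled("new-checkout", "user-42", rollout_percent=10):
    print("serve new checkout flow")
else:
    print("serve existing checkout flow")
```

Setting `rollout_percent` to 0 turns the feature off without a new deployment, which is what makes a flag a quick remediation path.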

Building a Culture of Quality

Reducing CFR is not just a technical challenge; it requires a cultural shift. Teams must feel empowered to push back on rushing changes to production without adequate testing. Engineering managers play a crucial role in establishing quality norms and protecting their teams from pressure to ship untested code.

Blameless post-mortems are essential for learning from failures without creating fear. When deployment failures are met with curiosity rather than blame, teams are more willing to report issues quickly and invest time in understanding root causes. This transparency drives systemic improvements that reduce future failures.

Track CFR visibly and celebrate improvements. When your team reduces CFR from 25% to 15%, that is a significant achievement that deserves recognition. Make quality metrics as visible as delivery metrics to reinforce that both are valued equally in your organisation.

Key Takeaways

Change failure rate measures the percentage of deployments causing production issues requiring remediation
Elite and high performers maintain CFR at or below 15%, whilst low performers may reach 46% or higher
Automated testing and progressive deployment strategies are the most effective levers for reducing CFR
Higher deployment frequency often leads to lower CFR through smaller, less risky changes
Building a blameless culture is essential for learning from failures and driving systemic improvement

Frequently Asked Questions

How do we define a 'failed' deployment?

A failed deployment is one that results in degraded service requiring remediation. This includes deployments that trigger incidents, require rollbacks, necessitate hotfixes, or cause measurable user impact such as increased error rates. Define clear criteria with your team and apply them consistently.

Is a 0% change failure rate realistic or desirable?

A sustained 0% CFR is neither realistic nor necessarily desirable. It may indicate that your team is being overly cautious and not deploying frequently enough. A small number of failures, handled quickly, is a natural part of software delivery. Aim for consistently low CFR (under 15%) rather than zero.

How does change failure rate relate to deployment frequency?

Counter-intuitively, teams with higher deployment frequency tend to have lower change failure rates. Frequent deployments involve smaller changes that are easier to test and less risky. This creates a virtuous cycle where deploying more often actually improves reliability.

