Time to restore service indicates how long it takes to restore service following a failed deployment. This is essentially the time your team needs to analyze, prepare, and deploy a fix to production.
As part of the DORA metrics (link to the official page), this metric and the change failure rate measure the stability of a development team and the quality of the delivered code.
You can display the time to restore service for the last month, the last three months, or the last six months. You can also specify the environment to display, either production or staging.
Note that you must configure one or more webhooks, an incident source, and your deployment environments to synchronize the data needed to calculate this metric.
Reading the graph
In the following graph, the period displayed is the last three months. Overall, the team has an average restoration time of 4 days and 5 hours for this period. According to the 2022 State of DevOps report, this restoration time matches the evaluation criteria for a medium-performance team.
This graph is very similar to the service level expectation (SLE) graphs for items (process) or code reviews (technical), both in its format and in the data presented when you hover over it. In this example, we can see that the PAN-3630 bug was resolved 2 hours after it was reported (that is, created in Jira in our example), with the resolution corresponding to the deployment made at 12:44 on February 8, 2023.
A high time to restore service indicates that the team takes a long time to correct reported problems. Generally, we aim for a short recovery time, perhaps even similar or equivalent to the lead time for changes. A failed deployment requires an investment of time and effort to restore the situation, so a shorter recovery time frees those resources for value creation.
When correlated with other DORA metrics, a high time to restore service can reflect inefficient incident management processes, such as a reluctance to change focus to resolve the incident, dependencies, or other elements preventing the team from deploying quickly to production.
Psst! Want to learn more about potential solutions to reduce your time to restore service? Check out our comprehensive Guide to Understanding DORA Metrics!
Calculating the metric
Axify identifies and measures the time to restore service as follows:
- We consider:
  - Any bug-type item created following a deployment
  - Any item of a specific type created following a deployment, when a specific incident source has been added to Axify (only available for some users as of April 10, 2023)
- The creation date of the item represents the moment when the failure was detected, therefore when the incident began
- The deployment date (or when the bug was moved to "done" in the board) represents the time when the issue was resolved and the incident ended
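The rules above can be illustrated with a minimal Python sketch. It is not Axify's implementation: the `Incident` class, the function names, and the choice to place an incident in the displayed period by its resolution date are assumptions made for illustration. The creation time of PAN-3630 is inferred from the "2 hours" mentioned earlier, and the second incident is invented.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Incident:
    """Hypothetical record for a bug or incident item (not an Axify type)."""
    key: str
    created_at: datetime             # failure detected: the incident begins
    resolved_at: Optional[datetime]  # fixing deployment (or move to "done"); None if open
    environment: str = "production"


def time_to_restore(incident: Incident) -> Optional[timedelta]:
    """Time between the creation of the item and the deployment that resolved it."""
    if incident.resolved_at is None:
        return None
    return incident.resolved_at - incident.created_at


def average_time_to_restore(incidents, environment, since):
    """Average restoration time for one environment over the displayed period.

    Assumption: an incident belongs to the period if it was resolved on or
    after `since`. Open incidents are ignored.
    """
    durations = [
        incident.resolved_at - incident.created_at
        for incident in incidents
        if incident.environment == environment
        and incident.resolved_at is not None
        and incident.resolved_at >= since
    ]
    if not durations:
        return None
    return sum(durations, timedelta()) / len(durations)


incidents = [
    # Creation time assumed to be exactly 2 hours before the 12:44 deployment.
    Incident("PAN-3630", datetime(2023, 2, 8, 10, 44), datetime(2023, 2, 8, 12, 44)),
    # Invented incident, resolved 2 days after it was created.
    Incident("DEMO-1", datetime(2023, 1, 10, 9, 0), datetime(2023, 1, 12, 9, 0)),
]

since = datetime(2022, 11, 8)  # start of a "last three months" period
average = average_time_to_restore(incidents, "production", since)
print(average)  # 1 day, 1:00:00
```

With these two incidents, the individual restoration times are 2 hours and 2 days, so the average for the period is 1 day and 1 hour.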