It's Friday evening. You get an automated text message. "URGENT: Production outage detected! Error 503: Service Unavailable at endpoint /api/prd/payments. Immediate attention required!"
While you stare at the screen trying to compose yourself, another message arrives: "Boss... I was testing the latest patch on tst env. Can't connect to db.tst.payments. What should we do?"
A tool to help you handle the situation would come in handy, right? Fear not, my fellow manager. There is the good old 8D report!
Eight Ds of crisis handling
An 8D report is a refined tool that originated in the automotive industry. It can take the form of a simple note or an ominous Excel form with dozens of required fields. For our use, we will adhere to the KISS principle. A simple text file containing the following eight paragraphs should be sufficient.
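If you want the bare skeleton ready to fill in, a few lines of script can generate it. Below is a minimal Python sketch that assumes plain-text output. The function name `render_8d_template` and the `TODO` placeholders are illustrative choices, not part of any standard 8D format. The headings mirror the sections described below.

```python
# Section headings of the eight disciplines, in the order used in this post.
DISCIPLINES = [
    "D1: Team formation",
    "D2: Problem description",
    "D3: Implement and verify interim containment actions",
    "D4: Root cause analysis (RCA)",
    "D5: Develop permanent corrective actions (PCA)",
    "D6: Implement and validate permanent corrective actions",
    "D7: Prevent recurrence",
    "D8: Congratulate your team",
]


def render_8d_template(title: str) -> str:
    """Return a plain-text 8D skeleton with one placeholder paragraph per discipline."""
    lines = [f"8D report: {title}", ""]
    for heading in DISCIPLINES:
        lines.append(heading)
        lines.append("TODO")  # placeholder paragraph to be filled in by the team
        lines.append("")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_8d_template("503 on /api/prd/payments"))
```

Redirect the output to a file to get a text file you can fill in and share, for example `python make_8d.py > incident.txt` (the file names here are hypothetical).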
D1: Team formation
The focus is on assembling a cross-functional team tailored to the problem. Bringing together people with diverse expertise and perspectives ensures a comprehensive approach to problem-solving. The team typically includes members from the departments or specializations relevant to the issue, such as IT, quality assurance, customer service, and others as needed. The objective is to make sure every aspect of the problem is considered and addressed, drawing on each member's unique skills and insights to devise effective solutions. This collaborative approach makes problem-solving more efficient and fosters a culture of teamwork and shared responsibility.
Pro tip: Assemble a team large enough to cover all necessary skills and perspectives but lean enough to remain agile and efficient. Due to coordination challenges, large teams can increase costs and diminish efficiency. Also, clearly define roles and responsibilities from the outset to avoid duplication of effort and ensure that team members focus on areas where they can add the most value.
D2: Problem description
The emphasis is on clearly and precisely defining the problem. This involves a detailed characterization of the issue, including its symptoms, the conditions under which it occurs, its impact on operations or services, and any relevant data or evidence. The goal is to ensure that all team members have a unified understanding of the problem, enabling focused and effective problem-solving efforts. This step lays the foundation for identifying the root cause and developing appropriate solutions, making it essential for the team to invest time in accurately describing the problem, avoiding ambiguities, and setting the stage for the subsequent steps in the 8D process.
Good practices include:
- Be specific: Clearly define the problem with specific details, avoiding vague descriptions.
- Use data: Incorporate relevant data and evidence to describe the problem accurately.
- Customer perspective: If applicable, include how the problem affects customers or end users.
- Visual tools: Use relevant diagrams, flowcharts, or images to illustrate the problem and make it easier to understand. Using a meme with a burning trash can is not a good practice.
- Language clarity: Use clear and concise language understandable by all team members, regardless of their technical or cultural background.
- Boundaries: Define the scope and limits of the problem to focus efforts and avoid scope creep.
- Consensus: Ensure all team members agree on the definition of the problem to avoid misunderstandings later on.
D3: Implement and verify interim containment actions
Immediate actions are taken to temporarily contain the problem and prevent it from affecting further processes or products. Containment is crucial for mitigating immediate risks and maintaining operational stability while the team develops a permanent solution. The actions taken during this phase should be:
- Targeted: Directly addressing the problem to contain its effects effectively.
- Reversible: Designed to be easily removed or adjusted once a permanent solution is in place.
- Monitored: Closely observed to ensure they effectively control the problem without introducing new issues.
- Documented: Thoroughly recorded for accountability and future reference, and to inform the development of the permanent solution.
The success of these actions is verified through testing or real-world application, confirming that they effectively contain the problem until a more sustainable solution can be implemented.
D4: Root cause analysis (RCA)
The team investigates to identify the root cause(s) of the problem. This step is critical for developing effective, lasting solutions rather than merely addressing superficial symptoms. Key aspects include:
- Methodical approach: Structured techniques, such as the 5 Whys, Fishbone (Ishikawa) diagram, or Fault Tree Analysis, are employed to trace the problem back to its root cause.
- Data-driven: Relying on evidence and data collected during the problem description phase to support findings.
- Collaborative analysis: Engaging team members from various functions to leverage diverse perspectives and insights.
- Documentation: Meticulously recording the analysis process and findings to ensure transparency and facilitate future reference.
RCA aims to pinpoint the fundamental reasons for the problem, setting the stage for developing solutions that effectively prevent recurrence.
Within the IT domain, professionals often employ a range of methodologies tailored to the complexities of technological environments. These include:
- 5 Whys: This is a simple yet effective technique that involves asking "Why?" repeatedly (typically five times) to drill down to the root cause of the problem, starting from its symptoms.
- Fishbone (Ishikawa) diagram: This visual tool helps categorize potential causes of problems into various branches, facilitating the identification of the root cause.
- Fault Tree Analysis (FTA): A top-down, deductive failure analysis that uses a diagrammatic approach to break down the causes of a failure into constituent components, often used in complex systems.
- Pareto analysis (80/20 rule): This approach focuses on the few critical causes that account for most problems, based on the observation that roughly 80% of problems often stem from 20% of causes.
- Change analysis: An investigative technique that examines the changes made shortly before the problem was first noticed. It can be particularly effective in IT environments, where updates, patches, and configuration changes are frequent.
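Two of these techniques are easy to sketch in code. The Python below is a minimal illustration and assumes you already have incident counts per cause (for Pareto analysis) and a timestamped change log (for change analysis). The function names, the 80% threshold default, the 24-hour window, and all sample data are hypothetical. They only illustrate the mechanics.

```python
from datetime import datetime, timedelta


def pareto_causes(counts: dict[str, int], threshold: float = 0.8) -> list[str]:
    """Return the smallest set of most frequent causes covering at least `threshold` of all incidents."""
    total = sum(counts.values())
    if total == 0:
        return []
    selected, covered = [], 0
    # Walk causes from most to least frequent until the threshold is reached.
    for cause, n in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
        selected.append(cause)
        covered += n
        if covered / total >= threshold:
            break
    return selected


def changes_before(changes: list[tuple[datetime, str]], incident_start: datetime,
                   window: timedelta = timedelta(hours=24)) -> list[str]:
    """Return descriptions of changes made within `window` before the incident, newest first."""
    recent = [(ts, desc) for ts, desc in changes
              if incident_start - window <= ts <= incident_start]
    recent.sort(key=lambda change: change[0], reverse=True)
    return [desc for _, desc in recent]


if __name__ == "__main__":
    # Hypothetical incident counts per root-cause category.
    counts = {"config change": 12, "expired certificate": 5,
              "db connection pool exhausted": 2, "network": 1}
    print(pareto_causes(counts))  # ['config change', 'expired certificate']

    # Hypothetical change log and incident start time.
    incident = datetime(2024, 1, 5, 18, 0)
    log = [
        (datetime(2024, 1, 2, 10, 0), "library upgrade"),
        (datetime(2024, 1, 5, 16, 30), "patch deployed to tst"),
        (datetime(2024, 1, 5, 17, 45), "firewall rule updated"),
    ]
    print(changes_before(log, incident))  # ['firewall rule updated', 'patch deployed to tst']
```

Listing recent changes newest first reflects a common heuristic: the most recent change is usually the first suspect. It is a starting point for the 5 Whys, not a conclusion.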
D5: Develop permanent corrective actions (PCA)
Based on the root cause analysis, permanent solutions are devised to correct the root cause of the problem and prevent recurrence. This phase involves:
- Solution design: Crafting solutions directly tied to the root cause, ensuring that the problem is resolved at its source.
- Feasibility assessment: Evaluating the practicality, cost-effectiveness, and potential impact of the proposed solutions, taking into account available resources and constraints.
- Stakeholder input: Engaging relevant stakeholders to gather insights and feedback on the proposed corrective actions, ensuring alignment and support.
- Risk analysis: Anticipating potential side effects or new problems that the corrective actions might introduce and planning accordingly to mitigate these risks.
D6: Implement and validate permanent corrective actions
The permanent solutions are implemented, and their effectiveness is validated to ensure the problem is resolved. This phase involves:
- Implementation planning: Developing a detailed action plan for rolling out the corrective measures, including timelines, responsibilities, and resources required.
- Execution: Carrying out the planned actions according to the implementation plan, ensuring adherence to the defined processes and standards.
- Monitoring and measurement: Establishing metrics and monitoring systems to evaluate the effectiveness of the implemented solutions over time, ensuring they effectively address the root cause.
- Adjustments: Making necessary adjustments based on the outcomes and feedback, ensuring the corrective actions are optimized for maximum effectiveness.
The goal is to ensure that the implemented solutions effectively resolve the problem and contribute to the long-term improvement of processes and systems. Most long-term improvements can be treated as standard roadmap items that add to the product's value. From the team's perspective, permanent corrective actions are no longer firefighting but part of daily development.
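For the monitoring and measurement step, it helps to agree in advance on what "validated" means in numbers. The Python below is a minimal sketch. It assumes you track failed and total requests for the affected endpoint over an observation window after the fix. The 0.1% error-rate target and the 1,000-request minimum are placeholders; replace them with your own service-level objectives.

```python
def fix_validated(errors: int, requests: int,
                  max_error_rate: float = 0.001,
                  min_requests: int = 1000) -> bool:
    """Return True if the post-fix window saw enough traffic and stayed within the error-rate target.

    Both thresholds are placeholder assumptions, not recommended values.
    """
    if requests < min_requests:
        # Too little traffic to conclude anything: keep monitoring instead of declaring success.
        return False
    return errors / requests <= max_error_rate


if __name__ == "__main__":
    # Illustrative numbers only.
    print(fix_validated(errors=3, requests=10_000))   # True  (0.03% error rate)
    print(fix_validated(errors=50, requests=10_000))  # False (0.5% error rate)
    print(fix_validated(errors=0, requests=10))       # False (not enough traffic)
```

Returning `False` for a quiet window is deliberate. A fix that has not yet seen real traffic has only been deployed, not validated.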
D7: Prevent recurrence
Changes are made to systems, policies, procedures, and practices to prevent the same or similar problems from occurring in the future. This may involve:
- Best practices: Documenting and sharing the insights and best practices derived from the problem-solving process across relevant teams and departments.
- Policy and procedure updates: Revising existing policies, procedures, and standards to incorporate the changes made and lessons learned during the corrective action process.
- Training and awareness: Conducting training sessions and awareness programs to ensure all relevant personnel are informed about the changes and understand the new practices. It’s good practice to include past experiences in knowledge transfer across other teams or onboarding training for newcomers.
- Continuous monitoring: Establishing ongoing monitoring mechanisms to quickly identify and address any recurrence of the problem or emergence of similar issues.
The focus is on embedding successful strategies into the organization's culture and operations to safeguard against future problems.
D8: Congratulate your team
The final discipline involves recognizing and celebrating the team's efforts that contributed to solving the problem. Publicly acknowledging the team members' hard work, dedication, and success reinforces their value to the organization and boosts morale. Celebrating successes motivates the team and the wider organization, encouraging continued engagement and effort. Recognition events foster a sense of unity and camaraderie among team members, strengthening the team dynamic. Highlighting the successful resolution of a problem and the effective collaboration of the team promotes a culture of continuous improvement and learning within the organization.
Using D8 effectively to give employees and teams visibility means recognizing both individual and team achievements and showcasing their contributions to the broader organization. Here are some approaches:
- Company-wide communication: Share the team's success story through company-wide emails, newsletters, or meetings, detailing the problem faced, the steps taken to resolve it, and the impact of their work.
- Recognition in meetings: Acknowledge the team's achievements in high-visibility settings, such as all-hands meetings or departmental gatherings, allowing leadership to publicly commend the team's effort.
- Awards and certificates: Implement formal recognition programs that include awards or certificates for teams that successfully solve significant problems, enhancing their visibility across the organization.
- Spotlight stories: Feature the team and their project in internal channels, such as the company intranet, blogs, or bulletin boards, providing a detailed narrative of their challenge, approach, and outcomes.
- Peer recognition platforms: Use internal platforms where employees can give and receive kudos, allowing team members to highlight and appreciate each other's contributions, further enhancing visibility.
- Leadership acknowledgment: Encourage leaders to personally thank the team through direct communication or public acknowledgment, reinforcing the importance of their contributions from the top down.
By strategically employing these recognition methods, you celebrate your team's achievements and significantly raise their visibility within the organization, fostering a culture of appreciation and high performance.
Beyond D8
The 8D report is a tool, but it is also worth addressing people’s need for closure and channeling our hunter instinct into engagement and learning. You can use standard tools from the manager’s toolbox, like retrospectives and lessons-learned documents, or some less standard ones.
After tackling a challenging issue, it might be a good idea to organize an informal team gathering away from the workplace, so everyone can unwind and strengthen interpersonal bonds. This could be a team lunch, an outdoor activity, or a virtual game night. Use this time to acknowledge the collective effort and resilience, reinforcing that setbacks are part of the process and don't define the team's value or capabilities.
Such a meeting is also an excellent occasion to give your team a sense of closure. Mark the conclusion and the team's accomplishments with a distinct celebration or symbolic act. This signals a clear end, allowing the team to move forward mentally and emotionally. This approach helps build a resilient team that views each project, regardless of its complexity, as an opportunity to grow its collective expertise and fortitude.
Theory in practice
Ok... we have our 8Ds. Now, how do I use them in real life?
Please write it down
Quite obvious, yet quite important. It is generally a good idea to use a format that is easy to distribute among all stakeholders and that enables future use. Good examples are:
- JIRA/Asana/Trello ticket
- G-Doc/G-Sheet template
- Confluence page
- Slack canvas
- GitHub doc repo
- anything that your organization is familiar with
Make sure you can learn from it
A good database, or history, of past crises lets you learn from them and prevent future ones. Reviewing how effective the preventive safeguards you put in place have been may reveal sources of problems that would not surface from analyzing a single failure.
A growing database of 8D reports may evolve into a knowledge database that your colleagues and successors will analyze. Remember that one day, someone will try to scrape those documents or feed them to an LLM.
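Keeping reports in a machine-readable form makes that future analysis much easier. The Python below is a minimal sketch. It assumes each report is reduced to a few structured fields and stored as JSON Lines (one report per line). It serializes records and lists the root-cause categories that keep coming back. The `EightDRecord` schema and its field names are hypothetical; adapt them to your own template.

```python
import json
from collections import Counter
from dataclasses import asdict, dataclass, field


@dataclass
class EightDRecord:
    """Hypothetical minimal schema for one 8D report; extend the fields to match your template."""
    title: str
    opened: str  # ISO 8601 date, e.g. "2024-01-05"
    root_cause_category: str
    containment: str
    corrective_actions: list[str] = field(default_factory=list)


def to_jsonl(records: list[EightDRecord]) -> str:
    """Serialize records as JSON Lines: one report per line, easy to grep, diff, or load later."""
    return "\n".join(json.dumps(asdict(r)) for r in records)


def from_jsonl(text: str) -> list[EightDRecord]:
    """Parse JSON Lines back into records, skipping blank lines."""
    return [EightDRecord(**json.loads(line)) for line in text.splitlines() if line.strip()]


def recurring_causes(records: list[EightDRecord], min_count: int = 2) -> list[tuple[str, int]]:
    """Root-cause categories appearing in at least `min_count` reports, most frequent first."""
    counts = Counter(r.root_cause_category for r in records)
    return [(cause, n) for cause, n in counts.most_common() if n >= min_count]


if __name__ == "__main__":
    # Hypothetical reports, for illustration only.
    reports = [
        EightDRecord("503 on payments", "2024-01-05", "config change", "rollback"),
        EightDRecord("Login timeouts", "2024-02-12", "config change", "feature disabled"),
        EightDRecord("Slow reports", "2024-03-03", "capacity", "extra replicas"),
    ]
    stored = to_jsonl(reports)
    print(recurring_causes(from_jsonl(stored)))  # [('config change', 2)]
```

JSON Lines is chosen here because appending a new report never rewrites the old ones, and plain text stays readable for both colleagues and tools.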
Distribute to stakeholders
In the heat of problem-solving, remember that your report will reach high-level managers who have no context for the situation you are dealing with. Reading the report from their perspective and including information that answers their most likely questions can save everyone time.
Enable followers
Your work may be used as inspiration by other organization members. It might be worthwhile to make your report reusable. A good practice may be linking this post as a guideline. ;)
Adapting the 8D report to the IT industry is about more than just problem-solving. It's about constant learning, improving organizational culture, and ensuring that team members' outstanding achievements are seen and heard. Embrace this method with joy and professionalism, and lead your team to triumph against the odds. After all, in IT, every problem is an adventure waiting to be tackled!