Using 5-Whys Will Make You Wiser

5 Whys Notebook

Summary

By following a structured and disciplined methodology, you can achieve a deeper understanding of the root causes of any problem. The “5-Whys” Root Cause Analysis (RCA) methodology is attributed to Sakichi Toyoda, the founder of Toyota Industries. It was used in Toyota’s manufacturing process and remains one of the tools in the modern lean methodology toolkit.

Why Do Fools Fall in Love?

The “5-Whys” embody the natural curiosity of humans that is obvious in children (and romantics) but somehow lost as we become "responsible" adults.  “Why does it get dark at night?”  “Well, because the earth spins away from the sun at night.”  “Why does the earth spin?”  “Well, this is kind of hard to explain.”  “Why?”  (Inevitably, the answer becomes “It’s just the way it is.”)

Blame it on the O-Rings

Let’s apply “5-Whys” to a real-life example.  If you ask anyone familiar with the details of the 1986 space shuttle Challenger disaster what caused the accident, they will likely say “the O-rings”.  While “the O-rings” is a necessary part of the explanation, it does not give a complete picture of what led up to that fateful moment.  You may say “that’s all I want to know”, and that’s fine; nothing says that everyone must understand all of the details.  But for those who want to know the complete story, “5-Whys” can help.

The [Report of the Presidential Commission on the Space Shuttle Challenger Accident] provides the facts used to develop this example.  Though investigators may have used “5-Whys”, there is no specific evidence that this was the selected method.  The following is the “5-Whys” analysis tree based on this information.

  1. Why did the space shuttle Challenger explode? Because the external hydrogen tank ignited due to hot gases leaking from one of the Solid Rocket Motors.
  2. Why did hot gases leak from one of the Solid Rocket Motors? Because the seal between the two lower segments of this motor failed to prevent a leak.
  3. Why did this seal fail? Because the O-ring intended to compensate for variations in the seal between the segments failed due to the effects of extremely low temperature.
  4. Why did the O-ring fail its intended purpose? Because of a known design flaw with the seal.
  5. Why did the mission proceed with a known design flaw? Because there were serious flaws in the launch decision-making process, including a failure to adequately address problems that required corrective action. Both NASA and Thiokol accepted escalating risk, apparently because they "got away with it last time".

Note that the structure of the analysis tree consists of a “Why” question followed by a theory that answers that question. Based on that theory, a new “Why” question is formulated and a theory about that question is stated.  This process continues for as long as the question remains relevant (5 times is just a stretch goal to prevent a premature decision from being made).
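
As a rough illustration, the question-and-theory structure described above can be written down as a simple chain. The sketch below uses Python; the names WhyStep and build_chain are hypothetical, and the question and theory text is abridged from the Challenger analysis above.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class WhyStep:
    """One level of the analysis: a "Why" question and the theory that answers it."""
    question: str
    theory: str


def build_chain(pairs: List[Tuple[str, str]]) -> List[WhyStep]:
    """Build the chain, rejecting any level that lacks a question or a theory."""
    chain = []
    for question, theory in pairs:
        if not question.strip() or not theory.strip():
            raise ValueError("each level needs both a question and a theory")
        chain.append(WhyStep(question, theory))
    return chain


# Abridged from the Challenger analysis above (illustrative wording).
challenger = build_chain([
    ("Why did the space shuttle Challenger explode?",
     "Hot gases leaking from a Solid Rocket Motor ignited the external hydrogen tank."),
    ("Why did hot gases leak from the Solid Rocket Motor?",
     "The seal between the two lower segments failed."),
    ("Why did this seal fail?",
     "The O-ring failed due to the effects of extreme temperature."),
    ("Why did the O-ring fail its intended purpose?",
     "There was a known design flaw with the seal."),
    ("Why did the mission proceed with a known design flaw?",
     "There were serious flaws in the launch decision-making process."),
])

# Print the chain top-down, one numbered level per line.
for depth, step in enumerate(challenger, start=1):
    print(f"{depth}. {step.question} Because: {step.theory}")
```

Each theory is worded so that the next “Why” question can be asked about it; the chain simply stops growing when the next question is no longer relevant.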

It is important that the theory be stated in terms that allow a “Why” question to follow.  For example, if the answer “the O-rings” to the earlier question about what caused the accident were left unchallenged, we would gain no new insight.  The challenge should be “what about the O-rings?” and the response might be, “well, they failed”.  Now we can ask the question “Why did they fail?”  This is where the process usually breaks down.  We allow our impatience to outweigh our curiosity, and we don’t want to annoy anyone with obvious and obnoxious questions.  But as you can see in the analysis, persistence revealed that this was a known problem and that the mission was allowed to proceed anyway.

A Comprehensive Explanation

The analysis actually provides a more comprehensive explanation for the O-ring failure, because we also challenged the upstream premises. For example, the question should also be asked: “How could an O-ring failure cause the shuttle to explode?”  “This caused a seal failure in the segments of one of the solid rocket motors.”  “Go on.”  “That caused hot gases to escape onto the external hydrogen tank.”  Eureka!  Now we have a much richer explanation for the whole episode.

Note that while the “Why” question is used to traverse the tree in a downward direction to get to the ultimate root cause, we can use “Because” to help explain the conclusion from the bottom up. For example: Because there were serious flaws in the launch decision-making process, the mission was allowed to proceed with a known design flaw.
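
The bottom-up “Because” reading can be sketched in a few lines of Python. This is only an illustration: the function name explain_bottom_up is hypothetical, and the effect and cause phrases are abridged from the Challenger analysis above.

```python
from typing import List, Tuple

# Each pair is (effect, cause), listed top-down as in the analysis (abridged wording).
chain: List[Tuple[str, str]] = [
    ("the external hydrogen tank ignited",
     "hot gases leaked from one of the Solid Rocket Motors"),
    ("hot gases leaked from one of the Solid Rocket Motors",
     "the seal between the two lower segments failed"),
    ("the seal between the two lower segments failed",
     "the O-ring failed due to the effects of extreme temperature"),
    ("the O-ring failed its intended purpose",
     "the seal had a known design flaw"),
    ("the mission was allowed to proceed with a known design flaw",
     "there were serious flaws in the launch decision-making process"),
]


def explain_bottom_up(steps: List[Tuple[str, str]]) -> List[str]:
    """Return one "Because ..." sentence per level, starting from the root cause."""
    return [f"Because {cause}, {effect}." for effect, cause in reversed(steps)]


# Read the analysis from the root cause back up to the original problem.
for sentence in explain_bottom_up(chain):
    print(sentence)
```

Reading the sentences in this order is a quick check that each theory really does explain the level above it.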

With this more comprehensive picture of what really happened, the person who only cared that it was “the O-rings” that caused the failure may now want to know more about the decision-making process at NASA.  In other words, this issue now becomes relevant to them.

Understand the Pitfalls

The “5-Whys” tool has been criticized, even by former Toyota executives, for being too basic to analyze root causes to the depth needed to ensure causes are fixed.  To me, this is like saying:

“I’m disappointed in the new screwdriver set I got for Father’s Day because I can’t use them as chisels.”

Every tool has a purpose as well as limitations and both need to be well understood.

Some things to keep in mind while implementing a “5-Whys” Root Cause Analysis:

  • Management support – Make sure that your management fully understands and supports this approach. This will avoid unrealistic directives such as “don’t spend more than one hour on any problem”. It is also a good idea to track the time spent, so that the effectiveness of the exercise can be measured.
  • Avoid confusing symptoms with causes – This is a common mistake in root cause analysis.  Symptoms tell you there is a problem, but do not tell you the cause.  The oil light on your dashboard comes on to tell you there is a problem, but until you investigate further you won’t know the cause.  A simple solution is to keep asking “Why” until the answer becomes irrelevant.  For example, does it really matter to you why the oil pump failed if this is the first time it has failed in a 15-year-old car?  You would probably want to know why, however, if it failed a week after you left the showroom.
  • Avoid focusing on a single root cause – Often there is the temptation to ignore other potential root causes once a favorite emerges.  This is dangerous because you may go down a path that solves only a minor part of the problem.  Consider all possible root causes until they are fully exhausted.
  • Avoid jumping to quick conclusions – There will be a tendency for the team to lose interest and say “alright, enough is enough. I think we found the root cause of the problem”.  This is where management support is critical.  Once again, keep asking “Why” until the answer becomes irrelevant.
  • Focus on systemic issues not individuals – Blame should be placed on the system, not individuals.  Even if the root cause is that a person made a mistake, we ask “Why”.  The absolute root cause may be a lack of training, inappropriate hiring standards, etc.
  • Avoid bias caused by current knowledge – The team in place may not have the right expertise and knowledge to adequately perform the analysis.  Make sure the knowledge experts are available and assigned to the team.
  • Be aware of politics – This goes hand in hand with management support.  In my experience, "other" departments have rejected the conclusions of an analysis even though my manager fully supported them.  (If you want to really experience the effects of politics, try taking a problem such as “Why did the real estate bubble occur and subsequently burst?” and do a “5-Whys” with some of your friends.)

The Objective is Root Cause Analysis

This paper talks about a specific technique for performing Root Cause Analysis (RCA).  RCA is often treated as busy work or something that needs to be done to satisfy the VP of Quality.  But RCA is only one step towards the broader goal of implementing permanent corrective actions, which is beyond the scope of this discussion. (A separate discussion will cover 8D, or 8 steps in problem solving, of which step 4 is RCA.)

The important point here is that RCA must be treated as a necessary step in product development and support. You wouldn’t think of shipping a software release that doesn’t pass its testing criteria; nor should the RCA process be cut short.

I believe cutting RCA short is more of a problem when dealing with processes than with automated systems.  Politics becomes more of a potential issue in process discussions, and agreement on the root causes becomes more arguable or more subjective.  This is where “5-Whys” shows itself as a powerful tool: when you hear “That’s just the way it is” in answer to your “Why” question, you know you’ve stepped on a sensitive issue, but at least you have opened the discussion to discover the true root cause.

Conclusion

“5-Whys” is a structured methodology for finding the root cause of a problem, whether in an automated system or a process. The power of this approach lies in the fundamental question “Why?”  “What” happened is usually a much easier question to answer than “Why” it happened.  Asking too many “tough” questions may annoy the team or organization.  However, with management support and disciplined execution, “5-Whys” could be one of the most powerful tools in your toolbox.