5 Steps to Make Troubleshooting Less Troublesome

There’s an old, not-so-funny joke about troubleshooting electrical devices whose punch line is “Is it plugged in?”

The reality is that it is easy to overlook basic or simple issues, especially when troubleshooting mechanical, electrical or software problems isn’t part of your regular routine. Following the basic troubleshooting steps listed below can prevent much frustration and lost time. (As a suggestion, many of these steps can be applied to everyday life, not just work.)

There is a scientific and philosophical rule known as Occam’s razor, which states that entities should not be multiplied unnecessarily. In layman’s terms, the simplest explanation is usually the best one. Occam’s razor is often stated as an injunction not to make more assumptions than you absolutely need. In other words, do not overcomplicate things. This is especially important when beginning the troubleshooting process.

Here are five general steps to consider when troubleshooting in manufacturing (and in general):

  1. Identify the problem
    • Take the time to understand the malfunction. Look at the problem from where you believe it starts, not necessarily from the end effect you are witnessing. Sometimes what you observe is a symptom of the problem but not the problem itself. This is the first critical step, and it usually dramatically reduces the number of steps required to find the culprit. It may also require checking even the simplest things, like whether you have power. (Sorry, couldn’t resist.)
  2. Establish a theory of probable cause
    • This is where Occam’s razor comes in. Start by considering the most obvious things first, whether it be a power supply, a sensor, a cable or even a connector (especially field-attachable ones). Then work your way to the more complex if needed, from the wiring in networks like EtherNet/IP or PROFINET to network traffic or ladder code sequencing. Don’t start examining the more complex causes until you have eliminated the most obvious. Sometimes a poorly performing sensor cable can mimic code problems. Be sure to make a list of your thoughts and probable causes so you don’t cover things twice; that is a huge time waster.
  3. Establish an action plan and execute the plan
    • Start testing your probable-cause theories to determine the actual cause, or root cause, of the problem. Keep in mind what you understand the problem to be and your theories about it, then run your tests from the simplest possible cause to the more complex (if needed). Be careful not to get distracted by unrelated issues you find along the way, like something you remembered you wanted to take care of. (This is where your written list really comes in handy.) Examine methodically: don’t jump around, and don’t recheck causes you’ve already eliminated.
      Hints: Try swapping components when possible and see if the problem corrects itself. Also check whether someone recently changed something from the original design. This often manifests itself as the proverbial “ghost in the machine” syndrome. Consider this process a ladder you are climbing from the simple lower steps to the higher, more complex steps. Using this analogy, why climb higher if you don’t need to?
  4. Verify full system functionality
    • Once you have found what you think may be the problem and corrected it, be sure to validate the system after the repair or replacement and make sure it is functioning as it should. In some rare cases, one root cause can cause other problems or damage, so it is important to confirm the whole system works before returning it to service. This may lead to some pushback because of the additional time needed, but overlooked problems could take the system offline again, and for even longer.
  5. Document the process
    • Finally, be sure to document what you found, and maybe even how you found it, in a log or service documentation system. This is especially important if the problem was caused by a part wearing out from normal use, as it is likely to happen again. If you can categorize the problem, it will be easier for you and other staff to detect and remedy if it arises again. You may want to consider reviewing your findings at intervals to see if there are possible improvements or changes, like routine maintenance or more reliable components, that could minimize these problems in the future.
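
For readers who keep their notes in software, the five steps above can be sketched as a small Python routine. This is only an illustration under stated assumptions: the names Hypothesis, TroubleshootingLog and troubleshoot, the numeric complexity ranking, and the sample causes are all hypothetical, and each check function stands in for whatever real test you would perform at the machine.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Hypothesis:
    """One probable cause (step 2). All fields are illustrative assumptions."""
    description: str
    complexity: int             # 1 = simplest; higher = more complex
    check: Callable[[], bool]   # returns True if the test confirms this cause


@dataclass
class TroubleshootingLog:
    """Written record of what was tested and found (step 5)."""
    problem: str
    entries: List[str] = field(default_factory=list)
    root_cause: Optional[str] = None


def troubleshoot(problem: str, hypotheses: List[Hypothesis]) -> TroubleshootingLog:
    """Test causes from simplest to most complex, never testing one twice."""
    log = TroubleshootingLog(problem)   # step 1: state the problem
    already_tested = set()

    # Step 3: climb the ladder from the simple lower rungs upward.
    for hypothesis in sorted(hypotheses, key=lambda h: h.complexity):
        if hypothesis.description in already_tested:
            continue                    # do not cover the same ground twice
        already_tested.add(hypothesis.description)

        confirmed = hypothesis.check()
        result = "CONFIRMED" if confirmed else "ruled out"
        log.entries.append(f"{hypothesis.description}: {result}")

        if confirmed:
            log.root_cause = hypothesis.description
            break                       # why climb higher if you don't need to?

    return log


if __name__ == "__main__":
    # Hypothetical causes; the lambdas stand in for real tests.
    candidates = [
        Hypothesis("Ladder code sequencing", 4, lambda: False),
        Hypothesis("Sensor cable damaged", 2, lambda: True),
        Hypothesis("Power supply off", 1, lambda: False),
        Hypothesis("Sensor cable damaged", 2, lambda: True),  # duplicate entry
    ]
    log = troubleshoot("Sensor reads intermittently", candidates)
    for entry in log.entries:
        print(entry)
    print("Root cause:", log.root_cause)
```

The sketch stops at the first confirmed cause, so the more complex theories are never examined unless the simpler ones have been ruled out. It does not cover step 4: verifying full system functionality still has to happen at the machine before it goes back into service.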

Establishing a good process like this will help you troubleshoot your application or machine more quickly, and even help with home projects. Critical thinking like this helps eliminate wasted time, frustration and, most importantly, unplanned downtime.