Every automation system eventually runs into a situation that requires advanced engineering support. This type of break-fix support could stem from any number of causes: power outages, server maintenance, operator error, and so on. Whatever the root issue turns out to be, sooner or later every system will need it. That is why, here at Avanceon, all our engineers will at some point find themselves helping customers keep their manufacturing processes running.
Troubleshooting, like coding, is a skill set of its own, and each person might take a slightly different approach to resolving an issue. When I find myself in a break-fix situation, I tend to follow a regular procedure, not only to fix the problem but also to determine its root cause.
Step 1: Ask questions
Begin by discussing the symptoms of the issue with the person reporting it. If you think about it, how can you solve a problem if you don’t know what the problem is? Asking the right questions in this first phase of the support process is vital to enabling a successful resolution.
Step 2: Replicate the issue yourself
Sometimes the information you’ve gathered in the first step might not quite paint the full picture of the situation. When I try to replicate the issue, I often gain insight into what the user is actually reporting.
Step 3: Check the log files
A well-built system will provide evidence of what is happening in the event something is not working properly. If you’re lucky, error messages will provide the context for understanding the actual problem. Even if the system hasn’t generated any error messages, the system logs can often provide details regarding behind-the-scenes issues in a script or database transaction. Analyzing these messages can often reveal the issue at hand.
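As a rough illustration of this step, here is a minimal Python sketch that pulls warnings and errors out of a log and counts them per component. The log format (`<timestamp> <LEVEL> <component>: <message>`), the component names and the sample lines are all assumptions made up for the example; a real system's logs will look different and the pattern would need adjusting.

```python
import re
from collections import Counter

# Hypothetical log format: "<date> <time> <LEVEL> <component>: <message>"
LINE_PATTERN = re.compile(
    r"^(\S+ \S+) (DEBUG|INFO|WARN|ERROR|FATAL) ([^:]+): (.*)$"
)


def find_problems(lines, levels=("WARN", "ERROR", "FATAL")):
    """Return (timestamp, level, component, message) tuples for matching lines."""
    problems = []
    for line in lines:
        match = LINE_PATTERN.match(line.strip())
        if match and match.group(2) in levels:
            problems.append(match.groups())
    return problems


def count_by_component(problems):
    """Count problem lines per component to show where issues cluster."""
    return Counter(component for _, _, component, _ in problems)


# Invented sample data, for illustration only.
sample_log = [
    "2024-01-15 08:00:01 INFO hmi: Screen 'Batch Overview' opened",
    "2024-01-15 08:00:02 ERROR db: Query timed out after 30 s",
    "2024-01-15 08:00:02 WARN script: Recipe lookup returned no rows",
    "2024-01-15 08:00:05 ERROR db: Query timed out after 30 s",
]

problems = find_problems(sample_log)
counts = count_by_component(problems)
for component, total in counts.most_common():
    print(f"{component}: {total} problem line(s)")
```

Grouping by component is only one possible view; grouping by time window or message text can be just as useful when a behind-the-scenes script or database transaction is the suspect.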
Step 4: Trace backwards
Start at the point in the system where the issue has been reported and trace backwards. For example, let’s assume the user is experiencing an issue on a specific application screen. Begin drilling down into the specific elements of the screen that are not working, such as a button. Dig into the code or function behind the button to see how it’s supposed to work. Perhaps the button triggers a script that queries a database for data, but that data isn’t displaying on the screen. Tracing through these individual elements and functions can often help you understand where in the process the malfunction occurs.
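The backward trace can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the stage names, the simulated values and the `trace_backwards` helper are hypothetical, and the sketch assumes a fault propagates downstream, so the first healthy stage found while walking backwards marks the boundary of the problem.

```python
def trace_backwards(stages):
    """Walk from the reported symptom back toward the data source.

    `stages` is a list of (name, check) pairs ordered from the symptom
    backwards. Each check() returns True if that stage's output looks correct.
    Returns the most upstream stage that still fails (the likely fault
    location), or None if every stage passes.
    """
    suspect = None
    for name, check in stages:
        if check():
            break  # this stage is healthy, so the fault lies downstream of it
        suspect = name
    return suspect


# Simulated state for the example: the database returns data,
# but the script behind the button loses it before it reaches the screen.
query_rows = [{"batch": 42}]  # what the database returned
script_result = []            # what the script passed on
displayed = []                # what the screen shows

stages = [
    ("screen display", lambda: len(displayed) > 0),
    ("button script", lambda: len(script_result) > 0),
    ("database query", lambda: len(query_rows) > 0),
]

fault = trace_backwards(stages)
print(f"Likely fault location: {fault}")
```

In this made-up case the query is healthy and the script is the last failing stage, so the script behind the button is where to dig next.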
Step 5: Restart/redeploy the system
Usually, it’s not going to be possible to restart servers in a manufacturing system without taking down other, still functional parts. However, I find it amazing how often simply turning it off and on again will fix a system when some underlying aspect gets out of sync.
Step 6: Document the findings
It’s always good practice to document the issue, both for the customer’s benefit and to provide insight to the support team. One of the main benefits of documentation in a support situation is to provide some guidance should the same situation reoccur. You don’t want to spend valuable time trying to reanalyze an issue if you don’t have to.
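One way to keep such documentation consistent is to capture each incident in the same structure every time. The sketch below is a hypothetical template, not a prescribed format; the field names and the example incident are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class IncidentRecord:
    """Hypothetical record of a break-fix incident."""
    symptom: str
    root_cause: str
    resolution: str
    steps_to_reproduce: list = field(default_factory=list)

    def to_text(self):
        """Render the record as plain text for a ticket or shared notes."""
        lines = [f"Symptom: {self.symptom}", "Steps to reproduce:"]
        lines += [
            f"  {number}. {step}"
            for number, step in enumerate(self.steps_to_reproduce, 1)
        ]
        lines += [
            f"Root cause: {self.root_cause}",
            f"Resolution: {self.resolution}",
        ]
        return "\n".join(lines)


# Invented example incident.
record = IncidentRecord(
    symptom="Batch data not shown after pressing Refresh",
    root_cause="Script discarded query results after a timeout",
    resolution="Restarted the script service and raised the timeout",
    steps_to_reproduce=["Open the batch screen", "Press Refresh"],
)

report = record.to_text()
print(report)
```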
There’s nothing revolutionary in my six-step process, but I find it’s a workable model for helping me find, analyze and correct system issues. If you have a similar best practice, please share it with us!
Find more information about how Avanceon approaches engineering projects.
Ed Miller is an engineer at Avanceon, a certified member of the Control System Integrators Association (CSIA). For more information about Avanceon, visit its profile on the Industrial Automation Exchange.