Alarm issues in a production environment are something that every operator knows they have to deal with. Just as death and taxes are an inevitable part of life, process alarms are as much a part of the production business as downtime, maintenance and the occasional bottleneck.
Given the integral role alarms play in production operations across the discrete manufacturing, continuous process and hybrid process industries, Automation World asked its subscribers a series of questions about alarm management best practices to learn how production operators and engineers actually deal with them. To encourage candid answers and an accurate picture of the current state of alarm management, the survey was anonymous; the only data collected on respondents were their industry and the size of their company.
The nearly down-the-middle split across practically every question on the survey highlights the persistence of alarm management problems across industries.
The baseline
To establish a framework for the survey, the questions focused on the basic best practices of alarm management across seven areas. The areas addressed in the survey included:
- Creation and adoption of an alarm management philosophy for the business;
- Alarm performance benchmarking;
- “Bad actor” alarm resolution;
- Alarm documentation and rationalization;
- Alarm system audit and enforcement;
- Real-time alarm management; and
- Control and maintenance of alarm system performance.
Respondents represented a good cross section of industries, with 52 percent coming from the process industries (primarily petroleum and utilities), 33 percent from discrete manufacturing (machinery and automotive being the two largest industry sectors responding in this group), and 14 percent from hybrid industries, such as food and beverage and pharmaceuticals.
With such broad representation from these industry sectors, an initial review of the results proved surprising: Roughly 50 percent of respondents do not follow any of the basic alarm management best practices.
Yes, you read that right—about 50 percent of end users have no guiding philosophy or practice when it comes to something as ubiquitous as alarm management. Yet, nearly 70 percent of respondents say that alarm overload affects their ability to properly operate the production process.
So, what gives?
Not my job, man
Unfortunately, it seems that many operators and engineers in production operations are disconnected from an overarching alarm management practice because they don’t see it as being part of their day-to-day job responsibility or part of the application they are working on at a given moment, says Rich Chmielewski, Simatic PCS 7 marketing manager for Siemens Industry Group, Alpharetta, Ga.
In his job, Chmielewski visits plants of all types on a regular basis and even conducts alarm management training programs at various facilities. He says the results of the Automation World survey coincide with what he sees in the real world.
“We typically do not see customers defining specifications in terms of deployments or ongoing maintenance of systems following any sort of best practice or standard,” Chmielewski says. “If an alarm is in the system, most operators and engineers figure it must be there for a reason, so they just keep it in. The result is that more and more alarms keep getting put into the system and there’s no comprehensive understanding or ownership of what they’re doing.”
As dire as the alarm management situation is at many production facilities, some progress is being made on a site-by-site approach.
David McCarthy, president and chief executive officer of TriCore Inc., a systems integration firm based in Racine, Wis., says that more systems designers are looking to head off the overabundance of nuisance alarms in facilities they work in by first providing a functional specification as a foundation document detailing the operation of plant floor software.
“Embedded in this document, on a function-by-function basis, is all the critical fault-response behavior of the system software (in addition to all items related to the functional behavior),” says McCarthy. “In this document, all alarms and associated system responses are defined, but in the context of the individual functional operations in which they might occur.”
These specifications are vetted not only with technical staff, but with operations staff as well, McCarthy adds. This is done to ensure the automated system meets operational requirements and that all safety issues are adequately addressed.
When this is done, “process shutdowns, pauses or stops occur automatically in response to critical alarm conditions, not manually — as is the case with older systems,” says McCarthy. “Safety of people, followed by equipment and product, are all vetted in these specifications.”
Good system designers can no longer be casual in their deployment of non-critical alarms, according to McCarthy.
Operators just “get numb” to all the nuisance alarms after a while, adds Chmielewski, who says he has seen operations where standing alarms have been flashing unaddressed for weeks and months.
Operations realities
If there’s been any bright side to the down economy over the past few years for manufacturers and processors, it’s that many engineers have not been able to retire as expected. As a result, the brain drain that companies have long feared has not come to pass at a high level. The benefit to this on the alarm side is on-hand experience in dealing with nuisance alarms and knowing which ones can be safely ignored.
The real issue, however, is not cost or difficulty: alarm management problems are relatively cheap and easy to fix. They are simply not being addressed.
“Depending on the industry and corporate culture, many businesses do not want to invest the time or energy required to identify and correct bad actor alarms (the small number of alarms that create most of the alarm problems),” says McCarthy. “I suspect that, for many companies, the return on investment is simply not compelling enough.”
Chmielewski concurs: “Bad actor alarm management is an easy problem to deal with, but it’s just not being done on a widespread basis, from the Tier One companies on down to the smaller operations. At bigger companies, the problem isn’t encountered quite as often, as there are more people involved via committees to implement best practice applications. But at smaller operations, the guy who’s in charge of the production systems today is up on the roof tomorrow dealing with HVAC issues. That’s a big part of the problem of why the alarm issues stay at the process level and don’t get resolved at a higher systems level by the operations team.”
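To make the "bad actor" idea concrete, here is a minimal sketch of how the small set of alarms responsible for most activations might be pulled from an alarm journal. The function name `find_bad_actors`, the tag names, the counts and the 80 percent coverage figure are all illustrative assumptions, not something either source describes.

```python
from collections import Counter


def find_bad_actors(alarm_events, coverage=0.8):
    """Return the most frequent alarm tags that together account for
    at least `coverage` of all recorded alarm activations.

    `alarm_events` is an iterable of tag names, one per activation.
    The 0.8 default is an assumed cutoff, not an industry requirement.
    """
    counts = Counter(alarm_events)
    total = sum(counts.values())
    bad_actors = []
    running = 0
    for tag, n in counts.most_common():
        # Stop once the tags collected so far reach the coverage target.
        if total == 0 or running / total >= coverage:
            break
        bad_actors.append((tag, n))
        running += n
    return bad_actors


# Hypothetical alarm journal: five tags, 100 activations in total.
journal = (
    ["FT-101"] * 50
    + ["LT-204"] * 35
    + ["PT-310"] * 8
    + ["TT-415"] * 4
    + ["ZS-502"] * 3
)

for tag, n in find_bad_actors(journal):
    print(tag, n)
```

In this made-up journal, two of the five tags produce 85 of the 100 activations, which is the kind of short list an operations team could review first.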
On a macro level, alarms are just one small part of the problem when it comes to having a documentation and operational philosophy at production facilities. According to Chmielewski, the documentation for the DCS (distributed control system) at many plants is often not kept up, and the process and instrumentation diagrams as well as the process narratives are often out of date. Ultimately, the problem is that DCSs are easy to change. Simulation codes are put in and forgotten about. Old products that are no longer produced remain in the system because no one wants to delete them in case the product comes back.
“There are just too many cooks in the kitchen,” Chmielewski says. “In essence, there’s a lack of understanding of the philosophy about what it costs to put a point of I/O into a system and how to actually execute it.”
Acting on the problem
All modern DCSs provide end users with the tools to adequately address alarm management problems. And with nearly 70 percent of respondents saying alarm overload affects their ability to operate the production process, the need to fix the problem is evident. Given that the technology is in place in most facilities, the real problem is the operational culture and how well the operators are trained.
“Whether it’s a safety scenario when something’s gone awry, or when it’s an electromechanical issue or a shutdown scenario, operators often don’t understand all the aspects of the process,” says Chmielewski. “So they just tweak the alarm to get it behaving within a band they understand.
“It’s hard to teach someone about the philosophy behind the process. They just want the alarm that’s bothering them to go away. The tools may be in place and the operators can be taught, but it has to be enforced throughout the organization.”
This brings the discussion back to the documentation and philosophy issue. Like many production issues, alarm management at its core is directly affected by company culture and expectations. If good alarm management is not required of engineers and operators, it likely won’t take place.
Addressing the issue from a systems integrator viewpoint, TriCore takes several steps to mitigate alarm overload on the operations staff.
Alarm management tips
“On individual workstations, we display only those alarms associated with the default operational area of that workstation,” says McCarthy. “While it is still possible to view and manage all alarms from any workstation, by default only those pertinent to the area are visible.”
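A minimal sketch of that default-area filtering might look like the following. The `Alarm` record, the area names and the `show_all` flag are hypothetical; TriCore's actual implementation is not described in the source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alarm:
    tag: str
    area: str      # operational area the alarm belongs to
    message: str


def visible_alarms(alarms, default_area, show_all=False):
    """Return the alarms a workstation displays.

    By default only alarms from the workstation's own area are shown;
    `show_all=True` models the option to view every alarm from any
    workstation.
    """
    if show_all:
        return list(alarms)
    return [a for a in alarms if a.area == default_area]


# Hypothetical active alarms across two plant areas.
active = [
    Alarm("LT-204", "mixing", "Tank level high"),
    Alarm("TT-415", "packaging", "Sealer temperature low"),
    Alarm("FT-101", "mixing", "Feed flow low"),
]

for alarm in visible_alarms(active, default_area="mixing"):
    print(alarm.tag, alarm.message)
```

Keeping the filter as a default rather than a hard restriction preserves the behavior McCarthy describes: every alarm stays reachable, but the operator's first view is limited to the area at hand.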
TriCore also provides a way to temporarily suppress nuisance alarms. McCarthy offers an example of how they do this: A sensor on a valve is providing independent feedback on the valve’s actual position. If a discrepancy is sensed between the actual and expected position of the actuator after allowing for valve actuation travel time, an alarm is typically generated (fault response behavior might also be initiated). Most of the time when this occurs, however, the fault is actually in the sensor, not the valve. Until the sensor can be recalibrated, repaired or replaced, TriCore allows for the ability to suppress the alarm in a visible, traceable manner. Displays and reports are available to indicate what alarms are suppressed, when each was suppressed, and by whom.
For non-critical alarms, TriCore offers an interface where only alarms that can be changed by the operator are shown. This helps reduce the volume of alarms that the operations staff has to manage.
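The valve example can be sketched roughly as follows, assuming a simple open/closed position signal. The class and method names, the five-second travel time and the tag are all hypothetical, and any fault-response behavior is left out.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class SuppressionRecord:
    tag: str
    suppressed_by: str
    reason: str
    suppressed_at: datetime


class ValveAlarmMonitor:
    """Position-discrepancy alarm with traceable suppression (a sketch)."""

    def __init__(self, travel_time_s):
        self.travel_time_s = travel_time_s
        self.suppressions = {}  # tag -> SuppressionRecord

    def evaluate(self, tag, commanded_open, sensed_open, seconds_since_command):
        """Return 'normal', 'alarm' or 'suppressed' for one valve."""
        # Allow for actuation travel time before judging the position.
        if seconds_since_command < self.travel_time_s:
            return "normal"
        if commanded_open == sensed_open:
            return "normal"
        return "suppressed" if tag in self.suppressions else "alarm"

    def suppress(self, tag, user, reason, now=None):
        """Suppress an alarm and record who did it, when and why."""
        record = SuppressionRecord(
            tag, user, reason, now or datetime.now(timezone.utc)
        )
        self.suppressions[tag] = record
        return record

    def suppression_report(self):
        """List what is suppressed, by whom, when and why."""
        return [
            (r.tag, r.suppressed_by, r.suppressed_at.isoformat(), r.reason)
            for r in self.suppressions.values()
        ]


monitor = ValveAlarmMonitor(travel_time_s=5.0)
print(monitor.evaluate("ZS-502", True, False, 6.0))
monitor.suppress("ZS-502", "jdoe", "position sensor awaiting recalibration")
print(monitor.evaluate("ZS-502", True, False, 6.0))
print(monitor.suppression_report())
```

The point of the record is that a suppressed alarm never simply disappears: the discrepancy is still evaluated, and the report shows what was silenced, when and by whom.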
“For alarms that involve threshold levels,” says McCarthy, “we lock down who can perform threshold set point changes. We also put passive track-and-trace software in our systems so management can account for who did what, where and when. This added accountability helps to better manage the overall process.”
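One way to picture locked-down set point changes with a passive audit trail is the sketch below. The `SetpointStore` class, user names, workstation names and tag are hypothetical and stand in for whatever access control and logging a real system provides.

```python
from datetime import datetime, timezone


class SetpointStore:
    """Alarm thresholds with restricted changes and an audit trail."""

    def __init__(self, authorized_users):
        self.authorized_users = set(authorized_users)
        self.setpoints = {}
        self.audit_log = []

    def change(self, user, workstation, tag, new_value, now=None):
        """Attempt a set point change; every attempt is logged."""
        allowed = user in self.authorized_users
        # Record who did what, where and when, whether or not it succeeded.
        self.audit_log.append({
            "who": user,
            "what": f"set {tag} threshold to {new_value}",
            "where": workstation,
            "when": (now or datetime.now(timezone.utc)).isoformat(),
            "old_value": self.setpoints.get(tag),
            "applied": allowed,
        })
        if allowed:
            self.setpoints[tag] = new_value
        return allowed


store = SetpointStore(authorized_users=["process_engineer"])
store.change("process_engineer", "WS-01", "LT-204.HI", 85.0)
store.change("operator_a", "WS-03", "LT-204.HI", 99.0)  # rejected but logged

for entry in store.audit_log:
    print(entry["who"], entry["where"], entry["what"], entry["applied"])
```

Logging rejected attempts as well as applied ones is a design choice made here for illustration; it gives management a record of who tried to widen an alarm band, not only who succeeded.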