December CRYPTO-GRAM newsletter
Posted Dec 16, 2003 0:56 UTC (Tue) by sandy_pond
In reply to: December CRYPTO-GRAM newsletter
Parent article: December CRYPTO-GRAM newsletter
I take this back. The causes are detailed on pages 24-33. They have nothing to do with any worms/viruses. I think the author is somewhat disingenuous to report otherwise. Key points made in the report:
Regarding the central alarm system:
the alarm process essentially stalled while processing an alarm event, such that the process began to run in a manner that failed to complete the processing of that alarm or produce any other valid output (alarms).
Although the alarm processing function of FE's EMS failed, the remainder of that system generally continued to collect valid real-time status information and measurements about FE's power system, and continued to have supervisory control over the FE system.
Regarding the remote consoles (RTUs) used to collect alarms and send them to the central unit:
this occurred because the data feeding into those terminals started queuing and overloading the terminals' buffers.
Regarding the backup central alarm server:
all other EMS software running on the first server automatically transferred ("failedover") onto the back-up server. However, because the alarm application moved intact onto the backup while still stalled and ineffective, the backup server failed 13 minutes later, at 14:54 EDT. Accordingly, all of the EMS applications on these two servers stopped running.
Regarding the correction of the problem:
it was only during a post-outage support call with GE late on 14 August that FE and GE determined that the only available course of action to correct the alarm problem was a cold reboot of FE's overall XA21 system. In interviews immediately after the blackout, FE IT personnel indicated that they discussed a cold reboot of the XA21 system with control room operators after they were told of the alarm problem at 15:42 EDT, but decided not to take such action