December CRYPTO-GRAM newsletter

Posted Dec 16, 2003 0:56 UTC (Tue) by sandy_pond (guest, #9734)
In reply to: December CRYPTO-GRAM newsletter by sandy_pond
Parent article: December CRYPTO-GRAM newsletter

I take this back. The causes are detailed on pages 24-33. They have nothing to do with any worms/viruses. I think the author is somewhat disingenuous to report otherwise. Key points made in the report:

Regarding the central alarm system:

the alarm process essentially stalled while processing an alarm event, such that the process began to run in a manner that failed to complete the processing of that alarm or produce any other valid output (alarms).

and

Although the alarm processing function of FE's EMS failed, the remainder of that system generally continued to collect valid real-time status information and measurements about FE's power system, and continued to have supervisory control over the FE system.

Regarding the remote consoles (RTUs) used to collect alarms and send them to the central unit:

this occurred because the data feeding into those terminals started queuing and overloading the terminals' buffers.

Regarding the backup central alarm server:

all other EMS software running on the first server automatically transferred ("failed-over") onto the back-up server. However, because the alarm application moved intact onto the backup while still stalled and ineffective, the backup server failed 13 minutes later, at 14:54 EDT. Accordingly, all of the EMS applications on these two servers stopped running.

Regarding the correction of the problem:

it was only during a post-outage support call with GE late on 14 August that FE and GE determined that the only available course of action to correct the alarm problem was a cold reboot of FE's overall XA21 system. In interviews immediately after the blackout, FE IT personnel indicated that they discussed a cold reboot of the XA21 system with control room operators after they were told of the alarm problem at 15:42 EDT, but decided not to take such action


December CRYPTO-GRAM newsletter

Posted Dec 16, 2003 16:13 UTC (Tue) by sphealey (guest, #1028)

the alarm process essentially stalled while processing an alarm event, such that the process began to run in a manner that failed to complete the processing of that alarm or produce any other valid output (alarms).

Which tells you absolutely nothing about what went wrong or the root cause. That paragraph describes 99.995% of all computer-related problems.

sPh

December CRYPTO-GRAM newsletter

Posted Dec 16, 2003 22:57 UTC (Tue) by sandy_pond (guest, #9734)

But obviously not related to an MS worm/virus, unless you put your tinfoil hat on and ignore the 10 pages of text describing the problem and fix.

Copyright © 2013, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.