December CRYPTO-GRAM newsletter

December CRYPTO-GRAM newsletter

Posted Dec 15, 2003 23:11 UTC (Mon) by sandy_pond (guest, #9734)
Parent article: December CRYPTO-GRAM newsletter

Also from the report:

There is also no evidence, nor is there any information suggesting, that viruses and worms prevalent across the Internet at the time of the outage had any significant impact on power generation and delivery systems. SWG analysis to date has brought to light certain concerns with respect to: the possible failure of alarm software; links to control and data acquisition software; and the lack of a system or process for some operators to view adequately the status of electric systems outside their immediate control.

This sounds like they know what the problem was but are not being specific. The exact failure mode may not be interesting enough to be detailed in the report, but given the coincidence, it would have been better if they had delineated the exact cause of the failure.



December CRYPTO-GRAM newsletter

Posted Dec 16, 2003 0:56 UTC (Tue) by sandy_pond (guest, #9734)

I take this back. The causes are detailed on pages 24-33. They have nothing to do with any worms/viruses. I think the author is somewhat disingenuous to report otherwise. Key points made in the report:

Regarding the central alarm system:

the alarm process essentially stalled while processing an alarm event, such that the process began to run in a manner that failed to complete the processing of that alarm or produce any other valid output (alarms).

and

Although the alarm processing function of FE's EMS failed, the remainder of that system generally continued to collect valid real-time status information and measurements about FE's power system, and continued to have supervisory control over the FE system.

Regarding the remote consoles (RTUs) used to collect alarms and send them to the central unit:

this occurred because the data feeding into those terminals started queuing and overloading the terminals' buffers.

Regarding the backup central alarm server:

all other EMS software running on the first server automatically transferred ("failed-over") onto the back-up server. However, because the alarm application moved intact onto the backup while still stalled and ineffective, the backup server failed 13 minutes later, at 14:54 EDT. Accordingly, all of the EMS applications on these two servers stopped running.

Regarding the correction of the problem:

it was only during a post-outage support call with GE late on 14 August that FE and GE determined that the only available course of action to correct the alarm problem was a cold reboot of FE's overall XA21 system. In interviews immediately after the blackout, FE IT personnel indicated that they discussed a cold reboot of the XA21 system with control room operators after they were told of the alarm problem at 15:42 EDT, but decided not to take such action

December CRYPTO-GRAM newsletter

Posted Dec 16, 2003 16:13 UTC (Tue) by sphealey (guest, #1028)

the alarm process essentially stalled while processing an alarm event, such that the process began to run in a manner that failed to complete the processing of that alarm or produce any other valid output (alarms).
Which tells you absolutely nothing about what went wrong or the root cause. That paragraph describes 99.995% of all computer-related problems.

sPh

December CRYPTO-GRAM newsletter

Posted Dec 16, 2003 22:57 UTC (Tue) by sandy_pond (guest, #9734)

But obviously not related to an MS worm/virus, unless you put your tin foil hat on and ignore the 10 pages of text describing the problem and fix.
