Sounds like double ungood day for Linode
Posted Jul 28, 2025 14:19 UTC (Mon) by paulj (subscriber, #341)
In reply to: Sounds like double ungood day for Linode by farnz
Parent article: LWN is back
Posted Jul 28, 2025 16:05 UTC (Mon)
by farnz (subscriber, #17727)
[Link] (5 responses)
For example, the door badge system might be configured to automatically sync access permissions from internal directories (so that you don't have to manually update DC access permissions as people join and leave), and to fail closed if the internal directories, including all mirrors in other DCs, are unavailable (on the basis that you don't want an attacker to be able to cut the OOB Internet link and get in). Then you allow the badge system to use DC Internet (because it's more reliable than the OOB link, so you get fewer problems where it can't sync), and then you terminate the OOB Internet link for the badge system (on the basis that the DC has redundant fibres and routers, so won't go down, whereas the OOB link is a consumer link without redundancy). And then you update the config management system so that, for all but legacy systems (like the DC routers), if you don't confirm that a change is good to test within 5 minutes, it autoreverts, so people develop a habit of testing rather than carefully confirming that they've not made a mistake.
All these decisions sound reasonable in isolation, but they chain together to a world where a mistake with the DC's router configurations results in the door badge system locking you out.
Posted Jul 28, 2025 21:02 UTC (Mon)
by smurf (subscriber, #17840)
[Link] (4 responses)
The interesting part about cyclic dependencies is that you have customers, who need to be notified when their servers go belly-up for some reason (including when the router they're behind blows a fuse). Thus you need dependency tracking anyway. Which presumably should alert you when there are any cycles in that graph. Which should prevent this from happening. Famous last words …
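As a rough illustration of the cycle check described above, here is a minimal Python sketch that runs a depth-first search over a dependency graph and reports the first cycle it finds. The node names and edges are hypothetical, loosely modelled on the badge-system scenario in this thread; they are not taken from any real dependency tracker.

```python
from typing import Dict, List, Optional

WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current DFS path, finished


def find_cycle(deps: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one dependency cycle as a list of nodes, or None if there is none."""
    colour: Dict[str, int] = {}
    path: List[str] = []

    def visit(node: str) -> Optional[List[str]]:
        colour[node] = GREY
        path.append(node)
        for dep in deps.get(node, []):
            state = colour.get(dep, WHITE)
            if state == GREY:
                # dep is already on the current path, so we have closed a loop.
                return path[path.index(dep):] + [dep]
            if state == WHITE:
                cycle = visit(dep)
                if cycle:
                    return cycle
        path.pop()
        colour[node] = BLACK
        return None

    for node in deps:
        if colour.get(node, WHITE) == WHITE:
            cycle = visit(node)
            if cycle:
                return cycle
    return None


# Hypothetical graph: "X: [Y]" means X needs Y in order to work or be recovered.
deps = {
    "dc_routers": ["on_site_engineer"],        # fixing routers needs someone on site
    "on_site_engineer": ["door_badge_system"],  # getting on site needs the badge system
    "door_badge_system": ["internal_directory"],
    "internal_directory": ["dc_internet"],
    "dc_internet": ["dc_routers"],
    "customer_vm": ["dc_internet"],
}

cycle = find_cycle(deps)
if cycle:
    print("dependency cycle: " + " -> ".join(cycle))
```

A check like this only helps if the graph also records recovery dependencies (who needs physical access to fix what), not just the runtime ones used for customer notifications.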
Posted Jul 29, 2025 0:08 UTC (Tue)
by mathstuf (subscriber, #69389)
[Link] (3 responses)
Posted Jul 29, 2025 7:33 UTC (Tue)
by taladar (subscriber, #68407)
[Link] (2 responses)
Posted Jul 29, 2025 11:02 UTC (Tue)
by mathstuf (subscriber, #69389)
[Link] (1 responses)
It's obviously not going to know about problems like "Bob has the key, but it is lost on his 100-key keyring", but it (hopefully) can notice things like "it says to get the key, but not where the key lives" that might be implicit knowledge in the author's eyes. Instructions like these really should have a thorough once-over by someone *not* intimately familiar with the process, to help rid them of such implicit assumptions.
Posted Jul 29, 2025 11:29 UTC (Tue)
by farnz (subscriber, #17727)
[Link]
While the LLM made a mistake here (since there are two people you rely on, not one), it's still highlighted that a document saying that 1 of 4 roles is needed to recover has, in fact, become 1 of 2 people.
And I suspect that if you could review the chain of decisions that led to "door badge system depends on DC being up and running", every single one would make sense in isolation; it's only when you test "what happens when we take the DC down remotely? Can we get in?" that you discover that these decisions have chained together to imply "when the DC is down, door badge system is also down".
It can also do things like decide that certain job titles imply particular people (not reliably, but often enough to be useful), and then highlight that your recovery plans depend on Chris being present at site, because Chris is the Site Manager, the Health and Safety Lead, and the Chief Electrician, and your plans depend on the Site Manager, the Health and Safety Lead, the Chief Electrician, or the Deputy Chief Electrician being on site.
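The same roles-to-people check can be done mechanically if you have a role assignment table. The sketch below is only illustrative: the helper name `people_who_can_recover` is made up, and "Sam" is a hypothetical placeholder for the Deputy Chief Electrician, whose name is not given in the example.

```python
from typing import Dict, Iterable, Set


def people_who_can_recover(required_roles: Iterable[str],
                           role_holders: Dict[str, str]) -> Set[str]:
    """Return the distinct people who hold at least one of the acceptable roles."""
    return {role_holders[role] for role in required_roles if role in role_holders}


# The plan says any one of these four roles must be on site.
required_roles = [
    "Site Manager",
    "Health and Safety Lead",
    "Chief Electrician",
    "Deputy Chief Electrician",
]

# Who actually holds each role. "Sam" is a hypothetical name for illustration.
role_holders = {
    "Site Manager": "Chris",
    "Health and Safety Lead": "Chris",
    "Chief Electrician": "Chris",
    "Deputy Chief Electrician": "Sam",
}

people = people_who_can_recover(required_roles, role_holders)
print(f"{len(required_roles)} roles, but only {len(people)} people: {sorted(people)}")
```

With these assumed assignments, a plan that reads as "1 of 4 roles" collapses to "1 of 2 people".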
LLMs looking over your recovery plan