
Fedora, UUIDs, and user tracking

Posted Jan 18, 2019 9:56 UTC (Fri) by amarao (guest, #87073)
In reply to: Fedora, UUIDs, and user tracking by pabs
Parent article: Fedora, UUIDs, and user tracking

>> if a server or VM misbehaves, it's just shot in the head

>I never understood this, wouldn't you want to at least diagnose the problem and fix whatever caused the problem so that the problem goes away for future server/vm/container instances?

As an operator, I can give you the reason. The key reason is pride. We CAN shoot any node in the head and keep the system running. So everyone talks about 'cattle' not with the intention of causing a mass extinction in the VM population, but as a token of pride.

For the real cases it's mixed. As much as I love working with Linux, I know how badly it can start to behave under some conditions. One unresponsive block device can brick a server to the point where everything is screwed. Some processes get stuck in TASK_UNINTERRUPTIBLE, and that is the end of the game for a well-behaved system.
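As a rough illustration of how such stuck processes can be spotted: on Linux, the third field of /proc/<pid>/stat is the process state, and the letter D marks uninterruptible sleep. The sketch below only parses stat lines handed to it; the helper names (state_from_stat, uninterruptible_pids, scan_proc) are made up for this example.

```python
import glob


def state_from_stat(stat_line: str) -> str:
    """Return the one-letter state from a /proc/<pid>/stat line."""
    # The command name sits in parentheses and may itself contain spaces
    # or ')', so everything up to the LAST ')' is skipped before splitting.
    after_comm = stat_line[stat_line.rindex(")") + 1:]
    return after_comm.split()[0]


def uninterruptible_pids(stat_lines):
    """Return the PIDs whose state is 'D' (uninterruptible sleep)."""
    return [
        int(line.split()[0])
        for line in stat_lines
        if state_from_stat(line) == "D"
    ]


def scan_proc():
    """Read every /proc/<pid>/stat that is readable right now (Linux only)."""
    lines = []
    for path in glob.glob("/proc/[0-9]*/stat"):
        try:
            with open(path) as handle:
                lines.append(handle.read())
        except OSError:
            # The process exited between listing and reading; skip it.
            continue
    return lines
```

Calling uninterruptible_pids(scan_proc()) on a live machine gives a snapshot only. A process briefly in D state during normal IO is not a problem; it is the ones that stay there across repeated samples that hint at a hung device.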

A replaced drive gets a new letter instead of the old one because the old drive died leaving traces: the old name is still in use.

tmpfs is a huge breach in memory logic, as you have buffers which cannot be discarded, and monitoring no longer knows how to detect a 'low memory' condition.
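To make the tmpfs point concrete: in /proc/meminfo, tmpfs pages are reported under Shmem and are also included in Cached, so a naive "free + buffers + cached" estimate counts them as reclaimable. The sketch below parses meminfo-style text and compares the two estimates. The function names and the sample numbers are invented for illustration, and the adjusted figure is a crude approximation, not what the kernel's own MemAvailable computes.

```python
def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo-style text into {field: value}, values as printed (mostly kB)."""
    fields = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        key, rest = line.split(":", 1)
        parts = rest.split()
        if parts:
            fields[key.strip()] = int(parts[0])
    return fields


def naive_available_kb(info: dict) -> int:
    """Old-style estimate: treats all of Cached as discardable."""
    return info["MemFree"] + info["Buffers"] + info["Cached"]


def adjusted_available_kb(info: dict) -> int:
    """Same estimate minus Shmem, which cannot simply be dropped."""
    return naive_available_kb(info) - info.get("Shmem", 0)


# Made-up sample: a 16 GB box with about 8 GB sitting in tmpfs.
SAMPLE = """\
MemTotal:       16000000 kB
MemFree:         1000000 kB
Buffers:          200000 kB
Cached:          9000000 kB
Shmem:           8000000 kB
"""

if __name__ == "__main__":
    info = parse_meminfo(SAMPLE)
    print("naive:   ", naive_available_kb(info), "kB")
    print("adjusted:", adjusted_available_kb(info), "kB")
```

With the sample numbers, the naive figure suggests most of the machine is free to reclaim, while the adjusted one shows it is close to full, which is the kind of gap that misleads monitoring.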

etc, etc. Some bugs are so much 'the vendor's fault' that it's easier to reboot than to report them (example: https://github.com/amarao/lsi-sata-fuckup; the bug is now 7 years old, and this script still hangs IO on the whole enclosure).

Sometimes it's a well-known fault in an application, but it's easier to cover it with a load balancer than to debug it.

To deal with all this, there is an approach in which an individual server does not need to be more reliable than a hard drive. We build 'RAIDs' of buggy servers, and this is fine.
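The arithmetic behind a 'RAID' of unreliable servers can be sketched in a few lines. This assumes servers fail independently and that the service survives as long as at least one replica is up; both are simplifying assumptions, and the 5% figure is invented for the example.

```python
def service_availability(p_server_down: float, replicas: int) -> float:
    """Probability that at least one of `replicas` independent servers is up."""
    if not 0.0 <= p_server_down <= 1.0:
        raise ValueError("p_server_down must be a probability")
    if replicas < 1:
        raise ValueError("need at least one replica")
    # The service is down only when every replica is down at the same time.
    return 1.0 - p_server_down ** replicas


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(n, "replica(s):", service_availability(0.05, n))
```

Under those assumptions, three servers that are each down 5% of the time give a service that is down about 0.0125% of the time. Correlated failures (a shared enclosure, a bad deploy) break the independence assumption and make the real number worse.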

If some fault types occur often enough, it may be worth investigating them. Rare ones are just silently ignored (can't reproduce). If an operator has time, s/he can go to the server and do some debugging, but this is the operator's time, and there is too big a tech debt in the whole infrastructure to waste it on a rare issue of incompatibility of beep with colctr.



Fedora, UUIDs, and user tracking

Posted Jan 18, 2019 17:51 UTC (Fri) by jccleaver (guest, #127418)

> As an operator, I can give you the reason. The key reason is pride. We CAN shoot any node in the head and keep the system running. So everyone talks about 'cattle' not with the intention of causing a mass extinction in the VM population, but as a token of pride.

Sure, I used to do this too with redundant HA systems. But once you've physically pulled the power plug on one box once and your service has kept working, you don't need to *keep doing that* to prove that to anyone anymore.

That's supposed to provide operational support for your service, not be part of the design whenever there's the slightest hint of something unusual.

Fedora, UUIDs, and user tracking

Posted Jan 23, 2019 18:52 UTC (Wed) by edgewood (subscriber, #1123)

You don't need to keep doing the test *if nothing changes*. If something does change, then you don't *know* that failover will still work the way you think it will, and the only way to be sure is to test it again.

Fedora, UUIDs, and user tracking

Posted Jan 23, 2019 19:38 UTC (Wed) by jccleaver (guest, #127418)

> You don't need to keep doing the test *if nothing changes*. If something does change, then you don't *know* that failover will still work the way you think it will, and the only way to be sure is to test it again.

The problem is that this leads to epistemological issues that fly in the face of real-world reasoning. Instead of worshiping at the altar of A/B tests, one can focus on the things that matter operationally. Stateless cattle are a tool, one of many, but they are not the end-all and be-all of operations any more than constant Gentoo-like recompilation for performance improvements "just in case" is.

Process matters, and the Operations side of DevOps has formed a new, artificially restrictive religion around this design methodology.

