Fedora, UUIDs, and user tracking

Posted Jan 17, 2019 18:28 UTC (Thu) by jccleaver (guest, #127418)
In reply to: Fedora, UUIDs, and user tracking by johannbg
Parent article: Fedora, UUIDs, and user tracking

> At this point, Fedora needs to go back to it's roots of being genering,clean the core/baseOS as should have been done from the start, provide all the Red Hat product stack ( or it's open upstream equallent ) as rpm in the distribution so the remaining admins dont vanish into the cloud and the distribution put on a serious diet by shaving off roughly 7k components from the distribution so it becomes agile enough again to deal with the changes in the linux environment and then just hope for the best...

I really couldn't agree with this more.

"Fedora Workstation/Desktop" can do what it wants -- which seems to be the reference implementation for GNOME and that's it -- but "Fedora", in the context of being the successor to RHL, the owner of rawhide, and the de facto upstream for the entire Red Hat ecosystem, *MUST* change. In trying to be all things to all people, while focusing on Lennart's laptop, it's consistently targeting questionable cadences and questionable technical movements, frustrating the wider Red Hat ecosystem and community, all while naively claiming to be doing something for Linux on the Laptop.

"Fedora Project" as a project should factor out everything that applies to all RPM downstreams and focus on the core, incremental infrastructure necessary to keep that running. Enforce RPM standards that *can* *now* be applied and used by all distributions. Enhance copr, pagure, and so forth to provide resources for the entire RH community.

"Fedora Project" can and should QA and release reference releases for this core functionality, but those are indeed intended to be reference releases, of interest mostly to Linux enthusiasts, distribution developers, and theoreticians, without intent of these *necessarily* being market winners in and of themselves. "Fedora Workstation" can be a market winner. CentOS can be a market winner, a RH clone that doesn't use systemd as PID 1 can be a market winner, a tiny RPM-distro for embedded can be a market winner, but Fedora reference releases shouldn't.

Once steps like this start being taken, the RH community as a whole will start functioning better again.

Fedora, UUIDs, and user tracking

Posted Jan 17, 2019 22:49 UTC (Thu) by johannbg (guest, #65743) [Link] (11 responses)

"a RH clone that doesn't use systemd as PID 1 can be a market winner"

I would argue quite the opposite as in an distribution ( not just RH ) that is not using systemd cannot be a market winner.

Modern infrastructure,cloud, container vendors and upstreams are heavily using systemd features to deploy,test and run on it.

Fedora, UUIDs, and user tracking

Posted Jan 17, 2019 23:12 UTC (Thu) by jccleaver (guest, #127418) [Link] (10 responses)

> I would argue quite the opposite as in an distribution ( not just RH ) that is not using systemd cannot be a market winner.

There's still a lot of EL6 out there in deployment

Fedora, UUIDs, and user tracking

Posted Jan 18, 2019 0:32 UTC (Fri) by johannbg (guest, #65743) [Link] (9 responses)

Based on my experience maintaining rhel and centos for a decade or so in various infrastructure setups, staying on older rhel release was done due to complications and or lack of time upgrading, rather than ease of use or api stability requirements.

Just this evening I did 10 fresh fedora installs ( install + test + delete ) and two debians for the same purpose ( confirm if bug was present there as well ) all in ca half an hour which is something I would never have done if either os or both would still be on legacy sys v init, let alone in half an hour.

Fedora, UUIDs, and user tracking

Posted Jan 18, 2019 6:47 UTC (Fri) by jccleaver (guest, #127418) [Link] (8 responses)

> Based on my experience maintaining rhel and centos for a decade or so in various infrastructure setups, staying on older rhel release was done due to complications and or lack of time upgrading, rather than ease of use or api stability requirements.

Well, one man's "complication" is another man's API stability requirement, or general shared library conflict (ditto), or if-it's-not-broken-don't-fix-it decision. CentOS and RHEL 4 -> 5 was rough at times (especially if you were trying to use Xen), but EL5->EL6 was incredibly smooth. EL6->EL7, definitely not so much. Many I know who've been staying on EL6 were doing it explicitly and consciously, which didn't seem to be the case for EL5.

> Just this evening I did 10 fresh fedora installs ( install + test + delete ) and two debians for the same purpose ( confirm if bug was present there as well ) all in ca half an hour which is something I would never have done if either os or both would still be on legacy sys v init, let alone in half an hour.

If it's a "fresh fedora install" then you're either composing via new anaconda run, using kickstart, or cloning. I'm assuming by "install" you mean a kickstart, but having just done seven automated EL6 VM builds earlier today I'm hard pressed to think of why that would take so long or why you wouldn't be able to too. Kickstart should take a few minutes at most. Boot takes <1m. Do it in parallel depending on your provisioning system. Automating deployments is not difficult, and nothing about regular init blocks this.

Fedora, UUIDs, and user tracking

Posted Jan 18, 2019 8:22 UTC (Fri) by johannbg (guest, #65743) [Link] (7 responses)

The method you described was something I setup in the old days when servers where named after planets in the solar system,greek gods or something and administrators talked to them and routinely had hug them to keep them running. Expensive infrastructure model in todays datacenters.

Today if a server,vm misbehave it's just shoot in the head and one or more new instance is born to take it's place. systemd spawned containers only live for as long as they are needed and created with a git commit.

The time it takes to generate an image using official mirrors on this host with the method I used is 1m50.053s or so per host so roughly 15m went on building the images and another 15m in testing.

Would be faster probably if I bothered to run internal mirrors but then I would have to mirror 4 distro's ( Arch,Debian,Fedora,OpenSuse ) and different releases of those distro's but thats not high on my priority list thou that might change if fedora is about to track it's user base...

Fedora, UUIDs, and user tracking

Posted Jan 18, 2019 9:33 UTC (Fri) by pabs (subscriber, #43278) [Link] (5 responses)

> if a server,vm misbehave it's just shoot in the head

I never understood this, wouldn't you want to at least diagnose the problem and fix whatever caused the problem so that the problem goes away for future server/vm/container instances?

Fedora, UUIDs, and user tracking

Posted Jan 18, 2019 9:56 UTC (Fri) by amarao (guest, #87073) [Link] (3 responses)

>> if a server,vm misbehave it's just shoot in the head

>I never understood this, wouldn't you want to at least diagnose the problem and fix whatever caused the problem so that the problem goes away for future server/vm/container instances?

As operator I can give you the reason. The key reason is a pride. We CAN shoot any node to the head and keep system running. So everyone talking about 'cattle' not with intention to cause a massive extinction in VM population but as a token of pride.

For the real cases it's mixed. As much as I love to work with Linux, I know how badly it can start to behave in some conditions. One unresponsive block device may brick any server to the level when everything is screwed. Some processes are in TASK_UNINTERRUPTIBLE, and this is the end of the game for the well-behaving system.

Replaced drive got a new letter instead of the old one because old one died with traces - old name is still used.

tmpfs is huge breach in a memory logic, as you have buffers which can not be discarded and all monitoring do not know how to detect 'low memory' condition anymore.

etc, etc. Some of bugs are so 'vendor fault' that it's easier to reboot than to report bugs (example: https://github.com/amarao/lsi-sata-fuckup, bug is now 7 years old, and this script is still hangs IO on whole enclosure).

Sometimes it's a well-known fault in application, but it's easier to cover it with load balancer than to debug it.

To deal with all this, there is the approach when individual server does not need have more reliability then a hard drive. We do 'raids' of buggy servers and this is fine.

If some fault types are often enough it may worth to investigate it. Rare ones are just silently ignored (can't reproduce). If operator have time, s/he can go to server and do some debug, but this is operators time, and there is a bug tech. dept in whole infrastructure to waste it in rare issue of incompatibility of beep with colctr.

Fedora, UUIDs, and user tracking

Posted Jan 18, 2019 17:51 UTC (Fri) by jccleaver (guest, #127418) [Link] (2 responses)

> As operator I can give you the reason. The key reason is a pride. We CAN shoot any node to the head and keep system running. So everyone talking about 'cattle' not with intention to cause a massive extinction in VM population but as a token of pride.

Sure, I used to do this too with redundant HA systems. But once you've physically pulled the power plug on one box once and your service has kept working, you don't need to *keep doing that* to prove that to anyone anymore.

That's supposed to provide operational support for your service, not be part of the design whenever there's the slightest hint of something unusual.

Fedora, UUIDs, and user tracking

Posted Jan 23, 2019 18:52 UTC (Wed) by edgewood (subscriber, #1123) [Link] (1 responses)

You don't need to keep doing the test *if nothing changes*. If something does change, then you don't *know* that failover will still work the way you think it will, and the only way to be sure is to test it again.

Fedora, UUIDs, and user tracking

Posted Jan 23, 2019 19:38 UTC (Wed) by jccleaver (guest, #127418) [Link]

> You don't need to keep doing the test *if nothing changes*. If something does change, then you don't *know* that failover will still work the way you think it will, and the only way to be sure is to test it again.

The problem is that that leads to epistemological issues that fly in the face of real-world reasoning. Instead of worshiping at the altar of A/B tests, one can focus on the things that matter operationally. Stateless cattle are a tool, one of many, but they are not the end-all and be-all of operation any more than constant Gentoo-like recompilation for performance improvements "just in case" are.

Process matters, and the Operations side of DevOps have formed a new, artificially restrictive religion around this design methodology.

Fedora, UUIDs, and user tracking

Posted Jan 18, 2019 10:23 UTC (Fri) by johannbg (guest, #65743) [Link]

you ofcourse have centralized syslog server to diagnoze problems afterwards but you dont go down the rabbit hole of chasing every issue with every host,vm or container.

Pattern needs to have emerged before issue is looked at otherwize you could be waisting time, resources thus money chaising down anomaly, that corner case bug every one love...

Fedora, UUIDs, and user tracking

Posted Jan 18, 2019 17:47 UTC (Fri) by jccleaver (guest, #127418) [Link]

> The method you described was something I setup in the old days when servers where named after planets in the solar system,greek gods or something and administrators talked to them and routinely had hug them to keep them running. Expensive infrastructure model in todays datacenters.

Yes, I quite remember the "old days", but this entire model being pushed is a pendulum too far in the opposite direction. Very, very few entities have the actual need for cattle (c.f. https://xkcd.com/1737/ ) If you're Netflix and you have 500,000 hosts all decrypting H.264 or something and data locality concerns are handled elsewhere, then fine -- good for you. The rest don't have "cattle" like that, but they're not "pets" except for key systems... they're "fleets". And as any auto-mechanic knows, just because everything is the same model doesn't mean the workload has the same effect on a car.

> Today if a server,vm misbehave it's just shoot in the head and one or more new instance is born to take it's place. systemd spawned containers only live for as long as they are needed and created with a git commit.

That's not engineering, that's playing with tech.

> The time it takes to generate an image using official mirrors on this host with the method I used is 1m50.053s or so per host so roughly 15m went on building the images and another 15m in testing.
> Would be faster probably if I bothered to run internal mirrors but then I would have to mirror 4 distro's ( Arch,Debian,Fedora,OpenSuse ) and different releases of those distro's but thats not high on my priority list thou that might change if fedora is about to track it's user base...

Well, in that case you're just shooting yourself in the foot. Local mirrors for OS data is about as fundamental a thing as you'd want for provisioning, and in this day of TB of space everywhere there's no reason not to use one.

I haven't timed installs because I don't have split-second needs, but 2m sounds about right except for any local logic being doing in %post.