Yet another systemd fiasco

Posted Nov 18, 2014 10:43 UTC (Tue) by nim-nim (subscriber, #34454)
In reply to: Yet another systemd fiasco by Cyberax
Parent article: Russ Allbery leaves the Debian technical committee

> Journald is about the only piece of systemd that is non-optional (apart
> from the core). And it works well where the previous logging systems
> failed us.

That's a bit rich. When a service fails systemd still tells you "there is a problem somewhere, go comb the logs, I won't tell you what's broken". Journald has not fixed anything there. What is broken is not the log format, but that systemd changed the way services are launched, so error handling needs to be redone. But it's more rewarding to write a new log facility than trudge through existing services to make them systemd-friendly.

Moreover, last I saw, journald still could not cope with partial writes, so any time systemd gets stuck and you have to press reset, journald will helpfully bin all the logs. It's too much work to try to rescue the log files in their wonderful binary format. (You have to press reset because the same laptop crowd that is enthusiastic about systemd decided to remove the ACPI power binding that used to permit graceful reboots when the console was lost. Some laptops have badly placed power buttons, so the "solution" was to make power buttons useless for everyone.)

I don't even try to fix my systems anymore since systemd made them a Windows-like black box. On one of them, successful booting requires adding rescue to the kernel command line, logging in as admin, and exiting immediately (type exit, do nothing), because otherwise systemd will forget to launch gdm or any other console. And of course it would be too much to ask for it to display why it is stuck when it is stuck. (sysVinit managed it fine, as did /var/log/messages: the last lines before a reboot almost always matched the problem. With systemd? It could be anywhere!)

Or I could write about the way it fails to restore Ethernet links when you plug the wire back in (no wifi! no firmware! just respond to the link active event!). But you get a nice GNOME notification about it not working. (I preferred the old system: no notification, and it worked.)

The whole thing is tremendously brittle and only works in perfect conditions (and sometimes not even then). The emphasis on restarting stuff blindly instead of diagnosing is telling: systemd people prefer working on feature creep rather than trying to untangle the spaghetti boot they've inflicted on everyone.

Or should I write about how systemd ends up in a strange state every time it is updated? The solution? Reboot! The systemd masterminds don't want to deal with live updates, so they just declared that live updates are wrong and should not happen.

Anyway that's why systemd is such a sore. The people who write it seem to have decided they are so cool they can't be wrong, that they badly need to replace all the other software they don't agree with (actually, they don't agree with the other software maintainers, easier to replace them with their own partial implementations than work with existing projects), and in the meantime the actual problems systemd created are not worked on, they are handwaved away (just like the audio problems pulse triggered were declared someone else's problem and you had to dump all the affected hardware since it had been rendered useless).

Pity anyone who hits a systemd problem that does not fit in systemd's master plan.

Yet another systemd fiasco

Posted Nov 18, 2014 10:57 UTC (Tue) by niner (subscriber, #26151)

"so any time systemd gets stuck and you have to press reset journald will helpfully bin all the logs"

Wrong. Factually wrong. And here's why:

Lennart Poettering 2014-10-08 20:27:49 UTC
Since this bugzilla report is apparently sometimes linked these days as an example of how we wouldn't fix a major bug in systemd:

Journal files are mostly append-only files. We keep adding to the end as we go, only updating minimal indexes and bookkeeping in the front earlier parts of the files. These files are rotated (rotation = renamed and replaced by a new one) from time to time, based on certain conditions, such as time, file size, and also when we find the files to be corrupted. As soon as they rotate they are entirely read-only, never modified again. When you use a tool like "journalctl" to read the journal files both the active and the rotated files are implicitly merged, so that they appear as a single stream again.

Now, our strategy to rotate-on-corruption is the safest thing we can do, as we make sure that the internal corruption is frozen in time, and not attempted to be "fixed" by a tool, that might end up making things worse. After all, in the case the often-run writing code really fucks something up, then it is not necessarily a good idea to try to make it better by running a tool on it that tries to fix it up again, a tool that is necessarily a lot more complex, and also less tested.

Now, of course, having corrupted files isn't great, and we should make sure the files even when corrupted stay as accessible as possible. Hence: the code that reads the journal files is actually written in a way that tries to make the best of corrupted files, and tries to read as much of them as possible, with the subset of the file that is still valid. We do this implicitly on every access.

Hence: journalctl implicitly does on read what a theoretical journal file fsck tool would do, but without actually making this persistent. This logic also has a major benefit: as our reader gets better and learns to deal with more types of corruptions you immediately benefit of it, even for old files!

File systems such as ext4 have an fsck tool since they don't have the luxury to just rotate the fs away and fix the structure on read: they have to use the same file system for all future writes, and they thus need to try hard to make the existing data workable again.

I hope this explains the rationale here a bit more.

Yet another systemd fiasco

Posted Nov 18, 2014 11:51 UTC (Tue) by nim-nim (subscriber, #34454)

So it will only bin the part of the logs where the problem occurred. That's a great help!

Yet another systemd fiasco

Posted Nov 18, 2014 11:53 UTC (Tue) by niner (subscriber, #26151)

Exactly! Same as all syslog implementations that I know of.

Yet another systemd fiasco

Posted Nov 18, 2014 12:07 UTC (Tue) by tomegun (guest, #56697)

No, it will not bin any part of the logs. What it will do however, is possibly not be able to extract the last log entry which was (partially) written at the time you did a hard-reset. This is no different from any other log format, if you do a hard-reset whilst appending a string to a file you are not guaranteed that the whole string will be readable afterwards...

Yet another systemd fiasco

Posted Nov 18, 2014 19:13 UTC (Tue) by nim-nim (subscriber, #34454)

Actually, with text logging you *can* read logs till the problem occurrence (including the last partially written log entry, which is the most interesting one for diagnosis purposes). And you do not need any specialist tool for that.
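The text-log side of this exchange can be illustrated with a small sketch. It simulates an append that is cut off partway through the last entry and then reads the file back, returning the complete lines plus whatever fragment of the final line made it into the file. The function names, file name and sample messages are made up for illustration, and the sketch assumes the truncated bytes actually reached the disk, which a real hard reset does not guarantee.

```python
import os
import tempfile


def write_interrupted_log(path, complete_entries, last_entry, bytes_written):
    """Write complete entries, then only the first bytes of the last one.

    This stands in for a hard reset in the middle of an append
    (hypothetical helper, for illustration only).
    """
    with open(path, "wb") as f:
        for entry in complete_entries:
            f.write(entry.encode("utf-8") + b"\n")
        f.write((last_entry + "\n").encode("utf-8")[:bytes_written])


def read_text_log(path):
    """Return (complete_lines, partial_tail) for a line-oriented log file."""
    with open(path, "rb") as f:
        data = f.read()
    # Everything before the final newline is a complete entry; whatever
    # follows it is the partially written last entry (possibly empty).
    *lines, tail = data.split(b"\n")
    decoded = [line.decode("utf-8", "replace") for line in lines]
    return decoded, tail.decode("utf-8", "replace")


if __name__ == "__main__":
    log_path = os.path.join(tempfile.mkdtemp(), "messages")
    write_interrupted_log(
        log_path,
        ["boot ok", "eth0 up"],
        "gdm failed: cannot open display",
        bytes_written=10,
    )
    complete, partial = read_text_log(log_path)
    print("complete entries:", complete)
    print("partial last entry:", repr(partial))
```

Whether the fragment survives on real hardware depends on what the kernel had flushed before the reset, so this only shows what a reader can do with the bytes that are there.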

Yet another systemd fiasco

Posted Nov 18, 2014 19:37 UTC (Tue) by cebewee (guest, #94775)

And the claim of the systemd people is that you can still read the journald logs, except for the corrupted part (which hopefully(?) includes everything up to the corrupted entry).

To prevent further corruption, the corrupted file is then moved away and new entries are written to a fresh logfile. When reading the logs, journalctl does on-the-fly fsck, but without ever writing the salvaged parts back to disk.

Yet another systemd fiasco

Posted Nov 18, 2014 19:40 UTC (Tue) by dlang (guest, #313)

It's only moved away if the corruption is detected. If it's only detected at a later time when the file is read, there can be a large delay and therefore a lot of logs lost.

If systemd could detect that the message was corrupted as it was writing it, why would it write the corrupted data in the first place?

Yet another systemd fiasco

Posted Nov 18, 2014 20:11 UTC (Tue) by Cyberax (✭ supporter ✭, #52523)

You can fix journal files manually or, in the worst case, simply eyeball the binary. It's not like journald is some kind of overcomplicated database.

Yet another systemd fiasco

Posted Nov 22, 2014 4:34 UTC (Sat) by zuki (subscriber, #41808)

> it's only moved away if the corruption is detected. If it's only detected
> at a later time when the file is read, there can be a large delay and
> therefore a lot of logs lost.

The file is "moved away" (actually renamed) but it is still read. So journald will not write to the file, but the clients (e.g. journalctl) will still look at the renamed file and read as much of it as possible.

And "corruption" is detected when the file is opened for writing. Journald sets a flag in the header, and will unset it before closing the file. So "corruption" often simply means that the file was not closed properly, but there might be nothing wrong apart from the flag. But in this scenario it is safer to rename the file and create a new one. No content is lost either way.
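The flag-in-the-header scheme described above can be sketched with a toy file format. This is not journald's actual on-disk layout: the one-byte header, the `ToyJournal` class, the `.unclean` suffix and the file names are all hypothetical, chosen only to show the set-on-open, clear-on-close, rename-if-still-set logic.

```python
import os
import tempfile
import time

ONLINE = b"\x01"   # hypothetical marker: a writer has the file open
OFFLINE = b"\x00"  # hypothetical marker: the file was closed cleanly


class ToyJournal:
    """Toy append-only log with a one-byte 'open for writing' header flag."""

    def __init__(self, path):
        self.path = path
        self.rotated_to = None
        if os.path.exists(path):
            with open(path, "rb") as f:
                state = f.read(1)
            if state != OFFLINE:
                # The previous writer never cleared the flag: freeze the file
                # under a new name and never write to it again.
                self.rotated_to = f"{path}.{time.time_ns()}.unclean"
                os.rename(path, self.rotated_to)
        if not os.path.exists(path):
            with open(path, "wb") as f:
                f.write(OFFLINE)
        self._f = open(path, "r+b")
        self._set_state(ONLINE)

    def _set_state(self, state):
        self._f.seek(0)
        self._f.write(state)
        self._f.flush()

    def append(self, message):
        self._f.seek(0, os.SEEK_END)
        self._f.write(message.encode("utf-8") + b"\n")
        self._f.flush()

    def close(self):
        # A clean shutdown clears the flag before closing.
        self._set_state(OFFLINE)
        self._f.close()


def read_entries(path):
    """Read entries from an active or renamed file, skipping the header byte."""
    with open(path, "rb") as f:
        f.seek(1)
        return f.read().decode("utf-8", "replace").splitlines()


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "system.toyjournal")
    first = ToyJournal(path)
    first.append("service started")
    first._f.close()  # simulate a crash: the flag is never cleared
    second = ToyJournal(path)
    print("renamed to:", second.rotated_to)
    print("entries still readable:", read_entries(second.rotated_to))
    second.close()
```

In this toy version the renamed file keeps every entry that was written to it, which mirrors the claim that the rename itself discards nothing; it says nothing about entries that were damaged before the rename.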

I'll admit that this is not bullet-proof, and there were various bugs where corrupted files would throw journalctl off, but they are getting fixed. Such reports are definitely much less common than they used to be. Anyway, those are bugs in the implementation, nothing fundamental in the design.

Yet another systemd fiasco

Posted Nov 18, 2014 12:40 UTC (Tue) by smurf (subscriber, #17840)

It doesn't say "go comb the logs", it says "run journalctl", which has a lot of very helpful filtering options so that you can find the relevant parts more quickly than grepping through a random heap of syslog files with partly-duplicate (how many log files does a kernel error message end up in?) and partly-nonexistent (that heap corruption message your daemon wrote to stderr which under sys5init ended up in /dev/null or …/console) information.
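As a rough illustration of the filtering mentioned above, the sketch below builds a journalctl command line from a few common filters: `-u` (unit), `-b` (current boot), `-p` (priority), `--since` and `-n` (number of lines). The helper name `journalctl_argv` and the `gdm.service` example are assumptions made for illustration; the helper only assembles the argument list and does not run anything.

```python
import shlex


def journalctl_argv(unit=None, current_boot=False, priority=None,
                    since=None, lines=None):
    """Build a journalctl argument list from a few common filters.

    Hypothetical helper: it only assembles arguments, it does not
    execute journalctl.
    """
    argv = ["journalctl", "--no-pager"]
    if unit:
        argv += ["-u", unit]          # entries for one unit only
    if current_boot:
        argv.append("-b")             # entries from the current boot only
    if priority:
        argv += ["-p", priority]      # e.g. "err": this priority and worse
    if since:
        argv += ["--since", since]    # e.g. "today" or "2014-11-18 10:00"
    if lines is not None:
        argv += ["-n", str(lines)]    # only the most recent N entries
    return argv


if __name__ == "__main__":
    argv = journalctl_argv(unit="gdm.service", current_boot=True,
                           priority="err")
    # Print the command rather than running it, so the sketch works
    # on machines without systemd.
    print(shlex.join(argv))
```

On a systemd machine the resulting list could be passed to `subprocess.run`; here it is only printed.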

What's more, "systemctl status" shows you which processes of your service are still running, and the last couple of relevant lines from its log.

Oh yes, and it prevents the log from ever filling the whole disk if a process ever runs amok.

Which focus on restarting services are you talking about? Restart=no is the default.

And for the record, I haven't pressed Reset in ages. You might want to add "HandlePowerKey=reboot" to your logind.conf (and/or figure out where the actual bug is which prevents logind from obeying that button) instead of complaining about how good the old days were, which (a) they were not and (b) doesn't help anybody.

Yet another systemd fiasco

Posted Nov 19, 2014 19:40 UTC (Wed) by nim-nim (subscriber, #34454)

Thank you for the powerkey tip.

As for the rest, since systemd does not seem to extract meaningful entries in "systemctl status", "run journalctl" means exactly "go comb the logs" (that's the eternal problem of pretending that dumping things in a database without organisation will solve anything). The "run journalctl" message is hardly helpful, except for shifting the problem onto the user. Its author didn't even bother to indicate the "very helpful filtering options".

Yet another systemd fiasco

Posted Nov 20, 2014 8:37 UTC (Thu) by smurf (subscriber, #17840)

Of course it does, it filters the log by the cgroup the service is in. That's my definition of "meaningful entries", at least in the absence of an AI system that can _think_ of what else might be relevant (or not).

Yet another systemd fiasco

Posted Nov 22, 2014 4:39 UTC (Sat) by zuki (subscriber, #41808)

... and it also includes messages from init and other sources about that service. Privileged processes can tag messages to be shown in 'systemctl status' or 'journalctl -u' output for some service. Setroubleshoot makes use of this to label SELinux AVC messages this way.