The Journal - a proposed syslog replacement

Posted Nov 18, 2011 21:07 UTC (Fri) by oak (subscriber, #2786)
In reply to: The Journal - a proposed syslog replacement by jubal
Parent article: The Journal - a proposed syslog replacement

Having done that kind of analysis twice this fall for the same old computer (but for a different disk each time), I can only concur.

A tool that you could point at a disk device, and that would find all the syslog entries on that disk and order them correctly by timestamp, would be nice though. If they can provide that for "journald", then it might have some merit.
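A rough sketch of what such a tool could look like for classic text syslog follows. It is only an illustration under stated assumptions: the names `SYSLOG_RE` and `find_syslog_entries` are made up here, the line format is assumed to be the traditional "Mon DD HH:MM:SS host message" layout, and the year has to be supplied by the caller because that timestamp format does not carry one.

```python
import re
from datetime import datetime

MONTHS = {name: number for number, name in enumerate(
    "Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec".split(), start=1)}

# Assumed layout: "Nov 18 21:07:01 host program[pid]: message".
# Matching on bytes lets the scan skip over arbitrary binary garbage.
SYSLOG_RE = re.compile(
    rb"((?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)"
    rb" [ 0-3]\d \d{2}:\d{2}:\d{2}) "   # timestamp, day may be space-padded
    rb"[\x21-\x7e]+ "                    # host name (printable, no spaces)
    rb"[\x20-\x7e]+"                     # message (printable ASCII only)
)

def find_syslog_entries(raw: bytes, year: int) -> list[str]:
    """Return the syslog-looking lines found in raw, oldest first."""
    entries = set()  # a set drops duplicate copies of the same line
    for match in SYSLOG_RE.finditer(raw):
        month, day, clock = match.group(1).decode("ascii").split()
        hour, minute, second = (int(part) for part in clock.split(":"))
        try:
            when = datetime(year, MONTHS[month], int(day), hour, minute, second)
        except ValueError:
            continue  # looked like a timestamp but is not a valid date
        entries.add((when, match.group(0).decode("ascii")))
    return [line for _, line in sorted(entries)]
```

To try it on a disk image you would read the image as bytes and pass it in. Reading a whole device into memory is not realistic, so a real tool would have to scan in overlapping chunks, which this sketch leaves out.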

PS. You forgot from your list the fact that most binary formats of this kind take more space than text formats. When some program suddenly starts logging data[1] to your syslog so much and so fast that your disk fills, it's nice if the overhead of the syslog format itself is minimal; it gives you more time to act.

[1] Syslog only notices repeats of the previous message; when something constantly logs two different lines (from the same process, or from different but related ones), that feature doesn't help.
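The limitation in [1] can be shown with a simplified model of repeat suppression. This is a sketch only, not the code of any real syslog daemon (real ones also involve timers); the function name `suppress_repeats` is hypothetical.

```python
def suppress_repeats(lines):
    """Collapse runs of identical consecutive lines, syslog style."""
    out = []
    previous = None
    repeats = 0
    for line in lines:
        if line == previous:
            repeats += 1          # same as the last line: just count it
            continue
        if repeats:
            out.append(f"last message repeated {repeats} times")
        out.append(line)
        previous = line
        repeats = 0
    if repeats:                   # flush a run that ends the input
        out.append(f"last message repeated {repeats} times")
    return out
```

Five identical lines shrink to two lines of output, but two lines that alternate are never equal to the one before them, so every one of them is written out in full.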


The Journal - a proposed syslog replacement

Posted Nov 18, 2011 22:11 UTC (Fri) by raven667 (subscriber, #5198) [Link]

There is one area where this is not always true: storing pcap files of dropped packets will often take up less space than the iptables text logging. The raw pcaps also have more info than what is logged.

The Journal - a proposed syslog replacement

Posted Nov 18, 2011 23:14 UTC (Fri) by b0ti (guest, #81465) [Link]

>PS. You forgot from your list the fact that most binary formats of this kind take more space than text formats.
You might want to read the docs again:
"The fields, an entry consists off, are stored as individual objects in the journal file, which are then referenced by all entries, which need them. This saves substantial disk space since journal entries are usually highly repetitive (think: every local message will include the same _HOSTNAME= and _MACHINE_ID= field). Data fields are compressed in order to save disk space. The net effect is that even though substantially more meta data is logged by the journal than by classic syslog the disk footprint does not immediately reflect that."

The Journal - a proposed syslog replacement

Posted Nov 18, 2011 23:53 UTC (Fri) by jubal (subscriber, #67202) [Link]

That's grand. Now could you please tell me how you would recover that information in case of disk failure?

The Journal - a proposed syslog replacement

Posted Nov 19, 2011 1:40 UTC (Sat) by HelloWorld (guest, #56129) [Link]

> That's grand. Now could you please tell me how would you recover that information in case of disk failure?
Using the "cp" command and the backup, obviously.

The Journal - a proposed syslog replacement

Posted Nov 19, 2011 2:02 UTC (Sat) by jubal (subscriber, #67202) [Link]

I think that my sarcasm detector failed here. Surely you're not seriously telling me that – in the hypothetical case of a disk failure – the way to recover currently written log entries is to combine recovered, compressed and possibly damaged binary blobs with random compressed binary blobs? See, no matter if it was done intentionally or not, the redundancy of the syslog entries is a feature, not a bug.

The Journal - a proposed syslog replacement

Posted Nov 19, 2011 2:11 UTC (Sat) by HelloWorld (guest, #56129) [Link]

You can't trust data from a potentially damaged disk anyway. The only sane thing to do is make sure that your backup infrastructure is in place, so you don't need the data from the damaged disk. Or, if you can't afford to lose the data between two backups, use a RAID array.

The Journal - a proposed syslog replacement

Posted Nov 19, 2011 2:24 UTC (Sat) by jubal (subscriber, #67202) [Link]

You do realise that you might be looking for the data written directly before the crash? (Also: no, RAID is not a silver bullet, and please don't tell me that a SAN or NAS storage appliance will always prevent data loss – it should, but sometimes it can't). And a remote log daemon won't help much if it's the remote log daemon's storage that just evaporated.

The Journal - a proposed syslog replacement

Posted Nov 19, 2011 3:16 UTC (Sat) by cwillu (subscriber, #67268) [Link]

"Trusted" is not a binary distinction.

Data off a broken disk isn't as reliable as data from a functional disk. However, data off a broken disk isn't completely unreliable, and data from a functional disk isn't completely reliable.

What matters is whether the data is useful, and in my experience, that data sitting on a broken disk is frequently useful.

The Journal - a proposed syslog replacement

Posted Nov 21, 2011 14:22 UTC (Mon) by nix (subscriber, #2304) [Link]

Quite. I have repeatedly in the past been able to get a lot of useful info from logs damaged by disk or fs damage. Sure, a few blocks were full of \0s or just plain garbage --- but the rest was readable. Who cares if the file as a whole no longer conforms to any formal grammar? Human beings don't need one!

But computers do. A binary->text tool would probably have given up in the face of such damage. At best it would go down a rarely-used hence buggy parser-error-recovery code path.

This depends on the design, actually

Posted Nov 21, 2011 18:27 UTC (Mon) by khim (subscriber, #9252) [Link]

> But computers do. A binary->text tool would probably have given up in the face of such damage. At best it would go down a rarely-used hence buggy parser-error-recovery code path.

Computers do what they are programmed to do. If you write a program which tolerates corrupt files, then it'll do just that. It's not as hard to do as you think: you don't need a cryptographic hash for that, CRC32 is enough and it's extremely fast nowadays. But sure, you must plan for that in advance.

This is what Google does with its petabytes of logs - works fine from what I've heard.
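To illustrate the CRC32 point, here is a minimal sketch of a record format that a reader can resynchronise on after damage. It is not the journal's on-disk format and not anything Google uses; the magic marker, the header layout and the names `encode_record` and `recover_records` are all assumptions made up for this example.

```python
import struct
import zlib

MAGIC = b"\xfeLOG"                # hypothetical marker at the start of each record
HEADER = struct.Struct("<4sII")   # magic, payload length, CRC32 of the payload

def encode_record(payload: bytes) -> bytes:
    """Frame one log record as header + payload."""
    return HEADER.pack(MAGIC, len(payload), zlib.crc32(payload)) + payload

def recover_records(blob: bytes) -> list[bytes]:
    """Return every record in blob whose CRC32 still checks out."""
    records = []
    pos = 0
    while True:
        pos = blob.find(MAGIC, pos)
        if pos < 0 or pos + HEADER.size > len(blob):
            break                 # no further complete header in the data
        _, length, crc = HEADER.unpack_from(blob, pos)
        start = pos + HEADER.size
        payload = blob[start:start + length]
        if len(payload) == length and zlib.crc32(payload) == crc:
            records.append(payload)
            pos = start + length  # intact record: jump past it
        else:
            pos += 1              # damaged or false marker: keep scanning
    return records
```

With three records written back to back and the middle one overwritten with zeros or garbage, the reader still returns the first and the third. The damaged one fails its checksum and is skipped, which is the "plan for that in advance" part.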

Copyright © 2013, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds