
The Journal - a proposed syslog replacement


Posted Nov 18, 2011 18:38 UTC (Fri) by gmaxwell (subscriber, #30048)
In reply to: The Journal - a proposed syslog replacement by jimreynold2nd
Parent article: The Journal - a proposed syslog replacement

You've got it backwards.

It's the most recent entry that confirms the ones before it. Taping something to the monitor won't help, because I can just author a plausible history of what comes next.

Instead, this makes every new entry confirm the history (as in Bitcoin). But in the case of journald, nothing prevents you from rewriting the history after the most recent snapshot of it, and nothing proves that a particular snapshot is the good one, short of sending it off to a secure location or using an external secure timestamping service.
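A minimal sketch of the point, assuming a simple SHA-256 chain (the entry strings and the `GENESIS` starting value are made up for illustration and are not journald's actual scheme): each entry's hash covers everything before it, yet anyone who can rewrite the file can recompute the whole chain, and only a head hash kept somewhere else exposes the change.

```python
import hashlib

GENESIS = b"\x00" * 32  # hypothetical starting value for the chain


def link(prev, entry):
    """Hash of this entry chained to the hash of everything before it."""
    return hashlib.sha256(prev + entry.encode("utf-8")).digest()


def head_of(entries):
    """Walk the chain and return the most recent (head) hash."""
    h = GENESIS
    for e in entries:
        h = link(h, e)
    return h


original = ["sshd: login root from 10.0.0.5", "sudo: root ran rm -rf /var/www"]
snapshot = head_of(original)  # copy this off the box to a secure location

# An attacker with write access rewrites history and simply recomputes the chain.
forged = ["sshd: login alice from 10.0.0.5", "cron: daily job finished"]
forged_head = head_of(forged)

print(forged_head == head_of(forged))  # True: a purely local check still passes
print(forged_head == snapshot)         # False: only the off-box snapshot catches it
```

The chain alone only proves internal consistency; the security comes entirely from where the snapshot is stored.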

And of course the attacker can still delete the logs unless you send all of them off to a secure location. And if you're doing that, you really don't need any of this.

I fully agree with your point on undocumented binary formats. That's about as anti-forensic as you can get. It's not all bad, though: Varnish, for example, uses binary logs but provides a cat tool that converts them into a normal text stream, so your ability to grep them is not diminished. Handled well, a binary format need only be a problem for archival, not for day-to-day operations.
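A sketch of what such a cat tool does, using a made-up record layout (this is not Varnish's or journald's real format): it walks fixed headers and length-prefixed messages and prints one plain text line per record, which is what keeps grep usable.

```python
import struct

# Hypothetical record layout, for illustration only:
# 8-byte big-endian timestamp, 2-byte message length, then UTF-8 message bytes.
HEADER = struct.Struct(">QH")


def encode(records):
    """Pack (timestamp, message) pairs into one binary blob."""
    out = bytearray()
    for ts, msg in records:
        data = msg.encode("utf-8")
        out += HEADER.pack(ts, len(data)) + data
    return bytes(out)


def log_cat(blob):
    """Yield one grep-able text line per binary record."""
    offset = 0
    while offset < len(blob):
        ts, length = HEADER.unpack_from(blob, offset)
        offset += HEADER.size
        if offset + length > len(blob):
            raise ValueError("truncated record")
        yield f"{ts} {blob[offset:offset + length].decode('utf-8')}"
        offset += length


blob = encode([(1321641480, "GET /index.html 200"), (1321641481, "GET /missing 404")])
for line in log_cat(blob):
    print(line)
```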



The Journal - a proposed syslog replacement

Posted Nov 18, 2011 18:40 UTC (Fri) by jubal (subscriber, #67202)

…then there's the question of recovering data from partially damaged files. This will *obviously* break the signature chain.
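To make the damage case concrete, here is a sketch assuming each record stores its own chained hash (a hypothetical layout, not journald's): a single flipped bit makes verification fail at that record. How a real reader should treat the entries after the break is a design decision this thread doesn't settle.

```python
import hashlib

GENESIS = b"\x00" * 32  # hypothetical chain start


def build(entries):
    """Return (entry, stored_hash) records, each hash chained to the previous one."""
    records, h = [], GENESIS
    for e in entries:
        h = hashlib.sha256(h + e).digest()
        records.append((e, h))
    return records


def first_broken_link(records):
    """Index of the first record whose stored hash doesn't match, or None."""
    prev = GENESIS
    for i, (entry, stored) in enumerate(records):
        if hashlib.sha256(prev + entry).digest() != stored:
            return i
        prev = stored
    return None


records = build([b"boot", b"disk full", b"shutdown"])
print(first_broken_link(records))  # None

# Simulate on-disk damage: one flipped bit in the middle entry.
damaged = list(records)
entry, stored = damaged[1]
damaged[1] = (bytes([entry[0] ^ 0x01]) + entry[1:], stored)
print(first_broken_link(damaged))  # 1
```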

The Journal - a proposed syslog replacement

Posted Nov 19, 2011 7:05 UTC (Sat) by alankila (subscriber, #47141)

On the contrary, I think a significant amount of thought has gone into forensic issues in particular. According to the blog post, today any tool can apparently fake any PID for syslog, because syslog makes no effort to validate the PID the client supplies. There's apparently a Linux-specific way to find out the true PID of the process connecting to the syslog facility, and systemd is using it.
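One Linux mechanism of this kind is the `SO_PEERCRED` socket option, which makes the kernel report the PID, UID and GID of the process on the other end of a connected Unix socket. For datagram sockets such as `/dev/log`, the analogous mechanism is `SCM_CREDENTIALS` ancillary data. The comment doesn't say which one systemd uses; this sketch only shows the `SO_PEERCRED` variant on a socket pair, and runs on Linux only.

```python
import os
import socket
import struct

# struct ucred { pid_t pid; uid_t uid; gid_t gid; }: one signed, two unsigned 32-bit ints
UCRED = "iII"


def peer_credentials(sock):
    """Ask the kernel (Linux SO_PEERCRED) who is on the other end of a Unix socket.

    The values come from the kernel, not from the client, so they can't be faked
    the way a PID written into a syslog message can.
    """
    raw = sock.getsockopt(socket.SOL_SOCKET, socket.SO_PEERCRED, struct.calcsize(UCRED))
    return struct.unpack(UCRED, raw)


# A connected pair of Unix stream sockets; both ends belong to this process.
server_end, client_end = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
with server_end, client_end:
    pid, uid, gid = peer_credentials(server_end)
    print(pid == os.getpid(), uid == os.geteuid(), gid == os.getegid())
```

The kernel records the effective UID and GID, hence the comparison with `geteuid()` and `getegid()`.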

Undocumented binary data doesn't mean it's somehow fundamentally unreadable. You just compile the library and use it to read the crap. And it's open source. Sheesh.

The Journal - a proposed syslog replacement

Posted Nov 19, 2011 7:27 UTC (Sat) by gmaxwell (subscriber, #30048)

It makes me sad that you appear not to have read my message completely.

I explicitly point out that you can use tools to read the logs, and that this works pretty well, e.g. for Varnish.

But your life will be very painful if you are trying to piece together data from hundreds of machines, and backups across long spans of time, with different and incompatible versions of the file format.

If the developers are not very careful about versioning, you may find yourself unable to read data from backups, or worse, getting silently corrupted or truncated results. This is a risk that is heightened by using binary logs. It's orthogonal to the PID smarts, which seem like a great idea even without the proposed replace-everything approach.
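One defensive habit against that failure mode, sketched with a made-up header (the `LOGF` magic and 16-bit version field are assumptions, not journald's layout): put an explicit version in every file, and have readers refuse versions they don't know, so an old backup fails loudly rather than being silently misparsed.

```python
import struct

MAGIC = b"LOGF"                      # hypothetical file signature
SUPPORTED_VERSIONS = {1, 2}          # versions this reader actually understands
FILE_HEADER = struct.Struct(">4sH")  # magic bytes, then a 16-bit format version


class UnsupportedFormat(Exception):
    """Raised instead of guessing at a layout the reader doesn't know."""


def check_header(blob):
    """Return the file's format version, or refuse loudly."""
    if len(blob) < FILE_HEADER.size:
        raise UnsupportedFormat("file too short to contain a header")
    magic, version = FILE_HEADER.unpack_from(blob)
    if magic != MAGIC:
        raise UnsupportedFormat(f"not a log file (magic {magic!r})")
    if version not in SUPPORTED_VERSIONS:
        raise UnsupportedFormat(f"format version {version} is not supported by this reader")
    return version


print(check_header(FILE_HEADER.pack(MAGIC, 2)))
try:
    check_header(FILE_HEADER.pack(MAGIC, 7))
except UnsupportedFormat as err:
    print("refused:", err)
```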

The Journal - a proposed syslog replacement

Posted Nov 19, 2011 9:38 UTC (Sat) by alankila (subscriber, #47141)

Well, it is a well-understood worry at least.

Log files have a long life, potentially on the order of decades, and that sets the level of backwards compatibility required. It is huge. Whatever the merits of not documenting the format, it will become set in stone anyway, unless log conversion tools are provided that can perform the conversion and afterwards validate that every bit of information in the old version was preserved and correctly converted (which might amount to checking the hash values of the log entries).
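A sketch of such a validating converter, with two made-up formats (a pipe-separated text "v1" and a JSON "v2") standing in for real journal versions: it converts, re-reads its own output, and compares a per-entry hash of the decoded fields, refusing to return anything if a single entry differs.

```python
import hashlib
import json


# Two hypothetical on-disk formats for the same entries (illustration only).
def read_v1(blob):
    """v1: one 'timestamp|message' line per entry."""
    return [(int(ts), msg) for ts, msg in (line.split("|", 1) for line in blob.splitlines())]


def write_v2(entries):
    """v2: a JSON array of objects."""
    return json.dumps([{"ts": ts, "msg": msg} for ts, msg in entries])


def read_v2(blob):
    return [(obj["ts"], obj["msg"]) for obj in json.loads(blob)]


def entry_digests(entries):
    """Format-independent fingerprint of each decoded entry."""
    return [hashlib.sha256(f"{ts}\x00{msg}".encode("utf-8")).hexdigest() for ts, msg in entries]


def convert_v1_to_v2(old):
    before = read_v1(old)
    new = write_v2(before)
    # Re-read the output and check that every entry survived unchanged.
    if entry_digests(read_v2(new)) != entry_digests(before):
        raise ValueError("conversion lost or altered information")
    return new


old_log = "1321641480|disk full\n1321641481|user=bob said a|b"
print(read_v2(convert_v1_to_v2(old_log)))
```

Chaining several such converters, each with its own check, is one way to read the "chain of provably non-lossy converters" idea below.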

Nevertheless, even if archived logs become unreadable, old versions of this software do not just vanish into the ether but remain runnable, in the worst case through emulation of the x86 instruction set and old Linux kernel versions. So some solution will always exist.

Regardless, I'd say that the reasonable requirement is that every generated journald log file must remain readable forever, or a chain of provably non-lossy converters must be provided that can upgrade from the earliest version.

The Journal - a proposed syslog replacement

Posted Nov 19, 2011 16:59 UTC (Sat) by backslash (subscriber, #32022)

"Nevertheless, even if archived logs become unreadable, old versions of this software do not just vanish into the ether but remain runnable, in the worst case through emulation of the x86 instruction set and old Linux kernel versions. So some solution will always exist."

This is all open source, not binary-only Apple or Windows software... Just recompile!!

The Journal - a proposed syslog replacement

Posted Nov 19, 2011 18:28 UTC (Sat) by alankila (subscriber, #47141)

Obviously you have not tried to recompile old software. There tends to be a significant porting effort, because changes in build systems (autotools, I hate you) and in compilers' code-purity requirements may cause code to no longer compile, or to segfault despite compiling. Additionally, any library dependencies make things that much worse, because not only must the software itself compile, but the old versions of the libraries must compile too.

Emulation at the binary level, through techniques such as virtualization, may therefore be far easier to achieve.

The Journal - a proposed syslog replacement

Posted Nov 21, 2011 14:28 UTC (Mon) by nix (subscriber, #2304)

"It's the most recent entry that confirms the ones before it."
That's pretty much useless. Given that POSIX doesn't provide an API for inserting text in the middle of files, someone buggering the logs has to read() and re-write() all the data from the buggered point onwards (and is more likely to just copy-and-rewrite the whole file, for simplicity: it's not like the log buggerer is likely to care much about performance). At best you'll get a read() of the end of the log followed by a truncate() and re-write().
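The read/truncate/re-write sequence described above looks roughly like this (a sketch; `rewrite_from` is a hypothetical helper, not part of any logging tool). Every byte after the tampered point is written out again anyway, so recomputing any in-file hashes along the way adds little work for the attacker.

```python
import os
import tempfile


def rewrite_from(path, offset, new_text, old_len):
    """Replace old_len bytes at offset with new_text.

    POSIX has no 'insert into the middle of a file' call, so everything after
    the edited region is read, the file is truncated, and the tail is re-written.
    """
    with open(path, "r+b") as f:
        f.seek(offset + old_len)
        tail = f.read()           # everything after the edited region
        f.seek(offset)
        f.truncate()              # drop the old region and the tail
        f.write(new_text + tail)  # write the forged region, then the tail again


with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"login root\nrm -rf /var/www\nlogout root\n")
    path = tmp.name

rewrite_from(path, len(b"login root\n"), b"ls /var/www\n", len(b"rm -rf /var/www\n"))
with open(path, "rb") as f:
    result = f.read()
os.remove(path)
print(result)
```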

But if you do that, you're rewriting the end of the log anyway, so you can update all the hashes at the same time. The only way this will ever be secure is if the hashes are stored separately from the logs, streamed immediately over the network and stored on a non-connected box running a daemon which can answer the question 'what is the hash of message N' and 'what is the hash of the message immediately preceding message N'.
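A toy version of that witness daemon, assuming in-process calls stand in for the network link and the separate box (the `HashWitness` class and its method names are hypothetical): it only ever appends, and it answers the two questions above.

```python
import hashlib


class HashWitness:
    """In-memory stand-in for the off-box hash daemon (hypothetical).

    Message hashes are streamed to it as they are produced; it only appends.
    """

    def __init__(self):
        self._hashes = []

    def record(self, message_hash):
        self._hashes.append(message_hash)
        return len(self._hashes) - 1  # message number N

    def hash_of(self, n):
        """'What is the hash of message N?'"""
        return self._hashes[n]

    def hash_before(self, n):
        """'What is the hash of the message immediately preceding message N?'"""
        return self._hashes[n - 1] if n > 0 else None


def digest(message):
    return hashlib.sha256(message.encode("utf-8")).digest()


witness = HashWitness()
log = ["sshd: accepted root", "sudo: rm -rf /var/www", "sshd: closed root"]
for line in log:
    witness.record(digest(line))  # in reality: sent over the network immediately

log[1] = "sudo: ls /var/www"  # later tampering on the logging host
tampered = [n for n, line in enumerate(log) if digest(line) != witness.hash_of(n)]
print(tampered)  # [1]
```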

But there is no sign of such a scheme in journald: its design appears to militate against it much more than a straight-text logfile does, since you can rely on offsets in the latter remaining unchanged (so that an external file can point into them).
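To illustrate the offset argument, here is a sketch of an external index for a plain-text log: a list of (byte offset, SHA-256) pairs that could live on another machine. Because appending to a text log never moves earlier bytes, the stored offsets keep pointing at the same lines. The index format and helper names are hypothetical.

```python
import hashlib


def index_lines(text):
    """Build an external index: (byte offset, SHA-256) for every line of a text log."""
    index, offset = [], 0
    for line in text.splitlines(keepends=True):
        index.append((offset, hashlib.sha256(line).hexdigest()))
        offset += len(line)
    return index


def check(text, index):
    """Return the offsets whose line no longer matches the stored hash."""
    bad = []
    for offset, expected in index:
        end = text.find(b"\n", offset)
        line = text[offset:] if end == -1 else text[offset:end + 1]
        if hashlib.sha256(line).hexdigest() != expected:
            bad.append(offset)
    return bad


log = b"boot ok\nlogin root\nlogout root\n"
idx = index_lines(log)  # kept on another machine
print(check(log, idx))  # []
print(check(log.replace(b"login root", b"login rOOt"), idx))  # [8]
```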

The Journal - a proposed syslog replacement

Posted Nov 21, 2011 15:47 UTC (Mon) by johill (subscriber, #25196)

This is an important observation. I thought about this too (but never posted), especially with respect to the comparison they make to git. The thing is this, though: in git, the HEAD is essentially recorded in many places around the world, so rewriting the tree will be detected by everybody. In a journal, such a forward-running checksum scheme is completely useless, as you point out, since nobody has a copy of the HEAD sha1sum.

Looks like either we're not being told the full picture or somebody got confused about why exactly this is useful in git.

(To make it secure, though, you don't need to store *all* hashes elsewhere; you just need to send the most current HEAD hash off to secure storage. That's still the same problem, though.)
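A sketch of why a single stored head is enough, using the same kind of hypothetical SHA-256 chain (entry strings and the `(count, head)` checkpoint format are assumptions): the head recorded at a checkpoint authenticates every entry up to that point, while anything logged afterwards stays unprotected until the next head is sent off.

```python
import hashlib

GENESIS = b"\x00" * 32  # hypothetical chain start


def head_of(entries):
    """Return the head hash of a SHA-256 chain over the entries."""
    h = GENESIS
    for e in entries:
        h = hashlib.sha256(h + e.encode("utf-8")).digest()
    return h


def verify_prefix(entries, checkpoint):
    """Check the first `count` entries against a head hash stored elsewhere."""
    count, head = checkpoint
    return head_of(entries[:count]) == head


log = ["a", "b", "c"]
checkpoint = (len(log), head_of(log))  # only this pair goes to secure storage
log += ["d", "e"]                      # written after the checkpoint

# Rewriting anything up to the checkpoint is detected...
print(verify_prefix(["a", "X", "c", "d", "e"], checkpoint))  # False
# ...but entries written after it can be changed freely.
print(verify_prefix(["a", "b", "c", "X", "Y"], checkpoint))  # True
```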

The Journal - a proposed syslog replacement

Posted Nov 21, 2011 15:50 UTC (Mon) by johill (subscriber, #25196)

I note that they do say this in their document, though, albeit in a somewhat veiled way (and the comparison to git was only made at KS, I guess): "If the top-most hash is regularly saved to a secure write-only location, the full chain is authenticated by it."

It doesn't seem likely that anyone will ever have as easy a way to do that as with git.

The Journal - a proposed syslog replacement

Posted Nov 22, 2011 4:26 UTC (Tue) by Cyberax (✭ supporter ✭, #52523)

Actually, you're on the right track!

Make a central PUBLIC server that simply accepts and stores triples of the form <host_id, timestamp, hash> (host_id is a UUID).

That's it. You can use this public server to periodically send off your hashes. You lose (almost) no privacy, since the log messages themselves need not be replicated.
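A minimal in-memory sketch of that registry (the `PublicHashRegistry` class is hypothetical, and having the server stamp the time itself, rather than trusting the host, is an assumption not stated in the comment). Only a random UUID, a timestamp and a hash ever leave the host.

```python
import hashlib
import time
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class Submission:
    host_id: uuid.UUID
    timestamp: float
    hash_hex: str


class PublicHashRegistry:
    """Hypothetical append-only public store of <host_id, timestamp, hash> triples."""

    def __init__(self):
        self._entries = []

    def submit(self, host_id, head_hash):
        # Assumption: the server stamps the time, so hosts can't back-date submissions.
        entry = Submission(host_id, time.time(), head_hash.hex())
        self._entries.append(entry)
        return entry

    def history(self, host_id):
        return [e for e in self._entries if e.host_id == host_id]


registry = PublicHashRegistry()
host = uuid.uuid4()  # random, so it doesn't reveal the hostname
head = hashlib.sha256(b"...journal contents...").digest()
registry.submit(host, head)

# Later, an auditor compares the host's current head with what was published.
published = registry.history(host)[-1]
print(published.hash_hex == head.hex())  # True
```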

Copyright © 2013, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds