
That newfangled Journal thing

Posted Nov 20, 2011 19:55 UTC (Sun) by mordae (guest, #54701)
Parent article: That newfangled Journal thing

I really like the Journal idea, but where it lost me is the absence of logging over the network. You see, we have a bunch of machines with a small rootfs on a flash drive and a tmpfs /var. rsyslog sends all messages to a central machine using RELP.
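
(For concreteness, the forwarding side of a setup like this is only a couple of lines of rsyslog configuration; the host name and port below are made up:)

    # load the RELP output module and forward everything
    # to the central log host
    $ModLoad omrelp
    *.* :omrelp:loghost.example.net:2514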

Now, if I implement the Journal, I can either keep small logs in /var and rsync them out frequently (bad: if the machine dies, I don't have the latest logs), or keep the current arrangement and gather the logs into a Journal on the central server. But then I won't have correct metadata, so why even bother?

So... logging to a network file system? Seriously?



That newfangled Journal thing

Posted Nov 20, 2011 20:23 UTC (Sun) by intgr (subscriber, #39733)

FTFA: "In the initial version journald’s network support will be very simple"

"In a later version we plan to extend the journal minimally to support live remote logging, in both PUSH and PULL modes always using a local journal as buffer for a store-and-forward logic"

The sky isn't falling. In fact, such a store-and-forward model sounds like a much more robust method than syslog over TCP or UDP.

That newfangled Journal thing

Posted Nov 20, 2011 21:13 UTC (Sun) by dlang (guest, #313)

If you use rsyslog (the default on most Linux distros), you already have store-and-forward delivery available to you, complete with delivery confirmation, encryption, etc.
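
A sketch of such a store-and-forward action in rsyslog's legacy configuration syntax (host, file names, and limits invented; exact directives vary by version):

    $ModLoad omrelp
    # disk-assisted queue: spool messages to disk while the
    # central server is unreachable, deliver when it returns
    $ActionQueueType LinkedList
    $ActionQueueFileName fwd_spool
    $ActionQueueMaxDiskSpace 100m
    # keep undelivered messages across restarts, retry forever
    $ActionQueueSaveOnShutdown on
    $ActionResumeRetryCount -1
    *.* :omrelp:central.example.com:2514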

Lennart could get almost everything he is looking for out of rsyslog today.

The hashes would be trivial to add (and for that matter, you could use a database store and have it do the normalization and hashing as part of the insert today)
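
As a sketch of that database variant, assuming a hypothetical PostgreSQL schema (pgcrypto provides digest()), a trigger can chain each inserted row to the previous one with a SHA-256 hash:

    CREATE EXTENSION IF NOT EXISTS pgcrypto;

    CREATE TABLE logs (
        id       bigserial PRIMARY KEY,
        received timestamptz NOT NULL DEFAULT now(),
        host     text,
        message  text,
        hash     bytea
    );

    CREATE FUNCTION logs_chain() RETURNS trigger AS $$
    DECLARE
        prev bytea;
    BEGIN
        -- simplification: assumes inserts are serialized; concurrent
        -- writers would need locking to keep the chain linear
        SELECT hash INTO prev FROM logs ORDER BY id DESC LIMIT 1;
        NEW.hash := digest(coalesce(encode(prev, 'hex'), '')
                           || coalesce(NEW.host, '')
                           || coalesce(NEW.message, ''), 'sha256');
        RETURN NEW;
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER logs_chain_trg BEFORE INSERT ON logs
        FOR EACH ROW EXECUTE PROCEDURE logs_chain();

Altering any stored row then breaks every hash after it, which is the same property the journal's hash chain is meant to provide.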

The ability to easily combine or split log files is a very important feature; changing to a binary format that compresses early, but prevents the use of simple tools like split and cat, is a huge step backwards.

The only thing that this proposal actually gains is knowing what pid generated the message, and that only works on a local machine (as soon as you send the log remotely, the remote machine can no longer trust the pid).

Yet again: there is no difference...

Posted Nov 20, 2011 21:23 UTC (Sun) by khim (subscriber, #9252)

As soon as you send the log remotely, the remote machine can no longer trust the pid

Again: there is no difference between the local and remote cases. You can trust the pid up until the "intrusion moment" even if you log remotely, and after that moment even the local log is suspect.

I still fail to see where you get these weird ideas about a local/remote dichotomy.

Yet again: there is no difference...

Posted Nov 20, 2011 21:35 UTC (Sun) by dlang (guest, #313)

I am not saying that you can trust it locally; I am just pointing out that the remote machine has no way of knowing if what is sent to it is valid or not. The only way to have the remote machine be able to trust the data it is sent is to have a full TPM lockdown in place (and to trust that there is never a flaw that allows it to be broken).

As soon as you go through a second daemon on a local system, you have to trust that that daemon hasn't been broken.

As soon as you read a message from disk you have to trust that the file hasn't been tampered with (and if you hash the file or the messages to try and prevent this, you now have to trust that your store of valid hashes hasn't been tampered with)

I don't know what you wanted to say, but I DO know what you said...

Posted Nov 20, 2011 22:14 UTC (Sun) by khim (subscriber, #9252)

I am not saying that you can trust it locally,

Really? Perhaps my English is failing me, but I thought "and that only works on a local machine" was quite unambiguous...

I am just pointing out that the remote machine has no way of knowing if what is sent to it is valid or not.

That's fair. But as I've noted, there is little difference between the local and remote cases: if you know the daemon and kernel are OK, you can trust the logs; if you don't know whether they are OK, then you don't. Since the usual way to see whether something is broken is, again, to analyze the logs, and those are available on both the local and the remote system... no, I don't get your point.

What makes logging over the network so special, and why can you trust the pid information in the local case but not in the remote case?

I don't know what you wanted to say, but I DO know what you said...

Posted Nov 20, 2011 22:47 UTC (Sun) by dlang (guest, #313)

Ok, I should have worded it as "and that only has a chance of working on a local machine"

Once you go to another machine, you no longer 'know' anything about what is really generating the message (unless you have cryptographic authentication of the sending program, and even that only proves that the sender has access to the key).

That newfangled Journal thing

Posted Nov 30, 2011 7:34 UTC (Wed) by alison (subscriber, #63752)

dlang offers:
The hashes would be trivial to add (and for that matter, you could use a database store and have it do the normalization and hashing as part of the insert today)

Why not in fact use git (or similar) to frequently snapshot a flat ASCII log file, storing the deltas in a repo over which hashes are generated in the usual manner? The UUIDs need not be present in the ASCII log but can be git tags that are stored as part of the commit data. Then git can be used in its usual fashion to propagate the log to networked machines if desired. Furthermore, the generation of hashes and remote propagation provide the usual level of verifiability we associate with git.

In other words, why not keep the flat file, but generate a verifiable log in a standard machine-readable format from it on the fly? Is this not how we are already using DVCS to generate trees from programmer-generated source code? Why wouldn't git work just as well with programmer-formatted log messages?
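
Concretely, the snapshot job could be as small as this (paths and remote invented), run from cron every few minutes:

    #!/bin/sh
    # copy the flat log into the repository and commit the delta
    cd /srv/logrepo || exit 1
    cp /var/log/syslog current.log
    git add current.log
    # commit only if the log actually changed
    git diff --cached --quiet || git commit -q -m "snapshot $(date -u +%Y-%m-%dT%H:%M:%SZ)"
    # propagate to the networked machines
    git push -q origin master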

That newfangled Journal thing

Posted Nov 30, 2011 8:49 UTC (Wed) by dlang (guest, #313)

git is horrible overkill and inefficient if all you are doing is dealing with one log file.

tripwire, ossec, and other similar tools can already track a log file and detect the difference between the file being extended and the file being modified.

Also, if you have another machine you can send data to, just send the logs using the standard syslog mechanisms. Unless you are generating gigantic amounts of logs (hundreds of thousands of messages per second) or would be logging over a WAN, the bandwidth needed for the logs is just not going to be significant.
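
(Back-of-the-envelope: even a busy 1,000 messages per second at roughly 300 bytes each is only about 300 KB/s, or some 2.4 Mbit/s, which is noise on any modern LAN.)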

That newfangled Journal thing

Posted Nov 30, 2011 22:20 UTC (Wed) by elanthis (guest, #6227)

I love it. Bigotry and emotions run rampant in technical discussions more and more these days.

Argument: journald is more complex than syslogd, and complexity is evil!

Proposal: use the simpler syslogd and then add complex and error-prone log parsing toolkits to get 70% of the features of journald at 200% the complexity cost.

Rationale: Pulseaudio was buggy a couple years ago.

That newfangled Journal thing

Posted Nov 30, 2011 23:26 UTC (Wed) by dlang (guest, #313)

For most people, the new features of journald won't matter: they either don't have any WORM device to store the hashes on to make things secure, or they don't care about such features because they send all of their logs to a remote system.

He's 'solving' a problem that isn't really there, and he doesn't actually solve the stated problem.

Also, please point out anywhere that I have said anything about pulseaudio.

That newfangled Journal thing

Posted Nov 30, 2011 23:34 UTC (Wed) by anselm (subscriber, #2796)

Read the <expletive> proposal. Journald isn't just about the hashes.

That newfangled Journal thing

Posted Nov 30, 2011 23:44 UTC (Wed) by dlang (guest, #313)

I have read the proposal, and I still think that, overall, he is solving problems that don't exist, in ways that don't really solve the stated problem.

If you want to ignore the hashes part of things, we can talk about the structured-log part. Logs are only as structured as the programmer creating them makes them: if you have a super-detailed log structure available and the programmer creates a field "details" of type "string" and puts everything into that field, the result is going to be just as unstructured as syslog traditionally has been.
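
To make that concrete with made-up entries: the first is genuinely structured, the second uses the structured transport but is just as opaque as a classic syslog line:

    MESSAGE=storage warning
    DEVICE=/dev/sda1
    MOUNT_POINT=/home
    USED_PERCENT=97

    MESSAGE=storage warning
    DETAILS=device=/dev/sda1 mount=/home used=97%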

He ignores, or is ignorant of, recent standards in syslog (some of which go back quite a few years).
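
(The obvious example is RFC 5424, which has offered structured key="value" fields in syslog since 2009; a made-up message:)

    <165>1 2011-11-30T23:44:00Z myhost libvirtd 1234 ID47 [exampleSDID@32473 device="/dev/sda1" used="97"] disk almost full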

He has a few new ideas buried in the proposal, but they are so overshadowed by misstatements and 'solutions' to problems that already have documented, standardized solutions (solutions that are not compatible with his proposal, other than by running logging services in parallel) that they undermine the entire proposal.

That newfangled Journal thing

Posted Dec 1, 2011 17:03 UTC (Thu) by kh (guest, #19413)

Thank you for being a voice of reason.

That newfangled Journal thing

Posted Nov 21, 2011 8:00 UTC (Mon) by mordae (guest, #54701)

Damn, I missed that. I don't know how I got the impression that they are not going to bother with network logging.

Re security: security is not the only reason for logging! It's also about the overall serviceability of the infrastructure. And in that case... who cares about trusting remote hosts? If libvirt crashes, I need to know why, and at that moment I am really not speculating about an intrusion.

