LWN.net Logo

The logger meets linux-kernel

The logger meets linux-kernel

Posted Jan 4, 2012 23:31 UTC (Wed) by neilbrown (subscriber, #359)
In reply to: The logger meets linux-kernel by dlang
Parent article: The logger meets linux-kernel

One thing that I think they want which current tools aren't so good at is speed.
It seems that the sort of logging that this is used for is debugging traces. i.e. hundreds of messages that you will almost never want to look at - so it needs to almost be free.
You only look when a problem occurs and then you want to see everything that happened in the last few minutes so that you can piece the history together.

So passing messages into the kernel needs to be fast and pulling them out simply should not happen in normal usage - they just get discarded.


(Log in to post comments)

The logger meets linux-kernel

Posted Jan 5, 2012 0:08 UTC (Thu) by cmccabe (guest, #60281) [Link]

I wrote a logging system that was designed to be truly "free."
See https://github.com/cmccabe/redfish/blob/master/util/fast_...
Also the .h, etc.

In order to make it truly fast, it had to be lockless (at least most of the time), thread-local (no cacheline ping-pong) and entirely in userspace.

I don't see how anything that involves a syscall can ever truly be "free." Even a system which just involves an mmap'ed ring buffer protected by a single mutex will tend to sharply limit scalability on the multi-core hardware of the future (and present).

Maybe Journald can accumulate a bunch of messages and send them to the kernel all at once. However, that begs the question: what happens to unflushed messages on a crash? They could use a signal handler to take care of this problem, but a lot of programs handle signals themselves.

The logger meets linux-kernel

Posted Jan 5, 2012 0:59 UTC (Thu) by neilbrown (subscriber, #359) [Link]

This is why I think it would be much more effective to focus on implementing a higher-level API.

Android can implement it with a trivial library that writes out to their in-kernel ring buffer. You can implement it using your awesome fast_log, and both can still run all the wonderful Android applications, each on the kernel of their choice.

Then when all the app developers start saying to Android "The log system from that McCabe guy is much better than yours - could you start providing that one instead", then we can all forget about the android-logger in the kernel and move on.

The logger meets linux-kernel

Posted Jan 5, 2012 1:27 UTC (Thu) by dlang (✭ supporter ✭, #313) [Link]

just make sure that this higher level ABI doesn't require ioctl calls for implementation.

The logger meets linux-kernel

Posted Jan 5, 2012 1:42 UTC (Thu) by cmccabe (guest, #60281) [Link]

Well, my log system doesn't address a lot of the issues that Android's does. For example, handling logs from different processes that don't trust each other.

It seems like the only reason they wanted this in the kernel in the first place was speed. Sending messages to a daemon over a socket or pipe was considered too slow. Unfortunately, they ended up adding a lot of policy like the log stream names "radio", "main", "events", etc. This was also the rationale for putting Binder and friends in kernel space-- to get that nice zero-copy performance along with the security they wanted. Maybe what we really need is a secure circular buffer IPC primitive rather than all these special-purpose widgets.

It's interesting that the Android guys went with unstructured logging. They control the entire userspace, so presumably it would be a lot easier for them to force developers to write structured log messages than it will be for Red Hat to get people to use structured Journal logs. There is also something bizarre about the fact that 1970s-era UNIX systems were fine using unstructured logging, and small cell phone systems are fine using it, but people claim that it is too inefficient to run on servers. However, maybe there's something I'm overlooking and the current focus on structured journald logging will be useful. Time will tell, I guess.

The logger meets linux-kernel

Posted Jan 9, 2012 15:07 UTC (Mon) by sorpigal (subscriber, #36106) [Link]

>There is also something bizarre about the fact that 1970s-era UNIX systems were fine using unstructured logging, and small cell phone systems are fine using it, but people claim that it is too inefficient to run on servers. However, maybe there's something I'm overlooking and the current focus on structured journald logging will be useful. Time will tell, I guess.

I chalk it up to simple Windows-envy.

The logger meets linux-kernel

Posted Jan 14, 2012 3:40 UTC (Sat) by swetland (subscriber, #63414) [Link]

Performance was definitely one concern, especially on earlier ARM9 devices where context switches involved cache flushes due to virtually tagged caches.

Unstructured (or really a bit semistructured -- timestamps, pid, loglevel are structured, message content is not) logs are partially to avoid having to have knowledge of the precise software running or maintaining an ever-expanding definition of log message formats to interpret the logs from an arbitrary device that's encountered. Very valuable in a large ecosystem with a huge variety of devices, software versions, etc.

The name/sizes stuff should just move to platformdata / module params / or the like. An oversight that's easily enough corrected.

The logger meets linux-kernel

Posted Jan 5, 2012 0:16 UTC (Thu) by dlang (✭ supporter ✭, #313) [Link]

how fast do you need to be? rsyslog has been measured at 1M logs/sec for fairly trivial situations (which this would probably be).

however, I'll bet that you are really talking about much lower data rates, I'd be surprised if you needed much over 100 logs/sec and amazed if you need over 1000 logs/sec (remember, we're not talking about logs from the kernel here, we're talking about logs from userspace, the type of thing that is sent to syslog on a traditional linux distro, or is written to a app-specific log file)

the fact that the kernel is involved at all is simply a desire from the android developers to avoid having a userspace daemon.

for the trivial capabilities that the logger currently has (listen to a filesystem name, accept anything written to it into a circular buffer, and regurgitate the buffer if requested), it seems to me that a pretty trivial userspace daemon should be able to do this. I still haven't seen any explanation (other than the "we didn't want another daemon running" one) for why the kernel should be involved at all.

The logger meets linux-kernel

Posted Jan 5, 2012 0:52 UTC (Thu) by neilbrown (subscriber, #359) [Link]

By "fast" I didn't mean raw bytes-per-second speed but rather how much it slows down the rest of the system. Android needs to run on slow phones as well a quad-core monsters and we don't want to waste cycles on logs that probably aren't going to be read.

I don't think we are just talking about the stuff that a normal distro sends to syslog. Brian observed that app developers complain that other apps send too many message to the log so "their" much more important messages fall off the end of the log. This suggests a high rate of log-spam.

The kernel is *always* involved. Even if you use shared memory, who do you think it is that maps the memory into each process on the relevant page faults?

Yes: by pre-faulting large shared memory images and using futexs you could interleave logs from multiple processes with little interaction with the kernel (if each process ran on its own processor) but that doesn't fit the android security model. Each process has it's own uid and doesn't trust other processes (or at least the log system doesn't trust any of it's clients beyond the fixed API). So shared-memory logging doesn't really meet the need.

The logger meets linux-kernel

Posted Jan 5, 2012 1:26 UTC (Thu) by dlang (✭ supporter ✭, #313) [Link]

if you want to define it, you can define any inter-process communication to involve the kernel, but most people would not talk the kernel to provide standard syslog /dev/log functionality, even though the kernel is involved.

given that they have a rather tiny fixed-size buffer (256K was mentioned), having enough logs generated to fall off the end does not really equate to that large a data rate.

But even if high performance is needed (and note that I am still not agreeing that it is), that doesn't, by itself, justify implementing something in the kernel. That's why we no longer have the tux webserver in the kernel, and there have been many other things proposed that "required" kernel level implementations for performance reasons, that ended up not going in to the kernel.

the fact that we haven't been able to get anyone to talk about data rates other than the very generic "overrunning a 256K buffer in seconds" doesn't do much to justify this going into the kernel.

I'll bet just about any linux system generates > 256k of syslog messages in 'seconds' as it boots up.

In fact, I have a server that I just rebooted, in it's first second, syslog shows >100K of message data in syslog (and this is a pretty stripped down kernel and boot, I'll bet standard distro kernels end up logging quite a bit more)

The logger meets linux-kernel

Posted Jan 5, 2012 16:41 UTC (Thu) by PaulMcKenney (subscriber, #9624) [Link]

Battery lifetime is critically important to the Android folks. This is related to the raw performance metrics you are thinking in terms of, but not exactly the same. And the differences are part of the reason that they don't want to have another user-space daemon running.

The logger meets linux-kernel

Posted Jan 5, 2012 17:39 UTC (Thu) by dlang (✭ supporter ✭, #313) [Link]

remember that the userspace daemon can be suspended along with the rest of userspace, so it's not keeping the system alive if nothing else is.

and in any case, where are the numbers?

how much traffic are we actually talking about here, what how much processor does it take to do this in kernel space, how much more in userspace (and even if it's 100x more, is it enough to actually matter?)

What matters

Posted Jan 5, 2012 18:03 UTC (Thu) by PaulMcKenney (subscriber, #9624) [Link]

Heh! Over the past couple of years, I have been repeatedly taken to task for RCU's being insufficiently aggressive about allowing the scheduler-clock tick being disabled. As in a handful of ticks at the beginning of an idle period.

So the sensitivity to CPU consumption is way higher than I would have guessed a couple of years ago. In addition, there are other well-known issues that arise when relying on a separate daemon in addition to efficiency. I have run into them many times over the years (make that decades), and I would guess that you have as well.

It is instructive to turn the question around. First, start with the assumption that the Android guys actually are competent in their area of expertise. Then ask yourself why a userspace daemon might be problematic for them.

What matters

Posted Jan 5, 2012 18:58 UTC (Thu) by dlang (✭ supporter ✭, #313) [Link]

the problem is that by that logic, just about everything should be in the kernel.

I really believe that this is premature optimization and micro-benchmark optimization.

to restate an example I've posted before, several years ago at $work, the engineers ran a benchmark that showed a 60x performance improvement (from 30 seconds to .5 seconds) for a read_config() function by creating a shared memory segment and populating that rather than reading from disk.

the problem is that this benchmark was for 100,000 reads of the config file, and the real-world usage showed that this function was executed 50,000 times in an hour across >50 machines. This 6-month engineering project saved a total of 15 cpu seconds in a peak hour across 200 cores.

sometimes "it's faster", even for high multipliers of how much faster just don't matter because the function just isn't called frequently enough.

I have some firewalls that run 'inefficient' fork-for-every-connection proxies (horribly inefficient, right), but on a peak day these systems (on low-end hardware) still don't hit a loadaverage of 0.2, so the inefficiency really doesn't matter.

application logging just isn't that heavy, even on a 'slow' cell phone processor.

What matters

Posted Jan 5, 2012 20:48 UTC (Thu) by PaulMcKenney (subscriber, #9624) [Link]

OK, so the short version of this story is that you are not willing to assume that the Android guys are competent within their area of expertise.

Fair enough.

Just don't be surprised when others return the favor by being unwilling to assume that you are competent within your area of expertise.

What matters

Posted Jan 5, 2012 21:26 UTC (Thu) by dlang (✭ supporter ✭, #313) [Link]

when it comes to performance claims, nobody's claims of need, without the backing of numbers should be believed.

In general, this is exactly how it works in kernel development.

What matters

Posted Jan 5, 2012 22:10 UTC (Thu) by PaulMcKenney (subscriber, #9624) [Link]

Interesting.

How did you determine that the Android guys' avoidance of user-mode daemons was strictly due to performance issues?

What matters

Posted Jan 5, 2012 22:51 UTC (Thu) by dlang (✭ supporter ✭, #313) [Link]

I'm basing this on their statements on l-k in the logger thread. It's always possible that I'm misunderstanding what they are meaning or missed something

What matters

Posted Jan 5, 2012 23:13 UTC (Thu) by dlang (✭ supporter ✭, #313) [Link]

unfortunately with my schedule for the last month and the next few weeks, I have no time to try this, but I don't think that it would be very hard to hack together a user-space replacement for the journal that listened on a few unix sockets, stored the results in a circular buffer, and provide a control socket for requesting the data get dumped out

What matters

Posted Jan 6, 2012 16:35 UTC (Fri) by PaulMcKenney (subscriber, #9624) [Link]

It would be really cool if you did get a chance to try this out!

One caution: It is difficult to get requirements from the Android guys. Not because they are uncooperative, but because they are only conscious of some of the requirements. The rest are so second-nature to them that they don't think of them. This means that working on a suitable replacement is an iterative process, where you learn the requirements by repeatedly implementing your best guess at each point in time.

John Stultz has gone through much of this process with his alarm-timers work, and he does appear to be getting close to a solution that is useful to both the Android guys and to non-smartphone software.

I have been on both sides of this. Back in the 1990s, those of us at Sequent habitually wrote parallel programs, but could not describe how to do it to others. We did try, but the results were not so good. Instead, we used apprenticeship-style training: We put the new guy in a cube surrounded by people who knew parallel programming, and after a few months, the new guy would have no problem writing competent parallel code.

I was on the other side while gathering wakelock requirements in 2010 and 2011. The “Avoid Thundering Herds of Timers” requirement is an example of an important requirement that was not at all obvious to me, but so obvious to the Android guys that they didn't think to state it.

Sort of like you would probably not see the need to put “the acceleration of gravity at the site is 9.8 meters per second squared” when creating specs for a builder. ;–)

What matters

Posted Jan 12, 2012 23:20 UTC (Thu) by Kluge (guest, #2881) [Link]

"Sort of like you would probably not see the need to put “the acceleration of gravity at the site is 9.8 meters per second squared” when creating specs for a builder. ;–)"

Does this mean that Android developers are from Mars, kernel hackers from Venus? ;-)

Mars and Venus

Posted Jan 13, 2012 0:54 UTC (Fri) by PaulMcKenney (subscriber, #9624) [Link]

;-) ;-) ;-)

What matters

Posted Jan 18, 2012 6:02 UTC (Wed) by kevinm (guest, #69913) [Link]

It's not the time taken by the extra cycles that matters - of course a user space daemon could keep up with the rate of message, even on a slow processor.

Rather, it's the fact that every extra cycle wasted is an extra cycle that the CPU must remain in a higher power state, and thus represents an incremental hit to the battery life. Recall in particular that context-switching from one userspace process to another requires a cache-flush on older ARM processors.

Copyright © 2013, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds