log compression

Posted Nov 20, 2011 7:02 UTC (Sun) by dlang (✭ supporter ✭, #313)
Parent article: The Journal - a proposed syslog replacement

The compression that they talk about here sounds very impressive and sophisticated, but looking for identical strings of text and replacing them with shorter placeholders (i.e. pointers) is exactly what gzip, bzip2, etc. do today. I would be surprised if the journal software was really able to do much better.

With many TB of real-world logs, I'm getting the following results:

gzip -9 gives me 10:1 compression

bzip2 -9 gives me 20:1 compression but is significantly slower

Doing a zgrep or zcat from a compressed file is actually faster than grep or cat from the uncompressed file (on a fairly sophisticated disk array).

I haven't yet done tests on lzma compression.
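To make the "identical strings replaced by pointers" point concrete, here is a toy sketch. It is not gzip's or bzip2's actual algorithm: it only estimates how small a buffer would get if every repeated run were swapped for a fixed-size back-reference to an earlier occurrence. The function name, MIN_MATCH, and POINTER_COST are assumptions made up for illustration.

```c
#include <stddef.h>

#define MIN_MATCH 4      /* assumed: shorter repeats are not worth a pointer */
#define POINTER_COST 3   /* assumed: bytes needed to encode (offset, length) */

/* Estimate the encoded size of buf[0..len) if every repeated run of at
 * least MIN_MATCH bytes is replaced by a fixed-size back-reference to an
 * earlier occurrence. Brute-force O(n^2) search; illustration only. */
size_t toy_backref_size(const char *buf, size_t len)
{
    size_t out = 0;
    size_t pos = 0;

    while (pos < len) {
        size_t best = 0;

        /* Find the longest match that starts anywhere before pos. */
        for (size_t cand = 0; cand < pos; cand++) {
            size_t n = 0;
            while (pos + n < len && buf[cand + n] == buf[pos + n])
                n++;
            if (n > best)
                best = n;
        }

        if (best >= MIN_MATCH) {
            out += POINTER_COST;  /* emit a pointer instead of the text */
            pos += best;
        } else {
            out += 1;             /* emit a literal byte */
            pos += 1;
        }
    }
    return out;
}
```

Log lines that repeat the same hostname, program name, and message template are exactly the kind of input where almost everything after the first few lines becomes a pointer, which is why general-purpose compressors already do well on logs.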



log compression

Posted Nov 20, 2011 10:28 UTC (Sun) by misc (subscriber, #73730) [Link]

I guess the compression will be done on the fly, and not once per day by cron.

log compression

Posted Nov 20, 2011 11:30 UTC (Sun) by dlang (✭ supporter ✭, #313) [Link]

if you have so few logs that you only rotate them once a day, do you really need the compression to happen any sooner?

I've got systems where I rotate the logs FAR more frequently (down to single digit minutes on some systems). This is a pretty large installation (architected to handle > 100K logs/sec), but when people claim the need to optimize things for performance/space reasons, their solutions need to work better than the existing ones.

log compression

Posted Nov 20, 2011 17:41 UTC (Sun) by ovitters (subscriber, #27950) [Link]

I assume it'll be properly researched. It seems you're very happy with syslog. Journal will still allow you to use syslog. I don't see any big issue, except stop energy. After having pulseaudio and systemd, better to contribute than to try and avoid it :P

log compression

Posted Nov 20, 2011 19:19 UTC (Sun) by slashdot (guest, #22014) [Link]

It seems journald will support random access and indexing log entries, which are both hard or impossible with the naive application of compression you are citing.

log compression

Posted Nov 20, 2011 19:47 UTC (Sun) by quotemstr (subscriber, #45331) [Link]

So build a separate auxiliary file that sits *alongside* conventional log files. This auxiliary file can contain all the journal metadata - PIDs, precise timestamps, message GUIDs - and index some of them. The conventional syslog file would happily exist alongside the auxiliary file, unless disabled by an administrator.

Programs that want to log to the journal could use a library that looks like this:

struct journal_log_attribute
{
    enum journal_attribute_type type;
    union {
        journal_attribute_guid guid;
        journal_attribute_keyval keyval;
        journal_attribute_module module;
        /* etc */
    };
};

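void
journal_vsyslog(
    int priority,
    const char* msg,
    va_list args,
    struct journal_log_attribute* attributes[] /* NULL-terminated */)
{
    if (journald_is_active()) {
        /* A va_list cannot be reused once it has been consumed,
           so the journal gets its own copy. */
        va_list journal_args;
        va_copy(journal_args, args);
        journal_internal_vsyslog(priority, msg, journal_args, attributes);
        va_end(journal_args);
    }

    vsyslog(priority, msg, args); /* Ignore attributes */
}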
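One detail matters for any wrapper that forwards the same arguments to two sinks, as the function above does: once a va_list has been passed to a v*-style function it cannot portably be used again, so one of the two calls needs a va_copy. A minimal self-contained sketch follows. The two buffers and the log_to_both names are hypothetical stand-ins for the journal and syslog, not part of any real API.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical stand-ins for the two sinks: in the proposal above these
 * would be the journal and vsyslog(); here they are plain buffers. */
static char journal_sink[256];
static char syslog_sink[256];

/* Format one message into both sinks. The first vsnprintf() consumes
 * its va_list, so it is given a copy and the original stays usable. */
void log_to_both_v(const char *fmt, va_list args)
{
    va_list copy;

    va_copy(copy, args);
    vsnprintf(journal_sink, sizeof journal_sink, fmt, copy);
    va_end(copy);

    vsnprintf(syslog_sink, sizeof syslog_sink, fmt, args);
}

/* Variadic convenience wrapper around log_to_both_v(). */
void log_to_both(const char *fmt, ...)
{
    va_list args;

    va_start(args, fmt);
    log_to_both_v(fmt, args);
    va_end(args);
}
```

Calling log_to_both("pid %d: %s", 42, "started") leaves the same formatted text in both buffers.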

Have you read the document?

Posted Nov 20, 2011 19:58 UTC (Sun) by khim (subscriber, #9252) [Link]

> So build a separate auxiliary file that sits *alongside* conventional log files.

Have you read the design doc? This is exactly what journald does.

And if your idea is not to keep a separate log with indexing and all the additional goodies but to try to attach it to the existing textual file, then this is stupidity beyond comprehension: referring to this scheme with "duct tape", "baling wire", or "chewing gum" does a disservice to all three of those fine building materials.

Have you read the document?

Posted Nov 20, 2011 20:02 UTC (Sun) by quotemstr (subscriber, #45331) [Link]

> Have you read the design doc? This is exactly what journald does.

Except that, AIUI, the goal is to eventually have journal-only logging for some facilities. I don't want that to ever come to pass.

> attach it to the existing textual file

Why not? The index could point into a particular offset in a textual log, or just duplicate the contents of the textual log. The point is to ensure that all logs can be queried with plain-text tools and to _optionally_ provide richer information for these logs. The nightmare scenario is for some messages to appear only in the journal and for other messages to appear only in syslog files.
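A rough sketch of the "index points into an offset in a textual log" idea, under some simplifying assumptions: the log is already in memory, entries are newline-terminated, and the index has a fixed capacity. The struct and function names are hypothetical and are not taken from the journal design.

```c
#include <stddef.h>
#include <string.h>

#define MAX_ENTRIES 1024  /* assumed fixed capacity, for brevity */

/* Hypothetical index: byte offset of the start of each line in a
 * plain-text log. The log text itself is left untouched. */
struct line_index {
    size_t offset[MAX_ENTRIES];
    size_t count;
};

/* Record where each newline-terminated entry begins. */
void index_build(struct line_index *idx, const char *log, size_t len)
{
    size_t start = 0;

    idx->count = 0;
    for (size_t i = 0; i < len && idx->count < MAX_ENTRIES; i++) {
        if (log[i] == '\n') {
            idx->offset[idx->count++] = start;
            start = i + 1;
        }
    }
}

/* Copy entry number n (0-based) into out, without its newline.
 * Returns the entry length, or -1 if n is out of range or out is too small. */
long index_fetch(const struct line_index *idx, const char *log, size_t len,
                 size_t n, char *out, size_t outsz)
{
    if (n >= idx->count)
        return -1;

    /* Only newline-terminated entries are indexed, so memchr() finds one. */
    const char *begin = log + idx->offset[n];
    const char *end = memchr(begin, '\n', len - idx->offset[n]);
    size_t entry_len = (size_t)(end - begin);

    if (entry_len + 1 > outsz)
        return -1;
    memcpy(out, begin, entry_len);
    out[entry_len] = '\0';
    return (long)entry_len;
}
```

The text file stays readable with grep and friends, and the index gives direct access to entry n without scanning. A real implementation would also have to cope with rotation and with other programs rewriting the file, which this sketch ignores.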

> stupidity beyond comprehension

Can we try to maintain *some* level of decorum here?

Have you read the document?

Posted Nov 20, 2011 20:13 UTC (Sun) by khim (subscriber, #9252) [Link]

> attach it to the existing textual file

> Why not?

Because these files are already processed by quite a few different programs stitched together in non-obvious ways. To hope that you can keep all that synchronized... I'll wish you luck.

> The nightmare scenario is for some messages to appear only in the journal and for other messages to appear only in syslog files.

If I understand correctly, all messages pass through journald, but plain old syslog messages go to syslogd too. This means journald keeps everything no matter what.

log compression

Posted Nov 20, 2011 21:45 UTC (Sun) by dlang (✭ supporter ✭, #313) [Link]

That is a useful option to have for log messages, but it's also available today. You can store your log messages in a database (of many different kinds, including nosql variations) and have the data indexed umpteen ways.

At the risk of being dismissed as an old fogy, the power of unix is based on the tradition of having many simple tools work together rather than having one tool that tries to do everything for everyone.

Nowhere is this more the case than in logging.

How important the logs are to you will vary drastically (do you want the application to stall if the log can't be written, do you want to spool to disk and run the risk of filling your disk, or do you want to throw away the log message?).

How you store the logs will vary drastically, and in many cases you may want to store them in multiple ways.

How you examine the logs for 'interesting' things will vary.
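The first of these trade-offs (stall, spool, or drop when a log can't be written) can be sketched as a policy that the administrator picks. This is only an illustration under assumed names: the enums, the tiny queue capacity, and log_submit() are hypothetical, and "spooling" is merely counted rather than written to disk.

```c
#include <stddef.h>

/* The three choices named above, as a policy the administrator picks. */
enum overflow_policy { POLICY_STALL, POLICY_SPOOL, POLICY_DROP };

enum log_result { LOG_QUEUED, LOG_WOULD_STALL, LOG_SPOOLED, LOG_DROPPED };

#define QUEUE_CAP 4  /* assumed tiny capacity so the full case is easy to hit */

struct log_queue {
    const char *slot[QUEUE_CAP];
    size_t used;
    size_t spooled;   /* messages diverted to disk (only counted here) */
    size_t dropped;   /* messages thrown away */
    enum overflow_policy policy;
};

/* Try to queue msg; when the queue is full, apply the configured policy. */
enum log_result log_submit(struct log_queue *q, const char *msg)
{
    if (q->used < QUEUE_CAP) {
        q->slot[q->used++] = msg;
        return LOG_QUEUED;
    }

    switch (q->policy) {
    case POLICY_STALL:
        return LOG_WOULD_STALL;  /* caller must wait and retry */
    case POLICY_SPOOL:
        q->spooled++;            /* a real daemon would write to a spool file */
        return LOG_SPOOLED;
    case POLICY_DROP:
    default:
        q->dropped++;
        return LOG_DROPPED;
    }
}
```

The point is that none of the three answers is right for everyone, so the choice has to stay configurable per site and often per log stream.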

At my office we have all of the following in place:

recording to flat files combining all the logs together

recording to flat files of specific types of log messages

recording to a nosql database cluster across a large farm of machines with everything indexed

open source tools to watch for 'interesting' events and notify us when they happen

custom tools to watch for 'interesting' events and notify us when they happen

commercial closed source tools to watch for 'interesting' events and notify us when they happen

With existing syslog, all of these things can work together and I can add other log processing as well.

Copyright © 2013, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds