Development

Log message classification with syslog-ng

January 13, 2010

This article was contributed by Robert Fekete

Operating systems, applications, and network devices generate text messages about the events that happen to them: a user logs in, a file is created, a network connection is opened to a remote host, and so on. These messages, called log messages, can be used to detect security incidents, operational problems, and policy violations, and they are useful in auditing and forensics. Traditionally, log messages have been classified outside the syslog system, with various log file analysis utilities, but a new feature in syslog-ng seeks to do that processing within the syslog daemon itself. By using a simpler syntax for describing log messages, along with a fast mechanism for recognizing them, message classification in syslog-ng can reduce the need for log file post-analysis, which will help ease the burden on system administrators.

Log messages do not have predefined content; they can be straightforward or obscure, depending on the attitude of the developer who wrote them. Either way, most of the time they are written with human readers in mind. That ignores the fact that more and more companies and organizations now collect the log messages of their computers on a central log server and try to process them automatically to detect break-in attempts, network errors, and other issues.

Classifying messages with syslog-ng attempts to remedy this situation by making it possible to add metadata (e.g., event type like user login, hardware error) to the log messages. It can also extract the relevant data (like the username) from the messages and determine what to do or where to store the log message based on this information. For example, if you need to create reports about specific events, you can collect the messages of the relevant events into a separate log file, which can be used as the basis of the reports.

A brief introduction to syslog and syslog-ng

Applications usually send their log messages to the system logging daemon of the operating system, which delivers the messages to the place where the log messages are stored: to log files on the local machine (found typically under /var/log/), or to a remote server. Most UNIX and Linux operating systems use the syslogd application as the system logging daemon. The syslog daemon adds some meta-information (called the syslog header) to the received log messages, like the date and time the message was received, or the name or address of the host where it was created.

The nine-year-old syslog-ng project is a popular alternative syslog daemon, licensed under GPLv2, that has established its name with reliable message transfer and flexible message filtering and sorting capabilities. In that time it has gained many new features, including direct logging to SQL databases, TLS-encrypted message transport, and the ability to parse and modify the content of log messages. The SUSE and openSUSE distributions use syslog-ng as their default syslog daemon.

In syslog-ng 3.0 a new message-parsing and classifying feature (dubbed pattern database or patterndb) was introduced. With recent improvements in 3.1 and the increasing demand for processing and analyzing log messages, a look at the syslog-ng capabilities is warranted.

The main task of a central syslog-ng log server is to collect the messages sent by the clients and route the messages to their appropriate destinations depending on the information received in the header of the syslog message or within the log message itself. Using various filters, it is possible to build even complex, tree-like log routes. For example:

[Log routes]
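The tree-like routing described above can be sketched in a few lines of Python: each node is a filter, and a message that passes a node's filter is written to that node's destinations and then offered to its children. This is an illustration only, assuming a message is a dict with host, program, and facility keys; syslog-ng itself expresses routes with source, filter, destination, and log statements in syslog-ng.conf, and the tree, host names, and paths below are invented.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Message = Dict[str, str]


@dataclass
class Route:
    """One node of a hypothetical routing tree: a filter plus where matching messages go."""
    name: str
    matches: Callable[[Message], bool]
    destinations: List[str] = field(default_factory=list)
    children: List["Route"] = field(default_factory=list)


def route(node: Route, msg: Message) -> List[str]:
    """Collect every destination reached, descending only into branches whose filter matches."""
    if not node.matches(msg):
        return []
    hits = list(node.destinations)
    for child in node.children:
        hits.extend(route(child, msg))
    return hits


# Invented tree: everything is archived; mail and auth messages get their own
# files, and postfix messages from host "mx1" are split out once more.
tree = Route("all", lambda m: True, ["/var/log/archive"], [
    Route("mail", lambda m: m["facility"] == "mail", ["/var/log/mail.log"], [
        Route("mx1-postfix",
              lambda m: m["host"] == "mx1" and m["program"].startswith("postfix"),
              ["/var/log/mx1-postfix.log"]),
    ]),
    Route("auth", lambda m: m["facility"] == "auth", ["/var/log/auth.log"]),
])

print(route(tree, {"host": "mx1", "program": "postfix/smtpd", "facility": "mail"}))
```

A message from mx1's postfix lands in three files, while an sshd message on the auth facility reaches only the archive and auth.log, because the mail branch's filter rejects it before its children are consulted.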

It is equally simple to modify the messages by using rewrite rules instead of filters if needed. Rewrite rules can do simple search-and-replace, but can also set a field of the message to a specific value: this comes in handy when a client does not properly format its log messages to comply with the syslog RFCs. (This is surprisingly common with routers and switches.) Version 3.1 of syslog-ng makes it possible to rewrite the structured data elements in messages that use the latest syslog message format (RFC 5424).
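The two kinds of rewrite described above, search-and-replace within a field and setting a field to a fixed value, can be sketched as follows. This is not syslog-ng code; it only mirrors the idea behind its subst() and set() rewrite rules, assuming a message is a dict of field names to strings. The misbehaving switch and its field contents are invented.

```python
import re
from typing import Dict

Message = Dict[str, str]


def subst(msg: Message, fld: str, pattern: str, replacement: str) -> Message:
    """Regex search-and-replace (all occurrences) inside one field; returns a new message."""
    out = dict(msg)
    out[fld] = re.sub(pattern, replacement, out.get(fld, ""))
    return out


def set_field(msg: Message, fld: str, value: str) -> Message:
    """Overwrite or create one field with a fixed value; returns a new message."""
    out = dict(msg)
    out[fld] = value
    return out


# Invented switch that puts a mnemonic in the HOST field and leaks a password:
# mask the password, then give the message a proper host name.
raw = {"HOST": "%SYS-5", "MESSAGE": "login user=admin password=s3cret"}
fixed = set_field(subst(raw, "MESSAGE", r"password=\S+", "password=***"), "HOST", "switch-07")
print(fixed)
```

Returning new dicts rather than mutating keeps the original message available, which makes it easy to compare before and after while developing rules.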

Artificial ignorance

Classifying and identifying log messages has many uses. It can be useful for reporting and compliance, but can also be important from the security and system maintenance points of view. The syslog-ng pattern database is also advantageous if you are using the "artificial ignorance" log processing method, which was described by Marcus J. Ranum (MJR):

Artificial Ignorance - a process whereby you throw away the log entries you know aren't interesting. If there's anything left after you've thrown away the stuff you know isn't interesting, then the leftovers must be interesting.

Artificial ignorance is a method to detect the anomalies in a working system. In log analysis, this means recognizing and ignoring the regular, common log messages that result from the normal operation of the system, and therefore are not too interesting. However, new messages that have not appeared in the logs before can signify important events, and should therefore be investigated.
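The core of artificial ignorance fits in a few lines: discard every line that matches a known-uninteresting pattern and report whatever is left. This sketch uses plain regular expressions for brevity rather than syslog-ng patterns, and the two "boring" patterns and the sample log lines are invented for illustration.

```python
import re
from typing import Iterable, List, Pattern, Sequence

# Invented "known boring" patterns; in practice this list grows as logs are reviewed.
KNOWN_BORING = [
    re.compile(r"^CRON\[\d+\]: \(\w+\) CMD "),
    re.compile(r"^sshd\[\d+\]: Accepted publickey for \S+ from \S+ port \d+ ssh2$"),
]


def leftovers(lines: Iterable[str], ignore: Sequence[Pattern[str]] = KNOWN_BORING) -> List[str]:
    """Drop every line matching a known-uninteresting pattern; whatever remains deserves a look."""
    return [line for line in lines if not any(p.search(line) for p in ignore)]


log = [
    "CRON[4242]: (root) CMD (run-parts /etc/cron.hourly)",
    "sshd[977]: Accepted publickey for joe from 10.50.0.247 port 42156 ssh2",
    "kernel: EXT3-fs error (device sda1): ext3_find_entry: reading directory #2 offset 0",
]
print(leftovers(log))  # only the kernel error survives
```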

The syslog-ng pattern database

The syslog-ng application can compare the contents of the received log messages to a set of predefined message patterns. That way, syslog-ng is able to identify the exact log message and assign a class to the message that describes the event that has triggered the log message. By default, syslog-ng uses the unknown, system, security, and violation classes, but this can be customized, and further tags can also be assigned to the identified messages.

The traditional approach to identify log messages is to use regular expressions (as the logcheck project does for example). The syslog-ng pattern database uses radix trees for this task, and that has the following important advantages:

  • Classifying messages is fast, much faster than with methods based on regular expressions. The speed of processing a message is practically independent of the total number of patterns. What matters is the length of the message and the number of "similar" messages, as this affects the number of junctions in the radix tree.

  • Regular-expression-based methods become progressively slower as the number of patterns increases. Radix trees scale very well, because only a relatively small number of simple comparisons must be performed to parse the messages.

  • The syslog-ng message patterns are easy to write, understand, and maintain.
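To make the radix-tree argument concrete, here is a toy matcher in Python. It is not syslog-ng's implementation (which is written in C and compresses edges); it is a plain character trie, the uncompressed cousin of a radix tree, in which patterns share their common literal prefixes and parser references become special edges. Only two parser types are sketched, @NUMBER@ and @ESTRING@, and as an assumption of this sketch ESTRING consumes its terminating character, so the literal that follows it starts right after the space. The patterns, rule IDs, and sample messages are invented.

```python
import re
from typing import Dict, List, Optional, Tuple

# One parser reference, e.g. "@NUMBER:port:@" or "@ESTRING:username: @".
PARSER_REF = re.compile(r"@(\w+):(\w*):([^@]*)@")


class Node:
    def __init__(self) -> None:
        self.literal: Dict[str, "Node"] = {}                   # child per next literal char
        self.parsers: List[Tuple[str, str, str, "Node"]] = []  # (type, name, arg, child)
        self.rule_id: Optional[str] = None                     # set where a pattern ends


def insert(root: Node, pattern: str, rule_id: str) -> None:
    """Add a pattern, sharing literal prefixes and identical parser edges with earlier ones."""
    node, pos = root, 0
    for ref in PARSER_REF.finditer(pattern):
        for ch in pattern[pos:ref.start()]:
            node = node.literal.setdefault(ch, Node())
        key = ref.groups()
        for ptype, name, arg, child in node.parsers:
            if (ptype, name, arg) == key:
                node = child
                break
        else:
            child = Node()
            node.parsers.append((*key, child))
            node = child
        pos = ref.end()
    for ch in pattern[pos:]:
        node = node.literal.setdefault(ch, Node())
    node.rule_id = rule_id


def run_parser(ptype: str, arg: str, text: str, i: int) -> Optional[Tuple[str, int]]:
    """Return (value, next index) or None; only NUMBER and ESTRING exist in this sketch."""
    if ptype == "NUMBER":
        j = i
        while j < len(text) and text[j].isdigit():
            j += 1
        return (text[i:j], j) if j > i else None
    if ptype == "ESTRING":
        j = text.find(arg, i)
        return (text[i:j], j + len(arg)) if j > i else None
    raise ValueError(f"parser {ptype} is not supported in this sketch")


def match(node: Node, text: str, i: int = 0) -> Optional[Tuple[str, Dict[str, str]]]:
    """Walk the trie: literal edges first, then parsers; return (rule_id, parsed values)."""
    if i == len(text) and node.rule_id is not None:
        return node.rule_id, {}
    if i < len(text) and text[i] in node.literal:
        found = match(node.literal[text[i]], text, i + 1)
        if found:
            return found
    for ptype, name, arg, child in node.parsers:
        parsed = run_parser(ptype, arg, text, i)
        if parsed:
            found = match(child, text, parsed[1])
            if found:
                rule_id, values = found
                return rule_id, {name: parsed[0], **values}
    return None


root = Node()
insert(root, "Accepted @ESTRING:auth_method: @for @ESTRING:username: @from "
             "@ESTRING:client_addr: @port @NUMBER:port:@ ssh2", "ssh-accept")
insert(root, "Failed password for @ESTRING:username: @from "
             "@ESTRING:client_addr: @port @NUMBER:port:@ ssh2", "ssh-fail")
print(match(root, "Accepted password for joe from 10.50.0.247 port 42156 ssh2"))
```

The work done for a message is one step per character plus a parser call at each parser edge; adding unrelated patterns only adds branches that are never visited, which is the scaling property the bullets describe.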

For example, compare the following:

A log message from an OpenSSH server:

    Accepted password for joe from 10.50.0.247 port 42156 ssh2

A regular expression that describes this log message and its variants:

    Accepted \
        (gssapi(-with-mic|-keyex)?|rsa|dsa|password|publickey|keyboard-interactive/pam) \
        for [^[:space:]]+ from [^[:space:]]+ port [0-9]+( (ssh|ssh2))?

An equivalent pattern for the syslog-ng pattern database:

    Accepted @QSTRING:auth_method: @ for @QSTRING:username: @ from \
        @QSTRING:client_addr: @ port @NUMBER:port:@ @QSTRING:protocol_version: @
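The regular expression can be tried out in Python to see what it actually yields. Two assumptions: the backslash line breaks are read as single spaces, and, because Python's re module has no POSIX bracket classes, [^[:space:]] is written as \S. The regex recognizes the message, but it captures only its alternations, by position, and does not capture the username or client address at all; the named parsers in the syslog-ng pattern both recognize and extract.

```python
import re

# The regular expression above, joined onto one line ([^[:space:]] rewritten as \S).
SSH_ACCEPTED = re.compile(
    r"Accepted "
    r"(gssapi(-with-mic|-keyex)?|rsa|dsa|password|publickey|keyboard-interactive/pam) "
    r"for \S+ from \S+ port [0-9]+( (ssh|ssh2))?"
)

m = SSH_ACCEPTED.fullmatch("Accepted password for joe from 10.50.0.247 port 42156 ssh2")
# Groups are positional: auth method, gssapi suffix, " ssh2", "ssh2".
print(m.groups())
```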

Obviously, log messages describing the same event can be different: they can contain data that varies from message to message, like usernames, IP addresses, timestamps, and so on. This is what makes parsing log messages with regular expressions so difficult. In syslog-ng, these parts of the messages can be covered with special fields called parsers, which are the constructs enclosed between '@' characters in the example. Each parser processes a specific type of data, like a string (@STRING@), a number (@NUMBER@ or @FLOAT@), or an IP address (@IPV4@, @IPV6@, or @IPVANY@). Parsers can also be given a name, and the parsed value can then be referenced in filters or as a macro in the names of log files or database tables.

It is also possible to parse the message until a specific ending character or string using the @ESTRING@ parser, or the text between two custom characters with the @QSTRING@ parser.
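As an illustration of what such parsers have to do, here are hypothetical Python versions of two of them: a QSTRING-style parser that takes the text between an opening and a closing character, and an IPV4-style parser that accepts a run of digits and dots only if it forms a valid address. Each returns the parsed value and the position just past the consumed text, or None on failure. How syslog-ng's own parsers treat edge cases (a missing closing character, a trailing dot) is not modeled; the behavior below is an assumption of this sketch.

```python
import ipaddress
from typing import Optional, Tuple

# (parsed value, index just past the consumed text), or None when the parser fails
Parsed = Optional[Tuple[str, int]]


def parse_qstring(text: str, i: int, open_ch: str, close_ch: str) -> Parsed:
    """Text between open_ch (which must be at i) and the next close_ch; both are consumed."""
    if i >= len(text) or text[i] != open_ch:
        return None
    end = text.find(close_ch, i + 1)
    if end < 0:
        return None
    return text[i + 1:end], end + 1


def parse_ipv4(text: str, i: int) -> Parsed:
    """The run of digits and dots starting at i, if it forms a valid dotted-quad address."""
    j = i
    while j < len(text) and (text[j].isdigit() or text[j] == "."):
        j += 1
    while j > i and text[j - 1] == ".":  # a trailing dot ends a sentence, not the address
        j -= 1
    try:
        ipaddress.IPv4Address(text[i:j])
    except ValueError:
        return None
    return text[i:j], j


print(parse_qstring('user="joe smith" uid=1000', 5, '"', '"'))  # ('joe smith', 16)
print(parse_qstring("[sda1] I/O error", 0, "[", "]"))          # ('sda1', 6)
print(parse_ipv4("from 10.50.0.247 port 42156", 5))           # ('10.50.0.247', 16)
```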

A syslog-ng pattern database is an XML file that stores patterns and various metadata about the patterns. The message patterns are sample messages that are used to identify the incoming messages, while the metadata can include descriptions, custom tags, a message class (which is just a special type of tag), and name-value pairs (yet another type of tag).
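To show how patterns and metadata sit together in such a file, the sketch below reads a tiny, hypothetical pattern database with Python's xml.etree.ElementTree and flattens each rule into a dict. The layout is only loosely modeled on syslog-ng's version 3 format; the element and attribute names, the idea that a ruleset's own pattern element holds the program name, and all IDs, tags, and values are assumptions to be checked against the Administrator Guide.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# Hypothetical pattern database; element names and all values are assumptions.
PDB = """<patterndb version="3" pub_date="2010-01-13">
  <ruleset name="sshd" id="example-ruleset-sshd">
    <pattern>sshd</pattern>
    <rules>
      <rule provider="example" id="example-ssh-accept" class="system">
        <patterns>
          <pattern>Accepted @QSTRING:auth_method: @ for @QSTRING:username: @ from @QSTRING:client_addr: @ port @NUMBER:port:@ @QSTRING:protocol_version: @</pattern>
        </patterns>
        <description>Successful SSH login</description>
        <tags><tag>login</tag></tags>
        <values><value name="auth.outcome">success</value></values>
      </rule>
    </rules>
  </ruleset>
</patterndb>"""


def load_rules(xml_text: str) -> List[Dict[str, object]]:
    """Flatten every rule: owning program, id, class, patterns, description, tags, values."""
    rules = []
    for ruleset in ET.fromstring(xml_text).iter("ruleset"):
        program = ruleset.findtext("pattern", default="")  # direct child only
        for rule in ruleset.iter("rule"):
            rules.append({
                "program": program,
                "id": rule.get("id"),
                "class": rule.get("class", "unknown"),
                "patterns": [p.text for p in rule.findall("patterns/pattern")],
                "description": rule.findtext("description", default=""),
                "tags": [t.text for t in rule.findall("tags/tag")],
                "values": {v.get("name"): v.text for v in rule.findall("values/value")},
            })
    return rules


for r in load_rules(PDB):
    print(r["program"], r["id"], r["class"], r["tags"], r["values"])
```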

The syslog-ng application has built-in macros for using the results of the classification: the .classifier.class macro contains the class assigned to the message (e.g., violation, security, or unknown) and the .classifier.rule_id macro contains the identifier of the message pattern that matched the message. It is also possible to filter on the tags assigned to a message. As with syslog, these routing rules are specified in the syslog-ng.conf file.
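A sketch of how classification results could feed a log file name, assuming the message is available as a dict holding the .classifier.class and .classifier.rule_id values plus any named parser results. The ${name} syntax mirrors syslog-ng's macro references, but the expansion function, the empty-string fallback for unset names, and the username field are assumptions of this sketch, not syslog-ng.conf syntax.

```python
import re
from typing import Dict

# A macro reference such as "${.classifier.class}" or "${username}".
MACRO_REF = re.compile(r"\$\{([^}]+)\}")


def expand(template: str, values: Dict[str, str]) -> str:
    """Replace each ${name} with the message's value for name (empty string if unset)."""
    return MACRO_REF.sub(lambda m: values.get(m.group(1), ""), template)


classified = {
    ".classifier.class": "system",
    ".classifier.rule_id": "ssh-accept",
    "username": "joe",  # hypothetical value extracted by a named parser
}
print(expand("/var/log/${.classifier.class}/${username}.log", classified))
# /var/log/system/joe.log
```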

Using syslog-ng

To use these features, get syslog-ng 3.1; older versions use an earlier and less complete database format. As most distributions still package version 2.x, you will probably have to download it from the syslog-ng download page.

The syntax of the pattern database file might seem a bit intimidating at first, but most of the elements are optional. The syslog-ng 3.1 Administrator Guide (PDF) and the sample database files are good places to start; write to the mailing list if you run into problems.

A small utility called pdbtool is available in syslog-ng 3.1 to help with testing and managing pattern databases. It lets you quickly check whether a particular log message is recognized by the database, and also merge several XML files into a single XML file for syslog-ng. See pdbtool --help for details.
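pdbtool is the supported way to merge pattern databases. The hypothetical Python function below only illustrates what merging means structurally: every ruleset element from several pattern database documents is collected under a single patterndb root. It does no validation and no duplicate-ID checking, and the version attribute value is an assumption.

```python
import xml.etree.ElementTree as ET
from typing import Iterable


def merge_patterndbs(documents: Iterable[str], version: str = "3") -> str:
    """Collect every <ruleset> from several pattern database documents under one root."""
    merged = ET.Element("patterndb", {"version": version})
    for doc in documents:
        for ruleset in ET.fromstring(doc).iter("ruleset"):
            merged.append(ruleset)
    return ET.tostring(merged, encoding="unicode")


a = '<patterndb version="3"><ruleset name="sshd" id="r1"/></patterndb>'
b = '<patterndb version="3"><ruleset name="postfix" id="r2"/></patterndb>'
print(merge_patterndbs([a, b]))
```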

Closing remarks

The syslog-ng pattern database provides a powerful framework for classifying messages, but it is powerless without the message patterns that make it work. IT systems consist of many components running many applications, which means a lot of message patterns to create. This clearly calls for a community effort to build the critical mass of patterns needed for all of this to become usable.

To start with, BalaBit, the developer of syslog-ng, has made a number of experimental pattern databases available. Currently, these files contain over 8000 patterns for over 200 applications and devices, including Apache, Postfix, Snort, and various common firewall appliances. The syslog-ng pattern databases are freely available under the terms of the Creative Commons Attribution-Noncommercial-Share Alike 3.0 (CC BY-NC-SA) license.

A community site for sharing pattern databases is reportedly also under construction, but until it becomes a reality, discussions and inquiries about pattern databases should go to the general syslog-ng mailing list.

System Applications

Database Software

PostgreSQL Weekly News

The January 10, 2010 edition of the PostgreSQL Weekly News is online with the latest PostgreSQL DBMS articles and resources.

SQLObject 0.11.3 released

Version 0.11.3 of SQLObject has been announced; it is a minor bugfix release. "SQLObject is an object-relational mapper. Your database tables are described as classes, and rows are instances of those classes."

SQLObject 0.12.1 released

Version 0.12.1 of SQLObject has been announced; it is a bugfix release. "SQLObject supports a number of backends: MySQL, PostgreSQL, SQLite, Firebird, Sybase, MSSQL and MaxDB (also known as SAPDB)."

Embedded Systems

Arduino Ethernet 1.0b2 released

Version 1.0b2 of Arduino Ethernet has been announced. "Arduino Ethernet is a collection of libraries I have written to use within the Arduino programming environment in conjunction with the Ethernet shield." The Arduino is an open-hardware microprocessor platform.

Interoperability

Samba 3.4.4 and 3.5.0rc1 released

Two new releases of Samba are available. Samba 3.4.4: "This is the latest stable release of the Samba 3.4 series." Samba 3.5.0rc1: "This is the first release candidate of Samba 3.5.0. This is *not* intended for production environments and is designed for testing purposes only."

Package Management

RPM 4.8.0 released

Version 4.8.0 of RPM has been announced; it includes general bugfixes and enhancements.

Virtualization Software

Anatomy of the libvirt virtualization library (developerWorks)

Over at developerWorks, M. Tim Jones examines the libvirt virtualization control API. "From just the small amount of capabilities that I've demonstrated in this article, you can see the power that libvirt provides. And as you can expect, there are a number of applications that are being successfully built on libvirt. One of the interesting applications is virsh (demonstrated here), which is a virtualization shell. There's also virt-install, which can be used to provision new domains from operating system distributions. The utility virt-clone can be used to clone a VM from another VM (covering both operating system and disk replication). Some of the higher-level applications include virt-manager, which is a general-purpose desktop-management tool, and virt-viewer, which is a lightweight tool for securely attaching to the graphical console of VMs."

Desktop Applications

Audio Applications

Amarok 2.2.2 released

Version 2.2.2 of the Amarok music player has been announced. "the Amarok team released version 2.2.2 of their music player today. This release includes the return of moodbar, custom labels and more."

Klactoveedsedstene - a new MPlayer frontend

Viggo Simonsen has announced the Klactoveedsedstene project. "*Klactoveedsedstene* is an Audio Player frontend to the popular Mplayer engine, written in Java. It is very fast, light, simple - and with an advanced support for Album Art. It recognizes embedded Album Art, and is also mostly able to find the correct Album Art from the Internet, based on the "Artist" and "Album" ID3-tags".

Desktop Environments

GNOME Software Announcements

The following new GNOME software has been announced this week: You can find more new GNOME software releases at gnomefiles.org.

KDE Software Compilation 4.4 Release Candidate 1 is available

Version 4.4 Release Candidate 1 of KDE has been announced. "Release Candidate 1 provides a testing base for identifying bugs in the upcoming KDE Software Compilation 4.4, with its components the KDE Plasma Workspaces, the Applications powered by KDE, and the KDE Development Platform. The list of changes between 4.3 and 4.4 is especially long. Important changes can be observed all over the place".

KDE Software Announcements

The following new KDE software has been announced this week: You can find more new KDE software releases at kde-apps.org.

Xorg Software Announcements

The following new Xorg software has been announced this week: More information can be found on the X.Org Foundation wiki.

Encryption Software

GPGME 1.3.0 released

Version 1.3.0 of GPGME has been announced; it includes a number of enhancements. "We are pleased to announce version 1.3.0 of GnuPG Made Easy, a library designed to make access to GnuPG easier for applications."

Geographical Software

Location-aware search with Apache Lucene and Solr (developerWorks)

developerWorks has posted a lengthy and detailed article on the implementation of spatial searches with the Lucene and Solr libraries. "I'll start with a brief review of some key Lucene concepts, leaving the deeper details to the reader to research. Next, I'll cover some of the basic concepts of geospatial search. GIS is a large field that could easily consume this entire article and many more, so I will instead focus on some basic concepts that should be fairly intuitive given the need to find services, people, and other items of interest on a daily basis. I'll round out the article with some discussion of the approaches available for indexing and searching spatial information using Lucene and Solr. I'll ground these concepts in a real, albeit simple, example using data from the OpenStreetMap (OSM) project."

Interoperability

Wine 1.1.36 announced

Version 1.1.36 of Wine has been announced. Changes include: "- Completion of the 16-bit separation. - Improved Shader Model 4 support. - A ton of memory leak fixes. - Improved debugging support for MinGW. - A number of MSHTML fixes. - Various bug fixes."

Mail Clients

Claws Mail 3.7.4 unleashed

Version 3.7.4 of Claws Mail has been announced; it includes new capabilities, bug fixes, and translation work. "Claws Mail is a GTK+ based, user-friendly, lightweight, and fast email client."

Claws Mail Extra Plugins 3.7.4 unleashed

Version 3.7.4 of Claws Mail Extra Plugins has been announced. "The claws-mail-extra-plugins-3.7.4 package contains 20 plugins, including 1 new plugin: GeoLocation!"

Music Applications

guitarix 0.05.8-1 bugfix release

Version 0.05.8-1 of guitarix, an electric guitar amplifier simulator, has been announced. "I know, there are many guitarix release last month, but this release fix a memory leak witch we have oversee in a long run. I strongly recommend guitarix users to update to this version."

Office Applications

RawTherapee 3.0 alpha 1 and license changes

Version 3.0 alpha 1 of RawTherapee, a RAW editor/workflow manager, has been announced, and the software has just been released under the GPL. "The first alpha version of RawTherapee 3.0 is available for download. Note that this is not a feature complete version. In V3.0 both major GUI/workflow and algorithm changes are planned. This alpha version demonstrates the new, much more efficient GUI, but it does not contain any algorithmic changes yet." (Thanks to Spider).

Office Suites

OpenOffice.org Newsletter

The December, 2009 edition of the OpenOffice.org Newsletter is out with the latest OO.o office suite articles and events.

Languages and Tools

Caml

Caml Weekly News

The January 12, 2010 edition of the Caml Weekly News is out with new articles about the Caml language.

Java

Jato 0.0.2 released

Version 0.0.2 of Jato, a JIT-only virtual machine for Java, is out. "Jato is a JIT-only virtual machine for Java that can run some Java applications under GNU/Linux on modern 32-bit x86 CPUs that support the SSE2 instruction set. A port to the x86-64 machine architecture is currently being developed. Jato depends on GNU Classpath to provide core Java runtime classes. The VM is licensed under the GPLv2 with GNU Classpath linking exception."

Python

Python 2.7 alpha 2 released

Version 2.7 alpha 2 of Python has been announced. "Python 2.7 is scheduled to be the last major version in the 2.x series. It includes many features that were first released in Python 3.1. The faster io module, the new nested with statement syntax, improved float repr, and the memoryview object have been backported from 3.1. Other features include an ordered dictionary implementation, unittests improvements, and support for ttk Tile in Tkinter."

IMDbPY 4.4 released

Version 4.4 of IMDbPY has been announced. "IMDbPY is a Python package useful to retrieve and manage the data of the IMDb movie database about movies, people, characters and companies. In this release, a huge number of bugs were fixed and many parsers were made more robust."

Sphinx 0.6.4 released

Version 0.6.4 of Sphinx has been announced; it includes over 20 bug fixes. "Sphinx is a tool that makes it easy to create intelligent and beautiful documentation for Python projects (or other documents consisting of multiple reStructuredText source files)."

Python-URL! - weekly Python news and links

The January 13, 2010 edition of the Python-URL! is online with a new collection of Python article links.

Tcl/Tk

Tcl-URL! - weekly Tcl news and links

The January 8, 2010 edition of the Tcl-URL! is online with new Tcl/Tk articles and resources.

Page editor: Forrest Cook

Copyright © 2010, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds