
Malcolm: SQL for the command line: "show"

In his blog, David Malcolm writes about "show", a command-line tool that uses SQL "select"-style statements to query various log file formats. "This got me thinking. We have many different log formats, and many different sources of data. All of our tools seem to have different interfaces. [...] For example, why should I write regular expressions and shell pipelines to get at my logs? Why do I have to learn a custom syntax ("rpm -qa --queryformat='various things'") for looking at the software I have installed? Why does e.g. the audit subsystem have its own query format? [...] Why can't I just use SQL, and write SELECT statements to drill down into all of this data?"


Malcolm: SQL for the command line: "show"

Posted Mar 23, 2009 16:27 UTC (Mon) by sjj (guest, #2020) [Link]

In Microsoft world, Log Parser does exactly this. It is extremely useful. And free (beer-like).

http://forums.iis.net/default.aspx?GroupID=51

Malcolm: SQL for the command line: "show"

Posted Mar 23, 2009 16:29 UTC (Mon) by clugstj (subscriber, #4020) [Link] (14 responses)

"Why can't I just use SQL"

Because SQL sucks as a general programming language.

Malcolm: SQL for the command line: "show"

Posted Mar 23, 2009 16:33 UTC (Mon) by flewellyn (subscriber, #5047) [Link] (13 responses)

Well, of course it does. It ISN'T a general programming language.

But, that's not what he wants to use it for. He only wants to use it for its problem domain: querying a data store of some kind. And in that domain, it's perhaps not the best thing ever, but it's quite suitable and certainly well-understood.

I just wonder how he'll handle issues like complex queries, with things like subqueries and joins and the like.

Malcolm: SQL for the command line: "show"

Posted Mar 23, 2009 16:53 UTC (Mon) by Wol (subscriber, #4433) [Link] (8 responses)

Use something like ENGLISH :-) (otherwise known as ACCESS, INFORM, RETRIEVE etc :-)

But seriously, if your data naturally fits a flat table, SQL is a good fit. Most data, however, doesn't.

Cheers,
Wol

Malcolm: SQL for the command line: "show"

Posted Mar 23, 2009 18:09 UTC (Mon) by mrshiny (guest, #4266) [Link]

Yeah but for analyzing structured log files such as the apache logs, this is the most awesome thing ever. I want this for my servers at work.

Malcolm: SQL for the command line: "show"

Posted Mar 23, 2009 18:51 UTC (Mon) by KGranade (guest, #56052) [Link] (6 responses)

First, as has been mentioned already, the data he's proposing accessing with SQL IS mostly (or totally, not sure) flat tables ( log files, /proc, tables of installed packages, sounds like flat tables to me ).

Secondly, you recommend using something like English, but I have no idea how the examples you present would be used, which means learning a specialized syntax, which means I might as well use a generalized syntax that is a moderately straightforward mapping of the concepts of database retrieval to English... which is a decent description of SQL.

<Insert overly-verbose rant about the pitfalls of "natural language programming" here>

Malcolm: SQL for the command line: "show"

Posted Mar 24, 2009 0:21 UTC (Tue) by Wol (subscriber, #4433) [Link] (5 responses)

:-)

Sorry, but I did *not* say "use English". I said "use ENGLISH" (ENGLISH being a dedicated data access language).

ENGLISH is the original Pick data query language, and is a very good NFNF query tool (It's also called ENGLISH because it is, actually, very similar to English!) For example

SELECT INVOICE WITH INVOICE.TOTAL EQ 1600 AND WHERE LINE.ITEM EQ 215

will select all invoices where the invoice value is 1600 and any individual line is 215.
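
For comparison, a rough relational equivalent of that query can be sketched with Python's built-in sqlite3 module. The two-table layout and the names invoice, line, total and item are assumptions made purely for illustration; in Pick the line items would be multivalued fields inside the invoice record rather than rows in a second table.

```python
import sqlite3

# Hypothetical schema: one row per invoice, one row per invoice line.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE invoice (id INTEGER PRIMARY KEY, total INTEGER);
    CREATE TABLE line (invoice_id INTEGER REFERENCES invoice(id), item INTEGER);
    INSERT INTO invoice VALUES (1, 1600), (2, 1600), (3, 900);
    INSERT INTO line VALUES (1, 215), (1, 40), (2, 77), (3, 215);
""")

# Invoices whose total is 1600 and that have at least one line equal to 215.
matching = [row[0] for row in conn.execute("""
    SELECT i.id FROM invoice AS i
    WHERE i.total = 1600
      AND EXISTS (SELECT 1 FROM line AS l
                  WHERE l.invoice_id = i.id AND l.item = 215)
    ORDER BY i.id
""")]
print(matching)
```

The EXISTS subquery is what stands in for the "any individual line" part of the ENGLISH statement.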

SQL is *not* a "moderately straightforward mapping of the concepts of database retrieval to English" - no way would I describe it as "moderately straightforward", and it is very relational-oriented. Using it to query a non-relational database is *horrid*.

Cheers,
Wol

Malcolm: SQL for the command line: "show"

Posted Mar 24, 2009 0:44 UTC (Tue) by flewellyn (subscriber, #5047) [Link] (4 responses)

Well, is it incorrect to interpret log files relationally?

Malcolm: SQL for the command line: "show"

Posted Mar 24, 2009 15:51 UTC (Tue) by Wol (subscriber, #4433) [Link] (3 responses)

Log files are typically two-dimensional :-)

So yes, SQL is probably a good language for querying them. But then, so is ENGLISH, because it's n-dimensional (actually, it doesn't work that well if n hits 4 or more :-( so horses for courses, I'd use ENGLISH because that's what I'm comfortable with.

Cheers,
Wol

Malcolm: SQL for the command line: "show"

Posted Mar 24, 2009 16:24 UTC (Tue) by flewellyn (subscriber, #5047) [Link] (2 responses)

That's fine, but who typically knows ENGLISH these days? SQL is far more common.

Malcolm: SQL for the command line: "show"

Posted Mar 26, 2009 14:06 UTC (Thu) by Wol (subscriber, #4433) [Link] (1 responses)

You'd be surprised ...

IBM are on record as saying that their version (the U2 databases) is the fastest growing product in their database VAR sales.

And International Spectrum is holding their conference right now - against a background of collapsing conference attendances (typically down 25 - 50 %), they're holding their own - I think they were down 7% (or was it up?)

imho relational *theory* is great. Unfortunately, relational practice falls foul of Einstein's corollary to Occam - practice is TOO simple, therefore system complexity (as in all the stuff *round* the database) rises sharply as a result. SQL queries are a classic example :-)

Cheers,
Wol

Malcolm: SQL for the command line: "show"

Posted Mar 28, 2009 0:55 UTC (Sat) by nix (subscriber, #2304) [Link]

SQL extensions are an even more classic example. Look at MODEL, for
instance. In case the relational model is 'too hard', now you can turn
your DB into a tiny spreadsheet and bash at it in the query. How
relational...

(actually it *is* useful, but that doesn't mean it's not totally bizarre
and screwy. The real problem here is SQL's halfassed incapable
implementation of half the relational calculus in a non-Turing-complete
fashion. But it paid for my house so I can't complain *too* terribly
hard.)

Malcolm: SQL for the command line: "show"

Posted Mar 23, 2009 20:30 UTC (Mon) by clugstj (subscriber, #4020) [Link] (1 responses)

OK, I was a little flippant before. A useful tool makes simple things simple and complex things possible. I see SQL in this case as only the first. Once you try to do something complex, you will have to throw it out and go back to a more general programming language.

Malcolm: SQL for the command line: "show"

Posted Mar 23, 2009 20:42 UTC (Mon) by flewellyn (subscriber, #5047) [Link]

It depends. SQL is really a domain-specific language, and it does queries very well. But it doesn't do other things that we might want. Having a query language which was fully general, and able to do any sort of general-purpose programming, would be nice. But then, what language to use?

Malcolm: SQL for the command line: "show"

Posted Mar 24, 2009 21:35 UTC (Tue) by skx (subscriber, #14652) [Link] (1 responses)

It is funny how people get hooked on using SQL for querying logfiles. Last year or so I wrote asql, which I use for producing ad hoc statistics from Apache logfiles.

Simple usage is:

$ asql 
asql v1.2 - type 'help' for help.
asql> load /home/www/www.steve.org.uk/logs/access.log
Loading: /home/www/www.steve.org.uk/logs/access.log
asql> SELECT source,SUM(size) AS Number FROM logs GROUP BY source ORDER BY Number DESC, source LIMIT 0,10;
67.195.37.112 4681922
74.6.17.185 2353628
87.120.8.52 2066975
77.36.6.72 1859180
...

Finding the top ten referers becomes:

asql> SELECT referer,COUNT(referer) AS number from logs WHERE referer NOT LIKE '%steve.org.uk%' GROUP BY referer ORDER BY number DESC,referer LIMIT 0,10;
- 1888
http://www.gnu.org/software/gnump3d/download.html 12
http://community.livejournal.com/lotr_tattoos/?skip=20 5
http://lua-users.org/wiki/LibrariesAndBindings 4

Although SQL is often not the most natural way to query things I found it very useful and natural in this context.

Malcolm: SQL for the command line: "show"

Posted Mar 24, 2009 21:50 UTC (Tue) by flewellyn (subscriber, #5047) [Link]

Well, SQL is not, perhaps, the best thing ever when it comes to query languages. I mean, it works okay, and it's "good enough", and I'll defend it on those grounds. But it's hardly the epitome of query languages.

Truth be told, my ideal query language would be something Lisplike, with query operations specified by functions and special operators (or macros), and "views" just being newly defined querying functions or macros.

Hmmm...now there's an idea...

Special filenames

Posted Mar 23, 2009 20:00 UTC (Mon) by epa (subscriber, #39769) [Link] (3 responses)

show host, "count(*)", "total(size)" from /var/log/httpd/*access_log*
And what happens when there is a file called 'where'?

OK, in this example it would be /var/log/httpd/where which presumably wouldn't trip up the parser. But using this thing in your current directory could be flaky.

Special filenames

Posted Mar 24, 2009 15:26 UTC (Tue) by efexis (guest, #26355) [Link] (2 responses)

Standard unixy shell methodology, 'show' reads from standard input, so becomes:

show host, "count(*)", "total(size)" < /var/log/httpd/access_log

or

cat /var/log/httpd/*access_log* | show host, "count(*)", "total(size)"

would solve that

Special filenames

Posted Mar 24, 2009 21:36 UTC (Tue) by skx (subscriber, #14652) [Link]

Or just use asql which has a nice built-in shell for live queries, or the ability to run queries from the command line.

Special filenames

Posted Mar 28, 2009 11:22 UTC (Sat) by jengelh (guest, #33263) [Link]

Here you have your Useless Use Of Cat Award.

Malcolm: SQL for the command line: "show"

Posted Mar 24, 2009 2:48 UTC (Tue) by jordanb (guest, #45668) [Link] (2 responses)

I find SQL to be a horrible interactive query language. So many things that should be implicit aren't. You can't say

# select gender, race, sum(income)/count(income);

for instance. You have to say:

# select gender, race, industry, sum(income)/count(id) group by gender, race, industry;

even though there's only one rational way to group the data. In fact, if you leave off any term in the group by clause, most servers will raise an error.
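
A minimal sketch of the grouped form, run through Python's sqlite3 module. The people table and its rows are invented for illustration. SQLite is used only because it ships with Python; it is one of the lenient engines and accepts a non-aggregated column that is missing from the group by, so it will not reproduce the error described above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE people (id INTEGER PRIMARY KEY, gender TEXT, race TEXT, income REAL);
    INSERT INTO people (gender, race, income) VALUES
        ('f', 'a', 100.0), ('f', 'a', 300.0), ('m', 'a', 200.0), ('m', 'b', 50.0);
""")

# Every non-aggregated column in the select list is repeated in GROUP BY.
rows = conn.execute("""
    SELECT gender, race, SUM(income) / COUNT(id) AS mean_income
    FROM people
    GROUP BY gender, race
    ORDER BY gender, race
""").fetchall()
for row in rows:
    print(row)
```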

While there's silly redundancy in the select statement, the others have horrible and dangerous defaults, for instance this:

# delete from records;

doesn't raise an error or do nothing, which would be sensible (you didn't specify what to delete!) instead, it deletes *everything* in the table. It'd be like if 'rm's behavior without arguments was to delete every file in the current directory.
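
The behaviour is easy to confirm in a throwaway in-memory database. This sketch uses Python's sqlite3 module; the records table contents are made up for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, note TEXT)")
conn.executemany("INSERT INTO records (note) VALUES (?)", [("a",), ("b",), ("c",)])

before = conn.execute("SELECT COUNT(*) FROM records").fetchone()[0]

# No WHERE clause: every row matches, so every row is deleted.
conn.execute("DELETE FROM records")

after = conn.execute("SELECT COUNT(*) FROM records").fetchone()[0]
print(before, after)
```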

Then there's insert's annoying positional syntax, which is both tediously redundant *and* error prone. Plus the 'shortcut' way to do an insert ignores the fact that nearly every table has as its first column an auto-incrementing ID, forcing you to either go to the horrible long-form or use a non-standard workaround if you're lucky enough to be using a server that has one.

Anyway, while his tool is fairly neat I think it might have been more useful if he'd made something to coerce the data into sqlite. It'd have been less work and you don't get stuck with the shell argument escaping horror he demonstrates there.
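
A rough sketch of that "coerce it into sqlite" approach, using only Python's standard library. It is not how "show" or any other tool in this thread works; the regular expression targets Apache's Common Log Format, the sample lines are invented, and the table name logs and its column names are assumptions chosen to echo the queries elsewhere in the discussion.

```python
import re
import sqlite3

# Pattern for Apache's Common Log Format; real logs may need a looser regex.
LINE_RE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\d+|-)'
)

# Invented sample lines standing in for an access_log file.
SAMPLE = [
    '192.0.2.1 - - [23/Mar/2009:16:27:00 +0000] "GET / HTTP/1.1" 200 512',
    '192.0.2.2 - - [23/Mar/2009:16:28:00 +0000] "GET /a HTTP/1.1" 200 100',
    '192.0.2.1 - - [23/Mar/2009:16:29:00 +0000] "GET /b HTTP/1.1" 304 -',
]

def load(lines):
    """Parse log lines into an in-memory SQLite table named logs."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE logs (host TEXT, ts TEXT, request TEXT, status INTEGER, size INTEGER)"
    )
    for line in lines:
        m = LINE_RE.match(line)
        if m is None:
            continue  # skip lines that do not fit the pattern
        size = 0 if m["size"] == "-" else int(m["size"])  # "-" means no body sent
        conn.execute(
            "INSERT INTO logs VALUES (?, ?, ?, ?, ?)",
            (m["host"], m["ts"], m["request"], int(m["status"]), size),
        )
    return conn

conn = load(SAMPLE)
totals = conn.execute(
    "SELECT host, COUNT(*), SUM(size) FROM logs GROUP BY host ORDER BY host"
).fetchall()
print(totals)
```

Because the query is passed as an ordinary string, none of the shell quoting around count(*) is needed.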

Malcolm: SQL for the command line: "show"

Posted Mar 26, 2009 23:56 UTC (Thu) by zlynx (guest, #2285) [Link]

Insert's syntax may seem redundant for *simple* use. But if you do complicated things, it isn't.

For example, the list of values isn't limited to one thing. Values can just go on and on, inserting as many records as you like.

Or, you can put a SELECT statement there instead of VALUES.

Now what seemed redundant is necessary.
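
Both forms can be tried in a small sketch with Python's sqlite3 module. The records and archive tables are hypothetical, and multi-row VALUES needs a reasonably recent SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE records (id INTEGER PRIMARY KEY, name TEXT, qty INTEGER);
    CREATE TABLE archive (name TEXT, qty INTEGER);

    -- One statement, several rows; naming the columns lets id auto-assign.
    INSERT INTO records (name, qty) VALUES ('a', 1), ('b', 2), ('c', 3);

    -- A SELECT in place of VALUES copies the matching rows across.
    INSERT INTO archive (name, qty) SELECT name, qty FROM records WHERE qty >= 2;
""")

records = conn.execute("SELECT id, name, qty FROM records ORDER BY id").fetchall()
archive = conn.execute("SELECT name, qty FROM archive ORDER BY name").fetchall()
print(records)
print(archive)
```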

I won't argue about DELETE and UPDATE assuming all records without a WHERE being dangerous and stupid though. :-)

Malcolm: SQL for the command line: "show"

Posted Mar 27, 2009 20:09 UTC (Fri) by marcH (subscriber, #57642) [Link]

Hey, how consistent is this?
-          delete from TABLE where  ROWfilter
- select COLnames from TABLE where  ROWfilter
-          insert into TABLE values COLvalues
Every time I go back to SQL I have to google again for examples...

Higher level concerns: "Why SQL Sucks": http://perlmonks.org/?node_id=515776

Malcolm: SQL for the command line: "show"

Posted Mar 24, 2009 13:32 UTC (Tue) by pierre (guest, #663) [Link]

http://www.logparser.com/ is quite powerful.

Author responds

Posted Mar 26, 2009 0:08 UTC (Thu) by dave_malcolm (subscriber, #15013) [Link] (2 responses)

Fame at last!

Follow-up posting here: http://dmalcolm.livejournal.com/2009/03/24/ with info on git repo etc.

Thanks for the feedback. As some have noted, this isn't intended just for log files. Note that in the original post it already had "rpm" and "proc" backends, for querying the local rpm database and /proc respectively.

It can now parse many of the files in /etc, using the Augeas library (http://augeas.net)

I also just added tcpdump support, so you can look at e.g. a wireshark dump and run something like this:
$ show "count(*)", "total(length)", src_mac, dst_mac from test.pcap group by src_mac, dst_mac
(though this is merely a messy proof-of-concept hack at this stage)

Ideas for other backends most welcome.

Author responds

Posted Apr 1, 2009 18:53 UTC (Wed) by ortalo (guest, #4654) [Link] (1 responses)

idea/need: ntsyslog backend (for parsing Windows event logs archived via NTsyslog to a Unix machine).

But the actual reason for my comment was another suggestion. Have you considered implementing the same kind of backends inside a full fledged database? It seems to me at least PostgreSQL should offer enough extensibility to allow this. It could free you from dealing with the intricacies/limitations of an "SQL-like" parser and may open the door to more complex treatments (dunno if writing would be feasible).
Or maybe you would find this too heavyweight for your intended usage? (I routinely have to consider >30 GB of compressed log files so, even a full-fledged database engine does not seem overkill sometimes.)
Gonna look at your tool anyway. Thanks for the contribution.

Author responds

Posted Apr 1, 2009 22:38 UTC (Wed) by nix (subscriber, #2304) [Link]

What I've wondered about doing is hacking syslog() in libc and the syslog
protocol to pass the format string and arguments separately (as well as as
a formatted whole), so that syslog-ng can use its existing facilities to
dump the lot in a database. Then we can *really* do log analysis, with
variable and fixed parts spliced out. (The problem is the break of the
syslog protocol, though. I considered analyzing log messages to attempt to
retrospectively determine which parts are format string and which are
arguments, but that rapidly gets into a pattern-matching tarpit.)
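
The idea of keeping the fixed and variable parts apart can be illustrated at application level, without touching libc or the syslog protocol. The following is only a sketch under that assumption: the log() helper, the table layout, and the sample messages are all hypothetical, and SQLite stands in for whatever database syslog-ng would write to.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE log (template TEXT, args TEXT, message TEXT)")

def log(template, *args):
    """Store the format string, its arguments (as JSON), and the formatted text."""
    conn.execute(
        "INSERT INTO log VALUES (?, ?, ?)",
        (template, json.dumps(args), template % args),
    )

log("login failed for %s from %s", "alice", "192.0.2.7")
log("login failed for %s from %s", "bob", "192.0.2.7")
log("disk %s at %d%% capacity", "sda", 91)

# Grouping by the fixed part needs no pattern matching on the message text.
counts = conn.execute(
    "SELECT template, COUNT(*) FROM log GROUP BY template ORDER BY template"
).fetchall()
print(counts)
```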


Copyright © 2009, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds