From anti-systemd to pro-systemd in the shortest time
Posted Feb 1, 2014 12:04 UTC (Sat)
by khim (subscriber, #9252)
In reply to: From anti-systemd to pro-systemd in the shortest time by dlang
Parent article: This week in "As the Technical Committee Turns"
you are assuming that all logs represent errors.
No. I'm assuming messages sent to syslog represent events which affect the whole system: some process has started or died, someone performed a dangerous operation (started a suid binary, e.g.), etc. They are meant to be consumed by a human and thus should not overwhelm a human. If they can overwhelm journald, then a human has no chance. The rule of thumb is simple: if you fear that your message may disappear (because someone will crash your process, or you are in an inconsistent state, etc.) then you send it via syslog(3); if it's an informational message then syslog(3) is just the wrong interface.
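A minimal sketch of that rule of thumb, using Python's Unix-only `syslog` module as the wrapper around syslog(3). The `report` function, the `must_survive` flag and the `info_buffer` list are hypothetical names chosen for illustration; nothing here comes from the commenter's actual code.

```python
import syslog  # Unix-only standard-library wrapper around syslog(3)

# Hypothetical in-process buffer for high-volume informational records.
info_buffer = []


def report(message, must_survive, send=syslog.syslog):
    """Route one message according to the rule of thumb above."""
    if must_survive:
        # Hand the message to the system logger right away, so it has
        # already left the process if the process dies a moment later.
        send(syslog.LOG_CRIT, message)
    else:
        # Informational: keep it in-process for later batch handling.
        info_buffer.append(message)
```

Calling `report("state file inconsistent, aborting", must_survive=True)` goes straight to the system logger, while `report("cache warmed", must_survive=False)` only grows the in-process buffer. The `send` parameter exists only so the routing can be exercised without a running log daemon.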
We, too, are generating many megabytes of logs. Of course they are not sent via syslog(3)! They are first collected in-process (similar messages are coalesced; this is feasible because they are not just text but have some internal structure), then they are collected by a specialized process and sent away in batches over the network. This is a sane approach for an extremely high volume of logs that will be processed by automated scripts. Pushing all that via syslog would be just crazy.
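The in-process coalescing step might look roughly like the sketch below. It assumes that "similar" means "same event name and same field values" and that field values are hashable; the `Coalescer` class and its method names are invented for illustration and are not the system described in the comment.

```python
from collections import Counter


class Coalescer:
    """Collect structured records in-process and merge identical ones."""

    def __init__(self):
        self._counts = Counter()

    def log(self, event, **fields):
        # Records are structured, so "similar" can be defined precisely:
        # the same event name with the same field values.
        key = (event, tuple(sorted(fields.items())))
        self._counts[key] += 1

    def flush(self):
        """Return one batch of coalesced records and reset the state."""
        batch = [
            {"event": event, **dict(fields), "count": count}
            for (event, fields), count in self._counts.items()
        ]
        self._counts.clear()
        return batch
```

A collector process would call `flush()` periodically and ship the returned list in a single network write, so three identical `rpc_ok` records travel as one entry with `count` set to 3.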
XKCD comes to mind, really: when you noticed that rsyslog calls gettimeofday(), your first thought should have been not “why is it doing that?” but “why does no one else care about that?”.
Posted Feb 1, 2014 12:17 UTC (Sat)
by zdzichu (subscriber, #17118)
[Link] (13 responses)
Sounds like you are describing a job for the audit subsystem, not syslog.
Posted Feb 1, 2014 12:43 UTC (Sat)
by khim (subscriber, #9252)
[Link] (12 responses)
Posted Feb 1, 2014 17:13 UTC (Sat)
by dlang (guest, #313)
[Link] (11 responses)
syslog can be an extremely capable logging system, and logs sent to syslog are not the limited, intended-only-for-humans thing that you think they are.
you are thinking of syslog as it existed 15-20 years ago, not what Rsyslog, syslog-ng, nxlog, and logstash have all been doing for years.
I've run Rsyslog at gig-E wire speeds (~400,000 logs per second) and other people with faster networks have run it at over 1,000,000 logs per second. This will keep up with logging information about every RPC processed by your daemons.
And people do just that on a routine basis.
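As a back-of-the-envelope check on those figures, the sketch below computes the average message size implied by 400,000 logs per second on a gigabit link. It assumes the nominal 1 Gbit/s line rate and ignores Ethernet, IP and transport framing overhead, so the real per-message budget is somewhat smaller.

```python
LINK_BITS_PER_SECOND = 1_000_000_000  # nominal gigabit Ethernet (assumption)
LOGS_PER_SECOND = 400_000             # figure quoted in the comment above


def bytes_per_log(link_bps, logs_per_second):
    """Average bytes available per log message at full line rate."""
    return link_bps / 8 / logs_per_second


print(bytes_per_log(LINK_BITS_PER_SECOND, LOGS_PER_SECOND))  # 312.5
```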
Posted Feb 1, 2014 18:42 UTC (Sat)
by raven667 (subscriber, #5198)
[Link] (2 responses)
There is definitely a niche where new development was needed: to reliably capture data from the earliest part of boot, to record _all_ of the available metadata the kernel can provide, and to make the data at rest more easily searchable and tamper resistant. How much of this could work within the framework of the existing competing logging utilities, and how much needed new development, is a value judgement and a matter of opinion. While I think that the new journal implementation is fine for small systems, for large systems I'd rather extend the existing dominant log daemons to handle the cases that journald picks up, or have an early hand-off.
Syslog has clearly defined inputs and outputs and is more amenable to multiple implementations being drop in replacements for one another than some of the other parts of init and systemd.
Posted Feb 1, 2014 19:49 UTC (Sat)
by khim (subscriber, #9252)
[Link] (1 responses)
Syslog's “clearly defined inputs and outputs” include specialized marks for “ftp daemon” and “USENET news subsystem” but have no marks for an HTTP server. Nuff said. I can understand why one would want to plug a syslogd interface into these modern solutions (these legacy systems need to be supported, too), but to actually keep that joke of an interface around and build everything around it using a larger and larger pile of hacks… that's just sad.
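For readers unfamiliar with those "marks": each syslog message carries a priority value computed as facility × 8 + severity. The sketch below uses the numeric codes from RFC 5424 with the conventional `<syslog.h>` short names; the table is a subset and the `pri` helper is a hypothetical name used only for this illustration.

```python
# Facility codes (subset) with conventional <syslog.h> names and the
# numeric values listed in RFC 5424. There is no entry for HTTP.
FACILITIES = {
    "kern": 0, "user": 1, "mail": 2, "daemon": 3, "auth": 4,
    "syslog": 5, "lpr": 6, "news": 7, "uucp": 8, "cron": 9,
    "authpriv": 10, "ftp": 11,
    "local0": 16, "local1": 17, "local2": 18, "local3": 19,
    "local4": 20, "local5": 21, "local6": 22, "local7": 23,
}

SEVERITIES = {
    "emerg": 0, "alert": 1, "crit": 2, "err": 3,
    "warning": 4, "notice": 5, "info": 6, "debug": 7,
}


def pri(facility, severity):
    """Compute the PRI value that prefixes a syslog message."""
    return FACILITIES[facility] * 8 + SEVERITIES[severity]


print(pri("ftp", "info"))     # 94
print(pri("news", "err"))     # 59
print("http" in FACILITIES)   # False
```

In practice a web server has to borrow a generic facility such as `daemon` or one of `local0` through `local7`.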
Posted Feb 2, 2014 22:31 UTC (Sun)
by dlang (guest, #313)
[Link]
Posted Feb 1, 2014 19:41 UTC (Sat)
by khim (subscriber, #9252)
[Link] (7 responses)
True. I do understand logging at the Amazon/Bing/Google/Yandex level, though. I don't doubt that for a second. This is just an application of the first part of the third fundamental networking truth, though (with sufficient thrust, pigs fly just fine). In my experience “enterprise” guys tend to forget about the second part (However, this is not necessarily a good idea. It is hard to be sure where they are going to land, and it could be dangerous sitting under them as they fly overhead).
Where scalability is concerned, the best approach is Jeff Dean's rule: design for ~10X growth, but plan to rewrite before ~100X. The syslog design was not intended to handle such use cases, and while it's possible to stretch it to cover them, it's a really, really, REALLY bad idea.

Nope. I'm looking at all that activity in wonder and thinking: are they just stupid or terminally insane? Syslog was not designed for such use cases. It passes data around via sockets, and sockets are just not designed to handle such a use case. Also, if you are generating so much similar data (don't tell me all these 400,000 records are totally dissimilar, I'll not believe it for a second…) there are vast coalescing opportunities. And it's important to use them, because when you need to send such a massive amount of data around you want to send it in batches. Not just context switches are costly at these rates; transfers between different CPUs in today's SMP systems are costly, too!

Maybe. But how much overhead will it incur on a typical Google server with 4x10GbE (perhaps 4x100GbE by now, I'm not sure)? Is it wise to spend this much computing power on logging?
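The batching argument can be sketched as follows: instead of one send (and its context switches) per record, records are grouped into payloads bounded by size, and each payload is sent once. The `batch_records` function and the 64 KiB default are assumptions made for illustration, not anything described in this thread.

```python
def batch_records(records, max_bytes=64 * 1024):
    """Group encoded records into batches of roughly max_bytes each.

    A single record larger than max_bytes is emitted alone in its own
    batch rather than being split.
    """
    batch, size = [], 0
    for record in records:
        data = record.encode("utf-8")
        # Start a new batch when adding this record would exceed the limit.
        if batch and size + len(data) > max_bytes:
            yield batch
            batch, size = [], 0
        batch.append(data)
        size += len(data)
    if batch:
        yield batch
```

With 100-byte records and a 1,000-byte limit, 1,000 records turn into 100 sends instead of 1,000.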
Posted Feb 1, 2014 20:30 UTC (Sat)
by raven667 (subscriber, #5198)
[Link] (6 responses)
Posted Feb 1, 2014 21:26 UTC (Sat)
by khim (subscriber, #9252)
[Link] (5 responses)
This is part of the answer but it's not the whole answer. “Big data” firms quite often use “software from multiple vendors” too. And quite often it needs complex massaging to make sure it behaves as it should in the appropriate context. I never worked in the “enterprise” but I have quite a few friends and colleagues who did, and from their explanations the real problem is in-the-enterprise politics: you cannot just go and say “our use of X is wrong, we need to replace it with Y or Z”, because this will make the one who mandated X angry and may undermine their authority. If there is a way to keep X and still achieve the objectives, this is the way that'll be chosen. Quite often this means that more resources and more money will be spent in the end, but nobody will be forced to admit that s/he's wrong.

That's fine, they could do whatever they want, but it also means that it's their responsibility to unclog their mess: if they are using screwdrivers to hammer in nails and the new screwdriver is not usable in that role, then it's their responsibility to invent something to live with it. In this thread there were quite a few explanations of how they could continue to use rsyslogd with systemd (LD_PRELOAD, namespaces, etc). It's doable. Yes, it'll be ugly and perhaps somewhat fragile, but so what? The whole scheme was fragile to begin with.
Posted Feb 1, 2014 22:21 UTC (Sat)
by raven667 (subscriber, #5198)
[Link] (4 responses)
Posted Feb 2, 2014 10:48 UTC (Sun)
by khim (subscriber, #9252)
[Link] (3 responses)
There is no need to use one solution for everything. /dev/log and syslog(3) work fine for low-volume logging. They will work even if your program is almost completely broken, and this is a valuable property to have.

As for “we can not drop or change X because it's too expensive”: I've seen that, too, and more often than not it ends in disaster. If it's “too expensive” to change X now, then what gives you confidence that it'll be cheaper later? This is typical enterprise myopia, which puts this quarter's results above long-term survival. If you don't control something in your company, then the costs spent on that piece tend to grow until the whole house of cards collapses. It's true that sometimes it's cheaper to use off-the-shelf solutions because the things you are doing are not that important, but if you are starting to care about one vs four context switches then either this part is in your core competence (and thus must be under your control) or you are large enough to dedicate resources to scalable solutions even if it's not your core competence.
Posted Feb 2, 2014 16:47 UTC (Sun)
by raven667 (subscriber, #5198)
[Link]
Posted Feb 2, 2014 22:30 UTC (Sun)
by dlang (guest, #313)
[Link] (1 responses)
This includes being able to select between the different syslog implementations, and it also means being able to select between the huge number of tools that exist that deal with syslog messages today (including many that support very large volumes of logs)
No, this is not suitable for Google or Amazon levels of messages without consolidation, but it is suitable for just about any company below those levels.
Companies may choose to write customized logging mechanisms for their custom software, but every company (including Google and Amazon) runs a lot of software that they did not develop from scratch (think routers and switches for example), and so whatever system they use, it's going to have to support syslog anyway.
Posted Feb 3, 2014 0:18 UTC (Mon)
by khim (subscriber, #9252)
[Link]
Well, yeah, this is a good example, LOL. But the fact that Google actually has its own software on routers and switches is just a funny coincidence. No, that fact is not the main difference. The difference between Amazon/Bing/Google/Yandex and the “enterprise” lies not in the fact that “big data companies” deal with larger amounts of traffic but in priorities: Amazon/Bing/Google/Yandex know that all their solutions may become deficient in the future and thus have contingency plans which are rolled out long before the scalability limits of the existing architecture are reached. “Enterprises” tend to exploit whatever they have until they reach 1000% of the intended scalability level, where any minor change can collapse the whole house of cards, and then start running around like headless chickens when, inevitably, such a change is actually introduced. Think Danger. I'm not sure why: certainly outages which may ruin the whole company are as important for Blackberry or Verizon as they are for Google, so why such a big difference in attitude?
Well, kinda. Syslog is just the simplest and weakest audit subsystem in existence, but yes, it's an audit subsystem. If you try to store information about every RPC processed by all your daemons it'll choke, whether it's journald or rsyslogd.
This shows that you do not understand logging at the enterprise level.