
Poettering: The Biggest Myths

Posted Jan 30, 2013 15:36 UTC (Wed) by pspinler (subscriber, #2922)
Parent article: Poettering: The Biggest Myths

As an enterprise-level sysadmin, there are a couple of things in systemd that I think are pretty darn cool, and a couple of things I'm not looking forward to.

On the cool side, I rather like that systemd can reliably detect daemon termination and restart the daemon, as well as some of the cgroup-based daemon isolation stuff. This is quite welcome.
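
To give a flavor of it (a hypothetical daemon name and a unit I've sketched from the documentation, not something I've deployed):

    [Unit]
    Description=Example daemon (hypothetical)

    [Service]
    # the daemon runs in the foreground; systemd supervises it directly
    ExecStart=/usr/sbin/exampled --foreground
    # restart it any time it exits uncleanly or dies on a signal
    Restart=on-failure

    [Install]
    WantedBy=multi-user.target

Because systemd keeps all of a service's processes in the service's own cgroup, it can tell reliably when the daemon is really gone, which is what makes that Restart= line trustworthy.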

On the concern side, I worry about the extra time it's going to take to debug issues in large chunks of compiled code versus scripts, about the extra dependencies (e.g. on D-Bus), and I'm annoyed at having to learn yet one more subsystem.

The cool stuff is pretty self explanatory, but allow me to explain my concerns.

Extra debugging effort: I recently (in the last two weeks) had to debug an LDAP client performance issue related to overly large LDAP queries.

In one case, I found the bug in a set of Perl scripts used as service monitors for our Veritas clusters. I was able to find the offending code, modify it in place, and test the fix in about two hours.

In another case, the issue occurred in a bit of compiled code using LDAP (sudo, in this case). I had to download a large body of code, spend a good amount of time reading it, spin up a compiler / development environment, *then* make changes and test 'em. Note that I had one advantage: since sudo is a pure user-space program, I could test it without, e.g., system reboots, and its codebase is smaller than systemd's. Total time: a day and a half.

It'd be even slower for systemd, 'cause the codebase is notably larger, and it's significantly more tied into the system; I'm not sure it could be tested without rebooting.

Extra dependencies: as a good sysadmin, I try to follow the principle of running only the minimum stuff required on a server; for instance, my web servers have only 26 processes on 'em. More stuff running == more stuff to go wrong, and more attack surface, so in general I prefer to run as little as possible. From what I can see, systemd requires at least D-Bus, and maybe more besides.
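
For what it's worth, this sort of thing is easy to check for yourself; something like the following (the binary's path varies by distro, and this is a rough sketch, not gospel):

    # how much is running on this box?
    ps --no-headers -e | wc -l

    # what does the systemd binary itself link against?
    # (on builds of this era, libdbus shows up in the list)
    ldd /lib/systemd/systemd | grep -i dbus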

Yet one more subsystem: the thing here is simply that SysV init isn't going to go away. It's not like our many hundreds of customers running on RHEL 5 are going to magically migrate to RHEL 7 the moment it hits. So I'm going to have to continue to know SysV init, and at least a bit of Upstart for personal use, then add systemd knowledge on top of that.

For instance, when my boss tells me to install a new monitoring agent, and I write a Puppet recipe to install and configure the service, and it doesn't work, I now need double the domain knowledge I needed before (remember, we deploy heterogeneously across older SysV and newer systemd systems).
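
To make the doubled knowledge concrete, here's roughly what that recipe has to encode under the hood (the agent name is made up, and this is from memory, so treat it as a sketch):

    # RHEL 7 / systemd hosts
    systemctl enable monagent.service
    systemctl start monagent.service

    # RHEL 5 / SysV hosts: same intent, entirely different tooling
    chkconfig monagent on
    service monagent start

Same one-line request from the boss; two command sets, and two sets of failure modes to understand when it doesn't work.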

I know systemd is fun and does cool stuff, and I'll likely end up learning it and using it at home, if only because my home distro will use it; it's just that I have only so many hours in the day, and both my personal and professional time are valuable. In this case, I'm not yet convinced that the benefits of systemd outweigh its time and opportunity costs.

-- Pat


Poettering: The Biggest Myths

Posted Jan 30, 2013 17:12 UTC (Wed) by rgmoore (✭ supporter ✭, #75)

> On the concern side, I worry about the extra time it's going to take to debug issues in large chunks of compiled code versus scripts

This seems like a misplaced worry to me. Consider your example of the Perl scripts you debugged. You assumed, correctly, that any performance problems would be in the scripts rather than in the compiled code (i.e. Perl itself). That's a reasonable assumption, because your scripts were apparently put together in house, so they hadn't had much performance-debugging effort. In contrast, Perl is used by millions of people all over the world, and any critical performance issues in it would have been noticed long before you got there. The same thing is true of the shell and its standard utilities when you're writing scripts in sh: you assume the compiled code has been thoroughly debugged and had its performance problems worked out, so any residual problem must be in your script.

I would argue that systemd is going to be much closer to Perl, bash, etc. than to your hand-rolled monitoring scripts. Systemd is going to be doing performance-critical tasks on tens or hundreds of millions of machines, so any critical problems are going to be found and fixed quickly. The equivalents of your scripts will be the systemd unit files, which can be much simpler and easier to debug than traditional init scripts because they don't have to spell out a lot of implementation details.
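
As a sketch of what I mean (hypothetical daemon; the syntax is standard unit-file syntax), the whole unit file can be shorter than the pidfile-and-daemonize boilerplate at the top of a typical init script:

    [Service]
    # no pidfile handling, no daemonizing, no lock files:
    # systemd tracks the process itself
    ExecStart=/usr/sbin/mydaemon

And when something misbehaves, there's no control flow to trace: 'systemctl status mydaemon.service' reports whether it's running and how it last exited.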

Poettering: The Biggest Myths

Posted Jan 30, 2013 19:22 UTC (Wed) by pspinler (subscriber, #2922)

Well, actually, the Perl scripts I'm referring to were themselves vendor-written, and presumably in wide use (there's a pretty big customer base for HA clusters running Oracle, after all). Ergo, wide deployment is no guarantee of non-bugginess.

Also, your analogy isn't quite getting my point, which is: it's easier and faster to debug script code than compiled code, and easier still to debug small amounts of code than large amounts.

Systemd is both large and compiled, and thus harder to debug.

To your point: I'm skeptical that systemd will remain as bug-free as you imply. Perhaps after five or six years of bake-in; but now? Not likely. It's a large, complicated body of code, which has taken on and rewritten a lot of functionality (init, inetd, logging, scheduling, time zones, yadda yadda yadda).

That amount of functionality is going to be quite hard to get right in any short amount of time, no matter how good your software engineering is. In fact, to quote Mr. Poettering from earlier in this very discussion:

> Heck, we have so much more bad code in our stack, Upstart totally stands out in quality.

I'm going to refer to an interesting blog post on software engineering by Joel Spolsky. He's a Windows dev and not a big open-source guy, but I think he has some pretty good insights here:

http://www.joelonsoftware.com/articles/fog0000000069.html

Joel S. argues that rewriting from scratch is one of the worst software project choices that you can make. Why? Here's the crux of it:

> The idea that new code is better than old is patently absurd. Old code has been used. It has been tested. Lots of bugs have been found, and they've been fixed.

[...snip...]

> ...that two page function. Yes, I know, it's just a simple function to display a window, but it has grown little hairs and stuff on it and nobody knows why. Well, I'll tell you why: those are bug fixes.

[...snip...]

> When you throw away code and start from scratch, you are throwing away all that knowledge. All those collected bug fixes. Years of programming work.

There's more, but I think the above is telling. We've spent huge amounts of time making sure that the stuff we have works -- it'll take years more to make sure that the new stuff works (and here's the key point) _no matter how much better an architecture it has_.

So yeah, once systemd actually reaches that level of stability, it's unlikely I'll have to debug it in an enterprise environment. Right now, I'm skeptical that it's there yet.

-- Pat

Poettering: The Biggest Myths

Posted Jan 30, 2013 19:43 UTC (Wed) by raven667 (subscriber, #5198)

I think you are right to be skeptical, but I also think the systemd team is using appropriate software engineering standards, leading to decent initial code quality. While you say the code is complex, it is nowhere near as complicated as, say, a filesystem, so it will take less time to stabilize; it has been worked on for several years now and has shipped on several systems for at least one release cycle. It also represents a net reduction in code compared to the previous system (sysvinit, startup scripts, shell functions commonly sourced by scripts, ancillary tools used by scripts to daemonize and track pidfiles, etc.), which is another factor that should help the core of systemd stabilize quickly.

There will be bugs and problems in the future, but I don't expect them to be common or widespread, and I'd be surprised if they were in the core functionality, because the core has such a limited and well-defined scope (starting and killing processes).

Poettering: The Biggest Myths

Posted Jan 30, 2013 17:43 UTC (Wed) by raven667 (subscriber, #5198)

I too don't expect to be reading systemd's source code when troubleshooting problems; the logic lives in the unit files, and they should be much easier to debug than shell scripts. systemd has such a limited scope (it just needs to start processes in a configured environment) and such a well-defined interface to the kernel that I expect its core to be pretty stable. There may be more churn in the ancillary tools, but since they are separate processes, they should be fairly easy to debug by logging or by watching their inputs and outputs.
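
For example, something along these lines (illustrative commands, with logind standing in for any of the ancillary daemons):

    # watch what a helper daemon is actually doing
    strace -f -p "$(pidof systemd-logind)"

    # or, on systems with the journal, pull the logs for just that service
    journalctl -u systemd-logind.service

Each tool is its own process with its own logs, so you can poke at it in isolation instead of reading systemd's source.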

