The answers are there, you just don't want to admit they are right answers...

Posted May 27, 2010 10:47 UTC (Thu) by Cyberax (✭ supporter ✭, #52523)
In reply to: The answers are there, you just don't want to admit they are right answers... by khim
Parent article: The road forward for systemd

"If the application bombs out before something is started then the application should be fixed - it's as easy as that. It can connect to bind (then it'll be stopped till bind is started) or we can fix the resolver library to block the application till the bind is started (exactly the same thing, but more clear solution)."

Bombing out on unresolved hosts is a CORRECT behavior. Modifying the app so it can work with a certain init system is crazy.

So we still need to have explicit dependencies at least for SOME services.

"You either need the application, or you don't need it. If you need it then start it and PostgreSQL will be started, if you don't need it then don't start it."

And what if PostgreSQL is down because I'm upgrading it? Or maybe I don't want it to start for some reasons (like system recovery in progress, etc.)?

"What's so painful about it?"

Doing it correctly.

"Your webapp will be stopped before PostgreSQL if it keeps the connection to PostgreSQL active - as it should."

Why should it? I use a connection pool, but it doesn't keep inactive connections forever. So it might certainly be possible that the webapp doesn't have any active connections during shutdown.

"Look, Unix survived for 40 years as a research project - and a lot of components are connected via ducttape and chewing gum. This is crazy. If it's a production system then it should be built as a production system. systemd removes the chewing gum and ducttape - and there is no way to reattach them. This means components which don't fit without ducttape must be changed to fit better, that's all. If you don't like it - you can use the old system, if you want, it's free software, after all."

Yes. That's why I hate shell scripting and systemd. They work in _most_ cases and can fail miserably in some.

Upstart has an advantage of clean _explicit_ dependencies. They can be analyzed (with the help of systemd!) and fixed if required. But they are _explicit_.

It's all fixable...

Posted May 27, 2010 11:16 UTC (Thu) by khim (subscriber, #9252) [Link] (2 responses)

Bombing out on unresolved hosts is a CORRECT behavior. Modifying the app so it can work with a certain init system is crazy.

Well, the alternative is a fix in the resolver library, as I've pointed out before. And if the application needs fixing in some other cases, that can be done with some kind of "babysitter" wrapper.

So we still need to have explicit dependencies at least for SOME services.

What for? So far all examples were reiterations of the same scheme: "I need the explicit dependencies because I have an application which is broken" and the obvious answer is "well, fix the application"... It can often be done with some kind of wrapper so the application itself stays intact...

And what if PostgreSQL is down because I'm upgrading it? Or maybe I don't want it to start for some reasons (like system recovery in progress, etc.)?

Then systemd must be informed, and all applications which need PostgreSQL will wait.

Why should it? I use a connection pool, but it doesn't keep inactive connections forever. So it might certainly be possible that the webapp doesn't have any active connections during shutdown.

If the webapp drops all active connections to PostgreSQL while it has unsaved state then it's buggy and should be fixed. If the active connections are closed because there is no work to be done with them then it does not need PostgreSQL to shut down.

Yes. That's why I hate shell scripting and systemd. They work in _most_ cases and can fail miserably in some.

I have yet to see a problem where systemd fails because of its design and not because some other component is broken.

Upstart has an advantage of clean _explicit_ dependencies.

IMNSHO it's a disadvantage.

They can be analyzed (with the help of systemd!) and fixed if required. But they are _explicit_.

Yup. That's why they'll never be correct. Currently the dependency graph is partially stored in application code and partially in upstart configuration. This information is often stale and incorrect. With some application of time and resources the whole system works, but that only proves truth #3 from RFC1925. But is it a correct and good approach? Obviously not - duplication of information is almost always bad because copies grow apart over time. Sometimes it's needed for performance reasons, but I'm not seeing such a need with regard to systemd...

It's all fixable...

Posted May 27, 2010 12:03 UTC (Thu) by hppnq (guest, #14462) [Link] (1 responses)

But is it a correct and good approach? Obviously not - duplication of information is almost always bad because copies grow apart over time. Sometimes it's needed for performance reasons, but I'm not seeing such a need with regard to systemd...

But systemd should be able to handle things like socket options, network addresses, file permissions/ownership, and a few more. I don't see how this can be easily done without duplication of information.

(And how about processes that decide on, say, port numbers or domain socket names only after they have been started?)

It's all fixable...

Posted May 31, 2010 16:38 UTC (Mon) by mezcalero (subscriber, #45103) [Link]

Yes, you can configure quite a few socket settings in the systemd .socket files. As it turns out the amount of duplication is only minimal here, since in fact most services don't play any wild games with socket options, and many can be set after the first connection attempt.

The answers are there, you just don't want to admit they are right answers...

Posted May 27, 2010 14:47 UTC (Thu) by buchanmilne (guest, #42315) [Link] (15 responses)

Bombing out on unresolved hosts is a CORRECT behavior.

Depends on the app, and you haven't described it. Maybe it should be restarted when bind restarts, but then ... why does name resolution depend on a local DNS server? openvpn keeps retrying when it can't resolve, and that works just fine for me. When I have connectivity, it starts, when I don't it just retries every once in a while.
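For readers who want to see what "keeps retrying" might look like, here is a minimal sketch. It is not openvpn's actual implementation; the function name, the retry count, and the doubling delay are all assumptions made for illustration. The resolver and the sleep function are passed in so the loop can be tried without a network.

```python
import socket
import time


def resolve_with_retry(host, attempts=5, delay=1.0, max_delay=60.0,
                       resolver=socket.getaddrinfo, sleep=time.sleep):
    """Try to resolve `host`, waiting between failed attempts.

    Returns the resolver's result, or None if every attempt failed.
    `resolver` and `sleep` are injectable (hypothetical hooks) so the
    loop can be exercised without a network.
    """
    for attempt in range(attempts):
        try:
            return resolver(host, None)
        except socket.gaierror:
            if attempt == attempts - 1:
                break  # out of attempts, give up
            sleep(delay)
            # Back off so that a long outage does not mean constant wakeups.
            delay = min(delay * 2, max_delay)
    return None
```

The growing delay is one possible answer to the "how often should it poll" question: it trades reaction time for fewer wakeups, which is a compromise rather than a fix.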

Why should it? I use a connection pool, but it doesn't keep inactive connections forever. So it might certainly be possible that the webapp doesn't have any active connections during shutdown.

If there aren't any active connections, there is no harm in shutting Postgresql down at the same time you shut down the web app.

Upstart has an advantage of clean _explicit_ dependencies. They can be analyzed (with the help of systemd!) and fixed if required. But they are _explicit_.

Note that earlier dependency-based init systems also have explicit dependencies. For example, Mandriva has used prcsys (since about 2004), which uses LSB-headers for dependency information. Funnily enough, so does systemd. Quoting from the blog post:

"Note that we make use of the dependency information from the LSB init script headers, and translate those into native systemd dependencies. Side note: Upstart is unable to harvest and make use of that information."

Did you read the blog post?

The answers are there, you just don't want to admit they are right answers...

Posted May 27, 2010 15:10 UTC (Thu) by keybuk (guest, #18473) [Link]

The "unable" in that post is wrong. In fact, the code to have Upstart support LSB headers in init.d scripts as an additional source of jobs has already been contributed, and will be in the next major release.

The answers are there, you just don't want to admit they are right answers...

Posted May 27, 2010 16:09 UTC (Thu) by paulj (subscriber, #341) [Link] (11 responses)

why does name resolution depend on a local DNS server? openvpn keeps retrying when it can't resolve, and that works just fine for me. When I have connectivity, it starts, when I don't it just retries every once in a while.

Ugg.. Such polling behaviour is horrid. How frequently should apps poll? No matter how slow or fast apps poll, it will be too slow for someone and it will burn energy for everyone. If systemd leads to lots of apps being modified to poll for resources then: ye gods!

We surely have to get to a stage where these things are event-driven? The most common system for this seems to be DBus now. DBus already can auto-start services on demand. If we really must modify everything, why not just port to DBus? It seems a lot of apps will have to be modified to use an event posting service anyway, if they're ported to systemd, so why not skip the systemd step altogether? Just because these dependencies can be "magiced" away technically at the init level, does *not* mean they go away. They're still there, and so they're still going to have to be dealt with at some level. Which may mean that all that systemd accomplishes is a boatload of pointless churn, on the path toward a userspace that's integrated around an event-driven system service (e.g. DBus).

Also, I'm not looking forward to the release or 4 of instability we Fedora users no doubt will go through as all the corner cases get debugged out (systemd and updating apps).

The answers are there, you just don't want to admit they are right answers...

Posted May 31, 2010 16:49 UTC (Mon) by mezcalero (subscriber, #45103) [Link] (10 responses)

You're FUDding.

There is no polling involved, the normal glibc resolver times out after 30s or so, that is more than enough time to get things started. The problem you are discussing is made up.

Clients do *not* need to be patched. Servers do, but the work is minimal, and already finished for all daemons we start by default on F13.
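To give a rough idea of the server-side change, here is a sketch of how a daemon could pick up listening sockets handed over by the service manager. It assumes the LISTEN_PID / LISTEN_FDS environment convention used by sd_listen_fds(), where passed descriptors start at 3; the function name is made up for this example, and a real daemon would more likely call the C helper directly.

```python
import os

SD_LISTEN_FDS_START = 3  # first inherited descriptor under the sd_listen_fds convention


def inherited_listen_fds(environ=None, pid=None):
    """Return the descriptor numbers handed over by the service manager.

    Follows the LISTEN_PID / LISTEN_FDS environment convention: the
    variables only apply if LISTEN_PID names this process. Returns an
    empty list when nothing was passed, so the caller can fall back to
    creating its own socket as before.
    """
    environ = os.environ if environ is None else environ
    pid = os.getpid() if pid is None else pid
    try:
        if int(environ.get("LISTEN_PID", "")) != pid:
            return []
        count = int(environ.get("LISTEN_FDS", ""))
    except ValueError:
        # Variables missing or malformed: behave like a normal start.
        return []
    return list(range(SD_LISTEN_FDS_START, SD_LISTEN_FDS_START + max(count, 0)))
```

A daemon would then wrap each number with `socket.socket(fileno=fd)` instead of calling bind() and listen() itself, and keep its old code path for the case where the list is empty.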

I think you are a bit naive on the amount of coding and political work you'd need to do to make everybody use D-Bus instead of normal sockets. Also, let's not forget that there are quite a few services where D-Bus makes little sense. D-Bus is an RPC. Sockets are on a lower level. I mean, good luck arguing that glibc should now talk D-Bus when resolving DNS info.

The answers are there, you just don't want to admit they are right answers...

Posted May 31, 2010 22:40 UTC (Mon) by paulj (subscriber, #341) [Link] (9 responses)

I'm not FUDing. I was replying to a specific example about openvpn - not systemd. And then generalising from it to *polling* - not to name resolution! I said *IF* systemd leads to more polling in apps, that would be bad.

I understand systemd does allow for basic dependencies (the fulfillment state transition of which is obv. an event). However, the stated philosophy in systemd is to eschew encoding dependencies in config files and instead have the apps just "Do The Right Thing" (whatever that may be), as much as possible.

My question then is: Doesn't this philosophy mean that, in addition to systemd managing process lifetimes, you also need a higher-layer event-posting system above systemd to allow apps to stay informed of event fulfillment and transitions, and handle them? Events such as "network available", "name resolution service available", "System time may now be considered stable", etc.

Basically, systemd deliberately does not answer these questions, other than that it can act as a proxy for certain services by handling their fd, correct? If so, do you agree that there would need to be such a higher-level system? If so, why do you think it is worth having BOTH a fancy init AND a fancy higher-level event-handling system? Why not just do all the work in this fancier higher-level service, and migrate services to be started by this higher-level service, and stick with the dumb init?

You may say I'm FUDing, but none of my assertions in the comment you replied to were about systemd. Anything relating to systemd was in the form of questions (except perhaps my fears of instability), as with this comment, and all I'm interested in is having my questions/concerns addressed.

The answers are there, you just don't want to admit they are right answers...

Posted May 31, 2010 23:34 UTC (Mon) by nix (subscriber, #2304) [Link] (7 responses)

'network available' and 'name resolution service available' are handled by blocking the network socket of discourse until the service *is* available, by opening the socket but not accept()ing on it until the child is started. (It remains slightly unclear to me how you can fd-pass the socket fd to the child without hacking the child to accept a passed-in fd, which most unmodified daemons cannot accept -- in fact I'm not sure I've ever seen one which can.)
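The "open the socket but don't accept() yet" trick can be tried in a few lines. This is only a sketch that plays all three roles (init, client, late-starting service) in one process; the function name is invented for the demonstration and it says nothing about how systemd itself is implemented.

```python
import socket


def demo_queued_connection(payload=b"hello"):
    """Show that a client can connect and send before the server accept()s."""
    # Stand-in for the init system: bind and listen, but do not accept yet.
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the kernel pick a free port
    listener.listen(8)

    # The client connects while no service is running. The kernel completes
    # the handshake and parks the connection in the listen backlog.
    client = socket.create_connection(listener.getsockname(), timeout=5)
    client.sendall(payload)

    # Stand-in for the late-starting service: it takes over the listening
    # socket and only now accepts the waiting connection.
    conn, _ = listener.accept()
    conn.settimeout(5)
    data = b""
    while len(data) < len(payload):
        chunk = conn.recv(len(payload) - len(data))
        if not chunk:
            break
        data += chunk

    for s in (conn, client, listener):
        s.close()
    return data
```

The client never notices that nobody was home when it connected. What the sketch does not show is the hand-over of the listening descriptor to a separate process, which is the part that needs cooperation from the daemon.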

'System time may now be considered stable' I have no idea how you could handle. Lennart?

The answers are there, you just don't want to admit they are right answers...

Posted Jun 1, 2010 1:21 UTC (Tue) by paulj (subscriber, #341) [Link] (3 responses)

If systemd does indeed have some simple dependencies, then "system time may now be considered stable" == having a dependency on a script that runs ntpdate -s. I'm more concerned that there may be events that do not fall easily into "write to an fd, at the boundary of process lifetimes (either before or after)" model. Such events are not visible to systemd. If there's a significant amount of those, then you need something higher-level.

E.g. the printer example Lennart gives, and says "printer plugged in" can be depended on. But what if I want to depend on the type of printer? That's treading into udev below and DBus services above, depending on exactly what I want to do. Or "network available" - but what if I want to depend on a certain kind of network interface? Or a certain location (e.g. "start the automounter, for corporate NFS if ...")? Lots of the information you might use there is being maintained by NetworkManager (using DBus to publish that info).

Basically, if we add systemd to the mix, we're going to have udev, then systemd, then DBus + {various DBus services: NetworkManager, ModemManager, gdm, polkitd, bluez, etc}. Do we really need that extra layer of management? And many services modified for systemd would have to bind into a DBus(-like)? layer anyway, to handle in-lifetime events.

The answers are there, you just don't want to admit they are right answers...

Posted Jun 1, 2010 11:56 UTC (Tue) by mezcalero (subscriber, #45103) [Link] (2 responses)

If your event "system time may now be considered stable" shall be about NTP then it makes little sense since NTP clients tend to slowly adjust the time instead of making it jump. That means that the time is adjusted fully only after quite a bit of time. Applications should not wait for that. They should just assume that time is correct, right from the beginning. And that is a safe assumption for all machines built in the last 25y. And again, I don't see what systemd has to do with eventing like that anyway. I see no need to multiplex events through an eventing system. Get your events from the respective subsystems directly. Don't add an indirection layer here.

People should handle "in-lifetime" events (as you call them) with the native notification logic available. No need to involve systemd, or dbus or anything.

The answers are there, you just don't want to admit they are right answers...

Posted Jun 1, 2010 13:33 UTC (Tue) by dlang (guest, #313) [Link]

I don't know what datacenter you are working in, but I sure don't trust the system time in my datacenter until NTP has started, and at initial startup I don't have it gradually adjust the time as it may be so far off that NTP will decide that it will never get it right and shut down. Instead it does a one-time large jump to get the system time correct.

on some systems I use the -G option, on others I use ntpdate.

The answers are there, you just don't want to admit they are right answers...

Posted Jun 2, 2010 6:33 UTC (Wed) by paulj (subscriber, #341) [Link]

It's not a safe assumption unfortunately. There are distressingly many machines out there which deliberately are run without batteries (lower field maintenance) and which hence have the time set at boot. Note that "ntpdate -s" implies setting the time - not slowly adjusting it. Even if you discount this example, I have a strong sense that there are many other high-level events (e.g. the network and printer examples).

What exactly is the "native notification logic"? (Note that many events are application layer).

Also, I'm not saying systemd needs eventing logic. I'm asking whether it makes sense to try to solve these problems in an init process, external to applications. (for value of apps that includes those that would be started by it).

In short I'm asking whether it's actually user-space that needs fixing to cope with differences in and changes to environmental state? Because it seems that doing that correctly would allow a not-too-fancy init to fire off apps in parallel and not worry about dependencies, as you argue systemd should be able to do with good apps. It seems applications will have to be modified to do this anyway, to get best effect from systemd.

The answers are there, you just don't want to admit they are right answers...

Posted Jun 1, 2010 11:47 UTC (Tue) by mezcalero (subscriber, #45103) [Link] (2 responses)

I don't think that "System time may now be considered stable" is a valid, or relevant or necessary event.

The answers are there, you just don't want to admit they are right answers...

Posted Jun 2, 2010 7:38 UTC (Wed) by Darkmere (subscriber, #53695) [Link] (1 responses)

Case in point, several of our machines run without bios batteries (conscious decision) and are reset to "plate" state if they boot up. This means dates set at somewhere in 2002. Early bootup scripts check the modification time of /etc/init.d and set local time to the modification date of init.d + 15s. This is done whether or not we have a working clock, for consistency. At this point, if time is right, we break it, if it's wrong, we break it. All to enforce consistency and make sure we recover from the error.
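The first of those steps can be sketched roughly as follows. This is not the actual boot script, just an illustration of the calculation; the function name is hypothetical, and actually setting the clock (which needs root) is deliberately left out.

```python
import os


def fallback_clock_value(reference_path="/etc/init.d", offset_seconds=15):
    """Return the epoch time an early boot script of this kind would set.

    Mirrors the scheme described above: take the modification time of a
    directory that ships with the system image and add a fixed offset.
    Only computes the value; it does not touch the system clock.
    """
    return os.stat(reference_path).st_mtime + offset_seconds
```

The point of anchoring on a file that ships with the image is that every boot lands on the same known-wrong time, which is easier to recover from than an arbitrary one.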

After network is established, time is then fast-forwarded to "real" time via ntpdate.

This means you have 2 distinct time-jumps. The first one is to avoid the annoyingly bad fsck times when a filesystem is 480+ days out of fsck ( right.) the second one _has_ to run before dovecot, which will detect time warp and decide "Life is bad, hardware broken, we die now" and block. ( doesn't properly shut down, just stops working properly )

So, yes. A few services, mostly mailservices, and some other ones do not like it when time changes too much. ntp itself _requires_ ntpdate early on, or it will simply decide that the time is too much out of sync to even bother adjusting it. And with large timedrifts, ntp isn't fun anyhow.

So, You may consider those services that require proper timekeeping to be broken ( perhaps they are ) but they are common and have to be managed with. And it's easier to deal with that than to deal with other situations.

Annoying time based fsck

Posted Jun 9, 2010 12:37 UTC (Wed) by hackerb9 (guest, #21928) [Link]

This means you have 2 distinct time-jumps. The first one is to avoid the annoyingly bad fsck times when a filesystem is 480+ days out of fsck ( right.)

I realize I'm going far off-topic by not contributing to the init flame war and instead giving a small helpful hint. Please forgive me. Don't jump your clock just to avoid fsck. If you're going to skip regular fscks anyway, you can use tune2fs -i 0 /dev/sdaX to disable the time based fsck.

ObligatoryFlameContribution: "NO! If you had read my blog post you'd realize you are ALL wrong! Using Makefiles for RC dependency is the one true way!"

The answers are there, you just don't want to admit they are right answers...

Posted Jun 1, 2010 11:43 UTC (Tue) by mezcalero (subscriber, #45103) [Link]

I am not sure why you get the idea that you need to poll for anything if systemd is used. systemd itself does not do repetitive polling anywhere, although we now can watch quite a few different kinds of resources for you: devices, file systems, automounts, sockets, paths, timers, swap files and more.

I am not sure what you mean by "basic dependencies". systemd actually has a pretty elaborate dependency system, which distinguishes eight kinds of dependencies: Requires, RequiresOverridable, Requisite, RequisiteOverridable, Wants, Conflicts, Before, After. And you can use that to build dependencies between all ten kinds of units we have.
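As an illustration of why requirement and ordering dependencies are listed separately, here is a small sketch. It assumes that Requires only decides which units get pulled in, while After only decides the order among units that are being started anyway. The unit names and the function are invented for the example, this is not systemd's algorithm, and graphlib needs Python 3.9 or newer.

```python
from graphlib import TopologicalSorter


def start_order(requested, requires, after):
    """Return a start order for `requested` and everything it pulls in.

    `requires` maps a unit to units that must be started along with it
    (requirement only, no ordering). `after` maps a unit to units it must
    start after (ordering only, no requirement).
    """
    # Step 1: requirement dependencies decide WHICH units get started.
    pulled_in, stack = set(), [requested]
    while stack:
        unit = stack.pop()
        if unit not in pulled_in:
            pulled_in.add(unit)
            stack.extend(requires.get(unit, ()))

    # Step 2: ordering dependencies decide WHEN, but only among pulled-in units.
    graph = {
        unit: [dep for dep in after.get(unit, ()) if dep in pulled_in]
        for unit in pulled_in
    }
    return list(TopologicalSorter(graph).static_order())
```

With a hypothetical webapp.service that requires postgresql.service and cache.service but is only ordered after postgresql.service, the cache is free to start at any point, and a unit that is merely named in an After line is never started on that account.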

There doesn't need to be an event for "name resolution available", because in the socket-based activation scheme it is always available. And most services which need a live network, such as IPv6 discovery daemons, or Avahi or suchlike hook into netlink anyway to get notifications for this -- and rightly so. I see little need for another generalized eventing system. And I don't think that "System time may now be considered stable" is a valid event. On all machines built in the last 25 years or so the RTC should be "stable" from the beginning. I mean, it would be nice to have a notification system where the kernel informs us about a jumping time (i.e. on timezone changes), but that has nothing to do with the monotonic clock or systemd.

So, I fail to see why you'd want any generalized eventing system beyond the dependency system that systemd offers to you and the various notification systems the kernel already provides, such as inotify, netlink, udev, poll() on /proc/mounts, and similar. If apps want those events they should use those notification facilities natively, there is little need to involve systemd in that.

Does that answer your question? Because quite frankly, I am not sure I understood the question entirely...

The answers are there, you just don't want to admit they are right answers...

Posted May 27, 2010 16:32 UTC (Thu) by Cyberax (✭ supporter ✭, #52523) [Link]

"Depends on the app, and you haven't described it. Maybe it should be restarted when bind restarts, but then ... why does name resolution depend on a local DNS server? openvpn keeps retrying when it can't resolve, and that works just fine for me. When I have connectivity, it starts, when I don't it just retries every once in a while."

Because this app must work even without Internet connectivity, using only local authoritative BIND server to resolve local names.

"If there aren't any active connections, there is no harm in shutting Postgresql down at the same time you shut down the web app."

Aside from possible race conditions if app tries to connect to Postgres while Postgres is being shut down. It's especially possible if app needs to do some clean-up actions on shutdown (say, write a log entry to DB).

Unlikely? Yes. Possible? Certainly. And that's what I hate most in Unix systems.

The correct behavior in my opinion would be to explicitly mark certain dependencies as 'parallelizable' so they can be started simultaneously. You won't be able to automatically and reliably detect all dependencies, that's a fact of life.
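A sketch of what explicitly declared dependencies with parallel start could look like. The service names and the dependency table are made up, and this is neither Upstart's nor systemd's algorithm, only an illustration of the idea using graphlib (Python 3.9 or newer).

```python
from graphlib import TopologicalSorter


def start_waves(depends_on):
    """Group services into waves; everything in one wave may start in parallel.

    `depends_on` maps each service to the services that must be up first.
    """
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        waves.append(ready)
        sorter.done(*ready)
    return waves


def stop_waves(depends_on):
    """Shutdown runs the start waves backwards, so dependents stop first."""
    return list(reversed(start_waves(depends_on)))
```

Running the waves backwards at shutdown means the webapp gets to write its last log entry before Postgres goes away, which addresses the race described above, at the cost of having to declare the dependency by hand.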

"Note that earlier dependency-based init systems also have explicit dependencies. For example, Mandriva has used prcsys (since about 2004), which uses LSB-headers for dependency information. Funnily enough, so does systemd. Quoting from the blog post"

Yes. However, earlier init systems were rule based instead of event-based. That's a great advantage of upstart, actually. And another commenter noted that upstart can now use LSB headers.

"Did you read the blog post?"

Yes.

PS:I quite like PulseAudio precisely because it tried to cover _all_ use-cases, including piping audio over the network to a USB bluetooth handset. This project - not so much.

The answers are there, you just don't want to admit they are right answers...

Posted May 27, 2010 17:06 UTC (Thu) by Cyberax (✭ supporter ✭, #52523) [Link]

BTW, Scott James Remnant posted a great article about Upstart and systemd:

http://www.netsplit.com/2010/05/27/dependency-based-event...

It also nicely clarifies distinction between event-based and rule-based startup systems.

The answers are there, you just don't want to admit they are right answers...

Posted May 31, 2010 16:34 UTC (Mon) by mezcalero (subscriber, #45103) [Link]

You don't have to patch any client. Clients should just connect to the service they want to use, systemd cares about the rest. End of story.

You are discussing a problem that doesn't exist.