Please don’t hard-code assumptions

Posted Jan 29, 2019 3:51 UTC (Tue) by akkornel (subscriber, #75292)
Parent article: Systemd as tragedy

I agree agree that a lot of the stuff systemd is good, but at the same time, systemd makes assumptions that I don’t think it should.

Case in point: I deal with several clusters that use Stanford Central LDAP for account info, and our UID numbers have gotten pretty high. So much so that it overlaps the range systemd uses for dynamic service UIDs.

The biggest annoyance is that the UID range is hard-coded, so I can’t provide an alternate range. Having something upstream take a block of UIDs is not new; Debian’s policy has several such ranges. But in those cases, I can simply grab the package source and rebuild it with a different configured UID. In fact, we do exactly that for one package (specifying a different UID to the install script). But with systemd, the range is buried in the code, and the change is that much more dangerous.

More details: https://github.com/systemd/systemd/issues/9843 (but please don’t spam the GitHub issue!)

Please don’t hard-code assumptions

Posted Jan 29, 2019 5:55 UTC (Tue) by mbiebl (subscriber, #41876) [Link] (6 responses)

meson_option.txt has this:

option('system-uid-max', type : 'integer', value : '-1',
option('system-gid-max', type : 'integer', value : '-1',
option('dynamic-uid-min', type : 'integer', value : 0x0000EF00,
option('dynamic-uid-max', type : 'integer', value : 0x0000FFEF,
option('container-uid-base-min', type : 'integer', value : 0x00080000,
option('container-uid-base-max', type : 'integer', value : 0x6FFF0000,

It should be possible to recompile systemd adjusting the values to your needs. Or am I missing something obvious?

Please don’t hard-code assumptions

Posted Jan 29, 2019 6:18 UTC (Tue) by akkornel (subscriber, #75292) [Link] (5 responses)

There are two issues that I have with the "Just recompile" response. One is that Lennart does not suggest doing so, as per https://github.com/systemd/systemd/issues/9843#issuecomme...

The other reason is, systemd is such a critical component, that I don't want to mess with the distro's packaging process.

That is one thing worth noting: I'm not talking about an individual system, where I might be willing to do `make ; make install` (or the equivalent). I need to be able to take this change and package it up for at least Debian, Ubuntu 18.04, and maybe RHEL 8 if it includes this. I need to do that because we have a substantial number of systems running those distributions of Linux.

So, that is three build infrastructures that I would have to maintain for this, along with an additional GPG key that I'll have to push out during system commissioning (because I'd want to have signed packages/repositories). I will also have to keep it in sync with each distributor's patches to systemd, and it also means that if there is a security update to systemd, I will have to rush to pull in the change and rebuild everything.

That is alot of infrastructure to maintain, and what annoys me is, I would only have to do this because the UID range assumption was hard-coded. If, for example, it was possible to specify a UID range via kernel command-line, then that would be perfectly fine with me (and would actually work really well for our diskless systems).

Please don’t hard-code assumptions

Posted Jan 29, 2019 9:28 UTC (Tue) by jani (subscriber, #74547) [Link] (1 responses)

> If, for example, it was possible to specify a UID range via kernel command-line

N.b. it's the *kernel* command-line. Not userspace command-line.

Please don’t hard-code assumptions

Posted Jan 29, 2019 15:59 UTC (Tue) by akkornel (subscriber, #75292) [Link]

Yes, but systemd already parses the kernel command-line for systemd-specific parameters. See https://www.freedesktop.org/software/systemd/man/kernel-c...

Please don’t hard-code assumptions

Posted Jan 29, 2019 16:22 UTC (Tue) by mbiebl (subscriber, #41876) [Link] (1 responses)

I was mainly commenting since you mentioned that you have no problem in rebuilding packages from source but in case of systemd you found it was somehow hard-coded deep inside the code which made this unnecessarily hard/impossible.
So I wondered if you weren't aware of those config switches which let you easily change those settings.

Please don’t hard-code assumptions

Posted Feb 5, 2019 0:15 UTC (Tue) by jccleaver (guest, #127418) [Link]

> I was mainly commenting since you mentioned that you have no problem in rebuilding packages from source but in case of systemd you found it was somehow hard-coded deep inside the code which made this unnecessarily hard/impossible.

I've regularly version-bumped upstream RPMs or added in a custom patch (if necessary) for sites I've been at, but systemd is far too complex and too central to risk something like that, which really honestly does help prove the larger point that systemd's project design is flawed.

I can't imagine there's anything I'd need to patch traditional init, or upstart on RHEL6, for, but I'd feel comfortable after a bit of testing releasing it if I had to; there's simply not very much it does, the complexity is mostly in the things it launches, like /etc/rc.sysinit. No way I'd risk that with systemd.

Please don’t hard-code assumptions

Posted Jan 31, 2019 17:10 UTC (Thu) by ermo (subscriber, #86690) [Link]

Your use case sounds legitimate, so I'm wondering if not the best thing would be to simply ask the systemd developers whether doing as you suggest would be feasible/possible?

You could even re-use the verbiage from the comments you just made?

Please don’t hard-code assumptions

Posted Jan 29, 2019 6:05 UTC (Tue) by luya (subscriber, #50741) [Link] (5 responses)

From an educated observation, why not take a look about the functionality of Stanford Central LDAP ? Ask noticed on one of the comment, the hardcoded UID are from the 90s era and making one of systemd configuration may not be a good idea in the modern technology.

Please don’t hard-code assumptions

Posted Jan 29, 2019 6:36 UTC (Tue) by akkornel (subscriber, #75292) [Link] (4 responses)

Even in today's modern world, there are still places where you want a single view of UID numbers.

If you have a shared computing environment, then you want all of your systems to have identical copies of account information. Our largest computing environment is Sherlock, which has at least a thousand compute nodes, and multiple thousands of users. So we need to be sure that each user has the same UID across all systems.

Of course, you could just maintain your own LDAP server. But that is extra infrastructure, and not everybody is an OpenLDAP expert. So, the Central LDAP service is run on good hardware, with good people who know OpenLDAP, with support from the company that has the people who develop OpenLDAP. And so, that is where the UID number is allocated.

It is worth noting, there are many environments where NFS version 3 is still in use. For NFS version 3, the protocol only works with UID and GID, so both client and server must be working off of the same list of accounts. Yes, I know NFS version 4 does use names instead of IDs, and many of our compute environments are running version 4, but it has not been smooth. We have several compute environments that use scalable NFS storage from a very large, well-known company. We keep up to date with the storage server software. But when we switched to NFS 4.0, we encountered many major bugs (at least one bug was completely new), and got to the point where faculty were very unhappy.

It is also worth noting, reusing UIDs can be very dangerous. Even if your NFS server is running NFS 4, and sending names over the connection, the files are still being stored on the disk with a UID. So, if that UID eventually becomes linked to a different person, then that person might have access to old files that they should not be able to see.

I should also note, everything I said does not mean that we are ignoring cloud. In fact, on Wednesday two of my co-workers will be giving a session on what we are doing in cloud. But traditional, multi-user environments are still going strong, because that is still the most cost-effective option for most workloads. And that requires a single, authoritative source for UIDs.

Please don’t hard-code assumptions

Posted Jan 29, 2019 7:29 UTC (Tue) by drag (guest, #31333) [Link] (1 responses)

Your feature request sounds very reasonable.

Most people's experience with Linux is just using it as server platform and for that case people have long since moved away from using unix account management for end users for large scale internet stuff. Providing Unix workstation and shell accounts to people on such a large scale is very unusual. It's impressive that it works as well as it sounds like it does.

Cloud computing can probably help some what by reducing costs. Even though the per-resource expensive may be higher the convenience it offers to end users ends up saving money; when it works out. Traditionally for most enterprise environments getting access to compute resources involves a lengthy ritual involving getting the attention and approval of some sysadmin somewhere, typically via some ticketing system. It's a painful and time consuming process were you are forced to justify your need in front of some jaded sysadmin looking for some excuse to shoot you down or change how you want to do things. People end up hanging on to resources once they get them because of the effort required to obtain them in the first place. Were as when a cloud is done right users are provided a budget they can use to spin up resources in mere moments. When people can spin up dozens of instances in seconds using whatever tool they prefer then it's no longer a big deal to release those resources when you are finished with them. Especially when it's their own budget.

Obviously, though, this isn't a solution to the UID issue if applications and environment dictates shared access to Unix systems.

Please don’t hard-code assumptions

Posted Jan 29, 2019 9:32 UTC (Tue) by nilsmeyer (guest, #122604) [Link]

> Most people's experience with Linux is just using it as server platform and for that case people have long since moved away from using unix account management for end users for large scale internet stuff. Providing Unix workstation and shell accounts to people on such a large scale is very unusual. It's impressive that it works as well as it sounds like it does.

Indeed. Most environments I worked in used static accounts, typically deployed using something like Ansible or Puppet. This of course has other issues.

> Cloud computing can probably help some what by reducing costs. Even though the per-resource expensive may be higher the convenience it offers to end users ends up saving money; when it works out.

I've often seen that the costs are a lot higher than projected, especially if you have requirements for spare capacity (HA) and your application doesn't scale well horizontally. You do have a very good point with the time saving for users, it's very easy to overlook that factor.

> Traditionally for most enterprise environments getting access to compute resources involves a lengthy ritual involving getting the attention and approval of some sysadmin somewhere, typically via some ticketing system. It's a painful and time consuming process were you are forced to justify your need in front of some jaded sysadmin looking for some excuse to shoot you down or change how you want to do things.

Not only a sysadmin but also often someone who is actually allowed to spend money, even if it's not "real money" in a sense that the hardware is already paid for. I would say though that it may often be advisable to fix the sysadmins or remove them from the process. This BOFH obstructionist attitude that some people bring really isn't helping things - of course that's usually an issue with overall corporate culture.

> Were as when a cloud is done right users are provided a budget they can use to spin up resources in mere moments. When people can spin up dozens of instances in seconds using whatever tool they prefer then it's no longer a big deal to release those resources when you are finished with them. Especially when it's their own budget.

I agree but the caveat "done right" of course applies, and this is where it often gets hairy since some organizations don't like to spend resources on better tooling. Then you end up with a lot of unused capacity dangling around, budgets being depleted through carelessness and mistakes or things end up breaking when someone pulls the plug once the budget is spent.

Please don’t hard-code assumptions

Posted Jan 29, 2019 22:33 UTC (Tue) by bfields (subscriber, #19510) [Link] (1 responses)

"version 4 does use names instead of IDs"

At the NFS layer, yes, at the RPC layer, no. NFSv4 can use strings when getting and setting attributes like file owners or groups or ACLs. At the RPC layer (which is what identifies who is performing a given operation) it still uses numeric IDs, unless you're using kerberos. The NFSv4 design pretty much assumed everyone would want kerberos.

You may already know that, and it's a bit of a digression, apologies. But it causes frequent confusion.

Please don’t hard-code assumptions

Posted Jan 31, 2019 17:40 UTC (Thu) by akkornel (subscriber, #75292) [Link]

Ah, I did not know that! Thank you very much for the clarification; I will have to remember that.

Please don’t hard-code assumptions

Posted Jan 29, 2019 23:26 UTC (Tue) by intgr (subscriber, #39733) [Link] (3 responses)

> I deal with several clusters that use Stanford Central LDAP for account info, and our UID numbers have gotten pretty high. So much so that it overlaps the range systemd uses for dynamic service UIDs.
> The biggest annoyance is that the UID range is hard-coded

To me, static non-configurable UID ranges in this case actually sounds like a feature: that way you can adapt your own UID allocation to reliably avoid that range. Although sure, it's probably a major inconvenience to do the migration.

It seems the situation would be a worse if the dynamic UID range could vary from machine to machine, that way the collisions would be more surprising and harder to avoid.

Please don’t hard-code assumptions

Posted Jan 31, 2019 9:56 UTC (Thu) by jschrod (subscriber, #1646) [Link] (2 responses)

> that way you can adapt your own UID allocation to reliably avoid that range.

That is my major issue with systemd: That it postulates that everybody else shall adapt to its conventions, w/o configurability, even if there other conventions exist for a very long time.

Another sign of this mindset is the refusal to add proper environment variable support to service unit files, enabling the relocation of directory trees according to local setup standards. (E.g., not wanting to place your PostgreSQL database files in /var/lib/pgsql, or managing multiple installation of services.)

Please don’t hard-code assumptions

Posted Jan 31, 2019 17:49 UTC (Thu) by akkornel (subscriber, #75292) [Link]

> > that way you can adapt your own UID allocation to reliably avoid that range.

>That is my major issue with systemd: That it postulates that everybody else shall adapt to its conventions, w/o configurability, even if there other conventions exist for a very long time.

This is also pretty difficult here. In our case, Central LDAP is used by groups throughout the University, and getting a complete list of people who use the UIDs is difficult, because the UID attribute is one that is available without needing to authenticate.

So, if we wanted to change someone's UID, we would have to perform a large outreach campaign, which still wouldn't catch everyone. Then, on the day of the change, everyone involved would trigger a big series of `find -user … -exec chown new_uid {} \;` commands, across all of their file storage. Oh, and you'd have to ensure that the user isn't logged in _anywhere_ while the change is done.

Now, in a normal organization, there would be a manager somewhere up the hierarchy who is in charge, and who would tell people "you have to do this, on this schedule". Or, you will have a corporate IT department who sets the policies. Here, the common manager is the University President. Forcing a move like this would burn a fair amount of 'political capital'.

Plus, and this may sound petty, but we were there first. What I mean is, we allocated those UIDs before systemd picked that range to use.

So I hope you can understand now why "adapt[ing] your own UID allocation" is pretty difficult to do here.

Please don’t hard-code assumptions

Posted Jan 31, 2019 22:32 UTC (Thu) by rahulsundaram (subscriber, #21946) [Link]

> Another sign of this mindset is the refusal to add proper environment variable support to service unit files

Not sure what you mean? There are multiple ways to expose environment variables to services. The most straightforward one being

https://coreos.com/os/docs/latest/using-environment-varia...

What more is required here?

Please don’t hard-code assumptions

Posted Jan 30, 2019 4:41 UTC (Wed) by filbranden (guest, #87848) [Link] (1 responses)

To be fair, there is a workaround for if you have uids in that uid range (flocking files named after the uids, which granted doesn't really scale that well and requires you to keep a locking process around for it.)

Also, most of the developers there agreed that there should be some dynamic configs for your particular case. (They haven't been implemented yet, but I find there's agreement that there should be some.) So I imagine your problem will get solved eventually.

The uid range clash is mainly for systemd Dynamic Users and that feature is not really being used in the wild yet (there are some more large issues with it, such as D-Bus handling of the dynamic owner user/group of the service.) So just not using any Dynamic Users for now is definitely an option for you too.

In short, this should be addressed in time.

Please don’t hard-code assumptions

Posted Jan 31, 2019 17:58 UTC (Thu) by akkornel (subscriber, #75292) [Link]

> To be fair, there is a workaround for if you have uids in that uid range (flocking files named after the uids, which granted doesn't really scale that well and requires you to keep a locking process around for it.)

Yeah, from what I could tell, I would need to have one or more processes that would maintain `flock()` 2,498 files. That seemed like a bit of a stretch.

> Also, most of the developers there agreed that there should be some dynamic configs for your particular case. (They haven't been implemented yet, but I find there's agreement that there should be some.) So I imagine your problem will get solved eventually.

Ah, thank for for the insight! I wasn't really sure because the GitHub issue is still tagged 'needs-discussion'. It hasn't yet been tagged anything else (like 'rfe'), so my impression was that there was not yet a consensus (or agreement), and that my RFE was still at risk for being closed.

> The uid range clash is mainly for systemd Dynamic Users and that feature is not really being used in the wild yet (there are some more large issues with it, such as D-Bus handling of the dynamic owner user/group of the service.) So just not using any Dynamic Users for now is definitely an option for you too.

Indeed. My biggest concern is that, the longer this takes, the harder it will be to get any change brought back into the distros we use.