Please don’t hard-code assumptions
Posted Jan 29, 2019 6:36 UTC (Tue) by akkornel (subscriber, #75292)
In reply to: Please don’t hard-code assumptions by luya
Parent article: Systemd as tragedy
If you have a shared computing environment, then you want all of your systems to have identical copies of account information. Our largest computing environment is Sherlock, which has at least a thousand compute nodes and multiple thousands of users. So we need to be sure that each user has the same UID across all systems.
Of course, you could just maintain your own LDAP server. But that is extra infrastructure, and not everybody is an OpenLDAP expert. So, our Central LDAP service is run on good hardware, by good people who know OpenLDAP, with support from the company that employs the people who develop OpenLDAP. And so, that is where the UID number is allocated.
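As an illustration (not our actual tooling), here is a minimal Python sketch of the kind of consistency check this setup makes possible: each node resolves a username through the normal passwd lookup (backed by the central directory via NSS) and compares the result against the UID the directory allocated. The usernames and UID numbers below are hypothetical.

```python
# Minimal sketch: verify that local passwd lookups agree with the UIDs
# allocated centrally.  expected_uids is hypothetical; in practice it
# would be exported from the central LDAP service.
import pwd

expected_uids = {
    "alice": 21504,   # hypothetical centrally allocated UID
    "bob":   21777,
}

def check_uid_consistency(expected):
    """Report users whose local UID differs from the central allocation."""
    mismatches = {}
    for name, central_uid in expected.items():
        try:
            local_uid = pwd.getpwnam(name).pw_uid
        except KeyError:
            mismatches[name] = None          # account missing on this node
            continue
        if local_uid != central_uid:
            mismatches[name] = local_uid     # account exists, wrong UID
    return mismatches

if __name__ == "__main__":
    for name, uid in check_uid_consistency(expected_uids).items():
        print(f"{name}: expected {expected_uids[name]}, got {uid}")
```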
It is worth noting that there are many environments where NFS version 3 is still in use. The NFSv3 protocol only works with UIDs and GIDs, so both client and server must be working from the same list of accounts. Yes, I know NFS version 4 uses names instead of IDs, and many of our compute environments are running version 4, but the transition has not been smooth. We have several compute environments that use scalable NFS storage from a very large, well-known company. We keep up to date with the storage server software. But when we switched to NFS 4.0, we encountered many major bugs (at least one bug was completely new), and got to the point where faculty were very unhappy.
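The NFSv3 point can be made concrete with a small sketch: the server only ever hands the client a numeric owner, and the name you see is whatever the client's own account database says that number means. The mount path below is hypothetical.

```python
# Sketch of why NFSv3 needs identical account databases: the file's owner
# arrives as a bare number, and the name shown depends entirely on the
# local passwd data of whichever client you happen to be on.
import os
import pwd

def describe_owner(path):
    uid = os.stat(path).st_uid            # numeric UID as stored/served
    try:
        name = pwd.getpwuid(uid).pw_name  # purely local lookup
    except KeyError:
        name = "<no local account>"       # UID unknown on this client
    return uid, name

uid, name = describe_owner("/nfs/project/data.bin")   # hypothetical path
print(f"uid {uid} appears here as {name!r}; a client with a different "
      f"passwd database may show someone else entirely")
```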
It is also worth noting that reusing UIDs can be very dangerous. Even if your NFS server is running NFSv4 and sending names over the connection, the files are still being stored on disk with a UID. So, if that UID eventually becomes linked to a different person, then that person might have access to old files that they should not be able to see.
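A hedged sketch of the kind of audit this implies: before a UID number is ever handed to a new person, scan for files still owned by UIDs that no longer map to any account. The export path is hypothetical, and a real scan of a large NFS filesystem would need to be far more careful than this.

```python
# Sketch: find "orphaned" files whose owning UID has no current passwd
# entry.  If that UID were reused, the new owner would inherit access to
# all of these files.
import os
import pwd

def orphaned_files(root):
    """Yield (path, uid) for files whose owner has no passwd entry."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for fname in filenames:
            path = os.path.join(dirpath, fname)
            try:
                uid = os.lstat(path).st_uid
            except OSError:
                continue                    # vanished or unreadable
            try:
                pwd.getpwuid(uid)           # does any account own this UID?
            except KeyError:
                yield path, uid

for path, uid in orphaned_files("/nfs/project"):    # hypothetical export
    print(f"{path} still owned by retired uid {uid}")
```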
I should also note that none of this means we are ignoring cloud. In fact, on Wednesday two of my co-workers will be giving a session on what we are doing in the cloud. But traditional, multi-user environments are still going strong, because they are still the most cost-effective option for most workloads. And that requires a single, authoritative source for UIDs.
Posted Jan 29, 2019 7:29 UTC (Tue) by drag (guest, #31333)
Most people's experience with Linux is just using it as a server platform, and for that case people have long since moved away from using Unix account management for end users of large-scale internet services. Providing Unix workstation and shell accounts to people on such a large scale is very unusual. It's impressive that it works as well as it sounds like it does.
Cloud computing can probably help somewhat by reducing costs. Even though the per-resource expense may be higher, the convenience it offers to end users ends up saving money, when it works out. Traditionally, for most enterprise environments, getting access to compute resources involves a lengthy ritual of getting the attention and approval of some sysadmin somewhere, typically via some ticketing system. It's a painful and time-consuming process where you are forced to justify your need in front of some jaded sysadmin looking for some excuse to shoot you down or change how you want to do things. People end up hanging on to resources once they get them because of the effort required to obtain them in the first place. Whereas when a cloud is done right, users are provided a budget they can use to spin up resources in mere moments. When people can spin up dozens of instances in seconds using whatever tool they prefer, then it's no longer a big deal to release those resources when you are finished with them. Especially when it's their own budget.
Obviously, though, this isn't a solution to the UID issue if applications and environments dictate shared access to Unix systems.
Posted Jan 29, 2019 9:32 UTC (Tue) by nilsmeyer (guest, #122604)
Indeed. Most environments I worked in used static accounts, typically deployed using something like Ansible or Puppet. This of course has other issues.
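As a rough sketch of what such a config-management "user" resource boils down to (illustrative only; real deployments would use Ansible's user module or Puppet's user type, and the account names and UIDs here are made up): ensure the account exists with a pinned UID, and refuse to continue if it already exists with a different one.

```python
# Sketch of idempotent static-account deployment with pinned UIDs,
# roughly what a config-management "user" resource does under the hood.
import pwd
import subprocess

STATIC_ACCOUNTS = {"deploy": 1500, "metrics": 1501}   # hypothetical

def ensure_account(name, uid):
    try:
        existing = pwd.getpwnam(name)
    except KeyError:
        # Account absent: create it with the agreed-upon UID.
        subprocess.run(["useradd", "--uid", str(uid), name], check=True)
        return "created"
    if existing.pw_uid != uid:
        # Drift: the account exists but with a different UID.
        raise RuntimeError(f"{name} exists with uid {existing.pw_uid}, "
                           f"expected {uid}")
    return "ok"

for name, uid in STATIC_ACCOUNTS.items():
    print(name, ensure_account(name, uid))
```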
> Cloud computing can probably help somewhat by reducing costs. Even though the per-resource expense may be higher, the convenience it offers to end users ends up saving money, when it works out.
I've often seen that the costs are a lot higher than projected, especially if you have requirements for spare capacity (HA) and your application doesn't scale well horizontally. You do have a very good point about the time savings for users; it's very easy to overlook that factor.
> Traditionally, for most enterprise environments, getting access to compute resources involves a lengthy ritual of getting the attention and approval of some sysadmin somewhere, typically via some ticketing system. It's a painful and time-consuming process where you are forced to justify your need in front of some jaded sysadmin looking for some excuse to shoot you down or change how you want to do things.
Not only a sysadmin but also often someone who is actually allowed to spend money, even if it's not "real money" in the sense that the hardware is already paid for. I would say, though, that it may often be advisable to fix the sysadmins or remove them from the process. The BOFH obstructionist attitude that some people bring really isn't helping things - of course that's usually an issue with overall corporate culture.
> Whereas when a cloud is done right, users are provided a budget they can use to spin up resources in mere moments. When people can spin up dozens of instances in seconds using whatever tool they prefer, then it's no longer a big deal to release those resources when you are finished with them. Especially when it's their own budget.
I agree, but the caveat "done right" of course applies, and this is where it often gets hairy, since some organizations don't like to spend resources on better tooling. Then you end up with a lot of unused capacity dangling around, budgets being depleted through carelessness and mistakes, or things breaking when someone pulls the plug once the budget is spent.
Posted Jan 29, 2019 22:33 UTC (Tue) by bfields (subscriber, #19510)
At the NFS layer, yes; at the RPC layer, no. NFSv4 can use strings when getting and setting attributes like file owners, groups, or ACLs. At the RPC layer (which is what identifies who is performing a given operation) it still uses numeric IDs, unless you're using Kerberos. The NFSv4 design pretty much assumed everyone would want Kerberos.
You may already know that, and it's a bit of a digression, apologies. But it causes frequent confusion.
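To make the two layers concrete, here is a purely conceptual sketch (not the kernel's actual idmapper): NFSv4 owner/group attributes travel as name@domain strings that each end maps to local IDs, while an AUTH_SYS RPC credential still identifies the caller by raw numeric uid/gid. The idmap domain below is hypothetical.

```python
# Conceptual sketch of the split: attribute-layer name mapping vs.
# RPC-layer numeric credentials (AUTH_SYS, i.e. no Kerberos).
import os
import pwd

IDMAP_DOMAIN = "example.edu"     # hypothetical, cf. the idmapd domain setting

def owner_attr_to_uid(owner):
    """Map an NFSv4 owner attribute like 'alice@example.edu' to a UID."""
    name, _, domain = owner.partition("@")
    if domain != IDMAP_DOMAIN:
        return 65534                          # typically squashed to nobody
    try:
        return pwd.getpwnam(name).pw_uid      # local (or LDAP-backed) lookup
    except KeyError:
        return 65534

def auth_sys_credential():
    """What an AUTH_SYS RPC call identifies the caller by: raw numbers."""
    return {"uid": os.getuid(), "gid": os.getgid()}

print(owner_attr_to_uid("alice@example.edu"))   # attribute layer: names
print(auth_sys_credential())                    # RPC layer: numeric IDs
```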
Posted Jan 31, 2019 17:40 UTC (Thu) by akkornel (subscriber, #75292)