Program names and "pollution"
A Linux user's $PATH likely contains well over a thousand different commands that were installed by various packages. It's not immediately obvious which package is responsible for a command with a generic name, like createuser. There are ways to figure it out, of course, but perhaps it would make sense for packages like PostgreSQL, which is responsible for createuser, to give their commands names that are less generic—and more easily disambiguated—such as pg_createuser. But renaming commands down the road has "backward compatibility problems" written all over it, as a recent discussion on the pgsql-hackers mailing list shows.
Someone with the unlikely name of "Fred .Flintstone" started things off with a post complaining that PostgreSQL "pollutes the file system" with generic program names. The post suggested that names either be prefixed with "pg_" or that they become subcommands of a wrapper command, à la Git: postgresql createuser. It is not the first time that the topic has been raised; Andreas Karlsson pointed to this thread from 2008, and Tom Lane reached further back and pointed to a discussion in 1999.
At issue are a handful of commands that come with PostgreSQL and are potential sources of confusion for users: createdb, dropuser, vacuumdb, and so on. As Lane pointed out, though, the outcomes from the previous discussions make it pretty clear what will probably happen this time as well:
One command that was not mentioned in the early going, perhaps because it is so widely used in scripts, is initdb. Julien Rouhaud thought its name was more confusing than some of the others that had been mentioned, but Lane disagreed:
That led Alvaro Herrera to suggest making symbolic links from pg_* to, at least, createuser and dropuser. That would cause no change but, at some point, a deprecation warning could be printed for the unadorned versions and, eventually, they could perhaps be dropped entirely. But Tomas Vondra wondered what problem was truly being solved:
He went on to note that there are multiple ways for users to figure out what some random binary does, including man, -h or --help flags, or asking the package manager. Most seem to agree that some of the names are too generic (createuser and dropuser in particular), but there are not likely to be name conflicts with other tools since PostgreSQL has 20+ years of seniority at this point. Even though there is support for some kind of rename, doing so will cause pain—and not for the PostgreSQL project. As Lane put it:
A suggestion from Chris Travers that perhaps createuser and dropuser should just be removed led to concerns that leaving it to users to write their own shell scripts might result in security problems. The psql command can be used to create users, but the way to do so is somewhat non-obvious, and more obvious alternatives could lead to SQL injection holes. But Lane wondered what the overarching plan is; will createuser actually be removed at some point, especially given that the postmaster command has been deprecated for more than 12 years but still has not been removed? Peter Eisentraut agreed that deprecation was not the project's strong suit, so: "How about we compromise in this thread and remove postmaster and leave everything else as is. ;-)"
Herrera argued that clearing up the confusion should be done as a service to future users. "The implicit argument here is that existing users are a larger population than future users. I, for one, don't believe that."
Some seem to think that simply adding symbolic links for the pg_* variants might help things down the road, however. Herrera suggested adding the links and leaving it for a fictional future AI to do the deprecation. David Steele concurred with that plan: "+1 to tasking Skynet with removing deprecated features. Seems like it would save a lot of arguing."
Jokes aside, it doesn't really seem like the idea is going any further than it did in 2008 or 1999. As Lane repeatedly said, if the project were starting from scratch, surely other choices would be made; at this point, though, there are two decades of precedent, scripts, and muscle memory to overcome. That sentiment likely affects plenty of other projects, especially those that have been around for many years; free software grew up in a much smaller pool.
For the future, though, we will probably see fewer of these kinds of problems. New projects are generally thinking about "pollution" and finding ways to make it clear which binaries go with a particular package/project. That is a good thing since the number of packages we install is only going to grow.
Posted Apr 2, 2019 20:16 UTC (Tue)
by mrshiny (guest, #4266)
[Link] (18 responses)
Posted Apr 2, 2019 20:26 UTC (Tue)
by ntnn (guest, #109693)
[Link] (1 responses)
Also it's not hard to rename commands. Rename them in one major version, provide the old ones as symbolic links and display warnings when they're called by their old names that they're going to be unavailable in the next version.
Will it break some scripts somewhere? Yeah. But then it's the fault of the admins running it for upgrading a major version without checking the changelog.
Posted Apr 2, 2019 20:29 UTC (Tue)
by ntnn (guest, #109693)
[Link]
Posted Apr 3, 2019 8:18 UTC (Wed)
by kandreas (guest, #131050)
[Link] (9 responses)
Posted Apr 3, 2019 10:15 UTC (Wed)
by lamawithonel (subscriber, #86149)
[Link] (4 responses)
1: Move existing commands to pg_* and create symlinks.
Posted Apr 3, 2019 10:24 UTC (Wed)
by lamawithonel (subscriber, #86149)
[Link]
Posted Apr 3, 2019 15:19 UTC (Wed)
by kh (guest, #19413)
[Link] (1 responses)
Posted Apr 4, 2019 13:25 UTC (Thu)
by lamawithonel (subscriber, #86149)
[Link]
Posted Apr 3, 2019 17:19 UTC (Wed)
by rotty (guest, #14630)
[Link]
This build system feature would probably be helpful on source-based systems, though I'm not familiar with those.
Posted Apr 7, 2019 0:13 UTC (Sun)
by rossmohax (guest, #71829)
[Link] (3 responses)
Posted Apr 9, 2019 1:40 UTC (Tue)
by k8to (guest, #15413)
[Link] (2 responses)
Posted Apr 9, 2019 15:16 UTC (Tue)
by nix (subscriber, #2304)
[Link] (1 responses)
Posted Apr 9, 2019 23:35 UTC (Tue)
by karkhaz (subscriber, #99844)
[Link]
Posted Apr 7, 2019 23:07 UTC (Sun)
by jschrod (subscriber, #1646)
[Link] (5 responses)
And while we're at it:
On a serious note, it's not always as practical to change established namings even if it would be sensible.
Cheers, Joachim
PS: These two changes/gripes were mentioned by Dennis Ritchie at a conference dinner where I had the occasion to be at the same table. So please don't blame me for bringing them up. :-) :-)
Posted Apr 8, 2019 9:01 UTC (Mon)
by anselm (subscriber, #2796)
[Link] (4 responses)
According to legend, Ken Thompson was once asked what he would do differently if he were to redo Unix from scratch. His answer was “I'd spell ‘creat()’ with an ‘e’ at the end.”
OTOH, if your main means of interaction with the computer is a 110-baud teletypewriter, you may be forgiven for wanting to make every character count. In that situation, using “?” as an error message starts making a lot of sense.
Posted Apr 11, 2019 9:14 UTC (Thu)
by mgedmin (subscriber, #34497)
[Link] (3 responses)
Posted Apr 11, 2019 19:42 UTC (Thu)
by jengelh (guest, #33263)
[Link]
There is the "localtime" call in V7 (at least whatever copy I could grab off github), which is way above 6 characters.
Posted Apr 11, 2019 19:51 UTC (Thu)
by jengelh (guest, #33263)
[Link] (1 responses)
And as for V1, there is the "putchar" function.
Posted Apr 12, 2019 1:08 UTC (Fri)
by neilbrown (subscriber, #359)
[Link]
Was there?? I thought "putchar" was a macro for fputc(c, stdout), but I haven't checked V1. Rules for macro names would be different from those for linker symbols.
Posted Apr 2, 2019 20:41 UTC (Tue)
by HenrikH (subscriber, #31152)
[Link] (3 responses)
Well, for one, for those of us who do not work with psql on a regular basis the problem is usually the reverse, i.e., figuring out which commands to actually use to admin the database. The initial idea is usually to type psql and then try to tab complete, which of course doesn't work for "createuser" and friends.
Posted Apr 2, 2019 23:58 UTC (Tue)
by KaiRo (subscriber, #1987)
[Link] (2 responses)
Posted Apr 3, 2019 8:10 UTC (Wed)
by cpitrat (subscriber, #116459)
[Link]
Posted Apr 3, 2019 8:34 UTC (Wed)
by giggls (subscriber, #48434)
[Link]
Posted Apr 2, 2019 22:16 UTC (Tue)
by buck (subscriber, #55985)
[Link] (6 responses)
i seem to recall, back in the day, that packages like postgresql would be autoconf-ed to install someplace like /usr/local/pgsql (and it seems it still is, by default: https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;...) and then the programs would go in a bin subdirectory thereof etc. it's only since the FHS and distributions dumping everything that conceivably could be run into /usr/bin or /usr/sbin that it's become the case that there are thousands of things in /usr/bin that one has no clue about the purpose of
notwithstanding the slightly pejorative connotation of "dumping", i don't mean to cast blame. surely it makes sense for useful binaries not to be tucked away where nobody would know where to look for them, but something like environment modules (http://modules.sourceforge.net/) and nix environments achieve that goal in a somewhat more determinative fashion
Posted Apr 3, 2019 0:01 UTC (Wed)
by KaiRo (subscriber, #1987)
[Link] (1 responses)
Posted Apr 3, 2019 5:23 UTC (Wed)
by rsidd (subscriber, #2582)
[Link]
Seriously, why not do this? Also, I guess newer systems like flatpak and snappy would basically be doing this.
Posted Apr 3, 2019 10:07 UTC (Wed)
by civodul (guest, #58311)
[Link] (1 responses)
More generally, Nix and Guix do away with the global program name space that /usr/bin is. You can create environments on the fly with guix environment containing exactly the packages you want, so name clashes become a non-issue.
Posted Apr 4, 2019 5:37 UTC (Thu)
by tzafrir (subscriber, #11501)
[Link]
[1] Or however this is called.
Posted Apr 3, 2019 15:57 UTC (Wed)
by jccleaver (guest, #127418)
[Link]
This really is *primarily* a postgresql problem. I'm hard pressed to think of any conflict or overly-vague script or binary names I've experienced outside of PGSQL other than one conflict with the "maildir" utility back in the early 2000's. Most projects know to keep anything likely to be placed into a $PATH as unique as possible and seem pretty reasonable about it. That's not to say that growth isn't a problem, but that's more a function of Fedora's UsrMove and some people pushing for a conflation of Bin and Sbin than anything else.
One thing I really would like to see though is more use of libexec where appropriate. If these binaries aren't really intended for execution by humans except in unusual, debugging situations, they don't need to be in $PATH. Move them to /usr/libexec/ where they belong.
Posted Apr 9, 2019 22:43 UTC (Tue)
by flussence (guest, #85566)
[Link]
The choice of directory location is unfortunately vague and didn't age well, but at least it's an attempt.
Posted Apr 3, 2019 2:33 UTC (Wed)
by rfunk (subscriber, #4054)
[Link] (8 responses)
Posted Apr 3, 2019 7:35 UTC (Wed)
by cavok (subscriber, #33216)
[Link]
Posted Apr 3, 2019 14:48 UTC (Wed)
by cortana (subscriber, #24596)
[Link] (5 responses)
Posted Apr 3, 2019 14:56 UTC (Wed)
by rfunk (subscriber, #4054)
[Link] (2 responses)
Posted Apr 4, 2019 17:28 UTC (Thu)
by zdzichu (subscriber, #17118)
[Link] (1 responses)
Posted Apr 9, 2019 22:54 UTC (Tue)
by flussence (guest, #85566)
[Link]
/me glares in pulseaudio's general direction...
Posted Apr 4, 2019 16:42 UTC (Thu)
by perennialmind (guest, #45817)
[Link]
Well that's just awful.
Whatever anchor they choose, I'll add an alias of my own.
Posted Apr 4, 2019 17:23 UTC (Thu)
by perennialmind (guest, #45817)
[Link]
Those people are already having to adapt. It's been deprecated in util-linux since util-linux v2.23, released in 2013 and hasn't been built by default since v2.29, released in 2016. After 2016, Debian, Ubuntu and Slackware were the outliers still shipping
Posted Apr 7, 2019 20:52 UTC (Sun)
by Karellen (subscriber, #67644)
[Link]
Package all the client commands into /usr/lib/postgresql/, add a "/usr/bin/pgctl" or similar utility which makes "pgctl <command> [args...]" redirect to "/usr/lib/postgresql/<command> [args...]", and then add a postgresql-legacy package which adds symlinks for "/usr/bin/<command>" to "/usr/lib/postgresql/<command>"
Do it in "sid" as soon as the freeze ends after the next release, see what breaks, and revert back to the current system if it's too bad before the following release. That's what sid is for :-)
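The redirect described above can be sketched in a few lines of Python. This is only an illustration of the idea: the name "pgctl" and the /usr/lib/postgresql/ directory come from the comment, while the name-validation rule and the function names are assumptions made for the example.

```python
import os
import re

# Directory suggested in the comment above; real packaging may well differ.
LIBDIR = "/usr/lib/postgresql"

# Assumption: accept only plain command names, so that something like
# "../../bin/sh" cannot escape LIBDIR.
_VALID_NAME = re.compile(r"[A-Za-z0-9_][A-Za-z0-9_-]*")


def resolve(argv, libdir=LIBDIR):
    """Map ['pgctl', 'createuser', 'bob'] to (path, argv for the real tool)."""
    if len(argv) < 2:
        raise SystemExit("usage: pgctl <command> [args...]")
    command, args = argv[1], argv[2:]
    if not _VALID_NAME.fullmatch(command):
        raise SystemExit(f"pgctl: invalid command name: {command!r}")
    return f"{libdir}/{command}", [command, *args]


def main(argv):
    path, new_argv = resolve(argv)
    # Replace this process with the real tool; does not return on success.
    os.execv(path, new_argv)
```

A packaged wrapper would call main(sys.argv); the legacy symlinks in the proposed postgresql-legacy package would then be the only things left in /usr/bin besides the wrapper.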
Posted Apr 3, 2019 5:06 UTC (Wed)
by arekm (guest, #4846)
[Link]
Posted Apr 3, 2019 7:11 UTC (Wed)
by lkundrak (subscriber, #43452)
[Link] (4 responses)
Posted Apr 3, 2019 10:00 UTC (Wed)
by epa (subscriber, #39769)
[Link] (3 responses)
Posted Apr 3, 2019 13:19 UTC (Wed)
by mathstuf (subscriber, #69389)
[Link] (2 responses)
Posted Apr 3, 2019 13:54 UTC (Wed)
by epa (subscriber, #39769)
[Link] (1 responses)
That doesn't work if the main 'git' program is more than just a simple wrapper for subcommands -- if there were some hypothetical 'git -xyz' that did something by itself. And the shell's message would be much more basic than the help text printed by the 'git' binary. But there is always the top-level manual page.
Posted Apr 3, 2019 17:03 UTC (Wed)
by martinfick (subscriber, #4455)
[Link]
Posted Apr 3, 2019 13:04 UTC (Wed)
by geert (subscriber, #98403)
[Link] (1 responses)
Only recently did I discover this was the cause of mysteriously appearing files "/var/lib/initramfs-tools/v." and "/boot/initrd.img-v.".
Posted Apr 3, 2019 16:42 UTC (Wed)
by BenHutchings (subscriber, #37955)
[Link]
Posted Apr 3, 2019 21:12 UTC (Wed)
by david.a.wheeler (subscriber, #72896)
[Link]
At first I thought "A Linux user's $PATH likely contains well over a thousand different commands..." meant "a thousand different directories". That is highly unlikely, and not what the author meant anyway (but I note it in case someone else has that misunderstanding).
Trying to figure out "what package installed this" from just the name is a terrible idea. There are already package management tools that accurately provide this information; please use them. Trying to guess from the name will be wrong too often to be useful.
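As a rough illustration of asking the package manager instead of guessing, here is a Python sketch that wraps two well-known queries, dpkg -S and rpm -qf, both of which take a file path and report the owning package. The function names are made up for the example, and it is a best-effort lookup under the assumption that one of those two tools is installed; on some systems the path found in $PATH is a symlink or an alias for the path the package actually registered, in which case the query can fail.

```python
import shutil
import subprocess

# Query commands for two common package managers.
QUERIES = {
    "dpkg": ["dpkg", "-S"],
    "rpm": ["rpm", "-qf"],
}


def owner_query(path, manager):
    """Return the argv that asks `manager` which package owns `path`."""
    return [*QUERIES[manager], path]


def owning_package(command):
    """Best-effort lookup; returns the tool's raw output, or None."""
    path = shutil.which(command)
    if path is None:
        return None
    for manager in QUERIES:
        if shutil.which(manager) is None:
            continue  # this package manager is not installed here
        result = subprocess.run(owner_query(path, manager),
                                capture_output=True, text=True)
        if result.returncode == 0:
            return result.stdout.strip()
    return None
```

Calling owning_package("createuser") on a system with PostgreSQL installed should report the owning package in whatever format the underlying tool prints.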
I think if you are creating new programs it'd be wise to have easily disambiguated names. Not because that should replace a package manager, but because that will avoid a lot of unnecessary heartache and confusion if the same name has multiple meanings.
I think it'd be good if PostgreSQL created synonyms with new non-generic names and used them uniformly in new scripts, while retaining the old (deprecated) names for a long time for backwards compatibility. Not because you're trying to replace a package manager, but because someone who never uses PostgreSQL is likely (sooner or later) to use the same names. If you supported the "old" and "new" names for a long time (say 7+ years), it could be relatively painless.
Posted Apr 4, 2019 13:29 UTC (Thu)
by simosx (guest, #24338)
[Link]
In terms of usability, they should have a single entry command like "pg" (from Postgres).
As it is now, you need to remember "pg_lsclusters", which is wrong in many ways. It is the wrong order of words. "cluster" is plural because otherwise the command would look silly.
In addition, Postgres can provide shell completion (bash-completion) rules to help users with the commands.
It is up to Postgres to plan for a transition and make the CLI commands somewhat better.
Posted Apr 7, 2019 0:40 UTC (Sun)
by tlw (guest, #31237)
[Link]
Posted Apr 9, 2019 20:11 UTC (Tue)
by mirabilos (subscriber, #84359)
[Link]
Once that one rolls around remove the symlinks and you're done.
1a: (optional) Create a central binary with sub-commands, à la `git` or `systemctl`, and move all the pg_* commands under that. Replace symlinks with shim binaries.
2: Start printing a warning if commands are called by their legacy names, but allow users to silence the warnings with an option flag.
3: Create a build path to build only the symlinks/shims, and a non-default way to build without them. This gives packagers an easy way to package them separately, and users a way to test without them. It also gives packagers some control over how and when they remove the symlinks/shims.
4: Make deprecation warning silence flags NOOP.
5: Inspired by rfunk's comment, move the symlinks/shims out of $PATH, to something like /usr/share/postgres/bin, and declare them unsupported. Inform users that they can add the new directory to their $PATH if they really need them for some reason, but that you won't accept bug reports against them. Maintain this for at least another major release, possibly several more.
6: Rejoice when our AI overlords remove all the symlinks/shims.
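Step 2 is the only part of this plan that needs code in the commands themselves, and it can be quite small. Below is a Python sketch of the check, assuming the program can see the name it was invoked under in argv[0] (which is what happens when it is run through a symlink). The name mapping and the --no-deprecation-warning flag are hypothetical, invented for the example; nothing in the thread specifies them.

```python
import os
import sys

# Hypothetical mapping from legacy names to their prefixed replacements.
LEGACY_NAMES = {"createuser": "pg_createuser", "dropuser": "pg_dropuser"}

# Hypothetical opt-out flag for step 2; step 4 would turn it into a no-op.
QUIET_FLAG = "--no-deprecation-warning"


def deprecation_warning(argv):
    """Return the warning to print, or None if no warning applies."""
    invoked_as = os.path.basename(argv[0])
    new_name = LEGACY_NAMES.get(invoked_as)
    if new_name is None or QUIET_FLAG in argv[1:]:
        return None
    return f"{invoked_as}: this name is deprecated; use {new_name} instead"


def main(argv):
    warning = deprecation_warning(argv)
    if warning:
        # Warn on stderr so that scripts parsing stdout keep working.
        print(warning, file=sys.stderr)
    # ... the real work of the command would follow here ...
```

Writing the warning to stderr rather than stdout is a deliberate choice in this sketch: existing scripts that capture the command's output would see no change.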
- Rename creat() to create().
- Clean up that awful abbreviation "pwd" to have something to do with passwords instead of meaning "print working directory", as any newbie would expect.
These two changes/gripes were mentioned by Dennis Ritchie
I so often forget that Postgres command names assume that they are the only DB around that I curse it often enough because I forget again that what I wanted was a generic "dropdb" when I was looking for something tab-completing at "pg" or "postgre".
(And yes, dropdb is what I need most often as it's usually during testing where some SQLAlchemy or similar creates the DB but for testing my code from scratch I need to remove the DB.)
Indeed, with Guix and similarly with Nix, readlink -f `which createuser` returns the absolute file name of createuser; its parent directory contains the package name, so I can see that it comes from PostgreSQL.
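For anyone who wants the same lookup from a script, a rough Python counterpart of that readlink -f `which createuser` one-liner might look like this. The function name is made up for the example, and the optional path argument is there only to make it easy to try against a directory other than $PATH.

```python
import os
import shutil


def real_location(command, path=None):
    """Rough Python counterpart of: readlink -f `which command`."""
    found = shutil.which(command, path=path)
    if found is None:
        return None
    # Follow every symlink in the chain to the file's final location.
    return os.path.realpath(found)
```

On a system where commands are symlinks into per-package directories, the directories in the returned path are where the package name shows up.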
pgctl
pg
would otherwise be the obvious choice. It's certainly the first thing I'd try tabbing at the console. Actually, thinking back, I think I did exactly that years ago and was peeved to realize what I'd stepped in. Postgres deserves a two-letter namespace prefix. Thankfully most distros have cleared the stinking pile off the path long ago and Debian derivatives will soon.
pg createuser
just feels right.
/usr/bin/pg
. After Buster is released as Debian 10 this year, it'll just be Slackware.
Apparently Ubuntu already has a command named "linux-version", which it wants to call every time the initramfs-tools package is upgraded...
The Postgres community should act and fix this issue without the need to be told to.
The user would need to run "pg" to get a helpful reminder screen with the available subcommands.
If the user wants to perform cluster tasks, they would run "pg cluster" and get the list of "cluster" subcommands.
Like, "pg cluster create", "pg cluster ls", etc.
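To make the shape of that interface concrete, here is a small sketch using Python's argparse with nested subparsers. It only parses the command line; the "pg" name and the "cluster create" / "cluster ls" subcommands are the ones proposed above, and the "name" argument for create is an assumption added for the example.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(prog="pg")
    commands = parser.add_subparsers(dest="command", required=True)

    # "pg cluster ..." groups the cluster-related tasks.
    cluster = commands.add_parser("cluster", help="cluster tasks")
    actions = cluster.add_subparsers(dest="action", required=True)

    create = actions.add_parser("create", help="create a cluster")
    create.add_argument("name")  # hypothetical argument, for illustration

    actions.add_parser("ls", help="list clusters")
    return parser
```

With this layout, "pg --help" and "pg cluster --help" list the available subcommands at each level. Running "pg" or "pg cluster" with nothing after it makes argparse print a short usage message and exit with an error, which is close to, though not quite, the reminder screen described above.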
This is what GNU stow is for.