
Development quote of the week

And what I just don't understand about this whole discussion: We're talking about people who want to be frozen in time for 5 years straight during this "maintenance support" window by the vendor (whom they are paying), with only access to security fixes. But somehow they do want to run the latest Postgres Major release, even though the one that they had running still receives bug fixes and security fixes. I just don't understand who these people are. Why do they care about having no changes to their system to avoid breakage as much as possible, except for their piece of primary database software, of which they're happily running the bleeding edge.
Jelte Fennema-Nio




I used to support folks like that

Posted Jun 19, 2025 1:35 UTC (Thu) by davecb (subscriber, #1574) [Link] (23 responses)

They reasoned that if they froze everything else, keeping up with the critical apps would be easier.

Alas, all the variants of "freeze and forget" fail a lot: they're pretty close to "preserve all bugs" (:-))
Paul Stachour explains this in "You Don't Know Jack About Software Maintenance" https://cacm.acm.org/practice/you-dont-know-jack-about-so... (I was his editor)

"preserve all bugs" but avoid new bugs caused by maintenance

Posted Jun 19, 2025 1:57 UTC (Thu) by jreiser (subscriber, #11027) [Link] (21 responses)

> Alas, all the variants of "freeze and forget" fail a lot: they're pretty close to "preserve all bugs" (:-))

This can be a rational choice. Most old bugs have been discovered and dealt with, and the costs are known. Maintenance might introduce new bugs with unknown costs, and some environments cannot tolerate the possibility of unknown costs.

"preserve all bugs" but avoid new bugs caused by maintenance

Posted Jun 19, 2025 2:12 UTC (Thu) by davecb (subscriber, #1574) [Link] (6 responses)

Absolutely true!

At Sun Canada, we got far more complaints from the folks who updated after 3-5 years than we did from the few folks who preferred the bleeding edge. We lived on the bleeding edge ourselves, "eating our own dogfood".

Alas, I lack statistical validation of our claim that the "freeze" folks had more problems than the "maintain" folks did. The best I can do is point at continuous development, notably of SAAS offerings, where the development community overwhelmingly prefers small changes applied rapidly rather than large changes applied slowly.

"preserve all bugs" but avoid new bugs caused by maintenance

Posted Jun 19, 2025 9:12 UTC (Thu) by Wol (subscriber, #4433) [Link] (5 responses)

> Alas, I lack statistical validation of our claim that the "freeze" folks had more problems than the "maintain" folks did. The best I can do is point at continuous development, notably of SAAS offerings, where the development community overwhelmingly prefers small changes applied rapidly rather than large changes applied slowly.

Any evidence as to what the *user* community prefers?

That's the root of the problem in all likelihood - the *users* don't like change, the developers are used to living with it. And the "pain threshold" of change also varies with the individual - I've just upgraded Android from v12 to v15, and because I don't tend to use my phone from one full charge to the next, my pain threshold is very low, small changes hurt a lot. A teenager on their phone 24/7 probably wouldn't notice the upgrade - if they'd even let their phone last that long between upgrades...

Cheers,
Wol

"preserve all bugs" but avoid new bugs caused by maintenance

Posted Jun 19, 2025 10:07 UTC (Thu) by farnz (subscriber, #17727) [Link] (3 responses)

I worked at a place that did an internal study on this; first, you need a working feedback path from users to developers. If you don't have a path for users to feed their views back to developers, then it matters not whether you change frequently in small increments, or infrequently in large increments, because users can't express their views, and thus will be ignored regardless.

Second, there needs to be a way to revert to older versions temporarily for testing - users need to be able to bisect when a change came in to usefully report things to you. A report of "this was fine yesterday, and is not today" is a small window of commits to look through; a report of "I haven't used this in six months, but it was fine 6 months ago, and is not now" is a lot to look through. You need to be able to convert the latter class of report into "OK, it was fine 42 days ago, but not 41 days ago" in order to track down exactly which change is the issue.
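Converting "fine six months ago, broken now" into "broke on day N" is an ordinary binary search over dated builds, in the same spirit as `git bisect`. A minimal sketch, assuming the user can install and test arbitrary old builds; the build list and the `is_broken` predicate here are hypothetical stand-ins for that install-and-test step:

```python
# Binary-search an ordered list of builds to find the first broken one,
# in the spirit of `git bisect`. The is_broken predicate stands in for
# "install this build and check whether the regression reproduces".

def first_broken(builds, is_broken):
    """Return the earliest build for which is_broken() is true.

    Assumes builds are ordered oldest-to-newest, the oldest is good,
    and the newest is broken (the user's report guarantees both ends).
    """
    lo, hi = 0, len(builds) - 1   # invariant: builds[lo] good, builds[hi] broken
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_broken(builds[mid]):
            hi = mid              # breakage is at mid or earlier
        else:
            lo = mid              # still good here; look later
    return builds[hi]

# Six months of daily builds collapses to roughly eight install-and-test
# cycles, since the window halves on each step:
builds = [f"2025-01-{d:02d}" for d in range(1, 29)]
print(first_broken(builds, lambda b: b >= "2025-01-17"))  # prints 2025-01-17
```

The same halving argument is why the "42 days ago vs. 41 days ago" report above is reachable in a handful of tests rather than one test per day.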

Third, with the above in place, users strongly prefer frequent small changes to infrequent large changes.

User interviews revealed that the reason for this is that changes fall into three categories (from most common to least common):

  1. Completely unnoticed changes. If your phone gains support for a new VoLTE codec only used in one province in China, and you never communicate with anyone in China, you're not going to notice this. Similarly, if it's gained support for a new type of hearing aid that you don't own, you're not going to notice.
  2. Welcomed improvements. These may not always be visible; for example, a recent change enables my phone to use its microphone to reduce the amount of background noise it transmits on calls when I'm using my headset, which helps my mother understand me when she's using her hearing aids.
  3. Unwanted changes. These are the sticking point for users; there's been a noticeable change, and they want it reverted. The reason for wanting a revert doesn't matter - the point here is that the user does not want this particular change, whether it's because something has regressed, because their phone now gives them an electric shock whenever they answer a call, or because a button has gone from green-gray to a teal-gray and they find it ugly.

With infrequent changes, by the time you report an unwanted change, there's a good chance that welcomed improvements are built on top of it, and it can't be fixed; you've just got to live with it, now, because it's ingrained in the system. With frequent changes, you have decent odds of reporting the problem before something else depends on it, allowing the change to be undone, then redone into a wanted improvement or unnoticed change. Additionally, if the report comes in late and the user will bisect down to a small window of change, you can at least explain why the change was made, because you can identify the commit(s) that led to it, rather than just shrugging because the change is too old to track down now.

"preserve all bugs" but avoid new bugs caused by maintenance

Posted Jun 20, 2025 23:29 UTC (Fri) by mirabilos (subscriber, #84359) [Link] (2 responses)

> users strongly prefer frequent small changes to
> infrequent large changes.

Definitely not. When I use something… I have the use case of “user of a library” in mind, but this also applies to “user of a smartphone” (an area where I *gladly* don’t develop in), I want the damn thing to work the same and not break my workflow every few days.

"preserve all bugs" but avoid new bugs caused by maintenance

Posted Jun 20, 2025 23:51 UTC (Fri) by davecb (subscriber, #1574) [Link]

I agree, but you're talking about user interfaces. I hate frivolous changes there, too.

Infrastructure? There I want controlled evolution, as described by Stachour: https://cacm.acm.org/practice/you-dont-know-jack-about-so...

qv, folks!

"preserve all bugs" but avoid new bugs caused by maintenance

Posted Jun 22, 2025 8:28 UTC (Sun) by farnz (subscriber, #17727) [Link]

That's why you need the feedback loop; a change that breaks your workflow is an unwanted change, by definition, and you need the feedback loop to tell the developers that a given change has broken your workflow, and get them to fix things for you.

Without the feedback loop, you're stuffed either way - your workflow will be broken as part of making changes the developers want to make, because they're ignoring your needs. With it, frequent changes mean that when they do break your workflow, you can immediately feed back to them, and get them to fix their change.

That way, you get the change that reduces power consumption (hence increases time between charges) on your smartphone, or the library change that reduces CPU use and memory consumption, but not the change that moves all the buttons around, or the change that requires you to supply an allocator to the library's specific internal needs (not just a normal malloc implementation).

"preserve all bugs" but avoid new bugs caused by maintenance

Posted Jun 19, 2025 10:37 UTC (Thu) by davecb (subscriber, #1574) [Link]

yes, that's more what I would want to know, rather than what the developers found

"preserve all bugs" but avoid new bugs caused by maintenance

Posted Jun 19, 2025 13:48 UTC (Thu) by ballombe (subscriber, #9523) [Link] (7 responses)

In HPC it is common that the OS is not upgraded during the lifetime of the cluster, because it requires support for proprietary interconnects and hardware that might not be available for any newer OS version.

Fundamentally, the problem is that you cannot upgrade the _hardware_.
And from the point of view of the system vendor, after two years, your hardware is out of sale and obsolete. There is no point in porting any drivers to a newer OS version.

And if you cannot upgrade the kernel you often cannot upgrade the userspace.

So as a user of an HPC system, you are often stuck dealing with a very old OS. The one I have access to runs RHEL 7.6.

Re: HPC Clusters

Posted Jun 19, 2025 14:09 UTC (Thu) by davecb (subscriber, #1574) [Link] (6 responses)

Oh goodness gracious!

At a recent customer, they updated their big-data clusters with considerable regularity. They only had two hardware suppliers, so that wasn't particularly risky: both used rather plain-vanilla components, and we found Linux had good forward compatibility with anything we used.

They were slow to upgrade the OS on the main (non-cluster) program at one point, and it took longer than they expected. Conversely, the update from CentOS 8 to an alternative happened soon thereafter, and appeared to be both fast and trouble-free.

Personally, I used Fedora at the time, updated it with wild abandon, and had only one non-fatal glitch in a 5-year period. I always try to use the production OS, or a newer version of the production OS, so Fedora was appropriate for RHEL- and CentOS-using customers. I'm now on non-LTS Ubuntu, for the same reason.

As noted by Wol, this was _not_ something visible to the customers: we were doing SAAS, versioning and staged rollouts, so any problems we had were (almost) invisible.

YMMV!

Re: HPC Clusters

Posted Jun 19, 2025 17:03 UTC (Thu) by ballombe (subscriber, #9523) [Link] (1 responses)

Were they using any proprietary interconnect like InfiniBand or Mellanox, with the assorted proprietary network cards and switches?

Re: HPC Clusters

Posted Jun 19, 2025 20:05 UTC (Thu) by Paf (subscriber, #91811) [Link]

I'm a bit confused by this - Mellanox seems to have reasonably *good* driver compatibility across kernel versions.

That said, I have known many HPC users to be intolerant of upgrades - for some of them, they buy the machine to do specific things and expect to keep it largely the same until they replace it.

Re: HPC Clusters

Posted Jun 19, 2025 20:56 UTC (Thu) by SLi (subscriber, #53131) [Link] (3 responses)

> Personally, I used Fedora at the time, updated it with wild abandon, and had only one non-fatal glitch in a 5-year period. I always try to use the production OS, or a newer version of the production OS, so Fedora was appropriate for RHEL- and CentOS-using customers. I'm now on non-LTS Ubuntu, for the same reason.

How many fatal glitches did you have?

Re: HPC Clusters

Posted Jun 19, 2025 23:07 UTC (Thu) by davecb (subscriber, #1574) [Link] (2 responses)

sqrt(-1) (:-))

Re: HPC Clusters

Posted Jun 20, 2025 10:45 UTC (Fri) by Wol (subscriber, #4433) [Link] (1 responses)

You mean they were all imaginary?

Cheers,
Wol

Re: HPC Clusters

Posted Jun 20, 2025 10:50 UTC (Fri) by davecb (subscriber, #1574) [Link]

Yup!

Only the non-fatal ones were in the real-number space (:-))

"preserve all bugs" but avoid new bugs caused by maintenance

Posted Jun 20, 2025 7:42 UTC (Fri) by taladar (subscriber, #68407) [Link] (5 responses)

What a lot of the people who prefer that approach forget is that backporting fixes is also maintenance and will also introduce new bugs.

"preserve all bugs" but avoid new bugs caused by maintenance

Posted Jun 20, 2025 9:02 UTC (Fri) by farnz (subscriber, #17727) [Link] (4 responses)

The approach that does work is to not backport fixes, either. You just accept the full set of bugs that the system had at installation as the bugs that you have to live with, since you freeze the software stack completely.

This is not viable if that system is going to be connected to a network shared with people you do not trust (e.g. the Internet), but if you've airgapped the system anyway to prevent your data leaking out, it can be made to work. I've seen this done in the past by places like film studios - data in and out of the system goes via digital tape, and there are no network connections between the secured system and the rest of the world.

"preserve all bugs" but avoid new bugs caused by maintenance

Posted Jun 20, 2025 9:52 UTC (Fri) by geert (subscriber, #98403) [Link] (3 responses)

As long as they don't have USB, you're fine ;-)
https://hackaday.com/2025/06/16/an-open-source-justificat...

"preserve all bugs" but avoid new bugs caused by maintenance

Posted Jun 21, 2025 18:19 UTC (Sat) by ballombe (subscriber, #9523) [Link] (2 responses)

Only a small team is allowed to enter the room that hosts the hardware. On the other hand, the local root password is written on the wall, because if you need to enter, that means there is some kind of emergency, and you should not waste time trying to remember the local root password...

"preserve all bugs" but avoid new bugs caused by maintenance

Posted Jun 23, 2025 7:55 UTC (Mon) by taladar (subscriber, #68407) [Link]

Changing out the wall must be a real pain every time someone gets a glimpse at the password through the open door.

"preserve all bugs" but avoid new bugs caused by maintenance

Posted Jun 23, 2025 10:02 UTC (Mon) by amacater (subscriber, #790) [Link]

Remembering the local password is straightforward if you adopt a sensible policy:

For machines and subsystems that are still being actively worked on to bring them into service, the password is "insecure". For machines that are fully configured and in service, the password is "secure".

If you access a machine and the first password works, you know it's still being worked on and is insecure.

If you access a machine and the second password works, you know you can carry on using it.

In conversation, you can refer to the secure or insecure passwords as necessary: no one outside your team or organisation need ever know differently.

I used to support folks like that

Posted Jun 20, 2025 7:56 UTC (Fri) by nim-nim (subscriber, #34454) [Link]

> They reasoned that if they froze everything else, keeping up with the critical apps would be easier.

Unfortunately, software is not designed in a vacuum: if you try to run software from different eras together, it is almost certain that the newest components will run in a badly tested legacy mode, because the features they are QA'd with are not available in the older stack (and that's the *best* case, with well-behaved, well-tested components; the usual case is that you *will* hit bugs somewhere).

> I just don't understand who these people are. Why do they care about having no changes to their system to avoid breakage as much as possible, except for their piece of primary database software, of which they're happily running the bleeding edge.

That’s built into the way MBAs organised IT teams, with the least visible plumbing parts subcontracted to the cheapest bidder, who has every incentive to build up technical debt where it is not visible (leaving it to the next subcontractor), limiting updates to the shiny surface bits that can be paraded to the customer.

That’s also why those people are institutionally resistant to any form of non-permissive free software, because non-permissive free software forces you to make upfront efforts instead of leaving the mess to someone else. One reason Linux has been a huge HPC success is that HPC hardware is expensive enough that MBAs understand they cannot subcontract it to muppets.

The muppets are currently designing all kinds of Baldrick-esque cunning software version-locking schemes to re-enable the Java enterprise software "let's pile technical debt to the sky" trainwreck with modern languages.


Copyright © 2025, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds