Quotes of the week

[Posted January 7, 2015 by corbet]

"New and improved" is only really improved if it also takes backwards compatibility into account, rather than saying "now everybody must do things the new and improved - and different - way"

— Linus Torvalds

The fact that I still have to rattle a tin cup to fix bufferbloat at this point is quite bothersome. With such an epidemic of a problem I really thought the world would have beat a path to our doors long, long ago, and/or start leveraging the plethora of information and code we have put online to go forth and deploy bufferbloat solutions, especially in extreme cases like aircraft, access in the third world, and in remote areas.

— Dave Täht

Quotes of the week

Posted Jan 8, 2015 19:16 UTC (Thu) by dlang (guest, #313)

On the subject of regressions, Linus posted a much longer comment just around the time that LWN went to press

https://lkml.org/lkml/2015/1/7/807

On Wed, Jan 7, 2015 at 2:14 PM, Catalin Marinas <catalin.marinas@arm.com> wrote:
>
> We need to look back at the point we added timer-based delay about 2.5
> years ago. Prior to commit d0a533b18235d362, platform A reported
> bogomips 300. After that commit, the *same* platform A (not B),
> started reported 6.
>
> Is the above considered user breakage?

Things change. The only thing that is considered "user breakage" is
when something actually doesn't work any more.

That has always been the rule. It's not that the kernel ABI (with all
the system calls, all the /proc files, all the ioctl's, etc) is set in
stone and "sacred". Absolutely anything can be changed, wildly.

But if it turns out that applications (or hardware) that people use
end up breaking noticeably, then *that* is a regression.

And the important part there are those weasel-words: "that people use"
and "noticeably".

For example, a test-suite giving a different result is *not* a
regression, although it should obviously be considered a big red flag.
So if somebody tells you that some test-suite shows that some ABI
changed, at the very least you should be very nervous about things.

But if that same test-suite result is then used in a production
environment as part of some actual user flow, and it breaks that user
model, then it suddenly becomes a regression. So the very definition
of "regression" is not really about the API changing, but about
breaking peoples existing setups. Of course, if you never change any
API that is visible to user space, you can never create that kind of
regression, so they are _related_, but some people confuse the two.
They are still very different.

Similarly, theoretical arguments of "so-and-so wouldn't work after
this change" are just that - theoretical arguments. It's something to
worry about, but it's not an actual *regression* until it causes
problems.

For an extreme example of this: we can remove support for whole
platforms and architectures, and sometimes we do. It clearly
completely breaks support for the hardware in question - but it only
counts as a regression if anybody notices and cares. There may still
be active users of that platform that provably cannot possibly work at
all any more, but if they never upgrade the kernel, then it's still
not a regression.

In this case, pretty much all of /proc/cpuinfo is mainly
"informational". Maybe there are applications that show it, but more
likely you have people who ssh in and just do

cat /proc/cpuinfo

to see what kind of system they are running on. That's the main point
of much of /proc, and things like /proc/cpuinfo in particular.

Now *main* point doesn't necessarily mean "only point". There clearly
are binaries parsing it. Some do it to figure out how many CPU's the
system has, often simply because using /proc is simple from various
scripting environments, for example. So while most of /proc/cpuinfo is
clearly for human consumption, it's also understandable that some
parts of it might matter for people.

And quite frankly, I personally think that any program that parses
/proc/cpuinfo in order to find the bogomips value and use it for
anything is just clearly insane. Why would you ever do that? It makes
no sense. It's crazy. Apparently the crazy audio library didn't even
do it in a meaningful way, and the use of that value seems to be
pretty much random, and the actual value likely doesn't really even
*matter*.

But the rule for "regression" has never been about sanity, or about
whether the ABI changes. There are tons of horribly insane user
programs. Parsing /proc to find bogomips may be insane and odd, but
it's certainly not the worst kind of diseased code I've ever heard
about. We have had major programs that literally depended on totally
insane small details that were never intentional, and just happened to
have some particular implementation detail. And then the
implementation changed, and the interface ostensibly did exactly the
same thing, but because it did it with some meaningless difference
that couldn't *possibly* matter in any sane situation, it caused a
regression.

So the kernel regression rules are very strict in that it's the
absolute #1 rule in kernel development, but at the same time, they are
about as lax as they can possibly be: an interface change is only a
regression if somebody notices.

Changing the bogomips value - even radically - or removing it entirely
isn't a regression in itself.

And in this case, I do suspect that the *actual* value really almost
doesn't matter. It looks more like some internal badly done hint for
some buffer size or other. It is possible that wild fluctuations could
be noticeable, but it's fairly unlikely.

The other "good news" in this area is that I suspect that the random
ARM platforms that actually changed 2.5 years ago are not very widely
used any more. So not only does the actual real value probably not
matter much to begin with, but the platforms where it really changed
are probably not a major issue.

Linus
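Linus argues that a value like bogomips can change wildly, or vanish entirely, without that being a regression in itself. That suggests how a script that insists on reading the value should behave. The following is a minimal Python sketch, assuming the usual "key : value" layout of /proc/cpuinfo. The function name read_bogomips is hypothetical. The sketch matches the field name case-insensitively and returns None, rather than raising, when the field is missing or malformed.

```python
from typing import Optional


def read_bogomips(cpuinfo: str) -> Optional[float]:
    """Return the first bogomips value in /proc/cpuinfo-style text, or None.

    The key is spelled 'bogomips' on x86 but 'BogoMIPS' on ARM, so it is
    matched case-insensitively. A missing or unparsable field yields None
    instead of an exception, since the kernel may change or drop the value.
    """
    for line in cpuinfo.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "bogomips":
            try:
                return float(value.strip())
            except ValueError:
                return None
    return None


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            text = f.read()
    except OSError:  # not Linux, or /proc is not mounted
        text = ""
    value = read_bogomips(text)
    # Informational only: print it, but don't tune anything on it.
    print("bogomips:", value if value is not None else "not reported")
```

The design choice is that absence is a normal result, not an error. A kernel change like the one discussed here then degrades the script's output instead of breaking the script.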

Quotes of the week

Posted Jan 9, 2015 1:51 UTC (Fri) by jschrod (subscriber, #1646)

It's also interesting to read the rest of the thread you cited.

Linus' harsh reactions to Nicolas Pitre's sensible comments (no factual answers from Linus on the named issues that I could detect in this thread) are »interesting«. Is Nicolas Pitre a known trouble-maker? Or is it more the kind of manager reaction »I don't want to hear about that bogomips topic any more«?

Quotes of the week

Posted Jan 9, 2015 3:20 UTC (Fri) by dlang (guest, #313)

I'm not sure which "sensible comments" you are referring to.

We've had cases of newer, faster hardware reporting lower bogomips values than older, slower hardware ever since the first new hardware was released after bogomips was invented. So complaints that the value isn't what users expect on their new hardware should be answered with "it's BOGOmips, don't take the value seriously".

Similarly, if anyone actually tries to scale something based on this value, the answer needs to be the same. It's known that this value can fluctuate wildly on a given piece of hardware, depending on CPU frequency, cache contention, phases of the moon, and various other variables.

So we know software still works when the value is nonsense, but the fact that software broke when the value was removed is justification enough to put it back, as per the long post about "no regressions" above.

Having bogomips report bogus values may cause people to complain, but it's not a regression.

Quotes of the week

Posted Jan 9, 2015 13:20 UTC (Fri) by ayeomans (guest, #1848)

So will it be a regression if people stop complaining? :-)

Quotes of the week

Posted Jan 9, 2015 18:33 UTC (Fri) by vonbrand (subscriber, #4458)

Sure enough. Change the people until they complain again. ;-)

Quotes of the week

Posted Jan 9, 2015 19:45 UTC (Fri) by rodgerd (guest, #58896)

Not to mention virtualisation, which means that bogomips will change depending on the load on the hypervisor at reboot. If you rely on bogomips, you should be seeing errors in a huge range of scenarios, unless you also don't bother testing common scenarios.

Quotes of the week

Posted Jan 10, 2015 17:46 UTC (Sat) by nix (subscriber, #2304)

Consumers of /proc/cpuinfo inside glibc alone include:

- Looking up the system type for ioperm() support on Alpha and ARM
- Looking up the number of processors in sysconf() and getconf(1), on *many* platforms (looking up the amount of available memory parses /proc/meminfo)
- Looking up the clock frequency for clock_getres(), clock_gettime() et al., on PowerPC, SPARC64, i386, and IA64

None of these are exactly crucial (though the last is fairly important) -- but if *glibc* is using these things, they clearly cannot change without great consideration. glibc has fallbacks in all these cases, but often of the variety 'oops, tell the caller that the clock frequency cannot be determined'.
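The layering nix describes puts /proc parsing behind a library API, with a fallback when the file doesn't cooperate. That layering can be sketched in Python, assuming a Unix-like system. The sketch first asks libc through os.sysconf("SC_NPROCESSORS_ONLN"), Python's wrapper around sysconf(3), whatever libc does internally. If that fails, it counts "processor" lines itself, and only then gives up with a default. The name online_cpus and the default of one CPU are illustrative choices, not glibc's behavior.

```python
import os


def online_cpus(cpuinfo_path: str = "/proc/cpuinfo") -> int:
    """Return the number of online CPUs, asking libc before parsing /proc."""
    # First choice: libc's sysconf(_SC_NPROCESSORS_ONLN), exposed by Python
    # as os.sysconf("SC_NPROCESSORS_ONLN") on Unix-like systems.
    try:
        n = os.sysconf("SC_NPROCESSORS_ONLN")
        if n > 0:
            return n
    except (AttributeError, ValueError, OSError):
        pass  # os.sysconf missing (e.g. Windows) or the name is unknown here
    # Second choice: count the "processor : N" entries ourselves.
    try:
        with open(cpuinfo_path) as f:
            n = sum(1 for line in f
                    if line.partition(":")[0].strip() == "processor")
    except OSError:
        n = 0
    # Last resort (an assumption of this sketch, not documented glibc
    # behavior): report a single CPU rather than failing.
    return n if n > 0 else 1


if __name__ == "__main__":
    print("online CPUs:", online_cpus())
```

Each layer only runs when the one above it can't answer. This is the property that lets the kernel side change the file's details without every caller noticing.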

Quotes of the week

Posted Jan 11, 2015 17:40 UTC (Sun) by pr1268 (guest, #24648)

Is the information glibc needs from those files part of the debate in the QOTW? I don't mean to argue, but it seems that /proc/cpuinfo is a (decently) legitimate source for finding the number of processors, as is /proc/meminfo for querying the amount of memory on a running system. Again, I'm sincerely curious; not trying to stir up a storm...

I'll make a general comment about glibc: if they're using /proc/cpuinfo (or similar files), then someone would complain loudly if something broke due to kernel ABI changes. I can only imagine (hope) that any changes in this area would be well communicated to the glibc developers before implementation.

Quotes of the week

Posted Jan 11, 2015 23:47 UTC (Sun) by nix (subscriber, #2304)

Well, it's not related to bogomips, but it *is* proof by example that reasonable software (insofar as one can call glibc "reasonable" -- crucial software, at least) can read /proc/cpuinfo as part of normal operation, and that at least part of it does therefore constitute an ABI of sorts.

Quotes of the week

Posted Jan 20, 2015 3:32 UTC (Tue) by Baylink (guest, #755)

Linus:
> There may still be active users of that platform that provably cannot possibly work at all any more, but if they never upgrade the kernel, then it's still not a regression.

This won't be the first time Linus has been wrong, or that I've said so.

Regression, by its definition, means "*In a version transition of any kind, something which used to work as defined no longer does*".

It *means* that such users as he alludes to cannot upgrade safely anymore without the risk of something breaking.

The *job* of a release manager is to be able to make intelligent judgements about the relative risks of and labor to fix regressions between releases.

Test suites are *designed* to spot your regressions; that's actually why they exist.

Quotes of the week

Posted Jan 20, 2015 9:16 UTC (Tue) by dgm (subscriber, #49227)

*Technically* (hehe: http://xkcd.com/1475/), you're right. It's a regression, albeit a *virtual* one until someone *real* tries to update such a system and fails (and hopefully complains).

I think this is what Linus is arguing, at least.

Quotes of the week

Posted Jan 22, 2015 12:32 UTC (Thu) by FrozenGeek (guest, #96688)

If a tree falls in a forest and no one is around to hear it, does it make a sound?

Quotes of the week

Posted Jan 24, 2015 2:03 UTC (Sat) by mathstuf (subscriber, #69389)

If anyone can design a test suite that takes care of all cases and runs within a merge window, great. Until then, if you want to upgrade with confidence, run the rc releases on test hardware and make noise when things break. If you stay silent for years and then come complaining that something broke last year, you won't garner much sympathy, from me at least. Sure, fixes would be accepted, but by now the new behavior might be expected, so you have to deal with that too.