Haunted by ancient history

By Jonathan Corbet
January 6, 2015

Kernel development policy famously states that changes are not allowed to break user-space programs; any patch that does break things will be reverted. That policy has been put to the test over the last week, when two such changes were backed out of the mainline repository. These actions demonstrate that the kernel developers are serious about the no-regressions policy, but they also show what's involved in actually living up to such a policy.

The ghost of wireless extensions

Back in the dark days before the turn of the century, support for wireless networking in the kernel was minimal at best. The drivers that did exist mostly tried to make wireless adapters look like Ethernet cards with a few extra parameters. After a while, those parameters were standardized, after a fashion, behind the "wireless extensions" interface. This ioctl()-based interface was never well loved, but it did the job for some years until the developers painted themselves into a corner in 2006. Conflicting compatibility issues brought development of that API to a close; the good news was that there was already a plan to supersede it with the then under-development nl80211 API.

Years later, nl80211 is the standard interface to the wireless subsystem. The wireless extensions, which are now just a compatibility interface over nl80211, have been deprecated for years, and the relevant developers would like to be rid of them entirely. So it was perhaps unsurprising to see a patch merged for 3.19 that took away the ability to configure the wireless extensions into the kernel.

Equally unsurprising, though, would be the flurry of complaints that came shortly thereafter. It seems that the wicd network manager still uses the wireless extensions API. But, perhaps more importantly, the user-space tools (iwconfig for example) that were part of the wireless extensions still use it — and they, themselves, are still in use in countless scripts. So this change looks set to break quite a few systems. As a result, Jiri Kosina posted a patch reverting the change and Linus accepted it immediately.

There were complaints from developers that users will never move away from the old commands on their own, and that some pushing is required. But it is not the place of the kernel to do that pushing. A better approach, as Ted Ts'o suggested, would be:

[W]hy not hack into the "iw" command backwards compatibility so that if argv[0] is "iwlist" or "iwconfig", it provides the limited subset compatibility to the legacy commands. Then all you need to do is to convince the distributions to set up the packaging rules so that "iw" conflicts with wireless-tools, and you will be able to get everyone switched over to iw after at least seven years.

Such an approach would avoid breaking user scripts. But it would still take a long time before all users of the old API would have moved over, so the kernel is stuck with supporting the wireless extensions API into the 2020s.
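The argv[0] trick that Ted describes is the familiar "multi-call binary" pattern. What follows is only a minimal sketch of the dispatch step, written in Python for brevity; the real iw tool is written in C and has no such compatibility mode, and the handler names here (run_iw, run_iwconfig_compat, run_iwlist_compat) are hypothetical placeholders rather than anything found in iw or wireless-tools.

```python
import os
import sys


def run_iw(args):
    # Placeholder for the native "iw" behavior (hypothetical).
    return "iw " + " ".join(args)


def run_iwconfig_compat(args):
    # Placeholder for a limited iwconfig-compatible front end (hypothetical).
    return "iwconfig-compat " + " ".join(args)


def run_iwlist_compat(args):
    # Placeholder for a limited iwlist-compatible front end (hypothetical).
    return "iwlist-compat " + " ".join(args)


# Map the name the program was invoked under to a "personality".
PERSONALITIES = {
    "iwconfig": run_iwconfig_compat,
    "iwlist": run_iwlist_compat,
}


def dispatch(argv):
    # Only the basename matters: /sbin/iwconfig and ./iwconfig are the same.
    name = os.path.basename(argv[0])
    handler = PERSONALITIES.get(name, run_iw)
    return handler(argv[1:])


if __name__ == "__main__":
    print(dispatch(sys.argv))
```

With a layout like this, a distribution would install one program and provide iwconfig and iwlist as symbolic links to it; existing scripts would keep invoking the old names without noticing the difference, as long as the emulated subset covers what they use.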

Bogomips

Rather older than the wireless extensions is the concept of "bogomips," an estimation of processor speed used in (some versions of) the kernel for short delay loops. The bogomips value printed during boot (and found in /proc/cpuinfo) is only loosely correlated with the actual performance of the processor, but people like to compare bogomips values anyway. It seems that some user-space code uses the bogomips value for its own purposes as well.

If bogomips deserved the "bogo" part of the name back in the beginning, it has only become more deserving over time. Features like voltage and frequency scaling will cause a processor's actual performance to vary over time. The calculated bogomips value can differ significantly depending on how successful the processor is in doing branch prediction while running the calibration loop. Heterogeneous processors make the situation even more complicated. For all of these reasons, the actual use of the bogomips value in the kernel has been declining over time.

The ARM architecture code, on reasonably current processors, does not use that value at all, preferring to poll a high-resolution timer instead. On some subarchitectures the calculated bogomips value differed considerably from what some users thought was right, leading to complaints. In response, the ARM developers decided to simply remove the bogomips value from /proc/cpuinfo entirely. The patch was accepted for the 3.12 release in 2013.

Nearly a year and a half later, Pavel Machek complained that the change broke pyaudio on his system. Noting that others had complained as well, he posted a patch reverting the change. It was, he said, a user-space regression and, thus, contrary to kernel policy.

Reverting this change was not a popular idea in the ARM camp; Nicolas Pitre tried to block it, saying that "No setups actually relying on this completely phony bogomips value bearing no links to hardware reality could have been qualified as 'working'." Linus was unsympathetic, though, saying that regressions were not to be tolerated and that "The kernel serves user space. That's what we do." The change was duly reverted; ARM kernels starting with 3.19 will export a bogomips value again; one assumes the change will make it into the stable tree as well.

That still leaves the little problem that the bogomips value calculated on current ARM CPUs violates user expectations; people wonder when their shiny new CPU shows as having 6.0 bogomips. Even ARM systems are expected to be faster than that. The problem, according to Nicolas, is that a constant calculated to help with the timer-based delay loops was being stored as the bogomips value; the traditional bogomips value was no longer calculated at all. There is no real reason, he said, to conflate those two values. So he has posted a patch causing bogomips to be calculated by timing the execution of a tight "do-nothing" loop — the way it was done in the beginning.
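The idea of timing a tight loop can be illustrated in user space. This is a rough sketch only, not the kernel's calibration code: the function names are made up for the example, the Python loop has to read a clock on every pass (so it is far from a true "do-nothing" loop), and the divisor of 500,000 follows the traditional convention of counting each pass as two instructions.

```python
import time


def count_loops(duration_s):
    """Spin for roughly duration_s seconds; return (loops, elapsed seconds)."""
    loops = 0
    start = time.perf_counter()
    deadline = start + duration_s
    # The clock read makes this loop much heavier than the kernel's delay
    # loop; it only illustrates the "count passes per unit time" idea.
    while time.perf_counter() < deadline:
        loops += 1
    elapsed = time.perf_counter() - start
    return loops, elapsed


def bogomips(loops, elapsed_s):
    """Convert a loop count into a bogomips-style figure.

    Each pass is counted as two instructions, so "millions of instructions
    per second" becomes loops-per-second divided by 500,000.
    """
    return loops / elapsed_s / 500_000


if __name__ == "__main__":
    loops, elapsed = count_loops(0.1)
    print(f"{bogomips(loops, elapsed):.2f} (interpreter) bogomips")
```

The number printed says more about the interpreter than about the processor, which is rather the point: any value obtained this way depends on what exactly the loop does and how well the hardware happens to run it.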

The bogomips value has long since outlived its value for the kernel itself. It is calculated solely for user space, and, even there, its value is marginal at best. As Alan Cox put it, bogomips is mostly printed "for the user so they can copy it to tweet about how neat their new PC is". But, since some software depends on its presence, the kernel must continue to provide this silly number despite the fact that it reflects reality poorly at best. Even a useless number has value if it keeps programs from breaking.
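For user-space code that reads the value anyway, the lesson of this episode is to tolerate its absence. Below is a minimal sketch of a defensive reader; the function name and the default handling are illustrative assumptions, not how pyaudio or any other particular program does it. The field is matched case-insensitively because its capitalization differs between architectures ("bogomips" on x86, "BogoMIPS" on ARM).

```python
import re

# Match a line such as "bogomips : 4788.92" or "BogoMIPS : 6.00".
_BOGOMIPS_RE = re.compile(
    r"^bogomips\s*:\s*([0-9]+(?:\.[0-9]+)?)\s*$",
    re.IGNORECASE | re.MULTILINE,
)


def read_bogomips(cpuinfo_text, default=None):
    """Return the first bogomips value in cpuinfo_text, or default if absent."""
    match = _BOGOMIPS_RE.search(cpuinfo_text)
    if match is None:
        # The field may be missing entirely, as it was on ARM kernels
        # while the removal was in effect.
        return default
    return float(match.group(1))


def read_system_bogomips(default=None, path="/proc/cpuinfo"):
    """Read the value from the running system, falling back to default."""
    try:
        with open(path) as f:
            return read_bogomips(f.read(), default)
    except OSError:
        return default
```

A caller that treats a missing field as "unknown" rather than as a fatal error keeps working whether or not a given kernel exports the number.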


Why not move the backward compatibility layer out of kernel into libc?

Posted Jan 8, 2015 2:49 UTC (Thu) by skissane (subscriber, #38675) [Link] (4 responses)

Question: If the wireless extensions ioctls are implemented in kernel on top of the nl80211 API - why couldn't that backwards compatibility code move from the kernel to libc? If old cruft needs to be kept for backward compatibility, it would seem safer if it lived in user space rather than the kernel itself. Almost all applications using it would also use glibc (or another libc), so the applications shouldn't care where the API translation takes place.

Why not move the backward compatibility layer out of kernel into libc?

Posted Jan 8, 2015 6:26 UTC (Thu) by mebrown (subscriber, #7960) [Link] (3 responses)

ioctl() is not processed by glibc, but rather passed straight through to the kernel. There is no reasonable way for glibc to emulate this.

Why not move the backward compatibility layer out of kernel into libc?

Posted Jan 8, 2015 12:24 UTC (Thu) by skissane (subscriber, #38675) [Link] (2 responses)

But surely libc could provide an ioctl function which doesn't directly call the ioctl system call, but does other things? i.e. check ioctl number, if it has certain values, don't call the ioctl system call, call some other function instead. Then that other function could then emulate old ioctls by calling netlink.
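The interception described in this comment amounts to a dispatch on the request number. The sketch below models only that decision and is not glibc code; real_ioctl and emulate_wext are stand-in callables invented for the example. The range constants are what linux/wireless.h defines as SIOCIWFIRST and SIOCIWLAST, as far as I recall, and should be checked against the header before being relied upon.

```python
# Wireless-extensions ioctl numbers occupy one contiguous range.
# Values believed to match linux/wireless.h; treat them as an assumption.
SIOCIWFIRST = 0x8B00
SIOCIWLAST = 0x8BFF


def make_ioctl(real_ioctl, emulate_wext):
    """Build an ioctl() wrapper that diverts wext requests to an emulator.

    real_ioctl and emulate_wext are hypothetical callables taking
    (fd, request, arg); in the proposal, the emulator would talk netlink.
    """
    def ioctl(fd, request, arg):
        if SIOCIWFIRST <= request <= SIOCIWLAST:
            return emulate_wext(fd, request, arg)
        return real_ioctl(fd, request, arg)
    return ioctl
```

A wrapper like this only helps programs that reach the kernel through the wrapped function in an updated library; statically linked binaries, programs that make the system call directly, and systems where the C library is older than the kernel would all bypass it.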

Why not move the backward compatibility layer out of kernel into libc?

Posted Jan 8, 2015 14:26 UTC (Thu) by rsidd (subscriber, #2582) [Link]

It would still break backward compatibility, which means binary compatibility. Modifying source is easy.

Why not move the backward compatibility layer out of kernel into libc?

Posted Jan 10, 2015 17:54 UTC (Sat) by nix (subscriber, #2304) [Link]

Sure -- but glibc is generally upgraded much *less* often than the kernel, not much more often, and a world in which you find your wireless interface is broken if you upgrade your kernel until you upgrade glibc too is not a remotely desirable one. Sure, glibc can include backward-compatibility code for old kernels, and does, but an old glibc that is already released cannot contain such code. The only solution there is for the *kernel* to contain backwards-compatibility code in case it is used with an older glibc.

Since this is exactly the situation now, the only effect of your change would be to double the amount of compatibility code that would need to be retained nearly forever.

Haunted by ancient history

Posted Jan 8, 2015 5:26 UTC (Thu) by ncm (guest, #165) [Link] (1 responses)

It seems, too, like the old stuff should be behind a config ifdef, so that the distros can turn it off as soon as they get their iw house in order.

Haunted by ancient history

Posted Jan 8, 2015 14:23 UTC (Thu) by fishface60 (subscriber, #88700) [Link]

It already is.
The patch in question turned off the ability to change the config to allow the IWEXT code to be reachable.

Haunted by ancient history

Posted Jan 8, 2015 11:06 UTC (Thu) by ibukanov (subscriber, #3942) [Link] (3 responses)

I had no idea that these days one had better always use iw rather than iwconfig. I like the latter better, as it comes with a real manpage, whereas iw refers only to its own "iw help" output.

Haunted by ancient history

Posted Jan 8, 2015 13:26 UTC (Thu) by mm7323 (subscriber, #87386) [Link] (2 responses)

I feel the same for the ifconfig vs ip commands. The latter is much more powerful, but it is very overloaded and requires traversing multiple man pages to figure out even basic use. ifconfig 'just works', even if it has been deprecated for a long time now.

This edition's 'Quotes of the Week' from Linus seems very relevant:
"New and improved" is only really improved if it also takes backwards compatibility into account, rather than saying "now everybody must do things the new and improved - and different - way"

Haunted by ancient history

Posted Jan 8, 2015 13:58 UTC (Thu) by ibukanov (subscriber, #3942) [Link]

In the beginning the ip command did not provide manual pages either, referring instead to a PDF file (or was it PS?) that was supposed to list all the details in very formal language.

Haunted by ancient history

Posted Jan 10, 2015 19:21 UTC (Sat) by ploxiln (subscriber, #58395) [Link]

I've used "ip" for a while. I like it. Maybe it's not obvious how to get started with it, but try these:

ip addr

ip link

ip link set dev eth1 up

ip addr add 10.0.1.5/24 dev eth1

ip route

ip addr del 10.0.1.5/24 dev eth1

It's easier to manage multiple IPv4 addresses on an interface with the ip command (that does work, and can be useful). It's also more obvious how to name vlan "interfaces" and such with arbitrary names.

iw versus iwlist/iwconfig

Posted Jan 8, 2015 14:25 UTC (Thu) by cesarb (subscriber, #6266) [Link]

In my experience, there are things the "iw" command can't do but the older commands can. For instance, what's the command to get the txpower? I see a "set txpower" command in iw's builtin help, but no "get txpower". On iwlist, it's "iwlist wlan0 txpower".

Haunted by ancient history

Posted Jan 8, 2015 14:46 UTC (Thu) by johill (subscriber, #25196) [Link] (19 responses)

I haven't replied to that thread at all - in fact I haven't even read most of it! I need to sit down and write a larger rebuttal/defense and include all the mistakes that practically everybody on that thread made.

However I'll quickly point out one thing here: you cannot implement "iwconfig" semantics (in iw) on top of nl80211, the API simply doesn't allow those nonsensical ways of building a configuration.

I've compared this to ordering food before: the wext way (since SIOCSIWCOMMIT was never mandatory and thus essentially has to be ignored for compatibility reasons) basically allows you to do the following:

order eggs - the cook has to do something so he'll start boiling some eggs
[set the SSID to connect to a network]
order scrambled - the cook throws away the boiling water and eggs and starts making scrambled eggs
[set the channel to narrow down the AP selection]
order with bacon bits - the cook throws away the half-done scrambled eggs and starts from scratch
[set the BSSID to narrow down the AP selection even more]

This is simply not supported in nl80211 - you have to fully specify your order before the kernel will do anything. With wext, since SIOCSIWCOMMIT cannot be relied on, there's no way to do this. If you're careful to order your orders correctly (set the SSID last) then this can be avoided because the SSID is required to do anything, but that relies on the user knowing something about wext internals, which is clearly bad.

Contrast this with iw, where you simply specify the network (SSID) to connect to, with optional channel and BSSID arguments to the same command.
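The difference between the two models can be shown with a toy simulation. Nothing below talks to a kernel or a wireless device; the class names are invented for the illustration, and the rule "try to connect as soon as an SSID is known" is a simplification of the wext behavior described above.

```python
class IncrementalConfig:
    """wext-like model: each setting takes effect as soon as it is made."""

    def __init__(self):
        self.settings = {}
        self.attempts = []  # one entry per connection attempt triggered

    def set(self, key, value):
        self.settings[key] = value
        # With no mandatory commit step, the driver cannot know whether
        # more settings are coming, so it acts once it has an SSID.
        if "ssid" in self.settings:
            self.attempts.append(dict(self.settings))


class AtomicConnect:
    """nl80211-like model: the whole request arrives in one call."""

    def __init__(self):
        self.attempts = []

    def connect(self, ssid, freq=None, bssid=None):
        request = {"ssid": ssid}
        if freq is not None:
            request["freq"] = freq
        if bssid is not None:
            request["bssid"] = bssid
        self.attempts.append(request)


if __name__ == "__main__":
    wext = IncrementalConfig()
    wext.set("ssid", "my-network")
    wext.set("freq", 2462)
    wext.set("bssid", "00:11:22:33:44:55")
    print(len(wext.attempts), "attempts with SSID first")

    nl = AtomicConnect()
    nl.connect("my-network", freq=2462, bssid="00:11:22:33:44:55")
    print(len(nl.attempts), "attempt with a single request")
```

In this model, setting the SSID first causes three attempts, two of them made with incomplete information, while setting it last causes one; the atomic interface removes the dependence on ordering altogether.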

Haunted by ancient history

Posted Jan 8, 2015 18:54 UTC (Thu) by dlang (guest, #313) [Link] (17 responses)

If iw can't do things that are possible with iwconfig, then it isn't a replacement, period, full stop.

Those things may not make sense to you in the way that you think everyone should do things, but you are not omniscient, so you don't know everyone's use case.

with your example, if people are successfully using iwconfig, it may not matter that the cook has to throw out a bunch of eggs, they aren't the cook's eggs, they are the sysadmin's eggs (in other words, it's the sysadmin's cpu cycles and power that's wasted). It's fine to write a blog post or explanation in a man page saying that this is an inefficient way to do things, and they should specify the entire order at once, but you shouldn't prevent the sysadmin from doing whatever they want.

There are very good reasons for not specifying a channel, and are probably very good reasons to not specify the BSSID as well.

Haunted by ancient history

Posted Jan 8, 2015 20:02 UTC (Thu) by johill (subscriber, #25196) [Link] (16 responses)

Absolutely there are good reasons to never specify the channel or frequency - but these are things people can do (in a saner way) with the new tools as well.

Really my original thought was that nobody would even be using any of the command line tools (even the new one - iw!) to connect to a wireless network. Talking about use cases for this - I can't imagine why for example (apart from testing) you'd want your wireless connection to not be re-established when it drops for whatever reason (perhaps the AP being confused), this is really quite common.
I really did think that everybody would be using wpa_supplicant for connections these days since that gives you automatic reconnections, roaming, encryption, etc.

wpa_supplicant has for a long time supported and by default used nl80211, so if people had started using it as I expected then none of this mess would have happened.

Who knew though, apparently some people don't care about any of this. Clearly there are plenty of people who haven't moved to using wpa_supplicant and are - I find that hard to believe - happy with their system. Or maybe they aren't, but can't even be bothered to file bugs (not that I'd actually look at bugs filed with wext, apart from maybe telling them to move to wpa_supplicant.)

Anyway, the change has been reverted, and we'll simply keep the wireless extensions code indefinitely. I'm certainly not going to support it though.

Haunted by ancient history

Posted Jan 8, 2015 20:38 UTC (Thu) by dlang (guest, #313) [Link] (12 responses)

not all uses of wifi are clients connecting to wpa servers

I use the iw* commands quite a bit when I'm looking at what's around, doing a site survey to layout new APs, etc.

I haven't tried the iw command (I had never heard of it before this problem showed up)

If you can specify a wildcard for channel/frequency with iw, then you should be able to emulate the iwconfig commands that didn't specify them.

What else is possible with the iw* commands that isn't possible with iw?

Haunted by ancient history

Posted Jan 8, 2015 21:35 UTC (Thu) by johill (subscriber, #25196) [Link] (11 responses)

> If you can specify a wildcard for channel/frequency with iw, then you should be able to emulate the iwconfig commands that didn't specify them.

No, you didn't quite understand my cooking analogy I think. iwconfig requires that the kernel keep track of the previously configured settings, and allows you to modify them later. With nl80211, the kernel will forget everything it did when it disconnects.

Say you have a network on channel 11 (2462 MHz) called "my-network".

With iwconfig you can now say

iwconfig wlan0 essid "my-network"
(it connects to your network)
iwconfig wlan0 freq 2412
(it disconnects, and fails to reconnect since there's no network on channel 1)
iwconfig wlan0 freq 2462
(it connects again to the original network)

with iw/nl80211 you don't have this option.

You can say
iw wlan0 connect "my-network"
(and it will connect)

or
iw wlan0 connect "my-network" 2412
(and it will fail to connect)

or
iw wlan0 connect "my-network" 2462
(and it will connect again)

But the kernel will not remember [1], upon disconnection, that you had previously wanted some SSID. To emulate it completely, you'd thus have to store state somewhere (filesystem? shared memory? ...?) which gets complex really quickly.

This is really intentional though - having the kernel keep state means that the kernel needs to guess when it should start doing something. This is like my cooking analogy, there's no way for the kernel to know when it should start cooking. [2]

As far as site survey is concerned (iwlist scan) - there's another interesting quirk here: if you have too many APs in the environment, iwlist/wext will stop working and report only errors and no networks at all. This is an inherent design flaw that comes from having to fit the entire list of APs into a single buffer that's limited to 64KiB, and no way to fetch the list incrementally (like nl80211 which uses netlink dump functionality). People have run into this in big deployments (usually when there's also a lot of data in the beacons which eats up the buffer quickly.)

I don't believe that there's much that is possible with iwconfig/iwlist that you cannot do with iw at all, other than some esoteric settings like modifying sensitivity that no modern hardware supports anyway. I certainly haven't used iw* in many years.

[1] and that's really intentional, if the kernel were to remember some data you get into the strange situations where to switch from one network to the other the kernel might try to connect to the old network on the new channel, or the new network on the old channel, or similar. With open networks this really isn't a problem, but with encrypted or potentially rogue networks you really don't want to even attempt to connect to a new network with half of the old configuration. Forcing userspace to commit the entire needed configuration at once easily side-steps that issue and made the whole mechanism far easier to extend to new spec updates as well.

[2] Jean actually invented SIOCSIWCOMMIT for this purpose, which in theory would have allowed you to set all the parameters and then commit them all together. However, that came too late - since it wasn't designed into the API from the beginning there was no way for the kernel to require it since that would also have broken older userspace.
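The scan-results limitation mentioned in this comment comes down to "everything in one bounded buffer" versus "as many batches as it takes". The sketch below models that contrast on plain byte strings; it does not use the wext ioctls or netlink, the function names are invented, and only the 64KiB figure is taken from the comment.

```python
WEXT_SCAN_LIMIT = 64 * 1024  # the single-buffer limit cited above


def fetch_single_buffer(results, limit=WEXT_SCAN_LIMIT):
    """All-or-nothing: every result must fit into one buffer of `limit` bytes."""
    total = sum(len(r) for r in results)
    if total > limit:
        # Nothing is returned at all, not even the entries that would fit.
        raise OverflowError(f"{total} bytes of results exceed {limit}")
    return list(results)


def fetch_incrementally(results, chunk_bytes=4096):
    """Dump-style: yield results in batches, each at most chunk_bytes
    (an entry larger than chunk_bytes still gets a batch to itself)."""
    batch, used = [], 0
    for r in results:
        if batch and used + len(r) > chunk_bytes:
            yield batch
            batch, used = [], 0
        batch.append(r)
        used += len(r)
    if batch:
        yield batch
```

With one hundred results of 1,000 bytes each, the first function fails outright while the second delivers everything in 25 batches; large beacons simply mean more batches rather than an error.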

Haunted by ancient history

Posted Jan 8, 2015 22:17 UTC (Thu) by cesarb (subscriber, #6266) [Link] (1 responses)

> I don't believe that there's much that is possible with iwconfig/iwlist that you cannot do with iw at all, other than some esoteric settings like modifying sensitivity that no modern hardware supports anyway. I certainly haven't used iw* in many years.

I recently found something that you cannot do with iw at all. I was curious to know the transmission power of my network card.

$ iwlist wlp3s0 txpower
wlp3s0 unknown transmit-power information.

Current Tx-Power=15 dBm (31 mW)

I couldn't find a way to do that with iw. This is most annoying in the openwrt router I have, since it doesn't have iwlist installed by default (and it makes no sense to install it for just this single use); I have to use the web interface (which does show the current txpower).

This is made more annoying by the fact that iw *does* know how to set the txpower; why doesn't it have a way to get it?

dev <devname> set txpower <auto|fixed|limit> [<tx power in mBm>]
Specify transmit power level and setting type.

Haunted by ancient history

Posted Jan 8, 2015 22:46 UTC (Thu) by johill (subscriber, #25196) [Link]

Not sure - from a brief look the API doesn't seem to have this, for whatever reason. Certainly not intentional.

Haunted by ancient history

Posted Jan 8, 2015 22:32 UTC (Thu) by dlang (guest, #313) [Link] (6 responses)

well, if the kernel keeps track of the data for iwconfig, you should be able to use that same mechanism to allow iw to work.

Yes, it's a permanent wart, but if it lets you get rid of the rest of the support in a decade or so (after the new version of iw gets out to all the distros), it's better than keeping everything around.

And just declaring that you aren't going to support the old version isn't really a solution, if tools are depending on it, any change that is made that accidentally breaks the old version is going to require that that change get reverted (or the old code modified to work with the new change)

changing topics, is there a way to configure an interface not to use slow speeds? I'd like to configure some equipment so that if the only way it can talk to another device is at speeds <11Mb, refuse to connect to it, but I haven't been able to find a way to do this on my WNDR3800 router.

Haunted by ancient history

Posted Jan 8, 2015 22:51 UTC (Thu) by johill (subscriber, #25196) [Link] (5 responses)

Well, no, it only keeps track of the data for wext. There's currently not even a way in the API of nl80211 to transport just such single values.

I don't believe that it's better to mess up the nl80211 APIs with such stateful problems than keeping the old wext support around - at least the latter can easily be disabled and if it is people can determine relatively quickly why things broke (we never ship wext to our customers, for example, and will not support it).

If we added stateful problematic API to nl80211 we'd have the worst of both worlds, because we'd have to support the mess with all the unclear semantics of when to start connecting in the new API. That's really not worth it at all, even if it would allow us to get rid of all the other ugly wext code (and believe me - there's a lot, down to 32/64 userspace compat code that always builds two messages for the same thing because they differ etc.)

Regarding your other question, it is possible for the AP to require that the client be capable of certain bitrates - they're called "basic rates", you might have such a configuration. It's also possible to require HT or even VHT to be supported. I have no idea if typical APs have configuration options in their limited web UI for that though - if you were to run hostapd (say on OpenWRT) it'd be simple to set this in the configuration file.

Haunted by ancient history

Posted Jan 8, 2015 23:04 UTC (Thu) by dlang (guest, #313) [Link] (2 responses)

> Regarding your other question, it is possible for the AP to require that the client be capable of certain bitrates - they're called "basic rates", you might have such a configuration. It's also possible to require HT or even VHT to be supported. I have no idea if typical APs have configuration options in their limited web UI for that though - if you were to run hostapd (say on OpenWRT) it'd be simple to set this in the configuration file.

how would I set this with command line tools? (once I know that, I can go figure out how to set this in the distro config). I am using OpenWRT on these routers.

Haunted by ancient history

Posted Jan 9, 2015 13:36 UTC (Fri) by johill (subscriber, #25196) [Link] (1 responses)

You can't set it using command line tools - this is only advertised and enforced by hostapd. You have to set it in the hostapd configuration file (basic_rates= option). To require HT (11n), set require_ht=1, to require VHT (11ac) set require_vht=1.

On OpenWRT, you might be able to inject options, not sure off the top of my head.

Haunted by ancient history

Posted Jan 9, 2015 14:19 UTC (Fri) by cesarb (subscriber, #6266) [Link]

Take a look at http://wiki.openwrt.org/doc/uci/wireless. I'd guess it's the basic_rate and require_mode options.

Haunted by ancient history

Posted Jan 9, 2015 5:40 UTC (Fri) by alankila (guest, #47141) [Link] (1 responses)

One of the issues here is that *someone* has to remember the values set. It's either in user space or kernel space. If the kernel remembers it, then anyone can query the kernel APIs to figure out what the current wireless settings that have been told to hardware are. If it's in user space, I can imagine someone writing another dbus API for wireless connections, and adding a permanent daemon to keep track of the info.

Historical unix style favors the former, so the kernel services user processes by keeping track of important state for them, because the processes themselves are short-lived and can't do it. Modern unix style is about running everything imaginable in separate daemons. Anyway, someone can just go ahead and define the dbus API for wpa_supplicant or something, and then tell everyone to implement that.

Haunted by ancient history

Posted Jan 9, 2015 13:38 UTC (Fri) by johill (subscriber, #25196) [Link]

All of that already exists. If you're running wpa_supplicant, you can never have any of these problems [even with wext, since wpa_supplicant is smarter than the average user]

However, if you're not running wpa_supplicant, then it really only leaves the kernel to remember it, or the user to re-enter it every time. iwconfig/wext went with the former (and in turn got all the associated problems), iw/nl80211 went with the latter to avoid those problems.

Haunted by ancient history

Posted Jan 10, 2015 9:41 UTC (Sat) by riking (subscriber, #95706) [Link] (1 responses)

> As far as site survey is concerned (iwlist scan) - there's another interesting quirk here: if you have too many APs in the environment, iwlist/wext will stop working and report only errors and no networks at all. This is an inherent design flaw that comes from having to fit the entire list of APs into a single buffer that's limited to 64KiB, and no way to fetch the list incrementally (like nl80211 which uses netlink dump functionality). People have run into this in big deployments (usually when there's also a lot of data in the beacons which eats up the buffer quickly.)

Just wondering here - have you considered turning iw into an argv[0]-switching binary, which calls the old APIs when it *absolutely needs to* (the separate SSID and frequency setting...) but otherwise uses the new APIs when it results in the same or a consistent output?
(consistent: displaying all the huge number of networks instead of crashing. $5 says nobody depends on iwlist crashing with too many networks.)

I routinely use plain 'iwconfig' and 'ipconfig' as debugging tools - "what's my network?". It seems like the diagnostics should be possible with just the new APIs.

Haunted by ancient history

Posted Jan 10, 2015 21:24 UTC (Sat) by johill (subscriber, #25196) [Link]

I have - but the cost is too high. There's very little cost in keeping this code indefinitely - it won't get any better and new drivers might not work properly with it, but it certainly won't get any worse.

Using the old API in the new tools is just going to get it wrong anyway - there are 20 or so API versions of wext after all that all need to be treated slightly differently (they didn't care all that much for backward compatibility when those were designed ....)

So no, this isn't going to happen.

You should probably just use "iw wlan0 scan" and "iw wlan0 link" instead. Doing anything beyond those two commands with the command line is mostly not going to do what you want it to anyway.

Haunted by ancient history

Posted Jan 15, 2015 9:45 UTC (Thu) by Wol (subscriber, #4433) [Link] (2 responses)

> not be re-established when it drops for whatever reason

You mean, like, when the user deliberately kills it???

Okay, this is my windows laptop, but I routinely kill the wifi when I plug my laptop into a network cable, it increases my bandwidth by about three orders of magnitude!

Cheers,
Wol

Haunted by ancient history

Posted Jan 15, 2015 16:31 UTC (Thu) by tialaramex (subscriber, #21167) [Link] (1 responses)

But the right way to do that is to switch off your WiFi. On my laptops I usually do this by moving the physical switch provided for this purpose. On other devices (which lack a switch) you can tell software "I don't want WiFi any more" independently of the wireless code trying to connect to a specific AP.

Also, the Right Thing™ (and I'm a little surprised the Windows laptop doesn't do this of its own accord unless you've missed part of the story) is that you'd leave the WiFi up in your scenario, and the IP stack will choose the Cat5 (or whatever) for all new connections while continuing to handle old connections seamlessly on the WiFi.

Almost twenty years ago we demonstrated running VoIP through this scenario (moving from a wireless to a wired network) with Mobile IPv6. That has yet to become a commonplace application, but it was taken as read that if you began a _new_ call it would move to the better connection with no additional technology, our work was on handling an _in progress_ call correctly.

Haunted by ancient history

Posted Jan 20, 2015 13:09 UTC (Tue) by nye (subscriber, #51576) [Link]

>Also, the Right Thing™ (and I'm a little surprised the Windows laptop doesn't do this of its own accord unless you've missed part of the story) is that you'd leave the WiFi up in your scenario, and the IP stack will choose the Cat5 (or whatever) for all new connections while continuing to handle old connections seamlessly on the WiFi

Modern versions of Windows definitely do this; I'm not sure whether older versions do, but I'd expect so since this is a simple case of setting the metric to prefer wired over wireless connections, which I find it hard to believe has ever not been the default.

There are probably other factors at work here.

Haunted by ancient history

Posted Jan 9, 2015 2:06 UTC (Fri) by jschrod (subscriber, #1646) [Link]

If you want people to use iw instead of iw*, I would recommend that some volunteer invest the time to write a good man page and get that into the distributions.

Oh, to note: Transforming the output of "iw help" into a man page would not make it a _good_ one.

Until that happens, I suppose that many CLI users (like myself) continue to use the old well-documented iw* commands.

Just as an example: How do I determine txpower with your shiny iw command? No hint at all in "iw help".

Haunted by ancient history

Posted Jan 8, 2015 18:01 UTC (Thu) by flussence (guest, #85566) [Link] (15 responses)

I propose to make bogomips return (cpu_max_MHz * 2). It's close enough to "real" numbers that (with any luck) the change would go unnoticed, while being obviously wrong enough that no naïve programmer would think it's a good idea to use it in new code.

Haunted by ancient history

Posted Jan 8, 2015 20:12 UTC (Thu) by reubenhwk (guest, #75803) [Link] (1 responses)

I'm having trouble envisioning any use-case where somebody would need to know the speed of the processor... Why is anybody using bogomips in code anyway?

Haunted by ancient history

Posted Jan 9, 2015 11:52 UTC (Fri) by mathstuf (subscriber, #69389) [Link]

Lazy defaults for quality settings in a game? I'd think it'd be better to just run a series of renders and drop quality until you hit your target though.
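
The "run a series of renders and drop quality" approach can be sketched roughly as follows. This is only an illustration: `pick_quality`, the `render` callable, and the quality levels are hypothetical names, not taken from any real game or library. The idea is simply to time actual work instead of reading bogomips.

```python
import time


def pick_quality(render, levels, target_seconds, clock=time.perf_counter):
    """Return the highest quality level whose measured render time meets the target.

    `levels` is ordered from highest quality to lowest. `render` is a
    hypothetical callable that draws one test frame at the given level.
    If no level is fast enough, the lowest level is returned as a fallback.
    """
    if not levels:
        raise ValueError("levels must not be empty")
    for level in levels:
        start = clock()
        render(level)                  # do the real work once at this level
        elapsed = clock() - start      # measured cost, not a guessed CPU speed
        if elapsed <= target_seconds:
            return level
    return levels[-1]
```

A real program would probably average several frames per level, since a single measurement is noisy on a machine with frequency scaling or other load.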

Haunted by ancient history

Posted Jan 9, 2015 2:41 UTC (Fri) by mina86 (guest, #68442) [Link] (11 responses)

I say, just print constant 1.

Haunted by ancient history

Posted Jan 9, 2015 13:58 UTC (Fri) by moltonel (guest, #45207) [Link] (10 responses)

+1, but make it configurable. Some software apparently *does* make use of bogomips, and we apparently want to avoid breaking them even more than they broke themselves.

* CONFIG_BOGOMIPS_REAL for software that actually needs it
* CONFIG_BOGOMIPS_FAKE for software that reads it but will still function with a low value
* CONFIG_BOGOMIPS_NONE for modern systems

Prefer none over fake over real, and let users/distributions bug upstream about broken software. Voilà - it'll only take a decade to get rid of bogomips completely.

Haunted by ancient history

Posted Jan 9, 2015 20:20 UTC (Fri) by dlang (guest, #313) [Link] (9 responses)

what is "REAL bogomips"? especially on systems that change the cpu frequency and/or have different cores running at different frequencies?

Haunted by ancient history

Posted Jan 9, 2015 22:28 UTC (Fri) by moltonel (guest, #45207) [Link] (8 responses)

Whatever the definition used to be, how far can you count in 1ms or something. It really doesn't matter. Call it BELIEVABLE or USABLE if you think that REAL gives a false sense of correctness.

The point is that some (buggy but kernel-supported) software will use it to calibrate some kind of performance loop. So try to give them a usable value; if you get within 20% of the ideal value it's good enough. Don't go beyond the call of duty trying to give a "good" value to something that is bogus by design.

That said, I think (as a user) that the kernel's no-breakage stance is too strict. If you're willing to keep using old unmaintained software, you should be ok with using old unmaintained kernels that are still compatible with said software. I'd have been happy to see bogomips disappear from Linux immediately. But if you're not willing to do that today, please at least prepare things so that we can do it in a few years.

Haunted by ancient history

Posted Jan 9, 2015 23:33 UTC (Fri) by dlang (guest, #313) [Link] (3 responses)

> if you get within 20% of the ideal value it's good enough.

you can't get it within 200% of the ideal value, it may not be possible to get it within 2000% of the ideal value. And the ideal value will change from time to time with no notice.

any software that uses delay loops is broken on a multi-user or even multi-application machine, let alone using bogomips to calibrate such loops.

That said, breaking software that's doing the wrong thing still isn't acceptable. Even if the software is maintained, getting the new version into the hands of users can take a LONG time. Rsyslog is very actively maintained and is on v8.6 right now. We still get people asking questions about v3.x, which was out of date in 2006.

Haunted by ancient history

Posted Jan 10, 2015 0:52 UTC (Sat) by moltonel (guest, #45207) [Link] (2 responses)

We may disagree on the maths but we agree on the basic observations; I'm not sure what you're trying to add ?

Since Linus has decreed that bogomips should be kept, we should keep it in the form that applications expect. We all know that that form is completely broken, but that's irrelevant. Apps ask a stupid question, and Linux gives a stupid answer in the name of retrocompatibility. That's what Linus asked for, and I'm not in a position to disagree.

What I *can* suggest is to take measures today so that in many years the compatibility argument gets weak enough to be ignored. Making bogomips configurable is such a measure, a pretty standard deprecation strategy (even if the time between deprecation and removal is huge).

Haunted by ancient history

Posted Jan 10, 2015 2:29 UTC (Sat) by dlang (guest, #313) [Link] (1 responses)

As I read Linus' message, he was saying that some value needed to be there, even if it was known to have no relation to reality (because software broke if it wasn't there to read). In your posts it sounded like you were saying that there needed to be an option to provide a "real" value. That's a much more demanding option than just having some value there.
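
For illustration, here is a minimal sketch of a user-space reader that does not break when the field is absent. It assumes the value shows up as a `bogomips : <number>` line in the text of /proc/cpuinfo and that the capitalisation of the label may differ between architectures; the function names are made up for this example.

```python
import re

# Assumed line shape: "bogomips<tab>: 4788.93"; label case is treated as variable.
_BOGOMIPS_LINE = re.compile(
    r"^[ \t]*bogomips[ \t]*:[ \t]*([0-9]+(?:\.[0-9]+)?)[ \t]*$",
    re.IGNORECASE | re.MULTILINE,
)


def parse_bogomips(cpuinfo_text):
    """Return every bogomips value found in the given cpuinfo text, in order."""
    return [float(value) for value in _BOGOMIPS_LINE.findall(cpuinfo_text)]


def bogomips_or_default(cpuinfo_text, default=None):
    """Return the first bogomips value, or `default` if the field is missing."""
    values = parse_bogomips(cpuinfo_text)
    return values[0] if values else default
```

A program written this way keeps working whether the kernel reports a measured value, a constant, or nothing at all; the breakage under discussion comes from readers that assume the line is always present.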

Haunted by ancient history

Posted Jan 11, 2015 22:56 UTC (Sun) by moltonel (guest, #45207) [Link]

If no program depends on a "real" value then great, just return a constant like "1" and be done with it. I find it unlikely that there isn't a program that uses the value in its logic (as opposed to just displaying it), but I admit I don't have an example to provide.

Haunted by ancient history

Posted Jan 10, 2015 18:01 UTC (Sat) by nix (subscriber, #2304) [Link] (3 responses)

> If you're willing to keep using old unmaintained software, you should be ok with using old unmaintained kernels that are still compatible with said software.

It is quite possible for old unmaintained software not to contain security holes and to work perfectly well. Software does not intrinsically rot.

It is not possible for an old unmaintained kernel not to contain security holes (for the simple reason that the kernel is one single piece of software and has holes discovered from time to time).

Thus, your proposal would force people to upgrade to the latest and greatest, no matter how much of their workflows it broke. That's not very nice to the users. (This unthinking attitude that "I use a recent version of everything, therefore so can everyone else" is a large part of what's wrong with the modern Linux world IMNSHO).

Haunted by ancient history

Posted Jan 11, 2015 23:46 UTC (Sun) by moltonel (guest, #45207) [Link] (2 responses)

Just to be clear: I do think that maintaining compatibility is an essential requirement.

But there are various degrees of old, unmaintained, and incompatible. And Linux often takes what looks like a hypocritical decision on what to keep and what to remove. Where's the kernel for my devfs distro ? Can I really run this 15-year-old XFree86 on 3.18 ? The "if someone notices" rule is both pragmatic and naïve. There is plenty of kernel-broken software out there. Note that the situation is worse with applications and libraries.

Having a zero-tolerance "no backward-compatibility breakage" rule is very noble until you realize that so much stuff is broken anyway.

I'm doing the opposite of forcing people to upgrade to the latest : I'm telling them to keep running that old kernel if they need to. There are plenty of situations where this is fine. Not all security issues affect everybody. I've got plenty of systems I can't afford to update, so I just put them somewhere where they won't be exposed to threats.

And if that fails, virtualisation is easy nowadays. Bogomips can't have a decent value on most modern hardware, so I'd even have to use an emulator to get my old game working properly. Thankfully, the apps that require this treatment and are in common use should be a rarity.

Not breaking compatibility is important. But following this rule to the extreme is silly.

http://xkcd.com/1172/

Haunted by ancient history

Posted Jan 12, 2015 3:37 UTC (Mon) by dlang (guest, #313) [Link] (1 responses)

Once you start saying that regressions for some people are acceptable if they make things better for more people you very quickly degrade into a very bad situation

different people's estimates of how many are helped by the fix vs hurt by the regression will vary wildly.

you will end up breaking _something_ for people on a regular basis

people remember things that break for them FAR more than they remember new features or speed. You need something way over 10:1 fix:break ratio to begin to have things break even (some people think this is over 100:1)

Linux-kernel development has been there in the past, and it had a reputation for breaking users' systems on a regular basis. Avoiding a repeat of this history is why Linus fights so hard against regressions.

I've reported problems with binary-only apps and seen the offending patch reverted within hours. However, if you don't run current upstream kernels, or don't report when things are broken, you can't count on anyone else reporting it either, and so it's possible for the problem to live for a long while. "Enterprise" distros make this worse due to the added lag in getting new kernels out to users, so when a problem is noticed, it may be due to a change that was made upstream years ago. Such problems still need to be reported, and as this case shows, they are taken seriously by Linus and get fixed.

Haunted by ancient history

Posted Jan 15, 2015 1:12 UTC (Thu) by nix (subscriber, #2304) [Link]

> people remember things that break for them FAR more than they remember new features or speed. You need something way over 10:1 fix:break ratio to begin to have things break even (some people think this is over 100:1)

Exactly! I was just about to follow up and comment on how a recent SCSI breakage of mine was fixed nice and fast -- but then I realised that that 'recent' breakage was in 2013! But it still *felt* recent, because it broke the boot in a scary fashion.

One nice example of a good recent fix for me was a series of helpful responses, suggestions, and finally a quirk addition when my Simtec Entropy Key started spontaneously failing in 3.16, even though 3.17 had been out for some time and 3.18 was fairly close to release. This was a particular monster to track down because it was timing-related, intermittent, only happened after a reboot, and required physical removal of a USB device to fix. I think I rebooted 180-odd times in the process of bisecting, and had two or three wrong bisection paths due to mistaking bad commits for good ones. I couldn't ask anyone else to do this -- Entropy Keys aren't very common -- and it took me some weeks to find the time to do it, but once it was tracked down, a fix was almost immediate, even though Entropy Keys are not exactly the most crucial hardware ever, and I say that even though my firewall won't boot without its nice juicy random numbers. Now that's the right way to fix bugs! So bravo Johan! :)

Haunted by ancient history

Posted Jan 10, 2015 1:07 UTC (Sat) by moltonel (guest, #45207) [Link]

If the change can go unnoticed (in other words if the value is good enough) you can be sure that some naive programmer will use it for new work. Using bogomips was already "obviously wrong" 10 years ago, and programmers still used it.

If you want naive programmers to notice you need an even wrong-er value like "1" or "not implemented", but that'll raise compatibility red flags. But if those wrong-er values are configurable...

Haunted by ancient history

Posted Jan 8, 2015 18:16 UTC (Thu) by darwish07 (guest, #49520) [Link] (1 responses)

"people wonder when their shiny new CPU shows as having 6.0 bogomips. Even ARM systems are expected to be faster than that."

Thanks for the laugh!

Haunted by ancient history

Posted Jan 15, 2015 6:19 UTC (Thu) by Russ.Dill@gmail.com (guest, #52805) [Link]

As far back as I can remember on ARM (some 12-14 years), people have been complaining on mailing threads and IRC that there is something wrong with the bogomips value on new hardware X, or on new kernel version Y. And every time the answer has been the same. It's *bogus* mips: a bogus, meaningless interpretation of processor speed.

Haunted by ancient history

Posted Jan 10, 2015 8:30 UTC (Sat) by amonnet (guest, #54852) [Link] (3 responses)

Don't break userspace is a nice policy.

A well defined depreciation process could help in that.
Ex: this interface will disappear in 4.0; update your userspace before so that it doesn't break.

I'm sure it's better to have nice depreciation than an unmaintained interface.

(Apply this to drivers too!)

+++

Off topic aside about the confusing words "Deprecation" and "Depreciation"

Posted Jan 10, 2015 13:01 UTC (Sat) by tialaramex (subscriber, #21167) [Link] (2 responses)

Depreciation is when things are worth less over time, for example a normal family car will depreciate - you can't buy a mid-range Ford Mondeo, drive it for a couple of years and sell it for the same or more money. Much as we joke about "code rot" APIs do not depreciate.

Deprecation is when things are no longer approved, for example the gets() C library function is deprecated - there are very few cases where it could be possible to use it safely so it's better no-one uses it at all and modern compilers may warn if it is used.

Off topic aside about the confusing words "Deprecation" and "Depreciation"

Posted Jan 10, 2015 14:02 UTC (Sat) by amonnet (guest, #54852) [Link] (1 responses)

API may not depreciate ;-)
However, maintenance costs increase over time.

APIs may not rot. Implementations do.

Go ask the maintainers...

+++

Off topic aside about the confusing words "Deprecation" and "Depreciation"

Posted Jan 14, 2015 17:15 UTC (Wed) by k8to (guest, #15413) [Link]

NO CARRIER

Haunted by ancient history

Posted Jan 12, 2015 10:52 UTC (Mon) by imitev (guest, #60045) [Link]

Re- bogomips

>> Linus was unsympathetic

Unsympathetic sounds nice compared to his next post [1]. If the regression had simply been a matter of reverting the patch, one may understand the harsh language, but it seems it's not [2].
Being a/the prominent kernel dev doesn't prevent him from staying polite, especially with the various initiatives being pushed for friendlier open-source communities. That's really not the kind of language that will attract new kernel developers.

+1 for the "Civility within our community will continue to be a hot-button issue in 2015" prediction (which maybe was written in light of this specific thread).

[1] https://lkml.org/lkml/2015/1/4/148
[2] https://lkml.org/lkml/2015/1/5/267