Stable kernel 2.6.27.4

Stable kernel 2.6.27.4

Posted Oct 26, 2008 16:00 UTC (Sun) by nick.lowe (subscriber, #54609)
Parent article: Stable kernel 2.6.27.4

Sadly missing:

tcp: Restore ordering of TCP options for the sake of inter-operability

http://git.kernel.org/?p=linux/kernel/git/davem/net-2.6.git;...

The impact of which is that a machine running the 2.6.27 kernel is unable to access, depending on route taken, many hosts on the Internet.

For the unlucky few behind unmaintained/unpatched consumer routers that contain this bug, no TCP/IP services are accessible.

See:

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/264019
http://www.ubuntu.com/getubuntu/releasenotes/810
https://qa.mandriva.com/show_bug.cgi?id=43372
http://wiki.mandriva.com/en/2009.0_Errata#TCP_timestamps



Stable kernel 2.6.27.4

Posted Oct 26, 2008 17:42 UTC (Sun) by gregkh (subscriber, #8) [Link]

The -stable tree can only accept patches that are already in Linus's tree, so this patch will have to wait until then for inclusion.

Stable kernel 2.6.27.4

Posted Oct 26, 2008 18:21 UTC (Sun) by jake (editor, #205) [Link]

> The -stable tree can only accept patches that are already in Linus's tree,
> so this patch will have to wait until then for inclusion.

It looks like that patch *is* in Linus's tree, at least as of yesterday. Was there just a timing issue between when the patches went out for -stable review vs. when Linus merged that patch?

jake

Stable kernel 2.6.27.4

Posted Oct 26, 2008 18:38 UTC (Sun) by nick.lowe (subscriber, #54609) [Link]

And...

"Applied and I'll queue up for -stable, thanks!"

http://bugzilla.kernel.org/show_bug.cgi?id=11721#c61

Ubuntu planning on releasing with this as a known issue

Posted Oct 26, 2008 18:49 UTC (Sun) by nick.lowe (subscriber, #54609) [Link]

The scary thing for me, and the reason that I and others are making so much noise, is that Ubuntu are planning to release 8.10 with this as a known issue!

Ubuntu planning on releasing with this as a known issue

Posted Oct 26, 2008 18:53 UTC (Sun) by nick.lowe (subscriber, #54609) [Link]

Ubuntu planning on releasing with this as a known issue

Posted Oct 27, 2008 5:29 UTC (Mon) by jengelh (subscriber, #33263) [Link]

By definition they can't win.

Ubuntu planning on releasing with this as a known issue

Posted Oct 27, 2008 11:28 UTC (Mon) by Cato (subscriber, #7643) [Link]

Ubuntu are releasing a workaround via procps for Ubuntu 8.10 - see https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2640...

Ubuntu planning on releasing with this as a known issue

Posted Oct 27, 2008 22:29 UTC (Mon) by ncm (subscriber, #165) [Link]

"... and then they had two problems."

Stable kernel 2.6.27.4

Posted Oct 27, 2008 21:03 UTC (Mon) by dowdle (subscriber, #659) [Link]

The workaround for this is to use the sysctl fix. Isn't this an issue with bad firmware in routing appliances and not really Linux's problem?

Don't make me actually read all of those bug reports. :)
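For readers who want to see what "the sysctl fix" amounts to mechanically, here is a minimal sketch of reading and writing a sysctl through /proc/sys. The key name net.ipv4.tcp_timestamps is an assumption, guessed from the title of the Mandriva errata link earlier in the thread; the helper names are hypothetical, and the bug reports should be checked for the exact setting.

```python
from pathlib import Path


def sysctl_path(key, root="/proc/sys"):
    """Map a dotted sysctl key to its file under root (dots become slashes)."""
    return Path(root, *key.split("."))


def read_sysctl(key, root="/proc/sys"):
    """Return the current value of a sysctl as a stripped string."""
    return sysctl_path(key, root).read_text().strip()


def write_sysctl(key, value, root="/proc/sys"):
    """Set a sysctl; on a real system this needs root privileges."""
    sysctl_path(key, root).write_text(f"{value}\n")


# Assumed key for the workaround; not confirmed by the comment above.
WORKAROUND_KEY = "net.ipv4.tcp_timestamps"

if __name__ == "__main__":
    # Only reads the value; call write_sysctl(WORKAROUND_KEY, 0) as root to change it.
    print(WORKAROUND_KEY, "=", read_sysctl(WORKAROUND_KEY))
```

A setting written this way lasts only until reboot; a persistent workaround would normally go through the distribution's sysctl configuration instead.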

Stable kernel 2.6.27.4

Posted Oct 27, 2008 21:22 UTC (Mon) by nick.lowe (subscriber, #54609) [Link]

Absolutely; however, that is a very academic point ;)

In an ideal world, bad router models would be identified, revised software produced and pushed out to all affected devices... but we all know that isn't going to happen.

Linux loses nothing by reordering the TCP options to work around this, and it regains interoperability with the Internet.

ECN

Posted Oct 27, 2008 22:27 UTC (Mon) by ncm (subscriber, #165) [Link]

As a case in point, try enabling ECN ("explicit congestion notification") in your TCP stack ("echo 1 >/proc/sys/net/ipv4/tcp_ecn"), and then connecting to, say, the Southwest Airlines site. Even after seven years, some routers that drop all packets with those bits set haven't been patched.
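As a rough illustration of which bits are involved: under RFC 3168, a host requesting ECN sets the ECE and CWR flags in its initial SYN, and those two flags occupy bits of the TCP header that were previously reserved. The sketch below only computes the flags byte; the function names are made up for illustration and nothing here touches a real socket.

```python
# TCP header flag bits, lowest bit first; ECE and CWR were added by RFC 3168.
FIN, SYN, RST, PSH, ACK, URG, ECE, CWR = (1 << n for n in range(8))

FLAG_NAMES = ["FIN", "SYN", "RST", "PSH", "ACK", "URG", "ECE", "CWR"]


def syn_flags(ecn):
    """Flags byte of an initial SYN, with or without an ECN request."""
    return SYN | ECE | CWR if ecn else SYN


def describe(flags):
    """List the names of the flags set in a flags byte."""
    return [name for bit, name in enumerate(FLAG_NAMES) if flags & (1 << bit)]


if __name__ == "__main__":
    for ecn in (False, True):
        flags = syn_flags(ecn)
        print(f"ecn={ecn}: 0x{flags:02x} {describe(flags)}")
```

A device that treats the formerly reserved bits as an error sees the second SYN as malformed and drops it, which is why the connection never gets started.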

This whole episode exposes streaks of dangerous insanity in both the kernel and Ubuntu communities. What are the odds anybody involved will learn anything, or the right thing, from it?

ECN

Posted Oct 27, 2008 22:52 UTC (Mon) by agl (subscriber, #4541) [Link]

> This whole episode exposes streaks of dangerous insanity in both the kernel and Ubuntu communities.

Ubuntu will do what needs to be done to work around it, I'm sure. Ideally they would pull the patch into their kernel and push a revised kernel package. But there are many issues that I don't know about, I'm sure.

As for the kernel community, it doesn't say anything. I (yes, me) made this change and wasn't careful to keep the options in the same order because it's perfectly valid to emit options in any order and with any alignment. To be honest, the reordering was just a slip.

Now it happens that the broken hardware is, apparently, somewhat rare because the issue wasn't picked up in testing. Maybe hackers have an unrepresentative sample of networking hardware.

The correct thing to learn from this is a) option order matters and b) networking hardware sucks. I knew b) beforehand; a) is new to me, although given b) it isn't a surprise. There's now a big fat warning around that code not to reorder the options in the future.
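To make the point about ordering concrete, here is a minimal sketch of a TCP options parser that walks the kind/length/value encoding. The two orderings below are hypothetical and are not claimed to match what any kernel version emits; the values (MSS 1460, window scale 7, timestamp 12345) are made up. A parser written this way recovers the same options from either layout, whereas a device that expects options at fixed offsets would not.

```python
import struct

# TCP option kinds (RFC 793, RFC 1323, RFC 2018).
EOL, NOP, MSS, WSCALE, SACK_PERM, TIMESTAMP = 0, 1, 2, 3, 4, 8


def encode_option(kind, payload=b""):
    """Encode one option as kind, length, payload; NOP and EOL are one byte."""
    if kind in (EOL, NOP):
        return bytes([kind])
    return bytes([kind, 2 + len(payload)]) + payload


def parse_options(data):
    """Return {kind: payload} for an options field, skipping NOP padding."""
    options = {}
    i = 0
    while i < len(data):
        kind = data[i]
        if kind == EOL:
            break
        if kind == NOP:
            i += 1
            continue
        if i + 1 >= len(data):
            raise ValueError("truncated option")
        length = data[i + 1]
        if length < 2 or i + length > len(data):
            raise ValueError("bad option length")
        options[kind] = data[i + 2:i + length]
        i += length
    return options


mss = encode_option(MSS, struct.pack("!H", 1460))
sack = encode_option(SACK_PERM)
ts = encode_option(TIMESTAMP, struct.pack("!II", 12345, 0))
ws = encode_option(NOP) + encode_option(WSCALE, bytes([7]))

order_a = mss + sack + ts + ws  # one hypothetical layout
order_b = mss + ts + ws + sack  # same options, different order

if __name__ == "__main__":
    print(order_a.hex())
    print(order_b.hex())
    print(parse_options(order_a) == parse_options(order_b))
```

Both layouts are 20 bytes and differ on the wire, yet parse to the same set of options, which is the sense in which any order is valid.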

Speaking of "dangerous insanity" in the kernel community is incorrect and a little perplexing.

ECN

Posted Oct 27, 2008 23:42 UTC (Mon) by ncm (subscriber, #165) [Link]

Anybody can make a mistake.

The insanity shows up in imagining that a mention in release notes suffices in place of a fix. It shows up in resolving not to fix it because it's "too close" to release time, instead of delaying the release, if necessary, or patching without a delay. It shows up in adding a fragile, messy, and performance-damaging "procps workaround" instead of applying the simple, well-understood patch. It shows up in pretending that it's somebody else's bug.

Yes, it's their bug, but since 2.6.27 it's our bug too. Their bug is not following RFCs. Ours is failing to deliver packets routed through widely used hardware. Ours is objectively worse.

The lessons to be learned do not include anything about TCP option order, nor about networking hardware, but are all about responding sensibly to real problems. For the record, I have no complaints about agl's direct response, only the administrative and distributional foofaraw.

ECN

Posted Oct 28, 2008 0:04 UTC (Tue) by dlang (✭ supporter ✭, #313) [Link]

reading the post above yours it sounds to me like they did follow the RFCs.

the RFCs state that the options can be in any order.

unfortunately some routers don't follow the RFCs and are sensitive to the order.

since that sensitivity isn't known to the developers of the code (and didn't show up in testing), how could they be in the wrong? or put it another way, what should they have done differently to have detected this before it was released?

the fact that Ubuntu may not pick this up in their initial release is a separate issue.

unfortunately this is one of the problems with a time-based release. the 2.6.27 kernel got delayed a bit, and so there was less than the expected amount of testing time between its release and the ubuntu release. I can understand their reluctance to tweak the kernel. it may be that they end up including a -stable kernel instead of the base kernel, but that is going to be a matter of their internal decision process.

can you imagine the flames that they would get if they changed their kernel a week before release date (skipping some of their testing) and found that there were other, bigger problems in the new kernel?

ECN

Posted Oct 28, 2008 0:11 UTC (Tue) by ncm (subscriber, #165) [Link]

"They", who did not follow the RFCs, refers to whoever coded the routers.

Delaying the release, or going with a better-tested, older kernel, are both sane options. "Release Candidate" really shouldn't be considered the same as "release version", and problems found in one really should be fixed properly before release. Otherwise what's the point? Delays are embarrassing, but that's because bugs are embarrassing. Bugs are even more embarrassing if you fail to fix them.

More proof that the lessons won't be learned.

ECN

Posted Oct 28, 2008 0:42 UTC (Tue) by dlang (✭ supporter ✭, #313) [Link]

which 'They' are you referring to?

the 'They' who introduced and didn't find the bug followed the RFCs

the 'they' who are doing the ubuntu release need to balance the problems that this problem can cause with all the fixes that are in the 2.6.27 release (including the support for some very common wireless hardware)

you say that they should go with a 'well tested older release', but if they didn't do the testing of that older release is it really any better than the current release? or than going with the untested -stable release?

ECN

Posted Oct 28, 2008 1:01 UTC (Tue) by nick.lowe (subscriber, #54609) [Link]

Sorry, I do have to question:

"balance the problems that this problem can cause with all the fixes that are in the 2.6.27 release"

What!?

I would be fascinated to know what you think "the problems that this problem can cause" are?

The commit is http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-...

Yours in anticipation! :)

ECN

Posted Oct 28, 2008 2:06 UTC (Tue) by ncm (subscriber, #165) [Link]

Please read the posting you are replying to. 'They' is who I said it was: "whoever coded the routers".

Considering the details of the patch, the additional testing needed to verify a patched kernel is minimal -- certainly less than what was needed to verify the cheesy procps workaround with the present kernel.

ECN

Posted Oct 28, 2008 1:33 UTC (Tue) by nick.lowe (subscriber, #54609) [Link]

"unfortunantly this is one of the problems with a time-based release"

I couldn't agree more! My main objection to all of this is Ubuntu's definition of RC, which seems to be Release ComeHellOrHighWater.

The release date is set in stone in the version number (200)8.10. To me, it is an absurd way to develop software, the date trumping release quality.

ECN

Posted Oct 28, 2008 1:53 UTC (Tue) by dlang (✭ supporter ✭, #313) [Link]

you need to be fair here.

Ubuntu has delayed a release in the past, so the release date isn't as fixed as you make it out to be.

also, the debian model of "don't release it until there are no known bugs" is also known to have its own problems (releases that can be delayed for years)

since you feel very strongly about this, you really should be telling the ubuntu folks your opinion, not just posting it on a third party website.

ECN

Posted Oct 28, 2008 1:57 UTC (Tue) by ncm (subscriber, #165) [Link]

Nothing but embarrassment keeps them from releasing 8.10 in November. The insanity lies not in picking a name that implies a target date, but in letting embarrassment at missing the target date trump release quality.

ECN

Posted Oct 28, 2008 8:48 UTC (Tue) by njs (subscriber, #40338) [Link]

...But they're not compromising release quality. As mentioned upthread, in a comment you replied to before writing this comment, they've already released a workaround -- one that was less risky than re-rolling their whole kernel package, and that had already been tested to fix the problem.

You know I respect you a lot, but in this thread, it feels almost like some need to find things to be angry about is overpowering your technical judgement. Hope everything's alright...

ECN

Posted Oct 28, 2008 19:01 UTC (Tue) by ncm (subscriber, #165) [Link]

Not angry, disgusted. I read through the whole launchpad thread, and was exposed to the denialism you must have missed.

I agree that 8.10 with the procps workaround will work OK on low-speed links. But the workaround is enormously more complicated, and has much broader effect, than the patch, so would need more testing, not less. And how will the workaround ever be got rid of?

My brother runs Ubuntu on his laptop, and brings it to me for fixups. The last upgrade (F -> G) was such a mess that I haven't encouraged him to bring it in for H. I certainly wouldn't suggest he do it himself.

ECN

Posted Oct 28, 2008 5:18 UTC (Tue) by jamesh (subscriber, #1159) [Link]

There will always be one more thing that'd be nice to fix before a release, but you need to stop at some point if you want to make a release. If you are going to freeze things to make releases, you can do worse than regular time-based releases (since they give a good indication of when features that miss the release will be made available).

In this case, a problem was identified with two possible solutions: disable the problem code or fix it. While in the long term disabling the code is the wrong solution, it must have been deemed to be the less risky choice in the context of the upcoming release (i.e. which change is less likely to introduce more bugs). So, the choice made is actually related to release quality.

You'll see the same sort of decisions made elsewhere, picking a low risk solution for the short term and a correct solution for the long term. For example, disabling the dynamic ftrace feature in the 2.6.27 series rather than merging a proper fix.

Copyright © 2012, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds