
Avoiding the OS abstraction trap

Posted Aug 18, 2011 9:14 UTC (Thu) by dmag (guest, #17775)
In reply to: Avoiding the OS abstraction trap by Np237
Parent article: Avoiding the OS abstraction trap

> Many large-scale open source projects have found elegant solutions to similar problems

Most plug-in APIs (like browser plug-ins) have a small 'surface area'. That is not true of kernel drivers: there is a massive number of internal APIs and data structures, and trying to keep backward compatibility for all of them would slow development.

For example, if the kernel could be made faster by eliminating the "jiffies" variable, should we delay that change because someone somewhere _may_ have a proprietary driver that _might_ use it?
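To make that concrete, here is a minimal sketch (the device, register and function names are all made up) of the kind of driver code that is entangled with jiffies today; an internal change like removing or renaming it touches every such call site:

    #include <linux/io.h>
    #include <linux/jiffies.h>
    #include <linux/errno.h>

    /* Hypothetical device, for illustration only. */
    struct my_dev {
    	void __iomem *status_reg;
    };

    /* Poll a (made-up) ready bit until it is set or ~100 ms of jiffies pass. */
    static int my_dev_wait_ready(struct my_dev *dev)
    {
    	unsigned long deadline = jiffies + msecs_to_jiffies(100);

    	while (!(readl(dev->status_reg) & 0x1)) {
    		if (time_after(jiffies, deadline))
    			return -ETIMEDOUT;
    		cpu_relax();
    	}
    	return 0;
    }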

> They let several APIs coexist for reasonable timeframes

Either you refactor the API or you don't. If your code is required to stay backward compatible for some time, then you can't _really_ refactor it. You can only "create a new API, shuffle things around a little bit." You are in a straitjacket and prevented from making radical simplifying changes.

> forcing people to migrate to new APIs after 1 or 2 years.

That implies forcing people to keep their radical patches for 1 or 2 years.

> I’m not asking for refactorings to wait for 3 years.

Well, you just said 2. Even 1 year is too long in my book.

> I fail to see in what way the kernel would need to be different.

If an internal API is deemed to be defective, you are saying it would have to be supported for years. Today, it is simply deleted and rewritten, so the defective API can no longer cause problems. Much simpler, no maintenance burden. (For examples, read the excellent articles here on how the RCU API has changed through the years.)
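For readers wondering what "the RCU API" looks like in practice, a minimal sketch of today's read side (the struct and variable names here are invented); the update side, call_rcu(), synchronize_rcu() and the many API variants are exactly the parts that have been reworked repeatedly over the years, with in-tree users updated along with them:

    #include <linux/rcupdate.h>

    /* Hypothetical RCU-protected configuration, for illustration only. */
    struct config {
    	int value;
    };

    static struct config __rcu *active_config;

    static int read_config_value(void)
    {
    	struct config *cfg;
    	int val = -1;

    	rcu_read_lock();
    	cfg = rcu_dereference(active_config);
    	if (cfg)
    		val = cfg->value;
    	rcu_read_unlock();

    	return val;
    }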

You're asking the kernel developers to take on a *lot* more maintenance burden. But what do they get in return?



Avoiding the OS abstraction trap

Posted Aug 18, 2011 9:27 UTC (Thu) by Np237 (subscriber, #69585) [Link]

To claim that the current kernel development model means fewer resources is a fallacy. Compared to other open source projects of similar size (KDE, GNOME, Mozilla, LibreOffice), the Linux kernel involves around 10 times more developers per kloc. The very reason for requiring so many people is that the current model IS the burden. Every time you rewrite the SATA stack or the USB one, it involves an insane amount of work to update every single driver.

1 year is too long in your book? Sorry, but that’s easy for you to say, living in a pure-developer world where you can break everything every 2 weeks. There’s also a world out there where computers need to just work, and to keep doing the same thing for decades.

Avoiding the OS abstraction trap

Posted Aug 18, 2011 9:56 UTC (Thu) by dmag (guest, #17775) [Link]

> To claim that the current kernel development model means less resources is a fallacy.

Well, I guess I (and the kernel developers) will have to agree to disagree.

> Compared to other open source projects of similar size (KDE, GNOME, Mozilla, LibreOffice)

It's not news that an Operating System Kernel requires more work than a Word Processor or a Browser. Want to bet that Linux supports more CPU architectures than your Word Processor supports file formats?

> Every time you rewrite the SATA stack or the USB one, it involves an insane amount of work to update every single driver.

Heh. It only looks that way on TV. Real programmers write programs to do the programming for them.

http://lwn.net/Articles/315686/

> There’s also a world out there where computers need to just work and to do the same thing for decades.

Exactly. That's the job of the distributors. They fork the kernel and support it for as long as people will pay for it.

Avoiding the OS abstraction trap

Posted Aug 25, 2011 0:25 UTC (Thu) by chip (subscriber, #8258) [Link]

> To claim that the current kernel development model means fewer resources is a fallacy. Compared to other open source projects of similar size (KDE, GNOME, Mozilla, LibreOffice), the Linux kernel involves around 10 times more developers per kloc.

I don't know if that statistic is correct, but if it is, I'd attribute that to the difficulty of developing a kernel vs. developing an application. I've done both, at least a bit -- I identified and fixed an SMP-specific dentry race condition in NFS, and wrote the boot-time DHCP support -- so I'm not just guessing.

From The Tao of Programming, 3.3:

There was once a programmer who was attached to the court of the warlord of Wu. The warlord asked the programmer: "Which is easier to design: an accounting package or an operating system?"

"An operating system," replied the programmer.

The warlord uttered an exclamation of disbelief. "Surely an accounting package is trivial next to the complexity of an operating system," he said.

"Not so," said the programmer, "when designing an accounting package, the programmer operates as a mediator between people having different ideas: how it must operate, how its reports must appear, and how it must conform to the tax laws. By contrast, an operating system is not limited by outside appearances. When designing an operating system, the programmer seeks the simplest harmony between machine and ideas. This is why an operating system is easier to design."

The warlord of Wu nodded and smiled. "That is all good and well, but which is easier to debug?"

The programmer made no reply.

kloc

Posted Aug 25, 2011 0:31 UTC (Thu) by chip (subscriber, #8258) [Link]

Let us also consider the possibility that more effort per kloc means they're working harder to make better code.

Writing big code is easy! Writing small yet good code is much harder.

kloc

Posted Aug 25, 2011 4:41 UTC (Thu) by dlang (subscriber, #313) [Link]

When was the last time anyone found a million lines of dead code in the kernel (like what was removed by LibreOffice from the OpenOffice.org codebase)?

How many of the 'easy' office suites that existed 10-20 years ago are still in use today?

Avoiding the OS abstraction trap

Posted Aug 18, 2011 10:26 UTC (Thu) by fuhchee (guest, #40059) [Link]

"But what do they get in return?"

Perhaps more drivers from companies who cannot afford to chase upstream, but who'd be willing to produce mediocre/working ones for a stabler API.

This is the problem, not a solution...

Posted Aug 18, 2011 15:43 UTC (Thu) by khim (subscriber, #9252) [Link]

Yeah, "more crappy drivers" is real problem, and unstable ABI solves this problem quite handily, so what are the advantages?

It's actually better not to have any drivers rather then have crappy drivers - not just long-term, but also a medium-term. If there are no drivers then someone will write them or hardware will just be abandoned, but crappy drivers make user's life miserable for a long, long time.

This is the problem, not a solution...

Posted Aug 18, 2011 15:47 UTC (Thu) by Np237 (subscriber, #69585) [Link]

Of course, this is why we have such excellent drivers for wi-fi and graphics cards, and the ones with crappy drivers have all been abandoned.

Wait…

If you tried irony then you failed. Utterly and completely.

Posted Aug 18, 2011 16:11 UTC (Thu) by khim (subscriber, #9252) [Link]

WiFi is actually a pretty good example. We used to have a bunch of crappy WiFi drivers, but they were mostly abandoned and replaced.

As for video drivers... well, the vendors decided that they absolutely, positively need to produce crap - and that's why video drivers are the crappiest drivers on every platform. I think open-source drivers will eventually fix the problem, but the process is slow going because most vendors are not just indifferent - they actively hurt these efforts.

If you tried irony then you failed. Utterly and completely.

Posted Aug 19, 2011 22:06 UTC (Fri) by zlynx (subscriber, #2285) [Link]

Yeah.

And for years (2004-2008) I used ndiswrapper and the WINDOWS wireless driver for my laptop because the Linux drivers either didn't work, crashed, broke suspend or changed their behavior. This seemed to vary with each kernel release.

As for the open source video drivers, for most of them I don't believe the problem is vendors hurting the effort. AMD and Intel have been supportive. I believe the problem is just not enough resources.

It needs a massive test effort that will verify correctness in all major OpenGL applications on each card version. It needs exhaustive OpenGL test suites that exercise all the code paths. That's a hundred test machines right there.

It needs dozens of developers who analyze OpenGL use by software and determine how to optimize the code. The commercial drivers often have optimized code paths for specific applications.

If you want to match Nvidia binary drivers, you have to do those things. It's expensive. Really expensive.

These are totally different issues...

Posted Aug 20, 2011 10:07 UTC (Sat) by khim (subscriber, #9252) [Link]

> And for years (2004-2008) I used ndiswrapper and the WINDOWS wireless driver for my laptop because the Linux drivers either didn't work, crashed, broke suspend or changed their behavior. This seemed to vary with each kernel release.

Yup, but this period stretched on for so long because "use ndiswrapper" was the standard answer, and thus the native drivers were poorly tested and few developers spent time fixing them. This is an obvious trade-off: a stable ABI makes life easier for users short-term but does not improve the situation all that much long-term, yet it makes developers' lives miserable both short-term and long-term... and kernel developers rarely accept any decision which promises tiny short-term relief at the price of huge long-term pain. That's why Linux is still around and a lot of other OSes are not.

> It needs dozens of developers who analyze OpenGL use by software and determine how to optimize the code.

I couldn't care less about "optimized code paths". The fact that all these complex and fragile compilers are in the driver is madness. They don't belong there! Just like format converters for webcams don't belong there. The problems with drivers I do care about are the things the driver should do: set up the video mode, handle DMA, etc. And there are two problems with these needs:
1. There is no documentation: AMD's is probably the best, Intel's is slightly worse, NVidia has nothing at all.
2. These sequences change between releases (and sometimes between steppings) for no good reason.

Note that problem #2 is significantly worse, and it exists entirely because Windows offers a stable ABI for driver writers. This means that decisions like "save 0.01% of the transistor budget and fix all the crap in drivers" look viable. That then leads to problems in both Windows and Linux, but as long as the thing more-or-less works in Windows the chip can be sold.

> If you want to match Nvidia binary drivers, you have to do those things. It's expensive. Really expensive.

It's not hard to match the NVidia drivers' quality. They may offer fast 3D in games - but I don't care about that. I do care about silly things like suspend, thermal monitoring, etc. - and there the NVidia drivers are awful. Open-source drivers are better in this regard - but just barely; they are still awful, nothing like the [currently quite stable] WiFi drivers.

These are totally different issues...

Posted Aug 20, 2011 19:30 UTC (Sat) by BenHutchings (subscriber, #37955) [Link]

> This means that decisions like "save 0.01% of the transistor budget and fix all the crap in drivers" look viable.

The decision is more likely "don't delay tape-out to fix this hardware bug". And it has nothing to do with the ability to hide the workarounds in a proprietary driver. Like programs, all chips have bugs, and you can see workarounds for them in almost any Linux driver you care to look at.
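For anyone unfamiliar with what those workarounds look like, a minimal sketch of their typical shape (every name and the erratum itself are invented for illustration):

    #include <linux/io.h>
    #include <linux/types.h>

    #define MYDEV_CTRL_DMA_EN		0x1
    #define MYDEV_CTRL_WRITE_COMBINE	0x2

    /* Hypothetical device, for illustration only. */
    struct my_dev {
    	void __iomem	*ctrl_reg;
    	u8		 revision;
    };

    static void my_dev_enable_dma(struct my_dev *dev)
    {
    	u32 ctrl = MYDEV_CTRL_DMA_EN;

    	/*
    	 * Made-up erratum: revision 0x02 corrupts transfers when write
    	 * combining is enabled, so leave the fast path off there.
    	 */
    	if (dev->revision != 0x02)
    		ctrl |= MYDEV_CTRL_WRITE_COMBINE;

    	writel(ctrl, dev->ctrl_reg);
    }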

I'm not talking about bugs...

Posted Aug 21, 2011 8:19 UTC (Sun) by khim (subscriber, #9252) [Link]

A lot of drivers contain workarounds for buggy hardware. I'm not talking about those. I'm talking about the fact that each new generation of GPU chips has totally different initialization sequences, and even basic operations often require a totally different driver.

AFAIK video drivers are more-or-less unique in this regard. An 802.11n WiFi card has a driver almost identical to what an 802.11b card had - even if the chip internals are wildly different. Yes, I know, GPUs are significantly more complicated... well, they are not more complicated than CPUs, and those can reuse older software just fine. About the only piece of silicon comparably buggy was the winmodem - and those, too, were brought to life by the promise of papering over hardware problems in software.

These are totally different issues...

Posted Aug 21, 2011 18:37 UTC (Sun) by Np237 (subscriber, #69585) [Link]

Yup, you point out the very problem. Once you care about speed (and at work, we do, a lot), you’re basically screwed. You’re forced to buy nVidia since they are the only ones with fast Linux drivers, and you get a shitload of bugs and no support for modern desktops.

This is the problem, not a solution...

Posted Aug 18, 2011 15:47 UTC (Thu) by fuhchee (guest, #40059) [Link]

"Yeah, "more crappy drivers" is real problem" ...

You are misparaphrasing me. Mediocre/working can be good enough for users. There exist other popular operating systems that demonstrate this quite well.

What "other popular operating systems" are you talking about?

Posted Aug 18, 2011 16:26 UTC (Thu) by khim (subscriber, #9252) [Link]

If you are talking about Windows, then it's an unmitigated disaster: driver voodoo is a huge part of that culture - and it only exists because neither users nor hardware vendors can abandon this crazy platform. It's because it has a monopoly and a lot of software only works under Windows - not because it's such fun to use. Often drivers only work for one particular version of the OS (Windows 98 or XP or whatever), and even then they are a huge PITA.

And all the other platforms have died for lack of drivers: such diverse OSes as Solaris, Symbian and many, many others were abandoned and replaced with Linux because Linux has the most drivers available.

Think of Android: it's a brand-new OS and we all know Google does not like the GPL - yet it uses Linux, not NetBSD or something like that. Why? Because they needed a "bag of drivers" and Linux was the only sane choice.

Or think of RIM. They have chosen to switch to QNX. Well, QNX is a good OS, but it does use your favored "stable ABI" driver model... and surprisingly enough this means it does not have as many drivers as Linux, and the ones it does have are often worse. This means RIM's hardware selection is limited, and that means RIM is already dead: it's a question of "when", not a question of "if".

