
AMD's Display Core difficulties

By Jonathan Corbet
December 13, 2016
Back in 2007, the announcement that AMD intended to reverse its longstanding position and create an upstream driver for its graphics processors was joyfully received by Linux users worldwide. As 2017 approaches, an attempt by AMD to merge a driver for an upcoming graphics chip has been rejected by the kernel's graphics subsystem maintainer — a decision that engendered rather less joy. A look at this discussion reveals a pattern seen many times before; the positions and decisions taken can seem arbitrary to the wider world but they are not without their reasons and will, hopefully, lead to a better kernel in the long run.

A quick timeline

Back in February, Harry Wentland posted a patch set adding the new "DAL display driver" to the AMD GPU subsystem; this driver, he said, would "allow us going forward to bring display features on the open amdgpu driver (mostly) on par with the Catalyst driver." It was not a small patch, adding 279 new source files containing nearly 94,000 lines of code. That code saw little public discussion and was never merged, though it has become clear that some private discussion took place.

In March, Alex Deucher proposed that DAL should be merged, saying that it was to be the core of the new display stack; his goal was to get it into the 4.7 release. Graphics maintainer Dave Airlie made it clear that this was not going to happen, saying: "I think people are focusing on the minor comments and concerns and possibly deliberately ignoring the bigger concern that this code base is pretty much unmergeable as-is." His biggest complaint had to do with the overall design, which involved a lot of abstraction code that tried to hide the details of working with the kernel from the rest of the code. Others echoed his concerns and, indeed, the code was not merged for 4.7 or any other kernel released since then.

The current chapter began on December 7, when Wentland posted an RFC note saying that this code (now going by the name "display core") was needed to support the next generation GPU coming out of AMD. The company, he said, has based all of its efforts on the display core code, using it as the foundation for all of its quality-assurance work, its OEM preloads, and more. And, he noted: "DC behavior mirrors what we do for other OSes". That last point is important; the display core code helps the company maintain the driver across multiple operating systems by hiding a lot of system-specific interfaces.

This time, Daniel Vetter complained about the abstraction layers in the code and described why they were not acceptable in the mainline kernel. Airlie responded more strongly, saying that this code would not be merged in its current form:

Given the choice between maintaining Linus' trust that I won't merge 100,000 lines of abstracted HAL code and merging 100,000 lines of abstracted HAL code I'll give you one guess where my loyalties lie.

As one might expect, a certain amount of back-and-forth resulted; the AMD developers were not pleased by this response. It can only have felt like a slap to a group of developers who were trying to do the right thing by getting support for their hardware into the mainline kernel. Even so, they stayed with the discussion, which remained almost entirely civil, and which, in the end, seems to be leading to a viable path forward.

The trouble with midlayers

There are a number of complaints with the AMD driver code; it is not often that tens of thousands of lines of new code are free of problems. But the biggest issue has to do with the midlayer architecture. A midlayer, as its name suggests, sits between two other layers of code, typically with the purpose of hiding those outer layers from each other. In this case, for example, the display core code tries to hide the details of low-level hardware access, allowing the upper-layer driver to run on any operating system.
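To make the pattern concrete, here is a minimal, hypothetical sketch of what such an abstraction looks like; none of these names come from the actual DAL/DC code. The core driver calls only an abstract table of "OS services", so the same core can be linked against a backend for any operating system:

    #include <stddef.h>
    #include <stdint.h>

    /* The midlayer: an abstract "OS services" table. On Linux, these
     * callbacks would wrap kmalloc(), readl(), writel(), and msleep();
     * another backend would wrap the equivalent Windows routines. */
    struct os_services {
        void *(*alloc)(size_t size);
        void (*free)(void *ptr);
        uint32_t (*read_reg)(uint32_t offset);
        void (*write_reg)(uint32_t offset, uint32_t value);
        void (*sleep_ms)(unsigned int ms);
    };

    /* Core display code sees only the table, never the kernel. */
    void dc_enable_output(const struct os_services *os, uint32_t base)
    {
        os->write_reg(base + 0x04, 0x1);            /* made-up enable bit */
        while (!(os->read_reg(base + 0x08) & 0x1))  /* made-up status bit */
            os->sleep_ms(1);
    }

The appeal for a cross-platform vendor is obvious: code like dc_enable_output() compiles unchanged on every operating system. The cost, as described below, is that none of it looks like a normal Linux driver.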

The kernel community has long experience with midlayers, and that experience is generally seen as bad. For an extensive discussion of the midlayer pattern in the kernel, this 2009 article from Neil Brown is recommended reading. A midlayer architecture can bring a whole raft of long-term maintainability issues in general; the graphics developers are also concerned about some more specific issues.

The idea behind AMD's midlayer is to abstract out the Linux-specific details in the driver. That may be desirable for somebody trying to maintain a cross-platform driver; it also helps the AMD developers get the Linux driver working before the hardware engineers move on to the next product and stop answering questions. But code structured in this way is painful for people trying to maintain the Linux kernel. Understanding higher-level code becomes harder when that code does not follow the normal patterns used by graphics drivers; that can be hard for maintenance in general, but it can become a severe impediment to any sort of refactoring work. As Airlie put it:

If I'd given in and merged every vendor coded driver as-is we'd never have progressed to having atomic modesetting, there would have been too many vendor HALs and abstractions that would have blocked forward progression. Merging one HAL or abstraction is going to cause pain, but setting a precedent to merge more would be just downright stupid maintainership.

A hardware abstraction layer must be maintained to meet the needs of code for other operating systems — code that the Linux kernel developers cannot see (and probably don't want to). In effect, that means that nobody outside of the originating company can touch the midlayer code, making community maintenance impossible. If members of the community do try to patch the midlayer — often to remove code that, from the kernel's perspective, is redundant — they will run afoul of the driver maintainers, who may well try to block the work. If they are successful in doing so, the result is code in the community kernel that is essentially off-limits for community developers.

Functionality placed in a midlayer, rather than in common code, has a high likelihood of developing its own behavioral quirks. As a result, drivers using the midlayer will behave differently from other drivers for similar hardware, often in subtle ways. That creates pain for application developers, who no longer have a single driver interface to work with.

A midlayer will also tend to isolate its developers from the common core code. The midlayer will be fixed and improved, often to work around shortcomings in the common layer, rather than improving the common layer itself. Kernel developers would rather see that effort going into the common code, where it benefits all users rather than one driver in particular. Developers who work on this support code have a say in the direction it takes, while developers who work on a midlayer generally do not. So things tend to diverge further over time, with the driver developers feeling that the core is not developed with their needs in mind.
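For contrast, here is an equally hypothetical sketch of the structure kernel developers prefer: the driver supplies a table of hardware-specific callbacks to the subsystem, and shared helper code drives the hardware through it, so a policy fix in the helpers benefits every driver at once. The names imitate the DRM style but are invented for illustration:

    #include <stdio.h>

    struct my_mode { int hdisplay, vdisplay, clock_khz; };
    struct my_crtc { int id; };

    /* The driver supplies only hardware-specific callbacks... */
    struct display_funcs {
        void (*mode_set)(struct my_crtc *crtc, const struct my_mode *mode);
        void (*enable)(struct my_crtc *crtc);
    };

    static void my_hw_mode_set(struct my_crtc *c, const struct my_mode *m)
    {
        printf("crtc %d: %dx%d @ %d kHz\n",
               c->id, m->hdisplay, m->vdisplay, m->clock_khz);
    }

    static void my_hw_enable(struct my_crtc *c)
    {
        printf("crtc %d enabled\n", c->id);
    }

    static const struct display_funcs my_hw_funcs = {
        .mode_set = my_hw_mode_set,
        .enable   = my_hw_enable,
    };

    /* ...while the shared helper owns the policy: ordering, locking,
     * and so on live here, not in the driver, so all drivers using
     * the helper behave the same way. */
    static void helper_commit(const struct display_funcs *funcs,
                              struct my_crtc *crtc,
                              const struct my_mode *mode)
    {
        funcs->mode_set(crtc, mode);
        funcs->enable(crtc);
    }

    int main(void)
    {
        struct my_crtc crtc = { .id = 0 };
        struct my_mode mode = { 1920, 1080, 148500 };
        helper_commit(&my_hw_funcs, &crtc, &mode);
        return 0;
    }

The difference from the midlayer sketch above is the direction of the calls: the common code calls down into the driver, rather than vendor code calling up through a private abstraction.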

Finally, midlayer-based code has a certain tendency to get stuck on older kernel versions; indeed, the current display core code is still based on 4.7. That makes it hard to maintain as the kernel goes forward. In this case, Vetter summarized this aspect of the problem by saying: "I think you don't just need to demidlayer DAL/DC, you also need to demidlayer your development process." Code intended for the mainline needs to be developed and tested against the current mainline, or it will inevitably fall behind.

The way forward

Regardless of how one views the odds of seeing the Year of the Linux Desktop in the near future, it seems certain that those odds can only be worse in the absence of AMD GPU drivers. The blocking of such a driver — otherwise on track to be in the mainline before the hardware ships — thus looks like a step backward for a subsystem that has already struggled to gain support for current hardware.

Chances are, though, that this standoff will be resolved more quickly than people might expect. The AMD developers were not unaware of the problems and, it seems, not unwilling to fix them. Deucher said:

What I really wanted to say was that this was an RFC, basically saying this is how far we've come, this is what we still need to do, and here's what we'd like to do. This was not a request to merge now or an ultimatum.

Some work has indeed been done since the initial posting of this patch set and, it is said, about one-third of the midlayer code is now gone. Vetter made it clear that this work had been seen and appreciated:

I guess things went a bit sideways by me and Dave only talking about the midlayer, so let me first state that the DC stuff has massively improved through replacing all the backend services that reimplemented Linux helper libraries with their native equivalent. That's some serious work, and it shows that AMD is committed to doing the right thing.

The code that had been removed so far is, naturally enough, the easiest third to take care of; getting rid of the rest of the midlayer will require some significant refactoring of the code. Vetter provided a roadmap for how that work could be done; Wentland and AMD developer Tony Cheng agreed that the path seemed workable. Wentland acknowledged that things could have been done better at AMD, saying: "We really should've started DC development in public and probably would do that if we had to start anew." Actually getting all that work done may take a while; it must compete with other small tasks like making the driver actually work for existing and upcoming hardware.

One might conclude that what we are really seeing here is a negotiation over just how much of this work must be done before the code is deemed good enough that the rest of the fixes can be done in the mainline. Maintainers tend to worry about giving way in such situations because, once they have merged the code, they have given up their strongest leverage and can find that the developers become surprisingly unmotivated to finish the job. Arguably, that is a relatively unlikely outcome here; AMD has been trying to improve its upstream development for nearly a decade and its developers know what the rules are.

The most likely outcome, thus, is that this driver will be delayed for perhaps a few more development cycles while the worst problems are taken care of and some sort of convincing story exists for the rest. Then it will be merged as a welcome addition to mainline support for current hardware, and users worldwide will have something to celebrate. The Year of the Linux Desktop, unfortunately, may be a little slower in coming.

Index entries for this article
Kernel: Development model
Kernel: Device drivers/Graphics



AMD's Display Core difficulties

Posted Dec 13, 2016 20:29 UTC (Tue) by tshow (subscriber, #6411) [Link] (43 responses)

It would be interesting for LWN to have a regular "State of Linux Graphics" article; I'd kind of like to know how long it will be before I can work on Vulkan & Wayland programs on a standard machine.

AMD's Display Core difficulties

Posted Dec 13, 2016 21:56 UTC (Tue) by Cyberax (✭ supporter ✭, #52523) [Link] (35 responses)

Phoronix is pretty good at that.

AMD's Display Core difficulties

Posted Dec 14, 2016 0:02 UTC (Wed) by Felix (subscriber, #36445) [Link] (25 responses)

Well, Phoronix has broad coverage, but it just throws out the "news" as fast as possible without much (any?) editing by someone with a solid technical background. Phoronix posts almost never include a broader view of the situation or a deeper technical explanation.

Also, one has to remember that while the numbers and benchmarks reported by Phoronix might be technically correct, you must not jump to conclusions, because the test may have compared apples to oranges (case in point: the recent post about "network performance" comparing Linux and BSD).

AMD's Display Core difficulties

Posted Dec 14, 2016 0:09 UTC (Wed) by Cyberax (✭ supporter ✭, #52523) [Link] (4 responses)

Michael Larabel (their main editor) tracks lots of sources, including git repos, blogs, and mailing lists, so the site is pretty good at giving a heads-up about something interesting. If you read the sources linked from Phoronix articles you'll get a pretty good picture of the current state.

And I actually find the Phoronix benchmarks extremely useful. They have uncovered surprising performance regressions more than once. And BSD vs. Linux comparisons are entirely fair game.

AMD's Display Core difficulties

Posted Dec 14, 2016 9:44 UTC (Wed) by ovitters (guest, #27950) [Link] (2 responses)

The Phoronix benchmarks usually lack a summary or conclusion. The analysis is left to the reader, meaning: read the entire benchmark, split across many pages, to see if there's something strange. I stopped reading those benchmarks as a result.

AMD's Display Core difficulties

Posted Dec 14, 2016 11:31 UTC (Wed) by FLHerne (guest, #105373) [Link] (1 responses)

If you subscribe, you get them all on one page.

(still no analysis besides 'X was faster than Y').

AMD's Display Core difficulties

Posted Dec 16, 2016 0:17 UTC (Fri) by JanC_ (guest, #34940) [Link]

Also see OpenBenchmarking.org for test results…

AMD's Display Core difficulties

Posted Dec 14, 2016 21:17 UTC (Wed) by ballombe (subscriber, #9523) [Link]

My problem with Phoronix is that there are almost no links to external sources, outside the occasional link to LKML or the Linux git repository.

AMD's Display Core difficulties

Posted Dec 14, 2016 9:52 UTC (Wed) by renox (guest, #23785) [Link] (19 responses)

How is comparing Linux and BSD network performance comparing apples to oranges?
Please explain: both ran the same series of tests.
Without a sensible explanation your post looks like a "fanboy post" disappointed that Linux lost most of the tests.

AMD's Display Core difficulties

Posted Dec 14, 2016 11:50 UTC (Wed) by Felix (subscriber, #36445) [Link] (18 responses)

> Without a sensible explanation your post looks like a "fanboy post" disappointed that Linux lost most of the tests.

I'm sorry that it came across like a "fanboy post". While I prefer Linux over any BSD, I can absolutely live with any of the BSDs being "better". After all, I think this is not about "losing". FreeBSD can have a much better networking stack without Linux being a "loser", unless you try to set up some kind of competition and rivalry (which some of the Phoronix clickbait titles seem to do).

Ok, some more details about the Phoronix article "Linux Distributions vs. BSDs With netperf & iperf3 Network Performance" (http://www.phoronix.com/scan.php?page=article&item=ne...). First of all, I don't doubt that the numbers are correct. I assume that anyone could set up the same stack and see similar numbers.

However what is the actual meaning of these numbers? What can you learn from them?

The danger I see is that many not-so-savvy readers will treat them as a kind of absolute truth, e.g. "FreeBSD can handle 10x more TCP request/response sequences than Fedora".
But it doesn't mean that. First of all, I assume that Michael installed Fedora Workstation (given that it had GNOME Shell with Wayland installed). I also think he just used the default installation.

However, a default Fedora Workstation installation really has much more software installed (and more services active) than a default FreeBSD install. Having a complicated packet-filter setup (e.g. from firewalld) might explain a lot of the slowdown, when FreeBSD by default doesn't have any active packet filter.

So the test results really answer the question: what is the performance of the default install of some Linux distributions and BSD flavors? They don't say anything about what these systems can do with a bit of tuning, because anyone who really cares about networking performance won't use the default configuration. Instead, I assume, you would very carefully benchmark different hardware, drivers, and settings in the software stack.
(The alternative is that you just need a computer/server for the usual stuff, where raw networking speed is often the least of your concerns.)

On the other hand, you might want to have a packet filter and/or some network features enabled because they are important in your production setup (maybe TCP SYN cookies, or just anything really).

So if you are serious about "how does raw *Linux compare to *BSD?" you should define a certain baseline of what your system should be able to do and then do the number dance.

Measuring default settings might be interesting for use cases where users are not expected to modify their system before using it (gamers come to mind), but I think very few of those users really care about comparing performance numbers at all (except maybe GPU numbers for gamers).

What is certainly interesting is to look at why the numbers are vastly different and why certain regressions happen, bisecting (or at least reporting) bugs with certain drivers. But Michael doesn't do any of this. Usually he does not provide anything like a "big picture" of which factors might contribute to the results he observed, and he does not provide any technical interpretation. IMHO this is just useful to fuel troll attempts and rants on a forum by people who just read headlines and start typing the next rant.

AMD's Display Core difficulties

Posted Dec 14, 2016 14:19 UTC (Wed) by mmechri (subscriber, #95694) [Link] (16 responses)

I couldn't agree more with this.

Phoronix argues that they deliberately test using default settings because that is what most users end up using. While this is certainly true for desktop usage, I would be very surprised if it were true for anyone setting up a server who cares about network performance.

AMD's Display Core difficulties

Posted Dec 14, 2016 14:51 UTC (Wed) by renox (guest, #23785) [Link] (15 responses)

I disagree: why should you have to tweak various settings to get good network performance?
Plus, apparently, FreeBSD was better than Linux in both the latency and throughput tests.

AMD's Display Core difficulties

Posted Dec 14, 2016 15:33 UTC (Wed) by pizza (subscriber, #46) [Link] (14 responses)

Because "default settings" you'd use for a workstation are not the same as the "Default settings" you'd want for a server.

AMD's Display Core difficulties

Posted Dec 14, 2016 16:01 UTC (Wed) by Cyberax (✭ supporter ✭, #52523) [Link] (13 responses)

So why should workstations be slower than servers?

AMD's Display Core difficulties

Posted Dec 14, 2016 16:20 UTC (Wed) by pizza (subscriber, #46) [Link] (10 responses)

Because optimizing for one use case nearly always comes at the expense of another?

AMD's Display Core difficulties

Posted Dec 14, 2016 16:22 UTC (Wed) by Cyberax (✭ supporter ✭, #52523) [Link] (9 responses)

Why is a workstation different from a server?

And anyway, I have never been able to tune Linux's IP stack to be significantly faster.

AMD's Display Core difficulties

Posted Dec 14, 2016 16:40 UTC (Wed) by pizza (subscriber, #46) [Link] (8 responses)

Because they are used for different things?

(Oh, for example, a server sitting in a rack without any heads, acting as a NAS on a trusted network, has different needs than graphically intensive EDA tools on a portable workstation that might connect to a public hotspot.)

AMD's Display Core difficulties

Posted Dec 14, 2016 16:41 UTC (Wed) by Cyberax (✭ supporter ✭, #52523) [Link] (7 responses)

So why would one kind of server need a slower network?

AMD's Display Core difficulties

Posted Dec 14, 2016 16:56 UTC (Wed) by zdzichu (subscriber, #17118) [Link] (6 responses)

Slower network performance comes as a drawback of a stricter firewall configuration and lower power usage. Probably. There is no analysis in this benchmark, neither "perf" nor any other tracing to pinpoint bottlenecks.

AMD's Display Core difficulties

Posted Dec 14, 2016 17:10 UTC (Wed) by Cyberax (✭ supporter ✭, #52523) [Link] (5 responses)

Nope. The Linuxes tested have zero firewall rules by default.

That's why the Phoronix tests are useful: they make people at least look at this stuff. Tests with more details would be nice, but in their current form they are quite thought-provoking.

AMD's Display Core difficulties

Posted Dec 14, 2016 17:42 UTC (Wed) by pizza (subscriber, #46) [Link]

> Nope. The Linuxes tested have zero firewall rules by default.

Incorrect. I can't speak for the others, but Fedora 25 (and I believe CentOS 7) defaults to having a firewall enabled, even on minimal installations.

> That's why the Phoronix tests are useful: they make people at least look at this stuff. Tests with more details would be nice, but in their current form they are quite thought-provoking.

No.. they really aren't. But with a bit more work on their part, they could be.

AMD's Display Core difficulties

Posted Dec 14, 2016 18:11 UTC (Wed) by johannbg (guest, #65743) [Link] (3 responses)

You can get the same performance out of the same components running on the same kernel on the same hardware, regardless of what the distribution is called. So benchmarking tests from moronix are just comparing default generic settings; those settings are determined by the community surrounding each distribution to cater to that community's needs and maintainability, and thus the tests are pretty much useless.

Truly benchmarking, tweaking, tuning, identifying the bottlenecks, and reporting where they reside is a very time-consuming process.

Those moronix benchmarks just remind me of those idiots that brag about their uptime on hardware with no true sustained load over a period of time.

Bragger: Duh my *nix box has been running for 400 days!
Me: Has it now, show me the sustained load on the machine?
Bragger: load average: 0.01, 0.01, 0.01
Me: So basically you have just had that machine turned on for 400 days, consuming power and paying for it, so you can brag about its uptime to other people. Bragger, you must be a special kind of idiot.

Moronix is just another echo chamber of poor journalism (if what that site does can be called journalism) on the internet, much like mashdot.

This seems like a good stopping point

Posted Dec 14, 2016 18:27 UTC (Wed) by corbet (editor, #1) [Link] (2 responses)

If you have a technical issue with somebody's benchmarking techniques (or whatever), those can be legitimate items for discussion — though it's kind of off-topic here. But please can we refrain from childish name calling? That doesn't help anybody and doesn't make the conversation any more fun.

Anyway, my feeling is that this thread has pretty much run its course.

This seems like a good stopping point

Posted Dec 14, 2016 19:01 UTC (Wed) by johannbg (guest, #65743) [Link] (1 responses)

Don't stop on my account; my feeling is that this thread is still full of rainbows, joy, and ponies. "Childish name calling" is in line with the quality of the technical journalism that takes place on both of those sites, so arguably it fits perfectly, and no, I can't see why I or others should stop themselves from calling it so. (And no, I did not come up with those names myself; these are the nicknames those sites have in these parts.)

If each comment is required to help somebody, then you can look at it this way:
People actually can get dumber (through misinformation) from reading articles from those sites, so just view that comment as a public service announcement for this site's readers' own health; I would be doing them a disservice by not pointing that out.

Now, whether that triggers a reader's emotional response such as joy I cannot say, since the emotional response varies between each individual reader behind his or her screen, so meh.

This seems like a good stopping point

Posted Dec 14, 2016 20:45 UTC (Wed) by rahulsundaram (subscriber, #21946) [Link]

> Don't stop on my account; my feeling is that this thread is still full of rainbows, joy, and ponies. "Childish name calling" is in line with the quality of the technical journalism that takes place on both of those sites, so arguably it fits perfectly, and no, I can't see why I or others should stop themselves from calling it so.

Kinda hard to criticize someone for doing something if you are "in line" with it.

AMD's Display Core difficulties

Posted Dec 14, 2016 16:20 UTC (Wed) by clump (subscriber, #27801) [Link]

You might want a workstation to care more about power savings than performance. There are many settings that favor one over the other.

AMD's Display Core difficulties

Posted Dec 15, 2016 14:49 UTC (Thu) by jond (subscriber, #37669) [Link]

Perhaps there's a desktop-benefiting feature (such as application-layer firewalling) that has a performance cost but was deemed to make sense on a desktop spin?

AMD's Display Core difficulties

Posted Dec 14, 2016 16:35 UTC (Wed) by pizza (subscriber, #46) [Link]

> What is certainly interesting is to look at why the numbers are vastly different and why certain regressions happen, bisecting (or at least reporting) bugs with certain drivers. But Michael doesn't do any of this. Usually he does not provide anything like a "big picture" of which factors might contribute to the results he observed, and he does not provide any technical interpretation.

Personally, I find the Phoronix benchmarks to be very superficial. In many cases that is indeed all that is called for (e.g. more FPS in games == better), but when there is more than one axis for a given situation they tend not to be of much use in the real world.

For network performance, the details matter quite a bit. (Does FreeBSD even have a firewall by default?) What's the CPU load? Does other I/O in the system get starved out? What about handling a thousand connections going in both directions versus a single connection in one? To the same recipient or different recipients? What about using raw packet sockets, or packet filtering, or QoS disciplines?

What I'm getting at is that there are *many* variables here, and mindlessly focusing on single-stream throughput is what led to the "bufferbloat" problem that makes real-world users suffer.

(This is also the reason the server industry uses big, unwieldy benchmarks that simulate real-world applications undergoing real-world loads, rather than microbenchmarks for individual components.)

AMD's Display Core difficulties

Posted Dec 14, 2016 2:51 UTC (Wed) by tau (subscriber, #79651) [Link] (8 responses)

Phoronix certainly keeps up to date with the developments in the Linux graphics space. But the owner is fond of editorializing excessively and using incendiary language. Now, I hate ads and their associated invasion of privacy, and I don't really want to pay for a subscription to a publication with this sort of style either, so I resolve this dilemma by actively avoiding Phoronix articles altogether.

I would much prefer to see LWN's more neutral and professional editorial style cover the Linux graphics space; it's certainly very exciting stuff!

AMD's Display Core difficulties

Posted Dec 14, 2016 17:12 UTC (Wed) by hifi (guest, #109741) [Link] (7 responses)

This is exactly why I subscribed to LWN.

I still follow Phoronix a lot because of its fast coverage of the graphics stack, regardless of the article quality. The *constant* begging for money and blaming of the readers for not subscribing is something that really annoys me to the core.

It's a shame, really, I would like to support Phoronix.

AMD's Display Core difficulties

Posted Dec 14, 2016 17:38 UTC (Wed) by zlynx (guest, #2285) [Link] (6 responses)

He posted some statistics once. The set of Linux enthusiasts and the set of people running ad blockers overlap almost completely.

So, yeah, Phoronix has some money issues. Subscription is really the only way to go. Maybe he needs to start holding articles back for a week, like LWN.

As for the constant money begging, what do you think those dollar signs next to every major LWN article are about? Just a bit more tasteful maybe, but the same thing.

AMD's Display Core difficulties

Posted Dec 14, 2016 17:54 UTC (Wed) by hifi (guest, #109741) [Link]

I don't see bold paragraphs of complaining in paid LWN articles, but I did on Phoronix, regardless of whether I was a subscriber or not. Tastefulness means a lot to some people.

Annoying the readers can have the opposite effect. I did subscribe for a year.

AMD's Display Core difficulties

Posted Dec 15, 2016 12:47 UTC (Thu) by niner (subscriber, #26151) [Link]

If Michael is that desperate for money, he'd invest some time in more payment options than just PayPal, where you need an account. A simple but huge first step would be to allow payment via PayPal with just a credit card and no account. That is indeed possible, and it would allow me to subscribe. I wrote to Michael twice about this but got no result. So no subscription either.

AMD's Display Core difficulties

Posted Dec 16, 2016 7:26 UTC (Fri) by mjthayer (guest, #39183) [Link]

> He posted some statistics once. The set of Linux enthusiasts and the set of people running ad blockers overlap almost completely.

I now use Privacy Badger instead of an ad blocker. Phoronix had problems with that until recently, but those seem to be resolved. I would expect that at least some proportion of those enthusiasts might be willing to do the same if asked nicely.

AMD's Display Core difficulties

Posted Dec 16, 2016 15:02 UTC (Fri) by mstone_ (subscriber, #66309) [Link] (2 responses)

Interestingly, there's also almost complete overlap between ad networks and ad networks that periodically distribute ads with malicious content. Until the ad networks clean up their act there is no moral imperative to accept whatever crap they're trying to push down the pipe.

AMD's Display Core difficulties

Posted Dec 18, 2016 10:23 UTC (Sun) by branden (guest, #7029) [Link]

Bruce Perens is _very_ disappointed in you.

AMD's Display Core difficulties

Posted Dec 27, 2016 20:59 UTC (Tue) by Wol (subscriber, #4433) [Link]

There's also a very noticeable overlap between computer people and people with some degree of autism.

And a very strong overlap between people with autism, and people who can't stand noise and flashing images.

And there certainly was a strong overlap (maybe less so nowadays) between ad networks, and ad networks who sent loud flashy ads to all and sundry.

What's the difference between a person who uses an adblocker to read your site, and a person who at the first hint of an obnoxious ad just closes the tab and goes somewhere else? The guy who uses the ad blocker at least gets as far as reading the click-bait!

Cheers,
Wol

AMD's Display Core difficulties

Posted Dec 13, 2016 23:07 UTC (Tue) by Kayden (guest, #89093) [Link] (6 responses)

With Mesa master, recent Intel hardware is well supported. Broadwell, Skylake, Kabylake, Broxton, and Cherryview are conformant and should work great. Haswell isn't conformant yet, but I believe it's in pretty good shape and getting better all the time. Ivybridge and Baytrail are a bit sketchier still. Older hardware can't fully support Vulkan. The developers are pretty responsive to bug reports, and available on IRC.

I hear really good things about radv as well, so I wouldn't be shy about giving it a try at this point.

AMD's Display Core difficulties

Posted Dec 14, 2016 19:22 UTC (Wed) by arekm (guest, #4846) [Link] (5 responses)

And there was no intel xorg driver release for a few years...

AMD's Display Core difficulties

Posted Dec 15, 2016 1:30 UTC (Thu) by anholt (subscriber, #52292) [Link] (4 responses)

That's because we stopped building vendor-specific userspace display drivers and instead work on a cross-vendor shared KMS driver.

AMD's Display Core difficulties

Posted Dec 15, 2016 7:18 UTC (Thu) by arekm (guest, #4846) [Link] (3 responses)

What's misleading is that the Intel Xorg driver is still actively developed (or maybe just maintained) in git; bugs are handled and fixes are committed. The difference between it and other drivers is that no releases are made.

A few people were asking here and there (me included, on IRC) about what's going on, but never saw any answer or clear indication that the Intel Xorg driver is obsolete.

No idea why it is maintained if (big if) it is obsolete and modesetting replaces it.

AMD's Display Core difficulties

Posted Dec 15, 2016 7:53 UTC (Thu) by thestinger (guest, #91827) [Link]

Intel's DDX driver is still an alternative to the standard infrastructure for 2D acceleration in Xorg, but not Wayland. It provides only 2D acceleration and isn't relevant to OpenGL / Vulkan. It also isn't really relevant for programs implementing hardware-accelerated rendering via OpenGL, like Chromium.

AMD's Display Core difficulties

Posted Dec 17, 2016 8:04 UTC (Sat) by flussence (guest, #85566) [Link] (1 responses)

The modesetting driver obsoletes it — for any GPU that can be called an actual GPU. I've got several i915/i945 devices that still need the intel driver and its SNA code. The one time I got modesetting+glamor to actually run on those without X crashing at startup, the desktop was unusable.

There's no 2D equivalent of the DRM/Mesa stack, so the dark ages of Xorg drivers poking the hardware (semi-)directly will probably live on indefinitely.

AMD's Display Core difficulties

Posted Dec 17, 2016 17:22 UTC (Sat) by lamikr (guest, #2289) [Link]

For me, the modesetting driver with a Skylake-based HD 5300 seems to work just fine. I've only played with it for one day, but Vulkan also seems to be supported out of the box on Mageia. (There was not even a need to add the
option "DRI" "3"
line to xorg.conf, which was needed with the other Intel 915 driver.)

AMD's Display Core difficulties

Posted Dec 13, 2016 21:28 UTC (Tue) by flussence (guest, #85566) [Link]

Maybe they should hire someone who has experience doing things the right way, like libv. ;-)

AMD's Display Core difficulties

Posted Dec 13, 2016 22:26 UTC (Tue) by djbw (subscriber, #78104) [Link] (5 responses)

Neil's Linux kernel anti-pattern articles had an outsized influence on the design of the libnvdimm subsystem.

...and here's another example of this abstraction vs upstream pattern seen many times before: https://lwn.net/Articles/454716/

AMD's Display Core difficulties

Posted Dec 13, 2016 23:57 UTC (Tue) by Felix (subscriber, #36445) [Link] (4 responses)

also "Broadcom's wireless drivers, one year later" ( https://lwn.net/Articles/456762/ ) is about a similar situation...

AMD's Display Core difficulties

Posted Dec 14, 2016 9:15 UTC (Wed) by blackwood (guest, #44174) [Link] (3 responses)

It would be really great to have a permanent place with links to all these case studies of midlayer fail. At least in my experience, within the kernel community the concept is so common that just blurting "midlayer!" is enough to kill a patch series.

But as soon as you talk to kernel and driver teams outside of that bubble (common with shared-across-OS proprietary drivers and development), the only way to abstract stuff and share code that engineers ever come up with is a midlayer. And without any experience with it, it seems to be fairly hard to explain what exactly a midlayer is, and why exactly it's a problem ...

AMD's Display Core difficulties

Posted Dec 14, 2016 11:14 UTC (Wed) by liam (guest, #84133) [Link] (2 responses)

I'm not sure the broadcom example is one we should point to as a mid-layer fail.

AMD's Display Core difficulties

Posted Dec 14, 2016 12:14 UTC (Wed) by Felix (subscriber, #36445) [Link] (1 responses)

> I'm not sure the broadcom example is one we should point to as a mid-layer fail.

I think this is about a similar situation, especially if you also read the comments.

For example one Broadcom developer wrote ( https://lwn.net/Articles/456872/ ):
> The brcmsmac driver has architectural alignment with our drivers for other operating systems, and we intend to enhance and maintain this driver in parallel with drivers for other operating systems. Maintaining alignment between our Linux driver and drivers for other operating systems allows us to leverage feature and chip support across all platforms.

Sound familiar?

AMD's Display Core difficulties

Posted Dec 21, 2016 7:10 UTC (Wed) by liam (guest, #84133) [Link]

Oh, I'm not saying that I don't see the similarities with DC. As you've noted, they are striking.
What I'd argue is that the current situation with brcm radios might be much different (better) had their work been accepted and upstreamed.
That would have gotten us a few things: 1) out-of-the-box, fully functional drivers for what are ubiquitous wifi/bt modules (well, ubiquitous in my experience, though since I make it a point to notice and avoid them in particular, my feel for the sampling data might be skewed), and 2) some tangible evidence for Broadcom that upstream can be reasoned with. That last point can't be overstated (and surely not every hardware company acts like Samsung with their exynos drop). Once they're in the community you can nudge them, over time, towards the subsystem's best practices.
Currently, the situation with their hardware just sucks. That's why I'd not point to this as an example of midlayer fail. They didn't even have a chance TO fail, and it's only a failure if the company isn't able to produce readable patches which maintain or improve functionality.
After all, the worst that could happen is that they give up, and you're in at least as good a position as you were before (assuming they don't actually do a Sammy, and especially if the b43 driver continues to focus development), though I certainly see how the prospect of beating their leavings into shape would be a bear.

AMD's Display Core difficulties

Posted Dec 14, 2016 6:55 UTC (Wed) by mjthayer (guest, #39183) [Link] (4 responses)

I for one certainly hope that they manage to get this merged in a way which does allow sharing code between their different operating system drivers. I think that the experience gained would be valuable for everyone involved (and I would hope, though you never know beforehand, that it would end up helping the AMD team reduce their Linux work overall and improve their Windows driver into the bargain).

I also have to say that the overall mailing list exchange was very civil and constructive on all sides.

AMD's Display Core difficulties

Posted Dec 14, 2016 9:02 UTC (Wed) by blackwood (guest, #44174) [Link] (1 responses)

The linked master plan I typed up explains how to demidlayer while still being able to share code (hopefully, but I'm positive it can be done). Code sharing is very much the goal, since common code that's debugged by the hardware engineers at power-on is pretty much the only way you can write a GPU driver these days, at least for the low-level memory bandwidth/clock stuff. Incidentally, that also means checkpatch & co. will be off-limits for those files.

Roadmap: https://lwn.net/Articles/709001/
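To make that concrete, here is a sketch of what such shared leaf code might look like; everything in it is invented for illustration:

    #include <stdint.h>

    /* Shared across operating systems: pure computation with no OS
     * calls at all, so it needs no abstraction layer. */
    struct pll_config { uint32_t ctrl; uint32_t div; };

    static inline uint32_t pll_compute_div(uint32_t ref_khz,
                                           uint32_t target_khz)
    {
        return (target_khz + ref_khz / 2) / ref_khz;  /* round to nearest */
    }

    static inline void pll_fill_config(struct pll_config *c, uint32_t div)
    {
        c->ctrl = 0x1;  /* invented enable bit */
        c->div = div;
    }

The Linux-only code then applies the computed values with ordinary kernel primitives (writel() and friends); nothing calls back up into an abstraction, so the DRM-facing side stays a normal Linux driver while the hardware-validated computations stay common.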

AMD's Display Core difficulties

Posted Dec 14, 2016 12:03 UTC (Wed) by moltonel (guest, #45207) [Link]

I wonder what the relative size/complexity of the midlayers for the various kernels is. I imagine that, for historical reasons, the common code follows the Windows driver API fairly closely, and that the midlayer on Windows is therefore comparatively smaller? Is there a way to reduce or eliminate the midlayer not just for Linux, but for all kernels?

AMD's Display Core difficulties

Posted Dec 15, 2016 22:23 UTC (Thu) by error27 (subscriber, #8346) [Link] (1 responses)

Everyone thinks their compat code is different, but I can't think of a single example that hasn't sucked...

Either way, writing a Linux driver is hard. If you just write a straight-up clean implementation, that's pretty hard. If you try to port the Windows driver to Linux without changing the Windows code, that's also hard, and the results are messier. It also doesn't mesh at all with the Linux development style, which is that the Linux kernel should be upstream and anyone else (BSD, old enterprise kernels) can be downstream if they want.

AMD's Display Core difficulties

Posted Dec 16, 2016 7:50 UTC (Fri) by mjthayer (guest, #39183) [Link]

> Everyone thinks their compat code is different, but I can't think of a single example that hasn't sucked...

> Either way, writing a Linux driver is hard. If you just write a straight-up clean implementation, that's pretty hard. If you try to port the Windows driver to Linux without changing the Windows code, that's also hard, and the results are messier. It also doesn't mesh at all with the Linux development style, which is that the Linux kernel should be upstream and anyone else (BSD, old enterprise kernels) can be downstream if they want.

Let's hope that this will be a first then. Daniel seems hopeful that it can be, and he has lots of experience both in the drm subsystem and in large companies contributing to it. In any case this should not be compatibility code as such if and when it is ready for inclusion, but rather (if I understand right) lower-level primitives, mainly for hardware programming, which are fine-grained enough to be shared and used by otherwise disjoint higher-level driver code for different platforms. I would naively expect that for something as complex as a modern GPU there would potentially be rather a lot of those. I would also expect, given that neither GPUs nor drm are static, that the direction drm evolves in future could be adjusted to some extent to make future integration easier.

Drivers, especially GPU ones, are complex code. Any sensible code sharing with other platforms will increase test coverage and ought as such to improve the code quality for everyone. So finding sensible ways to do that is probably worth some effort.

AMD's Display Core difficulties

Posted Dec 14, 2016 9:11 UTC (Wed) by marcH (subscriber, #57642) [Link]

"All problems in computer science can be solved by another level of indirection, except of course for the problem of too many indirections."

AMD's Display Core difficulties

Posted Dec 15, 2016 13:12 UTC (Thu) by anton (subscriber, #25547) [Link] (1 responses)

If the OSes involved are Linux, Windows, and Mac OS, I guess that Linux is the only one that is that picky about code organization (because it's the only one that sees the driver as part of the OS). Would it be viable to write the driver to follow Linux conventions and have mid-layers for the other OSes (which probably also have more stable driver APIs)?

AMD's Display Core difficulties

Posted Dec 15, 2016 17:30 UTC (Thu) by ksandstr (guest, #60862) [Link]

>Would it be viable to write the driver to follow Linux conventions and have mid-layers for the other OSes (which probably also have more stable driver APIs)?

It would at least be possible in terms of technology and skills, assuming that the per-company HAL that AMD built is sufficient for doing what it was intended to do. And if it isn't, off with its head already!

Given how the kernel interfaces are always changing, AMD could even have participated in, or even initiated, some kind of medium-term development series to make them suitable for whatever secret sauce AMD's silicon snowflake was deemed to require. But obviously not without "helping the competition", "diluting our proprietary advantage", "exposing clues to our trade secrets", or some other rephrasing of the well-worn corporate canard.

What was AMD going to do, anyway -- remove the HAL on its own? (what would happen if it nevertheless didn't?) Whine about time spent working on a known dead-end, and ask for special treatment?

AMD's Display Core difficulties

Posted Dec 16, 2016 15:32 UTC (Fri) by excors (subscriber, #95769) [Link] (3 responses)

Perhaps one effective way to have unified low-level driver code across all OSes (saving significant development costs, and improving quality of support for niche platforms like Linux), without being allowed to write a midlayer abstraction that lets the same source code compile against each OS's kernel API, is to move as much code as possible out of the kernel driver and into proprietary firmware running on some chip in the GPU, so the kernel driver is reduced to basically making high-level RPC calls into the firmware.

I assume GPUs already do rather a lot in firmware, but is there still a lot more shared code they could move out of the driver? Would Linux developers be happier with that approach, or would it be even worse than having hard-to-maintain midlayered code in the kernel driver?
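As a purely hypothetical illustration of that RPC model (the command set and message layout are invented), the kernel driver would stop programming display registers itself and instead post high-level commands to a firmware mailbox:

    #include <stdint.h>

    enum fw_cmd { FW_SET_MODE = 1, FW_ENABLE_OUTPUT = 2 };

    struct fw_msg {
        uint32_t cmd;
        uint32_t args[2];
    };

    /* Stand-in for a firmware mailbox; a real driver would use MMIO
     * and wait for the reply via an interrupt or a completion. */
    static void fw_call(volatile struct fw_msg *mbox, uint32_t cmd,
                        uint32_t a0, uint32_t a1)
    {
        mbox->args[0] = a0;
        mbox->args[1] = a1;
        mbox->cmd = cmd;  /* writing cmd last "rings the doorbell" */
    }

    /* The whole display driver shrinks to wrappers like this one. */
    static void fw_set_mode(volatile struct fw_msg *mbox,
                            uint32_t width, uint32_t height)
    {
        fw_call(mbox, FW_SET_MODE, width, height);
    }

Whether that trade is an improvement is exactly the question raised above: the midlayer disappears from the kernel tree, but the logic becomes even less visible to kernel developers.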

AMD's Display Core difficulties

Posted Dec 16, 2016 15:53 UTC (Fri) by micka (subscriber, #38720) [Link] (1 responses)

>Perhaps one effective way to have unified low-level driver code across all OSes [...] for niche platforms like Linux

Depends on what you call a niche platform. Sure, Linux is a niche platform in the case of GPUs, but for most other hardware it is not.

> [...] so the kernel driver is reduced to basically making high-level RPC calls into the firmware.

If I remember correctly, that's the model followed by the original RPi graphics driver.
Well, it's a bit special: I think Linux is actually the guest, the GPU being the host.

I found http://airlied.livejournal.com/76383.html

AMD's Display Core difficulties

Posted Dec 16, 2016 18:11 UTC (Fri) by excors (subscriber, #95769) [Link]

Yeah, I meant niche in the context of desktop/laptop GPUs, and particularly for people playing games on desktops/laptops (where quality of drivers is important for performance and compatibility).

(Linux is non-niche for mobile GPUs, given the dominance of Android, though Android devices usually have millions of lines of non-upstreamed kernel code so they're happy to ignore kernel developers' preferences anyway.)

As far as I'm aware, the RPi 'GPU' is a special case since it started as essentially a multimedia-focused standalone processor that could drive a product all by itself, and the 'firmware' was the entirety of the OS and application software that ran on it. Then one particular chip had a tiny ARM core stuck onto the side because it was cheap so why not, and that chip was eventually adopted by the RPi, which supported OpenGL ES on the ARM by proxying commands across to the original drivers in the firmware - it would have been too expensive to fully port the drivers to the ARM. (Now Broadcom employs Eric Anholt to develop proper Linux drivers for the 3D and display parts of the GPU, but that's taken years of effort and still isn't quite ready, because it's a lot of hard work). So it has that architecture mainly for historical reasons. I think most other GPUs were originally designed as accelerators for a host processor, so they've developed in the other direction and have been moving more logic off the host and into the GPU.

AMD's Display Core difficulties

Posted Dec 17, 2016 3:02 UTC (Sat) by pabs (subscriber, #43278) [Link]

That would be much worse. See for example the response to the initial code dump of the RPC-style OpenGL drivers for the RPi.

AMD's Display Core difficulties

Posted Dec 20, 2016 19:01 UTC (Tue) by iabervon (subscriber, #722) [Link]

I think it might be wise to think of the "DC midlayer" as a tool that lets you run the example code from the hardware designers. It's useful in that QA tests the example code for new chipsets in ways that datasheets cannot be tested. Of course, you don't want end users to use the DC driver, but "compare with the Windows driver" is a sufficiently common debugging technique that it might be worth having around for that purpose.


Copyright © 2016, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds