When you're driving large displays with multiple layers and moving a lot of pixels around, an overlay engine that reads from multiple layers and combines them directly into the scanout path, without requiring memory-to-memory copies, is often a more efficient use of available bandwidth than using the GPU.
Not using hardware composition blocks leaves performance on the floor, and it's performance well worth having.
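As a rough back-of-the-envelope sketch of that bandwidth argument (my own simplification, not a measurement): assume every layer is full-screen, nothing is cached or compressed, and every frame is recomposed. The overlay path then reads each layer once per refresh, while the GPU path reads each layer, writes a composited buffer, and scanout reads that buffer back. The function name and the model itself are hypothetical.

```python
def composition_traffic(width, height, bytes_per_pixel, layers, fps):
    """Estimate memory traffic in bytes/second for two composition paths.

    Simplifying assumptions (not from any real hardware spec):
    full-screen layers, no caching or compression, recomposition every frame.
    """
    frame_bytes = width * height * bytes_per_pixel

    # Overlay engine: scanout reads each layer once per refresh.
    overlay = layers * frame_bytes * fps

    # GPU composition: read each layer, write one composited frame,
    # then scanout reads that composited frame.
    gpu = (layers + 2) * frame_bytes * fps

    return overlay, gpu


if __name__ == "__main__":
    overlay, gpu = composition_traffic(1920, 1080, 4, layers=3, fps=60)
    print(f"overlay: {overlay / 1e9:.2f} GB/s, gpu: {gpu / 1e9:.2f} GB/s")
```

Under these assumptions the GPU path always costs two extra full-frame transfers per refresh, so the relative penalty is largest when there are only a few layers. Real hardware will differ, since caches, tiling, and partial updates all change the numbers.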
CPUs and GPUs are becoming more powerful, certainly, but displays are getting larger, software is drawing more complex stackups (with more alpha and effects between layers), and there's seldom as much graphical compute as you'd like on embedded platforms, even the higher-end ones.
Also, efficient multitasking between multiple GPU clients is still rather hit or miss. I've seen beefy Win7 desktop machines become unresponsive to window drags, etc., when a complex load is thrown at a high-end desktop GPU, because the compositor and the application are fighting for the available GPU and the hardware and/or drivers don't time-slice it well enough to remain smooth. The problem is typically worse on embedded platforms.
Canonical reveals plans to launch Mir display server (The H)
Posted Mar 5, 2013 22:11 UTC (Tue) by mmarq (guest, #2332)
ummm... your parallel is far from convincing.
Embedded platforms are not real display multitasking systems. The same can be said about phones; the GUI is simply not tailored for it.
The rest seems to go in the opposite direction. I mean, it is the kernel that manages the GPU memory, and everything is managed as a single pool of virtual memory... it is inherently UMA... and it is not only about HSA, nor is it only HSA's members who are about to implement "hardware features" (advanced caches, TLB, pre-fetch, etc., which mold nicely to data programming patterns) that make those round-trip memory copies a thing of the past.
I happen to read a lot of patents (not only for work), and I'm expecting something in that direction. As a matter of fact, the Sony PS4 as revealed is a single unified chip whose main memory is GDDR5; that is, it is the GPU side that has all the memory and the CPU side that shares it. If it is HSA-modeled, and more in AMD's style, those memory-to-memory copies will not apply... it remains to be seen what OS (or version) they will employ. So the more efficient choice on the PS4 will be exactly the GPU, not the CPU.
So in this sense the wise approach would be to move any of those pertinent mechanisms out of the windowing systems... yes, more stuff in the kernel... or more stuff in systemd, for example.
Then the DS can even be multithreaded, almost like any other app, and you can have several windows open, even on several displays, all of them active with focus, and drag & paste things from one to another without complications and rendering complexities, without windows getting minimized or the whole operation getting cumbersome with virtual desktops.
This is not necessary for those "embedded or portable" systems, but making it the standard, and forcing everyone to follow it, is IMO like throwing the Linux desktop more than 10 years back... perhaps DOS is not such a bad idea after all, 1 app at a time!...
And saying it will *never* apply to embedded/portable is a risky bet; *never* is always a very risky word in IT... some more adventurous hardware vendor might yet conceive remote display for a superphone, now that wifiG is a done deal, and you have your phone GUI splashed onto 2 big wireless displays... never say never, at least you lose nothing by waiting, but where is the Linux display system for this?
Canonical reveals plans to launch Mir display server (The H)
Posted Mar 5, 2013 22:36 UTC (Tue) by mmarq (guest, #2332)
And if you care to ask... yes, not only advanced caches, but also pre-fetch, a TLB and an MMU of sorts in a GPU as well; as a matter of fact, context/exception handling and its own "cooperative" interrupt engine too. Very few differences from a CPU.