Nothing has been moved

Posted Oct 31, 2012 10:09 UTC (Wed) by dlang (guest, #313)
In reply to: Nothing has been moved by Arker
Parent article: Airlie: raspberry pi drivers are NOT useful

Phoronix has an article up from the last day or so comparing the free Radeon drivers to the closed ones; the tests show a massive difference in speed.

It seems to me that if the card firmware had a high-level API (like we are talking about here), I would not have to decide between using the latest kernel (self compiled, with various 'odd' config options) or getting good performance.

I would actually prefer a card like that to what I can currently buy for my systems.

Nothing has been moved

Posted Oct 31, 2012 23:36 UTC (Wed) by Arker (guest, #14205)

Yeah, I read that, and it's not surprising. What's also not surprising, however, is that the Free drivers, while slower and having fewer features, are more stable and reliable.

It seems to me that if the card were architected as you say you would like, you might indeed get the performance of the proprietary drivers while still using a Free shim. But you would also get the instability, and there would be absolutely no way you could fix it except to get new hardware. So it doesn't seem like an improvement to me, quite the opposite. With my ATI hardware, I at least have a choice.

I would love to have Free drivers that were stable and reliable and also supported the full feature set and ran as fast as or faster than the blobs. That's what we should expect, frankly, but I know we aren't getting it right now. Free drivers that are stable, predictable, and reliable at least give me the opportunity to use the ATI hardware in my system without the bugginess of the proprietary driver, at some cost. If it were architected like the Pi, it sounds like I would no longer have that option: whatever binary bugginess it has would be in the GPU, which you have no access to, but which, by contrast, can corrupt or deliberately overwrite anything you do with the ARM chip.

No?

Nothing has been moved

Posted Nov 1, 2012 1:20 UTC (Thu) by Cyberax (✭ supporter ✭, #52523)

Problem is, ARM systems don't really have much computing power to spare. On a PC it's OK to lose 80% of GPU performance: the remaining GPU power is still enough to drive a composited desktop and/or a simple game. On ARM systems, not so much.

Nothing has been moved

Posted Nov 1, 2012 22:56 UTC (Thu) by Arker (guest, #14205)

Certainly they don't have as much computing power as one would like, but please don't pretend it's not enough to run 'a simple game' without help; that's ridiculous. The Raspberry Pi, without the main processor aka GPU, still ships with a 700 MHz (1000 MHz overclocked) 32-bit ARM11 CPU that is more than powerful enough to do all kinds of very interesting things with, particularly with half a gigabyte of high-speed RAM included. It's more powerful than a top-of-the-line desktop CPU from only a few years back; don't be ridiculous.

Free that up from the control of this GPU and associated binary blob and it could be quite useful. If only it were as simple as tearing that chip off the die and soldering on a serial port...

Nothing has been moved

Posted Nov 1, 2012 23:49 UTC (Thu) by dlang (guest, #313)

you can run a simple game, but not a more complex game.

there is nowhere near enough processing power in the ARM chip to simply scale video from 320x200 to full screen without help from the GPU. I accidentally triggered this with mplayer a few weeks ago, and it takes 10-20 seconds to play ONE second worth of video when you are doing the scaling on the ARM chip.

Nothing has been moved

Posted Nov 2, 2012 3:20 UTC (Fri) by Arker (guest, #14205)

> you can run a simple game, but not a more complex game.

I wrote and played very complex games on an 8-bit processor at around 3 MHz with 2 *kilobytes* of RAM. If you can't do the same with many thousands of times the resources, you are doing something wrong.

> there is nowhere near enough processing power in the ARM chip to simply scale video from 320x200 to full screen without help from the GPU. I accidentally triggered this with mplayer a few weeks ago, and it takes 10-20 seconds to play ONE second worth of video when you are doing the scaling on the ARM chip.

Then you need to look at your software stack because the hardware is MORE than capable of it. I could do that smoothly with no problems well over 10 years ago on a 386sx with 1 megabyte of ram and a simple svga card. Presumably you are using an encoding with a significantly higher overhead than MPEG-1 but you are also looking at a system with many *hundreds* of times the horsepower. If it were programmed specifically for the task it could probably drive several different videos of that size to several different monitors at once without dropping a frame.
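How tight is that scaling job on the Pi's CPU? A back-of-the-envelope sketch, assuming a 1920x1080 output at 25 frames per second and the stock 700 MHz clock, and pretending the CPU does nothing but scale (no decoding, no memory stalls, no other processes). The function name is just for illustration.

```python
def cycles_per_output_pixel(cpu_hz: float, width: int, height: int, fps: float) -> float:
    """CPU cycles available per output pixel if the CPU did nothing but scale."""
    pixels_per_second = width * height * fps
    return cpu_hz / pixels_per_second


# Assumed figures: stock 700 MHz ARM11, 1080p output at 25 fps.
budget = cycles_per_output_pixel(700e6, 1920, 1080, 25)
print(f"{budget:.1f} cycles per output pixel")
```

The result is only a ceiling: memory bandwidth, the decoder, and the rest of the system all come out of the same budget.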

I hear all the time that things which were done routinely in earlier decades with far less power are now 'impossible', and it makes me laugh. These things aren't impossible. Programmers have just been trained that their time is too valuable to spend optimising anything, and that the proper solution is to throw more hardware at it instead. It sells hardware, I guess.

Nothing has been moved

Posted Nov 2, 2012 3:28 UTC (Fri) by dlang (guest, #313)

Did your 386 have a 1920x1080 32-bit screen? Or was it a 640x480 8-bit screen (i.e. VGA)?

the added screen data does make a significant difference.

I agree that lots of stuff is very bloated today, but your 386 was not expected to do full motion HD video output without GPU assistance, and it would not have been able to do so.

Nothing has been moved

Posted Nov 2, 2012 3:43 UTC (Fri) by Arker (guest, #14205)

It would actually go up to 1024x768, but I didn't think the monitor looked as good like that, so I usually ran it at 800x600 instead. Of course it didn't do "HD"; that buzzword wasn't invented yet. But full-screen, full-motion video without dropping a frame it could definitely do, and did many times, and 'accelerated graphics' was also something yet to come, at least in my price range.

Now I had a very special software setup to do this, of course. A 'stock' configuration on the same machine would crap itself trying to play much smaller videos, in fact I started that project as a dare because the buddy that owned that machine was complaining it was obsolete because it was performing just like you described with your pi - taking a minute to play a second or two of video - with tiny lowres files even. But the hardware was still perfectly capable of doing the job.

Nothing has been moved

Posted Nov 2, 2012 16:19 UTC (Fri) by bronson (subscriber, #4806)

The job being talked about is pushing 1920x1080 pixels x 4 bytes x 24 fps (24 fps at a minimum) = ~190MB/s to the screen. You're talking about pushing 800x600 pixels x 2(?) bytes x 24 fps = ~20MB/s.

Time moves on, ya know?
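The figures above can be reproduced with a short calculation. This is a sketch that counts only one write of every pixel per frame (no scanout, decoding, or overdraw), takes 4 bytes per pixel for 32-bit colour, assumes 2 bytes per pixel for the 800x600 case (the post itself marks that figure with a question mark), and reports binary megabytes (MiB).

```python
def framebuffer_bandwidth(width: int, height: int, bytes_per_pixel: int, fps: float) -> float:
    """Bytes per second needed to write every pixel of every frame once."""
    return width * height * bytes_per_pixel * fps


MIB = 1024 * 1024

# The two cases compared above, both at 24 frames per second.
hd = framebuffer_bandwidth(1920, 1080, 4, 24)
svga = framebuffer_bandwidth(800, 600, 2, 24)

print(f"1080p, 32-bit:   {hd / MIB:.0f} MiB/s")
print(f"800x600, 16-bit: {svga / MIB:.0f} MiB/s")
print(f"ratio:           {hd / svga:.1f}x")
```

With unrounded inputs the ratio is about 8.6x rather than the 9.5x you get from the rounded 190 and 20.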

Nothing has been moved

Posted Nov 2, 2012 23:44 UTC (Fri) by Arker (guest, #14205)

I don't want to beat it to death, but please. Using your figures, the video has scaled up 9.5 times (190/20). Comparing the clock rates of the processors, the hardware has increased in the same amount of time by a factor of 62.5 (1000/16).

And this blunt comparison is a *severe* underestimate of the real difference, because an ARM11 can do a lot more with a clock cycle. That 386 chip didn't even have a floating point unit, let alone tricks like SIMD, branch prediction, out-of-order completion... Clock for clock, the ARM chip would still be far more powerful. And that's before you even consider the cache architecture, the system bus... over 500 times the main memory.
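The blunt clock-rate comparison is easy to check. A minimal sketch using only the rounded figures quoted in this thread (190 MB/s and 20 MB/s of pixel data, a 16 MHz 386SX, and a 700 MHz ARM11 that can be overclocked to 1000 MHz); like the post, it ignores per-clock throughput and memory bandwidth entirely.

```python
# Rounded figures quoted in the thread.
HD_MB_PER_S, SVGA_MB_PER_S = 190, 20      # pixel data pushed to the screen
I386SX_MHZ = 16
ARM11_STOCK_MHZ, ARM11_OC_MHZ = 700, 1000

demand_growth = HD_MB_PER_S / SVGA_MB_PER_S
clock_growth_stock = ARM11_STOCK_MHZ / I386SX_MHZ
clock_growth_overclocked = ARM11_OC_MHZ / I386SX_MHZ

print(f"pixel data grew {demand_growth:.1f}x")
print(f"clock grew      {clock_growth_stock:.1f}x (stock), "
      f"{clock_growth_overclocked:.1f}x (overclocked)")
```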

I have no doubt at all that if you could get a few thousand of those ARM chips in the hands of promising young programmers WITHOUT the fancy GPU to fall back on, one of them would shock you all by making it do things you think are impossible. But if he's told instead he has to use the high-level interface and pass OpenGL to a blob he cannot inspect or modify, he'll probably just pass messages until he gets bored, or finds a bug he can't fix, and then move on to something less frustrating than proprietary computing, like playing football with a bunch of guys twice his size or having molars extracted for fun.

Nothing has been moved

Posted Nov 3, 2012 0:19 UTC (Sat) by dlang (guest, #313)

> I have no doubt at all that if you could get a few thousand of those ARM chips in the hands of promising young programmers WITHOUT the fancy GPU to fall back on, one of them would shock you all by making it do things you think are impossible. But if he's told instead he has to use the high-level interface and pass OpenGL to a blob he cannot inspect or modify, he'll probably just pass messages until he gets bored, or finds a bug he can't fix, and then move on to something less frustrating than proprietary computing, like playing football with a bunch of guys twice his size or having molars extracted for fun.

nobody is disputing that more access would be better, but you are making the assumption that doing new and interesting things with the video is the primary purpose of all users of the device.

It may surprise you that most people who use computers aren't going to try and debug video drivers or firmware, even where they do have that capability. They will usually just download the latest version to see if it's fixed, live with the problem, or revert to a prior version.

We saw this with the Intel video drivers a few years ago: fully open-source drivers, but when there were problems in the drivers in an Ubuntu release, 99.999+% of the people just stuck with an older version.

For those people, the difference between a high-level API and a low-level API is meaningless. To be fair, probably 90% of them wouldn't care if the entire driver was a binary blob, but that still leaves a very large group of people who benefit from having all the kernel and userspace stuff open, even while the firmware is closed and has a high-level API.

Nothing has been moved

Posted Nov 3, 2012 15:28 UTC (Sat) by bronson (subscriber, #4806)

Comparing processor clocks is just silly. The framebuffer is not stored in L1 cache.

On the Pi, the FB is in shared RAM clocked at 400MHz. RAM probably has a bandwidth of around 250MB/sec (wild ass guess based on parts). If you're driving 1080p, that doesn't leave much bandwidth for anything else. Plus ça change, eh?
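Taking the 250 MB/s figure purely as the guess it is labeled, here is a quick sketch of how much of it a 1080p stream would eat, assuming 4 bytes per pixel and one write per pixel at 24 fps. The function name is hypothetical, and the budget is a parameter so a measured number can be plugged in instead.

```python
def share_of_memory_bandwidth(width: int, height: int, bytes_per_pixel: int,
                              fps: float, ram_bytes_per_s: float) -> float:
    """Fraction of RAM bandwidth spent writing each frame once (no scanout, no decode)."""
    return width * height * bytes_per_pixel * fps / ram_bytes_per_s


# 250 MB/s is the guess from the comment above, not a measured figure.
share = share_of_memory_bandwidth(1920, 1080, 4, 24, 250e6)
print(f"{share:.0%} of the assumed bandwidth")
```

Display scanout and the decoder's own reads and writes come on top of this, so the real share would be higher; the result is only as good as the 250 MB/s guess.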

Nothing has been moved

Posted Nov 2, 2012 7:09 UTC (Fri) by Cyberax (✭ supporter ✭, #52523)

Nonsense. A 386SX couldn't even do anything useful along with fluid full-screen animation in mode 13h (that's 320x200 with 256 colors) without palette tricks or hacks like Wolf3D's runtime code generation.

The 386DX was a little bit better: it could run simple games like Doom 3D, though it had to use page flipping with non-standard display modes, because blitting a 64 KB framebuffer was too taxing for these systems.
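For scale, the mode 13h framebuffer is 320 x 200 pixels at one byte per pixel, i.e. 64,000 bytes. A small sketch of what repainting the whole screen costs per second; the 30 fps rate is the figure mentioned elsewhere in this thread, and the sketch says nothing about the actual bus or VGA throughput of those machines.

```python
MODE_13H_WIDTH, MODE_13H_HEIGHT = 320, 200
BYTES_PER_PIXEL = 1  # 256 colors: one palette index per pixel


def redraw_bytes_per_second(fps: float) -> float:
    """Bytes written per second to repaint the whole mode 13h screen every frame."""
    return MODE_13H_WIDTH * MODE_13H_HEIGHT * BYTES_PER_PIXEL * fps


frame_bytes = MODE_13H_WIDTH * MODE_13H_HEIGHT * BYTES_PER_PIXEL
print(frame_bytes, "bytes per frame")
print(redraw_bytes_per_second(30) / 1e6, "MB/s at 30 fps")
```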

Nothing has been moved

Posted Nov 2, 2012 11:03 UTC (Fri) by Arker (guest, #14205)

Nonsense.

This is exactly what I was talking about. You are so secure in your knowledge. Yet have you actually sat down and done the math?

I know it's possible because I did it, so I know that if you actually did the math it would have to be possible. There is a huge difference in what it takes to decode and display a video on top of a multi-user general purpose software stack mostly written in ultra-high languages and essentially unoptimised, versus what is actually possible given a highly optimised decoder running without interference on the bare hardware.

Even given the significant increases in resolution, and the modern codecs which require quite a bit more processor time, the increase in demand on the hardware is orders of magnitude off compared to the actual increase we have seen in hardware capability over time. That 386 ran at 16 MHz, and it was overclocked to do it, and clock for clock it was vastly inferior to your ARM11, which is running at over 60 times the clock frequency. Not to mention having 512 times the RAM on a much faster system bus...

Nothing has been moved

Posted Nov 2, 2012 15:55 UTC (Fri) by Cyberax (✭ supporter ✭, #52523)

I'm old enough to actually have used a 386sx-based computer. And to write simple demos on it - it was possible to write fluid (30fps) animation on it, but simply filling screen with a solid color was already taxing its RAM bandwidth.

Anyway, Pi's bandwidth is barely enough for full HD video as it is. If you throw in non-trivial rendering - it's simply not enough, again.

Unless, of course, you're ready to limit yourself to "Tetris" or maybe "Digger".

Nothing has been moved

Posted Nov 2, 2012 17:24 UTC (Fri) by nix (subscriber, #2304)

Er, *none* of the video playback and decoding engines are written in 'ultra-high languages': the inner loops of most of them are handcrafted assembler. None of them are 'essentially unoptimized'. The pixel display routines in X are also, these days, mostly done by pixman and to a considerable degree done in handcrafted assembler, taking advantage of SSE and the like.

I note that your 386 almost certainly did not have to decompress compressed video and blit it at the same time as everything else.

Nothing has been moved

Posted Nov 2, 2012 23:50 UTC (Fri) by Arker (guest, #14205)

I didn't say the codecs are written in ultra-high-level languages, although I wouldn't be shocked if you found an instance of it, particularly on a less common architecture like ARM. What I did say was that the rest of the system often is. And regardless of how good your codec is, it is still running inside a much larger, looser system which has very significant performance costs.

Nothing has been moved

Posted Nov 2, 2012 16:56 UTC (Fri) by intgr (subscriber, #39733)

> there is nowhere near enough processing power in the ARM chip to simply scale video from 320x200 to full screen without help from the GPU. I accidentally triggered this with mplayer a few weeks ago

The fact that MPlayer cannot do it isn't evidence that the CPU is incapable. Are you sure that MPlayer is actually utilizing everything that the ARM core has to offer? SIMD instructions etc?

It might just be that MPlayer on ARM is using some generic C scaling routine that nobody has bothered to optimize, because common x86 desktops are all running another assembly-optimized implementation.
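As an illustration of the kind of generic routine being described, here is a minimal nearest-neighbour scaler. It is not MPlayer's code; the function name and the flat row-major pixel list are assumptions made for the sketch. The one optimization it shows is building the source-column lookup table once per call instead of redoing the multiply and divide for every output pixel.

```python
def scale_nearest(src, src_w, src_h, dst_w, dst_h):
    """Nearest-neighbour scale of a row-major pixel list from src_w x src_h to dst_w x dst_h."""
    # Which source column feeds each output column, computed once per call.
    # A naive version would recompute this (a multiply and a divide) per pixel.
    col_map = [x * src_w // dst_w for x in range(dst_w)]
    dst = []
    for y in range(dst_h):
        # Pick the source row for this output row, then copy through the column map.
        row_start = (y * src_h // dst_h) * src_w
        src_row = src[row_start:row_start + src_w]
        dst.extend(src_row[c] for c in col_map)
    return dst


# A 2x2 image scaled to 4x4: each source pixel becomes a 2x2 block.
small = [1, 2,
         3, 4]
big = scale_nearest(small, 2, 2, 4, 4)
```

Python is used here only to show the structure; a real player would implement this inner loop in C or assembly, ideally with SIMD.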

Nothing has been moved

Posted Nov 1, 2012 6:29 UTC (Thu) by dlang (guest, #313)

the question is if the stability issues are due to the driver, or due to the driver's interaction with the rest of the kernel.

Personally, I suspect that most of the problems are in the latter category.

The closed driver is having to interact in a multi-threaded environment with other processes manipulating memory, with allocating memory in the same space as the rest of the kernel, and with all the locking that the rest of the kernel expects (and in some cases requires). And the closed driver is trying to do this without being modified from kernel version to kernel version, even though the rules for the kernel are changing (the locking rules in particular, although memory management changes somewhat as well).

If stuff running on the GPU limits itself to reading and writing buffers that are explicitly allocated for it, almost all of the problems mentioned go away, and the remaining 'shim' driver can evolve along with the rest of the kernel.
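The 'shim' idea can be sketched in a few lines. This is a hypothetical illustration, not the Pi's actual firmware interface, and the names `Buffer`, `ShimDriver`, `alloc` and `submit` are invented for it: the host-side driver hands out buffers, and the only commands it forwards are ones that name a buffer it allocated and stay within that buffer's bounds. Note that this constrains what the host asks for; it cannot by itself prove what closed firmware does with the request.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Buffer:
    handle: int
    size: int  # bytes


class ShimDriver:
    """Hypothetical host-side shim: hands out buffers and validates every command."""

    def __init__(self) -> None:
        self._buffers: dict[int, Buffer] = {}
        self._next_handle = 1

    def alloc(self, size: int) -> Buffer:
        buf = Buffer(self._next_handle, size)
        self._buffers[buf.handle] = buf
        self._next_handle += 1
        return buf

    def submit(self, opcode: str, handle: int, offset: int, length: int) -> tuple:
        """Check that a command stays inside a buffer we allocated, then 'send' it."""
        buf = self._buffers.get(handle)
        if buf is None:
            raise ValueError(f"unknown buffer handle {handle}")
        if offset < 0 or length < 0 or offset + length > buf.size:
            raise ValueError("command reaches outside its buffer")
        # A real shim would hand this message to the device here.
        return (opcode, handle, offset, length)


shim = ShimDriver()
fb = shim.alloc(1920 * 1080 * 4)
msg = shim.submit("CLEAR", fb.handle, 0, fb.size)
```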

In this case, part of the difference they reported was that the closed drivers support a newer version of OpenGL. With a high-level interface, that would not vary from driver to driver (unless specific drivers required different versions of the firmware), so I would expect the two modes to be much closer to feature parity.

In any case, even with the ATI way of doing things, bugginess in the firmware can cause problems for the overall system.

Nothing has been moved

Posted Nov 1, 2012 23:04 UTC (Thu) by Arker (guest, #14205)

> the question is if the stability issues are due to the driver, or due to the driver's interaction with the rest of the kernel.

A distinction without a difference. The driver has no purpose and no function other than inside the kernel.

> If stuff running on the GPU limits itself to reading and writing buffers that are explicitly allocated for it, almost all of the problems mentioned go away, and the remaining 'shim' driver can evolve along with the rest of the kernel.

But as long as the stuff running on the GPU is an opaque blob we cannot audit or replace there is absolutely no way we can ever have any confidence that it is limited like that.

Nothing has been moved

Posted Nov 1, 2012 23:47 UTC (Thu) by dlang (guest, #313)

>> the question is if the stability issues are due to the driver, or due to the driver's interaction with the rest of the kernel.

> A distinction without a difference. The driver has no purpose and no function other than inside the kernel.

Actually, in this case it is a very important distinction.

let's put it another way.

Are the bugs in the graphics logic, or in the interaction with the rest of the kernel?

If the bugs are in the graphics logic, then they would remain if they were separated the way the Pi's Broadcom driver is.

If the bugs are in the interaction with the rest of the kernel, then an API like the Pi's would allow us the best of both worlds: good graphics performance and clean interaction with the kernel.

The driver vendors keep wanting to have a stable API for their interaction with the kernel, and the kernel devs (for good reason) refuse to freeze the kernel internal APIs. But if the API to the device is defined and frozen by the firmware interface, everybody wins (except those people who want to make the graphics hardware do different things).
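What "defined and frozen by the firmware interface" might mean in practice: a fixed binary message layout that the kernel side must produce, however its internals change. The layout below (field order, widths, little-endian byte order, the version field) and the helper names are entirely hypothetical and are not the Pi's mailbox format; the packing uses Python's standard `struct` module.

```python
import struct

# Hypothetical frozen wire format, little-endian:
#   u16 version, u16 opcode, u32 buffer handle, u32 offset, u32 length
MESSAGE_FORMAT = struct.Struct("<HHIII")
ABI_VERSION = 1


def pack_command(opcode: int, handle: int, offset: int, length: int) -> bytes:
    """Encode a command in the fixed layout the (hypothetical) firmware expects."""
    return MESSAGE_FORMAT.pack(ABI_VERSION, opcode, handle, offset, length)


def unpack_command(data: bytes) -> tuple:
    """Decode a command, rejecting versions this side does not understand."""
    version, opcode, handle, offset, length = MESSAGE_FORMAT.unpack(data)
    if version != ABI_VERSION:
        raise ValueError(f"unsupported ABI version {version}")
    return opcode, handle, offset, length


wire = pack_command(opcode=2, handle=1, offset=0, length=1920 * 1080 * 4)
```

Whatever sits behind `pack_command` on the kernel side can be rewritten between releases; only the 16-byte layout has to stay put.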

Yes, the graphics hardware could start scribbling to any part of memory that it wants, but technically, so could any bus-mastering controller card, and there have been very few cases where bus-mastering network or drive interface cards have caused problems from this.

Nothing has been moved

Posted Nov 2, 2012 3:32 UTC (Fri) by Arker (guest, #14205)

> But if the API to the device is defined and frozen by the firmware interface, everybody wins

No, I don't agree. There is no win there for me at all (other than simply not buying it). The current situation with my ATI card is far preferable. You may call it a win if you get what you want out of it, but you do not get to define it as a win for me. What I want is a system where there is nothing running that I did not put there, nothing that I cannot edit, no code that I cannot audit; that is the whole point of free software. The hardware I pay for should respond to my commands, not anyone else's. Your 'solution' gives me exactly zero of what I want; it's not a compromise, it's a total loss.

Nothing has been moved

Posted Nov 2, 2012 7:25 UTC (Fri) by dlang (guest, #313)

> No, I don't agree. There is no win there for me at all

you conveniently left off the caveat that covered you

>> (except those people who want to make the graphics hardware do different things)