An unexpected perf feature
Posted May 24, 2013 13:40 UTC (Fri)
by dlang (guest, #313)
In reply to: An unexpected perf feature by ras
Parent article: An unexpected perf feature
Segmentation on the x86 came about with the 80286 (or possibly even the 80186, I'm not sure) CPU.
It was intended as a way to allow programs to continue to use 16-bit 8086 addressing while still being able to use more than 64K of RAM in a system (by setting a segment base and then running the 16-bit code inside that segment).
It was never an effective security measure, because to avoid wasting tons of memory on programs that didn't need it, the segments overlapped, allowing one program to access another program's memory.
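For what it's worth, here is a minimal sketch (in C, with made-up segment values) of the real-mode address arithmetic being described - each 16-bit segment register just selects a 64K window of a 20-bit physical address space, and different segment:offset pairs can name the same byte:

    #include <stdio.h>
    #include <stdint.h>

    /* 8086 real mode: physical address = (segment << 4) + offset,
     * so every segment is a 64K window starting on a 16-byte boundary. */
    static uint32_t phys(uint16_t segment, uint16_t offset)
    {
        return ((uint32_t)segment << 4) + offset;
    }

    int main(void)
    {
        /* Overlapping segments alias the same memory, which is why
         * this scheme provides no isolation between programs. */
        printf("%05X\n", (unsigned)phys(0x1234, 0x0010)); /* 12350 */
        printf("%05X\n", (unsigned)phys(0x1235, 0x0000)); /* 12350 too */
        return 0;
    }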
When the 80386 came out and supported paging and real memory protection, everyone stopped using segments real fast.
Posted May 24, 2013 23:31 UTC (Fri)
by ras (subscriber, #33059)
[Link] (3 responses)
8086 actually. It was their way of extending the address space of the 8080 to 20 bits.
It was just a slightly more sophisticated take on the way CPU designers have gone about extending the range of addressable memory beyond the natural size of internal registers for eons. It had the advantage that you could use separate address spaces for code and data, giving you 64K for each. So by using a small amount of extra silicon they effectively doubled the amount of address space you had over the simple "page extension" hack everybody else was doing. Kudos to them.
So you are saying the 80286 is where the rot set in. In the 80286 they had enough silicon to give the programmer true isolation between processes. They could have gone the 32 bits + page table route everybody else did, but no, we got segmentation (without page tables) instead and retained 16-bit registers. Why? Well, this was also the time the hardware designers had enough silicon to implement microcode - so they could become programmers too! And they decided they could do task switching, ACLs and god knows what else better than programmers, so they did. In other words, somehow they managed to forget their job was to provide hardware that ran programs quickly, and instead thought their job was to do operating system design.
It was a right royal mess. Fortunately for them the transition from DOS to Windows / OS/2 (which is where the extra protection and address space matter) took a while, and by then the i386 was released. It added 32 bits and paging, so we could ignore all that 16-bit segmentation rubbish and get on with life. It turned out the transition to multitasking operating systems wasn't waiting on programmers figuring out how to do it (who would have thunk it?), but rather on the price of the RAM needed to hold several tasks at once coming down.
People here defending the segmentation model should try writing for the 80286, which could only address 64K of code and 64K of data at any one time. There was no reason for it. The 68000 family had a much better memory model at the time, so it wasn't silicon constraints. Well, there would have been enough silicon if they hadn't devoted so much of it to creating their own operating system on a chip.
Intel finally came to their senses with Itanium. With it they used all that extra silicon to do what hardware designers should be doing - make programs run fast. Sadly it came along too late.
Back to your point - the timeline. The 80286 was released in 1982. The iAPX432 was meant to be released in 1981. The 80286 was the fall-back position. As Wikipedia points out, this is a part of its history Intel strives to forget. You will find no mention of the iAPX432 on their web site, for instance.
Posted May 24, 2013 23:50 UTC (Fri)
by dlang (guest, #313)
[Link] (1 responses)
That's right, I forgot about that. And IIRC, with the 286 they only expanded it to 24 bits.
But in any case, my point was that the dawn of the AMD64 chip is eons past the point where segmentation was introduced, failed, and was rejected.
P.S. I disagree with you about the Itanium. It wasn't a good design. They took far more silicon than other processors and ended up being less productive with it.
In part, the speed increases were the cause of this failure. As the speed differences between memory and the CPU core get larger, having a compact instruction set, where each instruction can do more, becomes more important, and for all its warts, the x86 instruction set rates pretty well on the size/capability graph.
but in part the programmers really should NOT have to change their programs to work well on a new CPU where the hardware designers have implemented things differently. By hiding such changes in the microcode, instructions that get used more can be further optimized, and ones that aren't can be emulated so they take less silicon. It's a useful layer of abstraction.
Posted May 25, 2013 0:21 UTC (Sat)
by ras (subscriber, #33059)
[Link]
I wasn't commenting on whether it was a good design. I don't know, as I haven't used it. I was just saying Itanium marked the point when Intel went back to sticking to the knitting - in other words, trying to design a new architecture whose sole goal was to run programs fast. If you say they failed despite having that explicit goal, then that's a shame.
As I recall, the Itanium tried to improve its speed in a RISC-like way - i.e. by keeping things simple on the hardware side and offloading decisions to the compiler. In the Itanium's case those decisions were about parallelism.
I think it's pretty clear now that tuning the machine language to whatever hardware is available at the time is a mistake when you are going for speed. It might work when the hardware is first designed, but then the transistor budget doubles and all those neat optimisations don't make so much sense anymore, yet you are welded to them because they are hardwired into the instruction set. Instead the route we have gone down is to implement a virtual CPU, rather like the JVM. The real CPU then compiles the instruction set on the fly into something that can be run fast with today's transistor budget. With tomorrow's transistor budget it might be compiled into something different.
The x86 instruction set is actually pretty good in this scenario - better than ARM. It's compact, and each instruction gives lots of opportunities to execute bits of it in parallel. If this is true, then Intel ending up with an architecture that can run fast is just dumb luck, as they weren't planning for it 30 years ago. Now that I think about it, the VAX instruction set would probably be even better again.
Posted May 25, 2013 10:36 UTC (Sat)
by mpr22 (subscriber, #60784)
[Link]
192k of data (point DS, ES, SS to different segments).
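To illustrate that correction, a rough sketch (in C, with made-up segment values; it just prints the ranges rather than touching real segment registers): with DS, ES and SS each pointing at a different, non-overlapping 64K segment, a real-mode program can reach 3 x 64K = 192K of data at once.

    #include <stdio.h>
    #include <stdint.h>

    int main(void)
    {
        /* Hypothetical, non-overlapping segment values for DS, ES and SS. */
        const uint16_t segs[]  = { 0x1000, 0x2000, 0x3000 };
        const char    *names[] = { "DS", "ES", "SS" };

        for (int i = 0; i < 3; i++) {
            uint32_t base = (uint32_t)segs[i] << 4; /* segment base address   */
            uint32_t top  = base + 0xFFFF;          /* last byte it can reach */
            printf("%s=%04X covers %05X..%05X (64K)\n",
                   names[i], (unsigned)segs[i], (unsigned)base, (unsigned)top);
        }
        printf("total: %dK of data addressable at once\n", 3 * 64);
        return 0;
    }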