Fun with NULL pointers, part 2

Posted Jul 23, 2009 2:20 UTC (Thu) by quotemstr (subscriber, #45331)
In reply to: Fun with NULL pointers, part 2 by etienne_lorrain@yahoo.fr
Parent article: Fun with NULL pointers, part 2

Not really all that nice:

Intel's designers added the bound instruction to allow a quick check of the range of a value in a register. This is useful in Pascal, for example, when checking array bounds validity and when checking to see if a subrange integer is within an allowable range. There are two problems with this instruction, however. On 80486 and Pentium/586 processors, the bound instruction is generally slower than the sequence of instructions it would replace:
     cmp     reg, LowerBound
     jl      OutOfBounds
     cmp     reg, UpperBound
     jg      OutOfBounds
On the 80486 and Pentium/586 chips, the sequence above only requires four clock cycles assuming you can use the immediate addressing mode and the branches are not taken; the bound instruction requires 7-8 clock cycles under similar circumstances and also assuming the memory operands are in the cache. A second problem with the bound instruction is that it executes an int 5 if the specified register is out of range. IBM, in their infinite wisdom, decided to use the int 5 interrupt handler routine to print the screen. Therefore, if you execute a bound instruction and the value is out of range, the system will, by default, print a copy of the screen to the printer. If you replace the default int 5 handler with one of your own, pressing the PrtSc key will transfer control to your bound instruction handler. Although there are ways around this problem, most people don't bother since the bound instruction is so slow.
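
For readers who think more easily in C, the four-instruction sequence above amounts to a signed, inclusive range check. What follows is only a minimal sketch, assuming 32-bit signed operands; the name in_bounds is hypothetical and does not come from the quoted text.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper mirroring the cmp/jl/cmp/jg sequence.
 * Both comparisons are signed and both bounds are inclusive. */
bool in_bounds(int32_t value, int32_t lower, int32_t upper)
{
    if (value < lower)      /* cmp reg, LowerBound ; jl OutOfBounds */
        return false;
    if (value > upper)      /* cmp reg, UpperBound ; jg OutOfBounds */
        return false;
    return true;            /* fall through: value is in range */
}
```

The bound instruction performs the same two comparisons against a pair of limits stored in memory, and raises int 5 instead of branching when the check fails.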

Fun with NULL pointers, part 2

Posted Jul 23, 2009 8:54 UTC (Thu) by etienne_lorrain@yahoo.fr (guest, #38022)

Nowadays, unless you are in a tight loop written in assembly, you no longer count the cycles an instruction takes. What matters is the time it takes to load the instruction into the L1 cache, plus the time to reload the cache line it displaced after your instruction has executed, and that is basically proportional to the size of the instruction.
The two-cmp solution needs 16 bytes (in protected mode) if the out-of-bounds handler is within 256 bytes of the test, and 32 bytes if not: that is a complete cache line.
The bound solution needs 8 bytes, mostly because it does not encode the address of the out-of-bounds handler.
The cost of loading the other 24 bytes is a lot more significant than the 4-cycle difference.
Even the branch misprediction you will probably get matters more than those cycles, and so, probably, does the fact that you have polluted the branch prediction cache.
The default INT 5 print-screen handler is not accessible under Linux: the BIOS is not mapped and the APIC is configured differently. If I remember correctly, you get a SIGBUS in user mode and something just as easy to trap or abort in kernel mode.