Fun with NULL pointers, part 2
Posted Jul 23, 2009 2:20 UTC (Thu) by quotemstr (subscriber, #45331)
In reply to: Fun with NULL pointers, part 2 by etienne_lorrain@yahoo.fr
Parent article: Fun with NULL pointers, part 2
Intel's designers added the bound instruction to allow a quick check of the range of a value in a register. This is useful in Pascal, for example, when checking array bounds validity and when checking to see if a subrange integer is within an allowable range. There are two problems with this instruction, however. On 80486 and Pentium/586 processors, the bound instruction is generally slower than the sequence of instructions it would replace:
    cmp reg, LowerBound
    jl OutOfBounds
    cmp reg, UpperBound
    jg OutOfBounds

On the 80486 and Pentium/586 chips, the sequence above only requires four clock cycles assuming you can use the immediate addressing mode and the branches are not taken; the bound instruction requires 7-8 clock cycles under similar circumstances and also assuming the memory operands are in the cache.

A second problem with the bound instruction is that it executes an int 5 if the specified register is out of range. IBM, in their infinite wisdom, decided to use the int 5 interrupt handler routine to print the screen. Therefore, if you execute a bound instruction and the value is out of range, the system will, by default, print a copy of the screen to the printer. If you replace the default int 5 handler with one of your own, pressing the PrtSc key will transfer control to your bound instruction handler. Although there are ways around this problem, most people don't bother since the bound instruction is so slow.
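For readers who think more easily in C than in assembly, here is a rough sketch of the check both approaches perform. It is only a model under stated assumptions: the function and struct names are made up for illustration, the value is treated as a signed 32-bit integer (matching the signed jl/jg branches), and both bounds are inclusive. The second function mirrors how bound reads its lower and upper limits as a pair from memory, but it returns a flag instead of raising int 5.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper modelling the two-compare sequence.
 * Signed comparisons, inclusive bounds. */
static bool in_bounds_two_cmp(int32_t value, int32_t lower, int32_t upper)
{
    if (value < lower)      /* cmp reg, LowerBound ; jl OutOfBounds */
        return false;
    if (value > upper)      /* cmp reg, UpperBound ; jg OutOfBounds */
        return false;
    return true;            /* fall through: value is in range */
}

/* Hypothetical layout of the memory operand: lower limit first,
 * upper limit immediately after it. */
struct bound_pair {
    int32_t lower;
    int32_t upper;
};

/* Models the comparison done by bound. The real instruction traps
 * (int 5) when the check fails; this sketch just returns false. */
static bool in_bounds_like_bound(int32_t value, const struct bound_pair *limits)
{
    assert(limits != NULL);
    return value >= limits->lower && value <= limits->upper;
}
```

Both functions give the same answer for the same inputs; the argument in this thread is only about what each form costs in cycles and code bytes, not about what it computes.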
Posted Jul 23, 2009 8:54 UTC (Thu) by etienne_lorrain@yahoo.fr (guest, #38022)
The two-cmp solution needs 16 bytes (in protected mode) if the out-of-bounds handler is within 256 bytes of the test, and 32 bytes if not: that is a complete cache line.
The bound solution needs 8 bytes, mostly because it does not encode the address of the out-of-bounds handler.
The cost of loading the other 24 bytes is far more significant than the 4-cycle difference.
Even the branch misprediction you will probably get matters more; even the fact that you have polluted the branch prediction cache probably matters more.
The default INT 5 screen-print handler is not accessible under Linux: the BIOS is not mapped and the APIC is configured differently. If I remember correctly, you get a SIGBUS exception in user mode and something just as easy to trap or abort in kernel mode.