vmsplice(): the making of a local root exploit

Posted Feb 12, 2008 1:10 UTC (Tue) by dw (subscriber, #12017)
Parent article: vmsplice(): the making of a local root exploit

One of the several things over the years that has kept my nose firmly outside of the kernel
source tree is the strong and regular repugnance I feel when delving inside - I mean, look at
the number of ad-hoc calculations, bitshifts and whatnot mentioned in this article. Look at
the escalation of a simple bug into a major one by moving a single test to the end of the loop
in get_user_pages().

How much of a size/performance win was gained in return for sacrificing nice, simple
abstractions (or even just a bunch of well documented macros!), and simple, readable loops?

I notice that the NetBSD kernel shows some of the same traits. I will quite shamelessly
say that my userland C code never has looked, and never will look, as ugly, unreadable, or ultimately
undocumented as most of the code in kernel space. Is this part of the
genius-hacker-pissing-contest culture? Neither I nor my current employer would ever hire someone
who wrote code like this.

Sorry, rant over. Perhaps I'm just a stupid noob, or perhaps I have a hint of professionalism.
I can never quite work that out.


Argument goes back to Athens

Posted Feb 12, 2008 3:08 UTC (Tue) by smitty_one_each (subscriber, #28989) [Link]

As carried out by Plato and Aristotle, courtesy of Raphael:
http://abyss.uoregon.edu/~js/glossary/aristotle.html

This code is there not for performance

Posted Feb 12, 2008 7:32 UTC (Tue) by khim (subscriber, #9252) [Link]

It's easy to say in userspace code: we'll just never use structures 2 GiB or larger, so we can safely use signed offsets. Or you can use 64-bit offsets if you need 4 GB so badly. The problem with the kernel is that both solutions are inadequate: 64-bit offsets require full locking on many architectures (where they cannot be read or updated atomically), and that can sometimes cost you a 100-150% slowdown (or even more in rare cases), while a 2 GB limit makes the system unusable (ask the HURD folks, who had such a limit on partition size a few years ago). Thus kernel code is full of strange arithmetic where you try to calculate something in the range of terabytes (think disks) with just 32-bit integers. Of course it's not "clean and easy to read", but it's necessary if you want a production kernel and not a toy.
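As a rough illustration of that arithmetic: a 32-bit count of 512-byte sectors can describe 2^32 × 512 bytes = 2 TiB, but only if the conversion to bytes is done in 64-bit arithmetic. The sketch below uses hypothetical helper names and is not taken from the kernel.

```c
#include <stdint.h>

#define SECTOR_SHIFT 9 /* 512-byte sectors; assumed for illustration */

/* Widen first, then shift: the byte offset fits easily in 64 bits. */
static inline uint64_t sectors_to_bytes(uint32_t sectors)
{
    return (uint64_t)sectors << SECTOR_SHIFT;
}

/* Buggy variant for comparison: the shift is done in 32-bit unsigned
   arithmetic, so any offset at or above 4 GiB silently wraps. */
static inline uint32_t sectors_to_bytes_wrapping(uint32_t sectors)
{
    return sectors << SECTOR_SHIFT;
}
```

The only difference is where the widening cast sits; the wrapping version compiles without complaint and returns 0 for a sector number that corresponds to exactly 4 GiB.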

This code is there not for performance

Posted Feb 12, 2008 12:44 UTC (Tue) by michaeljt (subscriber, #39183) [Link]

Perhaps I misunderstood here, but I think that the original poster was saying something else -
not that you shouldn't do all these things (strange arithmetic and suchlike), but that you should keep
the code in one place (i.e. a set of macros) instead of duplicating it in lots of places.

Not that it would have helped much here, since the problem was a failure to validate user
input.
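A minimal sketch of validation written once rather than duplicated: an overflow-safe check that a user-supplied (base, len) range lies below a hypothetical USER_ADDR_LIMIT. This is not the kernel's access_ok(); the limit value and both helper names are assumptions for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical top of the user address range (a 3 GiB split is assumed);
   real kernels use an architecture-specific limit. */
#define USER_ADDR_LIMIT ((uintptr_t)0xC0000000u)

/* Naive check: base + len can wrap around to a small value and pass. */
static inline bool user_range_ok_naive(uintptr_t base, size_t len)
{
    return base + len <= USER_ADDR_LIMIT;
}

/* Overflow-safe check: bound len first, then compare base against
   USER_ADDR_LIMIT - len, so base + len is never computed. */
static inline bool user_range_ok(uintptr_t base, size_t len)
{
    if (len > USER_ADDR_LIMIT)
        return false;
    return base <= USER_ADDR_LIMIT - len;
}
```

On a typical platform where uintptr_t and size_t have the same width, the naive form accepts base = 0x1000, len = SIZE_MAX because the sum wraps; the safe form rejects it.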

This code is there not for performance

Posted Feb 12, 2008 13:50 UTC (Tue) by ms (subscriber, #41272) [Link]

I'm obviously going to sound like a broken record again. The problem is validating incoming
data. Now you next need to ask, "why should we validate user input?", to which the answer is
"because the user can supply silly values", to which the next question is "well, why can the
user supply silly values?", to which the answer is "because the types of the values the user
is supplying are too wide". So basically, you're using the wrong types. Users should not be
allowed to present "bad" input: the type system should prevent it. In an ideal world...

This code is there not for performance

Posted Feb 12, 2008 14:38 UTC (Tue) by nix (subscriber, #2304) [Link]

That's nice in theory, but I'm not sure it's entirely practical to require the users to
provide a different pointer type for every possible VM range!

(Also, kernel/user boundaries are necessarily special places: typing across the boundary is a
matter of consensus as much as enforcement.)

This code is there not for performance

Posted Feb 12, 2008 14:49 UTC (Tue) by ms (subscriber, #41272) [Link]

Yeah, I don't disagree. It is a hard thing to do - you tend to get into messes with
dependently typed languages and so forth - one could easily argue that they're not quite ready
for writing kernels in!

Typically, you first implement maths in the type system, then you can implement a basic "is
smaller than" so then you could effectively arbitrarily refine the int range to be "the value
must be an int and must also be smaller than x". That kinda thing. Of course, as soon as you
hit "the value must be a function which will terminate", you're in trouble...!

This code is there not for performance

Posted Feb 12, 2008 15:05 UTC (Tue) by nix (subscriber, #2304) [Link]

There are a good few languages (Cayenne and Qi spring to mind) with type systems that are so
powerful that they themselves trip the halting problem: compilation is no longer guaranteed to
terminate. :)

I suppose a simple ranged length type (length must be >0) would have sufficed here: you
wouldn't need separate types for every possible valid pointer range.
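In C, the closest approximation to such a ranged type is a wrapper struct that is only ever built by a validating constructor. A sketch under that convention (checked_len and make_len are hypothetical names); note that C itself does not stop anyone from filling in the struct directly.

```c
#include <stdbool.h>
#include <stddef.h>

/* A length that, by convention, is only created by make_len(), so any
   checked_len in hand satisfies 0 < value <= the max given at creation.
   This is a coding discipline, not a type-system guarantee. */
typedef struct {
    size_t value;
} checked_len;

/* Validates raw once, at the boundary; returns false on bad input. */
static inline bool make_len(size_t raw, size_t max, checked_len *out)
{
    if (raw == 0 || raw > max)
        return false;
    out->value = raw;
    return true;
}
```

Functions further in can then take a checked_len instead of a bare size_t, which documents that the range check has already happened.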

Halting problem ? Ha!

Posted Feb 12, 2008 20:17 UTC (Tue) by khim (subscriber, #9252) [Link]

C++ has them beat: its type system doesn't just trigger the halting problem, it's Turing-complete!

Halting problem ? Ha!

Posted Feb 12, 2008 21:00 UTC (Tue) by nix (subscriber, #2304) [Link]

C++'s template expander is modelled on ML's pattern matching. Cayenne and
Qi are both perhaps two generations beyond that (both type systems being
more powerful than Haskell's), in different directions: personally I prefer
Qi's, but part of that is probably because it's possible to bootstrap the
Qi implementation without being an ultra-guru.

Of course, C++ compilers often *appear* to not halt when compiling 
anyway ;}

Halting problem ? Ha!

Posted Feb 14, 2008 22:01 UTC (Thu) by lysse (subscriber, #3190) [Link]

I don't know if you're joking about C++, but one of the notable things about Qi is that its
type system *is* Turing-complete, by intent and proof (someone implemented SK combinators in
it).

Halting problem ? Ha!

Posted Feb 15, 2008 0:06 UTC (Fri) by ms (subscriber, #41272) [Link]

No, nix wasn't joking. If you use C++ templates and limit yourself to numbers, then it really
is Turing-complete. Haskell, with the right flags (undecidable instances and overlapping
instances), is also Turing-complete. Cayenne is deliberately so, and many people now really
think that it's just better to permit Turing completeness and let the programmer take
responsibility.

The alternative is more like what Epigram is looking at (amongst others), where you limit
recursion to the structure of terms and prevent infinite structures. That way, you can
still guarantee termination. I fear we may now be some way from the original issue though ;)

Halting problem ? Ha!

Posted Feb 15, 2008 21:48 UTC (Fri) by nix (subscriber, #2304) [Link]

I used the wrong terms, really. What Haskell, Cayenne and Qi provide over 
C++ is (radically) greater *expressiveness*. The syntax of Qi type 
definitions is especially strange, but it's a hell of a lot saner than 
trying to define anything complicated using C++ templates.

This code is there not for performance

Posted Feb 13, 2008 15:45 UTC (Wed) by werth1 (subscriber, #48435) [Link]

The answer to the second question, "why can the user supply silly values?", is more like:
Because he can.
The user is free to choose any language he wants to interface with the kernel.
In particular, the user is free to choose a language with no or only limited type checks.
So any range checks on system functions have to be in the kernel itself.

This code is there not for performance

Posted Feb 16, 2008 11:28 UTC (Sat) by ernest (subscriber, #2355) [Link]

It is unfortunate that the CPU cannot enforce signedness and size types.
Anybody programming in assembly can bypass any higher-level language type
checks you have in mind. This is true even if the user has the best
intentions.

Ernest.

This code is there not for performance

Posted Feb 17, 2008 19:50 UTC (Sun) by giraffedata (subscriber, #1954) [Link]

> It is unfortunate that the CPU cannot enforce signedness and size types.

The unfortunateness is at a lower level than that. It's unfortunate that a CPU can't do ordinary integer math, where 2 + 2 = 4. I understand why the very first CPUs wrapped around integers -- it happens naturally with the simplest implementations. But I don't get why no CPU today provides even the option of trapping on an arithmetic overflow instead of wrapping around silently. They do it for floating point, but not for integers.
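Lacking a trap, C code has to detect wraparound itself. For unsigned types the check can be done after the fact, because unsigned arithmetic is defined to wrap modulo 2^N: the sum is smaller than an operand exactly when it wrapped. A minimal sketch (the helper name is hypothetical):

```c
#include <stdbool.h>
#include <stdint.h>

/* Adds two 32-bit unsigned values; returns false instead of silently
   wrapping, and leaves *sum untouched in that case. Unsigned overflow
   is well defined in C, so adding first and checking afterwards is safe. */
static inline bool add_u32_checked(uint32_t a, uint32_t b, uint32_t *sum)
{
    uint32_t s = a + b;
    if (s < a)
        return false; /* the addition wrapped around */
    *sum = s;
    return true;
}
```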

Trapping on overflow

Posted Feb 23, 2008 21:45 UTC (Sat) by anton (guest, #25547) [Link]

> But I don't get why no CPU today provides even the option of trapping on an arithmetic overflow.
MIPS and Alpha have separate arithmetic instructions that trap on signed overflow (e.g., ADD on MIPS and ADDV on Alpha). IA-32 has INTO which traps if OF is set. Apparently this instruction was so rarely used by programmers that AMD64 removed it in order to free up some opcode space, and did not even bother to allocate another (multi-byte) opcode for it; but you can still implement the functionality by combining JO (or JNO) with INT.

The existence of INTO has not helped against this security hole, though.

Trapping on overflow

Posted Feb 23, 2008 22:24 UTC (Sat) by giraffedata (subscriber, #1954) [Link]

> MIPS and Alpha have separate arithmetic instructions that trap on signed overflow ...

Nice. Do you know if there is any way to make GCC (or any other C compiler) generate such instructions?

I can understand people resisting adding instructions to handle overflow, but if I could declare in my C program "no arithmetic in here is supposed to wrap around" and get signalled to death if it does, I'd do it a lot.

Trapping on overflow

Posted Feb 28, 2008 21:23 UTC (Thu) by anton (guest, #25547) [Link]

Apart from asm statements and modifying gcc I don't know of a way to get gcc or other compilers to use the trapping instructions for C code.

Concerning "no arithmetic in here is supposed to wrap around", unsigned arithmetic is supposed to wrap around in standard C, only signed arithmetic is allowed to trap (or do anything else) on overflow.
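Since signed overflow is undefined behavior, a signed check has to be made before the addition rather than by inspecting the result. A portable sketch follows (the helper name is hypothetical); GCC 5 and later, and recent Clang, also provide __builtin_add_overflow for this purpose.

```c
#include <limits.h>
#include <stdbool.h>

/* Returns false if a + b would overflow int; otherwise stores the sum.
   The comparisons are arranged so that no intermediate value overflows:
   INT_MAX - b is safe for b > 0, and INT_MIN - b is safe for b < 0. */
static inline bool add_int_checked(int a, int b, int *sum)
{
    if (b > 0 && a > INT_MAX - b)
        return false;
    if (b < 0 && a < INT_MIN - b)
        return false;
    *sum = a + b;
    return true;
}
```

A caller wanting the "signalled to death" behavior could simply call abort() whenever this returns false.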

Magic exists by necessity alone

Posted Feb 13, 2008 0:19 UTC (Wed) by jd (guest, #26381) [Link]

Well, yes, but ideally you'd have some kind of abstraction. Since the numbers and arithmetic are "magic", they must also be impermanent and subject to continual experimentation, possibly by a coder, possibly by the person compiling, possibly by the OS itself. As such, those need to be the components that are most easily identified and changed in a consistent way, in terms of calculations and type ranges.

Making this "pure" beyond a certain level is, agreed, problematic. You don't have infinite CPU cycles, and although the compiler can do some optimization, it can't turn an abstract, general solution into something tuned to the much narrower range of special cases that are usable in practice, further constrained by the implementation details of a given architecture, which is all any real-world physical computer can ever be.

A way round this would be to have some sort of hypothetical generic architecture, which implemented the formal "pure" solution but was never (and could never be) actually used in practice. As all usable architectures would necessarily be perfect subsets of the "pure" solution, it would no longer matter whether the code is easy to read: you would know it's just a rearranged and contracted form.

However, here you run into a problem. This assumes a totally generic "pure" solution even exists, wholly independent of any architecture. Since you can't use such a solution, test-run it, or in many cases reverse-map onto it, it's not obvious you could ever show that a pure solution was indeed generic, or that it was the solution of which the real implementations were specific instances.

Magic exists by necessity alone

Posted Feb 15, 2008 23:23 UTC (Fri) by vonbrand (subscriber, #4458) [Link]

Even if a "perfect architecture" existed such that "real machines" were "perfect subsets" of it (real architectures are very different, sometimes in very weird ways), one of the problems in writing a kernel is that real hardware is as buggy as (or buggier than!) the next piece of software. In the end, it is algorithms implemented in silicon, with the added burden that bugs can rarely be patched and the rebuilt package distributed to the users.

vmsplice(): the making of a local root exploit

Posted Feb 12, 2008 13:18 UTC (Tue) by tialaramex (subscriber, #21167) [Link]

In my experience the kernel is documented where it needs to be.

Like my own code, it doesn't meet some arbitrary "programming by the book" rule about how many
comments there should be, but instead provides comments where it is expected that they would
be helpful to subsequent maintainers. Every year when I was assisting in the undergraduate
first programming class we had to first beat it into new students that they need to write
comments explaining what's going on in their code, and then beat it into them that useless
(e.g. i++; /* increment i */) explanations are worse than none at all.

Good documentation is like a good alarm system - it doesn't tell you about the ordinary, the
routine, which you are expected to already understand. Contrast AWS (a railway safety system
which alerts and requires action for all aspects except proceed, meaning it becomes more of a
routine annoyance than an actual safety aid) with ATP or TPWS (systems which do nothing until
an apparently unsafe condition occurs). Every comment is taking up space, both on the screen
and in the maintenance programmer's mind. A short function with one comment pointing at an
unusual condition for a loop may as a result be easier to comprehend than a similar function
with sixteen multi-line C++ style comments explaining mundane things you ought to be able to
pick up at a glance from the surrounding code ("this is a loop counter") or know as domain
knowledge for the system you're working on.

For example, if you work on code related to SCSI or filesystems or otherwise connected with
disks, you're expected to recognise that (bytes >> 9) converts from a byte count to a sector
count, since sectors are 512 bytes. Thus each incidence of this expression is not expected to
attract a comment even though it may initially seem inexplicable to someone who has never
worked with disks.
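For readers outside that domain: on an unsigned count, bytes >> 9 is the same as bytes / 512, rounded down. Asking how many sectors are needed to hold a byte count requires rounding up instead. A sketch with hypothetical names, assuming unsigned 64-bit byte counts:

```c
#include <stdint.h>

/* Whole sectors contained in a byte count: rounds down, like bytes / 512. */
static inline uint64_t sectors_floor(uint64_t bytes)
{
    return bytes >> 9;
}

/* Sectors needed to hold a byte count: rounds up. Testing the low nine
   bits avoids computing bytes + 511, which could wrap for huge inputs. */
static inline uint64_t sectors_ceil(uint64_t bytes)
{
    return (bytes >> 9) + ((bytes & 511) != 0);
}
```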

I've had to work on several parts of the Linux kernel (though I don't think any code I wrote
is currently in a Linus git repo) and I found it all satisfactorily documented. Unlike
Tanenbaum, if I were grading this project I would give it at least an A minus. Maybe you're
just too sensitive - do you have trouble eating good cheese? Are you revolted by the thought
that game isn't freshly killed when it's made into pies? Not everything needs to look like
pristine sample code in order to be maintainable.

code documented where it needs to be

Posted Feb 12, 2008 16:12 UTC (Tue) by rfunk (subscriber, #4054) [Link]

Reminds me of Steve Yegge's latest.

vmsplice(): the making of a local root exploit

Posted Feb 12, 2008 16:42 UTC (Tue) by utoddl (subscriber, #1232) [Link]

I basically agree with what you're saying, but I'd like to make just a couple of counter-points.
> ...beat it into them that useless (e.g. i++; /* increment i */) explanations are worse than none at all.
True, that's useless to you and me, but for the noob programmer who has come through elementary and high school maths, the expression i++ is not immediately obvious. By the second or third C program he should be familiar with it, certainly. But my point is that the "document what's not obvious" standard requires a judgment call where "what's obvious" varies greatly with the experience of the coder.
> For example, if you work on code related to SCSI or filesystems or otherwise connected with disks, you're expected to recognise that (bytes >> 9) converts from a byte count to a sector count, since sectors are 512 bytes.
If you say so. I don't happen to work in that domain often, so it isn't obvious to me. However, a well-crafted macro BYTES2SECTORS(bytes) would give me a clue, and the magic numbers and operations on them would live in the definition of the macro, so you've got one place to maintain the conversion and it's clear to us noobs what it's related to. Again, the point isn't about this particular case, but if you can give magic numbers and operations on them a name, you can make the intent more clear (and perhaps avoid a problem, like (bytes<9)). It's already clear to somebody, but will it be clear next week, or to the next programmer who may be less experienced?
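A sketch of what such a macro might look like; BYTES2SECTORS, SECTOR_SHIFT and SECTOR_SIZE are hypothetical names here. The body is fully parenthesized, since otherwise operators at the call site bind into the expansion:

```c
/* One place that records what the magic numbers mean. */
#define SECTOR_SHIFT 9                   /* log2 of the 512-byte sector size */
#define SECTOR_SIZE (1u << SECTOR_SHIFT) /* 512 */
#define BYTES2SECTORS(bytes) ((bytes) >> SECTOR_SHIFT)

/* For contrast only: without the outer parentheses, an operator at the
   call site binds into the expansion, so BYTES2SECTORS_UNSAFE(1024u) * 2
   becomes 1024u >> 9 * 2, i.e. 1024u >> 18, which is 0. */
#define BYTES2SECTORS_UNSAFE(bytes) bytes >> SECTOR_SHIFT
```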

vmsplice(): the making of a local root exploit

Posted Feb 12, 2008 18:14 UTC (Tue) by iabervon (subscriber, #722) [Link]

It is useless to the noob programmer who is looking at a piece of code with hundreds of
thousands of occurrences of the increment operator if they're all commented. It's worse than
useless to that same programmer if some but not all of them are commented (are the commented
ones somehow different?). The novice should have access to a tutorial and reference guide for
the language, and that should explain the notation. Even things which are unique to a
project's style shouldn't be documented in situ at every (or any) place they're used; they
should be documented once in a central location. So the standard should really be "comment
what's unusual about this location, given the context of the rest of the project; also
document what's unusual about this project, given the language it's written in; also have
language reference works available; and certainly don't duplicate any of this information,
since it will just get out of sync."

vmsplice(): the making of a local root exploit

Posted Feb 14, 2008 6:38 UTC (Thu) by jimparis (subscriber, #38647) [Link]

> > For example, if you work on code related to SCSI or filesystems or otherwise connected with
> > disks, you're expected to recognise that (bytes >> 9) converts from a byte count to a sector
> > count, since sectors are 512 bytes.

> However, a well-crafted macro BYTES2SECTORS(bytes) would give me a clue, and the magic
> numbers and operations on them would live in the definition of the macro, so you've got one
> place to maintain the conversion and it's clear to us noobs what it's related to.

But in the context of the kernel, hiding things in a BYTES2SECTORS macro can be downright
dangerous.  What types does it handle for input and output?  Does it sleep or have any locking
requirements?  If I pass an expression, do I have to worry about side effects if it gets
evaluated twice?  How does it behave if bytes is not a multiple of one sector?
Is it talking about the canonical 512-byte definition of a sector or the actual 2048-byte
sectors used on CD-ROMs?
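The side-effect question is a genuine hazard with function-like macros: a macro that mentions its argument twice evaluates it twice. A contrived sketch (all names hypothetical) using a round-up conversion, next to a static inline function that evaluates its argument exactly once:

```c
#include <stdint.h>

/* Round-up conversion that mentions its argument twice. */
#define BYTES_TO_SECTORS_UP(b) (((b) >> 9) + (((b) & 511) != 0))

/* Hypothetical source of byte counts that records how often it is called. */
static unsigned calls;
static uint64_t next_len(void)
{
    calls++;
    return 1000;
}

/* Same arithmetic as a function: the argument is evaluated once. */
static inline uint64_t bytes_to_sectors_up(uint64_t b)
{
    return (b >> 9) + ((b & 511) != 0);
}
```

With the macro, BYTES_TO_SECTORS_UP(next_len()) calls next_len() twice; with the function, once. Both return 2 sectors for 1000 bytes here, but a source that actually advances on each call would give the macro two different values.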

"bytes >> 9" is absolutely clear.  I see instantly that it handles integer types and matches
the signedness.  It is free from side effects and fast.  I know that the answer is rounded
down to the nearest whole sector.  Sectors are 512 bytes.

I don't see why you would want to hide all of that information from a developer.  Your version
is only simpler at a quick glance to a casual observer, not someone who actually has to deal
with the code and what the statement actually does.

vmsplice(): the making of a local root exploit

Posted Feb 12, 2008 19:20 UTC (Tue) by bronson (subscriber, #4806) [Link]

I agree with everything you said, except...

> you're expected to recognise that (bytes >> 9) converts from a byte count to a sector count

Then why not define BYTES_TO_SECTORS(n) ((n)>>9)?  Putting this bare shift into code
introduces an informal lingo that, as it builds up, can really get in the way for new
maintainers.  True, one instance is no big deal, but trying to get up to speed on code that
has more than five or ten undocumented idioms like this is a real drag.

Even if you understand the lingo, you will likely start to think, "Ah, those are now sectors"
whenever you see >>9 in the code.  That's bad too.

vmsplice(): the making of a local root exploit

Posted Feb 13, 2008 7:27 UTC (Wed) by JoeF (guest, #4486) [Link]

> For example, if you work on code related to SCSI or filesystems or otherwise connected with disks, you're expected to recognise that (bytes >> 9) converts from a byte count to a sector count, since sectors are 512 bytes.

Hmm, no. The use of magic numbers is something I beat out of undergrads. Use a define, or a macro.
When I do code reviews at my job, I always latch onto the use of magic numbers.
And, your example is flawed, anyway. While hardware sectors may be 512 bytes (currently), filesystems usually deal with larger blocks.
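The block-size point can be made concrete: since the block size belongs to a particular filesystem, the shift is normally kept as data rather than written as a literal. A sketch with a hypothetical fs_info structure (Linux keeps a comparable per-inode i_blkbits field, but this is not that API):

```c
#include <stdint.h>

/* Hypothetical per-filesystem description; only the field needed here. */
struct fs_info {
    unsigned block_bits; /* log2(block size), e.g. 12 for 4096-byte blocks */
};

/* Whole filesystem blocks in a byte count (rounds down). */
static inline uint64_t bytes_to_fs_blocks(const struct fs_info *fs, uint64_t bytes)
{
    return bytes >> fs->block_bits;
}
```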

vmsplice(): the making of a local root exploit

Posted Feb 14, 2008 6:40 UTC (Thu) by jimparis (subscriber, #38647) [Link]

> While hardware sectors may be 512 bytes (currently), filesystems usually deal with larger
> blocks.

All the more reason to just specify it clearly at the point of use rather than hiding it
away in a #define!  See my response above too.

vmsplice(): the making of a local root exploit

Posted Feb 14, 2008 19:44 UTC (Thu) by JoeF (guest, #4486) [Link]

You either need to add a comment, or use a define.
bytes >> 9 without a clarifier is something that should never be in production-quality code.
Magic numbers should never be in code without a clear explanation what they mean.
If you don't want a macro that can hide things, write something like bytes >> SECTOR_SHIFT.

vmsplice(): the making of a local root exploit

Posted Feb 14, 2008 20:42 UTC (Thu) by jimparis (subscriber, #38647) [Link]

> You either need to add a comment, or use a define.
> bytes >> 9 without a clarifier is something that should never be in production-quality code.
> Magic numbers should never be in code without a clear explanation what they mean.  If you don't
> want a macro that can hide things, write something like bytes >> SECTOR_SHIFT.

If one doesn't want to hide things, hide it inside "SECTOR_SHIFT"?
I'm sorry, I remain unconvinced.
Using ">> 9" is essentially a comment saying "I'm converting this to a 512-byte sector count".
It's such a fundamentally basic operation that anyone working with filesystem or disk code
would understand it immediately.

Abstracting common constants is most useful when they might change, and they're just arbitrary
to begin with -- HZ for example.  Or things that actually have no real meaning, like magic
numbers that identify a filesystem.  For a number that has a real, well-known meaning, and for
which changing would involve huge logic changes in the code, I think macros like that are just
extra levels of obfuscation.

vmsplice(): the making of a local root exploit

Posted Feb 15, 2008 6:54 UTC (Fri) by JoeF (guest, #4486) [Link]

> Using ">> 9" is essentially a comment saying "I'm converting this to a 512-byte sector count".

No. It only says that you shift a value by 9 bits to the right.
It may say to you that you are converting a value to a 512-byte sector count. It does not necessarily say that to others.

> It's such a fundamentally basic operation that anyone working with filesystem or disk code would understand it immediately.

That mindset is what results in a lot of the flaws in all kinds of code. There will always be somebody working on the code for whom it isn't obvious. I venture that if you can't see that, you must still be in your first job and haven't yet had the task to maintain somebody else's code. I can tell you from (painful) experience that this kind of stuff is among the worst stuff out there.
">>9" makes implicit assumptions, and implicit assumptions, even if they may seem reasonable to the original author, are often not obvious to maintainers, possibly years down the road.
Oh, and just in case, I am writing filesystem code. I would never even get the idea to write something like ">>9" without a comment, or better, in a macro.

vmsplice(): the making of a local root exploit

Posted Feb 15, 2008 7:43 UTC (Fri) by jimparis (subscriber, #38647) [Link]

> I venture that if you can't see that, you must still be in your
> first job and haven't yet had the task to maintain somebody else's code.

And I venture that, in my experience (which is not as limited as you impolitely presume),
"somebody else's code" is a whole lot more difficult to work with when I have to dig through
arbitrary levels of opaque macros to figure out what they're really trying to do.

Clearly we have a difference in opinion.  I prefer code that is written clearly enough to
be self-explanatory.  Consider:
  inode->i_blocks = bytes >> 9;
vs:
  inode->i_blocks = BYTES_TO_BLOCKS(bytes);

If you maintain that the second version conveys more information than the first, then I'm
afraid we will just have to disagree.

vmsplice(): the making of a local root exploit

Posted Feb 16, 2008 1:52 UTC (Sat) by dododge (subscriber, #2870) [Link]

> Consider:
>   inode->i_blocks = bytes >> 9;
> vs:
>   inode->i_blocks = BYTES_TO_BLOCKS(bytes);

It's a bit of an unfair example, though, because you're computing
against a value called "bytes" and assigning it to something called
"blocks".  You've put enough context around the expression to make
it clear what the shift is trying to accomplish.

The problem is when someone assumes ">> 9" is inherently
self-documenting and throws it into the middle of a much more
complex statement.  Consider:

  process_frag_2(sig,(get_ent(curr) >> 9) + 2,HEX_ENCODE);
vs:
  process_frag_2(sig,BYTES_TO_BLOCKS(get_ent(curr)) + 2,HEX_ENCODE);

When I'm reading code, I'd much rather see the latter.  It doesn't
just tell me why the shift is being done; it even adds useful
information about the APIs for get_ent() and process_frag_2().

vmsplice(): the making of a local root exploit

Posted Feb 16, 2008 17:13 UTC (Sat) by bronson (subscriber, #4806) [Link]

I guess we will disagree.  Your way is still hostile to new maintainers because it carries so
much implicit information.  What if your code has this?

   inode->i_blocks = bytes >> 8;

What is a new maintainer to think?  Even if he does recognize that this expression is
different from the others, he's still confused.  Did you auto-optimize the expression
(bytes >> 9)*2?  Is it a typo or a thinko?
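The two readings really do differ: bytes >> 8 counts 256-byte units, while (bytes >> 9)*2 doubles the whole-sector count and so drops an odd trailing 256-byte unit. A quick check in C (helper names hypothetical):

```c
#include <stdint.h>

/* Number of whole 256-byte units. */
static inline uint64_t half_sectors(uint64_t bytes)
{
    return bytes >> 8;
}

/* Twice the whole-sector count: loses an odd trailing 256-byte unit. */
static inline uint64_t twice_sectors(uint64_t bytes)
{
    return (bytes >> 9) * 2;
}
```

For 1024 bytes both give 4, but for 768 bytes they give 3 and 2, so a maintainer cannot "normalize" one into the other without knowing the author's intent.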

Will every potentially confusing use of the shift operator in your code include a comment
stating the author's intention?  If so, then will that convention still be true once someone
else has modified your code?

Also, you've given yourself a rather heavy restriction on variable naming haven't you?  Will
all your variable names really contain "blocks" and "bytes" as appropriate?  You'll probably
want to add this restriction as a comment to each file to which it applies or it won't last
long once other people start committing patches to your code!

Finally, your technique only works for trivial expressions like assignment.  What happens if
you need to actually perform a calculation?

A BYTES_TO_BLOCKS macro solves all these problems by making the conversion explicit.  Implicit
rules and unwritten conventions always make life hard for new maintainers.  Always.

vmsplice(): the making of a local root exploit

Posted Feb 13, 2008 17:09 UTC (Wed) by landley (subscriber, #6789) [Link]

It's easier for you to read what you wrote than it is for you to read what 
someone else wrote, therefore the problem is with the other person?

Reading code is always harder than writing code.  When you write code, by 
definition you have a working model of the code in your head and 
understand all the concepts involved (or it won't work).  When you read 
someone else's code, even well commented source code, you're trying to 
reverse engineer their thought process based on a machine they built which 
was primarily designed to function, not to teach.

Beyond that, working code is often complicated because it has to deal with 
the real world rather than a theoretical model.

Go read this:
http://www.joelonsoftware.com/articles/fog0000000069.html


vmsplice(): the making of a local root exploit

Posted Feb 13, 2008 17:34 UTC (Wed) by landley (subscriber, #6789) [Link]

So code other people wrote is harder for you to read than code you wrote.  
Join the club.  For code you wrote, you'd better have a complete 
theoretical model in your head or it won't work.  For other people's code, 
you're trying to reverse engineer their thought process by taking apart a 
machine they built.

So if you haven't figured out yet that source code is inherently harder to
read than it is to write, or that working code accumulates complexity as
it has to deal with a real world that does not cleanly match simple 
theoretical models, you're definitely on the "newb" side rather than 
the "professional" side.

Read this:
http://www.joelonsoftware.com/articles/fog0000000053.html

And note how much else out there agrees with it:
http://www.spinellis.gr/codereading/
http://withoutane.com/rants/2007/when-you-read-code
http://blogs.msdn.com/oldnewthing/archive/2007/04/06/2036...

and so on...

vmsplice(): the making of a local root exploit

Posted Feb 13, 2008 18:03 UTC (Wed) by dw (subscriber, #12017) [Link]

As a matter of diligence I (like many, many people) document large swathes of my code in the
form of comments. Basically anywhere it has taken me more than a moment to think about, or
where the purpose of the code cannot be inferred by reading API documentation for the calls
made inside it.

Magical integer literals and bitshifts fall well within this purview, and I cannot see it as
"noobness" in these two specific cases. Imagine someone had to go back and fix all those
bitshifts when we move to Some New Compiler (in Some Hypothetical Future). They can't be regexed,
there's no central place where a change can ripple through the tree, and it's utterly dangerous, say, if
this New Compiler makes a different choice where ANSI C leaves the result of right-shifting a
negative signed number implementation-defined (I'm taking this as one single example; there are
hundreds more).
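For the record, C99 makes the result of right-shifting a negative signed value implementation-defined (left-shifting one is undefined), and compilers commonly use an arithmetic shift, so -1 >> 9 typically gives -1 while -1 / 512 gives 0. One way to keep the idiom well defined is to reject negative input and shift an unsigned value, as in this sketch with a hypothetical helper name:

```c
#include <stdbool.h>
#include <stdint.h>

/* Converts a signed byte count (e.g. a length that arrived through a
   user-facing API) to 512-byte sectors, rounding down. Negative input is
   rejected rather than shifted, because right-shifting a negative value
   gives an implementation-defined result. */
static inline bool signed_bytes_to_sectors(int64_t bytes, uint64_t *sectors)
{
    if (bytes < 0)
        return false;
    *sectors = (uint64_t)bytes >> 9;
    return true;
}
```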

I understand that other people's code is difficult to comprehend. Hell, I've read more than my
fair share of Other People's Code. I'm talking specifically about why the Linux kernel seems
to be so full of this, but other commenters have given good reasons for some of this already.

code obfuscation

Posted Feb 16, 2008 1:21 UTC (Sat) by man_ls (subscriber, #15091) [Link]

True. Any idiot can obfuscate his or her code, but it takes work and wit to make legible code.

Copyright © 2008, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds