User: Password:
|
|
Subscribe / Log in / New account

vmsplice(): the making of a local root exploit

vmsplice(): the making of a local root exploit

Posted Feb 12, 2008 13:18 UTC (Tue) by tialaramex (subscriber, #21167)
In reply to: vmsplice(): the making of a local root exploit by dw
Parent article: vmsplice(): the making of a local root exploit

In my experience the kernel is documented where it needs to be.

Like my own code, it doesn't meet some arbitrary "programming by the book" rule about how many
comments there should be, but instead provides comments where it is expected that they would
be helpful to subsequent maintainers. Every year when I was assisting in the undergraduate
first programming class we had to first beat it into new students that they need to write
comments explaining what's going on in their code, and then beat it into them that useless
(e.g. i++; /* increment i */) explanations are worse than none at all.

Good documentation is like a good alarm system - it doesn't tell you about the ordinary, the
routine, which you are expected to already understand. Contrast AWS (a railway safety system
which alerts and requires action for all aspects except proceed, meaning it becomes more of a
routine annoyance than an actual safety aid) with ATP or TPWS (systems which do nothing until
an apparently unsafe condition occurs). Every comment is taking up space, both on the screen
and in the maintenance programmer's mind. A short function with one comment pointing at an
unusual condition for a loop may as a result be easier to comprehend than a similar function
with sixteen multi-line C++ style comments explaining mundane things you ought to be able to
pick up at a glance from the surrounding code ("this is a loop counter") or know as domain
knowledge for the system you're working on.

For example, if you work on code related to SCSI or filesytems or otherwise connected with
disks, you're expected to recognise that (bytes >> 9) converts from a byte count to a sector
count, since sectors are 512 bytes. Thus each incidence of this expression is not expected to
attract a comment even though it may initially seem inexplicable to someone who has never
worked with disks.

I've had to work on several parts of the Linux kernel (though I don't think any code I wrote
is currently in a Linus git repo) and I found it all satisfactorily documented. Unlike
Tanenbaum if I were grading this project I would give it at least an A minus. Maybe you're
just too sensitive - do you have trouble eating good cheese? Are you revolted by the thought
that game isn't freshly killed when its made into pies ? Not everything needs to look like
pristine sample code in order to be maintainable.


(Log in to post comments)

code documented where it needs to be

Posted Feb 12, 2008 16:12 UTC (Tue) by rfunk (subscriber, #4054) [Link]

Reminds me of Steve Yegge's latest.

vmsplice(): the making of a local root exploit

Posted Feb 12, 2008 16:42 UTC (Tue) by utoddl (subscriber, #1232) [Link]

I basically agree with what you're saying, but I'd like to make just a couple of counter-points.
...beat it into them that useless (e.g. i++; /* increment i */) explanations are worse than none at all.
True, that's useless to you and me, but for the noob programmer who has come through elementary and high school maths, the expression i++ is not immediately obvious. By the second or third C program he should be familiar with it, certainly. But my point is that the "document what's not obvious" standard requires a judgment call where "what's obvious" varies greatly with the experience of the coder.
For example, if you work on code related to SCSI or filesytems or otherwise connected with disks, you're expected to recognise that (bytes >> 9) converts from a byte count to a sector count, since sectors are 512 bytes.
If you say so. I don't happen to work in that domain often, so it isn't obvious to me. However, a well-crafted macro BYTES2SECTORS(bytes) would give me a clue, and the magic numbers and operations on them would live in the definition of the macro, so you've got one place to maintain the conversion and it's clear to us noobs what it's related to. Again, the point isn't about this particular case, but if you can give magic numbers and operations on them a name, you can make the intent more clear (and perhaps avoid a problem, like (bytes<9)). It's already clear to somebody, but will it be clear next week, or to the next programmer who may be less experienced?

vmsplice(): the making of a local root exploit

Posted Feb 12, 2008 18:14 UTC (Tue) by iabervon (subscriber, #722) [Link]

It is useless to the noob programmer who is looking at a piece of code with hundreds of
thousands of occurrences of the increment operator if they're all commented. It's worse than
useful to that same programmer if some but not all of them are commented (are the commented
ones somehow different?). The novice should have access to a tutorial and reference guide for
the language, and that should explain the notation. Even things which are unique to a
project's style shouldn't be documented in situ at every (or any) place they're used; they
should be documented once in a central location. So the standard should really be "comment
what's unusual about this location, given the context of the rest of the project; also
document what's unusual about this project, given the language it's written in; also have
language reference works available; and certainly don't duplicate any of this information,
since it will just get out of sync."

vmsplice(): the making of a local root exploit

Posted Feb 14, 2008 6:38 UTC (Thu) by jimparis (subscriber, #38647) [Link]

> > For example, if you work on code related to SCSI or filesytems or otherwise connected with
disks, you're expected to recognise that (bytes >> 9) converts from a byte count to a sector
count, since sectors are 512 bytes.

> However, a well-crafted macro BYTES2SECTORS(bytes) would give me a clue, and the magic
numbers and operations on them would live in the definition of the macro, so you've got one
place to maintain the conversion and it's clear to us noobs what it's related to.

But in the context of the kernel, hiding things in a BYTES2SECTORS macro can be downright
dangerous.  What types does it handle for input and output?  Does it sleep or have any locking
requirements?  If I pass an expression, do I have to worry about side effects if it gets
evaluated twice?  How does it behave if bytes is not a multiple of one sector?
Is it talking about the canonical 512-byte defintion of a sector or the actual 2048 byte
sectors used on CD-ROMs?

"bytes >> 9" is absolutely clear.  I see instantly that it handles integer types and matches
the signedness.  It is free from side effects and fast.  I know that the answer is rounded
down to the nearest whole sector.  Sectors are 512 bytes.

I don't see why you would want to hide all of that information from a developer.  Your version
is only simpler at a quick glance to a casual observer, not someone who actually has to deal
with the code and what the statement actually does.

vmsplice(): the making of a local root exploit

Posted Feb 12, 2008 19:20 UTC (Tue) by bronson (subscriber, #4806) [Link]

I agree with everything you said, except...

> you're expected to recognise that (bytes >> 9) converts from a byte count to a sector count

Then why not define BYTES_TO_SECTORS(n) ((n)>>9)?  Putting this bare shift into code
introduces an informal lingo that, as it builds up, can really get in the way for new
maintainers.  True, one instance is no big deal, but trying to get up to speed on code that
has more than five or ten undocumented idioms like this is a real drag.

Even if you understand the lingo, you will likely start to think, "Ah, those are now sectors"
whenever you see >>9 in the code.  That's bad too.

vmsplice(): the making of a local root exploit

Posted Feb 13, 2008 7:27 UTC (Wed) by JoeF (subscriber, #4486) [Link]

For example, if you work on code related to SCSI or filesytems or otherwise connected with disks, you're expected to recognise that (bytes >> 9) converts from a byte count to a sector count, since sectors are 512 bytes.

Hmm, no. The use of magic numbers is something I beat out of undergrads. Use a define, or a macro.
When I do code reviews at my job, I always latch onto the use of magic numbers.
And, your example is flawed, anyway. While hardware sectors may be 512 bytes (currently), filesystems usually deal with larger blocks.

vmsplice(): the making of a local root exploit

Posted Feb 14, 2008 6:40 UTC (Thu) by jimparis (subscriber, #38647) [Link]

> While hardware sectors may be 512 bytes (currently), filesystems usually deal with larger
blocks.

All the more reason to be just specify it clearly at the point of use rather than hiding it
away in a #define!  See my response above too.

vmsplice(): the making of a local root exploit

Posted Feb 14, 2008 19:44 UTC (Thu) by JoeF (subscriber, #4486) [Link]

You either need to add a comment, or use a define.
bytes >> 9 without a clarifier is something that should never be in production-quality code.
Magic numbers should never be in code without a clear explanation what they mean.
If you don't want a macro that can hide things, write something like bytes >> SECTOR_SHIFT.

vmsplice(): the making of a local root exploit

Posted Feb 14, 2008 20:42 UTC (Thu) by jimparis (subscriber, #38647) [Link]

> You either need to add a comment, or use a define.
bytes >> 9 without a clarifier is something that should never be in production-quality code.
Magic numbers should never be in code without a clear explanation what they mean.  If you don't
want a macro that can hide things, write something like bytes >> SECTOR_SHIFT.

If one doesn't want to hide things, hide it inside "SECTOR_SHIFT"?
I'm sorry, I remain unconvinced.
Using ">> 9" is essentially a comment saying "I'm converting this to a 512-byte sector count".
It's such a fundamentally basic operation that anyone working with filesystem or disk code
would understand it immediately.

Abstracting common constants is most useful when they might change, and they're just arbitrary
to begin with -- HZ for example.  Or things that actually have no real meaning, like magic
numbers that identify a filesystem.  For a number that has a real, well-known meaning, and for
which changing would involve huge logic changes in the code, I think macros like that are just
extra levels of obfuscation.

vmsplice(): the making of a local root exploit

Posted Feb 15, 2008 6:54 UTC (Fri) by JoeF (subscriber, #4486) [Link]

Using ">> 9" is essentially a comment saying "I'm converting this to a 512-byte sector count".

No. It only says that you shift a value by 9 bits to the right.
It may say to you that you are converting a value to a 512-byte sector count. It does not necessarily say that to others.

It's such a fundamentally basic operation that anyone working with filesystem or disk code would understand it immediately.

That mindset is what results in a lot of the flaws in all kinds of code. There will always be somebody working on the code for whom it isn't obvious. I venture that if you can't see that, you must still be in your first job and haven't yet had the task to maintain somebody else's code. I can tell you from (painful) experience that this kind of stuff is among the worst stuff out there.
">>9" makes implicit assumptions, and implicit assumptions, even if they may seem reasonable to the original author, are often not obvious to maintainers, possibly years down the road.
Oh, and just in case, I am writing filesystem code. I would never even get the idea to write something like ">>9" without a comment, or better, in a macro.

vmsplice(): the making of a local root exploit

Posted Feb 15, 2008 7:43 UTC (Fri) by jimparis (subscriber, #38647) [Link]

> I venture that if you can't see that, you must still be in your
> first job and haven't yet had the task to maintain somebody else's code.

And I venture that, in my experience (which is not as limited as you impolitely presume),
"somebody else's code" is a whole lot more difficult to work with when I have to dig through
arbitrary levels of opaque macros to figure out what they're really trying to do.

Clearly we have a difference in opinion.  I prefer in code that is written clearly enough to
be self-explanatory.  Consider:
  inode->i_blocks = bytes >> 9;
vs:
  inode->i_blocks = BYTES_TO_BLOCKS(bytes);

If you maintain that the second version conveys more information than the first, then I'm
afraid we will just have to disagree.

vmsplice(): the making of a local root exploit

Posted Feb 16, 2008 1:52 UTC (Sat) by dododge (subscriber, #2870) [Link]

> Consider:
>   inode->i_blocks = bytes >> 9;
> vs:
>   inode->i_blocks = BYTES_TO_BLOCKS(bytes);

It's a bit of an unfair example, though, because you're computing
against a value called "bytes" and assigning it to something called
"blocks".  You've put enough context around the expression to make
it clear what the shift is trying to accomplish.

The problem is when someone assumes ">> 9" is inherently
self-documenting and throws it into the middle of a much more
complex statement.  Consider:

  process_frag_2(sig,(get_ent(curr) >> 9) + 2,HEX_ENCODE);
vs:
  process_frag_2(sig,BYTES_TO_BLOCKS(get_ent(curr)) + 2,HEX_ENCODE);

When I'm reading code, I'd much rather see the latter.  It doesn't
just tell me why the shift is being done; it even adds useful
information about the APIs for get_ent() and process_frag_2().





vmsplice(): the making of a local root exploit

Posted Feb 16, 2008 17:13 UTC (Sat) by bronson (subscriber, #4806) [Link]

I guess we will disagree.  Your way is still hostile to new maintainers because it carries so
much implicit information.  What if your code code has this?

   inode->i_blocks = bytes >> 8;

What is a new maintainer to think?  Even if he does recognize that this expression is
different from the others, he's still confused.  Did you auto-optimize the expression (bytes
>> 9)*2?  Is it typoed or a thinko?

Will every potentially confusing use of the shift operator in your code include a comment
stating the author's intention?  If so, then will that convention still be true once someone
else has modified your code?

Also, you've given yourself a rather heavy restriction on variable naming haven't you?  Will
all your variable names really contain "blocks" and "bytes" as appropriate?  You'll probably
want to add this restriction as a comment to each file to which it applies or it won't last
long once other people start committing patches to your code!

Finally, your technique only works for trivial expressions like assignment.  What happens if
you need to actually perform a calculation?

A BYTES_TO_BLOCKS macro solves all these problems by making the conversion explicit.  Implicit
rules and unwritten conventions always make life hard for new maintainers.  Always.

vmsplice(): the making of a local root exploit

Posted Jan 11, 2009 4:06 UTC (Sun) by dibbles_you (guest, #45004) [Link]

cd /to/linux/kernel
# find . -name "*.c" | xargs grep ">>" | awk -F ">>" '{ for(n=2;n<=NF;n++) { $c=$n; sub(/^=/,"",$c); sub(/ /,"",$c); gsub(/\(/,"",$c); gsub(/\)/,"",$c); $c=substr($c,1,1); if ($c>='0' && $c<='9') print "number"; else print "text"; } }' > a
# cat a| grep text | wc -l
# cat a| grep number | wc -l

number=19877
text=4936

so it would seem the linux kernel believes numbers are usually clearer for others to read and understand.

if you believe otherwise submit a patch to the kernel removing all occurrences and replacing them with meaningful #defines.

#define 'ing all numbers is stupid, using identifiers that describe what they are should allow anybody to understand their context, if you use lots of operators and numbers in a single expression split them up to use multiple variables with appropriate names, if they are in conditions use multiple conditions with comments if necessary. The compiler will optimize all these away and make your code more readable without have 15 editors open or using code navigators to work out that the actual value of CLUSTERS_DIVIDE_BY_BYTES_IN_SECTORS_MULTIPLIED_BY_PI(x) is 2.

PS. the script is an approximation.


Copyright © 2017, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds