LWN.net Logo

Hmm... It's usually hard to understand the code from diff...

Hmm... It's usually hard to understand the code from diff...

Posted Dec 7, 2011 8:39 UTC (Wed) by khim (subscriber, #9252)
In reply to: Fedora security update to nginx by tpo
Parent article: Fedora security update to nginx

Have you tried to look on the whole ngx_resolver_copy function? Looks kind of simple if you know how compression pointers are used in DNS response. It's described in RFC 1035 (look for string "4.1.4. Message compression").

Not really sure what you've tried to say here... and how can one make it more readable then that...


(Log in to post comments)

trying to work out some problem points

Posted Dec 7, 2011 9:29 UTC (Wed) by tpo (subscriber, #25713) [Link]

> Hmm... It's usually hard to understand the code from diff...

True. That doesn't exclude the possibility to do better however.

> Have you tried to look on the whole ngx_resolver_copy function?

Not until now.

> Looks
> kind of simple if you know how compression pointers are used in DNS
> response. It's described in RFC 1035 (look for string "4.1.4. Message
> compression").
>
> Not really sure what you've tried to say here... and how can one make it
> more readable then that...

First adding the same comment you just gave here, would help the reader to understand which algorithm exactly the code is trying to implement.

Second one classic programming sin is to use uncommented, not symbolic magic numbers. Instead the reader should be supported with some kind of hint about what those "0xc0" and "0x3f" constants mean. Be that a previous commented #define or a comment/explanation on the right of the code or something other.

Can you see the problem I'm trying to point to?

trying to work out some problem points

Posted Dec 7, 2011 11:07 UTC (Wed) by khim (subscriber, #9252) [Link]

> Have you tried to look on the whole ngx_resolver_copy function?

Not until now.

So you want not just understand the code easily but to understand some random piece of it out of context. This is tall order: I'm not sure anything is written to this ideal. If you pull some random passage from the book then you'll hit the same problem even if book is obviously written to be read...

First adding the same comment you just gave here, would help the reader to understand which algorithm exactly the code is trying to implement.

Are you sure? This is change in file ngx_resolver.c and it fixes compression pointer processing. This either gives you enough background or sends you to google which gives you enough context...

Second one classic programming sin is to use uncommented, not symbolic magic numbers. Instead the reader should be supported with some kind of hint about what those "0xc0" and "0x3f" constants mean.

Did you meant to say "reader should be confused with some kind of define" ? Take a look on original code: it compares n to "0xc0" and of n is equal to "0xc0" then it takes low 6 bits out of it and... does something. These bits are quite obviously zeros so the whole thing makes no sense. Replacement code looks quote obviously correct: you check top two bits with "n & 0xc0" and is they are set it checks the low six bits. Well, this feels kind of suspicious: there are four possibilities and we lump three of them together. This was the only strange and suspicious part of code but then if you've read the RFC and know that these three possibilities usually considered equal (which you should probably do before you'll try to understand what goes on here), then it's obvious, too.

Be that a previous commented #define or a comment/explanation on the right of the code or something other.

Well, I sure will appreciate comment "01 and 10 are reserved and should not be used but we'll treat them as 11 for now" near if, but I'm not sure it's all that big of a deal.

I think it's the same thing as what was already discussed: if writer assumes fluency in C (and C is better IMO when you describe bit manipulations) and reader is not fluent in C (so it looks like hexdump to him) then it's really hard to reconcile these POVs.

I think we just live in different worlds. When I want to understand what goes on in code like this one but peppered with tons of defines the first thing I do is "gcc -E" to strip out all that scrap away: this gives me easy-to-understand (if sometimes overly verbose) code to read (does not work with C++ and it's inline functions, grrr). I doubt you ever do that...

Can you see the problem I'm trying to point to?

Well, sure: you are no fluent enough in C. Got that. But try to understand the other side, too! This code is obviously written by person who's VERY fluent in C and not all that fluent in English (this does not follow, but I knew the author personally so know it's true). Of course he'll try to write clean C code which requires as few English comments as possible. IMNSHO he more-or-less succeeded here so I don't see what's your problem.

trying to work out some problem points

Posted Dec 7, 2011 11:46 UTC (Wed) by johill (subscriber, #25196) [Link]

I'm quite fluent in C, but still the code is puzzling. You're confusing "fluent in C" and "fluent in DNS" and assuming that everybody who reads that code must understand the DNS protocol with all its bits in its entirety, and completely know what each bit means.

Take an example from a different domain, and consider this code I'm writing just now:

       switch (mgmt->u.action.category) {
       case WLAN_CATEGORY_HT:
...
               /* verify action & smps_control are present */
               if (len < IEEE80211_MIN_ACTION_SIZE + 2)
                       goto invalid;

               switch (mgmt->u.action.u.ht_smps.action) {
               case WLAN_HT_ACTION_SMPS: {
It seems like you're suggesting it is equivalent to:
       switch (*(frame + 24)) {
       case 7:
...
               if (len < 27)
                       goto invalid;

               switch (*(frame + 25)) {
               case 1: {
which is true on a binary/compiler level (there's a good chance it would compile to the same binary unless I made some small errors in my manual replacement), but not for people.

When I said "fluent in C", I meant "fluent in C", sorry...

Posted Dec 7, 2011 13:20 UTC (Wed) by khim (subscriber, #9252) [Link]

You're confusing "fluent in C" and "fluent in DNS" and assuming that everybody who reads that code must understand the DNS protocol with all its bits in its entirety, and completely know what each bit means.

I assume nothing of the sort. I knew very little about DNS protocol. In fact I've read C code first (it was quite simple, really), got confused by one small thing (why do we ignore top two bits totally?), and only then opened the RFC to see what exact structure we are parsing here.

It seems like you're suggesting it is equivalent to:

       switch (*(frame + 24)) {
       case 7:
...
               if (len < 27)
                       goto invalid;

               switch (*(frame + 25)) {
               case 1: {

The code you've showed includes "random numbers" which have no internal logic. It's easy to imagine different RFC where IEEE80211_MIN_ACTION_SIZE will be 15 or 30 and where WLAN_CATEGORY_HT will be equal to 10. Worse: it's hard to imagine that you implement your logic using the textual description which refer to "24th byte" or "7th category"!

But the code which started the discussion is radially different. It implements process which description uses terms like the first two bits of the octet (which is quite literally "c & 0xc0" in C) and the remaining six bits (which, again, is "c & 0x3f" in C).

If, indeed, the canonical description in your case includes 7th category and not the "WLAN HT category" (or something like that) then yes, I think it's better to keep "magical numbers" in your code: it'll be easier to understand then try to constantly convert named constants to "magic numbers" and back ("gcc -E" can be a big help here as I've noted). Except 27. I think "25 + 2" will be more readable even in this case...

Copyright © 2013, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds