Quotes of the week

I've seen programs that end up swapping bytes two, three, even four times as layers of software grapple over byte order. In fact, byte-swapping is the surest indicator the programmer doesn't understand how byte order works.
-- Rob Pike
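Here is a minimal C sketch of Pike's point. It is not code from the quote, and `read32_le` and `read32_be` are hypothetical names. To decode a value from external data you only need to know the byte order of the data, never that of the host, so nothing ever has to be "swapped":

```c
#include <stdint.h>

/* Decode a 32-bit value stored little-endian in data[0..3].
 * Only the order of the bytes in the external data matters;
 * the host's own byte order never enters into it. */
uint32_t read32_le(const unsigned char *data)
{
    return (uint32_t)data[0]
         | ((uint32_t)data[1] << 8)
         | ((uint32_t)data[2] << 16)
         | ((uint32_t)data[3] << 24);
}

/* Same, for data stored big-endian (network order). */
uint32_t read32_be(const unsigned char *data)
{
    return ((uint32_t)data[0] << 24)
         | ((uint32_t)data[1] << 16)
         | ((uint32_t)data[2] << 8)
         | (uint32_t)data[3];
}
```

Both functions return the same result on little- and big-endian hosts, because they only look at byte positions in the input.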

Try to imagine yourself in the IPMC, being asked to vote for the release of [Apache OpenOffice] 3.4. You want to make sure the release follows Apache policies and guidelines. You want to protect the ASF. You want to ensure that users, including developers using our source code packages, get the greatest benefit from the release. But you are faced with a 10 million line code project, larger and more complex than anything you've faced before at Apache.

What do you do? Where do you start?

Honestly, I have absolutely no idea.

-- Rob Weir

Regular ls output, tuned as it was for 9600 baud terminals or so, is really too verbose for modern media such as twitter and cell phones. This new output format, enabled by the -j switch (or --format=jam, but you don't want to type all that on a cell phone!), brings ls into the 21st century with an appropriate level of conciseness.
-- Joey Hess

Posted Apr 5, 2012 11:31 UTC (Thu) by juliank (subscriber, #45896)

Rob Pike is absolutely right. Swapping bytes in C code is just wrong in almost all cases (probably in all cases). If you use Clang and still want the slightly faster byte-swapping code to be generated, you can wrap an n-byte char array in a union with an n-byte integer and then extract the characters as Rob described. Clang will automatically turn this into a byte-swap instruction or a plain move instruction. This even works if you use a loop (to make the code more generic). Here is some example code showing this:
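#include <stdint.h>

/* A 32-bit little-endian value as it appears in external data. */
typedef union {
    uint32_t value;
    unsigned char buf[4];
} uint32_le_t;

/* Assemble the bytes least-significant first; no test of the
 * host's byte order is needed. */
uint32_t read32le(uint32_le_t in)
{
    uint32_t out = 0;
    for (unsigned int i = 0; i < sizeof(in.buf); i++)
        out |= (uint32_t) in.buf[i] << (8 * i);
    return out;
}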
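The loop form mentioned above generalizes to any width. The following sketch shows a generic reader together with the matching writer. The names `read_le`/`write_le` and the 8-byte limit are assumptions for illustration, and whether a particular Clang version turns the loop into a single load or `bswap` has not been checked here.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decode an n-byte little-endian unsigned
 * integer (n <= 8) from buf, least-significant byte first. */
uint64_t read_le(const unsigned char *buf, size_t n)
{
    uint64_t out = 0;
    for (size_t i = 0; i < n; i++)
        out |= (uint64_t)buf[i] << (8 * i);
    return out;
}

/* Hypothetical helper: encode the low n bytes of value (n <= 8)
 * into buf in little-endian order. */
void write_le(unsigned char *buf, size_t n, uint64_t value)
{
    for (size_t i = 0; i < n; i++)
        buf[i] = (unsigned char)(value >> (8 * i));
}
```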
For those who like C++, here is a template for reading integers in a specific byte order (chosen by the template parameter 'little'):
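#include <cstddef>

// A T stored in a fixed byte order: little-endian if 'little' is
// true, big-endian otherwise.
template<typename T, bool little> struct jak__byte_swapping_ {

    // Takes a native integer. On little- and big-endian hosts read()
    // either leaves the value alone or reverses its bytes, so it is
    // its own inverse; storing read(u) back leaves u's bytes in the
    // specified order.
    inline jak__byte_swapping_(T in) {
        u.value = in;
        u.value = read(u);
    }
    // Converts back to a native integer.
    inline operator T () const {
        return read(u);
    }

private:
    union u_type {
        T value;
        unsigned char buf[sizeof(T)];
    } u;

    // Interprets the bytes of 'in' in the specified byte order.
    static inline T read(const u_type in) {
        T out = 0;
        if (little) {
            for (std::size_t i = 0; i < sizeof(T); i++)
                out |= (T) in.buf[i] << (8 * i);
        } else {
            for (std::size_t i = 0; i < sizeof(T); i++)
                out |= (T) in.buf[sizeof(T) - 1 - i] << (8 * i);
        }
        return out;
    }
};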
It obviously lacks a constructor that accepts data already in the specified byte order (currently it takes native integers), but one could be added. In its current form, it is mostly useful as a struct member describing a binary format: you read the file into the struct and can then treat the fixed-endianness integer like a native one.
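The same idea (fixed byte order inside the struct, native integer on access) can be sketched in plain C. The header layout, field names and accessor functions below are hypothetical, chosen only to show one big-endian and one little-endian field side by side.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical on-disk header. Every field is a byte array, so the
 * struct has no alignment padding and can be filled straight from
 * the file contents, whatever the host's byte order. */
struct file_header {
    unsigned char magic[4];
    unsigned char version_be[2];   /* big-endian */
    unsigned char length_le[4];    /* little-endian */
};

_Static_assert(sizeof(struct file_header) == 10, "unexpected padding");

/* Accessors convert to native integers only when a value is used. */
uint16_t header_version(const struct file_header *h)
{
    return (uint16_t)((h->version_be[0] << 8) | h->version_be[1]);
}

uint32_t header_length(const struct file_header *h)
{
    return (uint32_t)h->length_le[0]
         | ((uint32_t)h->length_le[1] << 8)
         | ((uint32_t)h->length_le[2] << 16)
         | ((uint32_t)h->length_le[3] << 24);
}
```

A caller would `memcpy()` (or `fread()`) the raw bytes into a `struct file_header` and use the accessors; no field ever holds a value in host order.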

Posted Apr 5, 2012 17:16 UTC (Thu) by khim (subscriber, #9252)

Sadly, GCC is more common than Clang, and while it knows how to emit a plain mov when the native order and the requested order are the same, it generates a lot of code when they differ.

And there are other popular systems and other popular compilers, so I doubt the ntohX functions (which produce fast code even with ages-old GCC) will be retired any time soon.

The right order would have been to first make sure the generated code is fast, and only then start advocating the approach. After all, if someone does not care about speed, there are plenty of other languages besides C/C++.

Posted Apr 5, 2012 19:20 UTC (Thu) by RobSeace (subscriber, #4435)

> Swapping bytes in C code is just wrong in almost all cases (probably in all cases).

Well, those of us writing lots of socket-related code are pretty well stuck with byte-swapping (or doing nothing on big-endian systems) via {n,h}to{h,n}{s,l}() for many uses... If you want to pull a port# from a sockaddr_{in,in6}, you pretty much have to ntohs() it to work with it in any sane manner... If you want to set one, you have to htons() it... If you want to set the raw sockaddr_in.sin_addr.s_addr to one of the INADDR_* values (like INADDR_LOOPBACK), you need to htonl() them... Etc...
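For reference, the pattern described above looks roughly like this in C. This is a sketch assuming a POSIX system; `make_loopback_addr` and `addr_port` are hypothetical helper names, while `htons()`, `htonl()`, `ntohs()` and `INADDR_LOOPBACK` are the standard ones.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>

/* Fill a sockaddr_in for the loopback address. sin_port and
 * sin_addr.s_addr must hold network (big-endian) byte order. */
void make_loopback_addr(struct sockaddr_in *sa, uint16_t port)
{
    memset(sa, 0, sizeof *sa);
    sa->sin_family = AF_INET;
    sa->sin_port = htons(port);
    sa->sin_addr.s_addr = htonl(INADDR_LOOPBACK);
}

/* Read the port back as a native integer. */
uint16_t addr_port(const struct sockaddr_in *sa)
{
    return ntohs(sa->sin_port);
}
```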

Now, you could argue that having those sockaddr_* fields be in network byte order was a mistake, and I'd probably agree... But, there's nothing much that can be done about it but live with it at this point... (And, thankfully getaddrinfo() makes it less necessary to poke with the sockaddr_* internals these days, anyway...)
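A sketch of the getaddrinfo() route, assuming a POSIX system: the library fills in the `sockaddr` with the port and address already in network byte order, so no htons()/htonl() calls appear, although reading the port back as a number still takes ntohs(). The helper name `resolved_port` is hypothetical; the AI_NUMERICHOST and AI_NUMERICSERV flags keep the example from doing any DNS or service lookup.

```c
#define _POSIX_C_SOURCE 200112L
#include <arpa/inet.h>
#include <netdb.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Let getaddrinfo() build the sockaddr for a numeric IPv4 host and
 * numeric port. Returns the port in host order, or -1 on error. */
int resolved_port(const char *host, const char *service)
{
    struct addrinfo hints, *res;
    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_INET;
    hints.ai_socktype = SOCK_STREAM;
    hints.ai_flags = AI_NUMERICHOST | AI_NUMERICSERV;

    if (getaddrinfo(host, service, &hints, &res) != 0)
        return -1;

    /* The port arrived in network order; ntohs() converts it. */
    const struct sockaddr_in *sin = (const struct sockaddr_in *)res->ai_addr;
    int port = ntohs(sin->sin_port);
    freeaddrinfo(res);
    return port;
}
```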

Posted Apr 5, 2012 19:47 UTC (Thu) by samroberts (guest, #46749)

I think you missed Rob's point: the ntoh*() functions are the equivalent of the code he posted. You never need to know or check your host's endianness when calling them. If you are doing:

#ifdef host_is_little_endian
i = ntohl(i);
#endif

you are doing it wrong. The libc implementation of them might do some clever thing depending on the host byte order, but that is its business.
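By contrast, a minimal sketch of the intended usage calls ntohl() unconditionally, with no #ifdef on the host's byte order. The packet layout (a 32-bit network-order length at byte offset 4) and the name `packet_length` are hypothetical.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout: 32-bit length, network byte order, offset 4. */
uint32_t packet_length(const unsigned char *pkt)
{
    uint32_t len_net;                           /* network order */
    memcpy(&len_net, pkt + 4, sizeof len_net);  /* no unaligned access */
    return ntohl(len_net);                      /* native order */
}
```

Whether ntohl() swaps anything on a given host is never visible here, which is the point.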

Posted Apr 5, 2012 19:53 UTC (Thu) by RobSeace (subscriber, #4435)

> I think you missed Rob's point, the ntoh*() functions are the equivalent of the code he posted.

Well, not really, since internally they do byte-swap (if you're on a little-endian host)... They are just abstracting away the ugly #ifdef behavior out of your code directly and hiding it in libc... It doesn't really change its wrongness...

> #ifdef host_is_little_endian
> i = ntohl(i);
> #endif

I've never seen anyone do anything that nutty... It's basically just duplicating exactly what ntohl() already does for you!

Posted Apr 5, 2012 20:30 UTC (Thu) by samroberts (guest, #46749)

> They are just abstracting away the ugly #ifdef

Again, Rob's point was that writing code conditional on the *host's* byte-order is unnecessary for dealing with external data.

libc doesn't necessarily have an ifdef, ntoh* can be written in portable code with no ifdef. Plan9's libc, according to Rob, had no such ifdef in it.
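As an illustration of the portable approach (a sketch, not Plan 9's actual code; `portable_ntohl` is a hypothetical name), ntohl() can be written by looking at the bytes of the network-order value as they sit in memory and assembling them most-significant first. Nothing in it tests the host's byte order.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical portable ntohl(): the bytes of 'net' in memory are in
 * network order, so byte 0 is the most significant. No #ifdef. */
uint32_t portable_ntohl(uint32_t net)
{
    unsigned char b[4];
    memcpy(b, &net, sizeof b);
    return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16)
         | ((uint32_t)b[2] << 8) | (uint32_t)b[3];
}
```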

> I've never seen anyone do anything that nutty...

https://github.com/sam-github/libnet/blob/master/libnet/s...

Now you have. And no, I didn't write that.

Posted Apr 5, 2012 21:00 UTC (Thu) by RobSeace (subscriber, #4435)

> libc doesn't necessarily have an ifdef, ntoh* can be written in portable
> code with no ifdef. Plan9's libc, according to Rob, had no such ifdef in it.

I suppose so... But, glibc at least does this:

uint16_t
htons (x)
     uint16_t x;
{
#if BYTE_ORDER == BIG_ENDIAN
  return x;
#elif BYTE_ORDER == LITTLE_ENDIAN
  return __bswap_16 (x);
#else
# error "What kind of system is this?"
#endif
}

And then it defines ntohs() as the same thing... And htonl()/ntohl() are the same, but with __bswap_32() of course...

Which I believe is exactly what he was talking about... So, like I said, you're merely abstracting away the ugliness... (Which isn't a bad thing in any way! If you need to have ugliness, better to hide it away...)

> Now you have.

Ok, that's just stupid... And, unless I'm missing something, it does absolutely nothing... Because on big-endian, htons() is effectively a no-op function... So, it seems to me it's doing the same direct assignment on both big and little-endian systems... Perhaps they wanted the reverse test? Which would then be merely redundant (since htons() effectively does it for you already) rather than flat out insane...

Anyway, that seems less like someone doing byte-swapping when not needed, but more just someone fundamentally misunderstanding how htons() works... Maybe they thought it was always a byte-swap routine, and didn't realize that on big-endian systems it's going to be a no-op? That would make their code make some sense (they're trying to force it to little-endian always)...

Posted Apr 5, 2012 23:30 UTC (Thu) by viro (subscriber, #7872)

Egads... OK, here's the deal: anybody thinking in terms of "swapped" and "native" is bloody wrong. Doing it that way is *the* recipe for hard to find bugs. There are 3 things: native, little-endian and big-endian. That's what one ought to keep track of, not "have I done the byteswap an odd or even number of times". Forget about the fact that conversions are involutions. Sure, on any realistic platform it will be true that htonl(x) is always equal to ntohl(x). Which is not interesting for anything other than low-level implementation of those primitives. bswap32 is the wrong thing to do outside of the internals of their implementation.

BTW, code doing n = ntohl(n) is asking for trouble, period. Don't do that - if I ever see that kind of crap in patches touching fs/*, you can be sure that you'll get it back with unkind comments. Don't reuse variables between e.g. native and big-endian; sooner or later you'll get confused and do something really dumb like mismatched number of flips on different paths. And that's not just a theory - I've seen quite a few bugs of that kind.

uint32_t is _not_ "one of {__le32,__be32}, depending on host endianness". It's a separate type and should be treated as such. If the variable can be reused, leave doing that to compiler.
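One way to make native, little-endian and big-endian three genuinely separate types in plain C is to wrap the bytes in distinct structs, as in this sketch. The names echo the kernel's (`__be32`, `be32_to_cpu()`, etc.), but these are hypothetical user-space definitions: the kernel itself uses integer typedefs annotated with `__bitwise` and relies on sparse to flag mixing, whereas here an ordinary C compiler rejects it.

```c
#include <stdint.h>

/* Hypothetical wrapper types: distinct struct types, so they cannot
 * be mixed with each other or with uint32_t without an error. */
typedef struct { unsigned char b[4]; } be32;
typedef struct { unsigned char b[4]; } le32;

uint32_t be32_to_cpu(be32 v)
{
    return ((uint32_t)v.b[0] << 24) | ((uint32_t)v.b[1] << 16)
         | ((uint32_t)v.b[2] << 8) | (uint32_t)v.b[3];
}

be32 cpu_to_be32(uint32_t x)
{
    be32 v = { { (unsigned char)(x >> 24), (unsigned char)(x >> 16),
                 (unsigned char)(x >> 8), (unsigned char)x } };
    return v;
}

uint32_t le32_to_cpu(le32 v)
{
    return (uint32_t)v.b[0] | ((uint32_t)v.b[1] << 8)
         | ((uint32_t)v.b[2] << 16) | ((uint32_t)v.b[3] << 24);
}

le32 cpu_to_le32(uint32_t x)
{
    le32 v = { { (unsigned char)x, (unsigned char)(x >> 8),
                 (unsigned char)(x >> 16), (unsigned char)(x >> 24) } };
    return v;
}
```

Passing a `be32` to `le32_to_cpu()`, or assigning one to a `uint32_t`, fails to compile, so a mismatched conversion is caught at build time.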

Having the same struct used both with host- and fixed-endian contents is a really bad practice; better split it in two types and take care about which one is getting used in any particular place. And no, "I'll just convert these fields in place here" is *not* a good idea. Been there, fixed quite a few of the resulting bugs. Hadn't been fun at all...
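A sketch of that split, with hypothetical names: the on-disk layout and the in-memory representation are separate types, and a single function converts from one to the other instead of fixing up fields in place.

```c
#include <stdint.h>

/* Hypothetical on-disk record: fixed big-endian layout, kept as bytes. */
struct disk_record {
    unsigned char inode_be[4];
    unsigned char size_be[4];
};

/* Separate in-memory type: native integers only. */
struct record {
    uint32_t inode;
    uint32_t size;
};

static uint32_t get_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16)
         | ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* The only place the two representations meet. The on-disk struct
 * is only read, never converted in place. */
struct record record_from_disk(const struct disk_record *d)
{
    struct record r;
    r.inode = get_be32(d->inode_be);
    r.size = get_be32(d->size_be);
    return r;
}
```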

Posted Apr 6, 2012 0:51 UTC (Fri) by RobSeace (subscriber, #4435)

Did you mean to reply to my comment? Because I agree with everything you said... If I gave any impression otherwise, I apologize... But, I'm confused as to where I gave that impression...

Posted Apr 5, 2012 14:15 UTC (Thu) by patrick_g (subscriber, #44470)

>You want to ensure that users (..) get the greatest benefit from the release (..) What do you do? Where do you start?

You shut down Apache OpenOffice and you move on to LibreOffice.

Posted Apr 7, 2012 22:54 UTC (Sat) by man_ls (guest, #15091)

Nice way to cut that Gordian knot. Not sure if that option is available to the original poster -- he had in mind a slightly more bureaucratic route, as evidenced in this further quote:
But I think it is in our best interest as a PPMC to make our AOO 3.4 Release Candidate easy to review for the IPMC.
So instead of dumping AOO he decided to create a document. Yay!

Posted Apr 5, 2012 20:13 UTC (Thu) by smurf (subscriber, #17840)

> the -j switch

Nice. Now only -z is missing.
I propose using that for NUL-terminating output lines (instead of \n).

This would actually be a rather useful option; "ls -1z | xargs -0 ..." is a whole lot more understandable than "find . -mindepth 1 -maxdepth 1 -name .\* -o -print0 | ..."

Adding -0 as an alias for -z is left as an exercise. :-/

Posted Apr 6, 2012 8:34 UTC (Fri) by rvfh (subscriber, #31018)

> Date: Sat, 31 Mar 2012 18:13:32 -0400

Shame he didn't wait a few hours to post his message, it would have made a nice April fool :)

Posted Apr 10, 2012 15:01 UTC (Tue) by jwakely (guest, #60262)

The bug report was one of three filed to set up the April Fool post: http://kitenet.net/~joey/blog/entry/ls:_the_missing_options/

Posted Apr 13, 2012 21:51 UTC (Fri) by njs (guest, #40338)

I was happier before I clicked around there and learned that Rob Pike is pro-software-patents :-(


Copyright © 2012, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds