Fixing the unfixable autofs ABI

By Jonathan Corbet
April 30, 2012

One of the few hard rules of kernel development is that breaking the user-space binary interface is not acceptable. If there is user-space code that depends on specific behavior, that behavior must be maintained regardless of how inconvenient that may be. But what is to be done if two different programs depend on mutually-incompatible behaviors, so that it is seemingly impossible to keep them both working? The answer may be to violate another rule by putting an ugly hack into the kernel—or to do something rather more tricky.

The "autofs" protocol is used to communicate between the kernel and an automounter daemon. It allows the automounter to set up special virtual filesystems that, when referenced by user space, can be replaced by a remote-mounted real filesystem. Much of this protocol is implemented with ioctl() calls on a special autofs device, but it also makes use of pipes between the kernel and user space when specific filesystems are mounted.

This protocol is certainly part of the kernel ABI, so its components have been defined with some care. One of the key elements of the autofs protocol is the autofs_v5_packet structure, which is sent from the kernel to user space via a pipe; it is used, among other things, to report that a filesystem has been idle for some time and should be unmounted. This structure looks like:

    struct autofs_v5_packet {
	struct autofs_packet_hdr hdr;
	autofs_wqt_t wait_queue_token;
	__u32 dev;
	__u64 ino;
	__u32 uid;
	__u32 gid;
	__u32 pid;
	__u32 tgid;
	__u32 len;
	char name[NAME_MAX+1];
    };

The size of every field is precisely defined, so this structure should look the same on both 32- and 64-bit systems. And it does, except for one tiny little problem. The size of the structure as defined is 300 bytes, which is not divisible by eight. So if two of these structures were to be placed contiguously in memory, the 64-bit ino field would have to be misaligned in one of them. To avoid this problem, the compiler will, on 64-bit systems, round the size of the structure up to a multiple of eight, adding four bytes of padding at the end. So sizeof() on struct autofs_v5_packet will return 300 on a 32-bit system, and 304 on a 64-bit system.

That disparity is not a problem most of the time, but there is an exception. Automounting is one of the many tasks being assimilated by the systemd daemon. When systemd reads one of the above structures from the kernel, it checks the size of what it read against its idea of the size of the structure to ensure that everything is operating as it should be. That check works just fine, as long as systemd and the kernel agree on that size. And normally they do, but there is an exception: if systemd is running as a 32-bit process on a 64-bit kernel, it will get a 304-byte structure when it is expecting 300 bytes. At that point, systemd concludes that something has gone wrong and gives up.

In February, Ian Kent merged a patch to deal with this problem. One could be forgiven for calling the solution hacky: on 64-bit systems, the kernel's automount code will subtract four from the size of that structure if (and only if) it is talking with a user-space client running in 32-bit mode. This patch makes systemd work in this situation; it was merged for 3.3-rc5 and fast-tracked into the various stable kernel releases. Everybody then lived happily ever after.

...except they didn't. It seems that the automount program from the autofs-tools package, which is still in use on a great many systems, had run into this problem a number of years ago. At that time, the autofs-tools developers decided to work around the problem in user space. So, if automount determines that it is running in 32-bit mode on a 64-bit kernel (Linus has little respect for how that determination is done, incidentally), it will correct its idea of what the structure size should be. If the kernel messes with that size, the automount "fix" no longer works, so Ian's patch fixes systemd at the cost of breaking automount.

So we are now in a situation where two deployed programs have different ideas of how the autofs protocol should work. On pure 32- or 64-bit systems, both programs work just fine, but, depending on which kernel is being run, one or the other of the two will break in the 32-on-64 configuration. If Ian's patch remains, some users will be most unhappy, but reverting it will upset other users. It is, in other words, a somewhat unfortunate situation.

Unfortunate, but not necessarily unrecoverable. One possible way to fix things can be seen in this patch from Michael Tokarev. In short, this patch looks at the name of the current command (current->comm) and compares it against "automount". If the currently-running program is called "automount," the structure-size tweak is not applied and things work again. For any other program (including systemd), the previous fix remains. So things are fixed at the expense of having the kernel ABI depend on the name of the running program. At best, this solution can be described as "inelegant." At worst, there may be some other, unknown program with a different name that breaks in the same way automount does; any such program will remain broken with this fix in place.

Still, Linus has conceded that "it's probably what we have to go with". But he preferred to look for a less kludgy and more robust solution. One possibility was for the kernel to look at the size of the read() operation that would obtain the autofs_v5_packet structure prior to writing that structure; if that size is either 300 or 304, the kernel could give the calling program the size it is expecting. The problem here is that the read() operation is hidden behind the pipe, so the autofs code does not actually have access to the size of the buffer provided by user space.

So Linus came up with a different solution, the concept of "packetized pipes". A packetized pipe resembles the normal kind with a couple of exceptions: each write() is kept in a separate buffer, and a read() consumes an entire buffer, even if the size of the read is smaller than the amount of data in the buffer. With a packetized pipe, the kernel can always just write the larger (304-byte) structure size; if user space is only trying to read 300 bytes, then it will get what it expects and be happy. So there is no need for special hacks in the kernel, just a slightly different type of pipe dynamics. Following a suggestion from Alan Cox, Linus made an open with O_DIRECT turn on the packetized behavior, so user space can create such pipes if need be.

After a couple of false starts, Linus got this patch working and merged it just prior to the 3.4-rc5 release. So the 3.4 kernel should work fine for either automount or systemd.

The kernel community got a bit lucky here; it was possible for a suitably clever and motivated developer to figure out a way to give both programs what they expect and make the system work for everybody. The next time this kind of problem arises, the solution may not be so simple. Maintaining ABI stability is not always easy or fun, but it is necessary to keep the system viable in the long term.

Index entries for this article
Kernel	Development model/User-space ABI
Kernel	Filesystems/autofs

Fixing the unfixable autofs ABI

Posted Apr 30, 2012 17:39 UTC (Mon) by kilpatds (subscriber, #29339) [Link] (9 responses)

I have to enquire: why not use the "packed" attribute? The pipe-hack would still have to be used to keep backwards compatibility, but declaring the structure to be packed would then be the (IMHO: obviously correct) way of keeping future users from hitting the issue.

Packed

Posted Apr 30, 2012 17:44 UTC (Mon) by corbet (editor, #1) [Link]

Because it's way too late to add "packed," that would have done nothing to fix the current users.

There has been some talk in the discussion of a version 6 of the autofs protocol that would address a number of shortcomings, but there's no particular hurry to get that into place.

Fixing the unfixable autofs ABI

Posted May 3, 2012 14:18 UTC (Thu) by stevem (subscriber, #1512) [Link] (7 responses)

Or even better: don't define structs in this way in the first place - keep them automatically naturally aligned... :-(

Fixing the unfixable autofs ABI

Posted May 3, 2012 15:12 UTC (Thu) by jzbiciak (guest, #5246) [Link] (6 responses)

The twist here is that the i386 ABI only requires 4-byte alignment for 64-bit types, which is why the structure size differs. Since i386 only supported accesses up to 32 bits, this seemed "natural" at the time.

So, what do you define as natural alignment?

Fixing the unfixable autofs ABI

Posted May 3, 2012 15:28 UTC (Thu) by stevem (subscriber, #1512) [Link] (4 responses)

I've always tried to arrange structs in decreasing order of size to do that:

64-bit types
32-bit types
16
8

and it's worked for me. It *might* not be the best layout for caching performance, but for API structures like ioctls I think it's a clean way to go.

Alternatively, add explicit padding in the structures to make them alignment-neutral. Much less clean, but better than having differently-sized structs depending on architecture.

Fixing the unfixable autofs ABI

Posted May 3, 2012 15:41 UTC (Thu) by jzbiciak (guest, #5246) [Link] (3 responses)

That doesn't fix this structure, as I understand it. The total size of the elements is 300 bytes, but the whole struct must be aligned to 8 bytes on 64 byte architectures, but only needs alignment 4 on the i386 ABI.

Consider this much simpler structure:

struct broken
{
    __u64 f1;
    __u32 f2;
};

On the i386 ABI, sizeof(struct broken) is 12. On the x86-64 ABI, it's 16.

Fixing the unfixable autofs ABI

Posted May 3, 2012 15:50 UTC (Thu) by stevem (subscriber, #1512) [Link]

Ah, yes. :-(

Fixing the unfixable autofs ABI

Posted May 3, 2012 18:46 UTC (Thu) by nybble41 (subscriber, #55106) [Link] (1 responses)

Yes, but if you add padding fields you can ensure the same alignment on all reasonable architectures:

struct fixed
{
    __u64 f1;
    __u32 f2;
    __u32 pad1;
};

This way sizeof(struct fixed) is 16 bytes on i386 and x86-64. Of course, that doesn't help with the structure from this article, because the smaller i386 alignment was written into the ABI. IMHO, any structures exposed through the ABI should be naturally aligned and marked as "packed", with explicit padding fields where necessary; alignment should not be left to the compiler where ABI compatibility is a concern.

Fixing the unfixable autofs ABI

Posted May 3, 2012 19:10 UTC (Thu) by jzbiciak (guest, #5246) [Link]

That would generally be the best idea, but as you stated, it doesn't fix this structure, because the i386 ABI doesn't require natural alignment for 64-bit types, as per the definition you gave in your other comment. The reason I asked is that i386's ABI is perverse in requiring natural alignment for some, but not all data types. So, I wasn't sure if the concept of "natural" had different meaning in the context of i386's ABI.

IMHO, any structures exposed through the ABI should be naturally aligned and marked as "packed", with explicit padding fields where necessary; alignment should not be left to the compiler where ABI compatibility is a concern.

I tend to agree, and do this myself when I care about the actual layout of a struct. Ok, I usually forego 'packed' since it isn't portable. I instead make the struct a multiple of the largest data type, and force fields onto natural boundaries for their types.

This practice is especially necessary when you have any "wire formats" or "on disk format." You can almost think of packets crossing the kernel/user-space boundary as having "wire format" semantics in this case, with the minor difference that pointers are more meaningful in user-space exchanges than they usually are over, say, a TCP link.

Fixing the unfixable autofs ABI

Posted May 3, 2012 18:30 UTC (Thu) by nybble41 (subscriber, #55106) [Link]

> So, what do you define as natural alignment?

"Natural alignment" is a term of art meaning that the address (or in this case, offset) of the data is a multiple of its size. It can be used independently of the target, and for any size, though it's usually applied to powers of two. For example, on most platforms with multiple page sizes, pages must be naturally aligned; you can't construct a 1MB hugepage from an arbitrary span of 256 4kB pages, but instead have to use a region aligned to a multiple of 1MB.

In this case it would mean aligning 64-bit fields to an offset which is a multiple of eight bytes, even on i386 where only 32-bit alignment is required.

Fixing broken userspace is NOT the kernel's job

Posted Apr 30, 2012 17:46 UTC (Mon) by pr1268 (guest, #24648) [Link] (10 responses)

I think autofs and systemd should get their s**t together. While I think it's proper and appropriate for the kernel to avoid breaking userspace, I firmly believe it's also not the kernel's job to fix broken userspace!

</rant>

Fixing broken userspace is NOT the kernel's job

Posted Apr 30, 2012 18:19 UTC (Mon) by drag (guest, #31333) [Link] (5 responses)

That's like telling somebody that just fell down the stairs and broke their arm that they were stupid for slipping and they just need to walk back to the top the of the stairs and do it right this time. It's all fine and dandy, but it doesn't do anything to fix the current problem.

Plus I don't understand why it's so terrible that systemd checks to make sure that the data it's getting from the kernel is the correct size.

Magic numbers

Posted Apr 30, 2012 21:16 UTC (Mon) by man_ls (guest, #15091) [Link] (3 responses)

But why check only for exactly 300 bytes? It would be understandable if systemd would check that the size of the data is >=300. That way there would be room to add padding or even new fields at the end, if needed.

Magic numbers

Posted Apr 30, 2012 21:19 UTC (Mon) by felixfix (subscriber, #242) [Link]

Might be just general paranoia, to catch changes it doesn't know about. It won't catch all since there's no check for the layout of the data, but if someone adds or removes a member, or changes the size of some, it has a chance of noticing.

Magic numbers

Posted May 1, 2012 8:03 UTC (Tue) by mjt (guest, #1515) [Link] (1 responses)

It is not a magic number, it is sizeof(kernel packet), which turned out to be different on 32bit and 64bit. It is the unit of data kernel exchanges data with usespace, it is the protocol definition. The protocol itself wasn't designed properly, that's the whole issue.

Now, reading just 300 bytes (size of that packet on 32bit userspace) instead of 304 bytes which a 64bit kernel sent to userspace does the trick, but the pipe buffer now holds 4 bytes, which will be read by userspace as a NEW packet. Even if new packet actually arrives, for the userspace the result will be complete garbage, since the packet boundary got lost.

At any rate, no userspace trick will fix already released versions of userspace applications, which is the whole point. Yes it could be made differently, but what's done is done and we don't want to break it.

Magic numbers

Posted May 1, 2012 18:34 UTC (Tue) by dlang (guest, #313) [Link]

with the new packetized interface if you read 300 bytes, the other 4 bytes are forever lost, so the next read will get the correct packet boundary.

Fixing broken userspace is NOT the kernel's job

Posted May 4, 2012 17:44 UTC (Fri) by giraffedata (guest, #1954) [Link]

That's the wrong analogy for this debate. I'm sure you can think of a 100 cases where users got burned by a design decision in a user space program 5 years ago, and a kernel change today could compensate for it, but the consensus is the users should suffer.

Here's a closer analogy: John has 12 steps in his house. He likes to walk down them with his eyes closed and count the steps. Mary invites John over to her house and he resists, saying "I know where everything is in my house and I'm afraid I would get lost in your house." Mary assures him that her house is just like his in every relevant detail, and on that basis John accepts. But Mary has 13 steps and John, walking down with his eyes closed and counting to 12, falls and breaks his arm. Should Mary pay for his medical bills?

Did Mary break her promise? Is having one more step in the staircase is a relevant detail? Is a person entitled to walk down stairs with his eyes closed?

We typically say you can move a program from a 32 bit computer to a 64 bit one without change because the environment is the same in every relevant detail. We also say you can move a program from kernel release N to N+1 for the same reason. Automount and Systemd users are relying on some interpretation of those promises.

Fixing broken userspace is NOT the kernel's job

Posted Apr 30, 2012 18:26 UTC (Mon) by vrfy (guest, #13362) [Link]

Systemd just did not like to get any "shit" together in userspace:
http://lists.freedesktop.org/archives/systemd-devel/2011-...

Adding runtime matches for architectures in userland tools to find out what to expect from a running kernel interface is just not acceptable these days, and people should stop doing that. We need to fix what's broken, and not paper over it.

It's surely not the kernel's job to fix userland tools, but it's even less the job of userspace to work around obviously non-working kernel interfaces, when they should be fixed instead.

Fixing broken userspace is NOT the kernel's job

Posted Apr 30, 2012 18:27 UTC (Mon) by arjan (subscriber, #36785) [Link]

but reality happens as well......

Fixing broken userspace is NOT the kernel's job

Posted May 1, 2012 23:59 UTC (Tue) by mezcalero (subscriber, #45103) [Link] (1 responses)

Sorry, but systemd is not the place to put workarounds for broken APIs. Fix the stuff where it is broken, don't litter systemd code with it, thank you very much.

Fixing broken userspace is NOT the kernel's job

Posted May 2, 2012 5:02 UTC (Wed) by dlang (guest, #313) [Link]

I have seen a very good companies product basically destroyed over many years of fixing things in the wrong place.

In opensource software there is almost never a reason to do this, and never without discussing the problem first.

That said, I have no problem with the sanity check that systemd is doing. They are not working around anything, they are just checking that what they are getting has a chance of being what they expect to get. They know the size of the object that they are getting, and so they only get that much data. fetching more would potentially cause other grief.

not all 32 bit compat, just x86

Posted Apr 30, 2012 18:48 UTC (Mon) by arnd (subscriber, #8866) [Link]

Note that part of the problem here is that 32 bit x86 handles the alignment of 64 bit integers different from everyone else. While there was originally no problem at all on the other architectures with compat support (powerpc, mips, s390, parisc and sparc), some user space tools added workarounds for x86 compat code that actually broke compat mode on all other architectures.

Nobody can know everything, but...

Posted Apr 30, 2012 20:26 UTC (Mon) by bdale (subscriber, #6829) [Link] (12 responses)

I'm a bit surprised that the apparent root cause of the conflict, that a patch needed for systemd was "fast-tracked" without a good understanding of the consequences for previously existing user space code, didn't even get a mention in the article.

Nobody can know everything, but...

Posted Apr 30, 2012 20:47 UTC (Mon) by ernstp (guest, #13694) [Link]

Yeah, that was what I was thinking also...

Nobody can know everything, but...

Posted May 1, 2012 9:14 UTC (Tue) by etrusco (guest, #4227) [Link] (10 responses)

After reading Linus' comments I can't avoid being biased against automount (well, this and maybe the automount's old-age fame of being hacky and buggy).
As I see it systemd simply "discovered" a bug in the kernel and the maintainers decided a general solution in the kernel was the best. And why do you say it was "fast-tracked"? 5 moths don't seem that rushed to me - but of course it's a bit unpleasant that they failed to notice that the other obvious user of the API had a different "fix" during all that time.

Nobody can know everything, but...

Posted May 1, 2012 10:05 UTC (Tue) by cortana (subscriber, #24596) [Link] (7 responses)

AIUI the problem only crops up when i386 userland is used on an amd64 kernel. Perhaps I'm being hopelessly naive, but I'd hope that such a setup is not particularly commonplace.

Nobody can know everything, but...

Posted May 1, 2012 10:29 UTC (Tue) by Fowl (subscriber, #65667) [Link]

Why? Seems like a perfectly reasonable situation. No bloated 64bit pointers but 4gb per process.

Nobody can know everything, but...

Posted May 1, 2012 15:09 UTC (Tue) by bdale (subscriber, #6829) [Link] (2 responses)

FWIW, 64 bit kernel and 32 bit userspace is what I run on my notebook.

Nobody can know everything, but...

Posted May 4, 2012 14:37 UTC (Fri) by bronson (subscriber, #4806) [Link] (1 responses)

How do you set that up?

Nobody can know everything, but...

Posted May 10, 2012 12:36 UTC (Thu) by nye (subscriber, #51576) [Link]

>How do you set that up?

aptitude install linux-image-amd64

Actually, it might even be the default these days if you install the i686 version of Debian on a 64-bit machine.

Nobody can know everything, but...

Posted May 1, 2012 16:45 UTC (Tue) by drag (guest, #31333) [Link]

I am running 64bit kernel with 32bit userland on a couple of my systems.

It's a very good option.

Nobody can know everything, but...

Posted May 11, 2012 13:21 UTC (Fri) by WolfWings (subscriber, #56790) [Link]

It's becoming the norm actually, lots of distro's do that by default or support it on their install/rescue media since it means they no longer need to build two separate userspaces, they just include two kernels to boot from on one disc, so for 99% of actual users there's only one specific disc image to download as a result.

Especially since a LOT of the time you _need_ a 32-bit userland to deal with a lot of binary packages out there or to get stuff running under WINE properly, it's easier for the "end user" images to stick to 32-bit userland even on 64-bit platforms. And even if you have a 64-bit userland you'll find it littered with various 32-bit-compat libraries.

Servers? Those are a different story, but many of those folks don't even need 32-bit compatibility so they can end up with a cruft-free pure-64-bit userland without all the 'compat' crap bolted in. So it reduces the size of the true-64-bit install media too, since there's no 32-bit cruft now.

Nobody can know everything, but...

Posted May 16, 2012 23:04 UTC (Wed) by steffen780 (guest, #68142) [Link]

IIRC that is what Linus recommends for many situations.

Nobody can know everything, but...

Posted May 1, 2012 17:14 UTC (Tue) by jzbiciak (guest, #5246) [Link] (1 responses)

I guess that's the real issue here: Linux accidentally broke the user-space ABI, and user-space hacked around it. (For me, that raises the question: Did they ever report this breakage to the kernel folks? If not, shame on them. If so, why didn't we fix this then?)

In any case, that effectively enshrined the bug as a new facet of the Linux kernel ABI.

Nobody can know everything, but...

Posted May 1, 2012 17:21 UTC (Tue) by raven667 (subscriber, #5198) [Link]

That communication and feedback loop from user space developers is really a critical factor. I would theorize that this explains a lot in the internals of older proprietary software such as Oracle or Solaris or Windows where the development team is kept away from customers. Raymond Chen has many curious examples of weird compatibility hacks in Windows for example which closely mirror this situation.

Fixing the unfixable autofs ABI

Posted Apr 30, 2012 21:09 UTC (Mon) by quotemstr (subscriber, #45331) [Link] (4 responses)

Packetized pipes seem like a bad idea.

1) O_DIRECT doesn't what this patch says it should mean. O_DIRECT means "no buffering", not "weird read(2) games". O_DIRECT on a pipe ought to mean something like "block write(2) until someone blocks in read(2), then memcpy directly to the reader's buffer", and with this patch, O_DIRECT can never mean that for pipes.

2) This "packetized pipe" system looks like an invitation for data loss. It's one thing to create a pipe mode that preserves message boundaries (like NT's PIPE_TYPE_MESSAGE named pipes), but the right behavior should be to cause read(2) to fail unless the reader gets the whole packet. (That's how NT message pipes work.) Because of the silent data loss issue, these packetized pipes aren't truly general-purpose.

In general, this approach seems like an overly general solution to a very particular problem.

Fixing the unfixable autofs ABI

Posted May 1, 2012 2:46 UTC (Tue) by neilbrown (subscriber, #359) [Link] (3 responses)

Applying an old flag to a new context is always a best-effort sort of thing. It is rarely perfect.

O_DIRECT typically has significant alignment restrictions, and this new semantic has some sort of alignment restrictions (if you squint a bit).

O_SYNC is also available for pipes and the man page say "Any writes on the resulting file descriptor will block the calling process until the data has been physically written to the underlying hardware", and if we understand "the memory in the reading process" to be "the underlying hardware", then your suggested feature (which I like) would fit to some degree with O_SYNC.

The data-loss issue sounds problematic, but we seem to have lived with it with SOCK_DGRAM sockets so I suspect we will survive with pipes. Hopefully the FIONREAD ioctl will provide the number of bytes that can be read, so a well written program can avoid losing data.

Fixing the unfixable autofs ABI

Posted May 1, 2012 7:26 UTC (Tue) by butlerm (subscriber, #13312) [Link] (2 responses)

Discarding data isn't so much of a problem as knowing when data has been discarded. recvmsg(2) is nice for that. You generally can't have message per read semantics for arbitrary size messages in any case because the kernel buffers must store an entire message to avoid unusual complexities. (Ask the SCTP people about the fun problems you get trying to implement a point-to-multipoint socket that supports arbitrary size messages.)

Considering that reality, all you really need is some reliable way to know whether your message has been truncated. Sockets have MSG_PEEK / MSG_TRUNC for that, so I guess the question is why wouldn't it be practical to make recvmsg(2) work on a packetized pipe so that you can do the same thing?

Fixing the unfixable autofs ABI

Posted May 3, 2012 19:30 UTC (Thu) by kleptog (subscriber, #1183) [Link] (1 responses)

Interestingly, packetised pipes have existed for a while, see SOCK_SEQPACKET (possibly only supported for AF_UNIX). They're indeed useful in cases where you want to be sure that clients don't lose sync when the packets change size. Though a well designed protocol doesn't have the problem in the first place.

I suppose changing the type of pipe to userspace would be an even greater ABI change.

Fixing the unfixable autofs ABI

Posted May 10, 2012 5:53 UTC (Thu) by kevinm (guest, #69913) [Link]

Well, yes - changing the type of file descriptor used wouldn't fix the existing binaries, which is the whole problem (in the autofs protocol, the userspace program creates the pipe with pipe(2)).

I think it was mentioned somewhere in the thread that an SOCK_SEQPACKET socket would have been a better design, but hindsight is always an exact science.

Fixing the unfixable autofs ABI

Posted Apr 30, 2012 23:18 UTC (Mon) by renox (guest, #23785) [Link] (1 responses)

Bleah, this API seems really ugly: when they created an API which can return different structures according to the kernel configuration, why didn't they add a word in the structure identifying the structure returned?

Fixing the unfixable autofs ABI

Posted May 1, 2012 15:40 UTC (Tue) by juliank (guest, #45896) [Link]

The different size was apparently an accident, not intended.

Fixing the unfixable autofs ABI

Posted Apr 30, 2012 23:59 UTC (Mon) by ewan (guest, #5533) [Link] (5 responses)

How does this even solve the problem? Surely any future releases of autofs using tools that are aware enough of the issue to use O_DIRECT to turn on the new, fixed, behaviour, would also be aware enough of the problem to work around it themselves? And any old releases won't be using O_DIRECT, and won't get the fix.

Fixing the unfixable autofs ABI

Posted May 1, 2012 0:36 UTC (Tue) by hamjudo (guest, #363) [Link] (4 responses)

This is a two part kludge. Part one is adding the "unusual" behavior to pipes. Part two, is making it so that the kernel side of the autofs code will set the flag in the right situations.

The history of operating systems is filled with examples of bad design decisions that weren't caught in time, and thus got stuck in the interface definition.

Fixing the unfixable autofs ABI

Posted May 1, 2012 0:55 UTC (Tue) by corbet (editor, #1) [Link] (3 responses)

Almost. The kernel side always turns on packetized behavior so that things work as expected with all known user-space binaries.

Fixing the unfixable autofs ABI

Posted May 1, 2012 15:08 UTC (Tue) by ewan (guest, #5533) [Link] (2 responses)

I'm sure I'm being dense, but when the article says "Linus made an open with O_DIRECT turn on the packetized behavior" surely that rather implies that it's off unless you use O_DIRECT?

Fixing the unfixable autofs ABI

Posted May 1, 2012 15:38 UTC (Tue) by hamjudo (guest, #363) [Link]

It is off unless, something turns on O_DIRECT. The kernel side of the autofs interface turns on O_DIRECT. As far as I know, it isn't turned on anywhere else in the kernel (yet).

Nothing, but common sense, prevents users from writing programs that use this feature.

Fixing the unfixable autofs ABI

Posted May 1, 2012 15:41 UTC (Tue) by juliank (guest, #45896) [Link]

As far as I understand it, if you create such a a pipe yourself, then you use O_DIRECT; and for autofs, the kernel creates such a pipe itself and you thus automatically get that.

Fixing the unfixable autofs ABI

Posted May 7, 2012 11:23 UTC (Mon) by mjthayer (guest, #39183) [Link] (4 responses)

Not that it would have helped immediately in this situation, but I wonder why kernel ABIs are not designed in a way that they can be easily be emulated in user space. Then when people decide that an ABI is old and broken it could be replaced with a better one in the kernel and optional user space emulation could be added for the old ABI and any remaining users. Which given the way Linux distributions are organised would mean that the emulation would go into a package which would become a dependency of the packages for those users.

Fixing the unfixable autofs ABI

Posted May 7, 2012 14:04 UTC (Mon) by dgm (subscriber, #49227) [Link] (1 responses)

What you're describing is a library, just like the Standard C Library, or pthreads. Your hard drive is full of them.

Fixing the unfixable autofs ABI

Posted May 7, 2012 14:24 UTC (Mon) by mjthayer (guest, #39183) [Link]

I agree, shared libraries in user space tend to have that property.

Fixing the unfixable autofs ABI

Posted May 10, 2012 12:41 UTC (Thu) by slashdot (guest, #22014) [Link] (1 responses)

It's simpler and faster to just do the "emulation" in the kernel and keep it there forever.

Fixing the unfixable autofs ABI

Posted May 11, 2012 16:49 UTC (Fri) by amonnet (guest, #54852) [Link]

a solution which leads to the current problem ...

The Only Natural Alignment Is Byte Alignment

Posted May 16, 2012 10:03 UTC (Wed) by ldo (guest, #40946) [Link] (3 responses)

This is why alignment on greater than one-byte boundaries should never be included in any interchange format or protocol, because invariably the architectural considerations that make them look like a good idea at one end in one situation will end up looking stupid at the other end or in another situation. And because it’s an interchange format, you cannot just simply change it without breaking compatibility.

Alignment only makes sense for internal program in-memory structures, which can be changed with a simple recompile and restart. Interchange formats should always be defined with no alignment padding. End of story.

The Only Natural Alignment Is Byte Alignment

Posted May 16, 2012 10:33 UTC (Wed) by amonnet (guest, #54852) [Link] (2 responses)

Look's like many protocol (think IPv4, etc) have been carefully crafted to provide natural alignment bigger than a byte. No padding but a nice choice of field size and position.
Anyway, it's not as if 16, 32, 64, .. are just some random numbers.

The Only Natural Alignment Is Byte Alignment

Posted May 16, 2012 10:50 UTC (Wed) by amonnet (guest, #54852) [Link]

To make things clear, alignment paddings in protocol are called private/unused/future fields.

The Only Natural Alignment Is Byte Alignment

Posted May 16, 2012 12:15 UTC (Wed) by paulj (subscriber, #341) [Link]

Sure, but all the fields' positions are still explicitly specified independently of any alignment concerns. I.e. the fields may have been constructed with regard for alignment, however the protocol doesn't depend on any alignment concerns.