The trouble with struct sockaddr's fake flexible array
The many faces of struct sockaddr
The sockaddr structure dates back to the beginning of the BSD socket API; it is used to hold an address corresponding to one side of a network connection. The 4.2 BSD networking implementation notes from 1983 give its format as:
struct sockaddr { short sa_family; char sa_data[14]; };
The sa_family field describes which address family is in use —
AF_INET for an IPv4 address, for example. sa_data holds
the actual address, the format of which will vary depending on the family.
The implementation notes say that: "the size of the data field,
14 bytes, was selected based on a study of current address
formats
". In other words, 14 bytes — much longer than the four
needed for an IPv4 address — should really be enough for anybody.
Need it be said that 14 bytes turned out not to be enough? As new protocols came along, they brought address formats that were longer than would fit in sa_data. But the sockaddr structure was firmly set in stone as user-space API and could not be changed. It appears in essentially the same form in any modern Unix-like system; on Linux the type of sa_family is now sa_family_t, but otherwise the structure is the same.
The result was one of the (many) historic kludges of the Unix API. New protocol implementations typically brought with them a variant of struct sockaddr that was suitably sized for the addresses in use; struct sockaddr_in6 for IPv6 addresses, for example, or struct sockaddr_ax25 for AX.25. All of the socket API interfaces still specified struct sockaddr, but implementations on both sides would use the appropriate structure for the protocol in use. Code on both sides of the API would cast pointers to and from struct sockaddr as needed.
Even now, the documented APIs for system calls like connect() and library functions like getaddrinfo() use struct sockaddr. As a result, both user-space programs and the kernel contain a whole set of casts between that type and the type they are (hopefully) actually using. Needless to say, these casts can be error prone; casting a pointer between different structure types is also deemed to be undefined behavior in current C. But that's the price we pay for API compatibility.
The advent of IPv6 also brought another type: struct sockaddr_storage; it is defined as starting with the same sa_family field, but being large enough to hold any of the other sockaddr variants. Code dealing with network addresses can allocate a structure of this type and be sure of having enough space to store any address. This structure is now what is often allocated, but it never appears explicitly in the system-call interface.
Making the flexible array explicit
The C language has accumulated a few idioms for the declaration of flexible arrays over the years; specifying a dimension of zero or one are both common (though deprecated) examples. The syntax blessed by the language standard, though, is to omit the dimension entirely:
struct something { /* ... */ int flex_member[]; /* A flexible array */ };
This syntax makes it clear that a flexible array is in use and that the type declaration cannot be used, on its own, to check for overflows of that array. In no convention is it deemed reasonable to use a dimension of 14 for a flexible array, but that is exactly what now happens with struct sockaddr. The actual length of sa_data is not known, and has a good chance of being larger than the declared size. It is a flexible array disguised as an ordinary array.
That usage complicates checking of struct sockaddr usage for overflows, but the effects go beyond that; it makes detection of flexible arrays harder across the kernel. As Kees Cook noted in this 2022 patch:
One of the worst offenders of "fake flexible arrays" is struct sockaddr, as it is the classic example of why GCC and Clang have been traditionally forced to treat all trailing arrays as fake flexible arrays: in the distant misty past, sa_data became too small, and code started just treating it as a flexible array, even though it was fixed-size.
As long as this usage remains, the checking tools built into both compilers must treat any trailing array in a structure as if it were flexible; that can disable overflow checking on that array entirely.
It would be nice to change this usage but, as was noted above, the layout of struct sockaddr is wired deeply into the socket interface and cannot be changed without breaking applications. But that doesn't mean that the kernel must treat sa_data as anything but a flexible array. To enable that without changing the binary interface, Cook redefined struct sockaddr within the kernel to:
struct sockaddr { sa_family_t sa_family; union { char sa_data_min[14]; DECLARE_FLEX_ARRAY(char, sa_data); }; };
(The DECLARE_FLEX_ARRAY() macro jumps through some hoops needed to declare a flexible array within a union). This change made it clear that sa_data is a flexible array, which helped, in turn, in the goal of allowing the compilers to treat trailing arrays as non-flexible unless they are explicitly declared as such.
This patch was merged for the 6.2 release, and all seemed to be well. But,
as Gustavo A. R. Silva pointed out in this patch
series, there is a problem with this approach. There are many places
in the kernel where struct sockaddr is embedded within another
structure, usually not at the end. That has the result of placing a
flexible array in the middle of the embedding structure, which is
problematic for fairly obvious reasons; the compiler no longer knows what
the offsets to the members after struct sockaddr should be. That
has resulted in "thousands of warnings
" when the suitable check is
enabled in the compiler.
Silva's solution was to introduce yet another variant with a familiar form:
struct sockaddr_legacy { sa_family_t sa_family; char sa_data[14]; };
This structure, which lacks the flexible-array member, was then embedded in the other structures, making the warning go away. Since the embedding cases did not use sa_data as a flexible array (otherwise things would never have worked in the first place), this change was deemed safe to make.
Networking maintainer Jakub Kicinski was not convinced
about this change, though. He suggested that perhaps Cook's patch should
be reverted instead, and a new type should be added for places where a
flexible array is actually needed. Cook acknowledged this
suggestion as "a pretty reasonable position
" and started to ponder
on alternatives. He concluded: "Now, if we want to get to a place with
the least ambiguity, we need to abolish sockaddr from the kernel
internally, and I think that might take a while.
"
Leaving struct sockaddr behind
In early November, Cook returned with a brief patch series meant to show what that approach would look like. It begins by reverting the 2022 patch, returning struct sockaddr to its original non-flexible form. There is a patch adding comments to places in the networking code that are known to use this structure within its original bounds; they do not need to be changed, and do not need sa_data to be flexible. But that still leaves many uses of struct sockaddr where the data area may, in reality, be larger than 14 bytes.
The solution for many of those places is just to use struct sockaddr_storage instead. Indeed, parts of the network stack already use that structure, but then cast pointers to struct sockaddr for functions that expect that type. One example is inet_addr_is_any(), which takes a struct sockaddr * argument, but is only called by functions using struct sockaddr_storage. In this case, the solution is to change the prototype of the function to match what is really being passed to it and remove the casts from the callers.
Some changes will require more churn, even if they are conceptually simple.
The getname() callback (in the proto_ops
structure) has long expected a pointer to a sockaddr_storage
structure, but its prototype was never changed to match. The patch
eliminating the use of struct sockaddr for getname()
mostly consists of name changes and cast removal, but it touches
66 files. It also, as Cook noted in the cover letter, is still lying to
the compiler in cases where the backing structure is actually smaller than
struct sockaddr_storage, "these remain just as safe as they
used to be. :)
"
This series shows that truly eliminating the use of this structure's
sa_data field as a flexible array in disguise will involve a fair
amount of work and code churn. Even so, Kicinski commented that it
"feels like the right direction
". So, while struct
sockaddr will likely remain part of the kernel's system-call API
forever, its use within the kernel can be expected to fade away over time.
A design miscalculation made over 40 years ago may finally stop
impeding the use of modern memory-safety tools.
Index entries for this article | |
---|---|
Kernel | Flexible arrays |
Posted Nov 7, 2024 19:37 UTC (Thu)
by NYKevin (subscriber, #129325)
[Link] (1 responses)
> The sa_family field describes which address family is in use — AF_INET for an IPv4 address, for example. sa_data holds the actual address, the format of which will vary depending on the family. The implementation notes say that: "the size of the data field, 14 bytes, was selected based on a study of current address formats". In other words, 14 bytes — much longer than the four needed for an IPv4 address — should really be enough for anybody.
A size_t (or the moral equivalent in pre-standards C) has never been 10+ bytes on any real platform that people have used for serious purposes, with the possible exception of a small handful of natively 128-bit supercomputers. If you're willing to waste 10 bytes in the extremely common case of IPv4, you should be willing to spend 4 bytes on a size_t/long/whatnot (or however many bytes a size_t-ish was in 1983), since that would net out to 6 bytes saved in the case of IPv4. I think the more plausible explanation is that somebody didn't feel like writing the extra code to initialize the size field.
(And no, using something other than a size_t due to its lack of standardization would not have caused problems today. Unless there's some weird networking protocol I don't know about that requires addresses bigger than 64K, a 16-bit size field would still be more than adequate even if it would be a bit nonstandard.)
Posted Nov 7, 2024 20:53 UTC (Thu)
by khim (subscriber, #9252)
[Link]
The simple explanation is that was 1983. Time where backward compatibility started becoming important but no one really felt it would become critical. Just five years before that FORTRAN removed some pieces from standard without anyone batting an eye! Surely it's a software, it's malleable, people would just fix it later, when requirements would change? It's only when people started dragging feet with upgrade to FORTRAN 77, when people essentially boycotted MS-DOS 4.0 (in 1988!) because it changed some internal data structures which “programs were not supposed to ever touch”, only when OS/2 completely imploded because someone decided to “improve” Presentation Manager API… only then developers of operation systems realized that keeping ABI stable would make or break them (and even after that moment Linux developers denied the obvious for decades, but that's another thing). I would bet that no one expected to see
Posted Nov 7, 2024 20:19 UTC (Thu)
by smurf (subscriber, #17840)
[Link]
Well since the subtype of "struct sockaddr" that's actually used in these places is not and never was 14+2 bytes long, why was the option to replace these with the sockaddr type that's actually used there not considered?
Seems like a no-brainer to me.
Posted Nov 7, 2024 23:52 UTC (Thu)
by jengelh (guest, #33263)
[Link] (1 responses)
I am surprised no one has mentioned that the sockaddr struct family essentially forms a class hierarchy. Think of it like so:
struct sockaddr { sa_family_t family; } struct sockaddr_in6 : public sockaddr { uint16_t port; ... } struct sockaddr_storage : public sockaddr { char moarroom[126]; };
And then it's obvious why bind() and connect() take a struct sockaddr. It works with any sockaddr type, and in that sense, userland should have no need to care about the size of sockaddr::sa_data, or that this member even exists. sockaddr::sa_data (as well as sockaddr_storage) may help code reduction, fewer casts, or performance optimizations, but it is not needed conceptually.
>Code dealing with network addresses can allocate a [sockaddr_storage] and be sure of having enough space to store any address
sockaddrs are inherently variable-length (as evidenced by the length argument to bind/connect/accept), and sockaddr_storage does not fix that. It's just a conveniently-sized buffer for "this operating system's most common sockaddrs at the time of compilation", but it is not completely future-proof.
In fact, we could have limitness AF_UNIX paths already if we did not constrain ourselves to using sockaddr_un (in both userland and kernelspace):
const char *path = argv[1];
Posted Nov 8, 2024 1:37 UTC (Fri)
by wahern (subscriber, #37304)
[Link]
Some systems, like macOS, do allow providing a path larger than sizeof .sun_path (104 on my laptop). On those systems, when querying an address using getsockname or getpeername, you should check the returned addrlen and try again if it's larger than the provided buffer. Though in the case of macOS the path length isn't limitless, but 252 characters (address structure size of 255, including a trailing NUL character).
Posted Nov 8, 2024 2:59 UTC (Fri)
by clugstj (subscriber, #4020)
[Link] (13 responses)
How can this be anything but a bug? It's not big enough for many address formats.
Posted Nov 8, 2024 3:41 UTC (Fri)
by iabervon (subscriber, #722)
[Link] (12 responses)
Posted Nov 8, 2024 7:02 UTC (Fri)
by smurf (subscriber, #17840)
[Link] (11 responses)
So why do the short variants need padding up to an arbitrary 14, which happens to not match *any* actual address type's length?
Posted Nov 8, 2024 7:58 UTC (Fri)
by johill (subscriber, #25196)
[Link] (10 responses)
Sure, wireless extensions wouldn't need 14 bytes for a MAC address there, only ever 6. But now it's stuck with 14 and can't use 6, and shouldn't use "14 or variable" either because it's embedded in other types.
Posted Nov 8, 2024 21:31 UTC (Fri)
by magfr (subscriber, #16052)
[Link]
Would it be confirming if the kernel returned a size of 8 and only filled in the sin_family, sin_port, and sin_addr fields?
I suppose the kernel does nothing with the padding as it is today so beeing explicit about it should be ok.
It should be noted that posix says sa_data must exist but it says nothing about it's size.
Then there is the issue that all the sockaddr_xx types must have the same alignment.
What I am going for is if the following is standards conforming:
typedef alignas(16) int16_t sa_family_t;
with an obvious caveat for what the alignment of sa_family_t should be.
In particular this makes sizeof(struct sockaddr) == 2 and sizeof(struct sockaddr_in) == 8 but note that
Posted Nov 9, 2024 8:33 UTC (Sat)
by smurf (subscriber, #17840)
[Link] (8 responses)
So why does the kernel really need padded addresses anywhere? Just use / copy however many bytes the address's type requires and be done with it.
Posted Nov 9, 2024 23:15 UTC (Sat)
by johill (subscriber, #25196)
[Link] (7 responses)
But for some APIs the address is _embedded_ into other structures (which is what most of the article is about), and changing the size now would change the position of everything that comes after it, breaking the ABI.
Posted Nov 10, 2024 9:17 UTC (Sun)
by smurf (subscriber, #17840)
[Link] (6 responses)
I'm not aware of any system call that embeds a sockaddr in some other data structure, but maybe that's just me.
Posted Nov 10, 2024 19:20 UTC (Sun)
by johill (subscriber, #25196)
[Link] (5 responses)
You can take a look at https://lore.kernel.org/netdev/cover.1729802213.git.gusta... for probably most/all of the APIs affected.
Posted Nov 10, 2024 21:18 UTC (Sun)
by smurf (subscriber, #17840)
[Link] (4 responses)
Oh well.
Posted Nov 10, 2024 21:32 UTC (Sun)
by johill (subscriber, #25196)
[Link] (3 responses)
Posted Nov 11, 2024 9:21 UTC (Mon)
by Sesse (subscriber, #53779)
[Link] (2 responses)
Posted Nov 11, 2024 10:36 UTC (Mon)
by mb (subscriber, #50428)
[Link] (1 responses)
Posted Nov 11, 2024 12:34 UTC (Mon)
by johill (subscriber, #25196)
[Link]
As for the question about why it "must" be coupled with changes in the CLI, well ... it doesn't really need to be, but when writing a completely new tool that for the most part has completely new semantics, keeping the CLI intact really isn't the first priority.
If you want to submit patches to make 'iw' have a mode where it behaves like iwconfig/iwlist for a subset of functionality, I guess I'd even apply them if they're reasonably well implemented.
But you can keep using it for eternity, for all I care, just don't expect new hardware support etc., and then you can just keep using an old kernel version too.
Posted Nov 8, 2024 6:23 UTC (Fri)
by magnus (subscriber, #34778)
[Link] (3 responses)
Beside legacy issues, could it make sense to have a minimum size for flex arrays in some cases, like to know you can always use a bigger access to fetch it or use the first n bytes as a hash or something (provided the unused bytes are always 0)?
Posted Nov 8, 2024 6:57 UTC (Fri)
by intelfx (subscriber, #130118)
[Link] (2 responses)
Isn't short more-or-less always 2 bytes long? (Checked x86-64, aarch64, mips64 and ppc64, what did I miss?)
Posted Nov 8, 2024 10:20 UTC (Fri)
by jengelh (guest, #33263)
[Link]
Posted Nov 8, 2024 13:19 UTC (Fri)
by magnus (subscriber, #34778)
[Link]
Posted Nov 8, 2024 8:57 UTC (Fri)
by taladar (subscriber, #68407)
[Link] (14 responses)
Posted Nov 8, 2024 19:48 UTC (Fri)
by jeremyhetzler (subscriber, #127663)
[Link]
Because intuitively I see your point, but demonstrably Linux is very *very* committed to the concept. MS maybe less so, but still more than I imagine is convenient for them.
Posted Nov 8, 2024 20:25 UTC (Fri)
by rweikusat2 (subscriber, #117920)
[Link] (4 responses)
:-)
Posted Nov 9, 2024 8:38 UTC (Sat)
by smurf (subscriber, #17840)
[Link]
Everything else should literally stay the same. I have no idea how many programs go splat because their address padding is no longer overwritten but if the number is nonzero then a compatibility mode can be added to the place where the address is copied to userspace. Not spread across the whole of the kernel.
Posted Nov 14, 2024 5:40 UTC (Thu)
by dvdeug (guest, #10998)
[Link] (2 responses)
Posted Nov 14, 2024 7:01 UTC (Thu)
by smurf (subscriber, #17840)
[Link] (1 responses)
Posted Nov 14, 2024 21:29 UTC (Thu)
by Cyberax (✭ supporter ✭, #52523)
[Link]
A lot of the sockets API needs to be rewritten. It needs to support Happy Eyeballs, it needs to expose the DNS data explicitly (TXT queries, etc.), it needs to be able to deal with changing/multiple addresses for the bound sockets, etc.
Posted Nov 10, 2024 18:33 UTC (Sun)
by lunaryorn (subscriber, #111088)
[Link] (7 responses)
Posted Nov 10, 2024 21:31 UTC (Sun)
by intelfx (subscriber, #130118)
[Link] (1 responses)
How would you do one better? (Architecturally, not just minor C technicalities like the one discussed in the article.)
Posted Nov 23, 2024 18:15 UTC (Sat)
by anton (subscriber, #25547)
[Link]
Posted Nov 10, 2024 21:35 UTC (Sun)
by smurf (subscriber, #17840)
[Link]
Posted Nov 17, 2024 19:35 UTC (Sun)
by skissane (subscriber, #38675)
[Link] (3 responses)
The IETF is working on a "successor to the sockets API": https://datatracker.ietf.org/doc/draft-ietf-taps-interface/
Although who knows who will actually implement it.
Posted Nov 17, 2024 21:05 UTC (Sun)
by Cyberax (✭ supporter ✭, #52523)
[Link] (1 responses)
Posted Nov 18, 2024 6:18 UTC (Mon)
by smurf (subscriber, #17840)
[Link]
Posted Nov 17, 2024 21:13 UTC (Sun)
by smurf (subscriber, #17840)
[Link]
Well. Eight people have written 26 RFC drafts, during the last 6 years, for this one.
> Although who knows who will actually implement it.
IMHO the comment by Roman Danyliw (see the RFC history) is telling:
> […] I am abstaining on this document because there is a degree of specificity in the abstract API in this document that I cannot reconcile with proposed standard status. […]
… meaning that it's way not specific enough that, given this RFC, somebody'd be able to write a program that's expected to work with any specific implementation. Thus, more RFCs and system-specific (thus mostly-incompatible) scaffolding will be needed.
Thus, again IMHO, this will not replace the Socket API any time soon.
Posted Nov 9, 2024 13:08 UTC (Sat)
by jd (guest, #26381)
[Link] (1 responses)
It would have required a fair bit of work for legacy code, yes, but you might have expected new code to use it as IPv6 was quite difficult to handle.
This suggests that there's a degree of conservatism in programming, a preference for The Old Ways. This is going to complicate efforts to do anything new that is visible.
Posted Nov 17, 2024 19:24 UTC (Sun)
by skissane (subscriber, #38675)
[Link]
Is that available somewhere still? Would be an interesting bit of history to look at.
BSD didn't even save a size_t by doing this
> I think the more plausible explanation is that somebody didn't feel like writing the extra code to initialize the size field.
BSD didn't even save a size_t by doing this
sockaddr
, casually introduced in 1983 just for the “large-scale experiment” (as Vint Cerf attests IPv4
was envisioned as large-scale experiment, no one expected it to be used outside of academy and there were expectation that another thing or yet another thing would be used by everyone else) – would still be in use 40 years later and would affect hundreds of billions devices… if they have heard about it they would have assumed you have lost all your marbles.Why oh why …?
sockaddr API design
socklen_t addrlen = offsetof(struct sockaddr, sa_data) + strlen(path) + 1;
char *addr = malloc(addrlen);
((struct sockaddr *)addr)->sa_family = AF_UNIX;
strcpy(&addr[2], argv[1]);
int ret = bind(sk, (struct sockaddr *)addr, addrlen);
sockaddr API design
sockaddr used in the kernel?
sockaddr used in the kernel?
sockaddr used in the kernel?
sockaddr used in the kernel?
sockaddr used in the kernel?
struct sockaddr { sa_family_t sa_family; char sa_data[]; };
struct sockaddr_in { sa_family_t sin_family; uint16_t sin_port; struct inaddr sin_addr; };
sizeof(struct sockaddr[2]) == 32 due to the alignment.
sockaddr used in the kernel?
sockaddr used in the kernel?
sockaddr used in the kernel?
sockaddr used in the kernel?
sockaddr used in the kernel?
sockaddr used in the kernel?
sockaddr used in the kernel?
sockaddr used in the kernel?
So it's exactly the opposite of what you say: The fundamental API has changed, but the obsolete CLI and its obsolete API have not.
So, if almost two decades is not enough for you to adapt to the "new" iw CLI, then you can keep using iwconfig/iwlist.
sockaddr used in the kernel?
minimum size for flexible arrays
minimum size for flexible arrays
minimum size for flexible arrays
minimum size for flexible arrays
Backwards compatibility at any cost?
Backwards compatibility at any cost?
Backwards compatibility at any cost?
Backwards compatibility at any cost?
Backwards compatibility at any cost?
Backwards compatibility at any cost?
Backwards compatibility at any cost?
Backwards compatibility at any cost?
Backwards compatibility at any cost?
Backwards compatibility at any cost?
How would you do one better?
By looking at what Ken Thompson did, i.e. Plan 9. The Berkeley socket API appears totally alien to Unix, and was probably designed to have as little to do with the rest of the kernel as possible. E.g., send() instead of write() and recv() instead of read(); at least that was corrected later, but we still have send() and recv().
Backwards compatibility at any cost?
Backwards compatibility at any cost?
Backwards compatibility at any cost?
Backwards compatibility at any cost?
They can also eat your machine's performance for breakfast (and have room left over) when you do them naïvely.
Backwards compatibility at any cost?
That's four people, five years, and 20 drafts too many IMHO.
Also, the latest of those drafts expired two months ago, but at least it's in Approved state and in the RFC editor queue by now.
Interesting problem
Interesting problem