LWN: Comments on "strscpy() and the hazards of improved interfaces" https://lwn.net/Articles/659214/ This is a special feed containing comments posted to the individual LWN article titled "strscpy() and the hazards of improved interfaces". Thu, 11 Sep 2025 15:55:03 +0000 strscpy() and the hazards of improved interfaces https://lwn.net/Articles/767571/ https://lwn.net/Articles/767571/ klossner <div class="FormattedComment"> That's eight extra bytes (on a 32-bit machine), four each for the length and pointer. C was developed when userspace RO data + heap couldn't exceed about 56K bytes (on a PDP-11, the first popular Unix machine).<br> <p> A more likely design choice at the time would have been the UCSD Pascal scheme in which the first bytes of the char array contain the length. On the PDP-11, a two-byte length was sufficient. But that would have given up the convenience that a pointer to string is a pointer to the first byte of its data, allowing the same function to take either a string or a non-string buffer, e.g.<br> write(1, "Hello, world\n", 13).<br> <p> Buffer overflow wasn't on the radar much for 16-bit machines. We had so little space that we took better care of it. Then the 32-bit VAX came along and that discipline went by the wayside.<br> <p> </div> Thu, 04 Oct 2018 00:47:43 +0000 LT's C++ rants https://lwn.net/Articles/662255/ https://lwn.net/Articles/662255/ hummassa <div class="FormattedComment"> How come? Linus' 2007 C++ rant, analyzed:<br> <p> 1. C++ programmers are idiots<br> <p> this is one of LT's least brilliant moments, IMNSHO.<br> <p> 2. C++ leads to bad design choices<br> <p> yeah, and still does somewhat but look: kernel-C uses more or less the same design choices, and loses the correctness checks that C++ would give. RAII, people. RAII.<br> <p> 3. STL &amp; Boost are non-portable, non-stable OR<br> <p> old-school<br> <p> 4. 
STL &amp; Boost can be nice, but are full of hidden surprises<br> <p> old-school<br> <p> 5. C++ is unportable<br> <p> where there is gcc, there is g++ and libstdc++; and probably clang++ and libc++ too.<br> <p> He had reservations (in a 2004 post IIRC) about exceptions, too (because at the time they were expensive even when unused... which nowadays is ancient history). Another thing he had ample reason to complain about (and still has, in some way) is that the type system causes ginormous illegible error messages (but this has been mitigated a lot both by clang++ and g++ lately).<br> </div> Wed, 28 Oct 2015 15:39:39 +0000 strscpy() and the hazards of improved interfaces https://lwn.net/Articles/660988/ https://lwn.net/Articles/660988/ ksandstr <div class="FormattedComment"> Linux has tens and tens of megabytes of source c^W^Wtest debt. Its various systems and subsystems are well interlinked and mostly without formal characterization even in isolation, let alone in combination. Writing tests to cover even a significant core (e.g. mm, vfs, ipc, block, char, sockets, ip stack) would be an exercise in archaeology as much as engineering, and the timetable for such an effort would easily reach into the early to mid 2020s.<br> <p> In addition, tests rot and turn crusty. Old tests accrete and become difficult to validate (even if test validation were automated) as the collective knowledge used as their basis fades. At that point one could insist on formal specs, and then derive tests from those specs concurrently with the implementation, so that test rot is preceded by spec rot by at least one rung on a metaphorical ladder of things I'll climb tomorrow (honest!) because The Boss wants results last week.<br> <p> Not to mention that there are at least two incompatible schools of test-writing, just divided by style of tool: the multiple-assertion tests (as in Check, JUnit, etc), and the multiple-point tests (Perl's Test Anything Protocol). 
The latter has great density, produces useful output for successful tests, and offers many comforts for the programmer, whereas the former is popular and well familiar to the Java generation (those who studied after 2000).<br> <p> Running all these tests as part of regular development, i.e. so that a TDD-ey test replaces operator interactions, would also become more difficult as coverage increases. Tens of megabytes of code means hundreds of megabytes of test code, and if each of (say) 20,000 tests runs for five seconds, then the grand suite would execute serially in ~28 hours. That's long enough that 1/100 of it is too long a break to sustain programmer attention over an edit-compile-test cycle.<br> <p> So there's quite a few obstacles there, with few gains early on (the first decade?). I've not even touched on the architectural testability issues, i.e. whether it's even possible to programmatically explore significant areas of the (state * parameter) space for various important operations within the kernel.<br> <p> </div> Fri, 16 Oct 2015 05:40:14 +0000 strscpy() and the hazards of improved interfaces https://lwn.net/Articles/660961/ https://lwn.net/Articles/660961/ PaXTeam <div class="FormattedComment"> strncpy overwrites the entire destination buffer (or rather, whatever the size parameter says) whereas strscpy may not. 
while this should have no observable effect on a conforming program, it can have undesired (side)effects if the entire destination buffer is sent to a different privilege domain (say, kernel-&gt;userland or userland-&gt;remote system).<br> </div> Thu, 15 Oct 2015 22:47:50 +0000 strscpy() and the hazards of improved interfaces https://lwn.net/Articles/660902/ https://lwn.net/Articles/660902/ dfsmith <p>Is strscpy functionally equivalent to (though presumably faster than) the following?<br/> <tt>#define strscpy(DEST,SRC,LEN) ((strncpy(DEST,SRC,LEN)[(LEN)-1]='\0',strnlen(SRC,LEN)&gt;=(LEN))?-E2BIG:strlen(DEST))</tt></p> Thu, 15 Oct 2015 17:02:11 +0000 strscpy() and the hazards of improved interfaces https://lwn.net/Articles/660389/ https://lwn.net/Articles/660389/ renox <div class="FormattedComment"> And with 'pointer + length' strings you can do it without having to replace delimiter characters with nulls.<br> </div> Mon, 12 Oct 2015 12:05:55 +0000 strscpy() and the hazards of improved interfaces https://lwn.net/Articles/660381/ https://lwn.net/Articles/660381/ JdGordy <div class="FormattedComment"> 2 problems... <br> <p> 1) There is going to be someone somewhere who depends on that behaviour(!)<br> 2) The code might be there but not in the immediate area, so you'd miss it with mass replace (i.e. the safety checks are performed up the call chain, so you'd end up adding unnecessary code because "you" thought there was a preexisting bug, since the code isn't understood well enough).<br> </div> Mon, 12 Oct 2015 09:00:13 +0000 strscpy() and the hazards of improved interfaces https://lwn.net/Articles/660367/ https://lwn.net/Articles/660367/ jwarnica <div class="FormattedComment"> Depends on what you mean by "interesting". I'm not in the slightest a kernel hacker, but given the existence of paravirtualization (e.g. 
fake, contrived) hardware, it isn't difficult to believe most of the OS isn't drivers, and, from that, that writing a stub driver is too difficult to stub out.<br> <p> If they wanted code coverage, they would have it.<br> <p> </div> Mon, 12 Oct 2015 03:57:13 +0000 strscpy() and the hazards of improved interfaces https://lwn.net/Articles/660254/ https://lwn.net/Articles/660254/ pabs <div class="FormattedComment"> My comment was about truncating the string at the end of the buffer in the case where the whole string doesn't fit into the buffer. In userspace at least this can have various consequences from breaking UTF-8 characters to incorrect filesystem access. Not sure about kernelspace though.<br> </div> Sat, 10 Oct 2015 15:20:02 +0000 strscpy() and the hazards of improved interfaces https://lwn.net/Articles/660217/ https://lwn.net/Articles/660217/ zlynx <div class="FormattedComment"> I wrote some nice and fast URL parsing code which builds an array of {size_t len, char *buf} structures out of an immutable string. It's just like creating pointers to the start of strings except I don't scribble over the source string which means I don't have to copy it.<br> </div> Sat, 10 Oct 2015 02:07:35 +0000 strscpy() and the hazards of improved interfaces https://lwn.net/Articles/660062/ https://lwn.net/Articles/660062/ dlang <div class="FormattedComment"> There are also a lot of times when you take a string in and need to split it up. With C strings, you can frequently do this in place by replacing delimiter characters with nulls and creating pointers to the starts of the strings.<br> </div> Fri, 09 Oct 2015 00:55:20 +0000 strscpy() and the hazards of improved interfaces https://lwn.net/Articles/660056/ https://lwn.net/Articles/660056/ Richard_J_Neill <div class="FormattedComment"> In addition, functions such as strlen() would be very much faster, and counting back from the end of a string (e.g. to check a file extension) would be more efficient. 
But we would have the problem of wasting 4 extra bytes (we'd probably need to keep the null-termination for safety/compatibility), and there would be issues when a string's length exceeded 4 GB.<br> </div> Fri, 09 Oct 2015 00:12:31 +0000 strscpy() and the hazards of improved interfaces https://lwn.net/Articles/660046/ https://lwn.net/Articles/660046/ ncm <div class="FormattedComment"> Last I heard glibc was going to cave on strlcpy, more's the pity.<br> <p> Almost always just "if (0 &gt; strscpy" suffices. But if somebody meant to adopt a sensible interface, that would always suffice, because it would take two more arguments, a size_t* and a size_t, with the latter an offset and the former a place to record the offset of the NUL after the copy. But sensible is too much to expect when strlcpy seems like a good idea to many. (And, no, the pointer alone isn't enough; it's too easy to forget to initialize what it points at before the call.). The name of the sensible interface would be strto(), because long, cryptic names with mystifying initials do nobody any good.<br> </div> Thu, 08 Oct 2015 22:32:39 +0000 strscpy() and the hazards of improved interfaces https://lwn.net/Articles/660025/ https://lwn.net/Articles/660025/ robclark <div class="FormattedComment"> <font class="QuotedText">&gt; It is a pity that the kernel doesn't have any automated testing. Refactoring/cleanups are much less scarier when test coverage of a program is good.</font><br> <p> you mean like <a href="http://kernelci.org/">http://kernelci.org/</a> or 0-day kbuild robot thing (which iirc is doing boot tests on qemu?)<br> <p> (granted that probably doesn't scratch the surface when it comes to driver coverage w/ all the different hw that is out there)<br> <p> <p> </div> Thu, 08 Oct 2015 20:43:55 +0000 strscpy() and the hazards of improved interfaces https://lwn.net/Articles/660012/ https://lwn.net/Articles/660012/ lsl <div class="FormattedComment"> Why? 
Linus' more well-known C++ rants are just as relevant to modern C++ as they are for C with classes.<br> </div> Thu, 08 Oct 2015 19:23:49 +0000 strscpy() and the hazards of improved interfaces https://lwn.net/Articles/660001/ https://lwn.net/Articles/660001/ Karellen <div class="FormattedComment"> I think the comment was in reference to the behaviour of strscpy() in earlier versions of the patchset, which, rather than truncating the destination string (or overrunning, or leaving unterminated), instead placed a NUL byte at the very start, giving you a totally empty string.<br> <p> If the problem with strlcpy() is that people would still use the truncated string after ignoring the return value being too big, then I'm not entirely convinced that just changing the return value to -E2BIG will fix that. Making an overrun provide an empty string to the caller seems like it would be even less ignorable and more likely to result in even more robust code.<br> <p> Is truncation a better alternative than NULing the string? 
I am not yet convinced - but I have not read the thread where such arguments are likely to have been explored.<br> </div> Thu, 08 Oct 2015 18:33:19 +0000 strscpy() and the hazards of improved interfaces https://lwn.net/Articles/659993/ https://lwn.net/Articles/659993/ Karellen <div class="FormattedComment"> As someone who's written a lot of C over the years, and is used to str*cpy(), mem*cpy(), memmove(), *s*printf(), strftime(), fgets() and fread(), having the destination buffer first seems very natural to me.<br> <p> Yes, I know that read(), gettimeofday(), getrusage(), getrlimit() and others have the destination buffer last, but they seem like the odd ones out to me.<br> <p> (My sense of tidiness is less aggravated by the *_r() functions, as creating an updated API by appending a parameter to the new function seems less disruptive than reordering the existing parameters.)<br> </div> Thu, 08 Oct 2015 18:19:31 +0000 strscpy() and the hazards of improved interfaces https://lwn.net/Articles/659979/ https://lwn.net/Articles/659979/ reubenhwk <div class="FormattedComment"> Truncation, by itself, is not a good idea...but truncation is a better idea than overrunning an array or leaving a string unterminated in the case of an error...In those cases I pick truncation as a better alternative.<br> </div> Thu, 08 Oct 2015 16:43:16 +0000 strscpy() and the hazards of improved interfaces https://lwn.net/Articles/659922/ https://lwn.net/Articles/659922/ pabs <div class="FormattedComment"> Why is truncation a good idea?<br> </div> Thu, 08 Oct 2015 12:28:19 +0000 strscpy() and the hazards of improved interfaces https://lwn.net/Articles/659894/ https://lwn.net/Articles/659894/ PaXTeam <div class="FormattedComment"> there's no third case, i gave an exhaustive categorization ;). your 'third case' is part of my second case and that is exactly where most of the strlcpy uses get it wrong. 
your remaining comment shows the sad consequences of the mindset that strlcpy advocates have kept spreading: instead of solving the problem properly, just replace one problem with another (which is pretty ironic when you consider that many BSD people used to advocate for 'solutions, not hacks' in the past).<br> </div> Thu, 08 Oct 2015 09:15:23 +0000 strscpy() and the hazards of improved interfaces https://lwn.net/Articles/659891/ https://lwn.net/Articles/659891/ epa <blockquote>either those silent truncations cannot occur in which case strlcpy is utterly useless or they can occur in which case the use of strlcpy is wrong.</blockquote> I would suggest a third: they "cannot occur" as far as the programmer knows, and as far as anyone who has reviewed the code knows - but since the programmer is only human, it is possible he or she has made a mistake. In the case of such a mistake, a silent truncation of the string is less bad than allowing a buffer overflow. <p> I'm not saying this is the best way to do things. Personally, if the truncation "cannot occur", I would rather use a string-copying function which just panics the kernel if that "impossible" condition should ever happen in practice. But I guess the programmers of OpenBSD have their reasons for preferring the way they do it. This is defensive programming: code which is demonstrably wrong, since it is predicated on something which will "never" happen (an input not meeting a precondition, an out-of-range value in something which has already been range-checked earlier, and so on), but which given human fallibility may have some value in practice. Thu, 08 Oct 2015 08:40:23 +0000 strscpy() and the hazards of improved interfaces https://lwn.net/Articles/659885/ https://lwn.net/Articles/659885/ kleptog <div class="FormattedComment"> <font class="QuotedText">&gt; But I could just as easily envision people doing n &lt;= 0 or n == E2BIG instead of n &lt; 0 or n == -E2BIG with strscpy. 
What matters is the _idiom_ that people will use and adopt.</font><br> <p> This is the kernel we're talking about and there error codes less than zero are used throughout to handle exceptions. Arguably this means this function will fit in perfectly with the rest of the kernel thus reducing errors of this sort. The idiom is the same as for the rest.<br> <p> Now, if they were proposing that *user-space* use this function, you'd have a point because error handling in C is all over the place, even within the C library, so that chance of issues is higher. Fortunately, no-one is proposing strscpy for user space.<br> </div> Thu, 08 Oct 2015 07:37:08 +0000 strscpy() and the hazards of improved interfaces https://lwn.net/Articles/659880/ https://lwn.net/Articles/659880/ epa <div class="FormattedComment"> Isn't it more over what a weak kind of 'string' type C, and C programmers, use by default? If K&amp;R back in the day had defined struct string { size_t s; char * buf } and not resorted to the kludge of 0-termination, most of these issues would go away.<br> </div> Thu, 08 Oct 2015 06:47:13 +0000 strscpy() and the hazards of improved interfaces https://lwn.net/Articles/659869/ https://lwn.net/Articles/659869/ nybble41 <div class="FormattedComment"> <font class="QuotedText">&gt; The advantage is that those lower level bits of code can expose safe function with unsafe blocks inside them, meaning the implementer takes responsibility for correctness but the user does not have to worry, unlike something like Haskell's IO Monad which requires making everything up the call chain also in IO.</font><br> <p> ... but more or less exactly like Haskell's `unsafePerformIO` primitive, which takes an IO block and executes it inside pure code, leaving it up to the implementer to take responsibility for correctness while exposing a safe (i.e. 
pure) interface.<br> <p> The IO type is for the majority of IO-based functions which do _not_ expose a pure interface.<br> <p> There is also the ST type, which permits mutation of local variables without the possibility of I/O. In this case the language enforces the purity of the interface; only pure values are permitted to escape the ST computation via the `runST` primitive. By taking advantage of rank-2 polymorphism, the language ensures that any attempt to pass a reference to an ST variable outside `runST` and into pure code results in a type error.<br> </div> Thu, 08 Oct 2015 03:37:05 +0000 strscpy() and the hazards of improved interfaces https://lwn.net/Articles/659862/ https://lwn.net/Articles/659862/ jameslivingston <div class="FormattedComment"> Not quite, it's more "exclusive mutability" than immutability. You can mutate memory provided that there cannot be any aliased pointers to the memory (using C terms), because it would be unsafe to mutate memory that something else could be reading to or writing from. If you want to allow that, you need to put the structure inside a sync::Mutex, where getting the reference to the interesting structure acquires the lock for you (and releases the lock when the reference goes out of scope).<br> <p> <p> There are a lot of places in low-level code (collection implementations, concurrency things, etc) where you want to allow multiple mutable references to memory, and Rust lets you do that provided you do it in an block marked "unsafe", which turns off the compiler checking and is the programmer promising to maintain memory safety. 
The advantage is that those lower level bits of code can expose safe functions with unsafe blocks inside them, meaning the implementer takes responsibility for correctness but the user does not have to worry, unlike something like Haskell's IO Monad which requires making everything up the call chain also in IO.<br> <p> <p> Rust used in a kernel would use little to none of the standard Rust library. Unlike most recent languages, it's a library not a runtime - just like C, and the kernel doesn't use libc's malloc().<br> </div> Thu, 08 Oct 2015 02:21:33 +0000 strscpy() and the hazards of improved interfaces https://lwn.net/Articles/659844/ https://lwn.net/Articles/659844/ wahern <p> Per the paper: </p> <blockquote><pre> The standard recommendation in rust is to never write a function that directly returns a boxed object[5]. Instead, the function should return the object by value and the user should place it in a box using the box keyword. This is because (as mentioned in subsubsection 3.1.1) rust will automatically rewrite many functions returning objects to instead use outpointers to avoid a copy. (3.3.1p2 of http://scialex.github.io/reenix.pdf) </pre></blockquote> <p> Rust is designed around the notion of immutability and copy-by-value, with the compiler optimizing the copies away. If you have to use mutable references everywhere you would use a pointer in C (because Rust no longer has pointers at all, AFAIU), wouldn't it make it impossible to write idiomatic Rust code? Mutable types come with significant constraints regarding what you can do with the value and when, so it seems to me like it'd be a heavy burden. But I haven't used Rust before, so probably I'm missing something here. Wed, 07 Oct 2015 22:55:23 +0000 strscpy() and the hazards of improved interfaces https://lwn.net/Articles/659843/ https://lwn.net/Articles/659843/ PaXTeam <div class="FormattedComment"> <font class="QuotedText">&gt; Almost always used wrongly? 
Care to back that claim up with data?</font><br> <p> check their kernel tree for strlcpy, it's called over 1800 times, 10 of which check the return value (8 of which are in ofdev.c), 2 of them blindly accumulate its return value and the rest do exactly nothing which means potential silent truncation. from here on the argument can go two ways, neither of which is good for your case. either those silent truncations cannot occur in which case strlcpy is utterly useless or they can occur in which case the use of strlcpy is wrong.<br> <p> <font class="QuotedText">&gt; Regarding not checking the return value: many times strlcpy is replacing code that already didn't check for the return value, and added as a stop-gap.</font><br> <p> you've just proved how strlcpy encouraged even more sloppy programming. the copy-paste hoard turned one kind of bug into another. i wouldn't call that progress let alone an example to set for others to follow.<br> <p> <font class="QuotedText">&gt; And strscpy is hardly better relative to strlcpy weak points.</font><br> <p> it is infinitely better as it doesn't waste cycles to compute a useless strlen that pretty much no caller cares about. in other news, you must have never written code that tries to copy out substrings from a big one using strlcpy. face it, strlcpy is a design mistake that should just die.<br> <p> <font class="QuotedText">&gt; And nothing inherent in strscpy forces a developer to check for truncation.</font><br> <p> that's a strawman, nothing forces anybody to check anything at this rate. if people care about callers doing the right thing then there's __must_check in linux (a gcc attribute so it's not really specific to the kernel) but given how nobody cares about truncation (or at least doesn't want to learn about it from str*cpy) i can understand why it's not enforced. 
IOW, i don't think the return value even matters, if there's a potential of truncation and it matters, the callers will already have to do something else and the return value from str*cpy is irrelevant.<br> <p> <font class="QuotedText">&gt; It's a shame that so many otherwise rational people have adopted such an irrational aversion to strlcpy.</font><br> <p> it's a shame that so many otherwise rational people have adopted such an irrational affection to strlcpy. how about you resort to rational arguments instead of cheap rhetoric?<br> </div> Wed, 07 Oct 2015 22:54:42 +0000 strscpy() and the hazards of improved interfaces https://lwn.net/Articles/659833/ https://lwn.net/Articles/659833/ wahern <div class="FormattedComment"> Almost always used wrongly? Care to back that claim up with data? Anecdotes don't count. But FWIW when OpenBSD first settled on strlcpy, they surveyed their ports tree for misuses of strcpy and strncpy, as they tend to do when developing and adopting these kinds of interfaces.<br> <p> Regarding not checking the return value: many times strlcpy is replacing code that already didn't check for the return value, and added as a stop-gap. More importantly, not checking for a return value is not always a bug, or even usually a bug, in the contexts where you add strlcpy. Checking the return value is neither necessary nor sufficient as a general matter; it's context specific.[1] If the semantics of the code are garbage-in, garbage-out, then checking for truncation is not necessary. If the semantics are that truncated input could subvert some condition, then checking for truncation is not always sufficient. Coding an algorithm so that garbage-in is safely garbage-out is arguably the most robust way to write secure code. If the data is highly structured, you probably shouldn't be using C strings much. 
Separating constraint checks from the core algorithm that processes input is an anti-pattern when writing secure code--it's usually better to put constraints on the _actual_ state of the algorithm processing the input. This is why, for example, Lua removed its bytecode validator--it was neither necessary nor sufficient, and in practice added needless complexity; remove needless complexity and it becomes easier to focus on finding and fixing bugs in the code that matters.<br> <p> And strscpy is hardly better relative to strlcpy weak points. strscpy overloads the return value just like strlcpy does. If you don't check for a failure condition, the length will be too large (extremely large in the case of (size_t)-E2BIG). And nothing inherent in strscpy forces a developer to check for truncation.<br> <p> You could argue truncation checking is slightly less prone to bugs. The obvious issue with strlcpy is using n &gt; lim instead of n &gt;= lim to compare, an off-by-one. But I could just as easily envision people doing n &lt;= 0 or n == E2BIG instead of n &lt; 0 or n == -E2BIG with strscpy. What matters is the _idiom_ that people will use and adopt. And in any event, in both cases the misuses are fairly easy to locate using pattern matching.<br> <p> It's a shame that so many otherwise rational people have adopted such an irrational aversion to strlcpy. There's so much poor reasoning involved. Even if it was the worst interface in the world, the situation is compounded by creating conditions where people constantly reimplement it, often with bugs. It's widely included in many projects; attempts at berating people into not using it have manifestly failed. It's like the war on drugs. Yes, drugs harm society--largely as a result of abuse by a small subset of individuals (sorta like the specter of strlcpy misuses). Yes society would be better off without drugs. 
And yet banning them doesn't work.<br> <p> <p> [1] Grepping for an unchecked strlcpy in the OpenBSD source tree, the first hit brings me to line 776 in bin/csh/csh.c. rechist is copying a command-line into the history buffer. It's old, crufty code, dating to the 1980s. That edit was made in 2003 and replaced a use of strcpy. There's no easy way to bubble up a truncation error, and panicking or exiting failure on a truncation could break existing code; indeed it could introduce security issues of its own by obscuring the exit status of the command. And truncation in this context is about as benign as these things come in the real world. Would you suggest refactoring all of the logic in csh related to management of the history buffer? Make it all dynamically allocated, as GNU developers insist upon? Reject command-lines longer than a record buffer when in interactive sessions? Yeesh.<br> <p> <p> </div> Wed, 07 Oct 2015 22:05:52 +0000 strscpy() and the hazards of improved interfaces https://lwn.net/Articles/659836/ https://lwn.net/Articles/659836/ josh <div class="FormattedComment"> That depends heavily on what interfaces you're used to. Leaving aside the longstanding convention of "strcpy", which would make swapping the order confusing, putting the destination on the left evokes an assignment: "dest = src;".<br> <p> (See also opinions on AT&amp;T versus Intel assembly syntax.)<br> </div> Wed, 07 Oct 2015 21:11:44 +0000 strscpy() and the hazards of improved interfaces https://lwn.net/Articles/659831/ https://lwn.net/Articles/659831/ reubenhwk <div class="FormattedComment"> Seems like a stretch to use 'wrong' and 'natural' here. It's just a matter of perspective and opinion... Words like "works", "is appropriate", "conforms to", would be much more suitable in this case....<br> </div> Wed, 07 Oct 2015 21:06:10 +0000 strscpy() and the hazards of improved interfaces https://lwn.net/Articles/659812/ https://lwn.net/Articles/659812/ wahern <p> Huh. 
So that's what's meant by supposedly being free of a race condition? </p> <p> But if src isn't properly NUL terminated there are much bigger problems afoot. strscpy is claiming to the unwary user that it's somehow safer (or worse, safe without qualification) in a context where the src may be corruptible. But that's a guarantee it can't make. An attacker isn't always limited to writing forward in step with the loop in strscpy. I would think such a constraint would be the exception rather than the rule. As src is only being read, what we're concerned with here are information leaks or reads of unmapped memory, both of which are still possible if you can't trust the src. </p> <p> strscpy seems especially problematic for making such a claim. It has the same problems as strlcpy when the user makes the wrong assumptions about the return value, but also intentionally misleads them about the semantics. And for all the ridicule, it still adopts strlcpy's poorly ordered argument list inherited from strncpy which has _actually_ been a factor or cause in many misuses of strlcpy--accidentally passing a src length (or otherwise src-derived value) instead of a destination buffer size. </p> <p> If people were serious about the claimed strlcpy deficiencies, I would think they'd adopt something like error_t strscpy(char *dst, size_t *len, size_t lim, const char *src). It's not a drop-in replacement for strncpy (or strlcpy), requiring people to pay attention to any code refactor, but just as easy to use given that to use the computed length of either strscpy or strlcpy properly you already need a named variable. The error condition and length are communicated via two separate values, significantly reducing the likelihood that the length will be used unchecked. Indeed, the length could even be set to 0 on error. And when inlined it should perform just as well as either alternative. </p> <p> Otherwise, all the debate just seems like bike shedding and NIH syndrome. 
It seems disingenuous to fault strlcpy for problems that aren't fixed by the alternative. The only difference from strlcpy in the implied misuse scenario is the possibility of a compiler warning, notwithstanding that signed-to-unsigned conversions are legal. And strscpy could be worse because (size_t)-E2BIG is likely to be much bigger than the src length. (OTOH, it could be better, leading to segfaults more quickly.) strlcpy was a pragmatic compromise. It would seem so is strscpy, reflecting a certain set of preferred esthetics and compromises that certainly aren't objectively better than strlcpy. </p> <p> At the very least the inclusion adds weight to the argument that glibc's stance has been misguided all along. These sorts of routines have utility and address a legitimate gap in C's string handling API. The real issue comes down to having to wade through the bike shedding, which I guess I can't fault glibc maintainers for wanting to avoid. musl libc has added strlcpy, though, and by most accounts musl reflects exceptional code quality. Maybe glibc will acquiesce, or at least propose an objectively better interface. </p> Wed, 07 Oct 2015 21:04:48 +0000 strscpy() and the hazards of improved interfaces https://lwn.net/Articles/659827/ https://lwn.net/Articles/659827/ reubenhwk <div class="FormattedComment"> <font class="QuotedText">&gt; strlcpy needs to read src beyond the specified length.</font><br> <p> Only when dst is shorter than src right?<br> <p> <font class="QuotedText">&gt; Not only this slows thing down for no good reason</font><br> <p> The reason is to inform the caller how large dst needs to be for success in the case of a failure. That seems like a very good reason to me. 
I doubt the extra slowness would be measurable in a typical use case.<br> </div> Wed, 07 Oct 2015 20:57:24 +0000 strscpy() and the hazards of improved interfaces https://lwn.net/Articles/659823/ https://lwn.net/Articles/659823/ riddochc <div class="FormattedComment"> I think this would be a particularly good use of QEMU - one person can write a (relatively) high-level simulation of the hardware that a driver's targeting, and another can write the kernel driver to talk to it. This could help with validating that the kernel works against the way you expect the hardware's interfaces to work.<br> <p> That said, it doesn't mean that the hardware actually works that way. As far as testing goes, this would be a big step forward, but still not quite a substitute for having the actual hardware to test against.<br> <p> QEMU is a remarkable piece of software, made even better by KVM. It enables lots of things that would have been difficult or tedious before, like this sort of testing. It could use more documentation, but it's well worth spending some time with to see what it can offer.<br> <p> </div> Wed, 07 Oct 2015 20:49:58 +0000 strscpy() and the hazards of improved interfaces https://lwn.net/Articles/659822/ https://lwn.net/Articles/659822/ utoddl One could argue that, in each case it is introduced, if the code to handle -E2BIG isn't already written, then there's an existing bug waiting to be hit. Wed, 07 Oct 2015 20:43:19 +0000 strscpy() and the hazards of improved interfaces https://lwn.net/Articles/659821/ https://lwn.net/Articles/659821/ Cyberax <div class="FormattedComment"> Which is wrong. 
It's natural to place the source first and destination second.<br> </div> Wed, 07 Oct 2015 20:37:16 +0000 strscpy() and the hazards of improved interfaces https://lwn.net/Articles/659820/ https://lwn.net/Articles/659820/ arielb1@mail.tau.ac.il <div class="FormattedComment"> Rust's standard collections library is very intentionally not designed to handle OOM, to keep implementation complexity under control. Rust is perfectly usable without that library - kernel applications probably want to use different data structures anyway.<br> </div> Wed, 07 Oct 2015 20:31:34 +0000 strscpy() and the hazards of improved interfaces https://lwn.net/Articles/659818/ https://lwn.net/Articles/659818/ PaXTeam <div class="FormattedComment"> strlcpy isn't standard anywhere except maybe OpenBSD and even there it's almost always wrongly used (what's with the lack of return value checking?). so no, that thing should not live any longer and it's a pity that it had infected so many minds over time. strscpy is almost sane save for the badly named and placed destination size argument and allowing a size over SSIZE_MAX (both of which are easily fixable of course).<br> </div> Wed, 07 Oct 2015 20:23:30 +0000 strscpy() and the hazards of improved interfaces https://lwn.net/Articles/659816/ https://lwn.net/Articles/659816/ hummassa <div class="FormattedComment"> <font class="QuotedText">&gt; Was there a rant about Rust by Torvalds that I missed, or are you referencing his rants against C++?</font><br> <p> I don't know if there was any specific Rust rant, but his C++ rant looks really dated these days.<br> </div> Wed, 07 Oct 2015 19:57:19 +0000 strscpy() and the hazards of improved interfaces https://lwn.net/Articles/659814/ https://lwn.net/Articles/659814/ Karellen <div class="FormattedComment"> I don't like the parameter order.
I'd have preferred:<br> <p> ssize_t strscpy(char * dest, size_t destlen, char const * src);<br> <p> where the destination buffer and its size are together as the first two parameters, as they are for snprintf(), fgets(), fread(), strftime(), and probably some others that I'm forgetting right now.<br> <p> I realise that this is to be consistent with the parameter order/meaning of strncpy() and strlcpy() (and memset()) - but I think the parameter order for those functions is daft too. Obviously we can't change them, but that doesn't stop them from being Wrong, and it doesn't mean we have to copy them in the future.<br> </div> Wed, 07 Oct 2015 19:49:30 +0000 strscpy() and the hazards of improved interfaces https://lwn.net/Articles/659796/ https://lwn.net/Articles/659796/ madscientist <div class="FormattedComment"> &lt;obpedant&gt;Using "count" in the function declaration is needlessly confusing, IMHO; it wasn't immediately obvious to me that this would or would not include the nul byte. I would prefer a name like "destsz" or similar to make it more clear that it's the size of the destination buffer. Obviously this is basically just documentation, but documentation is important...&lt;/obpedant&gt;<br> </div> Wed, 07 Oct 2015 18:36:48 +0000
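A minimal sketch of the two suggestions above taken together: the destination and its size are adjacent, as with snprintf(), and the size parameter is called destsz to make clear that it counts the whole buffer, NUL byte included. The name strscpy_dstfirst is hypothetical; the return convention (bytes copied, or -E2BIG on truncation) is assumed to mirror strscpy().

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>   /* ssize_t (POSIX) */

/*
 * Hypothetical variant with the buffer and its size adjacent.
 * destsz is the full size of dest, including room for the NUL byte.
 * Returns the number of bytes copied (excluding the NUL), or -E2BIG if
 * src did not fit; dest is NUL-terminated whenever destsz > 0.
 */
static ssize_t strscpy_dstfirst(char *dest, size_t destsz, const char *src)
{
    size_t i;

    if (destsz == 0)
        return -E2BIG;

    for (i = 0; i < destsz - 1 && src[i] != '\0'; i++)
        dest[i] = src[i];
    dest[i] = '\0';

    return src[i] == '\0' ? (ssize_t)i : -E2BIG;
}
```

A call then reads strscpy_dstfirst(buf, sizeof buf, src), which makes it harder to pass a src-derived length by accident. The result still has to be checked for a negative value before it is used as a length, since storing -E2BIG in a size_t yields a very large number.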