C considered dangerous

Posted Aug 29, 2018 22:44 UTC (Wed) by iabervon (subscriber, #722)
Parent article: C considered dangerous

I'm not sure another argument to memcpy would help; unlike with strcpy, you have the number of bytes that would be copied anyway. The common cases for getting it wrong are (a) you copy something into a buffer that you think is big enough, but is actually not the size you expected, in which case you'd just pass the wrong length and (b) you copy something into a buffer that's big enough, but you're not putting it at the beginning, in which case you'd probably use the total length instead of the remaining length.

Also, chances seem slim that the caller will do something secure in the case where memcpy stops early, given that they didn't think to compare lengths and do something particular in that situation.

I could see having an "end of allocation" value that you get from your allocator and pass around along with your pointers, which has the advantage that getting it wrong is likely to be extremely obvious, it's hard to synthesize it yourself (which you might get wrong), and you don't need to adjust it (ditto).

C considered dangerous

Posted Aug 30, 2018 1:12 UTC (Thu) by balkanboy (guest, #94926) [Link] (1 responses)

Sound good, though ultimately I'd like kernel devs to adopt Rust as their main Linux kernel development language. Beats the crap out of C and C++ combined.

C considered dangerous

Posted Sep 2, 2018 11:23 UTC (Sun) by xav (guest, #18536) [Link]

> There are a number of challenges in doing security development for the kernel, Cook said. There are cultural boundaries due to conservatism within the kernel community;

C considered dangerous

Posted Aug 30, 2018 4:44 UTC (Thu) by quotemstr (subscriber, #45331) [Link] (14 responses)

What you want is basically memcpy_s: https://docs.microsoft.com/en-us/cpp/c-runtime-library/re...

It's surprisingly useful. I wish C11 Annex K (the standard that describes the *_s functions) had taken off outside the Windows world. The annex K functions have much nicer properties than the BSD-style "safe" ones (like strlcpy). In particular...

> I'm not sure another argument to memcpy would help; unlike with strcpy, you have the number of bytes that would be copied anyway

Sometimes, you use only part of a larger buffer. Passing in the buffer size can catch situations in which your partial-buffer size accidentally grows larger than the buffer.

> Also, chances seem slim that the caller will do something secure in the case where memcpy stops early

You're right: there's nothing secure that can be done generically in this situation. That's why memcpy_s just aborts. In the kernel, that would be a panic. A panic is just a DoS and is much better than an escalation of privilege attack.

C considered dangerous

Posted Aug 30, 2018 16:48 UTC (Thu) by epa (subscriber, #39769) [Link] (11 responses)

They look useful, but...

If the source and destination overlap, the behavior of memcpy_s is undefined.

How hard would it have been to fix that sharp edge at the same time as the other things? It could just be defined as equivalent to memmove_s() in that case. Hopefully a future version of the standard will tidy that up.

C considered dangerous

Posted Aug 30, 2018 21:49 UTC (Thu) by zlynx (guest, #2285) [Link] (10 responses)

Only people using x86 compatible Core2 or later CPUs think this is a good idea.

memcpy and memmove are *NOT IDENTICAL*.

That extra comparison to determine copy direction costs time and branch prediction failure. The only reason everyone seems to think it's free is that common CPU types now run ahead and prime the branch prediction.

I had evidence from oprofile in 2005 that showed memmove was most definitely slower than memcpy. That was on Pentium 3 and 4. Pipeline bubbles on P4 were painful.

Just like all the other sharp edges in C like pointer aliasing, programmers need to *read the documentation* and *know what they're doing* when using memcpy. Having it blow up their program is entirely fair.

C considered dangerous

Posted Aug 31, 2018 6:07 UTC (Fri) by epa (subscriber, #39769) [Link] (1 responses)

Having it blow up the program would be fine. But the standard says the behaviour is undefined. That allows for silent memory corruption or indeed anything else. If I wanted something marginally faster but which would magnify the effect of programmer error, I would use plain memcpy(). The whole point of the _s variants is to add extra safety, even if it does cost a couple of extra instructions.

C considered dangerous

Posted Aug 31, 2018 6:40 UTC (Fri) by iabervon (subscriber, #722) [Link]

It would probably be good enough to define memcpy_s() to write to only locations in the destination, and write values that are in the corresponding point in the source either before the operation or afterwards, independently for each location. If you don't realize the buffers overlap, and you intend to keep using the source buffer, memmove doesn't save you anyway. It's worth allowing memcpy_s to have the obvious performance optimization without also allowing the compiler to infer that the buffers don't overlap and start doing really unexpected things.

C considered dangerous

Posted Sep 4, 2018 5:19 UTC (Tue) by ncm (guest, #165) [Link] (7 responses)

Generally nowadays memmove is *faster* than memcpy, by a remarkably wide margin for smallish N.

Complicating the interface for this and similar functions amounts to piling deck chairs on the Titanic into an obstacle course.

It is more than a little stupid to write a new program in C. All the typical pitfalls of pointer manipulations just don't arise in C++. C++ does still have integer overflow UB, as does Rust, and anyway unsigned integers wrapping around is nowhere near so benign as many like to believe.

There would be no problem with C++ in the Linux kernel if not for mindless bigotry, with many fewer opportunities for silly-bugger mistakes, and both faster and shorter code. It will happen eventually. It might happen first in FreeBSD, as the blinkered ancients there are closer to retirement.

C considered dangerous

Posted Sep 4, 2018 6:46 UTC (Tue) by quotemstr (subscriber, #45331) [Link] (5 responses)

> Generally nowadays memmove is *faster* than memcpy, by a remarkably wide margin for smallish N.

That's absurd nonsense. You must be talking about different implementations of memcpy and memmove. so it's not an apples-to-apples comparison. You can always implement memcpy by taking memmove and baking in the memory-direction branches, which means that no matter how efficient your memmove, you can generate an even _more_ efficient memcpy out of it.

I can imagine memmove and memcpy being equally fast, but memmove being faster, by doing more work? Something is clearly wrong with the comparison.

memmove faster than memcpy for small N

Posted Sep 5, 2018 23:03 UTC (Wed) by jreiser (subscriber, #11027) [Link] (4 responses)

The key is "for small N". The glibc implementation of memcpy favors speed for non-small N, and has several checks and conditional branches before doing much "useful work". The penalty for a mis-predicted conditional branch is often about 12 cycles. The implementation of memmove has fewer initial checks (mostly for overlap or not), and starts doing "useful work" sooner. For small N, the fewer mis-predicted conditional branches enable faster memmove.

memmove faster than memcpy for small N

Posted Sep 5, 2018 23:45 UTC (Wed) by quotemstr (subscriber, #45331) [Link] (3 responses)

That makes no sense. A memcpy could turn those branches into unconditional jumps and avoid any mispredict penalties.

memmove faster than memcpy for small N

Posted Sep 6, 2018 3:42 UTC (Thu) by jreiser (subscriber, #11027) [Link] (2 responses)

Code, please. I'll bet a beer that your code will be slower for small N (or incorrect.)

memmove faster than memcpy for small N

Posted Sep 6, 2018 6:55 UTC (Thu) by liw (subscriber, #6379) [Link]

I think both sides need to provide code and benchmarks to settle this question. Hypotheticals and claims based on mental models only go far. Even arguing based on code is insuffident, given how complex an issue performance analysis can be.

I tried writing a benchmark. GCC optimised it away.

memmove faster than memcpy for small N

Posted Sep 6, 2018 7:14 UTC (Thu) by quotemstr (subscriber, #45331) [Link]

No, you write the code. I'm not the one making the absurd claim. A program is never made slower by converting always-taken conditionals into jumps.

C considered dangerous

Posted Sep 21, 2018 18:14 UTC (Fri) by mathstuf (subscriber, #69389) [Link]

> C++ does still have integer overflow UB, as does Rust

No, Rust defines integer overflow as twos-complement (though in debug builds it will panic). You can use an always-panicking addition however, but that's not the default.

C considered dangerous

Posted Sep 1, 2018 16:12 UTC (Sat) by rurban (guest, #96594) [Link] (1 responses)

> there's nothing secure that can be done generically in this situation. That's why memcpy_s just aborts. In the kernel, that would be a panic. A panic is just a DoS and is much better than an escalation of privilege attack.

Wrong. memcpy_s does not abort, it returns an error code. With the safeclib (which can be used with the kernel) it returns ESNOSPC: https://github.com/rurban/safeclib/blob/master/src/mem/me...

The kernel code can then decide what to do. It doesn't need to be a panic.

Regarding overlap: memcpy_s detects EOVERFLOW and ESOVRLP. The slow variant memmove_s does allow overlapping regions, but is much slower. People demand a fast memcpy. The safeclib with clang5+ even has a faster memcpy_s than the optimized glibc assembler variants. clang can do these optimizations much better than the handwritten assembler. gcc not so.

C considered dangerous

Posted Sep 1, 2018 17:31 UTC (Sat) by quotemstr (subscriber, #45331) [Link]

> Wrong. memcpy_s does not abort, it returns an error code.

Wrong. Annex K functions call the global error handler on constraint violation, which almost always aborts, at least in any sane implementation. Yes, you can set it to do something else, and if the error handler returns, memcpy_s will return an error code. I would not recommend using the functions this way.

C considered dangerous

Posted Aug 30, 2018 6:59 UTC (Thu) by anton (subscriber, #25547) [Link] (10 responses)

I'm not sure another argument to memcpy would help; unlike with strcpy, you have the number of bytes that would be copied anyway.

Yes. So now the programmer has two numbers, and only one parameter. What will he do? Will he write a long sequence that takes both numbers into account? Or will he just call memcpy() with the source length, because he knows that the destination buffer is bug enough? Maybe he is right in his knowledge, and maybe that knowledge does not become wrong during maintenance. But would you bet your program's security on that?

Better give him a library function with two length parameters, and an appropriate reaction (probably reporting an error) if the destination buffer is too small.

C considered dangerous

Posted Aug 30, 2018 7:47 UTC (Thu) by rschroev (subscriber, #4164) [Link]

> ... because he knows that the destination buffer is bug enough

Well played, sir :)

C considered dangerous

Posted Aug 31, 2018 20:46 UTC (Fri) by rweikusat2 (subscriber, #117920) [Link] (8 responses)

I'm sorry but this is nonsense. memcpy copies a fixed number of bytes from a source buffer to a destination buffer. This means it's up to other code to ensure that the destination buffer is large enough. Considering that the number of bytes which will be copied is necessarily known, that's easy enough to do. This is a code correctness issue and the bug (buffer to small) doesn't go away because memcpy compares some random numbers.

I mean, your fictional programmer who can't be trusted to get something as simple as that right suddenly can be trusted to always pass another parameter reflecting the actual size of the buffer correctly? How so? Why shouldn't he just pass (size_t)-1 because he knows "the buffer will be big enough", IOW, doesn't care?

C considered dangerous

Posted Sep 1, 2018 14:04 UTC (Sat) by epa (subscriber, #39769) [Link] (7 responses)

Yes, a sloppy and careless programmer will write dangerous code with memcpy_s() or any other routine. And the elite-level programmer will use memcpy() and do so safely, because he or she has carefully tracked the sizes of all buffers and not made any mistakes. It’s the average schmoes in the middle, who do our best but are fallible, who can be helped by an interface that takes some redundant information.

C considered dangerous

Posted Sep 3, 2018 15:49 UTC (Mon) by rweikusat2 (subscriber, #117920) [Link] (6 responses)

All humans are fallible (and all computer are prone to mechanical defects:-) but that's besides the point.

The interesting question is "Does one gain anything tangible by passing a destination buffer size to a block memory copy routine which already takes a number-of-bytes-to-copy parameter" and the answer is no: The application programmer still has to ensure that the buffer whose address is passed is large enough so that the intended number of bytes can be copied into it. Ie, the only change is that the programmer now has to track two lengths instead of one. Assuming that tracking one length correctly is already considered too difficult for people to do reliably, how can the situation possibly be improved by introducing another length value application programmers have to worry about?

C considered dangerous

Posted Sep 5, 2018 8:56 UTC (Wed) by anton (subscriber, #25547) [Link] (5 responses)

Let's take a look at an example:

char buf[BUFLEN];

And we want to copy len bytes from src to buf, and we know that len<=BUFLEN. Doing the copy with

memcpy(buf,src,len);

would be correct, as would be

memcpy_s(buf,BUFLEN,src,len);

Now, during maintenance, there are changes that can cause len>BUFLEN. Now the first variant has a buffer overflow, while the second does not (but, as written, does not report the error). Tracking BUFLEN and len is not the problem here; it's tracking whether the assumption implied in the memcpy-using code is true.

The tangible benefit that one gets with the memcpy_s variant is that we do not get a buffer overflow even if there is a mistake in tracking whether the assumption is true. True, we do not get that benefit if we write the wrong value as the second parameter of memcpy_s, but that's a less likely mistake in this case, and probably in many others.

You can make the assumption explicit by writing

assert(len<BUFLEN);

before the memcpy, but then you have to "track two lengths" again, and in addition, assertion checking can be disabled, or the length in the memcpy can be changed without changing the assertion.

C considered dangerous

Posted Sep 5, 2018 9:08 UTC (Wed) by johill (subscriber, #25196) [Link] (3 responses)

To be fair, in your example the size of the buffer could also be changed without changing BUFLEN (which is your objection to the assert), so you should write

memcpy_s(buf,sizeof(buf),src,len);

in this case, I guess.

C considered dangerous

Posted Sep 5, 2018 9:23 UTC (Wed) by anton (subscriber, #25547) [Link]

Yes, that's even better. I was actually thinking about the maintenance programmer changing the len parameter of memcpy (to, say, len+a-b), and forgetting to change the assertion; there are often stale comments, so I would not be surprised about stale assertions (unless the tests cause them to trigger).

C considered dangerous

Posted Sep 5, 2018 10:12 UTC (Wed) by excors (subscriber, #95769) [Link]

Just be careful if someone might move that into a separate function like "void do_copy(char buf[BUFLEN], ...) { memcpy_s(buf, sizeof(buf), ...); }", and either ignores compiler warnings or is using an older compiler, since sizeof(buf) is now 8 regardless of BUFLEN. C is fun. Better to use a decent ARRAY_SIZE macro (like the one in the Linux kernel) that fails to compile if the argument is not an array.

C considered dangerous

Posted Sep 5, 2018 10:21 UTC (Wed) by pizza (subscriber, #46) [Link]

Unless, of course, 'buf' is just a raw pointer -- then sizeof(buf) won't give you what you want.

(Personally, most uses of 'memcpy' in my code involve assembling chunks of data into a buffer. Nearly all of the memcpy()s' destinations are at nonzero offsets of the original buffer, so memcpy_s() really doesn't do me any good...)

C considered dangerous

Posted Sep 5, 2018 20:12 UTC (Wed) by rweikusat2 (subscriber, #117920) [Link]

Well, for the sake of the contrived example, let's assume that "during maintenance",

memcpy_s(buf, BUFLEN, src, len)

is changed to

memcpy_s(src, len, buf, BUFLEN)

now, there's no buffer overflow but likely a very serious memory corruption problem.

There's exactly no reason for this assumption except that's it's another way to break the working code but there wasn't any reason for your assumption, either.