
C considered dangerous


By Jake Edge
August 29, 2018
LSS NA

At the North America edition of the 2018 Linux Security Summit (LSS NA), which was held in late August in Vancouver, Canada, Kees Cook gave a presentation on some of the dangers that come with programs written in C. In particular, of course, the Linux kernel is mostly written in C, which means that the security of our systems rests on a somewhat dangerous foundation. But there are things that can be done to help firm things up by "Making C Less Dangerous" as the title of his talk suggested.

He began with a brief summary of the work that he and others are doing as part of the Kernel Self Protection Project (KSPP). The goal of the project is to get kernel protections merged into the mainline. These protections are not targeted at protecting user-space processes from other (possibly rogue) processes, but are, instead, focused on protecting the kernel from user-space code. There are around 12 organizations and ten individuals working on roughly 20 different technologies as part of the KSPP, he said. The progress has been "slow and steady", he said, which is how he thinks it should go.

[Kees Cook]

One of the main problems is that C is treated mostly like a fancy assembler. The kernel developers do this because they want the kernel to be as fast and as small as possible. There are other reasons, too, such as the need to do architecture-specific tasks that lack a C API (e.g. setting up page tables, switching to 64-bit mode).

But there is lots of undefined behavior in C. This "operational baggage" can lead to various problems. In addition, C has a weak standard library with multiple utility functions that have various pitfalls. In C, the content of uninitialized automatic variables is undefined, but in the machine code that it gets translated to, the value is whatever happened to be in that memory location before. In C, a function pointer can be called even if the type of the pointer does not match the type of the function being called—assembly doesn't care, it just jumps to a location, he said.

The APIs in the standard library are also bad in many cases. He asked: why is there no argument to memcpy() to specify the maximum destination length? He noted a recent blog post from Raph Levien entitled "With Undefined Behavior, Anything is Possible". That obviously resonated with Cook, as he pointed out his T-shirt—with the title and artwork from the post.
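A minimal sketch of the API gap Cook points out (the struct and function names here are invented for illustration): memcpy() takes only a count of bytes to copy, and nothing ties that count to the capacity of the destination.

```c
#include <string.h>

/* Hypothetical message structure for illustration. */
struct packet {
	char header[8];
	char payload[16];
};

void fill_payload(struct packet *p, const char *src, size_t src_len)
{
	/* If src_len exceeds sizeof(p->payload), this silently writes
	 * past the end of the struct; the API has no parameter that
	 * would let it notice. */
	memcpy(p->payload, src, src_len);
}
```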

Less danger

He then moved on to some things that kernel developers can do (and are doing) to get away from some of the dangers of C. He began with variable-length arrays (VLAs), which can be used to overflow the stack to access data outside of its region. Even if the stack has a guard page, VLAs can be used to jump past it to write into other memory, which can then be used by some other kind of attack. The C language is "perfectly fine with this". It is easy to find uses of VLAs with the -Wvla flag, however.
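A small sketch of the difference (function names invented for illustration): the first version's stack usage is controlled by the caller-supplied length, which `-Wvla` will flag; the second uses a fixed-size array with an explicit bound check.

```c
#include <stddef.h>
#include <string.h>

#define BUF_MAX 64

/* Dangerous: stack allocation is sized by the caller-supplied n. */
size_t sum_vla(const unsigned char *src, size_t n)
{
	unsigned char buf[n];		/* VLA -- flagged by -Wvla */
	size_t i, s = 0;

	memcpy(buf, src, n);
	for (i = 0; i < n; i++)
		s += buf[i];
	return s;
}

/* Safer: fixed-size array plus an explicit bound check. */
size_t sum_fixed(const unsigned char *src, size_t n)
{
	unsigned char buf[BUF_MAX];
	size_t i, s = 0;

	if (n > sizeof(buf))
		return 0;		/* or report an error */
	memcpy(buf, src, n);
	for (i = 0; i < n; i++)
		s += buf[i];
	return s;
}
```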

But it turns out that VLAs are not just bad from a security perspective; they are also slow. In a micro-benchmark associated with a patch removing a VLA, a 13% performance boost came from using a fixed-size array. He dug in a bit further and found that much more code is generated to handle a VLA, which explains the performance difference. Linus Torvalds has declared that VLAs should be removed from the kernel because they cause security problems and also slow the kernel down; so, Cook said, "don't use VLAs".

Another problem area is switch statements, in particular where there is no break for a case. That could mean that the programmer expects and wants to fall through to the next case or it could be that the break was simply forgotten. There is a way to get a warning from the compiler for fall-throughs, but there needs to be a way to mark those that are truly meant to be that way. A special fall-through "statement" in the form of a comment is what has been agreed on within the static-analysis community. He and others have been going through each of the places where there is no break to add these comments (or a break); they have "found a lot of bugs this way", he said.
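The ambiguity looks like this in practice (a hypothetical example; the comment annotation is the convention recognized by GCC's -Wimplicit-fallthrough):

```c
int classify(int c)
{
	int score = 0;

	switch (c) {
	case 'a':
		score += 1;
		/* fall through */	/* deliberate: marked for the compiler */
	case 'b':
		score += 10;
		break;
	case 'c':
		score += 100;
		break;			/* without a mark or a break above,
					 * intent is unknowable */
	default:
		break;
	}
	return score;
}
```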

Uninitialized local variables will generate a warning, but not if the variable is passed in by reference. There are some GCC plugins that will automatically initialize these variables, but there are also patches for both GCC and Clang to provide a compiler option to do so. Neither of those is upstream yet, but Torvalds has praised the effort so the kernel would likely use the option. An interesting side effect that came about while investigating this was a warning he got about unreachable code when he enabled the auto-initialization. There were two variables declared just after a switch (and outside of any case), where they would never be reached.
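The pass-by-reference blind spot can be sketched as follows (the helper is hypothetical): taking the variable's address silences the uninitialized-use warning even when the callee may never store to it.

```c
#include <stdbool.h>

/* Hypothetical helper that only sets *out on success. */
bool lookup(int key, int *out)
{
	if (key == 42) {
		*out = 7;
		return true;
	}
	return false;		/* *out left untouched */
}

int use_lookup(int key)
{
	int value;		/* uninitialized... */

	lookup(key, &value);	/* ...but &value silences -Wuninitialized */
	return value;		/* on failure, returns stack garbage */
}

int use_lookup_fixed(int key)
{
	int value = 0;		/* what auto-initialization would provide */

	if (!lookup(key, &value))
		return -1;
	return value;
}
```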

Arithmetic overflow is another undefined behavior in C that can cause various problems. GCC can check for signed overflow, which performs well (the overhead is in the noise, he said), but adding warning messages for it does grow the kernel by 6%; making the overflow abort, instead, only adds 0.1%. Clang can check for both signed and unsigned overflow; signed overflow is undefined, while unsigned overflow is defined, but often unexpected. Marking places where unsigned overflow is expected is needed; it would be nice to get those annotations put into the kernel, Cook said.
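As one concrete form of such checking, allocation-size arithmetic can be guarded with the GCC/Clang overflow builtins, which the kernel's check_mul_overflow()-style helpers wrap (the function name here is invented for illustration):

```c
#include <stddef.h>
#include <stdlib.h>

/* Compute n * elem_size with an explicit overflow check before
 * allocating; __builtin_mul_overflow is a GCC/Clang builtin. */
void *alloc_array(size_t n, size_t elem_size)
{
	size_t bytes;

	if (__builtin_mul_overflow(n, elem_size, &bytes))
		return NULL;	/* the multiplication would wrap */
	return malloc(bytes);
}
```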

Explicit bounds checking is expensive. Doing it for copy_{to,from}_user() is a less than 1% performance hit, but adding it to the strcpy() and memcpy() families is around a 2% hit. Pre-Meltdown, that would have been a totally impossible performance regression for security, he said; post-Meltdown, since it is less than 5%, maybe there is a chance to add this checking.

Better APIs would help as well. He pointed to the evolution of strcpy(), through strncpy() and strlcpy() (each with their own bounds flaws) to strscpy(), which seems to be "OK so far". He also mentioned memcpy() again as a poor API with respect to bounds checking.
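A sketch of strscpy()-style semantics, under the assumption that the key properties are: always NUL-terminate, never read past the bound, and report truncation as an error (the kernel's real strscpy() returns -E2BIG; this simplified version returns -1):

```c
#include <stddef.h>

/* Copy src into dst (capacity dst_size). Returns the number of
 * characters copied, or -1 if src did not fit. dst is always
 * NUL-terminated when dst_size > 0. */
long safe_strcpy(char *dst, const char *src, size_t dst_size)
{
	size_t i;

	if (dst_size == 0)
		return -1;
	for (i = 0; i < dst_size - 1 && src[i] != '\0'; i++)
		dst[i] = src[i];
	dst[i] = '\0';
	return src[i] == '\0' ? (long)i : -1;
}
```

By contrast, strncpy() may leave the destination unterminated, and strlcpy() returns the source length, which requires reading past the destination bound.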

Hardware support for bounds checking is available in the application data integrity (ADI) feature for SPARC and is coming for Arm; it may also be available for Intel processors at some point. These all use a form of "memory tagging", where allocations get a tag that is stored in the high-order byte of the address. An offset from the address can be checked by the hardware to see if it still falls within the allocated region based on the tag.
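The tagging idea can be sketched in software (purely illustrative; ADI and Arm's memory tagging perform the equivalent check in hardware, and the exact bit layout differs by architecture):

```c
#include <stdbool.h>
#include <stdint.h>

/* Keep a tag in the (otherwise unused) high-order byte of a
 * 64-bit pointer value. */
#define TAG_SHIFT 56

uintptr_t tag_pointer(uintptr_t addr, uint8_t tag)
{
	return (addr & ~((uintptr_t)0xff << TAG_SHIFT)) |
	       ((uintptr_t)tag << TAG_SHIFT);
}

/* An access is valid only if the pointer's tag matches the tag
 * assigned to the allocation it points into. */
bool tag_matches(uintptr_t tagged_ptr, uint8_t expected_tag)
{
	return (uint8_t)(tagged_ptr >> TAG_SHIFT) == expected_tag;
}
```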

Control-flow integrity (CFI) has become more of an issue lately because much of what attackers had used in the past has been marked as "no execute" so they are turning to using existing code "gadgets" already present in the kernel by hijacking existing indirect function calls. In C, you can just call pointers without regard to the type as it just treats them as an address to jump to. Clang has a CFI-sanitize feature that enforces the function prototype to restrict the calls that can be made. It is done at runtime and is not perfect, in part because there are lots of functions in the kernel that take one unsigned long parameter and return an unsigned long.
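A coarse software analogue of the forward-edge idea (hypothetical names; Clang's real CFI instruments every indirect call and checks the target against its prototype class, rather than hand-checking identities like this):

```c
typedef long (*handler_fn)(long);

long add_one(long x)   { return x + 1; }
long times_two(long x) { return x * 2; }	/* stands in for a "gadget" */

/* C itself will happily jump through any pointer value; a CFI-style
 * check validates the target before making the indirect call. */
long dispatch(handler_fn fp, long arg)
{
	if (fp != add_one)	/* only the expected target is allowed */
		return -1;
	return fp(arg);
}
```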

Attacks on CFI have both a "forward edge", which is what CFI sanitize tries to handle, and a "backward edge" that comes from manipulating the stack values, the return address in particular. Clang has two methods available to prevent the stack manipulation. The first is the "safe stack", which puts various important items (e.g. "safe" variables, register spills, and the return address) on a separate stack. Alternatively, the "shadow stack" feature creates a separate stack just for return addresses.

One problem with these other stacks is that they are still writable, so if an attacker can find them in memory, they can still perform their attacks. Hardware-based protections, like Intel's Control-Flow Enforcement Technology (CET), provide a read-only shadow call stack for return addresses. Another hardware protection is pointer authentication for Arm, which adds a kind of encrypted tag to the return address that can be verified before it is used.

Status and challenges

Cook then went through the current status of handling these different problems in the kernel. VLAs are almost completely gone, he said, just a few remain in the crypto subsystem; he hopes those VLAs will be gone by 4.20 (or whatever the number of the next kernel release turns out to be). Once that happens, he plans to turn on -Wvla for the kernel build so that none creep back in.

There has been steady progress made on marking fall-through cases in switch statements. Only 745 remain to be handled of the 2311 that existed when this work started; each one requires scrutiny to determine the author's intent. Auto-initialized local variables can be done using compiler plugins, but that is "not quite what we want", he said. More compiler support would be helpful there. For arithmetic overflow, it would be nice to see GCC get support for the unsigned case; memory allocations are now doing explicit overflow checking.

Bounds checking has seen some "crying about performance hits", so we are waiting impatiently for hardware support, he said. CFI forward-edge protection needs link-time optimization (LTO) support for Clang in the kernel, but it is currently working on Android. For backward-edge mitigation, the Clang shadow call stack is working on Android, but we are impatiently waiting for hardware support for that too.

There are a number of challenges in doing security development for the kernel, Cook said. There are cultural boundaries due to conservatism within the kernel community; that requires patiently working and reworking features in order to get them upstream. There are, of course, technical challenges because of the complexity of security changes; those kinds of problems can be solved. There are also resource limitations in terms of developers, testers, reviewers, and so on. KSPP and the other kernel security developers are still making that "slow but steady" progress.

Cook's slides [PDF] are available for interested readers; before long, there should be a video available of the talk as well.

[I would like to thank LWN's travel sponsor, the Linux Foundation, for travel assistance to attend the Linux Security Summit in Vancouver.]


(Log in to post comments)

C considered dangerous

Posted Aug 29, 2018 21:46 UTC (Wed) by boudewijn (subscriber, #14185) [Link]

Well, I read Levien's blog post, and agreed, and got the t-shirt. Not for me, but for my wife; I don't wear t-shirts. But it was a very good read, and a good design. I wonder, was it Levien's own?

C considered dangerous

Posted Aug 30, 2018 16:06 UTC (Thu) by raph (subscriber, #326) [Link]

I art-directed it. The actual execution of the image was by "dbeast32," as credited on my blog.

C considered dangerous

Posted Aug 31, 2018 18:24 UTC (Fri) by khim (subscriber, #9252) [Link]

Indeed, it's an excellent read, but that phrase about "this can’t be done with a simple model where pointers are just numbers. The best way to understand the actual C standard is that programs run on an exotic, complex machine" just totally surprised me.

I mean: look at this exotic, complex machine. Does it look familiar to you? Well, it should; there are a billion or so of its descendants, and you have probably used one of them at some point. How does it illustrate the aliasing features of C? Very simply: the 8088 does not deal with floating-point numbers; the 8087 does. And, more importantly, in those early years that separate piece of silicone used DMA to write the results of its operations - completely asynchronously from the operations executed on the main CPU. Which means that if you write a float into memory and then read an int from that same piece of memory, you could get the old value, the new value, or, theoretically, even some mix of the two (although that last case never happened on the original IBM PC). Heck, it's easy to imagine a simple modification of that scheme where an attempt to access the same memory from both the 8088 and the 8087 would just crash the whole system... or corrupt memory somewhere else.

Keep that real world example of aliasing violation disaster in mind when you read specs... and suddenly they stop being exotic and complex and become easy to see and understand! No need for tagging and virtual machines, just an old piece of silicone which is very simple (by today's standards).

C considered dangerous

Posted Aug 31, 2018 22:48 UTC (Fri) by zlynx (subscriber, #2285) [Link]

Small nitpick and a pet peeve: CPUs are built with silicon. Breast enhancements are built with silicone.

C considered dangerous

Posted Aug 29, 2018 22:44 UTC (Wed) by iabervon (subscriber, #722) [Link]

I'm not sure another argument to memcpy would help; unlike with strcpy, you have the number of bytes that would be copied anyway. The common cases for getting it wrong are (a) you copy something into a buffer that you think is big enough, but is actually not the size you expected, in which case you'd just pass the wrong length and (b) you copy something into a buffer that's big enough, but you're not putting it at the beginning, in which case you'd probably use the total length instead of the remaining length.

Also, chances seem slim that the caller will do something secure in the case where memcpy stops early, given that they didn't think to compare lengths and do something particular in that situation.

I could see having an "end of allocation" value that you get from your allocator and pass around along with your pointers, which has the advantage that getting it wrong is likely to be extremely obvious, it's hard to synthesize it yourself (which you might get wrong), and you don't need to adjust it (ditto).

C considered dangerous

Posted Aug 30, 2018 1:12 UTC (Thu) by balkanboy (subscriber, #94926) [Link]

Sounds good, though ultimately I'd like kernel devs to adopt Rust as their main Linux kernel development language. Beats the crap out of C and C++ combined.

C considered dangerous

Posted Sep 2, 2018 11:23 UTC (Sun) by xav (subscriber, #18536) [Link]

> There are a number of challenges in doing security development for the kernel, Cook said. There are cultural boundaries due to conservatism within the kernel community;

C considered dangerous

Posted Aug 30, 2018 4:44 UTC (Thu) by quotemstr (subscriber, #45331) [Link]

What you want is basically memcpy_s: https://docs.microsoft.com/en-us/cpp/c-runtime-library/re...

It's surprisingly useful. I wish C11 Annex K (the standard that describes the *_s functions) had taken off outside the Windows world. The annex K functions have much nicer properties than the BSD-style "safe" ones (like strlcpy). In particular...

> I'm not sure another argument to memcpy would help; unlike with strcpy, you have the number of bytes that would be copied anyway

Sometimes, you use only part of a larger buffer. Passing in the buffer size can catch situations in which your partial-buffer size accidentally grows larger than the buffer.

> Also, chances seem slim that the caller will do something secure in the case where memcpy stops early

You're right: there's nothing secure that can be done generically in this situation. That's why memcpy_s just aborts. In the kernel, that would be a panic. A panic is just a DoS and is much better than an escalation of privilege attack.

C considered dangerous

Posted Aug 30, 2018 16:48 UTC (Thu) by epa (subscriber, #39769) [Link]

They look useful, but...

> If the source and destination overlap, the behavior of memcpy_s is undefined.

How hard would it have been to fix that sharp edge at the same time as the other things? It could just be defined as equivalent to memmove_s() in that case. Hopefully a future version of the standard will tidy that up.

C considered dangerous

Posted Aug 30, 2018 21:49 UTC (Thu) by zlynx (subscriber, #2285) [Link]

Only people using x86 compatible Core2 or later CPUs think this is a good idea.

memcpy and memmove are *NOT IDENTICAL*.

That extra comparison to determine copy direction costs time and branch prediction failure. The only reason everyone seems to think it's free is that common CPU types now run ahead and prime the branch prediction.

I had evidence from oprofile in 2005 that showed memmove was most definitely slower than memcpy. That was on Pentium 3 and 4. Pipeline bubbles on P4 were painful.

Just like all the other sharp edges in C like pointer aliasing, programmers need to *read the documentation* and *know what they're doing* when using memcpy. Having it blow up their program is entirely fair.

C considered dangerous

Posted Aug 31, 2018 6:07 UTC (Fri) by epa (subscriber, #39769) [Link]

Having it blow up the program would be fine. But the standard says the behaviour is undefined. That allows for silent memory corruption or indeed anything else. If I wanted something marginally faster but which would magnify the effect of programmer error, I would use plain memcpy(). The whole point of the _s variants is to add extra safety, even if it does cost a couple of extra instructions.

C considered dangerous

Posted Aug 31, 2018 6:40 UTC (Fri) by iabervon (subscriber, #722) [Link]

It would probably be good enough to define memcpy_s() to write to only locations in the destination, and write values that are in the corresponding point in the source either before the operation or afterwards, independently for each location. If you don't realize the buffers overlap, and you intend to keep using the source buffer, memmove doesn't save you anyway. It's worth allowing memcpy_s to have the obvious performance optimization without also allowing the compiler to infer that the buffers don't overlap and start doing really unexpected things.

C considered dangerous

Posted Sep 1, 2018 16:12 UTC (Sat) by rurban (guest, #96594) [Link]

> there's nothing secure that can be done generically in this situation. That's why memcpy_s just aborts. In the kernel, that would be a panic. A panic is just a DoS and is much better than an escalation of privilege attack.

Wrong. memcpy_s does not abort, it returns an error code. With the safeclib (which can be used with the kernel) it returns ESNOSPC: https://github.com/rurban/safeclib/blob/master/src/mem/me...

The kernel code can then decide what to do. It doesn't need to be a panic.

Regarding overlap: memcpy_s detects EOVERFLOW and ESOVRLP. The slow variant memmove_s does allow overlapping regions, but is much slower. People demand a fast memcpy. The safeclib with clang5+ even has a faster memcpy_s than the optimized glibc assembler variants. clang can do these optimizations much better than the handwritten assembler. gcc not so.

C considered dangerous

Posted Sep 1, 2018 17:31 UTC (Sat) by quotemstr (subscriber, #45331) [Link]

> Wrong. memcpy_s does not abort, it returns an error code.

Wrong. Annex K functions call the global error handler on constraint violation, which almost always aborts, at least in any sane implementation. Yes, you can set it to do something else, and if the error handler returns, memcpy_s will return an error code. I would not recommend using the functions this way.

C considered dangerous

Posted Aug 30, 2018 6:59 UTC (Thu) by anton (subscriber, #25547) [Link]

I'm not sure another argument to memcpy would help; unlike with strcpy, you have the number of bytes that would be copied anyway.
Yes. So now the programmer has two numbers, and only one parameter. What will he do? Will he write a long sequence that takes both numbers into account? Or will he just call memcpy() with the source length, because he knows that the destination buffer is bug enough? Maybe he is right in his knowledge, and maybe that knowledge does not become wrong during maintenance. But would you bet your program's security on that?

Better give him a library function with two length parameters, and an appropriate reaction (probably reporting an error) if the destination buffer is too small.
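Such an interface might look like this (a sketch; the name and error convention are invented for illustration):

```c
#include <string.h>

/* Copy src_len bytes into dst only if the destination capacity
 * (dst_size) allows it; otherwise report an error instead of
 * corrupting memory. */
int bounded_memcpy(void *dst, size_t dst_size,
		   const void *src, size_t src_len)
{
	if (src_len > dst_size)
		return -1;	/* the appropriate reaction: report it */
	memcpy(dst, src, src_len);
	return 0;
}
```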

C considered dangerous

Posted Aug 30, 2018 7:47 UTC (Thu) by rschroev (subscriber, #4164) [Link]

> ... because he knows that the destination buffer is bug enough
Well played, sir :)

C considered dangerous

Posted Aug 31, 2018 20:46 UTC (Fri) by rweikusat2 (subscriber, #117920) [Link]

I'm sorry, but this is nonsense. memcpy copies a fixed number of bytes from a source buffer to a destination buffer. This means it's up to other code to ensure that the destination buffer is large enough. Considering that the number of bytes which will be copied is necessarily known, that's easy enough to do. This is a code-correctness issue, and the bug (buffer too small) doesn't go away because memcpy compares some random numbers.

I mean, your fictional programmer who can't be trusted to get something as simple as that right suddenly can be trusted to always pass another parameter reflecting the actual size of the buffer correctly? How so? Why shouldn't he just pass (size_t)-1 because he knows "the buffer will be big enough", IOW, doesn't care?

C considered dangerous

Posted Sep 1, 2018 14:04 UTC (Sat) by epa (subscriber, #39769) [Link]

Yes, a sloppy and careless programmer will write dangerous code with memcpy_s() or any other routine. And the elite-level programmer will use memcpy() and do so safely, because he or she has carefully tracked the sizes of all buffers and not made any mistakes. It’s the average schmoes in the middle, who do our best but are fallible, who can be helped by an interface that takes some redundant information.

C considered dangerous

Posted Sep 3, 2018 15:49 UTC (Mon) by rweikusat2 (subscriber, #117920) [Link]

All humans are fallible (and all computers are prone to mechanical defects :-) but that's beside the point.

The interesting question is "Does one gain anything tangible by passing a destination buffer size to a block memory copy routine which already takes a number-of-bytes-to-copy parameter" and the answer is no: The application programmer still has to ensure that the buffer whose address is passed is large enough so that the intended number of bytes can be copied into it. Ie, the only change is that the programmer now has to track two lengths instead of one. Assuming that tracking one length correctly is already considered too difficult for people to do reliably, how can the situation possibly be improved by introducing another length value application programmers have to worry about?

C considered dangerous

Posted Aug 30, 2018 3:27 UTC (Thu) by nevets (subscriber, #11875) [Link]

I'm curious how shadow stacks will affect the function graph tracer and kretprobes. Both rely on modifying the return address of the function in order to inject a trampoline to trace the function return.

C considered dangerous

Posted Aug 30, 2018 7:14 UTC (Thu) by marcH (subscriber, #57642) [Link]

> Uninitialized local variables will generate a warning, but not if the variable is passed in by reference.

BTW do the compiler options of the kernel still reject C99 variable declarations and if yes why?
https://stackoverflow.com/questions/288441/variable-decla...

To more easily gauge stack sizes maybe?

C considered dangerous

Posted Aug 30, 2018 9:52 UTC (Thu) by mina86 (subscriber, #68442) [Link]

> do the compiler options of the kernel still reject C99 variable declarations

[citation needed] for compiler options *ever* rejecting C99 variable declarations. And if they used to they definitely no longer do, just ‘git grep 'for (int '’ to find a few examples.

It’s true that predominant style is to declare variables at the start of a function (I also like sorting declarations from the ones that are the longest ;) ).

C considered dangerous

Posted Aug 30, 2018 10:13 UTC (Thu) by lkundrak (subscriber, #43452) [Link]

> I also like sorting declarations from the ones that are the longest

I recently learned that this is called a "Reverse Christmas Tree"

C considered dangerous

Posted Aug 30, 2018 11:06 UTC (Thu) by jani (subscriber, #74547) [Link]

> And if they used to they definitely no longer do, just ‘git grep 'for (int '’ to find a few examples.

I occasionally do git grep popularity contests across the kernel tree to get a feeling whether the use of some function or construct or style is generally accepted or needs caution. Your suggested git grep leads to 28 matches, of which only 3 are actual kernel code and the rest are tools or scripts. Of about 83k for loops in the kernel tree. Conclusion, don't use it.

Variable declarations inline in code are harder to grep, but the short gut feeling answer is, don't use them either. The declaration in the for loop is more likely to be accepted I think.

> It’s true that predominant style is to declare variables at the start of a function

Not just at the start of a function. Declarations at the start of a block are widely used.

Uniform style serves a purpose too, as it makes the code faster and easier for human readers to grasp.

C considered dangerous

Posted Aug 30, 2018 12:30 UTC (Thu) by error27 (subscriber, #8346) [Link]

In the kernel we use -Wdeclaration-after-statement. It causes a compile error. I introduced one just to show what the warning looks like:

drivers/video/fbdev/sm501fb.c: In function ‘sm501fb_start’:
drivers/video/fbdev/sm501fb.c:1613:2: error: ‘for’ loop initial declarations are only allowed in C99 or C11 mode
for (int k = 0; k < (256 * 3); k++)
^~~

It's an aesthetic preference. https://lkml.org/lkml/2005/12/13/223

One thing that I do is I have "foo->bar" and I want to find out what type it is, so I highlight "bar", then I scroll to the top of the function where "foo" is declared, and then I use cscope to jump to the type definition. So it's nice if you kind of know all the variables are declared at the start of the function.

C considered dangerous

Posted Aug 30, 2018 12:34 UTC (Thu) by mina86 (subscriber, #68442) [Link]

> In the kernel we use -Wdeclaration-after-statement.

I stand corrected.

C considered dangerous

Posted Aug 30, 2018 13:43 UTC (Thu) by excors (subscriber, #95769) [Link]

> One thing that I do is I have "foo->bar" and I want to find out what type it is, so i highlight "bar" then I scroll to the top of the function where "foo" is declared and then I use cscope to jump to the type definition.

That sounds like a consequence of not using an IDE that has a decent go-to-declaration feature (which would work for local variables and would properly understand scopes, #includes, macros, etc). And you probably wouldn't even need to look at the original type definition, you could just put the mouse over 'bar' to get a tooltip showing its type, or press the autocomplete key after 'foo->' to see all the members and their types, or use one of several other features.

So it's not only about aesthetic preferences of coding style, it's also about personal choice of editing tool that makes certain styles harder to work with. Since people get very attached to their tools, that seems an impossible conflict to resolve.

C considered dangerous

Posted Aug 30, 2018 14:37 UTC (Thu) by error27 (subscriber, #8346) [Link]

Hm... It turns out that Vim can jump to the declaration. I feel dumb now.

I'm not actually too tied to Vim though and I really don't like cscope. I'm not eager to use KDevelop. What other IDEs are people using on Debian?

C considered dangerous

Posted Aug 31, 2018 7:11 UTC (Fri) by cpitrat (subscriber, #116459) [Link]

Well there's emacs, which supports it too.

Finding the declaration of a variable in Emacs

Posted Sep 3, 2018 22:08 UTC (Mon) by giraffedata (subscriber, #1954) [Link]

Well there's emacs, which supports it too.

Do tell. How do I point to a local variable use and have Emacs show me where its declaration is?

I know about Etags - But I don't think this is among its functions.

Searching LKML?

Posted Aug 30, 2018 17:39 UTC (Thu) by marcH (subscriber, #57642) [Link]

> It's an aesthetic preference. https://lkml.org/lkml/2005/12/13/223

Thanks for remembering this and digging out that sub-thread buried in another massive and totally unrelated thread, impressive feat. Now because lkml.org just went down right while I was browsing it, I went and found some alternative locations:

https://marc.info/?l=linux-kernel&m=113449080512291
https://groups.google.com/d/msg/linux.kernel/e0W-NpF0Kr0/... (click on "Topic options->Tree view" below the Subject: to help with the thread dilution issue)

I was once again amazed at how time-consuming it can be to search LKML archive(s) *even when* I already had the complete message information from lkml.org still cached in my browser window!

Eventually I found that (my?) Google groups' default search option was sorting by date which is useless: https://groups.google.com/forum/#!searchin/linux.kernel/%...
Simply asking to "Sort by relevance" fixes it: https://groups.google.com/forum/#!searchin/linux.kernel/%...

Any additional "lkml search education" welcome (and sorry for being off-topic)

C considered dangerous

Posted Aug 30, 2018 17:58 UTC (Thu) by marcH (subscriber, #57642) [Link]

> It's an aesthetic preference. https://lkml.org/lkml/2005/12/13/223

Well, I'm disappointed: I expected some technical reason and not just a pseudo-"readability" choice. As pointed out there and/or in the stackoverflow discussion, if you can't easily find types in your function then you have a bigger problem because the function is too big in the first place.
In some cases having the type closer to where it's used will actually make the type more obvious.
Afraid people don't look at types often anyway - even less often than they should.

In any case this entire readability debate weighs little compared to uninitialized variables which compilers may not all catch + the noise of compilers reporting false positives. No modern language forces declarations at the top, it's a legacy. Safer languages even tend to force the opposite.

C considered dangerous

Posted Aug 30, 2018 12:26 UTC (Thu) by adobriyan (guest, #30858) [Link]

> do the compiler options of the kernel still reject C99 variable declarations and if yes why?

Unfortunately, yes:
error: 'for' loop initial declarations are only allowed in C99 or C11 mode

Kernel is -std=gnu89.

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/...

C considered dangerous

Posted Aug 30, 2018 13:38 UTC (Thu) by cyrus (subscriber, #36858) [Link]

Clang was frequently mentioned in this article. If I followed the discussions correctly, Clang can successfully compile the kernel for various ARM flavors. The Chromebook Pixel, for instance, is shipped with a Clang-compiled kernel. What's the situation for x86, though? Last I heard the kernel now requires asm-goto for x86 which Clang does not support. What's the status of asm-goto support for Clang? When can we expect that the kernel will "just compile" and the compiled kernel will "just run" on x86?

C considered dangerous

Posted Aug 30, 2018 14:33 UTC (Thu) by cesarb (subscriber, #6266) [Link]

The minimum gcc version to compile the kernel has just been raised to 4.6 on all architectures (https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/...), and clang still pretends to be gcc 4.2, so the kernel won't compile with clang anymore. It's not just a matter of the version number; the fallback code for compatibility with old gcc versions has been removed, and AFAIK clang doesn't implement all the new gcc features.

C considered dangerous

Posted Aug 30, 2018 17:16 UTC (Thu) by marcH (subscriber, #57642) [Link]

> The Chromebook Pixel, for instance, is shipped with a Clang-compiled kernel.

Much more than the Pixelbook, currently all devices running chromeos-4.4 and chromeos-4.14:
https://chromium.googlesource.com/chromiumos/overlays/chr...

That's a fairly large number of devices: https://www.chromium.org/chromium-os/developer-informatio...
(not the one expected but the "Year of the Linux desktop" has finally arrived :-)

Hint: check Kees' current employer.

> What's the situation for x86, though?

? Like most Chromebooks, the Pixelbook is x86.

> When can we expect that the kernel will "just compile" and the compiled kernel will "just run" on x86?

For reference:
https://chromium.googlesource.com/chromiumos/third_party/...
https://chromium.googlesource.com/chromiumos/third_party/...

C considered dangerous

Posted Aug 30, 2018 16:41 UTC (Thu) by smitty_one_each (subscriber, #28989) [Link]

> a function pointer can be called even if the type of the pointer does not match the type of the function being called—assembly doesn't care, it just jumps to a location

Is C a honey badger at heart?

C considered dangerous

Posted Aug 30, 2018 22:50 UTC (Thu) by marcH (subscriber, #57642) [Link]

> He noted a recent blog post from Raph Levien entitled "With Undefined Behavior, Anything is Possible".

tl;dr: for safety C is broken beyond repair. As a meta-assembler it was a "stroke of genius" but it just can't scale beyond that.

> https://raphlinus.github.io/programming/rust/2018/08/17/u...
> > If one’s interest is in safe, portable code – C can be a very fine choice. One must use it well, though [...] C’s sharp edges can be managed safely two ways, at least: One is through careful use of well-designed coding standards. Large program authors should make key architectural decisions very early on, define a safe, constrained, style – and have the team stick to that style.

In other words: either 1. your C project has a very expensive team of world-class rock stars, or 2. your software is full of bugs and security holes cut by C's sharp edges, as observed in the mainstream (!) news almost every week.

PS: C programmers shouldn't worry about their retirement; COBOL people can apparently still make a lot of money :-) https://www.npr.org/sections/money/2018/01/10/576879734/e...

Switch-case fall-through dangerous?

Posted Aug 31, 2018 9:19 UTC (Fri) by pr1268 (subscriber, #24648) [Link]

> where there is no break for a case. That could mean that the programmer expects and wants to fall through to the next case

Pity that this is considered a "dangerous" aspect of the C language; I have been known to exploit case fall-through in some of my own code, e.g. printing out a date based on an input string (think format strings as in date(1)):

case 'Y':
  /* print century 2-digits only */
  /* NO break; here! */
case 'y':
  /* print 2-digit year */
  break;

I would imagine this is slightly faster (and smaller code) than isolating each case separately, but perhaps I'm wrong here... Of course, I was (am) careful to document this verbosely with a comment.

Switch-case fall-through dangerous?

Posted Aug 31, 2018 12:02 UTC (Fri) by karkhaz (subscriber, #99844) [Link]

Certainly fall-through is useful, but the vast majority of case statements I've seen don't use it; this suggests that it should not be the default. With hindsight, breaking ought to have been the default, and there should be a keyword called `fall` that falls through explicitly.

The article mentions that the comment is used by "the static analysis community," can anyone elaborate on this? In particular, is there an external tool that goes through the code looking for (the absence of) these comments? If so, I wonder whether a compiler extension might be better than an external tool; instead of a comment, there could actually be a `fall` keyword, together with a -Wfall-thru compiler switch that emits a warning if it sees a case that has neither a `break` nor a `fall`. With any luck, this keyword might even be adopted into the standard if it's useful. It would basically be a no-op in terms of emitted machine code, but the compiler could see and warn about its absence.

Switch-case fall-through dangerous?

Posted Aug 31, 2018 12:26 UTC (Fri) by dezgeg (subscriber, #92243) [Link]

The compiler switch already exists for GCC 7; it is -Wimplicit-fallthrough. And it knows how to parse a comment like /* fall through */ as described.

The usage of comments as "keywords" to guide static analysis is a pretty old thing. The convention dates back at least to the original 'lint' tool for UNIX from 1978! See: http://files.cnblogs.com/files/bangerlee/10.1.1.56.1841.pdf

Switch-case fall-through dangerous?

Posted Aug 31, 2018 12:34 UTC (Fri) by karkhaz (subscriber, #99844) [Link]

Thanks for this information! Though the use of comments seems ugly, as if it would complicate the design of the compiler; compilers should be free to lex comments out without considering the possibility that they contain interpretable information. On the other hand, I suppose comments are better than nonstandard keywords, which would tie you to using only compilers that understand them.

Switch-case fall-through dangerous?

Posted Aug 31, 2018 19:00 UTC (Fri) by khim (subscriber, #9252) [Link]

Well, you could use C++, where [[fallthrough]] is a standard attribute instead.


Copyright © 2018, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds