Recent improvements in GCC diagnostics
Much of the talk was dedicated to improvements in the ASCII-art output created by the compiler's static analyzer. In the existing GCC 13 release, the compiler is able to quote source code, underline and label source ranges, and provide hints for improving the code. All of this output is created by a module called pretty-print.cc, which has a lot of nice capabilities but which is proving increasingly hard to extend. It does not create two-dimensional layouts well, is not good with non-ASCII text, and its colorization support falls short.
This module tries to use text to explain potential code problems found by the analyzer, and "sort-of succeeds", he said. But it lacks spatial information that would be helpful for developers. If the compiler is complaining about a potential out-of-bounds access, in which direction is the access going? Is it before or after the valid area, or perhaps overlapping with it? To illustrate this point, Malcolm showed this example (taken from his slides):
This output describes a potential buffer overflow and provides useful information, but it still may not be enough for the developer to visualize what is really going on. So GCC 14 adds a diagram:
More complex situations can be illustrated as well; see the slides for other examples. There will also be better diagrams for string operations; when possible, they show the actual string literal involved, and they handle UTF-8 strings.
All of these pictures are the result of a new text-art module that can do everything provided by pretty-print.cc and quite a bit more. It handles two-dimensional layouts and the full Unicode character set. It has support for color and other text attributes, including "blink" — though he requested that the audience not actually use that feature. It is "round-trippable", meaning that its output can be parsed back into a two-dimensional buffer; this feature will be useful for future diagrams, he said. As a demonstration of what text-art can do, he put up the output from the "most useless GCC plugin ever" — a chessboard.
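The out-of-bounds diagnostics described above concern writes such as buf[idx] = val where idx falls outside the array. What follows is a minimal illustrative sketch, not the example from Malcolm's slides; store_checked() is a hypothetical helper. It shows the guarded form. An unchecked write with an index that the analyzer can prove is out of range is the kind of code that GCC's -fanalyzer (through its -Wanalyzer-out-of-bounds warning) is designed to report, although exactly which cases are caught depends on the GCC version.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: store val at buf[idx] only when idx is inside the
   len-element array. An unchecked "buf[idx] = val;" with idx >= len is the
   kind of write the analyzer's out-of-bounds check reports. */
bool store_checked(int *buf, size_t len, size_t idx, int val)
{
    if (idx >= len)
        return false;   /* the write would land after the valid area */
    buf[idx] = val;
    return true;
}
```

Because idx is a size_t, a "negative" index arrives as a huge value and is rejected by the same comparison.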
There is, naturally, still work to be done. One project is a new #pragma operation to have GCC draw the in-memory layout of a structure so that developers can see how the individual fields will be packed. Another is to provide output in the SVG format, though he confided that he is not sure how useful that capability will be. "Crude prototypes" of both features exist, he said.
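The talk did not name the proposed #pragma, so as a stand-in, here is a sketch of how the same information can be obtained today with standard C's offsetof() and sizeof; struct sample, FIELD_SIZE, and print_sample_layout() are hypothetical. On x86-64 Linux, for instance, three bytes of padding follow tag so that value starts at offset 4, and the struct occupies 12 bytes in total.

```c
#include <stddef.h>
#include <stdio.h>

/* Size of a member without needing an object: sizeof does not evaluate
   its operand, so the null pointer is never dereferenced. */
#define FIELD_SIZE(type, member) sizeof(((type *)0)->member)

/* Hypothetical struct: 'value' usually needs padding in front of it. */
struct sample {
    char tag;
    int value;
    char flag;
};

/* Print each field's offset and size, then the total size including any
   trailing padding: the raw data a layout diagram would draw. */
void print_sample_layout(void)
{
    printf("tag:   offset %2zu, size %zu\n",
           offsetof(struct sample, tag), FIELD_SIZE(struct sample, tag));
    printf("value: offset %2zu, size %zu\n",
           offsetof(struct sample, value), FIELD_SIZE(struct sample, value));
    printf("flag:  offset %2zu, size %zu\n",
           offsetof(struct sample, flag), FIELD_SIZE(struct sample, flag));
    printf("total: %zu bytes\n", sizeof(struct sample));
}
```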
Moving on to the GCC static analyzer, Malcolm talked about some new features for analyzing C string operations. He implemented a new warning for operations that might be passed an unterminated string, but then took it back out and created a more flexible module that is able to scan for an expected null byte. It can, for example, check format strings for proper null termination, and is able to detect uninitialized bytes in strings as well.
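The analyzer performs this scan symbolically at compile time. As a rough runtime analogue (a hypothetical helper, not analyzer code), the question "does this fixed-size buffer contain a terminator?" can be answered with memchr(), which, unlike strlen(), never reads beyond the size it is given.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: report whether the first size bytes of buf include
   a NUL, i.e. whether buf holds a properly terminated string. memchr()
   stops after size bytes, whereas strlen() on an unterminated buffer would
   read past its end. */
bool has_terminator(const char *buf, size_t size)
{
    return memchr(buf, '\0', size) != NULL;
}
```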
He has added an understanding of the semantics of a number of standard string functions — strcat(), strcpy(), strlen(), and the like. The analyzer is now able to detect operations that will overrun a string buffer, though it only works with fixed-size strings at the moment. More advanced analysis is in the works for the future. There is also a check for overlapping strings passed to strcat(); he said that he wanted to use the restrict keyword to indicate where such checks make sense, but "nobody really understands what restrict does". So, for now, the checker just looks for overlaps in situations where that is not allowed.
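As a rough illustration of the size arithmetic involved in checking strcat() on a fixed-size buffer, here is a sketch of a bounds-checked append; append_checked() is a hypothetical helper, not a GCC or libc function. Since C99, strcat() declares both of its pointer parameters restrict, which is how the standard expresses that the two strings must not overlap; the helper inherits the same requirement because it copies with memcpy().

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Append src to the NUL-terminated string in dst, whose buffer holds
   dst_size bytes. Refuse, rather than overrun, when the result plus its
   terminator would not fit. dst and src must not overlap (memcpy rule). */
bool append_checked(char *dst, size_t dst_size, const char *src)
{
    size_t dst_len = strlen(dst);
    size_t src_len = strlen(src);

    /* Need dst_len + src_len + 1 <= dst_size, written to avoid overflow. */
    if (dst_len >= dst_size || src_len >= dst_size - dst_len)
        return false;
    memcpy(dst + dst_len, src, src_len + 1);   /* copies the NUL too */
    return true;
}
```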
Future plans, he said, include implementing a new function attribute to indicate the need for a null-terminated string as a parameter. The visualizations for the diagnostics produced by the analyzer can always use improvement. He would also like to add an understanding of the semantics of more standard-library functions so that their usage can be checked.
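A sketch of how such an attribute might be used, assuming the spelling null_terminated_string_arg(N) that GCC 14 introduced for this purpose, where N is the 1-based index of the parameter (check your compiler's documentation). The __has_attribute guard keeps the code building with compilers that lack the attribute; count_spaces() and the NUL_TERMINATED_ARG macro are hypothetical.

```c
#include <stddef.h>

/* Expand to the attribute when the compiler knows it (assumed: GCC 14 or
   later); otherwise expand to nothing so the code still compiles. */
#if defined(__has_attribute)
#  if __has_attribute(null_terminated_string_arg)
#    define NUL_TERMINATED_ARG(n) __attribute__((null_terminated_string_arg(n)))
#  endif
#endif
#ifndef NUL_TERMINATED_ARG
#  define NUL_TERMINATED_ARG(n)
#endif

/* Hypothetical function: parameter 1 must point to a NUL-terminated string. */
size_t count_spaces(const char *s) NUL_TERMINATED_ARG(1);

size_t count_spaces(const char *s)
{
    size_t n = 0;
    for (; *s != '\0'; s++)
        if (*s == ' ')
            n++;
    return n;
}
```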
The analyzer currently only works with C code; adding the ability to handle C++ is a desired feature. Basic support for C++ does exist now, but it is unsupported: "don't use it". The biggest problem, he said, is that it has no concept of exceptions and is badly confused by code using them, but there are a number of other problems as well. A Google Summer of Code student (Benjamin Priour) has been working on C++, focusing only on the no-exceptions case for now. The goal is to be able to use the analyzer on GCC itself (GCC has moved to C++, but does not use exceptions). A test suite has been added, and much of the analyzer code has been reworked so that it can handle either language. The handling of the C++ new keyword has been improved. There is still a lot to be done, though.
Another project Priour has worked on, also with regard to C++, is improving output when an error is detected deep within a nested set of system header files. In such cases, a simple mistake can generate pages of output. A new compiler option, the concisely named -fno-analyzer-show-events-in-system-headers, makes all that output go away.
Despite these improvements, Malcolm said, an attempt to use the analyzer with non-trivial C++ code "will still emit nonsense".
Within the analyzer code itself, a new integration-testing suite has been established. Every analyzer patch is tested by building a whole set of projects, including coreutils, Doom, Git, the kernel, QEMU, and several others. The warnings emitted are captured and compared against a baseline to look for regressions (or improvements). The analyzer is now able to use the alloc_size function attribute to check accesses to objects returned by functions that carry that attribute. Another feature that might make it into the GCC 14 release is a warning for potential infinite loops. This check is not ready yet; it generates false positives and runs in O(n²) time, neither of which is ideal.
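For reference, alloc_size is an existing GCC function attribute: alloc_size(1) says that the returned object is as large as the function's first argument. Below is a minimal sketch with a hypothetical zero-filling allocator, zalloc(). With the attribute in place, an access such as p[16] on a 16-byte allocation is the sort of thing the analyzer can relate back to the allocation size.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical zero-filling allocator. The GCC attributes state that the
   result is a fresh pointer aliasing nothing else (malloc) and that it
   points to an object of as many bytes as argument 1 (alloc_size(1)). */
void *zalloc(size_t n) __attribute__((malloc, alloc_size(1)));

void *zalloc(size_t n)
{
    void *p = malloc(n);
    if (p != NULL)
        memset(p, 0, n);
    return p;
}
```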
Malcolm concluded with a longer-term goal: improving the handling of errors related to C++ templates. A simple typo in the wrong place can end up generating pages of useless error information. There are various groups trying to figure out what information is actually useful in such situations. The real problem, he said, is that the compiler is still stuck in the 1970s and the batch-mode interaction style that was established then. For more complex errors there really needs to be a more interactive way for developers to explore the situation.
[Thanks to the Linux Foundation, LWN's travel sponsor, for supporting my travel to this event.]
Index entries for this article
Conference: GNU Tools Cauldron/2023
Posted Oct 13, 2023 15:55 UTC (Fri)
by vadim (subscriber, #35271)
Why, at this point in time, is ASCII art still a thing? First, UTF-8 supports much nicer output. Second, isn't it about time we just had proper graphical output in the terminal? Some terminals, like kitty, even support it already, but the functionality is sadly rarely used.
Posted Oct 13, 2023 16:19 UTC (Fri)
by dave_malcolm (subscriber, #15013)
FWIW I have a crude implementation of SVG output for this in one of my working copies, which *might* make it into GCC 14, but it's not quite clear to me where the SVG images would go.
What I'd really like is if a terminal supported a "disclosure widget": the ability to hierarchically wrap parts of the output in a way that the user can interactively drill down into the output. That could help a lot with the C++ template issue.
Posted Oct 13, 2023 16:33 UTC (Fri)
by dave_malcolm (subscriber, #15013)
FWIW, if you want to experiment with the new diagnostics, the slides are all screenshots from Compiler Explorer; an example visualization of a buffer overflow can be seen at:
Posted Oct 13, 2023 18:24 UTC (Fri)
by ermo (subscriber, #86690)
This might potentially make a difference to someone finding themselves in the unfortunate situation of not having access to anything but said linux console...
Posted Oct 13, 2023 23:37 UTC (Fri)
by WolfWings (subscriber, #56790)
As soon as you're SSHing into a box or have a full GUI, sticking to CP437 is entirely detrimental and serves no purpose. Expecting actual UTF-8 support, especially when asking to draw diagrams, is a fair request at this point IMHO, even bash prompts in many distros use them now and it's pretty much required for internationalization support of foreign languages.
Posted Oct 14, 2023 15:25 UTC (Sat)
by ermo (subscriber, #86690)
Having at least a fallback for a TERM=linux set of glyphs might be useful here.
Posted Oct 14, 2023 17:49 UTC (Sat)
by ballombe (subscriber, #9523)
This is my situation everyday I work. I do all my development on the Linux console with vim.
Posted Oct 13, 2023 20:22 UTC (Fri)
by iabervon (subscriber, #722)
For similar sorts of operation, I've found that it's really easy and common to lose the console output as I start to try to understand the issue it's reporting. It can also be annoying when it gives a helpful explanation of the code, but only if it finds something wrong.
Maybe it would be useful to have an option of producing the analysis as a standalone HTML document and turning warnings and errors in the console output into short summaries with links? That could also include enough machine-readable information that an IDE displaying GCC's analysis can connect annotated text to the place where the user is editing the code as well as to the right identifiers in the IDE's refactoring tools.
Posted Oct 16, 2023 9:44 UTC (Mon)
by Tobu (subscriber, #24111)
Posted Oct 14, 2023 0:20 UTC (Sat)
by Per_Bothner (subscriber, #7375)
DomTerm has escape sequences for "fold (show/hide) buttons" along with delimiting sections of the output they apply to. DomTerm also supports "dynamic pretty-printing" (in the Lisp sense): you can add escape sequences to indicate logical grouping and optional line-breaks. Both folding and pretty-printing work on existing (old) output - even if you resize the window width after the application emitting the output has finished.
Here is a "dynamic screenshot": it's an html log of the output combined with some JavaScript that implements folding and pretty-printing. Try clicking the triangles and/or re-sizing the window. You probably want much of the output to initially be in the hidden state, until the "show" button is clicked.
I'm currently working on changes to xterm.js (used by vscode, Jupyter, and other projects) to enable this kind of "non-traditional" functionality, possibly using "addons". (Currently, the DomTerm application can use the xterm.js terminal emulator, but without most of the DomTerm extensions. I'm hoping that switching to xterm.js long-term will allow for wider dissemination of this kind of functionality that goes beyond the traditional terminal.)
Posted Oct 15, 2023 18:19 UTC (Sun)
by njs (subscriber, #40338)
Posted Oct 13, 2023 16:19 UTC (Fri)
by pbonzini (subscriber, #60935)
Posted Oct 14, 2023 14:28 UTC (Sat)
by jengelh (guest, #33263)
The usual deterrent is when you are trying to align text in some form. strlen("\x{200b}") = ?, strlen("\x{00e9}") = ?, strlen("\x{0065}\x{0301}") = ?. Though I've written strlen, the issue has nothing to do with any particular programming language or runtime; it applies everywhere. And something like printf("%-16s", "anything with Unicode") will break if it is not using e.g. ICU to look up the visual character widths. People don't want to deal with ICU or ncurses or anything like it; they would rather work on their program, so they just emit ASCII and call it a day.
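To make the strlen() puzzle concrete: in UTF-8, U+200B (zero-width space) is 3 bytes, precomposed "é" (U+00E9) is 2 bytes, and "e" followed by U+0301 (combining acute accent) is 3 bytes. The minimal sketch below (hypothetical helpers) compares byte counts with code-point counts. Neither number is the on-screen width, which is what printf("%-16s") padding would actually need.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Count code points in a NUL-terminated, valid UTF-8 string by counting
   every byte that is not a continuation byte (10xxxxxx). */
size_t utf8_codepoints(const char *s)
{
    size_t n = 0;
    for (; *s != '\0'; s++)
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;
    return n;
}

/* Print the byte length (what strlen() and "%-16s" padding count) next
   to the code-point count. */
void show_lengths(const char *label, const char *s)
{
    printf("%-24s %zu bytes, %zu code point(s)\n",
           label, strlen(s), utf8_codepoints(s));
}
```

For example, show_lengths("U+200B", "\xe2\x80\x8b") reports 3 bytes and 1 code point for a character that occupies zero columns, while "e\xcc\x81" reports 3 bytes and 2 code points for a single column.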
Posted Oct 14, 2023 15:54 UTC (Sat)
by wtarreau (subscriber, #51152)
Posted Oct 14, 2023 16:33 UTC (Sat)
by mb (subscriber, #50428)
And that is exactly why UTF-8 text is broken on your machine.
>you constantly get your terminal mangled with invisible bytes that break code sequences,
That used to happen all the time back in the bad old days where everybody configured some other iso-xxx encoding for their machine and application.
ASCII or any other country-specific encoding is only usable if you only have texts from that one country (US-American texts, in ASCII's case) on your system. As soon as you receive text from somebody else, it immediately breaks.
Posted Oct 14, 2023 20:21 UTC (Sat)
by dave_malcolm (subscriber, #15013)
It's possible to select pure ASCII with -fdiagnostics-text-art-charset=ascii; here's the example I posted earlier, but specifying ASCII output. If there are some worthwhile heuristics for sniffing the terminal connection to affect the default, that might be worth considering; we already have some logic for deciding whether to emit SGR codes for embedding URLs so maybe we should do similar for the text-art character set?
Posted Oct 15, 2023 5:26 UTC (Sun)
by wtarreau (subscriber, #51152)
Posted Oct 16, 2023 23:12 UTC (Mon)
by dave_malcolm (subscriber, #15013)
Posted Oct 16, 2023 23:56 UTC (Mon)
by ABCD (subscriber, #53650)
Shouldn't you also be looking at the LC_CTYPE and LC_ALL variables if you are looking at LANG (since LC_* overrides LANG, and LC_ALL overrides everything)? Additionally, I believe that LANG=POSIX and LANG=C are expected to behave identically. Looking further into this, it appears that perhaps the best answer would be to do something like this: Another option might be to test the charset for UTF-8 explicitly, instead of assuming that anything that isn't ANSI_X3.4-1968 can support the line-drawing characters.
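A minimal sketch of the explicit UTF-8 test suggested above, assuming a POSIX system with setlocale() and nl_langinfo(CODESET); it is not the code from the comment, and the helper names are hypothetical. pick_ctype_locale() only mirrors the LC_ALL, then LC_CTYPE, then LANG precedence for illustration, since setlocale(LC_CTYPE, "") already applies that order itself. On glibc, the plain C/POSIX locale reports the codeset "ANSI_X3.4-1968", which this check treats as not UTF-8.

```c
#include <langinfo.h>
#include <locale.h>
#include <stdbool.h>
#include <stddef.h>

/* Mirror POSIX precedence for the LC_CTYPE category: a non-empty LC_ALL
   wins, then LC_CTYPE, then LANG; otherwise fall back to "C" (the usual
   implementation default). Empty strings count as unset. */
const char *pick_ctype_locale(const char *lc_all, const char *lc_ctype,
                              const char *lang)
{
    if (lc_all != NULL && lc_all[0] != '\0')
        return lc_all;
    if (lc_ctype != NULL && lc_ctype[0] != '\0')
        return lc_ctype;
    if (lang != NULL && lang[0] != '\0')
        return lang;
    return "C";
}

/* True if a codeset name means UTF-8. Accepts "UTF-8", "utf8" and similar
   spellings by ignoring ASCII case and '-' characters. */
bool codeset_is_utf8(const char *codeset)
{
    const char *want = "utf8";

    for (; *codeset != '\0'; codeset++) {
        char c = *codeset;
        if (c == '-')
            continue;
        if (c >= 'A' && c <= 'Z')
            c = (char)(c - 'A' + 'a');
        if (*want == '\0' || c != *want)
            return false;
        want++;
    }
    return *want == '\0';
}

/* Ask the C library which codeset the environment selects. Note that
   setlocale() changes the process-wide LC_CTYPE setting as a side effect. */
bool current_locale_is_utf8(void)
{
    if (setlocale(LC_CTYPE, "") == NULL)
        return false;
    return codeset_is_utf8(nl_langinfo(CODESET));
}
```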
Posted Oct 15, 2023 23:17 UTC (Sun)
by ermo (subscriber, #86690)
The only suggestion I have is that you might want to specifically test TERM=linux in a Linux virtual console with a LANG=en_US.UTF-8 locale enabled and a few different fonts (e.g. latarcyrheb-(size) and Terminus ter-v(size)n), to ensure that the conservative -fdiagnostics-text-art-charset=unicode option works as you intend in that scenario.
I believe you can see for yourself which box-art characters are enabled ootb in a linux virtual console in UTF-8 mode for a given console font by invoking `showconsolefont` (part of the kbd package). The outcome may surprise you, and not necessarily in a good way.
Thanks again for engaging. I look forward to being able to take advantage of this new functionality in the future.
Posted Oct 15, 2023 5:21 UTC (Sun)
by wtarreau (subscriber, #51152)
Yeah, that's what I had been told repeatedly. Due to this, last time I installed a new distro on my machine, I adopted it and rolled back one week later. Too much pain. All it takes is a program you're debugging that accidentally prints an 8-bit byte at the end of stdout to leave you with a garbled terminal. Ditto whenever you grep for something in any of the many text files you wrote in the last 30 years and it prints an accented character. This encoding is viral: it only works when 100% of the contents you work with already works, and it encourages you to convert all your data (including historic ones) and to reinstall all your systems all at once, otherwise you put garbage everywhere. I have way less trouble staying with ISO-8859 and occasionally switching to UTF-8 for the rare annoying applications that require it to display eye-candy stuff than doing the opposite!
Posted Oct 15, 2023 7:08 UTC (Sun)
by Cyberax (✭ supporter ✭, #52523)
You can just spend a couple of hours to set up everything to utf-8 _once_, and it'll keep working forever. Old files can be converted on the as-needed basis. And if they are pure ASCII, then no conversion is even necessary.
Posted Oct 16, 2023 6:32 UTC (Mon)
by wtarreau (subscriber, #51152)
Posted Oct 16, 2023 9:13 UTC (Mon)
by Cyberax (✭ supporter ✭, #52523)
(Also, one-byte encodings suck. I can tell you that as a survivor of KOI-8, CP-1251, CP-855, and the good old GOST standard encoding.)
Posted Oct 17, 2023 15:43 UTC (Tue)
by wtarreau (subscriber, #51152)
That's exactly why you don't have this problem in the first place.
Posted Oct 17, 2023 16:41 UTC (Tue)
by zdzichu (subscriber, #17118)
Posted Oct 17, 2023 19:16 UTC (Tue)
by Wol (subscriber, #4433)
The problem appears to be that wtarreau is looking after a LOT of boxes, of assorted ages, many of which predate universal unicode.
And which - for whatever reason - he does not have the ability, or authority, to upgrade.
Cue one unholy mess.
Cheers,
Posted Oct 18, 2023 14:46 UTC (Wed)
by wtarreau (subscriber, #51152)
Posted Oct 15, 2023 9:44 UTC (Sun)
by mpr22 (subscriber, #60784)
That terminal's UTF-8 mode is seriously defective. There's a well-established norm (print � – Unicode code point U+FFFD REPLACEMENT CHARACTER – and carry on) for how terminals should behave in that situation, and I would very much not describe the result as "garbling" the terminal.
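The "print U+FFFD and carry on" behavior can be sketched as a small UTF-8 decoder; utf8_decode_one() and count_replacements() are hypothetical helpers. This version emits one U+FFFD per offending byte and resumes at the next byte, which is one common policy (the Unicode Standard also describes a "maximal subpart" practice that can fold several bytes into one replacement). Either way, a stray Latin-1 byte at the end of the output costs one visible character rather than the rest of the screen.

```c
#include <stddef.h>
#include <stdint.h>

#define REPLACEMENT_CHAR 0xFFFDu

/* Decode one code point from s[0..len), len >= 1. On success, store it in
   *cp and return the number of bytes used. On a bad lead byte, a truncated
   or malformed sequence, an overlong form, a surrogate, or a value above
   U+10FFFF, store U+FFFD and consume a single byte so decoding resumes. */
size_t utf8_decode_one(const unsigned char *s, size_t len, uint32_t *cp)
{
    unsigned char b0 = s[0];
    size_t need;
    uint32_t min, c;

    if (b0 < 0x80) {                       /* plain ASCII */
        *cp = b0;
        return 1;
    } else if ((b0 & 0xE0) == 0xC0) {      /* 110xxxxx: 2-byte sequence */
        need = 2; min = 0x80; c = b0 & 0x1Fu;
    } else if ((b0 & 0xF0) == 0xE0) {      /* 1110xxxx: 3-byte sequence */
        need = 3; min = 0x800; c = b0 & 0x0Fu;
    } else if ((b0 & 0xF8) == 0xF0) {      /* 11110xxx: 4-byte sequence */
        need = 4; min = 0x10000; c = b0 & 0x07u;
    } else {                               /* stray continuation or invalid */
        *cp = REPLACEMENT_CHAR;
        return 1;
    }

    if (len < need) {                      /* sequence cut off */
        *cp = REPLACEMENT_CHAR;
        return 1;
    }
    for (size_t i = 1; i < need; i++) {
        if ((s[i] & 0xC0) != 0x80) {       /* expected 10xxxxxx */
            *cp = REPLACEMENT_CHAR;
            return 1;
        }
        c = (c << 6) | (s[i] & 0x3Fu);
    }
    if (c < min || c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF)) {
        *cp = REPLACEMENT_CHAR;
        return 1;
    }
    *cp = c;
    return need;
}

/* Count how many U+FFFD characters a terminal following this policy would
   display for the len bytes at s (a literal U+FFFD in the input counts too). */
size_t count_replacements(const char *s, size_t len)
{
    const unsigned char *p = (const unsigned char *)s;
    size_t i = 0, n = 0;
    uint32_t cp;

    while (i < len) {
        i += utf8_decode_one(p + i, len - i, &cp);
        if (cp == REPLACEMENT_CHAR)
            n++;
    }
    return n;
}
```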
Posted Oct 16, 2023 20:28 UTC (Mon)
by NYKevin (subscriber, #129325)
Quite possibly. But this also highlights a useful rule of thumb: Plain text usually isn't.
The vast majority of terminals and terminal emulators in actual use today do not render plain text. They render rich text, using in-band signalling with an ANSI standard set of escape codes, plus a huge variety of non-standard extensions. Those extensions are (poorly) managed by terminfo(5) and the TERM environment variable, which have been subjected to exactly the same problem as the browser User-Agent string (except with xterm instead of Mozilla/5.0). SSH is an especially bad pain point, because the *remote* host's terminfo is consulted rather than the local host's (meaning that you cannot synchronize the installation of a new terminal emulator with the installation of its terminfo files, unless you do a simultaneous installation on all machines everywhere that you might possibly want to log into). If I had to guess, I would suggest that this might have nothing whatsoever to do with text encoding, and everything to do with one of those terrible mechanisms malfunctioning in some ridiculous way.
I mean, either that, or it's a terminal from the 90's that still thinks "Unicode" means "UCS-2." But I would like to believe that wtarreau is competent enough to avoid using such a monstrosity after "adopting" UTF-8.
Posted Oct 16, 2023 20:39 UTC (Mon)
by Wol (subscriber, #4433)
Cheers,
Posted Oct 15, 2023 14:49 UTC (Sun)
by dvdeug (guest, #10998)
Which accented character? You might have been lucky enough to be using ISO-8859-1 since it came out 35 years ago, but just about anyone else might have problems with various Mac, DOS, and character sets supporting other languages. CJK languages all need more space than one codepage will supply.
And that's pretty idiosyncratic. Do you want to view changelogs on Debian? They're UTF-8 encoded, and they contain the names of Arab and Japanese developers, among others, in their original scripts. You can't trust that any text file that comes from anywhere will be encoded in ISO-8859-1.
> it only works when 100% of the contents you work with already works
One can make that complaint about just about any character set that's larger than 8-bit; even some large 8-bit character sets, like CP1252 and, worse, VISCII (which puts characters in C0 slots), will break stuff that expects ISO-8859-1. Character sets that leave the C0 and C1 ranges alone, use one byte per character, and have no combining characters work fairly well together, even if text in one may be illegible in another. But that's not feasible for many, and it can still leave people puzzling over which character set is supposed to be used to interpret the text.
Posted Oct 16, 2023 6:41 UTC (Mon)
by wtarreau (subscriber, #51152)
Yes, but this was already well known. All of us coming from the DOS world were used to seeing 1-for-1 replacement. I was even used to reading an "é" when it was written "Ä" on screen. The problem with UTF-8
> Do you want to view changelogs on Debian?
I don't, but there are far fewer problems reading UTF-8 on an ISO-8859 setup than the opposite: at worst I get a few characters I don't care about, and that's all. That is much better than invisible characters stuck in the middle of nowhere, the invisible non-breaking space that some people mistakenly type with Alt+Space and that breaks their command lines, RTL text that makes your cursor go wild when editing a line, etc.
Don't get me wrong, I do understand that some other languages need more bits to store their characters, I just don't like the huge abuse that's being made by replacing standard chars with new ones that don't bring any value, or even emojis (since when does a character need to contain colors other than the font's?).
Posted Oct 16, 2023 12:36 UTC (Mon)
by mathstuf (subscriber, #69389)
Since people want to be able to express themselves in ways that culture has made common. Unicode is way more descriptive than prescriptive and that's for the best IMNSHO. IRC had :) and whatnot. With more pixels available, people would obviously want to do more too. I'm not the greatest fan of emoji, but it is far better than slinging raw images around.
Posted Oct 16, 2023 14:32 UTC (Mon)
by dvdeug (guest, #10998)
Which is Unix's responsibility; had Microsoft had their way, we'd be using UTF-16.
> I do understand that some other languages need more bits to store their characters, I just don't like the huge abuse that's being made by ...
That's a cop-out. None of the complaints above have anything to do with emoji. They all have to do with the inevitable problems of having more bits and of supporting both right-to-left and left-to-right languages. There's nothing any solution could have done much better in that sense. Either we have a constant-length code of 16 or 32 bits, or we have a variable-length code like UTF-8, or we have a codepage-switching mechanism (all of those that have supported CJK have also been variable-length; a single-byte codepage-switching mechanism would be horribly inefficient for Chinese).
Posted Oct 16, 2023 15:38 UTC (Mon)
by rschroev (subscriber, #4164)
Even with the fixed-length UTF-32 there is the fact that glyphs are often composed of multiple code points.
None of this is the responsibility of Unix. It's just the consequence of the complexity of human language.
Posted Oct 15, 2023 20:53 UTC (Sun)
by atai (subscriber, #10977)
Posted Oct 16, 2023 8:45 UTC (Mon)
by geert (subscriber, #98403)
Oops, you forgot to upgrade to iso-8859-15 when trading in your FRF for EUR ;-)
Posted Oct 17, 2023 15:46 UTC (Tue)
by wtarreau (subscriber, #51152)
Posted Oct 19, 2023 11:13 UTC (Thu)
by jezuch (subscriber, #52988)
Except when dealing with American retailers 🤷♂️
Posted Oct 19, 2023 14:07 UTC (Thu)
by kleptog (subscriber, #1183)
Posted Oct 19, 2023 14:38 UTC (Thu)
by Wol (subscriber, #4433)
Cheers,
Posted Oct 17, 2023 7:31 UTC (Tue)
by spacefrogg (subscriber, #119608)
Second, using weird Unicode glyphs puts high demands on the fonts in use. People have needs, and those who don't go with the distribution's default font especially have good reasons not to. The more distinct glyphs your output uses, the fewer fonts you can reliably use in your terminal.
So, I am very happy to read ASCII art when it gets the point across. There is no shame in keeping simple things simple.
Posted Oct 13, 2023 19:42 UTC (Fri)
by eru (subscriber, #2753)
Posted Oct 13, 2023 20:49 UTC (Fri)
by khim (subscriber, #9252)
Wasn't that supposed to be fixed with concepts and C++23? I have no idea what actually happened, but if I remember correctly C++20 added concepts themselves to the language and C++23 was supposed to cover the standard library with enough concept-related markup to produce nice error messages.
Posted Oct 14, 2023 19:54 UTC (Sat)
by tialaramex (subscriber, #21167)
The big problem is that unlike C++ 0x Concepts [which were roughly Rust's traits, but as a C++ feature, and were voted back out of the standard before C++ 11 was published] these concepts aren't actually checked, they're just textual substitution again, like macros, like templates, like so many of the unmaintainable nightmares of C++. So the machine may be as clueless as you are about what the problem really is, and it's very hard to guess what's worth communicating.
For example, suppose I propose to sort some Geese. In Rust, the sort function isn't defined for Geese unless they are Ord, and Ord is a named trait for types which have total order. Is a Goose, in fact, Totally Ordered with respect to all others? If so, Goose impl Ord explains how that works; otherwise there's no sort function. So the compiler can say that I can't sort Geese because they aren't Ord, simple.
In C++ the sort function is defined for the concept std::totally_ordered_with<T,U> which basically comes down to "there seem to be some comparison operators defined for this type, so, YOLO". Maybe sorting my Geese works? Maybe the result is that my program is silently an ill-formed C++ program with no meaning? Maybe it crashes? Who knows.
We can watch this happen in real time, in Rust my misfortunate::OnewayGreater<T> for example is a wrapper which insists it's always the greatest, and nevertheless a correct Rust sort (including the ones provided by the standard library) will do... something. I mean they can't sort it, because the type is lying, sorted isn't a possible state, but we're guaranteed nothing crazy happens.
The equivalent type in C++ is likely to cause a crash, but honestly anything might happen, anything at all.
Against this backdrop, improving error messages is... fraught.
Posted Oct 14, 2023 21:20 UTC (Sat)
by khim (subscriber, #9252)
Sure, but situation is not different from how Sure, but it's not hard to write a sort that wouldn't work like that. Just use `unsafe` and you can create all kinds of issues, not too much dissimilar from C++. Which means that in Rust you can not use sort function except if you lie about properties of your type, but even if you do, sort function couldn't use that information, anyway. That's called “defensive programming” and, again, has nothing to do with traits, templates or concepts. No. When requirements of the concept are not satisfied the situation is exactly the same as with Rust, only C++ is a bit more flexible (in Rust a trait always has to be implemented explicitly, even if it's obvious that the type satisfies all the requirements, while in C++ it is satisfied implicitly). It's when a template uses something not defined in the concept that C++20 differs from Rust. I would argue that both languages are doing it wrong: But having used both I would say that I much prefer C++ templates and concepts. I don't know why you think templates are “unmaintainable nightmares of C++”: I have used both C++ templates and Rust proc macros and I would say that development and debugging of proc macros is much more error prone and problematic. Just, please, don't compare generics in Rust with C++ templates. They may be similar syntactically, but on the semantic level C++ templates have to be compared with Rust's proc macro system, not with generics.
Posted Oct 15, 2023 11:03 UTC (Sun)
by dvdeug (guest, #10998)
That's like saying 2+2 in Python is equivalent to BigInteger(2).plus(BigInteger(2)) in Java. I mean, in some sense it is (actually, something more powerful), but 99% of the time you don't need more than 2+2 in Java. Rust generics are designed to cover most of the cases that C++ templates are; if you're talking about those cases, templates and generics are the right comparison.
Posted Oct 15, 2023 14:08 UTC (Sun)
by khim (subscriber, #9252)
The problem is that “crazy and impenetrable error messages” that C++ is so famous for are not from these cases. And they can be covered by concepts just fine. It's heavy and convoluted TMP that is intrinsically linked with these. Where you can use Very good example. Yes, 99% of the time you don't care about limitations of Java integers. But when you start, specifically, talking about how convoluted and awkward Java's syntax is for user-defined types, then the fact that you can use overloaded operators to mix integers with bignums and user-defined types in Python, but not in Java, becomes important.
Posted Oct 15, 2023 12:05 UTC (Sun)
by tialaramex (subscriber, #21167)
In C++ specifically, the standard just says if a Concept has semantics, those are required, _but_ failing to meet the semantic requirements is Ill-Formed No Diagnostic Required, aka your program has no meaning and may do anything but you may not get compiler errors or warnings.
Yes, it's also true that C++ doesn't check your sort function to ensure that: having said with Concepts that it only requires comparisons it doesn't then try to add things together, or call a method named foo() on them -- but while that contributes further to the terrible diagnostics situation (because now sort may have *secret* requirements on top of the advertised restrictions and who can say whose fault that is when the program doesn't compile?) it's potentially possible for good libraries to get this right, whereas there's nothing to be done about core language YOLO design.
It doesn't really make sense to compare C++ templates to a proc macro. For a start only template meta-programming comes close to the point where a proc macro is necessary, for the sort of trivial templates ordinary C++ programmers are writing the generics actually are the equivalent functionality. Indeed take sort which we already discussed, in Rust sort is a generic function, it isn't a proc macro, and in C++ of course it's a bunch of templates.
My favourite Rust standard library function is generic and very short; you can't write the equivalent in C++, but if you could, it would necessarily be a template:
pub fn drop<T>(_x: T) { }
But also at the far end of the scale, proc macros are far more powerful. If C++ templates could install software doubtless by now there'd be C++ projects which just use cmake by installing it even if you don't want it. For a proc macro while installing software would be _incredibly_ rude it's nowhere close to the edge of what's possible.
Posted Oct 15, 2023 14:53 UTC (Sun)
by khim (subscriber, #9252)
Yes. But only these produce hard-to-penetrate error messages when concepts are properly used which, essentially, means that only these are interesting. And they are done with proc macro in Rust which leads to much harder to decipher error messages than TMP. Not really. In C++ you have a choice between flexibility and nice error messages. And can pick solution from the full range of possibilities. From simple In Rust you also have a choice but it's much more drastic: you either can use traits or macros and that decision can not be easily changed. When in C++ the appropriate course would just say “oh, yeah, we may want to deal with both integers and floats in that code so let's replace Seriously? Is it some kind of sick joke? Show me how to change Proc macro can do arbitrary things outside of Rust, that's true. But inside of Rust they are much more limited than C++ templates. Rust did many things right, but its metaprogramming capabilities are both harder to use and more limited than in C++. Which would have been acceptable if not for crazy attempts of some Rust zealots to portray C++ templates as some kind of failure. C++ did many things wrong and Rust did many things right, that's true, but specifically templates in C++ are both more powerful and easier to use than Rust's traits. Zig does even better on the “easy to use” front (but the flip-side is much worse “error messages” front). Rust went after simple error messages for simple cases and awful complexity for complex cases. C++ made the other choice. That's just simple objective fact; I don't understand why it is so hard to admit it. Maybe because if you would admit it then the main perceived problem of C++ templates (awful error messages) would disappear? And if the other, more acute problem, the monomorphisation bloat, would be accepted then the question of “why hasn't Rust done what Swift did?” would raise its ugly head?
Posted Oct 15, 2023 16:31 UTC (Sun)
by mb (subscriber, #50428)
What do you do with this argument count in a real program?
Posted Oct 15, 2023 17:29 UTC (Sun)
by khim (subscriber, #9252)
A real program would, of course, do more than just count arguments. It may, e.g., transparently and accurately log these parameters, or marshal them and execute code via RPC. But counting them is just a very simple, minimal and self-contained task. There are lots of uses for reflection capabilities in meta-programming; even that keynote that failed to be a keynote was mostly about these things. Again: it's OK to admit that the choice Rust made is more limiting than what C++ did, but produces better error messages. It may even be the right thing to do. I, personally, never had issues with TMP and its error messages (the biggest practical issues were limitations of MSVC), but people are different; some may value hand-holding more than expressivity. It's all about trade-offs, and for someone the fact that you can't write generic code in Rust as easily as you can in C++ or Zig may be a nice trade-off (because for the use cases where Rust generics work adequately they do provide better error messages). But all that talk about how C++ templates are awful because they are implicit and thus dangerous is a total red herring in my experience: as tialaramex himself noted, one may easily lie to the compiler in Rust, too (and this may lead to very dangerous consequences with traits like
Posted Oct 15, 2023 18:21 UTC (Sun)
by mb (subscriber, #50428)
I still don't see how counting the number of arguments of otherwise opaque function types helps here.
If you want to wrap an actual function call (or any other statement) and print out the result, then you can actually do that with macros: https://doc.rust-lang.org/std/macro.dbg.html
>Again: it's Ok to admit that choice that Rust made is more limiting that what C++ did
There's nothing to "admit".
There are many more things that are different or just outright missing in Rust. It's why it's called Rust and not C++.
>as tialaramex himself noted one may easily lie to the compiler in Rust, too (and this may lead to very
That is why these traits are unsafe to implement.
>and the fact that in Rust you have to use sort_by instead of sort
I don't get it. When is it not possible to use sort()?
Posted Oct 15, 2023 21:19 UTC (Sun)
by khim (subscriber, #9252)
Sure, but what about arguments? Shouldn't I be able to print them? You can't even print the number of arguments, let alone their values! You can play some tricks and extend `dbg` a tiny bit, but you cannot do things which in C++ are not just possible, but easy. A template `LogTiming` that receives a function as an argument and then returns another, wrapped, function that can be used in place of the original one is something developers with a C++ or Java background take for granted. But Rust can't do anything like that. A proc macro is limited to the list of tokens it receives and can't do anything about context; it doesn't have access to types, variables and so on.
When something is not implemented “by design” it doesn't become the subject of a keynote (even if, ultimately, a failed one). And again: classes are not supported because nobody knows how to do them safely (it's not clear whether they can even be done safely at all), and exceptions are, now, officially supported. In reality they were always supported; Rust just pretended that they don't work.
When you are dealing with types that don't have a total order. Like
But are you sure that someone who would just port PHP-style comparison to Rust would even think twice about how this comparison behaves?
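For readers who have not met the pattern khim describes, here is a minimal C++17 sketch of a `LogTiming`-style wrapper. The name `log_timing`, its report format, and the use of `std::cerr` are assumptions for illustration, not code from the thread: it takes a callable and returns a new callable with the same call syntax that forwards all arguments, then reports the argument count and elapsed time when the call finishes.

```cpp
#include <chrono>
#include <cstddef>
#include <iostream>
#include <string>
#include <utility>

// Hypothetical helper: wraps a (const-callable) `f` and returns a callable
// with the same call syntax that logs the argument count and call duration.
template <typename F>
auto log_timing(F f, std::string name) {
    return [f, name](auto&&... args) -> decltype(auto) {
        // RAII guard: the report is printed when the guard is destroyed,
        // i.e. after `f` returns, so this also works for void-returning `f`.
        struct Guard {
            const std::string& label;
            std::size_t arg_count;
            std::chrono::steady_clock::time_point start;
            ~Guard() {
                auto elapsed = std::chrono::duration_cast<std::chrono::microseconds>(
                    std::chrono::steady_clock::now() - start);
                std::cerr << label << ": " << arg_count << " argument(s), "
                          << elapsed.count() << " us\n";
            }
        } guard{name, sizeof...(args), std::chrono::steady_clock::now()};
        // Perfect-forward the arguments; decltype(auto) preserves the return type.
        return f(std::forward<decltype(args)>(args)...);
    };
}
```

Usage: `auto add = log_timing([](int a, int b) { return a + b; }, "add");` then `add(2, 3)` returns 5 and writes a line such as `add: 2 argument(s), 0 us` to stderr (the timing will vary). Because the wrapper is instantiated per call signature, `sizeof...(args)` is known at compile time, which is the kind of type-aware context a token-based proc macro does not have.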
Posted Oct 15, 2023 23:22 UTC (Sun)
by tialaramex (subscriber, #21167)
This means in Rust we can express the idea that when we succeed we'll exit immediately, since that's just a Try where our branch() turns success into ControlFlow::Break.
Do you think the ABI stability for C-unwind constitutes "official support" for exceptions? I believe you've badly misunderstood. With this ABI if you've got some C++ code X, which calls some Rust code Y, and then the Rust code calls some further C++ code Z, but Z throws, it's OK if X catches it. Rust will get out of the way, and technically this might be survivable. This does *not* provide for a C++ exception thrown by Z to be somehow "caught" by Rust in Y, nor for a Rust panic in Y to be "caught" as an exception in X.
As to sort_by(): yes, this is much clearer. Our hypothetical PHP-porter is confronted with the question of how to provide a function (or lambda) f(a: &f32, b: &f32) -> Ordering. Let's look at two likely choices:
1. They write their own, but Rust forces them to either decide what they meant or panic. Chances are they decide to panic for NaN and various other tricky cases. If the software never sees a NaN this works; if it does, they panic. Everything remains totally safe, unlike C++ where it's (say it with me) Ill-Formed, No Diagnostic Required, and so our program had no defined meaning even if this sort never occurred. [If you said "Undefined Behaviour" you're wrong: UB is a runtime occurrence, this is IFNDR, which is much worse and happens during compilation]
2. They find f32::total_cmp which matches exactly the desired signature. This function will cheerfully order every 32-bit floating point value. Does it do what you expected? Maybe. But it definitely puts them in some agreed order.
In C++ of course they do less work, and as a reward they get... undefined results. Very on brand.
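For comparison, the C++ side can be given a well-defined total order as well. The sketch below is a hand-written analogue of the bit trick behind Rust's `f32::total_cmp` (IEEE 754 totalOrder); `total_order_key` and `sort_total` are hypothetical names, and the code assumes `float` is a 32-bit IEEE 754 type. In C++20, `std::strong_order` applied to an IEC 559 floating-point type is specified to be consistent with the same totalOrder, so it is the standard-library alternative where C++20 is available.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <cstring>
#include <limits>
#include <vector>

// Hypothetical helper: maps a float to a 32-bit integer whose ordinary `<`
// matches IEEE 754 totalOrder: -NaN < -inf < ... < -0.0 < +0.0 < ... < +inf < +NaN.
// Assumes IEEE 754 binary32 floats and arithmetic `>>` on negative ints
// (guaranteed since C++20 and done by mainstream compilers before that).
inline std::int32_t total_order_key(float f) {
    static_assert(sizeof(float) == sizeof(std::int32_t), "expects a 32-bit float");
    std::int32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    // If the sign bit is set, flip the other 31 bits so larger magnitudes
    // become smaller integers; non-negative values are left unchanged.
    bits ^= static_cast<std::int32_t>(static_cast<std::uint32_t>(bits >> 31) >> 1);
    return bits;
}

// Sorts by total order, so NaN and signed zeros get a fixed place and the
// comparator is a valid strict weak ordering for std::sort.
inline void sort_total(std::vector<float>& v) {
    std::sort(v.begin(), v.end(), [](float a, float b) {
        return total_order_key(a) < total_order_key(b);
    });
}
```

As with `total_cmp`, whether this order is what the programmer meant is a separate question; it only guarantees that every value, including NaN, is placed consistently.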
Posted Oct 16, 2023 6:03 UTC (Mon)
by mb (subscriber, #50428)
I just think you are demanding a solution for a problem that doesn't exist in the real world.
Of course, you can also print the arguments, if you want that:
If you want to print the number of arguments, then just println!("2"); in this case.
>and exceptions are, now, officially supported
No, they aren't.
>But are you sure that someone who would just port PHP-style comparison to Rust would even think twice about how this comparison behaves?
Yes, I agree that PHP people certainly wouldn't think twice before doing things.
Posted Oct 15, 2023 22:50 UTC (Sun)
by tialaramex (subscriber, #21167)
Sure, for each kArgumentsCount template proceed as follows. First, we need a counter; let's call it k and set it initially to zero. Now, fork the compiler. It's fine, we're a proc macro so we're "allowed" to do that (obviously it's a terrible idea, but so was Template Meta-Programming and look where we are now...). Attempt to compile code which calls the function with k arguments of inferred type: if that won't compile we blow up the compiler with a chosen error code, otherwise we blow up with a different error code. The surviving compiler reaps the error code and either continues (forking again with k += 1) or knows the final value.
Alternatively, and I kinda like this approach, "just" do full-blown analysis as the runtime helpers do (via the Language Server Protocol) so that we can ask ourselves this question (or other reflective questions) and get the answer immediately. If we find that we woke up in a compiler which lacks our retro-fitted analysis feature, that's fine: emit such an analysis, link it to the compiler and replace the running compiler with our improved synthetic one in the ordinary way.
Is either of these a good idea? No. But, you see, nor is the kArgumentsCount template, and yet...
Posted Oct 17, 2023 11:33 UTC (Tue)
by jwakely (subscriber, #60262)
You do not remember correctly. There was no plan to "cover standard library with enough concept-related markup". The existing parts of the standard library are largely untouched in new standards, we don't spend months/years retrofitting new features into everything.
Posted Oct 17, 2023 11:36 UTC (Tue)
by jwakely (subscriber, #60262)
Posted Oct 14, 2023 14:10 UTC (Sat)
by ibukanov (subscriber, #3942)
https://godbolt.org/z/9Y5qscE5Y
It is pointless for a C compiler to require X/Wayland.
Anyway, as far as I am concerned, the shorter the error messages the better; I prefer to read my code.
Browsers can render HTML as it is being streamed, by the way. With some care this could be a log format.
"What I'd really like is if a terminal supported a "disclosure widget": the ability to hierarchically wrap parts of the output in a way that the user can interactively drill down into the output."
>backspace that sometimes fails to remove offending bytes or even eats the prompt
Since everybody uses UTF-8 these problems are completely gone.
I added an option to control what Unicode characters GCC will use for these diagrams.
LANG=C, you might already have the info you need internally.
FWIW I've now added a special-case so that GCC will default to pure ASCII for such diagrams if LANG=C is in the environment.
The patch is here.
#include <langinfo.h>
#include <string.h>
/* ... */
const char *charset = nl_langinfo (CODESET);
/* If the current locale's charset is ASCII, don't assume that the terminal supports anything else.
   glibc reports the charset of the C/POSIX locale as "ANSI_X3.4-1968".  */
if (!strcmp (charset, "ANSI_X3.4-1968"))
  text_art_charset = DIAGNOSTICS_TEXT_ART_CHARSET_ASCII;
diagnostics_text_art_charset_init (context, text_art_charset);
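A standalone sketch of the same kind of check, outside GCC, may make the mechanism clearer. The helper names `charset_is_ascii` and `locale_is_ascii` are hypothetical, and the list of ASCII codeset spellings is an assumption: glibc reports `ANSI_X3.4-1968` for the C locale, while other C libraries may use names such as `US-ASCII` or `ASCII`. A program starts in the "C" locale, so it must call `setlocale(LC_ALL, "")` before `nl_langinfo(CODESET)` reflects `LANG` and the `LC_*` variables.

```cpp
#include <clocale>
#include <cstring>
#include <langinfo.h>

// Hypothetical helper: true if `codeset` names plain 7-bit ASCII.
// glibc uses "ANSI_X3.4-1968"; the other spellings are assumptions for other libcs.
inline bool charset_is_ascii(const char* codeset) {
    const char* ascii_names[] = {"ANSI_X3.4-1968", "US-ASCII", "ASCII"};
    for (const char* name : ascii_names) {
        if (std::strcmp(codeset, name) == 0) return true;
    }
    return false;
}

// Hypothetical helper: decide whether diagrams should fall back to pure ASCII.
// Call std::setlocale(LC_ALL, "") first so the environment (LANG, LC_*) is honored.
inline bool locale_is_ascii() {
    return charset_is_ascii(nl_langinfo(CODESET));
}
```

Checking the locale's codeset rather than string-matching `LANG=C` also covers `LC_ALL=C` and `LANG=POSIX`, since those select the same locale.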
Wol
You just need a program you're debugging to accidentally print an 8-bit byte at the end of stdout to end up with a garbled terminal.
Wol
is the variable size that breaks when facing unexpected sequences, particularly the rollback, since it was decided that it was probably robust enough to support backspace instead of storing it into a buffer. As a result the Linux terminal itself is broken. Just boot on a console with init=/bin/sh, set your locale to latin1, press "é" then backspace and discover how you eat the prompt. I mentioned this 10+ years ago already and was told "we know but it would be difficult to do better"...
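The rollback problem can be illustrated with a small sketch of how a line editor might decide how many bytes one backspace removes from UTF-8 input. `erase_last_code_point` is a hypothetical helper, not the kernel's line-discipline code: it walks back over at most three continuation bytes (10xxxxxx) and only removes the whole tail if it forms a well-formed sequence; otherwise it removes just the last byte, so a stray Latin-1 byte cannot take earlier input (such as the prompt) with it.

```cpp
#include <cstddef>
#include <string>

// Expected length of a UTF-8 sequence from its first byte; 0 means the byte
// is a continuation byte or an invalid lead byte.
inline std::size_t utf8_sequence_length(unsigned char lead) {
    if (lead < 0x80) return 1;
    if ((lead & 0xE0) == 0xC0) return 2;
    if ((lead & 0xF0) == 0xE0) return 3;
    if ((lead & 0xF8) == 0xF0) return 4;
    return 0;
}

// Hypothetical helper: erase one "character" from the end of `s`.
inline void erase_last_code_point(std::string& s) {
    if (s.empty()) return;
    std::size_t start = s.size() - 1;
    // Walk back over up to three continuation bytes (10xxxxxx).
    while (start > 0 && s.size() - start < 4 &&
           (static_cast<unsigned char>(s[start]) & 0xC0) == 0x80) {
        --start;
    }
    // Remove the tail only if it is exactly one well-formed sequence;
    // otherwise remove just the final byte and leave earlier input alone.
    if (utf8_sequence_length(static_cast<unsigned char>(s[start])) == s.size() - start) {
        s.erase(start);
    } else {
        s.pop_back();
    }
}
```

For example, `"caf\xC3\xA9"` (UTF-8 "café") loses both bytes of "é", while `"ab\xE9"` (Latin-1 "é" fed to a UTF-8 editor) loses only the stray byte.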
Wol
> The equivalent type in C++ is likely to cause a crash, but honestly anything might happen, anything at all.
unsafe traits and unsafe functions work in Rust. That difference has nothing whatsoever to do with templates or concepts.
• The requirement to always have an explicitly implemented trait is onerous, awful and problematic. It's a frequent source of frustration, and the kludges that people use to paper over it (newtypes and macros) are not pretty.
• The fact that there is no way in C++ to even say “this function is not supposed to be using anything except for what it explicitly `requires`” means that it's very hard to rely on some property of some type that is not listed in the concept. That, too, is a frequent source of frustration.
> Rust generics are designed to cover most of the cases that C++ templates are; if you're talking about those cases, templates and generics are the right comparison.
std::is_same_v, std::enable_if_t, if constexpr and other such things. And Rust generics are completely unsuitable for these use cases; only macros can do something similar.
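As a concrete illustration of the three tools named here, the C++17 function template below is restricted to arithmetic types with `std::enable_if_t` and then branches at compile time on the exact type with `if constexpr` and `std::is_same_v`. The function `describe` and its output strings are invented for this example.

```cpp
#include <string>
#include <type_traits>

// Hypothetical example: only participates in overload resolution for
// arithmetic T (std::enable_if_t), then picks a branch at compile time.
template <typename T, typename = std::enable_if_t<std::is_arithmetic_v<T>>>
std::string describe(T value) {
    if constexpr (std::is_same_v<T, bool>) {
        // bool is arithmetic too, so it is singled out first.
        return value ? "bool:true" : "bool:false";
    } else if constexpr (std::is_floating_point_v<T>) {
        return "float:" + std::to_string(value);
    } else {
        return "int:" + std::to_string(value);
    }
}
```

The branches not taken for a given `T` are discarded, not instantiated, so each instantiation only has to compile its own branch. A call such as `describe(std::string{})` is rejected at overload resolution because `std::enable_if_t` removes the template for non-arithmetic types.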
> For a start only template meta-programming comes close to the point where a proc macro is necessary
auto foo(auto x, auto y) { return x + y; }
to a nicely-defined library with concepts and everything in between.
int arguments with auto” in Rust you immediately have to deal with a bazillion concepts, even if your goal is simple and you just want to test your algorithm with 32-bit and 64-bit floats to gauge its stability.
kArgumentsCount template variable into a proc macro:
#include <cmath>
#include <format>
#include <iostream>
// kArgumentsCount is the template variable mentioned above; its definition is not shown in the comment.
int main() {
    constexpr auto SinArguments = kArgumentsCount<sin>;
    constexpr auto PowArguments = kArgumentsCount<pow>;
    constexpr auto FmaArguments = kArgumentsCount<fma>;
    std::cout << std::format("sin has {} argument(s)\n", SinArguments);
    std::cout << std::format("pow has {} argument(s)\n", PowArguments);
    std::cout << std::format("fma has {} argument(s)\n", FmaArguments);
}
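The comment does not show how `kArgumentsCount` is defined. One plausible C++17 way to write such a template variable, offered as an assumption rather than the commenter's code, is to take a function pointer as a non-type template argument, recover the function type from it, and count its parameter pack. The separate `noexcept` specialization is needed because `noexcept` has been part of the function type since C++17. The example functions `area` and `clamp3` are hypothetical; an overloaded name would first have to be resolved to a single function.

```cpp
#include <cstddef>
#include <type_traits>

// Primary template: intentionally left undefined for non-function types.
template <typename Fn>
struct ArgumentsCount;

// Plain function type R(Args...): the count is the size of the parameter pack.
template <typename R, typename... Args>
struct ArgumentsCount<R(Args...)> {
    static constexpr std::size_t value = sizeof...(Args);
};

// Since C++17, noexcept is part of the function type, so it needs its own match.
template <typename R, typename... Args>
struct ArgumentsCount<R(Args...) noexcept> {
    static constexpr std::size_t value = sizeof...(Args);
};

// Hypothetical template variable: F is a pointer to a non-overloaded function.
template <auto F>
constexpr std::size_t kArgumentsCount =
    ArgumentsCount<std::remove_pointer_t<decltype(F)>>::value;

// Hypothetical example functions used to exercise the template.
inline double area(double w, double h) { return w * h; }
inline int clamp3(int v, int lo, int hi) noexcept { return v < lo ? lo : (v > hi ? hi : v); }
```

With these definitions, `kArgumentsCount<area>` is 2 and `kArgumentsCount<clamp3>` is 3, both usable in `constexpr` contexts.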
Send and Sync) and the fact that in Rust you have to use sort_by instead of sort doesn't make code any safer for the user of floats: if someone needs to sort floats then s/he would do that and would, most likely, not even think twice about the fact that “to avoid bugs” Rust makes that really inconvenient.
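The technical point underneath this exchange is easy to demonstrate: once NaN can appear, `operator<` on floating-point values is not a strict weak ordering, which is what `std::sort` requires of its comparison. In a strict weak ordering, incomparability ("neither a < b nor b < a") must be transitive, and NaN breaks that. This is also why Rust's `f32` implements only `PartialOrd`, not `Ord`, and so cannot be used with `slice::sort`. The helper names below are hypothetical.

```cpp
#include <limits>

// Hypothetical helper: a and b are "equivalent" for std::sort's purposes
// when neither compares less than the other.
inline bool incomparable(double a, double b) {
    return !(a < b) && !(b < a);
}

// With NaN, incomparability is not transitive, so operator< on doubles is
// not a strict weak ordering once NaN values can appear in the input.
inline bool nan_breaks_transitivity() {
    const double nan = std::numeric_limits<double>::quiet_NaN();
    // 1.0 ~ NaN and NaN ~ 2.0, yet 1.0 < 2.0.
    return incomparable(1.0, nan) && incomparable(nan, 2.0) && !incomparable(1.0, 2.0);
}
```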
Rust doesn't implement many C++ features by choice. It is limited by design.
The most visible things that Rust doesn't implement are classes and exceptions.
>dangerous consequences with traits like Send and Sync)
In unsafe blocks you can do unsafe things. That's why they exist.
> If you want to wrap an actual function call (or any other statement) and print out the result, then you can actually do that with macros: https://doc.rust-lang.org/std/macro.dbg.html
f32 or f64. And, apparently, the fact that you can sort them with std::sort in C++ but have to use sort_by is supposed to make everything so much better.
As I said: Rust does not implement all features of all other programming languages. It doesn't implement many C++ features by choice. There's nothing to "admit".
https://play.rust-lang.org/?version=stable&mode=debug...
That's why PHP is what PHP is.