|
|
Subscribe / Log in / New account

Coccinelle for Rust

By Daroc Alden
September 30, 2024

Kangrejos 2024

Tathagata Roy has been working to make the Coccinelle tool that is used (among other things) to automate the refactoring of C code work on Rust code as well. Roy gave a presentation at Kangrejos about that work, including the creative approaches necessary to work with Rust's more complicated control flow and syntax.

Roy opened by describing the purpose of Coccinelle within the kernel. Linux's code is huge, and frequently a change to some internal API will need to be reflected across a large number of drivers or other kernel components; Coccinelle allows maintainers to write patches in a special language that can be automatically applied across the entire code base, making it easier to make changes that have a broad impact. This kind of change requires a specialized tool since parsing C source code is not trivial.

Although there is much less Rust code in the kernel, it would be nice if the same tool that existing kernel maintainers are familiar with also worked on Rust code. Roy presented an example of what that might look like; in short, authors would follow the same process as for C code. To construct a semantic patch for Coccinelle, the programmer picks a typical diff for the desired change, and then makes it more generic by adding "expression variables". An example of a patch for a Rust API change might look like this:

    @@
    expression tcx, arg;
    @@

    - tcx.type_of(arg)
    + tcx.bound_type_of(arg).subst_identity()

That patch finds uses of .type_of() and rewrites them to call a different function, even if the method is being invoked on or with complex expressions.

Under the surface, however, correctly supporting Rust requires a lot of thought. The biggest issue Roy highlighted is more complex control flow. In order to apply patches correctly, Coccinelle needs to match patch expressions to the abstract syntax tree (AST) of the program. This requires knowledge of the program's control flow, Roy explained. In Rust, this includes wrinkles such as if expressions. It's valid to write code like this in Rust:

    println!("{}", if boolean { "string 1" } else { "string 2" })

This can lead to some counterintuitive control-flow graphs. The center of Roy's recent work has been getting the representation of Rust programs into a form that can be processed by Coccinelle's existing pattern-matching algorithm. His approach works by doing a preorder traversal of the AST, converting it into a representation of the control flow in the program, and then matching on that, instead of on the raw AST. This approach works, but still has some problems, he said. It results in "huge" graphs, which take time for Coccinelle to process, and make debugging difficult.

The graphs can be compressed by 4-5 times by eliminating intermediate nodes with only one child, but that does not present a complete solution. Another issue with this representation is dealing with metavariables (such as tcx and arg, above), Roy explained. Metavariables can stand in for any expression, which makes matching them to parts of the control-flow graph somewhat tricky. So, to make matching easier, the graph is also decorated with extra edges that point to the next child (or closest cousin) of a node. This makes it much easier to efficiently determine where an expression ends.

[Tathagata Roy]

This new internal representation isn't the only work-in-progress for Coccinelle. Roy explained that Coccinelle for Rust had recently added an ellipsis operator that can stand in for any control-flow path connecting two nodes. This makes it easier to write rules that match two different call sites separated by some arbitrary code.

Matching a pattern that contains an ellipsis requires considering different potential matches, which means considering multiple alternatives, so it is a form of disjunction. Previously, Coccinelle only permitted disjunctions of expressions; now, Roy is working on generalizing disjunctions to include arbitrary statements. This has proved particularly difficult for Rust code, because of Coccinelle's approach to parsing. Coccinelle's C parser is hand-written, and can therefore include extensions for parsing Coccinelle-specific constructs in semantic patches into the same structure as the adjacent C code. Coccinelle's Rust parser, on the other hand, is the stock parser provided by the rust-analyzer project — because Roy did not want to rewrite an entire Rust parser and worry about keeping it up to date. This makes parsing Coccinelle's syntax for disjunctions tricky.

The solution Roy has selected is to convert disjunctions into Rust macros by simple textual substitution, parse the resulting Rust code with rust-analyzer, and then recognize the macro name and treat it specially. This has the advantage that it permits easier parsing, but the disadvantage that it doesn't allow disjunctions of attributes, enum alternatives, or other constructs that can't be wrapped in macros. Roy emphasized that support for disjunctions was still a work in progress.

There are two other problems that Roy wants to see resolved before calling the work complete: allowing Coccinelle to work on actual macros (which he described as "a pain in the AST") and processing patches in parallel. When these are finished, Coccinelle will hopefully prove to be as useful a tool for Rust programmers as it is for C programmers. People wishing to use the current (experimental) support for Rust can find Roy's development version in the CoccinelleForRust repository — although the linked branch does not include the in-progress support for disjunctions.


Index entries for this article
KernelDevelopment tools/Coccinelle
ConferenceKangrejos/2024


to post comments

Ternary expression?

Posted Sep 30, 2024 23:29 UTC (Mon) by Paf (subscriber, #91811) [Link] (7 responses)

I don’t know rust well at all, but how is this:
println!("{}", if boolean { "string 1" } else { "string 2" })

Different from:
printf(“%s”, boolean ? String 1 : string 2)
?

Ternary expression?

Posted Oct 1, 2024 6:38 UTC (Tue) by milian.wolff (guest, #153128) [Link] (4 responses)

There is no semantic difference between these two statements. Rust simply has no ternary expressions. Instead it has first-class support for expressions in many more contexts compared to C and C++, allowing for whole inline if / else if / else cascades or even loops etc.

Ternary expression?

Posted Oct 1, 2024 13:34 UTC (Tue) by Paf (subscriber, #91811) [Link] (3 responses)

Ahh, this comment helps me understand, thank you! There is full support for control flow code in (to a C person) fairly arbitrary spots.

So the particular example is perhaps poorly chosen as it has an extremely close equivalent in C.

Ternary expression?

Posted Oct 1, 2024 13:44 UTC (Tue) by daroc (editor, #160859) [Link] (2 responses)

That's fair — it's the example Roy chose, and he is the expert in this case, so I assume it was the best example available. Maybe it would help to point out that in Rust you can have things like while loops inside if expressions, which isn't really possible with C ternary expressions. So you need generalized control flow inside expressions, instead of a relatively small subset.

Ternary expression?

Posted Oct 2, 2024 11:29 UTC (Wed) by khim (subscriber, #9252) [Link]

> Maybe it would help to point out that in Rust you can have things like while loops inside if expressions, which isn't really possible with C ternary expressions.

We are not talking arbitrary C code, though, we are talking about kernel. In Linux kernel, of course, it's not just possible, it's routine, there are many macros that exploit that ability.

Ternary expression?

Posted Oct 3, 2024 14:16 UTC (Thu) by Paf (subscriber, #91811) [Link]

Thank you! (It didn’t sound like it was your choice of example, fwiw)

Ternary expression?

Posted Oct 1, 2024 8:38 UTC (Tue) by danielthompson (subscriber, #97243) [Link]

In C the ternary operator is an expression but control flow (such as if) are not expressions. In Rust control flow is "just an expression".

I suspect part of the concern is also that, by allowing coccinelle to recognise the if in the println!() macro, will also result in it having to test many, many other control flow expressions for matches which would have been ignored when they appear in C.

Ternary expression?

Posted Oct 1, 2024 11:48 UTC (Tue) by kaesaecracker (subscriber, #126447) [Link]

Rust does not have ternary expressions. You can use an if expression instead, which is equivalent.

Lexical code transformer?

Posted Sep 30, 2024 23:32 UTC (Mon) by rywang014 (guest, #167182) [Link] (2 responses)

A C change requires lexical transformations to the C callers, and Coccinelle can do that for you. But a C change typically requires a change of abstractions in Rust side. For example now your C function returns an error by returning a negative int, which requires a signature change in Rust abstractions from returning a MyDescriptor to a Result<MyDescriptor> or even something like `SomeRegistrationFromWhichYouCanGetA<MyDescriptor>`. And each caller needs to explicitly handle it - either by a `?` or by some added error handling code. So this is not likely a lexical transformation. How can an automatic tool do that? Even if it can change all callers with an additional `?` it's not likely to be semantically correct.

Lexical code transformer?

Posted Oct 1, 2024 9:17 UTC (Tue) by matthias (subscriber, #94967) [Link] (1 responses)

So your use case is that an infallible function now can return an error. But this is a change of abstraction in C as well as in rust. Even if you do not need to change the return type in C, you still have to make sure that erros are handled correctly at each call site. How can an automatic tool do that?

I do not see at all that the task for C is simpler than for rust. To the contrary. If the adding of '?' in rust code is syntactically correct, i.e., if the compiler is happy, then it is most likely also semantically correct. If a call site does not have the proper return type itself, the compiler will notice that the error is not handled. If the resulting code just compiles then there already is error handling for the specific error type.

Of course, there is still the possibility that the error is not handled correctly, as the error path might not have been tested at all. It might just have been dead code. But this is not different to C.

Lexical code transformer?

Posted Oct 1, 2024 15:28 UTC (Tue) by roc (subscriber, #30627) [Link]

Yet another example of "this is easy in C, but hard in Rust ... because C lets you do it wrong".

Not very useful

Posted Oct 1, 2024 18:44 UTC (Tue) by dev-ardi (guest, #172609) [Link]

This doesn't seem particularly useful, at least in Rust. This seems to only work with the AST, which doesn't allow for much analysis.
If this were to integrate with something like clippy it could have much deeper analysis by inspecting the HIR and MIR.

To me it looks like what we really want is something like clippy plugins. The reason they don't exist is of course because clippy uses permanently unstable rustc libraries so any clippy plugin would need high maintanance...

A tbsp of AST?

Posted Oct 4, 2024 16:02 UTC (Fri) by mathstuf (subscriber, #69389) [Link]

Would this project[1] (tree-based source processing language) help? It gives a proper (tree-sitter-derived) AST to match on at least.

[1] https://git.peppe.rs/languages/tbsp/about/


Copyright © 2024, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds