LWN.net Weekly Edition for August 6, 2020

Welcome to the LWN.net Weekly Edition for August 6, 2020

This edition contains the following feature content:

  • "Structural pattern matching" for Python, part 1: PEP 622 proposes a pattern-matching construct for the language.
  • Netgpu and the hazards of proprietary kernel modules: a zero-copy networking patch set runs afoul of its dependence on a proprietary GPU driver.
  • Some statistics from the 5.8 kernel cycle: where the code in this record-setting release came from.
  • Go filesystems and file embedding: a look at two new draft designs from the Go team.
  • Checking out FreeCAD: a first look at an open-source CAD and CAM tool.

This week's edition also includes these inner pages:

  • Brief items: Brief news items from throughout the community.
  • Announcements: Newsletters, conferences, security updates, patches, and more.

Please enjoy this week's edition, and, as always, thank you for supporting LWN.net.

Comments (none posted)

"Structural pattern matching" for Python, part 1

By Jake Edge
August 5, 2020

We last looked at the idea of a Python "match" or "switch" statement back in 2016, but it is something that has been circulating in the Python community both before and since that coverage. In June it was raised again, with a Python Enhancement Proposal (PEP) supporting it: PEP 622 ("Structural Pattern Matching"). As that title would imply, the match statement proposed in the PEP is actually a pattern-matching construct with many uses. While it may superficially resemble the C switch statement, a Python match would do far more than simply choose a chunk of code to execute based on the value of an expression.

Proposal

Guido van Rossum introduced PEP 622 to the python-dev mailing list on June 23; he was one of five co-authors of the PEP along with Brandt Bucher, Tobias Kohn, Ivan Levkivskyi, and Talin. Van Rossum's introduction did not include the PEP itself (original version), which is somewhat unusual, but instead consisted of the contents of a README file, "which is shorter and gives a gentler introduction than the PEP itself". It notes that the python-ideas mailing list had "several extensive discussions" about some kind of match statement over the years, which were summarized by Kohn in a blog post in 2018. The introduction starts with a fairly simple example:

def http_error(status):
    match status:
        case 400:
            return "Bad request"
        case 401:
            return "Unauthorized"
        case 403:
            return "Forbidden"
        case 404:
            return "Not found"
        case 418:
            return "I'm a teapot"
        case _:
            return "Something else"

That is pretty self-explanatory except perhaps for "_", which is a wildcard that will match anything. The idea is that each case will be tried in the order specified; the first one that matches is executed, and the match statement terminates. Zero or one of the case entries will run, with no falling through as in C. But this example is also deceptively similar to the C switch statement. Multiple case values can, however, be combined in a single line (e.g. case 401|403:), which is somewhat more compact than the C equivalent.
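For instance, the 401 and 403 arms above could be collapsed into a single case; here is a sketch using the PEP's proposed syntax (the combined message is invented for illustration):

    case 401 | 403:
        return "Not allowed"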

The next example starts to show where the proposed match statement takes a different path:

# The target is an (x, y) tuple
match point:
    case (0, 0):
        print("Origin")
    case (0, y):
        print(f"Y={y}")
    case (x, 0):
        print(f"X={x}")
    case (x, y):
        print(f"X={x}, Y={y}")
    case _:
        raise ValueError("Not a point")

If point is not a tuple of two values, it will match the wildcard case and raise the exception; otherwise it will match one of the four cases given. For the three cases that are not the origin (i.e. (0, 0)), the variables in the case get replaced with their corresponding values from point, so they can be printed out. These variable "extractions" are new behavior that differs from what might be expected in Python; a point that is (0, 1) will not only match the second case, it will bind y to the value 1. Here is some sample output:

    # point = (23, 0)
    X=23
    # point = (0, 'foo')
    Y=foo
    # point = ('0', '0')
    X=0, Y=0

There is lots more to the proposal, as the introductory message and the voluminous PEP describe, including using Python dataclasses, guard clauses, sequence and mapping patterns, extracting sub-patterns with the walrus operator (":=") that came from the contentious PEP 572, and more. Here is an example that (perhaps nonsensically) combines several of those in the introduction:

    from dataclasses import dataclass

    @dataclass
    class Point:
        x: int
        y: int

    def test(item):
        match item:
            case Point(x, y) if x == y:
                print('A Point on the diagonal')
            case [ Point(x1, y1), p2 := Point(x2, y2) ]:
                print(f'Two Points, x coord of the first is {x1}, the second Point is {p2}')
            case { 'bandwidth' : b, 'latency' : l }:
                print(f'not a Point but it has {b} for bandwidth with {l} for latency')

The first case uses a guard clause on the Point data class to restrict it to only match on points with the same value for x and y. The second uses a sequence pattern to match a sequence of two points, extracting the second into p2. The last case is checking for a mapping object that has an entry for the two fields named bandwidth and latency. These patterns can be arbitrarily complex and composed in various ways, as might be expected.
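For illustration, here is how test() would behave under the PEP's proposed semantics (hypothetical calls; the second message uses the dataclass repr):

    test(Point(3, 3))
    # A Point on the diagonal
    test([Point(1, 2), Point(5, 6)])
    # Two Points, x coord of the first is 1, the second Point is Point(x=5, y=6)
    test({'bandwidth': 100, 'latency': 20})
    # not a Point but it has 100 for bandwidth with 20 for latency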

One of the areas that drew complaints was the syntax for constants. Instead of using hard-coded values in match statements, developers will likely want to use symbolic constants, as the following example shows:

RED, GREEN, BLUE = 0, 1, 2

match color:
    case .RED:
        print("I see red!")
    case .GREEN:
        print("Grass is green")
    case .BLUE:
        print("I'm feeling the blues :(")
If the names were used without the dot, they would be interpreted as variable extractions (e.g. case RED: would match anything and assign it to the variable RED), so some syntactic mechanism must be used to disambiguate those cases. The PEP authors chose the dot, but many of those commenting on it were not particularly happy with that choice. There were plenty of other complaints as well.
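To make the hazard concrete, here is what dropping the dot would mean under the proposed semantics (a sketch):

    match color:
        case RED:  # a capture pattern, not a comparison: this matches
            ...    # any value of color and rebinds the name RED to it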

Reaction

Antoine Pitrou had several concerns, but one of those was the "context switch" required when reading the code:

When reading and understanding a match clause, there's a cognitive overhead because [suddenly] `Point(x, 0)` means something entirely different (it doesn't call Point.__new__, it doesn't lookup `x` in the locals or globals...).

He suggested alternative syntax using with or @ rather than making the case look like the creation of a new object. In a somewhat similar vein, Greg Ewing suggested making it explicit that variables in the case statements were being bound:

[...] I would rather see something explicitly marking names to be bound, rather than making the binding case the default.
E.g.
   case Point(?x, 0):
This would also eliminate the need for the awkward leading-dot workaround for names to be looked up rather than bound.

The PEP says this was rejected on the grounds that binding is more common than matching constants, but code is read more often than it is written, and readability counts.

Van Rossum noted that the PEP authors "did plenty of bikeshedding in private" about how to specify a case condition; he suggested that if there were a groundswell of support for some alternative, it could be adopted instead. Daniel Moisset was strongly in favor of the overall idea, but took some of the concerns even further, suggesting that the use of the dot on constants makes it easy for developers to shoot themselves in the foot:

The "We use a name token to denote a capture variable and special syntax to denote matching the value of that variable" feels a bit like a foot-gun.

He listed two possibilities that he found to be less dangerous: using angle brackets around to-be-captured variable names (e.g. "<x>") or effectively putting the capture variables in their own namespace by matching into a "capture object":

match get_node() into c:
    case Node(value=c.x, color=RED): print(f"a red node with value {c.x}")
    case Node(value=c.x, color=BLACK): print(f"a black node with value {c.x}")
    case c.n: print(f"This is a funny colored node with value {c.n.value}")

Rob Cliffe was generally in favor of the PEP, but was concerned with using "|" for listing alternatives in matches given that it already has a different meaning (i.e. bitwise OR). In addition, he agreed with others who were suggesting that "else" be used instead of the wildcard value as the default case when nothing else matches. Using "_" "is obscure, and [wouldn't] sit well with code that already uses that variable for its own purposes". The single-underscore "variable" is a convention used in Python to denote a value that is not needed (i.e. a throwaway value); for example:

    x, _, z = foo

That will unpack the three-element foo sequence and throw away the middle value. Its connection to wildcards is somewhat tenuous, though. Chris Angelico wondered if using the ellipsis ("...") made sense instead, which Barry Warsaw also favored. But Van Rossum said that he would find ellipsis confusing:

The problem is that ellipsis already has a number of other meanings, *and* is easily confused in examples and documentation with leaving things out that should be obvious or uninteresting. Also, if I saw [a, ..., z] in a pattern I would probably guess that it meant "any sequence of length > 2, and capture the first and last element" rather than "a sequence of length three, and capture the first and third elements". (The first meaning is currently spelled as [a, *_, z].)

He also pointed out that other languages use "_" for wildcards as well. The current version of the PEP puts it this way:

Perhaps the most convincing argument is that _ is used as the wildcard in every other language we've looked at supporting pattern matching: C#, Elixir, Erlang, F#, Haskell, Mathematica, OCaml, Ruby, Rust, Scala, and Swift.

One counterargument is that internationalization (i18n) libraries often use the underscore to indicate a string that needs translation (e.g. _('This string should be localized')), which makes the use as a wildcard somewhat confusing. The two uses would not collide, though, because the pattern for a case is treated differently than other language constructs—which may be part of what's causing some of the negative reaction to the PEP.

Marc-André Lemburg, who wrote the rejected PEP 275 ("Switching on Multiple Values") back in 2001, was happy to see the new PEP, but was surprised, like many others, about the lack of an else for the catch-all case. Van Rossum, who wrote the also-rejected PEP 3103 ("A Switch/Case Statement") in 2006 targeting Python 3, noted that, in the 19 years since PEP 275, "there are now some better ideas to steal from other languages than C's switch. :-)" He said that the authors of the current PEP were split on the question of else to some extent.

The authors don't feel very strongly about whether to use `else:` or `case _:`. The latter would be possible even if we added an explicit `else` clause, and we like TOOWTDI [there's only one way to do it]. But it's clear that a lot of people *expect* to see `else`, and maybe seeing `case _:` is not the best introduction to wildcards for people who haven't seen a match statement before.

A wrinkle with `else` is that some of the authors would prefer to see it aligned with `match` rather than with the list of cases, but for others it feels like a degenerate case and should be aligned with those. (I'm in the latter camp.)

He said that the authors were still discussing it and, once they reach agreement, would update the proposal. Lemburg was concerned that "case _:" is "just too easy to miss when looking at the body of 'match'". But as Van Rossum and others noted, a case with just a wildcard (whatever the syntax for that ends up being) will be valid syntax, so an else is not strictly needed, though it may still be desirable for other reasons.

The PEP also had a lengthy section on a __match__() protocol that could be used to customize how objects are matched, but it turned out to be a confusing idea with a lot of unclear corner cases. It drew lots of complaints and has been dropped in later versions of the PEP.

Gregory P. Smith raised concerns about confusing readers of the code, especially those who are not well-versed in Python. He posited showing a chunk of code using match to someone who does not know the language:

match get_shape():
    case Line(start := Point(x, y), end) if start == end:
        print(f"Zero length line at {x}, {y}")
I expect confusion to be the result. If they don't blindly assume the variables come from somewhere not shown to stop their anguish.

With Python experience, my own reading is:

  • I see start actually being assigned.
  • I see nothing giving values to end, x, or y.
  • Line and Point are things being called, probably class constructions due to being Capitalized.
  • But where did the parameter values come from and why and how can end be referred to in a conditional when it doesn't exist yet? They appear to be magic!

Did get_shape() return these? (i think not). Something magic and implicit rather than explicit happens in later lines. The opposite of what Python is known for.

He suggested that making the match conditions look like a call to a class constructor would simply be too confusing. He presented alternative syntax, but Ewing called that "inscrutable [...] in its own way". Paul Moore disagreed with the idea that those with Python knowledge would be completely confused as portrayed by Smith; it may take a bit of effort to get there, but Moore is confident that developers would come up to speed on match fairly quickly.

Needed?

Mark Shannon questioned whether the PEP truly outlined a serious problem that needed solving in the language. The PEP describes some anecdotal evidence about the frequency of the isinstance() call in large Python code bases as a justification for the match feature, but he found that to be a bit odd; "[...] it would be better to use the standard library, or the top N most popular packages from GitHub". He also wondered why the PEP only contained a single example from the standard library of code that could be improved using match. "The PEP needs to show that this sort of pattern is widespread."

The example cited by the PEP does make it clearer why isinstance() is mentioned. The basic idea is that heterogeneous data is frequently "destructured"—objects of various sorts have their internal data pulled out in different ways—in large Python code bases. The original PEP puts it this way:

This PEP aims at improving the support for destructuring heterogeneous data by adding a dedicated syntactic support for it in the form of pattern matching. On very high level it is similar to regular expressions, but instead of matching strings, it will be possible to match arbitrary Python objects.

We believe this will improve both readability and reliability of relevant code. To illustrate the readability improvement, let us consider an actual example from the Python standard library:

def is_tuple(node):
    if isinstance(node, Node) and node.children == [LParen(), RParen()]:
        return True
    return (isinstance(node, Node)
            and len(node.children) == 3
            and isinstance(node.children[0], Leaf)
            and isinstance(node.children[1], Node)
            and isinstance(node.children[2], Leaf)
            and node.children[0].value == "("
            and node.children[2].value == ")")

With the syntax proposed in this PEP it can be rewritten as below. Note that the proposed code will work without any modifications to the definition of Node and other classes here:

def is_tuple(node: Node) -> bool:
    match node:
        case Node(children=[LParen(), RParen()]):
            return True
        case Node(children=[Leaf(value="("), Node(), Leaf(value=")")]):
            return True
        case _:
            return False

The proposed syntax is far more clear—though there are some conceptual hurdles to surmount—but it is not so obvious that the problem is truly widespread. Shannon would seem to be concerned that match may be a feature that is in search of real use cases.

Moisset said that he had put together some extensive notes on the feature. In them, he argued that the feature was not really about pattern matching and was, instead, about introducing algebraic data types, which come from functional languages, into Python. Beyond that, he also described the proposal rather more clearly than the PEP itself does, which is likely part of what got him invited to join the author group of the PEP for round two.

Just over 24 hours after his initial post, Van Rossum called for something of a pause in all of the comments pouring into the mega-thread. He noted four separate items that the PEP authors now knew were contentious (an alternate spelling for "|", else, a different wildcard token instead of "_", and what to do about the dot notation for constants) and asked that folks wait to add additional comments on those choices until the authors could come to some agreement. The final item was also meant to cover the possibility of changing to marking variables (e.g. "?foo") rather than constants as had been suggested. He asked that any other concerns with the PEP be concisely added to his new thread, which several did—though at a much-relaxed pace compared to the original.

That takes us up near the end of June in this tale, but there is more to come. The authors came back with a second version of the PEP, without the __match__() protocol, and dropping the dot notation for constants, replacing it with a requirement that constants in case entries be referenced from some namespace (thus have a dot in their representation: Color.RED), but making few other substantive changes—beyond a gentler introduction courtesy of Moisset. That set off another mega-thread along with several other threads discussing specific aspects of the PEP. We will pick up where we left off soon; stay tuned.

Comments (38 posted)

Netgpu and the hazards of proprietary kernel modules

By Jonathan Corbet
July 31, 2020
On its face, the netgpu patch set appears to add a useful feature: the ability to copy network data directly between a network adapter and a GPU without moving it through the host CPU. This patch set has quickly become an example of how not to get work into the kernel, though; it has no chance of being merged in anything like its current form and has created a backlash designed to keep modules like it from ever working in mainline kernels. It all comes down to one fundamental mistake: basing kernel work on a proprietary kernel module.

The use case for netgpu appears to be machine-learning applications that consume large amounts of data. The processing of this data is offloaded to a GPU for performance reasons. That GPU must be fed a stream of data, though, that comes from elsewhere on the network; this data follows the usual path of first being read into main memory, then written out to the GPU. The extra copy hurts, as does the memory-bus traffic and the CPU time needed to manage this data movement.

This overhead could be significantly reduced if the network adapter were to write the data directly into the GPU's memory, which is accessible via the PCI bus. A suitably capable network adapter could place packet data in GPU memory while writing packet headers to normal host memory; that allows the kernel's network stack to do the protocol processing as usual. The netgpu patch exists to support this mode of operation, seemingly yielding improved performance at the cost of losing some functionality; anything that requires looking at the packet payload is going to be hard to support if that data is routed directly to GPU memory.

A lot of work has been done in recent years to enable just this kind of zero-copy, device-to-device data transfer, so one might expect this functionality to be well received. And, indeed, the code was reviewed normally until the last patch in this 21-part series, where things ran into a snag. This is the patch that interfaces between the netgpu module and the proprietary NVIDIA GPU driver; it can't even be built without the NVIDIA driver's files on disk. On seeing this, Greg Kroah-Hartman stopped and complained:

Ok, now you are just trolling us.

Nice job, I shouldn't have read the previous patches.

Please, go get a lawyer to sign-off on this patch, with their corporate email address on it. That's the only way we could possibly consider something like this.

What followed was an occasionally harsh and acrimonious discussion on whether the patches should have ever been posted in the first place. Jonathan Lemon, the author of the patches, insisted that they were not about providing functionality to the proprietary NVIDIA module in particular:

This is not in support of a proprietary driver. As the cover letter notes, this is for data transfers between the NIC/GPU, while still utilizing the kernel protocol stack and leaving the application in control.

While the current GPU utilized is nvidia, there's nothing in the rest of the patches specific to Nvidia - an Intel or AMD GPU interface could be equally workable.

Others disagreed, though, stating that the code was clearly designed around the NVIDIA module from the beginning. Christoph Hellwig argued that any upstream-oriented driver should be based on the existing P2PDMA framework, which exists just to support device-to-device data transfers. Jason Gunthorpe agreed, and argued that the design of the module as a whole was driven by NVIDIA's choices:

The design copied the nv_p2p api design directly into struct netgpu_functions and then aligned the rest of the parts to use it too. Yes, other GPU drivers could also be squeezed into this API, but if you'd never looked at the NVIDIA driver you'd never pick such a design. It is inherently disconnected from the [memory-management subsystem].

By the time the discussion wound down, it was clear that this patch set wasn't going anywhere in its current form. Large amounts of work will have to be done to build it on top of the existing kernel mechanisms for cross-device data movement — P2PDMA, the DMA-buf subsystem, etc. This work may have achieved its initial goal, but it clearly went far down the wrong path when it comes to being merged into the mainline kernel.

The sad part is that, by all appearances, the goal of this work was not to add functionality for NVIDIA GPUs in particular. Lemon does not seem to be an NVIDIA employee; the patches included a Facebook email address. But NVIDIA, with its proprietary module, was what was at hand, so that is the device that the patch set was designed to work with. Designing the module to work with free GPU drivers from the outset would have driven a number of decisions in different directions and avoided much of the trouble that has ensued.

Meanwhile, in an attempt to make this sort of mistake harder to make (and, surely only by coincidence, make life a bit harder for proprietary modules in general), Hellwig has posted a patch set changing the way GPL-only symbols are handled. Symbols exported as GPL-only by the kernel are made unavailable to proprietary modules, but there has always been a bit of a loophole in how this is enforced. A proprietary module can be broken into two parts, one of which is a minimal shim layer that interfaces between the kernel and the proprietary code. If the shim module is GPL-licensed, it can access GPL-only symbols, which it can then make available to the proprietary module.
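Conceptually, such a shim is just a thin, GPL-tagged wrapper around the symbols of interest; a minimal sketch (the wrapped function name here is made up):

    /* shim.c: a GPL-licensed module that launders a GPL-only symbol */
    #include <linux/module.h>

    MODULE_LICENSE("GPL");  /* grants access to GPL-only exports */

    /* some function the kernel exports with EXPORT_SYMBOL_GPL() */
    extern void gpl_only_kernel_func(void);

    void shim_wrapper(void)
    {
            gpl_only_kernel_func();
    }
    /* a plain export, callable from a proprietary module */
    EXPORT_SYMBOL(shim_wrapper);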

Hellwig's patch does not entirely close this loophole, but it makes exploiting it a bit harder. With this patch applied, any module that imports symbols from a proprietary module is itself marked as being proprietary, denying it access to GPL-only symbols. If the shim module has already accessed GPL-only symbols by the time it gets around to importing symbols from the proprietary module, that import will not be allowed. This new restriction would keep the netgpu-NVIDIA interface module from loading, impeding the development of such modules in the future. It still does not cover the case, though, of a shim module that exports its own symbols to a proprietary module without importing anything from that module.

For as long as the kernel has had a module loader, there have been debates over proprietary modules. In 2006, the development community seriously discussed banning them entirely. Two years later, a long list of kernel developers signed a position statement stating that proprietary modules are "detrimental to Linux users, businesses, and the greater Linux ecosystem". These modules continue to exist, though, and do not appear to be going away anytime soon. This episode, where a proprietary module helped send a significant development project in the wrong direction and made it nearly impossible to implement this functionality in a way that works with all GPUs, demonstrates one of the reasons why the development community sees those modules as being harmful.

Comments (50 posted)

Some statistics from the 5.8 kernel cycle

By Jonathan Corbet
August 3, 2020
Linus Torvalds released the 5.8 kernel on August 2, concluding another nine-week development cycle. By the time the work was done, 16,306 non-merge changesets had been pulled into the mainline repository for this release. That happens to be a record, beating the previous record holder (4.9, released in December 2016) by 92 changesets. It was, in other words, a busy development cycle. It's time for our traditional look into where that work came from to see what might be learned.

A total of 1,991 developers contributed to 5.8, which is another record; 304 of those developers appeared for the first time in this cycle. The community added over 924,000 lines of code and removed around 371,000 for a net growth of over 553,000 lines of code. The most active developers for 5.8 were:

Most active 5.8 developers

  By changesets
    Mauro Carvalho Chehab     549    3.4%
    Christoph Hellwig         354    2.2%
    Andy Shevchenko           223    1.4%
    Jason Yan                 205    1.3%
    Chris Wilson              199    1.2%
    Jérôme Pouiller           175    1.1%
    Thomas Gleixner           156    1.0%
    Gustavo A. R. Silva       136    0.8%
    Masahiro Yamada           133    0.8%
    Miquel Raynal             125    0.8%
    Leon Romanovsky           114    0.7%
    Sean Christopherson       109    0.7%
    Geert Uytterhoeven        101    0.6%
    Colin Ian King            101    0.6%
    Daniel Vetter              99    0.6%
    Al Viro                    98    0.6%
    Peter Zijlstra             95    0.6%
    Christophe Leroy           93    0.6%
    Lorenzo Bianconi           89    0.5%
    Serge Semin                87    0.5%

  By changed lines
    Mauro Carvalho Chehab  272614   25.8%
    Oded Gabbay             80603    7.6%
    Yan-Hsuan Chuang        15798    1.5%
    Arnd Bergmann           13082    1.2%
    Jack Wang               12895    1.2%
    Thomas Bogendoerfer     11161    1.1%
    Christoph Hellwig       10940    1.0%
    Omer Shpigelman         10861    1.0%
    Ryder Lee               10076    1.0%
    Chris Wilson             8682    0.8%
    David Howells            8130    0.8%
    Serge Semin              7520    0.7%
    Andrii Nakryiko          6189    0.6%
    Thomas Gleixner          5695    0.5%
    Marco Elver              5619    0.5%
    Peter Zijlstra           5533    0.5%
    Boris Brezillon          5451    0.5%
    Leon Romanovsky          5399    0.5%
    Ping-Ke Shih             5173    0.5%
    Bryan O'Donoghue         4953    0.5%

In this cycle, Mauro Carvalho Chehab managed to get to the top of both the by-changesets and by-lines columns. Much of his work was focused on documentation, converting more files to RST and reworking the video4linux2 user-space manual, but he also put a lot of work into resurrecting the atomisp camera driver, which had been removed from the staging tree. Christoph Hellwig has done significant work throughout the kernel's memory-management, filesystem, and block subsystems. Andy Shevchenko improved a number of different drivers, Jason Yan performed code cleanups across the kernel, and Chris Wilson, as usual, did a lot of work on the i915 graphics driver.

In the lines-changed column, Oded Gabbay added a massive set of machine-generated register definitions for the Habana Gaudi processor. Yan-Hsuan Chuang added a set of machine-generated data for the Realtek rtw88 wireless driver that looks rather more like binary code than source. Arnd Bergmann did cleanup work all over as usual; part of that work was deleting support for the never-realized sh5 subarchitecture. Jack Wang contributed the rnbd driver (a network block device using RDMA).

While the number of developers contributing to the kernel set a new record, the number of companies supporting them remains about flat at 213. The companies supporting the most work in the 5.8 cycle were:

Most active 5.8 employers

  By changesets
    Intel                    1939   11.9%
    Huawei Technologies      1399    8.6%
    (Unknown)                1231    7.5%
    Red Hat                  1079    6.6%
    (None)                   1016    6.2%
    Google                    791    4.9%
    IBM                       542    3.3%
    (Consultant)              515    3.2%
    Linaro                    513    3.1%
    AMD                       503    3.1%
    SUSE                      463    2.8%
    Mellanox                  445    2.7%
    NXP Semiconductors        330    2.0%
    Renesas Electronics       322    2.0%
    Oracle                    252    1.5%
    Code Aurora Forum         248    1.5%
    Facebook                  247    1.5%
    Arm                       239    1.5%
    Silicon Labs              175    1.1%
    Linux Foundation          171    1.0%

  By lines changed
    Huawei Technologies    293365   27.8%
    Habana Labs             93213    8.8%
    Intel                   88288    8.4%
    (None)                  47655    4.5%
    (Unknown)               36786    3.5%
    Linaro                  36322    3.4%
    Red Hat                 34737    3.3%
    Google                  34209    3.2%
    IBM                     24233    2.3%
    Mellanox                23364    2.2%
    Realtek                 22767    2.2%
    AMD                     21411    2.0%
    NXP Semiconductors      21328    2.0%
    (Consultant)            15418    1.5%
    Facebook                14874    1.4%
    MediaTek                14751    1.4%
    SUSE                    13659    1.3%
    1&1 IONOS Cloud         13219    1.3%
    Code Aurora Forum       11865    1.1%
    Renesas Electronics     11077    1.1%

For the most part, this table looks fairly familiar, but the fact that Huawei has moved up to the top of the list may come as a bit of a surprise. Much of this is the result of Chehab's work described above, but Huawei's contribution this time around is rather larger than that. A great deal of effort has gone into freezing Huawei out of the commercial marketplace in significant parts of the world, but the company remains active in the development community with 92 developers contributing to 5.8. For the curious, Huawei's work was mostly focused in these subsystems:

    Subsystem          Changesets
    Documentation             226
    drivers/net               226
    drivers/staging           222
    fs                         73
    drivers/media              62
    drivers/scsi               62
    drivers/gpu                49
    net                        49
    include                    38
    sound                      22
    security                   21
    kernel                     18

In summary, 907 of the patches coming from Huawei (65% of the total) applied somewhere in the driver subsystem, but quite a bit of the company's work was spread out over the rest of the kernel as well.

The kernel depends on people who run tests and report bugs; no kernel developer can hope to test every hardware combination and workload out there. The most active contributors in this area in 5.8 were:

Test and report credits in 5.8

  Tested-by
    Aaron Brown                 97    9.1%
    Andrew Bowers               90    8.5%
    Arnaldo Carvalho de Melo    53    5.0%
    Hoan Tran                   21    2.0%
    Marek Szyprowski            19    1.8%
    Serge Semin                 16    1.5%
    David Heidelberg            14    1.3%
    Peter Geis                  14    1.3%
    Jasper Korten               13    1.2%
    Tomasz Maciej Nowak         12    1.1%

  Reported-by
    Hulk Robot                 243   19.8%
    kernel test robot          178   14.5%
    Syzbot                      70    5.7%
    Dan Carpenter               33    2.7%
    Stephen Rothwell            26    2.1%
    Randy Dunlap                20    1.6%
    Guenter Roeck               13    1.1%
    Qian Cai                    11    0.9%
    Greg Kroah-Hartman           8    0.7%
    Lars-Peter Clausen           8    0.7%

Automated testing systems continue to report (by far) the most bugs, but this important work is not limited to such systems.

Patch review is also important; it is how bugs are kept from needing to be reported in the first place. While not all review results in a Reviewed-by tag, there is still a signal to be seen by looking at those tags:

Review credits in 5.8

    Rob Herring               183    2.6%
    Christoph Hellwig         179    2.6%
    Alexandre Chartre         128    1.8%
    Andy Shevchenko           125    1.8%
    Ranjani Sridharan         121    1.7%
    Andrew Lunn               113    1.6%
    Darrick J. Wong           107    1.5%
    Florian Fainelli           94    1.4%
    Jiri Pirko                 88    1.3%
    David Sterba               83    1.2%
    Hannes Reinecke            81    1.2%
    Ursula Braun               79    1.1%
    Alex Deucher               78    1.1%
    Stephen Boyd               78    1.1%
    Kees Cook                  78    1.1%

Of the patches merged for 5.8, 5,470 (34% of the total) carried Reviewed-by tags. The last few kernel releases have consistently had such tags in almost exactly one-third of the changesets merged.
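Counts like these can be approximated directly from the repository; here is a minimal sketch in Python (LWN's published numbers come from its own statistics tooling, so the exact methodology may differ):

    #!/usr/bin/env python3
    # Count 5.8 changesets carrying a Reviewed-by tag (run in a kernel tree).
    import re
    import subprocess

    log = subprocess.run(
        ["git", "log", "--no-merges", "--format=%B%x00", "v5.7..v5.8"],
        capture_output=True, text=True, check=True,
    ).stdout

    commits = [c for c in log.split("\0") if c.strip()]
    reviewed = sum(1 for c in commits if re.search(r"^Reviewed-by:", c, re.M))
    print(f"{reviewed} of {len(commits)} changesets carry Reviewed-by tags")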

The end result of all this is that the kernel-development community continues to work at a high rate. If the ongoing pandemic has had any effect at all, it would appear to have made things go even faster. It will be interesting to see if this trend continues into the 5.9 development cycle; tune in at the beginning of October for the answer to that question.

Comments (10 posted)

Go filesystems and file embedding

July 30, 2020

This article was contributed by Ben Hoyt

The Go team has recently published several draft designs that propose changes to the language, standard library, and tooling: we covered the one on generics back in June. Last week, the Go team published two draft designs related to files: one for a new read-only filesystem interface, which specifies a minimal interface for filesystems, and a second design that proposes a standard way to embed files into Go binaries (by building on the filesystem interface). Embedding files into Go binaries is intended to simplify deployments by including all of a program's resources in a single binary; the filesystem interface design was drafted primarily as a building block for that. There has been a lot of discussion on the draft designs, which has been generally positive, but there are some significant concerns.

Russ Cox, technical lead of the Go team, and Rob Pike, one of the creators of Go, are the authors of the design for the filesystem interface. Cox is also an author of the design for file embedding along with longtime Go contributor Brad Fitzpatrick. Additionally, Cox created YouTube video presentations of each design for those who prefer that format (the filesystem interface video and the file-embedding video). Both designs are quick to note that they are not (yet) formal proposals:

This is a Draft Design, not a formal Go proposal, because it describes a potential large change that addresses the same need as many third-party packages and could affect their implementations (hopefully by simplifying them!). The goal of circulating this draft design is to collect feedback to shape an intended eventual proposal.

Many smaller language and library changes are discussed on the GitHub issue tracker, but for these larger discussions the Go team is trying to use r/golang Reddit threads to scale the discussion — GitHub issues do not have any form of threading, so multiple conversations are hard to keep track of. There is a Reddit thread for each draft—the filesystem interface thread and the file-embedding thread—with quite a few comments on each. There is also a lengthy Hacker News thread that discusses the file-embedding design.

A filesystem interface

The crux of the filesystem interface design is a single-method interface named FS in a new io/fs standard library package:

    type FS interface {
        Open(name string) (File, error)
    }

This means that every filesystem implementation must at least implement the ability to open a file by name, returning a File as well as an error. The File interface is defined as follows:

    type File interface {
        Stat() (os.FileInfo, error)
        Read(buf []byte) (int, error)
        Close() error
    }

In other words, a file can report information about itself (like that returned from stat()), can be read, and can be closed. These are the bare minimum that a conforming filesystem needs to provide, but an implementation "may also provide other methods to optimize operations or add new functionality". The standard library's file type (os.File) already implements these three methods, so it is a conforming fs.File implementation.

If a File is actually a directory, the file information returned by Stat() will indicate that; in that case, the File returned from Open() must also implement the Readdir() method on top of the File interface. Readdir() returns a list of os.FileInfo objects representing the files inside the directory.
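To make the interfaces concrete, here is a minimal in-memory implementation; this is a sketch against the draft's proposed types (the io/fs package does not exist yet), and a real implementation would also need to handle directories:

    package memfs

    import (
        "io/fs" // the draft's proposed package
        "os"
        "strings"
        "time"
    )

    // memFS maps names to file contents; Open is all the FS interface requires.
    type memFS map[string]string

    func (m memFS) Open(name string) (fs.File, error) {
        data, ok := m[name]
        if !ok {
            return nil, &os.PathError{Op: "open", Path: name, Err: os.ErrNotExist}
        }
        return &memFile{name: name, Reader: *strings.NewReader(data)}, nil
    }

    // memFile satisfies the three-method File interface: Stat, Read, and Close.
    type memFile struct {
        name           string
        strings.Reader // provides Read
    }

    func (f *memFile) Stat() (os.FileInfo, error) { return fileInfo{f}, nil }
    func (f *memFile) Close() error               { return nil }

    // fileInfo provides a minimal os.FileInfo for a memFile.
    type fileInfo struct{ f *memFile }

    func (fi fileInfo) Name() string       { return fi.f.name }
    func (fi fileInfo) Size() int64        { return fi.f.Size() }
    func (fi fileInfo) Mode() os.FileMode  { return 0444 }
    func (fi fileInfo) ModTime() time.Time { return time.Time{} }
    func (fi fileInfo) IsDir() bool        { return false }
    func (fi fileInfo) Sys() interface{}   { return nil }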

Filesystem implementations can expose additional functionality using what the design calls an "extension interface", which is an interface that "embeds a base interface and adds one or more extra methods, as a way of specifying optional functionality that may be provided by an instance of the base interface." For example, it is common to read a whole file at once, and for in-memory filesystem implementations, it may be inefficient to do this using Open(), multiple calls to Read(), and Close(). In cases like this, a developer could implement the ReadFile() method as defined in the ReadFileFS extension interface:

    type ReadFileFS interface {
        FS  // embed the filesystem interface (Open method)
        ReadFile(name string) ([]byte, error)
    }

Along with the extension interface, the design adds a ReadFile() helper function to the io/fs package that checks the filesystem for the ReadFileFS extension, and uses it if it exists, otherwise it falls back to performing the open/read/close sequence. There are various other extension interfaces defined in the draft proposal, including StatFS, ReadDirFS, and GlobFS. The design does not provide ways to rename or write files, but that could also be done using extensions.
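The fallback logic in such a helper is simple; here is a sketch of how it might be written (living conceptually in the proposed io/fs package, with "io/ioutil" imported for the slow path):

    // ReadFile uses the ReadFileFS extension when the filesystem provides
    // it, and falls back to the generic open/read/close sequence otherwise.
    func ReadFile(fsys FS, name string) ([]byte, error) {
        if rf, ok := fsys.(ReadFileFS); ok {
            return rf.ReadFile(name)
        }
        f, err := fsys.Open(name)
        if err != nil {
            return nil, err
        }
        defer f.Close()
        return ioutil.ReadAll(f) // a File satisfies io.Reader
    }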

In addition to the new io/fs types and helper functions, the design suggests changes to various standard library packages to make use of the new FS interface. For example, adding a ParseFS() method to the html/template package to allow parsing templates from an in-memory filesystem, or making the archive/zip package implement FS so that developers can treat a zip file as a filesystem and use it wherever FS is allowed.

Much of the feedback on the Reddit discussion has been positive, and it seems like an interface of this kind is something that developers want. However, one of the criticisms made by several people is about the drawbacks of extension interfaces. "Acln0" summarized the concerns:

I have only one observation to make, related to extension interfaces and the extension pattern. I am reminded of http.ResponseWriter and the optional interfaces the http package makes use of. Due to the existence of these optional interfaces, wrapping http.ResponseWriter is difficult. Doing it "generically" involves a combinatorial explosion of optional interfaces, and it's easy to go wrong in a way that looks like this: "we added status logging by wrapping http.ResponseWriter, and now HTTP/2 push doesn't work anymore, because our wrapper hides the Push method from the handlers downstream".

Peter Bourgon, a well-known Go blogger and speaker, believes that this use of extension interfaces means that it "becomes infeasible to use the (extremely useful) decorator pattern. That's really unfortunate. To me that makes the proposal almost a non-starter; the decorator pattern is too useful to break in this way." The decorator pattern wraps an interface and adds some functionality. It is often used for logging or authentication middleware in web servers; in the context of filesystems it would likely be used to add a caching or transformation layer. If a middleware author does not take into account the various optional interfaces, the resulting wrapper will not support them. Nick Craig-Wood, author of Rclone, a cloud-storage tool written in Go, likes the proposal but expressed similar concerns: "Extension (or optional as I usually call them) interfaces are a big maintenance burden - wrapping them is really hard".
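The problem is easy to demonstrate: a wrapper only exposes the methods it declares, so extensions on the wrapped filesystem disappear behind it. A sketch against the draft's interfaces:

    // loggingFS decorates another FS with logging. It implements only
    // Open, so even if inner also provides ReadFile(), Glob(), and so
    // on, type assertions for those extensions on the wrapper will fail.
    type loggingFS struct {
        inner fs.FS
    }

    func (l loggingFS) Open(name string) (fs.File, error) {
        log.Printf("open %q", name)
        return l.inner.Open(name)
    }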

The design states that "enabling that kind of middleware is a key goal for this draft design", so it would seem wise for the design's authors to tackle this problem head on. Cox hasn't yet proposed a solution, but acknowledged the issue: "It's true - there's definitely a tension here between extensions and wrappers. I haven't seen any perfect solutions for that."

Another concern came from "TheSwedeheart" regarding contexts (the standard way in Go to explicitly propagate timeouts, cancellation signals, and request-scoped values down a call chain): "One thing I'm missing to migrate [his virtual filesystem] over to this is support for propagating contexts to each operation, for cancellation." Cox replied that a library author could "probably pass the context to a constructor that returns an FS with the context embedded in it, and then have that context apply to the calls being made with that specific FS." As "lobster_johnson" pointed out, this goes against the context package's guideline to explicitly pass context as the first function argument, not store a context inside a struct. However, Cox countered with an example of http.Request doing something similar: "Those are more guidelines than rules. [...] Sometimes it does make sense."

There are of course the usual bikeshedding threads that debate naming; "olegkovalov" said: "I'm somewhat scared about io/fs name, fs is a good variable name, it'll cause many troubles to the users when io/fs will appear". After some back-and-forth, Cox stressed the need for a short name to keep the focus on application developers rather than on the filesystem implementers:

You're focusing on the file system implementers instead of the users. Code referring to things like os.FileInfo, os.ModeDir, os.PathError, os.ErrNotExist will all now refer canonically to fs.FileInfo, fs.ModeDir, fs.PathError, fs.ErrNotExist. Those seem much better than, say, filesystem.ErrNotExist. And far more code will be referring to those names than implementing file systems.

Embedding files in binaries

The other draft design proposes a way to embed files (or "static assets") in Go binaries and read their contents at runtime. This simplifies releases and deployments, since developers can simply copy around a large binary with no external dependencies (for SQL snippets, HTML templates, CSS and JavaScript assets for a web application, and so on). As the document points out, there are already over a dozen third-party tools that can do this, but "adding direct support to the go command for the basic functionality of embedding will eliminate the need for some of these tools and at least simplify the implementation of others". Including embedding in the standard go tool will also mean there is no pre-build step to convert files to data in Go source code, and no need to commit those generated files to version control.

The authors of the design make it clear that this is a tooling change, not a Go language change:

Another explicit goal is to avoid a language change. To us, embedding static assets seems like a tooling issue, not a language issue. Avoiding a language change also means we avoid the need to update the many tools that process Go code, among them goimports, gopls, and staticcheck.

The go tool already looks for special comments in Go source files for various things, including // +build tags to include certain files only on specific architectures, and //go:generate comments that tell go generate what commands to run for code-generation purposes. This file-embedding design proposes a new //go:embed comment directive that goes directly above a variable declaration and tells go build to include those files in the resulting binary associated with the variable. Here is a concrete example:

    // The "content" variable holds our static web server content.
    //go:embed image/* template/*
    //go:embed html/index.html
    var content embed.Files

This would make go build include all the files in the image and template directories, as well as the html/index.html file, and make them accessible via the content variable (which is of type embed.Files). The embed package is a new standard library package being proposed that contains the API for accessing the embedded files. In addition, the embed.Files type implements the fs.FS interface from the filesystem design discussed above, allowing the embedded files to be used directly with other standard library packages like net/http and html/template, as well as any third-party packages that support the new filesystem interface.
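Because embed.Files would implement fs.FS, the embedded content can be handed to anything that accepts that interface; reading one file back might look like this (a sketch combining the two drafts' APIs):

    // "content" is the embed.Files variable declared above; fs.ReadFile
    // is the helper proposed in the filesystem draft.
    index, err := fs.ReadFile(content, "html/index.html")
    if err != nil {
        log.Fatal(err)
    }
    fmt.Printf("index.html is %d bytes\n", len(index))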

The design limits the scope of the proposal in an important way. There are many ways that the data in the files could be transformed before being included in the binary: data compression, TypeScript compilation, image resizing, and so on. This design takes a simple approach of just including the raw file data:

It is not feasible for the go command to anticipate or include all the possible transformations that might be desirable. The go command is also not a general build system; in particular, remember the design constraint that it never runs user programs during a build. These kinds of transformations are best left to an external build system, such as Make or Bazel, which can write out the exact bytes that the go command should embed.

Again, the feedback on the Reddit thread was mostly positive, with comments like this one from "bojanz": "This looks like a great start. Thank you for tackling this." There are a few minor suggestions, such as a comment by "zikaeroh" in favor of adding a more powerful path-matching API that supports double star for recursive path matching, like glob('**/*.png', recursive=True) in Python. Kevin Burke, who is the maintainer of a file-embedding package, suggested also storing a cryptographic hash of each file's content so the developer does not have to hash the file at runtime: "This is useful for e.g. cache busting on a static file server".

One of the repeated critiques is from developers who don't like overloading source code comments with the special //go:embed syntax. "Saturn_vk" stated bluntly, "I really don't like the fact that comments are being abused for actual work", and Hacker News commenter "breakingcups" strongly advocated for the use of a project file instead of directives in comments:

Again, more magic comments.

The proposed feature is great, but the unwillingness of the Go team to use a separate, clearly defined project file or at the very least a separate syntax in your code file leads them to stuff every additional feature into comments, a space shared by human notetaking.

Cox summed up his thinking about this with the following comment, which compares the syntax with #pragma for C:

For what it's worth, we already have //go:generate and a few other lesser known ones. And there is a separate draft design to replace // +build with //go:build. At that point we will be completely consistent: these kinds of directives begin with //go:. The point is to look enough like a comment to make tools that don't need to know ignore them, but enough not like a comment to signal to people that something special is going on.

C uses #pragma foo for this. Go simply spells #pragma as //go:.

Next up

There is a fair amount of community support for both draft designs, particularly the more user-facing proposal for file embedding. Many developers are already using third-party file-embedding libraries to simplify their deployments and these efforts will standardize that tooling. It seems likely that the designs will be refined and turned into full proposals. With Go 1.15 due out in early August, it's possible that these proposals would be ready for Go 1.16 (scheduled for six months out), but if there needs to be another round of feedback — for example, regarding the problems with extension interfaces — it is more likely to be included in Go 1.17 in a year's time.

Comments (15 posted)

Checking out FreeCAD

By John Coggeshall
August 5, 2020

Our look at running a CNC milling machine using open-source software led me to another tool worth looking at: FreeCAD. I wasn't previously familiar with the program, so I decided to check it out. In this article I will walk through my experiences with using FreeCAD for the first time to do a variety of CNC-related tasks I normally would have used a commercial product for. I had varying degrees of success in my endeavors, but in the end came away with a positive opinion.

FreeCAD is an LGPLv2+-licensed CAD and CAM program written in Python and C++. The first release of the project came in 2002, and its most recent stable version, 0.18.4, was released in October 2019. The project's GitHub page indicates that it has 271 contributors, with new commits happening often (generally more than 50 a week). Beyond code contributions, FreeCAD has a welcoming community with active forums to answer any questions users might have along the way. FreeCAD is designed to be cross-platform, supporting Linux, macOS, and Windows, with binary releases provided by the OS-independent package and environment management system Conda.

I decided to take on a relatively simple CNC project: milling a new street-address sign for my home. The plan called for a 700mm x 150mm sign, and I decided to mill it out of a plank of maple wood. The design I have in mind is pretty straightforward, so it should be a great way to put FreeCAD through a test on a real project. I also looked at using FreeCAD for taking existing models that are available online with an open license and importing them for milling (in this case, a wooden spoon).

It is worth noting that before this effort I had never used FreeCAD before. My personal goal is to become fluent enough with FreeCAD that I can replace my dependence on the commercial CAD software I presently use in my design work. The goal of this article, however, is to share what my experience with FreeCAD was, and provide a glimpse of FreeCAD from the perspective of an inexperienced user.

Getting FreeCAD

FreeCAD is available for download on the FreeCAD project website, along with development builds for the upcoming 0.19.0 release. One thing I learned as I went, however, was that I was much better off using development snapshots than I was using the pre-compiled binaries from the releases. The stable release worked fine, but I quickly realized that features I would have expected to be in the project only exist in 0.18.4 as "experimental" features requiring special effort to enable. Specifically, 3D surface-path generation was a key technology I simply had to have to consider a switch. For unfamiliar readers, this feature allows you to carve a model's outer shell out of a piece of stock material using a CNC — in a way similar to carving a statue out of a block of marble. I discovered on the forums that the Path Workbench for FreeCAD is under active development for 0.19, including 3D surface-paths, and "many bug fixes and new features" related to path generation are in the upcoming release.

Thankfully, FreeCAD provides a pretty easy-to-use package system to install regular development snapshots. With a few commands and a little patience, I was able to get a development-tree build of FreeCAD 0.19 running on my laptop with little hassle.

My first test: a spoon

Before I decided to try making something from scratch using FreeCAD, I wanted to try a pretty common workflow in the open-source maker world: download a model someone else has created, and make it. For me this meant taking an STL file I had handy for a spoon and trying to use FreeCAD to design the tool paths to carve it out. As simple as a spoon is, carving one on a CNC is a fairly complicated effort and a good test of the project's capabilities. To succeed, you have to carve the top of the wood stock, flip the wood over, and then carve the bottom as a separate job. This experiment introduced me to several of FreeCAD's features: importing the STL, editing the model, setting up a CNC job, building tool paths for both the top and bottom, and finally generating G-code. Regrettably, my experiment did not work as well as I would have hoped since I ran into multiple problems.

The first problem I encountered was an overwhelming amount of somewhat disorganized (or even incorrect) documentation. The FreeCAD project has a pretty massive wiki dedicated to documentation on using the application. For someone who knew nothing (me), it took a great deal of reading and effort just to get any sense at all of how to accomplish things that are fairly straightforward in other tools. One of the places I found the documentation most lacking was a clear explanation of the multitude of FreeCAD workbenches. These workbenches are at the heart of any design work being done in FreeCAD, but conceptually their function and interoperability with each other were not clearly explained. As a result, getting to the point where I had a basic understanding to even attempt to make my spoon was more difficult than I had otherwise hoped.

When trying to use FreeCAD, readers should be prepared to spend a considerable amount of effort gaining an understanding of the program's foundational concepts, how those concepts map to the workbenches, and to struggle early on doing even simple tasks. For example, if the goal is resizing an STL model imported via the Mesh Workbench, you will find that the tools to do so don't exist in that workbench. Instead, the imported mesh must first be converted to a "shape", then a "solid" using the Part Workbench. Once it is a solid, you must create a clone of it using the Draft Workbench (strangely billed as a tool for 2D work). This clone, once created through these many steps, then can finally be resized as needed. Coming from previous experience of clicking on an STL and then simply resizing it at will, I found the FreeCAD process to do the same felt incredibly burdensome.

After a good amount of searching online, watching videos, and reading forum posts, I did succeed in importing the STL file of my spoon. Ultimately I roughly followed the process outlined here. That said, I quickly learned that FreeCAD is considerably less capable than other options when it comes to working with complex STL meshes; the processing time between actions taken in my tests was often measured in minutes. This is, for example, compared to the much simpler CAMLab browser-based CAM system that was able to load and work with the same model in seconds. It's unclear exactly where the performance issues are. Ultimately I was able to side-step the issue by reducing the complexity of my STL spoon model using online tools — at the cost of detail in the final output.

In the end, I abandoned my attempt to make a spoon after I encountered a problem with the 3D surface-path tool and decided to move on to my more straightforward street-address sign. In my questions to the FreeCAD forum, another user indicated they weren't able to reproduce the problem I experienced, saying "I suspect you had been working on the same file for awhile, and eventually 'stuff' builds up." Working on a file for long periods of time shouldn't break things, but working from a development branch of the project, those sorts of issues are to be expected. Despite the problems I had, many of which can certainly be chalked up to my lack of understanding, it is worth noting that the FreeCAD community was both engaged and extremely helpful to me as a newcomer. I will certainly be revisiting my quest for a milled-at-home cooking spoon using FreeCAD in the future to see if perhaps a fresh start with more knowledge might fix my problems.

Take two: a street-address sign

For my second experiment, I decided to start from scratch and try to make a street-address sign for my home. Unlike the spoon experiment, which required complex tool paths over a curved 3D surface (on two sides), my sign design was much more straightforward; the sign only needed to be milled on one side to create a relief of the street-address text in the center.

This project allowed me to try to use the CAD portions of FreeCAD to make this simple 3D model. The basic steps to build the model were: create a properly-sized 3D-rectangle 19mm thick (the size of my board), create an indentation (or "pocket") with a 10mm border, and then add the raised address centered in the middle.

Again, as was the case with trying to import an STL file, I found myself jumping back and forth between multiple workbenches to design my model. I was able to follow this tutorial to get an understanding of how most of the steps were done and to apply them to my specific project. While I ultimately did succeed in creating my model, doing so felt much more complicated than my experiences with other tools. For the sign, here's what I ended up with (replacing the real street-address with "LWN"):

[FreeCAD Address Sign]

Everything in FreeCAD is stored as an object with properties in a hierarchy of dependencies, and the various workbenches allow you to modify different aspects of that hierarchy. When you modify a model, the changes propagate throughout the hierarchy. This makes building models more complicated up front, but also makes them quite flexible once they are designed.

For example, once my sign was complete, changing my real street-address to "LWN" for the screenshot above required only modifying a single text property and adjusting the alignment to re-center it in the sign. Experienced FreeCAD users may point out that I would have also been able to use constraints to auto-center my text, but in my case, I was satisfied to simply eyeball it. Had I used constraints, I likely would have only needed to modify the text to create an entirely new sign properly centered. This approach to modeling, while complicated, is one of the more important conceptual features of FreeCAD: because models are built up based on properties of other models in a hierarchy, it is possible to create a "sign" model that can be quickly changed for the need at hand (different text, different size, etc.). If I were designing a sign to sell, for example, it would be a huge time-saver to simply modify one property with a new address to make that new sign.
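That property-driven design is also scriptable; in FreeCAD's Python console, re-texting the sign comes down to a few lines (a sketch; the object name here is hypothetical and depends on how the model was built):

    # Change the sign's text and let the change propagate through
    # the model's dependency hierarchy.
    import FreeCAD

    doc = FreeCAD.ActiveDocument
    text = doc.getObject("ShapeString")  # hypothetical object name
    text.String = "LWN"
    doc.recompute()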

Path generation

With the model made, it was time to try using the FreeCAD path-generation tooling via the Path Workbench. This is, as previously mentioned, a focus of active development by the FreeCAD community for the 0.19 release.

The Path Workbench's primary purpose is to generate tool paths based on a 3D-model for my CNC router to follow as it cuts the wood. Tool paths come in various types, and which of them makes sense to use depends on many different variables. Based on the model I was attempting to mill, I selected two primary tool path types: pocket paths and profile paths. Pocket paths cut a "pocket" into the material (such as the two empty spaces in the number "8"), and profile paths trace the edge of a model — primarily used to cut the model out of a piece of stock.

As is common with other CAM software packages, FreeCAD provides a tool manager where users can store the details of the various cutting end mills they have available for their CNC. These tools can then be imported into a job object, to be used during tool path operations that are to be generated. For my project (the name of my street has no closed-loops), I needed to create five separate pocket tool path operations — four for the street numbers "9", "8" (two pockets), and "4" in my address, and one for the large pocket of the sign itself (the rest of the street address).

I also needed to create a profile tool path operation to cut my sign out of the larger wood stock it was being carved from. When cutting out a model using this sort of method, it is common to create "tabs" that keep the thing being cut out attached to the stock. Without them, as the part was cut out it would begin to loosen or detach completely from the stock in the middle of cutting it. For FreeCAD, this again highlights the object-hierarchy philosophy of the tool. First, the profile-tool path object is created without these cut-out tabs, then the profile-path object is replaced with what FreeCAD calls a "dressup", based on the original profile-path object as a dependency. There are many different dressups available for a profile-tool path; I used the "tag" dressup. This modification to the profile adds cut-out tabs at reasonably spaced locations on the model. Here is the address sign, complete with all of the generated tool paths needed to carve it out from the stock wood using "LWN" as the sign text:

[FreeCAD sign tool paths]

G-code generation and simulation

Once all the tool paths are created, it is now time to take those tool paths and convert them to a language the CNC controller will understand: G-code. For FreeCAD, this is one of the most impressive aspects of the project I have explored so far. According to the documentation, FreeCAD represents paths in an internal format of G-code and then converts that format into a specific G-code dialect used by a machine in a process the project calls "post-processing". While in this internal state, however, it can be simulated within FreeCAD to see exactly how the material will be removed from the stock. The simulations help when trying to make sure that the way the CNC will cut things is exactly what you expect it to be — before actually cutting anything. In my project, I frequently ran the simulator to test different settings in my tool paths to find the ones that worked best for the job. Various G-code simulation tools exist, but having one built in and just a click away was nice.

When it is time to export the job to G-code to be sent to the CNC for actual cutting, FreeCAD supports eleven different G-code dialect post-processors along with the ability to add your own. These post-processors take the internal G-code format of FreeCAD and translate it into a G-code dialect compatible with the controller being used. In my case, my CNC runs Grbl on the controller, which is supported by FreeCAD.

Wrapping up

In the end, I would consider the experiments using FreeCAD as my go-to CAD and CAM tool to be a success. While I have various gripes, in fairness I discovered what the project freely admits: FreeCAD's "learning curve can be steep". Part of the reason that the program is difficult to learn is that it has so many capabilities, being suitable for a large array of varying CAD tasks. There are many other features we simply don't have time to cover, such as writing Python scripts to perform tasks, creating architectural diagrams, and doing 2D drafting. With all of these features and complexity, it's easy to get stuck, but the active FreeCAD community proved itself welcoming and willing to help when that happened to me. For those of us who seek out open-source alternatives, FreeCAD is certainly worth a look if you do any modeling work.

Comments (14 posted)

Page editor: Jonathan Corbet

Inside this week's LWN.net Weekly Edition

  • Briefs: LF Open Source Security Foundation; Linux 5.8; Grub2 update woes; Julia 1.5; LibreOffice 7; systemd 246; Quotes; ...
  • Announcements: Newsletters; conferences; security updates; kernel patches; ...
Next page: Brief items>>

Copyright © 2020, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds