LWN.net Weekly Edition for August 6, 2020
Welcome to the LWN.net Weekly Edition for August 6, 2020
This edition contains the following feature content:
- "Structural pattern matching" for Python, part 1: a complex and capable case statement for Python.
- Netgpu and the hazards of proprietary kernel modules: binary-only modules lead a driver project down the wrong path.
- Some statistics from the 5.8 kernel cycle: where the code in 5.8 came from.
- Go filesystems and file embedding: a pair of proposed new features for the Go language.
- Checking out FreeCAD: a first look at this free computer-aided design system.
This week's edition also includes these inner pages:
- Brief items: Brief news items from throughout the community.
- Announcements: Newsletters, conferences, security updates, patches, and more.
Please enjoy this week's edition, and, as always, thank you for supporting LWN.net.
"Structural pattern matching" for Python, part 1
We last looked at the idea of a Python
"match" or "switch" statement back in 2016, but it is something that has
been circulating in the Python community both before and since that coverage.
In June it was raised again, with a Python Enhancement Proposal (PEP) supporting it: PEP 622 ("Structural Pattern Matching"). As that title would imply, the match statement proposed in the PEP is actually a pattern-matching construct with many uses.
While it may superficially resemble the C switch statement, a
Python match would do far more than simply choose a chunk of code
to execute based on the value of an expression.
Proposal
Guido van Rossum introduced PEP 622 to the python-dev mailing list on June 23; he was one of five co-authors of the PEP along with Brandt Bucher, Tobias Kohn, Ivan Levkivskyi, and Talin. Van Rossum's introduction did not include the PEP itself (original version), which is somewhat unusual, but instead consisted of the contents of a README file, "which is shorter and gives a gentler introduction than the PEP itself". It notes that the python-ideas mailing list had "several extensive discussions" about some kind of match statement over the years, which were summarized by Kohn in a blog post in 2018. The introduction starts with a fairly simple example:
    def http_error(status):
        match status:
            case 400:
                return "Bad request"
            case 401:
                return "Unauthorized"
            case 403:
                return "Forbidden"
            case 404:
                return "Not found"
            case 418:
                return "I'm a teapot"
            case _:
                return "Something else"
That is pretty self-explanatory except perhaps for "_", which is a wildcard that will match anything. The idea is that each case is tried in the order specified; the first to match is executed and the match statement terminates. At most one of the case entries will be executed, and there is no falling through as with C. But this example is also deceptively similar to the C switch statement. Multiple case values can, however, be combined in a single case (e.g. case 401|403:), which is somewhat more compact than the C equivalent; a short sketch follows.
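Under the proposed syntax (a sketch; no Python release at the time could run it), the combined form looks like this:

    def http_error(status):
        match status:
            case 401 | 403 | 404:
                return "Not allowed"
            case _:
                return "Something else"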
The next example starts to show where the proposed match statement takes a different path:
    # The target is an (x, y) tuple
    match point:
        case (0, 0):
            print("Origin")
        case (0, y):
            print(f"Y={y}")
        case (x, 0):
            print(f"X={x}")
        case (x, y):
            print(f"X={x}, Y={y}")
        case _:
            raise ValueError("Not a point")
If point is not a tuple of two values, it will match the wildcard case and raise the exception; otherwise it will match one of the four cases given. For the three cases that are not the origin (i.e. (0, 0)), the variables in the case get replaced with their corresponding values from point, so they can be printed out. These variable "extractions" are new behavior, different from what Python programmers might expect; a point that is (0, 1) will not only match the second case, it will bind y to the value 1. Here is some sample output:
    # point = (23, 0)
    X=23
    # point = (0, 'foo')
    Y=foo
    # point = ('0', '0')
    X=0, Y=0
There is lots more to the proposal, as the introductory message and the voluminous PEP describe, including using Python dataclasses, guard clauses, sequence and mapping patterns, extracting sub-patterns with the walrus operator (":=") that came from the contentious PEP 572, and more. Here is an example that (perhaps nonsensically) combines several of those in the introduction:
    from dataclasses import dataclass

    @dataclass
    class Point:
        x: int
        y: int

    def test(item):
        match item:
            case Point(x, y) if x == y:
                print('A Point on the diagonal')
            case [Point(x1, y1), p2 := Point(x2, y2)]:
                print(f'Two Points, x coord of the first is {x1}, the second Point is {p2}')
            case {'bandwidth': b, 'latency': l}:
                print(f'not a Point but it has {b} for bandwidth with {l} for latency')

The first case uses a guard clause on the Point data class to restrict it to only match on points with the same value for x and y. The second uses a sequence pattern to match a sequence of two points, extracting the second into p2. The last case is checking for a mapping object that has an entry for the two fields named bandwidth and latency. These patterns can be arbitrarily complex and composed in various ways, as might be expected.
One of the areas that drew complaints was the syntax for constants. Instead of using hard-coded values in match statements, developers will likely want to use symbolic constants, as the following example shows:
    RED, GREEN, BLUE = 0, 1, 2

    match color:
        case .RED:
            print("I see red!")
        case .GREEN:
            print("Grass is green")
        case .BLUE:
            print("I'm feeling the blues :(")
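Why mark constants at all? Under the PEP's proposed semantics, a bare name in a pattern is a capture pattern, so the leading dot is what tells the compiler to look a name up rather than bind it. A hypothetical sketch of the hazard if the dots above were omitted:

    RED, GREEN, BLUE = 0, 1, 2

    match color:
        case RED:    # without the dot, RED is a capture pattern: it
            ...      # matches any value and rebinds RED to that value

Every value of color would match the first case, silently clobbering the constant in the process; that foot-gun, and the syntax chosen to avoid it, drew much of the criticism described below.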
Reaction
Antoine Pitrou had several concerns, but one of those was the "context switch" required when reading the code.
He suggested alternative syntax using with or @ rather than making the case look like the creation of a new object. In a somewhat similar vein, Greg Ewing suggested making it explicit that variables in the case statements were being bound:
E.g.:

    case Point(?x, 0):

This would also eliminate the need for the awkward leading-dot workaround for names to be looked up rather than bound.
The PEP says this was rejected on the grounds that binding is more common than matching constants, but code is read more often than it is written, and readability counts.
Van Rossum noted that the PEP authors "did plenty of bikeshedding in private" about how to specify a case condition; he suggested that if there were a groundswell of support for some alternative, it could be adopted instead.
Daniel Moisset was strongly in favor of the overall idea, but took some of the concerns even further, suggesting that the use of the dot on constants makes it easy for developers to shoot themselves in the foot.
He listed two possibilities that he found to be less dangerous: using angle brackets around to-be-captured variable names (e.g. "<x>") or effectively putting the capture variables in their own namespace by matching into a "capture object":
    match get_node() into c:
        case Node(value=c.x, color=RED):
            print(f"a red node with value {c.x}")
        case Node(value=c.x, color=BLACK):
            print(f"a black node with value {c.x}")
        case c.n:
            print(f"This is a funny colored node with value {c.n.value}")
Rob Cliffe was generally in favor of the PEP, but was concerned with using "|" for listing alternatives in matches given that it already has a different meaning (i.e. bitwise OR). In addition, he agreed with others who were suggesting that "else" be used instead of the wildcard value as the default case when nothing else matches. Using "_" "is obscure, and [wouldn't] sit well with code that already uses that variable for its own purposes". The single-underscore "variable" is a convention used in Python to denote a value that is not needed (i.e. a throwaway value); for example:

    x, _, z = foo
That will unpack the three-element foo sequence and throw away the middle value. Its connection to wildcards is somewhat tenuous, though. Chris Angelico wondered if using the ellipsis ("...") made sense instead, which Barry Warsaw also favored. But Van Rossum said that he would find ellipsis confusing.
He also pointed out that other languages use "_" for wildcards as well, as does the current version of the PEP. One counterargument is that internationalization (i18n) libraries often use the underscore to indicate a string that needs translation (e.g. _('This string should be localized')), which makes the use as a wildcard somewhat confusing. The two uses would not collide, though, because the pattern for a case is treated differently than other language constructs—which may be part of what's causing some of the negative reaction to the PEP.
Marc-André Lemburg, who wrote the rejected PEP 275 ("Switching on Multiple Values") back in 2001, was happy to see the new PEP, but was surprised, like many others, about the lack of an else for the catch-all case. Van Rossum, who wrote the also-rejected PEP 3103 ("A Switch/Case Statement") in 2006 targeting Python 3, noted that, in the 19 years since PEP 275, "there are now some better ideas to steal from other languages than C's switch. :-)" He said that the authors of the current PEP were split on the question of else to some extent.
A wrinkle with `else` is that some of the authors would prefer to see it aligned with `match` rather than with the list of cases, but for others it feels like a degenerate case and should be aligned with those. (I'm in the latter camp.)
He said that the authors were still discussing it and, once they reach agreement, would update the proposal. Lemburg was concerned that "case _:" is "just too easy to miss when looking at the body of 'match'". But as Van Rossum and others noted, a case with just a wildcard (whatever the syntax for that ends up being) will be valid syntax, so an else is not strictly needed, though it may still be desirable for other reasons.
The PEP also had a lengthy section on a __match__() protocol that could be used to customize how objects are matched, but it turned out to be a confusing idea with a lot of unclear corner cases. It drew lots of complaints and has been dropped in later versions of the PEP.
Gregory P. Smith raised concerns about confusing readers of the code, especially those who are not well-versed in Python. He posited showing a chunk of code using match to someone who does not know the language:
    match get_shape():
        case Line(start := Point(x, y), end) if start == end:
            print(f"Zero length line at {x}, {y}")

I expect confusion to be the result. If they don't blindly assume the variables come from somewhere not shown to stop their anguish.
With Python experience, my own reading is:
- I see start actually being assigned.
- I see nothing giving values to end, x, or y.
- Line and Point are things being called, probably class constructions due to being Capitalized.
- But where did the parameter values come from and why and how can end be referred to in a conditional when it doesn't exist yet? They appear to be magic!
Did get_shape() return these? (i think not). Something magic and implicit rather than explicit happens in later lines. The opposite of what Python is known for.
He suggested that making the match conditions look like a call to a class constructor would simply be too confusing. He presented alternative syntax, but Ewing called that "inscrutable [...] in its own way". Paul Moore disagreed with the idea that those with Python knowledge would be completely confused as portrayed by Smith; it may take a bit of effort to get there, but Moore is confident that developers would come up to speed on match fairly quickly.
Needed?
Mark Shannon questioned whether the PEP truly outlined a serious problem that needed solving in the language. The PEP describes some anecdotal evidence about the frequency of the isinstance() call in large Python code bases as a justification for the match feature, but he found that to be a bit odd; "[...] it would be better to use the standard library, or the top N most popular packages from GitHub". He also wondered why the PEP only contained a single example from the standard library of code that could be improved using match. "The PEP needs to show that this sort of pattern is widespread."
The example cited by the PEP does make it clearer why isinstance() is mentioned. The basic idea is that heterogeneous data is frequently "destructured"—objects of various sorts have their internal data pulled out in different ways—in large Python code bases. The original PEP puts it this way:
We believe this will improve both readability and reliability of relevant code. To illustrate the readability improvement, let us consider an actual example from the Python standard library:
    def is_tuple(node):
        if isinstance(node, Node) and node.children == [LParen(), RParen()]:
            return True
        return (isinstance(node, Node)
                and len(node.children) == 3
                and isinstance(node.children[0], Leaf)
                and isinstance(node.children[1], Node)
                and isinstance(node.children[2], Leaf)
                and node.children[0].value == "("
                and node.children[2].value == ")")
With the syntax proposed in this PEP it can be rewritten as below. Note that the proposed code will work without any modifications to the definition of Node and other classes here:
    def is_tuple(node: Node) -> bool:
        match node:
            case Node(children=[LParen(), RParen()]):
                return True
            case Node(children=[Leaf(value="("), Node(), Leaf(value=")")]):
                return True
            case _:
                return False
The proposed syntax is far more clear—though there are some conceptual hurdles to surmount—but it is not so obvious that the problem is truly widespread. Shannon would seem to be concerned that match may be a feature that is in search of real use cases.
Moisset said that he had put together some extensive notes on the feature. In them, he argued that the feature was not really about pattern matching and was, instead, about introducing algebraic data types, which come from functional languages, into Python. Beyond that, he also described the proposal rather more clearly than the PEP itself does, which is likely part of what got him invited to join the author group of the PEP for round two.
Just over 24 hours after his initial post, Van Rossum called for something of a pause in all of the comments pouring into the mega-thread. He noted four separate items that the PEP authors now knew were contentious (an alternate spelling for "|", else, a different wildcard token instead of "_", and what to do about the dot notation for constants) and asked that folks wait to add additional comments on those choices until the authors could come to some agreement. The final item was also meant to cover the possibility of changing to marking variables (e.g. "?foo") rather than constants as had been suggested. He asked that any other concerns with the PEP be concisely added to his new thread, which several did—though at a much-relaxed pace compared to the original.
That takes us up near the end of June in this tale, but there is more to come. The authors came back with a second version of the PEP, without the __match__() protocol, and dropping the dot notation for constants, replacing it with a requirement that constants in case entries be referenced from some namespace (thus have a dot in their representation: Color.RED), but making few other substantive changes—beyond a gentler introduction courtesy of Moisset. That set off another mega-thread along with several other threads discussing specific aspects of the PEP. We will pick up where we left off soon; stay tuned.
Netgpu and the hazards of proprietary kernel modules
On its face, the netgpu patch set appears to add a useful feature: the ability to copy network data directly between a network adapter and a GPU without moving it through the host CPU. This patch set has quickly become an example of how not to get work into the kernel, though; it has no chance of being merged in anything like its current form and has created a backlash designed to keep modules like it from ever working in mainline kernels. It all comes down to one fundamental mistake: basing kernel work on a proprietary kernel module.

The use case for netgpu appears to be machine-learning applications that consume large amounts of data. The processing of this data is offloaded to a GPU for performance reasons. That GPU must be fed a stream of data, though, that comes from elsewhere on the network; this data follows the usual path of first being read into main memory, then written out to the GPU. The extra copy hurts, as does the memory-bus traffic and the CPU time needed to manage this data movement.
This overhead could be significantly reduced if the network adapter were to write the data directly into the GPU's memory, which is accessible via the PCI bus. A suitably capable network adapter could place packet data in GPU memory while writing packet headers to normal host memory; that allows the kernel's network stack to do the protocol processing as usual. The netgpu patch exists to support this mode of operation, seemingly yielding improved performance at the cost of losing some functionality; anything that requires looking at the packet payload is going to be hard to support if that data is routed directly to GPU memory.
A lot of work has been done in recent years to enable just this kind of zero-copy, device-to-device data transfer, so one might expect this functionality to be well received. And, indeed, the code was reviewed normally until the last patch in this 21-part series, where things ran into a snag. This is the patch that interfaces between the netgpu module and the proprietary NVIDIA GPU driver; it can't even be built without the NVIDIA driver's files on disk. On seeing this, Greg Kroah-Hartman stopped and complained:
Nice job, I shouldn't have read the previous patches.
Please, go get a lawyer to sign-off on this patch, with their corporate email address on it. That's the only way we could possibly consider something like this.
What followed was an occasionally harsh and acrimonious discussion on whether the patches should have ever been posted in the first place. Jonathan Lemon, the author of the patches, insisted that they were not about providing functionality to the proprietary NVIDIA module in particular:
While the current GPU utilized is nvidia, there's nothing in the rest of the patches specific to Nvidia - an Intel or AMD GPU interface could be equally workable.
Others disagreed, though, stating that the code was clearly designed around the NVIDIA module from the beginning. Christoph Hellwig argued that any upstream-oriented driver should be based on the existing P2PDMA framework, which exists just to support device-to-device data transfers. Jason Gunthorpe agreed, and argued that the design of the module as a whole was driven by NVIDIA's choices.
By the time the discussion wound down, it was clear that this patch set wasn't going anywhere in its current form. Large amounts of work will have to be done to build it on top of the existing kernel mechanisms for cross-device data movement — P2PDMA, the DMA-buf subsystem, etc. This work may have achieved its initial goal, but it clearly went far down the wrong path when it comes to being merged into the mainline kernel.
The sad part is that, by all appearances, the goal of this work was not to add functionality for NVIDIA GPUs in particular. Lemon does not seem to be an NVIDIA employee; the patches included a Facebook email address. But NVIDIA, with its proprietary module, was what was at hand, so that is the device that the patch set was designed to work with. Designing the module to work with free GPU drivers from the outset would have driven a number of decisions in different directions and avoided much of the trouble that has ensued.
Meanwhile, in an attempt to make this sort of mistake harder to make (and, surely only by coincidence, make life a bit harder for proprietary modules in general), Hellwig has posted a patch set changing the way GPL-only symbols are handled. Symbols exported as GPL-only by the kernel are made unavailable to proprietary modules, but there has always been a bit of a loophole in how this is enforced. A proprietary module can be broken into two parts, one of which is a minimal shim layer that interfaces between the kernel and the proprietary code. If the shim module is GPL-licensed, it can access GPL-only symbols, which it can then make available to the proprietary module.
Hellwig's patch does not entirely close this loophole, but it makes exploiting it a bit harder. With this patch applied, any module that imports symbols from a proprietary module is itself marked as being proprietary, denying it access to GPL-only symbols. If the shim module has already accessed GPL-only symbols by the time it gets around to importing symbols from the proprietary module, that import will not be allowed. This new restriction would keep the netgpu-NVIDIA interface module from loading, impeding the development of such modules in the future. It still does not cover the case, though, of a shim module that exports its own symbols to a proprietary module without importing anything from that module.
For as long as the kernel has had a module loader, there have been debates over proprietary modules. In 2006, the development community seriously discussed banning them entirely. Two years later, a long list of kernel developers signed a position statement stating that proprietary modules are "detrimental to Linux users, businesses, and the greater Linux ecosystem". These modules continue to exist, though, and do not appear to be going away anytime soon. This episode, where a proprietary module helped send a significant development project in the wrong direction and made it nearly impossible to implement this functionality in a way that works with all GPUs, demonstrates one of the reasons why the development community sees those modules as being harmful.
Some statistics from the 5.8 kernel cycle
Linus Torvalds released the 5.8 kernel on August 2, concluding another nine-week development cycle. By the time the work was done, 16,306 non-merge changesets had been pulled into the mainline repository for this release. That happens to be a record, beating the previous record holder (4.9, released in December 2016) by 92 changesets. It was, in other words, a busy development cycle. It's time for our traditional look into where that work came from to see what might be learned.

A total of 1,991 developers contributed to 5.8, which is another record; 304 of those developers appeared for the first time in this cycle. The community added over 924,000 lines of code and removed around 371,000 for a net growth of over 553,000 lines of code. The most active developers for 5.8 were:
Most active 5.8 developers
By changesets
  Mauro Carvalho Chehab    549  3.4%
  Christoph Hellwig        354  2.2%
  Andy Shevchenko          223  1.4%
  Jason Yan                205  1.3%
  Chris Wilson             199  1.2%
  Jérôme Pouiller          175  1.1%
  Thomas Gleixner          156  1.0%
  Gustavo A. R. Silva      136  0.8%
  Masahiro Yamada          133  0.8%
  Miquel Raynal            125  0.8%
  Leon Romanovsky          114  0.7%
  Sean Christopherson      109  0.7%
  Geert Uytterhoeven       101  0.6%
  Colin Ian King           101  0.6%
  Daniel Vetter             99  0.6%
  Al Viro                   98  0.6%
  Peter Zijlstra            95  0.6%
  Christophe Leroy          93  0.6%
  Lorenzo Bianconi          89  0.5%
  Serge Semin               87  0.5%
By changed lines
  Mauro Carvalho Chehab   272614  25.8%
  Oded Gabbay              80603   7.6%
  Yan-Hsuan Chuang         15798   1.5%
  Arnd Bergmann            13082   1.2%
  Jack Wang                12895   1.2%
  Thomas Bogendoerfer      11161   1.1%
  Christoph Hellwig        10940   1.0%
  Omer Shpigelman          10861   1.0%
  Ryder Lee                10076   1.0%
  Chris Wilson              8682   0.8%
  David Howells             8130   0.8%
  Serge Semin               7520   0.7%
  Andrii Nakryiko           6189   0.6%
  Thomas Gleixner           5695   0.5%
  Marco Elver               5619   0.5%
  Peter Zijlstra            5533   0.5%
  Boris Brezillon           5451   0.5%
  Leon Romanovsky           5399   0.5%
  Ping-Ke Shih              5173   0.5%
  Bryan O'Donoghue          4953   0.5%
In this cycle, Mauro Carvalho Chehab managed to get to the top of both the by-changesets and by-lines columns. Much of his work was focused on documentation, converting more files to RST and reworking the video4linux2 user-space manual, but he also put a lot of work into resurrecting the atomisp camera driver, which had been removed from the staging tree. Christoph Hellwig has done significant work throughout the kernel's memory-management, filesystem, and block subsystems. Andy Shevchenko improved a number of different drivers, Jason Yan performed code cleanups across the kernel, and Chris Wilson, as usual, did a lot of work on the i915 graphics driver.
In the lines-changed column, Oded Gabbay added a massive set of machine-generated register definitions for the Habana Gaudi processor. Yan-Hsuan Chuang added a set of machine-generated data for the Realtek rtw88 wireless driver that looks rather more like binary code than source. Arnd Bergmann did cleanup work all over as usual; part of that work was deleting support for the never-realized sh5 subarchitecture. Jack Wang contributed the rnbd (a network block device using RDMA) driver.
While the number of developers contributing to the kernel set a new record, the number of companies supporting them remains roughly flat at 213. The companies supporting the most work in the 5.8 cycle were:
Most active 5.8 employers
By changesets
  Intel                 1939  11.9%
  Huawei Technologies   1399   8.6%
  (Unknown)             1231   7.5%
  Red Hat               1079   6.6%
  (None)                1016   6.2%
                         791   4.9%
  IBM                    542   3.3%
  (Consultant)           515   3.2%
  Linaro                 513   3.1%
  AMD                    503   3.1%
  SUSE                   463   2.8%
  Mellanox               445   2.7%
  NXP Semiconductors     330   2.0%
  Renesas Electronics    322   2.0%
  Oracle                 252   1.5%
  Code Aurora Forum      248   1.5%
                         247   1.5%
  Arm                    239   1.5%
  Silicon Labs           175   1.1%
  Linux Foundation       171   1.0%
By lines changed
  Huawei Technologies   293365  27.8%
  Habana Labs            93213   8.8%
  Intel                  88288   8.4%
  (None)                 47655   4.5%
  (Unknown)              36786   3.5%
  Linaro                 36322   3.4%
  Red Hat                34737   3.3%
                         34209   3.2%
  IBM                    24233   2.3%
  Mellanox               23364   2.2%
  Realtek                22767   2.2%
  AMD                    21411   2.0%
  NXP Semiconductors     21328   2.0%
  (Consultant)           15418   1.5%
                         14874   1.4%
  MediaTek               14751   1.4%
  SUSE                   13659   1.3%
  1&1 IONOS Cloud        13219   1.3%
  Code Aurora Forum      11865   1.1%
  Renesas Electronics    11077   1.1%
For the most part, this table looks fairly familiar, but the fact that Huawei has moved up to the top of the list may come as a bit of a surprise. Much of this is the result of Chehab's work described above, but Huawei's contribution this time around is rather larger than that. A great deal of effort has gone into freezing Huawei out of the commercial marketplace in significant parts of the world, but the company remains active in the development community with 92 developers contributing to 5.8. For the curious, Huawei's work was mostly focused in these subsystems:
Subsystem          Changesets
  Documentation       226
  drivers/net         226
  drivers/staging     222
  fs                   73
  drivers/media        62
  drivers/scsi         62
  drivers/gpu          49
  net                  49
  include              38
  sound                22
  security             21
  kernel               18
In summary, 907 of the patches coming from Huawei (65% of the total) applied somewhere in the driver subsystem, but quite a bit of the company's work was spread out over the rest of the kernel as well.
The kernel depends on people who run tests and report bugs; no kernel developer can hope to test every hardware combination and workload out there. The most active contributors in this area in 5.8 were:
Test and report credits in 5.8
Tested-by
  Aaron Brown                97  9.1%
  Andrew Bowers              90  8.5%
  Arnaldo Carvalho de Melo   53  5.0%
  Hoan Tran                  21  2.0%
  Marek Szyprowski           19  1.8%
  Serge Semin                16  1.5%
  David Heidelberg           14  1.3%
  Peter Geis                 14  1.3%
  Jasper Korten              13  1.2%
  Tomasz Maciej Nowak        12  1.1%
Reported-by
  Hulk Robot           243  19.8%
  kernel test robot    178  14.5%
  Syzbot                70   5.7%
  Dan Carpenter         33   2.7%
  Stephen Rothwell      26   2.1%
  Randy Dunlap          20   1.6%
  Guenter Roeck         13   1.1%
  Qian Cai              11   0.9%
  Greg Kroah-Hartman     8   0.7%
  Lars-Peter Clausen     8   0.7%
Automated testing systems continue to report (by far) the most bugs, but this important work is not limited to such systems.
Patch review is also important; that is how we keep bugs from needing to be reported in the first place. While not all review results in a Reviewed-by tag, there is still a signal to be seen by looking at those tags:
Review credits in 5.8
  Rob Herring          183  2.6%
  Christoph Hellwig    179  2.6%
  Alexandre Chartre    128  1.8%
  Andy Shevchenko      125  1.8%
  Ranjani Sridharan    121  1.7%
  Andrew Lunn          113  1.6%
  Darrick J. Wong      107  1.5%
  Florian Fainelli      94  1.4%
  Jiri Pirko            88  1.3%
  David Sterba          83  1.2%
  Hannes Reinecke       81  1.2%
  Ursula Braun          79  1.1%
  Alex Deucher          78  1.1%
  Stephen Boyd          78  1.1%
  Kees Cook             78  1.1%
Of the patches merged for 5.8, 5,470 (34% of the total) carried Reviewed-by tags. The last few kernel releases have consistently had such tags in almost exactly one-third of the changesets merged.
The end result of all this is that the kernel-development community continues to work at a high rate. If the ongoing pandemic has had any effect at all, it would appear to have made things go even faster. It will be interesting to see if this trend continues into the 5.9 development cycle; tune in at the beginning of October for the answer to that question.
Go filesystems and file embedding
The Go team has recently published several draft designs that propose changes to the language, standard library, and tooling: we covered the one on generics back in June. Last week, the Go team published two draft designs related to files: one for a new read-only filesystem interface, which specifies a minimal interface for filesystems, and a second design that proposes a standard way to embed files into Go binaries (by building on the filesystem interface). Embedding files into Go binaries is intended to simplify deployments by including all of a program's resources in a single binary; the filesystem interface design was drafted primarily as a building block for that. There has been a lot of discussion on the draft designs, which has been generally positive, but there are some significant concerns.
Russ Cox, technical lead of the Go team, and Rob Pike, one of the creators of Go, are the authors of the design for the filesystem interface. Cox is also an author of the design for file embedding along with longtime Go contributor Brad Fitzpatrick. Additionally, Cox created YouTube video presentations of each design for those who prefer that format (the filesystem interface video and the file-embedding video). Both designs are quick to note that they are not (yet) formal proposals:
This is a Draft Design, not a formal Go proposal, because it describes a potential large change that addresses the same need as many third-party packages and could affect their implementations (hopefully by simplifying them!). The goal of circulating this draft design is to collect feedback to shape an intended eventual proposal.
Many smaller language and library changes are discussed on the GitHub issue tracker, but for these larger discussions the Go team is trying to use r/golang Reddit threads to scale the discussion — GitHub issues do not have any form of threading, so multiple conversations are hard to keep track of. There is a Reddit thread for each draft—the filesystem interface thread and the file-embedding thread—with quite a few comments on each. There is also a lengthy Hacker News thread that discusses the file-embedding design.
A filesystem interface
The crux of the filesystem interface design is a single-method interface named FS in a new io/fs standard library package:
    type FS interface {
        Open(name string) (File, error)
    }
This means that every filesystem implementation must at least implement the ability to open a file by name, returning a File as well as an error. The File interface is defined as follows:
    type File interface {
        Stat() (os.FileInfo, error)
        Read(buf []byte) (int, error)
        Close() error
    }
In other words, a file has the following characteristics: it can provide file information like that returned from stat(), it can be read, and it can be closed. These are the bare minimum that a conforming filesystem needs to provide, but an implementation "may also provide other methods to optimize operations or add new functionality". The standard library's file type (os.File) already implements these three methods, so it is a conforming fs.File implementation.
If a File is actually a directory, the file information returned by Stat() will indicate that; in that case, the File returned from Open() must also implement the Readdir() method on top of the File interface. Readdir() returns a list of os.FileInfo objects representing the files inside the directory.
Filesystem implementations can expose additional functionality using what the design calls an "extension interface", which is an interface that "embeds a base interface and adds one or more extra methods, as a way of specifying optional functionality that may be provided by an instance of the base interface." For example, it is common to read a whole file at once, and for in-memory filesystem implementations, it may be inefficient to do this using Open(), multiple calls to Read(), and Close(). In cases like this, a developer could implement the ReadFile() method as defined in the ReadFileFS extension interface:
    type ReadFileFS interface {
        FS // embed the filesystem interface (Open method)
        ReadFile(name string) ([]byte, error)
    }
Along with the extension interface, the design adds a ReadFile() helper function to the io/fs package that checks the filesystem for the ReadFileFS extension, and uses it if it exists, otherwise it falls back to performing the open/read/close sequence. There are various other extension interfaces defined in the draft proposal, including StatFS, ReadDirFS, and GlobFS. The design does not provide ways to rename or write files, but that could also be done using extensions.
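The design describes the helper as trying the optimized path first and falling back to the generic one. A minimal sketch of how that could be implemented, with the draft interfaces redeclared locally to keep the example self-contained (the io/fs package did not exist at the time of writing, so details could differ):

    package fs

    import (
        "io/ioutil"
        "os"
    )

    // FS, File, and ReadFileFS are redeclared from the draft design.
    type FS interface {
        Open(name string) (File, error)
    }

    type File interface {
        Stat() (os.FileInfo, error)
        Read(buf []byte) (int, error)
        Close() error
    }

    type ReadFileFS interface {
        FS
        ReadFile(name string) ([]byte, error)
    }

    // ReadFile uses the optimized extension method when the filesystem
    // provides it, and falls back to open/read/close otherwise.
    func ReadFile(fsys FS, name string) ([]byte, error) {
        if rf, ok := fsys.(ReadFileFS); ok {
            return rf.ReadFile(name)
        }
        f, err := fsys.Open(name)
        if err != nil {
            return nil, err
        }
        defer f.Close()
        return ioutil.ReadAll(f) // File's Read method satisfies io.Reader
    }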
In addition to the new io/fs types and helper functions, the design suggests changes to various standard library packages to make use of the new FS interface. For example, adding a ParseFS() method to the html/template package to allow parsing templates from an in-memory filesystem, or making the archive/zip package implement FS so that developers can treat a zip file as a filesystem and use it wherever FS is allowed.
Much of the feedback on the Reddit discussion has been positive, and it seems like an interface of this kind is something that developers want. However, one of the criticisms made by several people is about the drawbacks of extension interfaces. "Acln0" summarized the concerns:
I have only one observation to make, related to extension interfaces and the extension pattern. I am reminded of http.ResponseWriter and the optional interfaces the http package makes use of. Due to the existence of these optional interfaces, wrapping http.ResponseWriter is difficult. Doing it "generically" involves a combinatorial explosion of optional interfaces, and it's easy to go wrong in a way that looks like this: "we added status logging by wrapping http.ResponseWriter, and now HTTP/2 push doesn't work anymore, because our wrapper hides the Push method from the handlers downstream".
Peter Bourgon, a well-known Go blogger and speaker, believes that this use of extension interfaces means that it "becomes infeasible to use the (extremely useful) decorator pattern. That's really unfortunate. To me that makes the proposal almost a non-starter; the decorator pattern is too useful to break in this way." The decorator pattern wraps an interface and adds some functionality. It is often used for logging or authentication middleware in web servers; in the context of filesystems it would likely be used to add a caching or transformation layer. If a middleware author does not take into account the various optional interfaces, the resulting wrapper will not support them. Nick Craig-Wood, author of Rclone, a cloud-storage tool written in Go, likes the proposal but expressed similar concerns: "Extension (or optional as I usually call them) interfaces are a big maintenance burden - wrapping them is really hard".
The design states that "enabling that kind of middleware is a key goal for this draft design", so it would seem wise for the design's authors to tackle this problem head on. Cox hasn't yet proposed a solution, but acknowledged the issue: "It's true - there's definitely a tension here between extensions and wrappers. I haven't seen any perfect solutions for that."
Another concern came from "TheSwedeheart" regarding contexts (the standard way in Go to explicitly propagate timeouts, cancellation signals, and request-scoped values down a call chain): "One thing I'm missing to migrate [his virtual filesystem] over to this is support for propagating contexts to each operation, for cancellation." Cox replied that a library author could "probably pass the context to a constructor that returns an FS with the context embedded in it, and then have that context apply to the calls being made with that specific FS." As "lobster_johnson" pointed out, this goes against the context package's guideline to explicitly pass context as the first function argument, not store a context inside a struct. However, Cox countered with an example of http.Request doing something similar: "Those are more guidelines than rules. [...] Sometimes it does make sense."
There are, of course, the usual bikeshedding threads that debate naming; "olegkovalov" said: "I'm somewhat scared about io/fs name, fs is a good variable name, it'll cause many troubles to the users when io/fs will appear". After some back-and-forth, Cox stressed the need for a short name to keep the focus on application developers rather than on the filesystem implementers:
You're focusing on the file system implementers instead of the users. Code referring to things like os.FileInfo, os.ModeDir, os.PathError, os.ErrNotExist will all now refer canonically to fs.FileInfo, fs.ModeDir, fs.PathError, fs.ErrNotExist. Those seem much better than, say, filesystem.ErrNotExist. And far more code will be referring to those names than implementing file systems.
Embedding files in binaries
The other draft design proposes a way to embed files (or "static assets") in Go binaries and read their contents at runtime. This simplifies releases and deployments, since developers can simply copy around a large binary with no external dependencies (for SQL snippets, HTML templates, CSS and JavaScript assets for a web application, and so on). As the document points out, there are already over a dozen third-party tools that can do this, but "adding direct support to the go command for the basic functionality of embedding will eliminate the need for some of these tools and at least simplify the implementation of others". Including embedding in the standard go tool will also mean there is no pre-build step to convert files to data in Go source code, and no need to commit those generated files to version control.
The authors of the design make it clear that this is a tooling change, not a Go language change:
Another explicit goal is to avoid a language change. To us, embedding static assets seems like a tooling issue, not a language issue. Avoiding a language change also means we avoid the need to update the many tools that process Go code, among them goimports, gopls, and staticcheck.
The go tool already looks for special comments in Go source files for various things, including // +build tags to include certain files only on specific architectures, and //go:generate comments that tell go generate what commands to run for code-generation purposes. This file-embedding design proposes a new //go:embed comment directive that goes directly above a variable declaration and tells go build to include those files in the resulting binary associated with the variable. Here is a concrete example:
// The "content" variable holds our static web server content. //go:embed image/* template/* //go:embed html/index.html var content embed.Files
This would make go build include all the files in the image and template directories, as well as the html/index.html file, and make them accessible via the content variable (which is of type embed.Files). The embed package is a new standard library package being proposed that contains the API for accessing the embedded files. In addition, the embed.Files type implements the fs.FS interface from the filesystem design discussed above, allowing the embedded files to be used directly with other standard library packages like net/http and html/template, as well as any third-party packages that support the new filesystem interface.
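Putting the two drafts together, using embedded files might look like the sketch below. None of this compiles with Go 1.15: both the embed and io/fs packages are only draft designs, and names such as embed.Files come straight from those drafts and could change:

    package main

    import (
        "embed" // proposed by the file-embedding draft design
        "fmt"
        "io/fs" // proposed by the filesystem draft design
    )

    //go:embed html/index.html
    var content embed.Files

    func main() {
        // embed.Files implements fs.FS, so the draft's fs.ReadFile
        // helper can fetch an embedded file's bytes by name.
        data, err := fs.ReadFile(content, "html/index.html")
        if err != nil {
            panic(err)
        }
        fmt.Println(string(data))
    }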
The design limits the scope of the proposal in an important way. There are many ways that the data in the files could be transformed before being included in the binary: data compression, TypeScript compilation, image resizing, and so on. This design takes a simple approach of just including the raw file data:
It is not feasible for the go command to anticipate or include all the possible transformations that might be desirable. The go command is also not a general build system; in particular, remember the design constraint that it never runs user programs during a build. These kinds of transformations are best left to an external build system, such as Make or Bazel, which can write out the exact bytes that the go command should embed.
Again, the feedback on the Reddit thread was mostly positive, with comments like this one from "bojanz": "This looks like a great start. Thank you for tackling this." There are a few minor suggestions, such as a comment by "zikaeroh" in favor of adding a more powerful path-matching API that supports double star for recursive path matching, like glob('**/*.png', recursive=True) in Python. Kevin Burke, who is the maintainer of a file-embedding package, suggested also storing a cryptographic hash of each file's content so the developer does not have to hash the file at runtime: "This is useful for e.g. cache busting on a static file server".
One of the repeated critiques is from developers who don't like overloading source code comments with the special //go:embed syntax. "Saturn_vk" stated bluntly, "I really don't like the fact that comments are being abused for actual work", and Hacker News commenter "breakingcups" strongly advocated for the use of a project file instead of directives in comments:
Again, more magic comments.
The proposed feature is great, but the unwillingness of the Go team to use a separate, clearly defined project file or at the very least a separate syntax in your code file leads them to stuff every additional feature into comments, a space shared by human notetaking.
Cox summed up his thinking about this with the following comment, which compares the syntax with #pragma for C:
For what it's worth, we already have //go:generate and a few other lesser known ones. And there is a separate draft design to replace // +build with //go:build. At that point we will be completely consistent: these kinds of directives begin with //go:. The point is to look enough like a comment to make tools that don't need to know ignore them, but enough not like a comment to signal to people that something special is going on.
C uses #pragma foo for this. Go simply spells #pragma as //go:.
Next up
There is a fair amount of community support for both draft designs, particularly the more user-facing proposal for file embedding. Many developers are already using third-party file-embedding libraries to simplify their deployments, and these efforts will standardize that tooling. It seems likely that the designs will be refined and turned into full proposals. With Go 1.15 due out in early August, it's possible that these proposals would be ready for Go 1.16 (scheduled for six months out), but if there needs to be another round of feedback — for example, regarding the problems with extension interfaces — it is more likely to be included in Go 1.17 in a year's time.
Checking out FreeCAD
Our look at running a CNC milling machine using open-source software led me to another tool worth looking at: FreeCAD. I wasn't previously familiar with the program, so I decided to check it out. In this article I will walk through my experiences with using FreeCAD for the first time to do a variety of CNC-related tasks I normally would have used a commercial product for. I had varying degrees of success in my endeavors, but in the end came away with a positive opinion.
FreeCAD is an LGPLv2+ licensed CAD and CAM program written in Python and C++. The first release of the project was in 2002, and its most recent stable version, 0.18.4, was released in October 2019. The project's GitHub page indicates that it has 271 contributors, with new commits happening often (generally more than 50 a week). Beyond code contributions, FreeCAD has a welcoming community with active forums to answer any questions users might have along the way. FreeCAD is designed to be cross-platform, supporting Linux, macOS, and Windows, with binary releases provided by the OS-independent package and environment management system Conda.
I decided to take on a relatively simple CNC project: milling a new street-address sign for my home. The plan called for a 700mm x 150mm sign, and I decided to mill it out of a plank of maple wood. The design I have in mind is pretty straightforward, so it should be a great way to put FreeCAD through a test on a real project. I also looked at using FreeCAD for taking existing models that are available online with an open license and importing them for milling (in this case, a wooden spoon).
It is worth noting that before this effort I had never used FreeCAD before. My personal goal is to become fluent enough with FreeCAD that I can replace my dependence on the commercial CAD software I presently use in my design work. The goal of this article, however, is to share what my experience with FreeCAD was, and provide a glimpse of FreeCAD from the perspective of an inexperienced user.
Getting FreeCAD
FreeCAD is available for download on the FreeCAD project website, along with development builds for the upcoming 0.19.0 release. One thing I learned as I went, however, was that I was much better off using development snapshots than I was using the pre-compiled binaries from the releases. The stable release worked fine, but I quickly realized that features I would have expected to be in the project only exist in 0.18.4 as "experimental" features requiring special effort to enable. Specifically, 3D surface-path generation was a key technology I simply had to have to consider a switch. For unfamiliar readers, this feature allows you to carve a model's outer shell out of a piece of stock material using a CNC — in a way similar to carving a statue out of a block of marble. I discovered on the forums that the Path Workbench for FreeCAD is under active development for 0.19, including 3D surface-paths, and "many bug fixes and new features" related to path generation are in the upcoming release.
Thankfully, FreeCAD provides a pretty easy-to-use package system to install regular development snapshots. With a few commands and a little patience, I was able to get a development-tree build of FreeCAD 0.19 running on my laptop with little hassle.
My first test: a spoon
Before I decided to try making something from scratch using FreeCAD, I wanted to try a pretty common workflow in the open-source maker world: download a model someone else has created, and make it. For me this meant taking an STL file I had handy for a spoon and trying to use FreeCAD to design the tool paths to carve it out. As simple as a spoon is, carving one on a CNC is a fairly complicated effort and a good test of the project's capabilities. To succeed, you have to carve the top of the wood stock, flip the wood over, and then carve the bottom as a separate job. This experiment introduced me to several of FreeCAD's features: importing the STL, editing the model, setting up a CNC job, building tool paths for both the top and bottom, and finally generating G-code. Regrettably, my experiment did not work as well as I would have hoped since I ran into multiple problems.
The first problem I encountered was an overwhelming amount of somewhat disorganized (or even incorrect) documentation. The FreeCAD project has a pretty massive wiki dedicated to documentation on using the application. For someone who knew nothing (me), it took a great deal of reading and effort just to get any sense at all of how to accomplish things that are fairly straightforward in other tools. One of the places I found the documentation most lacking was a clear explanation of the multitude of FreeCAD workbenches. These workbenches are at the heart of any design work being done in FreeCAD, but conceptually their function and interoperability with each other were not clearly explained. As a result, getting to the point where I had a basic understanding to even attempt to make my spoon was more difficult than I had otherwise hoped.
When trying to use FreeCAD, readers should be prepared to spend a considerable amount of effort gaining an understanding of the program's foundational concepts, how those concepts map to the workbenches, and to struggle early on doing even simple tasks. For example, if the goal is resizing an STL model imported via the Mesh Workbench, you will find that the tools to do so don't exist in that workbench. Instead, the imported mesh must first be converted to a "shape", then a "solid" using the Part Workbench. Once it is a solid, you must create a clone of it using the Draft Workbench (strangely billed as a tool for 2D work). This clone, once created through these many steps, then can finally be resized as needed. Coming from previous experience of clicking on an STL and then simply resizing it at will, I found the FreeCAD process to do the same felt incredibly burdensome.
After a good amount of searching online, watching videos, and reading forum posts, I did succeed in importing the STL file of my spoon. Ultimately I roughly followed the process outlined here. That said, I quickly learned that FreeCAD is considerably less capable than other options when it comes to working with complex STL meshes; the processing time between actions taken in my tests was often measured in minutes. This is, for example, compared to the much simpler CAMLab browser-based CAM system that was able to load and work with the same model in seconds. It's unclear exactly where the performance issues are. Ultimately I was able to side-step the issue by reducing the complexity of my STL spoon model using online tools — at the cost of detail in the final output.
In the end, I abandoned my attempt to make a spoon after I encountered a problem with the 3D surface-path tool and decided to move on to my more straightforward street-address sign. In my questions to the FreeCAD forum, another user indicated they weren't able to reproduce the problem I experienced, saying "I suspect you had been working on the same file for awhile, and eventually 'stuff' builds up." Working on a file for long periods of time shouldn't break things, but when working from a development branch of the project, those sorts of issues are to be expected. Despite the problems I had, many of which can certainly be chalked up to my lack of understanding, it is worth noting that the FreeCAD community was both engaged and extremely helpful to me as a newcomer. I will certainly be revisiting my quest for a milled-at-home cooking spoon using FreeCAD in the future to see if perhaps a fresh start with more knowledge might fix my problems.
Take two: a street-address sign
For my second experiment, I decided to start from scratch and try to make a street-address sign for my home. Unlike the spoon experiment, which required complex tool paths over a curved 3D surface (on two sides), my sign design was much more straightforward; the sign only needed to be milled on one side to create a relief of the street-address text in the center.
This project allowed me to try to use the CAD portions of FreeCAD to make this simple 3D model. The basic steps to build the model were: create a properly-sized 3D-rectangle 19mm thick (the size of my board), create an indentation (or "pocket") with a 10mm border, and then add the raised address centered in the middle.
Again, as was the case with trying to import an STL file, I found myself jumping back and forth between multiple workbenches to design my model. I was able to follow this tutorial to get an understanding of how most of the steps were done and to apply them to my specific project. While I ultimately did succeed in creating my model, doing so felt much more complicated than my experiences with other tools. For the sign, here's what I ended up with (replacing the real street-address with "LWN").
Everything in FreeCAD is stored as an object with properties in a hierarchy of dependencies, and the various workbenches allow you to modify different aspects of that hierarchy. When you modify a model, the changes propagate throughout the hierarchy. This makes building models more complicated up front, but also makes them quite flexible once they are designed.
For example, once my sign was complete, changing my real street-address to "LWN" for the screenshot above required only modifying a single text property and adjusting the alignment to re-center it in the sign. Experienced FreeCAD users may point out that I would have also been able to use constraints to auto-center my text, but in my case, I was satisfied to simply eyeball it. Had I used constraints, I likely would have only needed to modify the text to create an entirely new sign properly centered. This approach to modeling, while complicated, is one of the more important conceptual features of FreeCAD: because models are built up based on properties of other models in a hierarchy, it is possible to create a "sign" model that can be quickly changed for the need at hand (different text, different size, etc.). If I were designing a sign to sell, for example, it would be a huge time-saver to simply modify one property with a new address to make that new sign.
Path generation
With the model made, it was time to try using the FreeCAD path-generation tooling via the Path Workbench. This is, as previously mentioned, a focus of active development by the FreeCAD community for the 0.19 release.
The Path Workbench's primary purpose is to generate tool paths based on a 3D-model for my CNC router to follow as it cuts the wood. Tool paths come in various types, and which of them makes sense to use depends on many different variables. Based on the model I was attempting to mill, I selected two primary tool path types: pocket paths and profile paths. Pocket paths cut a "pocket" into the material (such as the two empty spaces in the number "8"), and profile paths trace the edge of a model — primarily used to cut the model out of a piece of stock.
As is common with other CAM software packages, FreeCAD provides a tool manager where users can store the details of the various cutting end mills they have available for their CNC. These tools can then be imported into a job object, to be used during tool path operations that are to be generated. For my project (the name of my street has no closed-loops), I needed to create five separate pocket tool path operations — four for the street numbers "9", "8" (two pockets), and "4" in my address, and one for the large pocket of the sign itself (the rest of the street address).
I also needed to create a profile tool path operation to cut my sign out of the larger wood stock it was being carved from. When cutting out a model using this sort of method, it is common to create "tabs" that keep the thing being cut out attached to the stock. Without them, as the part was cut out it would begin to loosen or detach completely from the stock in the middle of cutting it. For FreeCAD, this again highlights the object-hierarchy philosophy of the tool. First, the profile-tool path object is created without these cut-out tabs; then the profile-path object is replaced with what FreeCAD calls a "dressup", based on the original profile-path object as a dependency. There are many different dressups available for a profile-tool path; I used the "tag" dressup. This modification to the profile adds cut-out tabs at reasonably spaced locations on the model. Here is the address sign, complete with all of the generated tool paths needed to carve it out from the stock wood, using "LWN" as the sign text.
G-code generation and simulation
Once all the tool paths are created, it is now time to take those tool paths and convert them to a language the CNC controller will understand: G-code. For FreeCAD, this is one of the most impressive aspects of the project I have explored so far. According to the documentation, FreeCAD represents paths in an internal format of G-code and then converts that format into a specific G-code dialect used by a machine in a process the project calls "post-processing". While in this internal state, however, it can be simulated within FreeCAD to see exactly how the material will be removed from the stock. The simulations help when trying to make sure that the way the CNC will cut things is exactly what you expect it to be — before actually cutting anything. In my project, I frequently ran the simulator to test different settings in my tool paths to find the ones that worked best for the job. Various G-code simulation tools exist, but having one built in and just a click away was nice.
When it is time to export the job to G-code to be sent to the CNC for actual cutting, FreeCAD supports eleven different G-code dialect post-processors along with the ability to add your own. These post-processors take the internal G-code format of FreeCAD and translate it into a G-code dialect compatible with the controller being used. In my case, my CNC runs Grbl on the controller, which is supported by FreeCAD.
Wrapping up
In the end, I would consider the experiments using FreeCAD as my go-to CAD and CAM tool to be a success. While I have various gripes, in fairness I discovered what the project freely admits: FreeCAD's "learning curve can be steep". Part of the reason that the program is difficult to learn is that it has so many capabilities, being suitable for a large array of varying CAD tasks. There are many other features we simply don't have time to cover, such as writing Python scripts to perform tasks, creating architectural diagrams, and doing 2D drafting. With all of these features and complexity, it's easy to get stuck, but the active FreeCAD community proved itself welcoming and willing to help when that happened to me. For those of us who seek out open-source alternatives, FreeCAD is certainly worth a look if you do any modeling work.
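As a small example of the scripting capability mentioned above, FreeCAD's built-in Python console can drive the same modeling operations as the GUI. A minimal sketch using documented parts of FreeCAD's scripting API (the dimensions are simply those of my sign blank):

    import FreeCAD
    import Part

    # Create a document and a 700 x 150 x 19 mm board, the blank that
    # the street-address sign was milled from.
    doc = FreeCAD.newDocument("Sign")
    board = Part.makeBox(700, 150, 19)
    Part.show(board)
    doc.recompute()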
Page editor: Jonathan Corbet
Inside this week's LWN.net Weekly Edition
- Briefs: LF Open Source Security Foundation; Linux 5.8; Grub2 update woes; Julia 1.5; LibreOffice 7; systemd 246; Quotes; ...
- Announcements: Newsletters; conferences; security updates; kernel patches; ...