Leading items

Welcome to the LWN.net Weekly Edition for July 2, 2020

This edition contains the following feature content:

The (non-)return of the Python print statement: Python's founder proposes a significant syntax change.
Four years of Zephyr: an overview of the Zephyr system and its history so far.
Emulating Windows system calls in Linux: several options for helping Wine handle Windows system calls.
Stirring things up for Fedora 33: the next Fedora release could have a number of significant changes.
First PHP 8 alpha released: what's coming in the next major PHP release.
Managing tasks with todo.txt and Taskwarrior: a tour of a couple of text-oriented to-do list managers.
Generics for Go: after years, Go may finally get generic types.

This week's edition also includes these inner pages:

Brief items: Brief news items from throughout the community.
Announcements: Newsletters, conferences, security updates, patches, and more.

Please enjoy this week's edition, and, as always, thank you for supporting LWN.net.

The (non-)return of the Python print statement

By Jake Edge
July 1, 2020

In what may have seemed like an April Fool's Day joke to some, Python creator Guido van Rossum recently floated the idea of bringing back the print statement—several months after Python 2, which had such a statement, reached its end of life. In fact, Van Rossum acknowledged that readers of his message to the python-ideas mailing list might be checking the date: "No, it's not April 1st." He was serious about the idea—at least if others were interested in having the feature—but he withdrew it fairly quickly when it became clear that there were few takers. The main reason he brought it up is interesting, though: the new parser for CPython makes it easy to bring back print from Python 2 (and before).

Prior to Python 3, the print statement was the usual way to print output to the screen:

    >>> print '1 + 2 = ', 1+2
    1 + 2 =  3

But Python 3 changed print from a statement to the print() function. Of the changes for Python 3, switching to print() was perhaps one of the easiest, but it still led to a fair number of complaints. It was rather straightforward for Python 2 code to adopt the new behavior using:

    from __future__ import print_function

But the change did break a lot of working code, so perhaps it makes sense to bring back the print statement, Van Rossum said.

The new parser is based on a parsing expression grammar (PEG) and it was a fairly simple matter to make the change: "One thing that the PEG parser makes possible in about 20 lines of code is something not entirely different from the old print statement." He has a prototype working, but it enables far more than just print statements:

But wait, there's more! The same syntax will make it possible to call *any* function:

    >>> len "abc"
    3

Or any method:

    >>> import sys
    >>> sys.getrefcount "abc"
    24

Really, *any* method:

    >>> class C:
    ...   def foo(self, arg): print arg
    ...
    >>> C().foo 2+2
    4

He noted that there are some downsides too, including a "bare" print statement being interpreted as the print() function and not a call to it. Potentially more problematic is the behavior when the first argument to the print statement uses parentheses.

    >>> print (1, 2, 3)
    1 2 3

[...]

    >>> print (2+2), 42
    4
    (None, 42)

Those examples may be puzzling to some readers. In the first, it might seem that a tuple is being passed to the print statement, but the space is not significant; the parser sees a call to the print() function with three arguments. In the second, the same behavior can be seen in Python 3 with "print (2+2), 42". The Python read-eval-print loop (REPL) evaluates print first (either as a statement or a function), which results in the output of "4"; it then prints the result of the whole statement, which is a tuple containing the return value of print (i.e. None) and 42.
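Both behaviors can be checked in today's Python 3 with a short script. This is only a sketch of the existing print() function's behavior, not of the proposed syntax; the output is captured in a buffer so that it can be inspected afterward.

```python
import io
from contextlib import redirect_stdout

captured = io.StringIO()
with redirect_stdout(captured):
    # The space after "print" is not significant: this is a call with
    # three arguments, not a call with a single tuple argument.
    print (1, 2, 3)
    # Parsed as the tuple (print(2+2), 42); print() itself returns None.
    result = print (2+2), 42

printed = captured.getvalue()
```

Here printed holds the two lines of output ("1 2 3" and "4"), while result is the tuple that the REPL would have echoed.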

Currently, the parser makes a special effort to recognize code that is trying to use the print statement, so that it results in a SyntaxError that suggests adding parentheses ("Did you mean print('hello world')"). That code could be removed if print were resurrected. Van Rossum said that it was not an "all or nothing" proposal; it could be dialed back somewhat by restricting what kinds of function calls it would work for or restricting it only to "print". He also noted that he would withdraw the idea "if the response is a resounding 'boo, hiss'".

It may not have been resounding, but the response was definitely mostly of the "boo, hiss" variety (including some that used that phrase directly, of course). Ethan Furman said that while too many parentheses made code hard to read for him, too few is also problematic, so he was not in favor of any change. Naomi Ceder had mixed feelings; she has struggled to switch to the print() function over the last five years (after 15 years of print without parentheses). "As someone who teaches Python and groans at explaining exceptions, I'm -0 on print without parens and -1 on other calls without parens."

Gregory P. Smith agreed with Ceder, but took it further ("-1 overall for me..."), even though Smith liked the print statement for Python 2 and earlier. Beyond just print, though, he is concerned that calling functions without requiring parentheses will just lead to "a whole new world of typos and misunderstandings". Other languages allow that kind of thing, but Python is not those languages. "I love that the new parser allows us to even explore these possibilities. We should be very cautious about what syntax changes we actually adopt."

Overall, Smith's sentiment seemed popular; there were some who viewed parts of the idea favorably, but the full-blown proposal, including making any kind of call without parentheses, was not well-liked. For his part, Van Rossum said there was an element of "because we can" to the idea; he was pleasantly surprised at how easy it was to make it work with the new parser. He described why the PEG parser made things so much easier in his initial message:

[...] The PEG parser makes this much simpler, because it can simply backtrack -- by placing the grammar rule for this syntax (tentatively called "call statement") last in the list of alternatives for "small statement" we ensure that everything that's a valid expression statement (including print() calls) is still an expression statement with exactly the same meaning, while still allowing parameter-less function calls, without lexical hacks. (There is no code in my prototype that checks for a space after 'print' -- it just checks that there's a name, number or string following a name, which is never legal syntax.)
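The "last alternative" idea can be illustrated with a toy in Python. This is not Van Rossum's prototype, nor CPython's grammar: the parse_small_stmt() helper and its regular expression are hypothetical. The sketch hands the input to the ordinary parser first; only if that fails does it fall back to treating "name argument" as a call.

```python
import ast
import re

# Hypothetical pattern for a "call statement": a (dotted) name, whitespace,
# then the rest of the line as the argument list.
CALL_STMT = re.compile(r"^\s*([A-Za-z_][\w.]*)\s+(.+)$")

def parse_small_stmt(source):
    """Toy ordered choice: ordinary Python first, 'call statement' last."""
    try:
        # Everything that is already valid keeps exactly its old meaning.
        return ast.parse(source, mode="exec")
    except SyntaxError:
        pass
    match = CALL_STMT.match(source)
    if match is None:
        raise SyntaxError("not a statement: %r" % source)
    func, args = match.groups()
    # Fallback alternative: rewrite 'f x' as 'f(x)' and parse that.
    return ast.parse("%s(%s)" % (func, args), mode="exec")

paren_call = parse_small_stmt("print (1, 2, 3)").body[0].value
bare_call = parse_small_stmt('len "abc"').body[0].value
```

Because the fallback is only tried after the normal parse fails, "print (1, 2, 3)" remains a three-argument call, while 'len "abc"' becomes a call to len() with one string argument. A real PEG parser does this within the grammar itself rather than by reparsing rewritten text.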

But Van Rossum said that he would "happily withdraw" the idea since it did not seem to be gaining any real traction. It seems likely that the idea came out of the blue for many—there were multiple good reasons to switch to a print() function as part of the Python 3 transition, after all. But the proposal does serve another purpose: it allows the CPython core developers to see the scope of the types of changes the PEG parser is bringing to the table. That may well open up some interesting features moving forward.

Four years of Zephyr

June 29, 2020

This article was contributed by Martí Bolívar and Carles Cufí

The Zephyr project is an effort to provide an open-source realtime operating system (RTOS) that is designed to bridge the gap between full-featured operating systems like Linux and bare-metal development environments. It's been over four years since Zephyr was publicly announced and discussed here (apparently to a bit of puzzlement). In this article, we give an update on the project and its community as of its v2.3.0 release in June 2020; we also make some guesses about its near future.

The authors are both Zephyr developers working for Nordic Semiconductor; Cufí was the release manager for the v2.3.0 release.

A Zephyr primer

While Zephyr can scale up to much larger systems, a typical target for the RTOS is a microcontroller without a memory-management unit (MMU) that has a sub-100MHz CPU, 512KB or less of on-chip NOR flash memory, and 32 to 256KB of built-in static RAM. Like the Linux kernel, Zephyr is configurable using the Kconfig language, and uses devicetree to describe hardware.

Unlike other RTOS choices, Zephyr is much more than a kernel. It's an RTOS with "batteries included". The project aims to provide all the software needed to develop, release, and maintain a firmware application. This includes a toolchain with compilers as well as flash and debug tools. The zephyr repository includes the kernel, protocol stacks, drivers, filesystems, and more. Upstream Zephyr also includes code from dozens of other third-party projects pulled in by a tool called "west". These include cryptographic libraries, hardware-abstraction layers (HALs), protocol stacks, and the MCUboot bootloader.

Zephyr is a library operating system. Its build system combines the application, kernel, and any additional code into a single, statically linked executable, all in a single address space (most microcontrollers do not have MMUs anyway). Each image targets a specific board, and is typically executed in-place from flash. Because of this, almost all configuration is done at compile time. Buffers are statically pre-allocated. The entire devicetree is accessible at build time, so a fixed set of devices and drivers are compiled in. This minimizes image size and maximizes information stored in flash, thus saving precious RAM.

Some features are tailored toward the fact that Zephyr can't self-host, so Zephyr developers build the whole "distribution" alongside the application. For starters, it has a cross-platform build and configuration system based on CMake and Python. This runs natively on Linux, macOS, and Windows, thanks in part to the fact that both Kconfig and devicetree are handled in Python 3 instead of the original Unix-only tools. This is critical since Windows is the most widely used OS for microcontroller firmware development.

More details

The Zephyr kernel supports multiple architectures and scheduling algorithms. There are cooperative and preemptive threads, along with facilities for reducing interrupt latencies and guaranteeing the execution of key threads. An optional user mode can use the Memory Protection Units (MPUs) typically present in microcontrollers to isolate and sandbox threads or groups of threads from one another and the kernel.

Zephyr supports six major architectures (x86, Arm, ARC, NIOS II, Xtensa, and RISC-V) and also runs in emulation. Both 32- and 64-bit processor support exists for some architectures. Within the Arm architecture, the emphasis has been on the usual 32-bit Cortex-M cores, but experimental support for Cortex-R and Cortex-A (including 64-bit Cortex-A) exists and continues to improve. Beyond "real hardware," Zephyr runs on QEMU, and as an ELF executable. It supports a simulated radio, which can save time and expense when testing and debugging radio frequency (RF) issues. In all, there are upstream support files for over 200 "boards".

Zephyr has logging and shell subsystems. These have configurable transports, including traditional serial ports (for both) and over the network (for logging). Logging is optionally asynchronous; in this case, a separate thread actually sends log messages. The logging calls themselves post compact messages to a queue, which can be done quickly, so logging can be done even from within interrupt context.
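The deferred-logging pattern described here can be modeled in a few lines of Python. This is only a sketch of the general idea; Zephyr's logging subsystem is written in C and its actual API differs. The names log(), log_thread(), and the "sent" list (standing in for a serial or network transport) are all made up for illustration.

```python
import queue
import threading

log_queue = queue.Queue()
sent = []  # stands in for the serial or network transport

def log(fmt, *args):
    """Fast path: queue the format string and raw arguments, format later."""
    log_queue.put((fmt, args))

def log_thread():
    """Slow path: format and 'send' messages outside the caller's context."""
    while True:
        item = log_queue.get()
        if item is None:  # sentinel: stop the worker
            break
        fmt, args = item
        sent.append(fmt % args)

worker = threading.Thread(target=log_thread)
worker.start()
log("sensor %d read %d", 3, 42)
log("link up")
log_queue.put(None)
worker.join()
```

The caller only pays for putting a small tuple on the queue; the string formatting and the (potentially slow) output happen in the worker thread.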

Hardware-specific APIs are built around a lightweight device driver model that is tightly integrated with the kernel. It supports a wide range of peripherals and sensors under this common model. Multiple storage options are available. These range from basic key-value storage optimized for NOR flash to filesystems.

Zephyr's batteries also include various communication stacks. Its networking stack has BSD-like socket APIs and supports various protocols. Zephyr has a fully open-source Bluetooth Low Energy (BLE) protocol stack that runs on multiple hardware devices. It also has an 802.15.4 stack and supports the Thread protocol. Controller Area Network (CAN) and USB device communication are supported out of the box. Zephyr additionally supports firmware upgrades over many of these transports.

But why, though?

One common question about Zephyr is "why?" Why was it started, when there are so many RTOS choices already?

Microcontrollers have become more powerful; they include more memory, cores, and security features, and must support more complex protocols and communication stacks. Firmware complexity has increased markedly over the years. As a result, homegrown frameworks supplied by a single silicon vendor, whether built directly against bare metal or a minimalistic kernel, are increasingly difficult to develop and maintain. Zephyr aims to be a scalable solution to this problem, developed cooperatively in the open, that integrates all its functionality around a common set of APIs and primitives.

Zephyr is being used by various parts of its community at this point. As readers might guess, Zephyr's members are likely to be using it. Some silicon vendors also have Zephyr support. For example, supported hardware is available from members NXP and Nordic Semiconductor. STMicro has also contributed code for some SoCs and sensors, and has been responsive to users. Various other architectures and boards are supported. Finally, there is an active open-source community, which we'll get into a bit more in the next section.

Who's behind all of this?

Zephyr is a Linux Foundation project with a paid membership and governance structure. It is, however, developed openly. Anyone may use and contribute to Zephyr without becoming a member — many do.

The project's members include silicon vendors, device makers, engineering organizations, and others. Funding is used, among other things, to pay for continuous integration, web hosting, marketing, and other infrastructure. Members receive voting seats on the project's Technical Steering Committee (TSC), and its Governing Board is composed of members only.

Given that, it might be tempting to look at the project as simply a pay-to-play industry organization, but that wouldn't be a fair assessment. At least where the code and other technical contributions are concerned, Zephyr is a true open-source project. Patches and reviews are open to all. The core Zephyr repository is Apache 2.0 licensed, and contributors retain copyright over their contributions. In the v2.3.0 release, about one in four commits came from emails that weren't from member-owned domains.

The project's meetings, including the TSC's, are largely open for anyone to attend and participate in (alas, using a proprietary videochat platform) and have open minutes that are sent to the mailing lists.

Zephyr development takes place on GitHub. Patches are sent as pull requests and non-security bugs are tracked as GitHub issues. Subsystem maintainers have overall responsibility for their areas. Maintainers must be approved by the TSC, but maintainership is open to all, not just project members, and there are maintainers who are not from member organizations.

The project has open user and developer mailing lists with searchable archives; chat is currently done via Slack. See this help page for more details.

Wrapping up

Beyond expanding and improving what it currently supports, Zephyr will not stay still when it comes to adding features. In particular, the next release, which is currently scheduled for September, is targeted to include the following: Bluetooth advertising extensions, a thorough rework of its device model, improvements for MMU-based systems such as demand paging, initial support for a new TCP stack, and improved toolchain support for both proprietary toolchains and LLVM.

As we've seen, Zephyr's niche lies somewhere between bare-metal and full-featured operating systems. While relatively young in its current form, Zephyr is moving quickly and has accomplished much. The project has broad ambitions, and aims to provide a soup-to-nuts menu for firmware development. With that, we hope to have resolved any remaining puzzlement, and perhaps to see some more LWN readers saying hello and giving Zephyr a try.

Emulating Windows system calls in Linux

By Jonathan Corbet
June 25, 2020

The idea of handling system calls differently depending on the origin of each call in the process's address space is not entirely new. OpenBSD, for example, disallows system calls entirely if they are not made from the system's C library as a security-enhancing mechanism. At the end of May, Gabriel Krisman Bertazi proposed a similar mechanism for Linux, but the objective was not security at all; instead, he is working to make Windows games run better under Wine. That involves detecting and emulating Windows system calls; this can be done through origin-based filtering, but that may not be the solution that is merged in the end.

To run with any speed at all, Wine must run Windows code directly on the CPU to the greatest extent possible. That must end, though, once the Windows program makes a system call; trapping into the Linux kernel with the intent of making a Windows system call is highly unlikely to lead to good results. Traditionally, Wine has handled this by supplying its own version of the user-space Windows API that implemented the required functionality using Linux system calls. As explained in the patch posting, though, Windows applications are increasingly executing system calls directly rather than going through the API; that makes Wine unable to intercept them.

The good news is that Linux provides the ability to intercept system calls in the form of seccomp(). The bad news is that this mechanism, as found in current kernels, is not suited to the task of intercepting only system calls made from Windows code running within a larger process. Intercepting every system call would slow things down considerably, an effect that tends to make gamers particularly cranky. Tracking which parts of a process's address space make Linux system calls and which make Windows calls within the (classic) BPF programs used by seccomp() would be awkward at best and, once again, would be slow. So it seems that a new mechanism is called for.

The patch set adds a new memory-protection bit for mmap() called PROT_NOSYSCALL which, by default, does not change the kernel's behavior. If, however, a given process has turned on the new SECCOMP_MODE_MMAP mode in seccomp(), any system calls made from memory regions marked with PROT_NOSYSCALL will be trapped; the handler code can then emulate the attempted system call.
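The decision the kernel would make under this proposal can be modeled in user space. The following is only a toy model of the semantics described above, not kernel code and not an API: the Region class and the dispatch() function are hypothetical, and real address ranges would come from the process's memory map.

```python
from dataclasses import dataclass

@dataclass
class Region:
    start: int        # first address in the mapping
    end: int          # one past the last address
    nosyscall: bool   # models the proposed PROT_NOSYSCALL bit

def dispatch(regions, mmap_mode_enabled, caller_addr):
    """Return 'trap' if a system call from caller_addr would be trapped."""
    if not mmap_mode_enabled:
        # Without SECCOMP_MODE_MMAP, the new bit changes nothing.
        return "native"
    for region in regions:
        if region.start <= caller_addr < region.end and region.nosyscall:
            return "trap"
    return "native"

regions = [
    Region(0x1000, 0x2000, nosyscall=True),   # e.g. Windows game code
    Region(0x2000, 0x3000, nosyscall=False),  # e.g. Wine and the C library
]
```

In this model, a call originating at 0x1800 is trapped for emulation once the mode is enabled, while a call from 0x2800 goes straight to the kernel.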

The cover letter notes that one should not rely on this mechanism the way OpenBSD uses its origin verification:

It goes without saying that this is in no way a security mechanism despite being built on top of seccomp, since an evil application can always jump to a whitelisted memory region and run the syscall. This is not a concern for Wine games.

seccomp() is used for this non-security feature, the text continues, because the alternative would be to duplicate much of its functionality.

The patch series generated a fair amount of discussion from developers who were not entirely comfortable with this mechanism. Kees Cook, for example, asked whether it would instead be possible to rewrite the Windows binary code at load time, replacing system calls with calls to the emulation functions. The answer, it seems, is "no". Modifying a game's code is likely to set off checks made to defeat cheaters, who also would otherwise make code modifications of their own. Wine developer Paul Gofman added that, to make such changes, Wine "would need some way to find those syscalls in the highly obfuscated dynamically generated code, the whole purpose of which is to prevent disassembling, debugging and finding things like that in it".

Matthew Wilcox, instead, suggested that the personality() mechanism could be extended to support a Windows personality. This, essentially, would create a new system-call entry point that would emulate the Windows calls. Gofman replied that this approach had been considered, but that the cost of executing the personality() call on each transition between Linux and Windows code would be too high. A possible solution here is to implement a special personality that looks at a flag, stored in user-space memory, to determine how system calls should be handled. Gofman offered to create a Wine patch using such a mechanism if an implementation existed; Krisman said that he would give it a try.

Andy Lutomirski had a couple of other suggestions, the first of which was a prctl() operation that would redirect all system calls through a user-space trampoline. System calls from the trampoline itself would be executed normally. In Wine's case, that trampoline could emulate system calls from Windows code while passing Linux system calls through to the kernel. Krisman indicated interest in this approach, and may implement a version of this idea as well.

Lutomirski's other idea was to allow a process to establish an (extended) BPF filter program for all system calls; he later extended this idea to have it handle all "architectural privilege transitions" for the process. This approach offers a lot of flexibility and may be useful far beyond Wine, but it suffers from a significant flaw: in the absence of unprivileged BPF, it could only be invoked by a privileged process, which is a show-stopper for Wine. Unless something changes, unprivileged BPF is an idea that isn't going anywhere in Linux, so the filter program does not look like a solution that Wine could use.

The end result of this discussion is that the problem is reasonably well understood and there is a shared desire to solve it. What form that solution will take is far from clear, though; there are a few approaches that need to be experimented with. Expect to see more patches in the future as the developers work to find which idea works best.

Stirring things up for Fedora 33

By Jonathan Corbet
June 29, 2020

The next release of the Fedora distribution — Fedora 33 — is currently scheduled for the end of October. Fedora's nature as a fast-moving distribution ensures that each release will contain a number of attention-getting changes, but Fedora 33 is starting to look like it may be a bit more volatile than its immediate predecessors. Several relatively controversial changes are currently under discussion on the project's mailing lists; read on for a summary.

The end of mod_php

For many years, the mod_php Apache module was the preferred way to run PHP code in response to web requests. But that was many years ago; the recommended module now is php-fpm. Reasons for switching include support for threaded modules, the ability to work with nginx, and removal of the PHP interpreter from the web server's address space. Fedora has supported both modules for some time, but is currently planning to remove mod_php and support only php-fpm as of Fedora 33.

Unsurprisingly, a module with as much history as mod_php still has a few users, a couple of whom made themselves heard in the mailing-list discussion. For example, John M. Harris Jr. said:

The default doesn't matter, there's absolutely no reason to take away the sysadmin's choice here. There are at least 40 servers I personally am responsible for where I see no reason to move from mod_php to php-fpm, for example.

Neal Gompa responded:

As an (oft forgotten!) member of the PHP SIG, I'd like to point out that part of what we're supposed to do is ship a PHP stack that minimizes footguns and security nightmares. The PHP stack doesn't make this easy as it is, and it's been relatively well-known for a while now that running the interpreter in the same process as the webserver is a bad idea. [...]

It is absolutely a good idea to take away this choice from our builds when the advice from framework developers, ecosystem experts, and the upstream developers is to *not* use mod_php.

Unless some higher level of outcry manifests, it seems likely that there is nowhere near enough opposition to derail this plan. Fedora users who are running mod_php will want to look at moving over to php-fpm in the near future.

No more swap partition

Fedora normally installs itself with a modest swap partition to take pressure off of memory. This proposal, however, would do away with the swap file and, instead, set up a virtual swap device using zram. The zram device works by compressing memory contents and storing them back in memory; if the contents of RAM compress well, swapping them to zram can free up considerable amounts of space. Additionally, zram is often faster than swapping to a storage device, even when considering that the CPU must do the compression and decompression.
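How much space is freed depends heavily on what is in the pages being swapped. The sketch below gives a rough feel for that, using Python's zlib as a stand-in for whatever compressor zram is configured with (an assumption; the algorithms differ, so the exact numbers will too). The 4096-byte page size and the sample data are also assumptions.

```python
import os
import zlib

PAGE_SIZE = 4096  # assumed page size

def compressed_size(page):
    """Size in bytes of one page after compression with zlib."""
    return len(zlib.compress(page))

# A page full of repetitive text compresses very well...
text_page = (b"GET /index.html HTTP/1.1\r\n" * 200)[:PAGE_SIZE]
# ...while random (or already-compressed) data does not compress at all.
random_page = os.urandom(PAGE_SIZE)

text_size = compressed_size(text_page)
random_size = compressed_size(random_page)
```

The repetitive page shrinks to a small fraction of its original size, while the random page stays at roughly a full page; a workload dominated by the latter kind of data would see little benefit from compressed swap.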

This proposal created a fairly long discussion thread, most of which made little progress toward a considered decision. Participants seemed generally in favor, though a few were not fully convinced that the change made sense. One objection that was raised is that a system with no swap file cannot be hibernated; according to Chris Murphy, who is the developer behind this proposal, that is not a big concern:

Most laptops today have UEFI Secure Boot enabled by default and therefore hibernation isn't possible. And even when the laptop doesn't have Secure Boot enabled, there's a forest of bugs. It works for some people and not others

One other question that came up is what should happen to an existing swap file when an older Fedora system is upgraded. The obvious solution is to just leave that arrangement in place, but Murphy also made the case for removing the swap file and switching the system to zram; to do otherwise, he said, would fragment the user base. How that detail will be worked out remains to be seen, but swap-on-zram as a whole looks set to go forward.

Compiler policies

Longstanding Fedora policy says that the entire distribution is to be built with the GCC compiler, with exceptions only for packages that cannot be built with GCC. The plan for Fedora 33 is to get rid of that requirement. Instead, the compiler used for any given package would be the one preferred by upstream (assuming that the upstream for that package has expressed a preference, of course). That would open the way for building a number of LLVM-preferring packages without maintainers having to struggle to get a working build with GCC.

Compilers tend to evoke emotional reactions, and that was the case here. Kevin Kofler opposed the change, saying: "We have a system compiler for a reason". Some, such as Jakub Jelinek, pointed to ABI incompatibilities and worried about introducing subtle bugs into the distribution, but the prevailing view seems to be that such incompatibilities are rare and should be treated as bugs when they are found. Gompa stated that moving OpenMandriva to LLVM hurt both performance and security, but did not offer specifics.

Jeff Law, who is driving this change, said that using the upstream-preferred compiler would make life easier for a lot of Fedora package maintainers. He also asserted that "packages in Fedora should be as close to upstream as possible", and that the choice of toolchain is a part of that. This change, too, seems likely to make it through to the Fedora 33 release.

Default editors

By far the biggest thread (so far) was, entirely predictably, provoked by this proposal to set the default editor on Fedora systems to nano. Not everybody understands the implications of changing which toolchain is used to build a given package, but everybody knows what their favorite editor is and is usually willing to tell others about it. Fedora currently does not set a default editor, which means that users typically end up in vi (or vim), which is not the friendliest choice for people who are not familiar with that editor's quirks. Picking nano would give new users a fighting chance at figuring out how to exit the editor, at least, and maybe even (intentionally) making a change or two first.

The thread was long, but a summary need not be. Opinions varied from strong opposition like this message from Jan Kratochvil:

This is another step trying to make Fedora end-user friendly while the only effect is making it hostile to developers. As Fedora will never be used by end-users as it conflicts with Fedora's foundation Freedom.

to this comment from Adam Williamson: "My only regret is that I have but one +1 to give to this proposal!". In the end, most participants seem to recognize that vi is not the friendliest experience for new users, and that experienced users can easily set the EDITOR environment variable to get the editor they want.
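Programs that launch an editor typically consult the environment along these lines. This is a sketch of a common convention (check VISUAL, then EDITOR, then fall back to a built-in default), not a description of any particular Fedora tool; the pick_editor() helper is hypothetical.

```python
import os

def pick_editor(env=None, default="vi"):
    """Return the editor command a tool following the usual convention would run."""
    if env is None:
        env = os.environ
    # An empty value is treated the same as an unset variable.
    return env.get("VISUAL") or env.get("EDITOR") or default

no_preference = pick_editor({})
with_editor = pick_editor({"EDITOR": "vim"})
with_default_nano = pick_editor({}, default="nano")
```

A distribution-wide default only affects the last step of that lookup, so a user who has set EDITOR in a shell startup file sees no change.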

Switching to Btrfs

Then, there is this proposal to make Btrfs the default filesystem for new Fedora installations. This is not a new idea; indeed, there was once an approved plan to switch to Btrfs for Fedora 16 in 2011. There are obvious advantages to switching to a modern filesystem like Btrfs, including its storage-management and snapshotting capabilities. Btrfs turned out to be too unstable in 2011, though, and that plan was dropped.

Concerns about stability came up quickly this time as well; Vitaly Zaitsev asserted that:

I'm strongly against this proposal. BTRFS is the most unstable file system I ever seen. It can break up even under an ideal conditions and lead to a complete data loss.

On the other hand, Btrfs developer (and member of the group pushing this proposal) Josef Bacik pointed out that Btrfs is deployed on vast numbers of machines at Facebook, a choice that has "worked out very well" even though Facebook deliberately puts it on low-quality storage hardware.

It is too soon to predict whether Btrfs will prove more successful at being adopted by Fedora now than it did nine years ago. But the value to the distribution of having a filesystem like Btrfs available is clear, and there is a long list of capable developers pushing this proposal. Fedora users may yet get Btrfs by default before 2030.

One that didn't make it

As long as they don't actively cause problems, retired packages on Fedora systems tend to go into a sort of limbo state. They remain on the systems where they have been installed even as those systems are upgraded to new releases where those packages are no longer present. Over time, they may develop problems or security issues, but they sit there like that server everybody has forgotten about until something happens to draw somebody's attention.

This proposal envisioned a scheme where retired packages would be explicitly obsoleted by a new metapackage called fedora-retired-packages; that would cause them to be automatically removed one release after they were retired. Some developers welcomed the idea of cleaning up unmaintained packages, while others strongly opposed removing packages that probably still work from systems where they may be needed. A separate subthread got into disagreements about the mechanism used, suggesting that removal should be handled explicitly in the DNF package manager rather than through a magic metapackage.

In the end, Miroslav Suchý, who was pushing this proposal, decided to withdraw it. He will work on a new proposal for a tool that can allow users to remove unmaintained packages if they see fit.

The Fedora 33 schedule currently places a deadline of June 30 for system-wide change proposals and July 21 for proposals for self-contained changes. The above proposals were all of the system-wide variety with the exception of dropping mod_php. There is, thus, time for more changes to be proposed for the next Fedora release. Community members might be forgiven, though, if they thought that there is already enough on the list for this time around.


First PHP 8 alpha released

By John Coggeshall
June 30, 2020

The PHP project has released the first alpha of PHP 8, which is slated for general availability in November 2020. This initial test release includes many new features such as just-in-time (JIT) compilation, new constructs like Attributes, and more. One of twelve planned releases before the general availability release, it represents a feature set that is still subject to change.

The PHP 8 release is being managed by contributors Sara Golemon and Gabriel Caruso. Dubbed "Alpha 1", this first release of PHP 8 is one of three releases to be done prior to a feature freeze. During this time, more widespread testing of new features is performed by the community and implementation details are worked out. This process will continue until August 4, at which point the feature set will be frozen to coincide with the first beta release scheduled for August 6.

Some interesting features

The release announcement omitted the specifics on new features and other changes that would typically accompany a release. For now, the list of approved and implemented proposals in the Request for Comments section of the PHP wiki is the best source for what is in the release now, and what might still be on the way.

Major-version PHP releases always have at least one significant improvement, and in this case, that is JIT for PHP 8. JIT will enable the engine to compile PHP code — a single PHP function or an entire application — to machine code for better performance. The RFC on the implementation provides a significant amount of detail regarding the JIT implementation for interested readers.

Beyond major improvements like JIT, there are also other new language-level features including Attributes. For those who are unaware, Attributes offer structured syntactic metadata for declarations in PHP code such as classes, functions, methods, and properties. Similar features already exist in other popular languages, such as annotations in Java and decorators in Python. Attributes replace the widely used phpDocumentor syntax for documentation block comments that is often deployed in PHP applications to serve a similar need. Currently in PHP 7, these comments are parsed at run time using PHP's reflection API to extract metadata.

One example of the existing use of this approach is the popular unit-testing framework PHPUnit, which uses documentation block annotations to implement things like order of operations in testing methods. With PHP 8, these same annotations can be defined and formalized into the language itself, eliminating the need for the expensive run-time comment parsing currently required. Attributes also may play a role in defining (or excluding) targets of PHP 8's new JIT compiler. Note that the Attributes feature is still in flux, with significant changes to the behavior still being decided and implemented before the August feature-freeze deadline.

PHP 8 will also support some desirable new language features for typing, continuing the trend of building robust data-type handling into the traditionally dynamically-typed language. Current PHP 7 releases support typing in function/method declarations. However, this support is limited to two options: either a single data type is specified as part of the declaration, or no data type is specified at all. If no data type is specified, it is up to the developer to implement their own type-checking logic on an otherwise typeless value. This is less than ideal; a method could reasonably accept an integer or a floating-point number, but not a string. As with annotations, PHP projects currently handle this dilemma with documentation block comments that are intended to specify variable type details, but those details are not enforceable by PHP itself. To address this shortcoming, PHP 8 now supports type unions in declarations as part of its syntax, allowing developers to specify multiple types for functions, methods, and properties:

    class Number {
        private int|float $number;

        public function setNumber(int|float $number) : void
        {
            $this->number = $number;
        }

        public function getNumber() : int|float
        {
            return $this->number;
        }
    }

In the preceding example, the | operator is used to define the multiple data types that PHP will accept: in this case, an int or a float in each context of the example. These checks are largely performed at run time, although compile-time checks are also used to catch some cases.

Other changes expected in PHP 8 and implemented in Alpha 1 are the unbundling of legacy extensions like xmlrpc, previously rejected features like catching exceptions without requiring a variable for them, and the new Stringable interface to better handle __toString() implementations consistently in an application. In PHP, __toString() is a "magic method" of an object that, when provided, is used to return a string representation of an object. The Stringable interface provides a type for objects implementing the __toString() method, so that a union such as string|Stringable can accept either a primitive string or such an object in type-hinting.

More to come

This is only the first public release of the upcoming PHP 8 code base. As PHP 8 releases head toward general availability, future articles will follow the progress. The next release, Alpha 2, is scheduled for July 9. There are still many different discussions happening regarding features that may make it into the PHP 8.0 release, depending on whether they can be finalized in time for the feature-freeze deadline. In the meantime, early adopters can begin testing their code bases and reporting any bugs they might find.


Managing tasks with todo.txt and Taskwarrior

June 26, 2020

This article was contributed by Martin Michlmayr

One quote from Douglas Adams has always stayed with me: "I love deadlines. I like the whooshing sound they make as they fly by". We all lead busy lives and few ever see the bottom of our long to-do lists. One of the oldest items on my list, ironically, is to find a better system to manage all my tasks. Can task-management systems make us more productive while, at the same time, reducing the stress caused by the sheer number of outstanding tasks? This article looks at todo.txt and Taskwarrior.

The management of tasks is rather personal and people have completely different approaches and philosophies. This is, of course, reflected in the requirements for, and expectations from, a task manager. Requirements can also change as our interaction with computers changes. For example, while I put a lot of emphasis on managing tasks via the command line in the past, these days I'm more interested in a good mobile app (to add tasks on the go and to receive reminders) and web support (to get an overview of all tasks).

A good way to filter tasks is also essential for me. One of the reasons for using task-management software is so you can stop worrying about tasks until they become relevant. This requires a way to find relevant tasks when needed, such as when the due date is coming up soon or because you're in a relevant setting or place (often called a "context" in task-management systems). Going to the supermarket would be a good time to bring up a shopping list, for example. Task-management systems offer a number of ways to organize information that can be used in filters, such as tags, contexts (often stored as tags in the form of @tag, such as @home), and lists.

In a series of two articles, we'll review four systems for managing tasks and to-do items around which open-source ecosystems have formed.

Simple task management with todo.txt

Todo.txt is a simple plain-text format to specify tasks. Each line describes one task, and tasks can have a priority (e.g. (A)), a project (+LWN), and a context (@home). The specification also defines the tag:value syntax but only mentions due (due dates) specifically. A number of custom tags are in common use, such as t for threshold dates (i.e. start dates) and rec for recurring tasks. Tasks are marked as complete by adding a lowercase x at the beginning of the line. An example might look something like this:

    (A) Proofread article +LWN due:2020-06-25
    Revisit task managers @home t:2025-01-01
    x Provide todo.txt examples +LWN
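To make the format concrete, here is a minimal Go sketch of a parser for lines like those above. It is an illustration under simplifying assumptions rather than a reference implementation: the Task type and ParseTask() function are hypothetical names, creation and completion dates are not handled, and any word containing a colon (including a URL) is treated as a tag:value pair.

```go
package main

import (
	"fmt"
	"strings"
)

// Task is a hypothetical in-memory form of one todo.txt line.
type Task struct {
	Done     bool
	Priority string            // e.g. "A"; empty if the line has none
	Projects []string          // words that start with "+"
	Contexts []string          // words that start with "@"
	Tags     map[string]string // tag:value pairs such as due:2020-06-25
	Words    []string          // the remaining description words
}

// ParseTask splits a line on whitespace and classifies each word.
func ParseTask(line string) Task {
	t := Task{Tags: map[string]string{}}
	fields := strings.Fields(line)

	// A leading lowercase "x" marks the task as complete.
	if len(fields) > 0 && fields[0] == "x" {
		t.Done = true
		fields = fields[1:]
	}
	// A priority looks like "(A)" and must come first.
	if len(fields) > 0 {
		f := fields[0]
		if len(f) == 3 && f[0] == '(' && f[2] == ')' && f[1] >= 'A' && f[1] <= 'Z' {
			t.Priority = string(f[1])
			fields = fields[1:]
		}
	}
	for _, f := range fields {
		switch {
		case strings.HasPrefix(f, "+") && len(f) > 1:
			t.Projects = append(t.Projects, f[1:])
		case strings.HasPrefix(f, "@") && len(f) > 1:
			t.Contexts = append(t.Contexts, f[1:])
		default:
			if i := strings.Index(f, ":"); i > 0 && i < len(f)-1 {
				t.Tags[f[:i]] = f[i+1:]
			} else {
				t.Words = append(t.Words, f)
			}
		}
	}
	return t
}

func main() {
	lines := []string{
		"(A) Proofread article +LWN due:2020-06-25",
		"Revisit task managers @home t:2025-01-01",
		"x Provide todo.txt examples +LWN",
	}
	for _, line := range lines {
		fmt.Printf("%+v\n", ParseTask(line))
	}
}
```

Because every task is a single line of text, a filter such as "all open tasks in the LWN project" reduces to a loop over the parsed lines.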

The todo.txt web site lists a lot of tools built around the file format. Unfortunately, the first impression isn't particularly great since a lot of the tools are out-of-date or unmaintained. Todo.txt Touch, the project's official app for iOS, which is placed prominently on the web site, had its last commit in 2014 and was removed from Apple's App Store in 2017 because of incompatibilities with Dropbox. The Android app was removed from Google Play for the same reason.

While it would be nice if the web site offered a more curated list of actively developed software, clicking on all the links eventually revealed that there is an active ecosystem around todo.txt. There is support for a wide range of editors, including a Vim plugin that supports syntax highlighting and presents overdue tasks as errors. Additionally, todoTxtWebUi lets you add tasks in your browser; it also supports basic filters, but there's no way to define and store more complex filters.

Simpletask is an actively developed Android app. Adding new tasks is simple and the app makes it possible to create complex filters. There is support for Dropbox and Nextcloud. Using cloud services appears to be the recommended way to sync tasks in the todo.txt ecosystem; the problem of conflicts, which can happen when tasks are edited on multiple devices, is not addressed, however.

Markor (seen at right) is another interesting app for Android in this context. It is not a task manager; instead it is an editor with support for a number of formats including Markdown, YAML, and todo.txt. Adding tasks is a pleasure due to Markor's syntax highlighting, which can be seen in the screen shot. Markor doesn't allow users to group, sort, or search tasks, but improvements are under discussion.

Overall todo.txt is a simple system that aims to get out of your way. The system reflects the philosophy of founder Gina Trapani, who remarked: "To me, todo.txt is a task list, not a reminder tool, or a calendar". While I personally want a task manager that reminds me of upcoming tasks so I can stop thinking about them until I need to, a simple approach has its advantages and will appeal to some.

Fighting tasks with Taskwarrior

Taskwarrior is another task manager around which a healthy community has formed. In contrast to todo.txt, Taskwarrior supports a rich set of features and attributes, including various dates (such as start, end and due dates), dependencies, projects, and tags. User-defined attributes can also be added. Taskwarrior sets virtual tags automatically depending on the situation, such as TODAY, or, maybe more commonly seen, OVERDUE. The project even supports a Document Object Model (DOM) through which data can be accessed.

While tasks are stored in human-readable text files, interaction is through the command-line tool task. It makes adding, editing, and querying tasks easy. Taskwarrior supports filters, automatically calculates priorities, and integrates a calendar view and statistics. It does not dictate the user's workflow or the task-management methodology to be followed, but there is a helpful write-up about implementing the popular Getting Things Done (GTD) system with Taskwarrior.

Many tools build on Taskwarrior. For example, Tasksh is an interactive shell that makes listing and editing tasks easy. It's particularly useful for the periodic review of tasks. VIT, the Visual Interactive Taskwarrior, is a curses-based frontend, which will feel familiar to those who work with Vim and Mutt. With these tools, the Taskwarrior ecosystem offers a range of complementary text-based tools.

For those who prefer managing their tasks in a web browser, TaskwarriorWeb is one option. It has a simple but modern design. Unfortunately, it doesn't expose all of Taskwarrior's functionality (such as dependencies) and has limited capabilities to group and filter tasks. Furthermore, the status of the project isn't clear. While a move to the official GitHub organization for Taskwarrior was agreed to in 2018, the project still hasn't moved; many pull requests remain open, including one to implement some important functionality: filtering by tags.

There are two options for Android. TaskwarriorC2 is a cross-platform GUI client for Taskwarrior available on Google Play. Despite using the Taskwarrior logo, the app does not come from the Taskwarrior project; in addition, TaskwarriorC2 does not have a license in its repository, though the source is available. While the app offers many filters and reports, I didn't find the interface to be intuitive. Foreground is an Android app that is visually more appealing and easier to use. It shows much promise but is quite limited at the moment. For example, you cannot filter by project and there are no notifications, which is a feature some users expect from a task manager on a mobile device.

Of course, the question of syncing data will come up when someone wants to use Taskwarrior on multiple devices. Unlike todo.txt, Taskwarrior offers a solution in the form of Taskserver. For those who don't want to run their own server, there are several hosted alternatives. FreeCinc is an open-source, shared Taskserver where users can store tasks at no charge. Inthe.AM is another open-source online system available at no charge, but it goes beyond merely syncing tasks. It offers several features that extend Taskwarrior, such as RSS and iCalendar feeds, integration with Trello (a proprietary project-management tool), and adding tasks via email or SMS text message. Inthe.AM also offers a web interface to manage tasks with a modern look (seen below), although not all functionality from Taskwarrior is exposed.

Taskwarrior has a healthy ecosystem; there are many other interesting tools that cannot be covered in detail. Bugwarrior enables the import of issues from a number of bug-tracking systems, taskopen is a script for taking notes and opening URLs with Taskwarrior, and kanbanwarrior is a simple script that facilitates a Kanban workflow. There are also extensions for GNOME Shell (Taskwarrior Integration and Taskwhisperer).

Summary

Todo.txt and Taskwarrior show different approaches to task management. While todo.txt follows a simple approach to capturing and dealing with tasks, Taskwarrior offers a feature-rich system that enables different workflows for task management. Both systems are widely used and offer a range of tools. Taskwarrior, in particular, has great text-based tools. For both systems, solutions for the web and mobile devices are more limited at this point. Next up, we'll review tools that use the Org mode file format and iCalendar standard. Stay tuned ...


Generics for Go

July 1, 2020

This article was contributed by Ben Hoyt

The Go programming language was first released in 2009, with its 1.0 release made in March 2012. Even before the 1.0 release, some developers criticized the language as being too simplistic, partly due to its lack of user-defined generic types and functions parameterized by type. Despite this omission, Go is widely used, with an estimated 1-2 million developers worldwide. Over the years there have been several proposals to add some form of generics to the language, but the recent proposal written by core developers Ian Lance Taylor and Robert Griesemer looks likely to be included in a future version of Go.

Background

Go is a statically typed language, so types are specified in the source code (or inferred from it) and checked by the compiler. The compiler produces optimized machine code, so CPU-intensive code is significantly more efficient than it would be in languages like Python or Ruby, which have bytecode compilers and use virtual machines for execution.

Generics, also known as "parameterized types" or "parametric polymorphism", are a way to write code or build data structures that will work for any data type; the code or data structure can be instantiated to process each different data type, without having to duplicate code. They're useful when writing generalized algorithms like sorting and searching, as well as type-independent data structures like trees, thread-safe maps, and so on. For example, a developer might write a generic min() function that works on all integer and floating-point types, or create a binary tree that can associate a key type to a value type (and work with strings, integers, or user-defined types). With generics, you can write this kind of code without any duplication, and the compiler will still statically check the types.

Like the first versions of Java, Go doesn't ship with user-defined generics. As the Go FAQ notes, generics "may well be added at some point"; it also describes how leaving them out was an intentional trade-off:

Generics are convenient but they come at a cost in complexity in the type system and run-time. We haven't yet found a design that gives value proportionate to the complexity, although we continue to think about it. Meanwhile, Go's built-in maps and slices, plus the ability to use the empty interface to construct containers (with explicit unboxing) mean in many cases it is possible to write code that does what generics would enable, if less smoothly.

Part of the reason actual users of the language don't complain loudly about the lack of generics is that Go does include them for the built-in container types, specifically slices (Go's growable array type), maps (hash tables), and channels (thread-safe communication queues). For example, a developer writing blog software might write a function to fetch a list of articles or a mapping of author ID to author information:

    // takes a count, returns "slice of Article" (compiler checks types)
    func GetLatestArticles(num int) []Article {
        ...
    }

    // takes "slice of int" of IDs, returns "map of int IDs to Author"
    func GetAuthors(authorIDs []int) map[int]Author {
        ...
    }

Built-in functions like len() and append() work on these container types, though there's no way for a developer to define their own equivalents of those generic built-in functions. As many Go developers will attest, having built-in versions of growable arrays and maps that are parameterized by type goes a long way, even without user-defined generic types.

In addition, Go has support for two features that are often used instead of generics or to work around their lack: interfaces and closures. For example, sorting in Go is done using the sort.Interface type, which is an interface requiring three methods:

    type Interface interface {
        Len() int           // length of this collection
        Less(i, j int) bool // true if i'th element < j'th element
        Swap(i, j int)      // swap i'th and j'th elements
    }

If a user-defined collection implements this interface, it is sortable using the standard library's sort.Sort() function. Since sort.Slice() was added in Go 1.8, developers can use that function and pass in a "less-than closure" rather than implementing the full sorting interface; for example:

    // declare a struct for names and ages and a slice of those structs with four entries
    people := []struct {
        Name string
        Age  int
    }{
        {"Gopher", 7},
        {"Alice", 55},
        {"Vera", 24},
        {"Bob", 75},
    }

    // sort people using the "less-than closure" specified in the call
    sort.Slice(
        people,
        func(i, j int) bool { // i and j are the two slice indices
            return people[i].Name < people[j].Name
        },
    )

There are other ways to work around Go's lack of generics, such as creating container types that use interface{} (the "empty interface"). This effectively boxes every value inserted into the collection, and requires run-time type assertions, so it is neither particularly efficient nor type-safe. However, it works and even some standard library types like sync.Map use this approach.
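As a rough illustration of that trade-off, here is a small stack built on interface{}. The Stack type is a hypothetical example, not a standard-library type; note that nothing stops a caller from pushing a string onto what was meant to be a stack of integers, and that every value coming out needs a run-time type assertion.

```go
package main

import "fmt"

// Stack is a hypothetical container that can hold values of any type.
type Stack struct {
	items []interface{}
}

// Push boxes v into an interface{} and appends it.
func (s *Stack) Push(v interface{}) { s.items = append(s.items, v) }

// Pop removes and returns the top value; ok is false if the stack is empty.
func (s *Stack) Pop() (v interface{}, ok bool) {
	if len(s.items) == 0 {
		return nil, false
	}
	v = s.items[len(s.items)-1]
	s.items = s.items[:len(s.items)-1]
	return v, true
}

func main() {
	var s Stack
	s.Push(42)
	s.Push("oops") // compiles: nothing restricts the element type

	for {
		v, ok := s.Pop()
		if !ok {
			break
		}
		// A run-time type assertion is needed to get the int back.
		if n, isInt := v.(int); isInt {
			fmt.Println("int:", n)
		} else {
			fmt.Printf("unexpected %T: %v\n", v, v)
		}
	}
}
```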

Some developers go so far as to argue that generics shouldn't be added to Go at all, since they will bring too much complexity. For example, Greg Hall hopes "that Go never has generics, or if it does, the designers find some way to avoid the complexity and difficulties I have seen in both Java generics and C++ templates".

The Go team takes the complexity issue seriously. As core developer Russ Cox states in his 2009 article "The Generic Dilemma":

It seems like there are three basic approaches to generics:

(The C approach.) Leave them out. This slows programmers. But it adds no complexity to the language.
(The C++ approach.) Compile-time specialization or macro expansion. This slows compilation. It generates a lot of code, much of it redundant, and needs a good linker to eliminate duplicate copies. [...]
(The Java approach.) Box everything implicitly. This slows execution. [...]

The generic dilemma is this: do you want slow programmers, slow compilers and bloated binaries, or slow execution times?

Still, many Go developers are asking for generics, and there has been a huge amount of discussion over the years on the best way to add them in a Go-like way. Several developers have provided thoughtful rationale in "experience reports" from their own usage of Go. Taylor's entry in the official Go blog, "Why Generics?", details what adding generics will bring to Go, and lists the guidelines the Go team is following when adding them:

Most importantly, Go today is a simple language. Go programs are usually clear and easy to understand. A major part of our long process of exploring this space has been trying to understand how to add generics while preserving that clarity and simplicity. We need to find mechanisms that fit well into the existing language, without turning it into something quite different.

These guidelines should apply to any generics implementation in Go. That's the most important message I want to leave you with today: generics can bring a significant benefit to the language, but they are only worth doing if Go still feels like Go.

The recent proposal

Taylor, in particular, has been prolific on the subject of adding generics to Go, having written no fewer than six proposals. The first four, written from 2010 through 2013, are listed at the bottom of his document, "Go should have generics". About them, he notes: "all are flawed in various ways". In July 2019 he posted the "Why Generics?" blog article mentioned above, which links to the lengthy 2019 proposal written by Taylor and Griesemer for a version of generics based on "contracts". Almost a year later, in June 2020, Taylor and Griesemer published the current proposal, which avoids adding contracts. In Taylor's words:

An earlier draft design of generics implemented constraints using a new language construct called contracts. Type lists appeared only in contracts, rather than on interface types. However, many people had a hard time understanding the difference between contracts and interface types. It also turned out that contracts could be represented as a set of corresponding interfaces; thus there was no loss in expressive power without contracts. We decided to simplify the approach to use only interface types.

The removal of contracts is based in part on work by Philip Wadler and his collaborators in their May 2020 paper, "Featherweight Go [PDF]" (video presentation). Wadler is a type theorist who has contributed to the design of Haskell, and was involved in adding generics to Java back in 2004. Rob Pike, one of Go's creators, had asked Wadler if he would "be interested in helping us get polymorphism right (and/or figuring out what 'right' means) for some future version of Go"; this paper is the response to Pike's request.

The 2020 proposal suggests adding optional type parameters to functions and types, allowing generic algorithms and generic container types, respectively. Here is an example of what a generic function looks like under this proposal:

    // Stringify calls the String method on each element of s,
    // and returns the results.
    func Stringify(type T Stringer)(s []T) []string {
        var ret []string
        for _, v := range s {
            ret = append(ret, v.String())
        }
        return ret
    }

    // Stringer is a type constraint that requires the type argument to have
    // a String method and permits the generic function to call String.
    // The String method should return a string representation of the value.
    type Stringer interface {
        String() string
    }

The type parameter is T (an arbitrary name), specified in the extra set of parentheses after the function name, along with the Stringer constraint: type T Stringer. The actual arguments to the function are in the second set of parentheses, s []T. Writing functions like this is not currently possible in Go; it does not allow passing a slice of a concrete type to a function that accepts a slice of an interface type (e.g., Stringer).

In addition to generic functions, the new proposal also supports parameterization of types, to support type-safe collections such as binary trees, graph data structures, and so on. Here is what a generic Vector type might look like:

    // Vector is a name for a slice of any element type.
    type Vector(type T) []T

    // Push adds a value to the end of a vector.
    func (v *Vector(T)) Push(x T) {
        *v = append(*v, x)
    }

    // v is a Vector of Authors
    var v Vector(Author)
    v.Push(Author{Name: "Ben Hoyt"})

Because Go doesn't support operator overloading or define operators in terms of methods, there's no way to use interface constraints to specify that a type must support the < operator (as an example). In the proposal, this is done using a new feature called "type lists", an example of which is shown below:

    // Ordered is a type constraint that matches any ordered type.
    // An ordered type is one that supports the <, <=, >, and >= operators.
    type Ordered interface {
        type int, int8, int16, int32, int64,
            uint, uint8, uint16, uint32, uint64, uintptr,
            float32, float64,
            string
    }

In practice, a constraints package that predefines common constraints like Ordered would probably be added to the standard library. Type lists allow developers to write generic functions that use built-in operators:

    // Smallest returns the smallest element in a slice of "Ordered" values.
    func Smallest(type T Ordered)(s []T) T {
        r := s[0]
        for _, v := range s[1:] {
            if v < r { // works due to the "Ordered" constraint
                r = v
            }
        }
        return r
    }
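For contrast, this is roughly what the same function looks like in Go without generics: one copy per element type, identical apart from the type names. The SmallestInt() and SmallestString() names are invented for this sketch.

```go
package main

import "fmt"

// SmallestInt returns the smallest element of a non-empty []int.
func SmallestInt(s []int) int {
	r := s[0]
	for _, v := range s[1:] {
		if v < r {
			r = v
		}
	}
	return r
}

// SmallestString is the same code again with a different element type.
func SmallestString(s []string) string {
	r := s[0]
	for _, v := range s[1:] {
		if v < r {
			r = v
		}
	}
	return r
}

func main() {
	fmt.Println(SmallestInt([]int{3, 1, 2}))
	fmt.Println(SmallestString([]string{"pear", "apple", "quince"}))
}
```

Like the generic version above, both functions assume a non-empty slice and will panic on an empty one.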

The one constraint that can't be written as a type list is a constraint for the == and != operators, because Go allows comparing structs, arrays, and interface types for equality. To solve this, the proposal suggests adding a built-in comparable constraint to allow equality operators. This would be useful, for example, in a function that finds the index of a value in a slice or array:

    // Index returns the index of x in s, or -1 if not found.
    func Index(type T comparable)(s []T, x T) int {
        for i, v := range s {
            // v and x are type T, which has the comparable
            // constraint, so we can use == here.
            if v == x {
                return i
            }
        }
        return -1
    }

Taylor and Griesemer have developed a tool for experimentation (on the go2go branch) that converts the Go code as specified in this proposal to normal Go code, allowing developers to compile and run generic code today. There's even a version of the Go playground that lets people share and run code written under this proposal online — for example, here is a working example of the Stringify() function above.

The Go team is asking developers to try to solve their own problems with the generics experimentation tool and send detailed feedback in response to the following questions:

First, does generic code make sense? Does it feel like Go? What surprises do people encounter? Are the error messages useful?

Second, we know that many people have said that Go needs generics, but we don't necessarily know exactly what that means. Does this draft design address the problem in a useful way? If there is a problem that makes you think "I could solve this if Go had generics," can you solve the problem when using this tool?

Discussion

There has been a lot of public discussion about generics on the main golang-nuts mailing list since the latest proposal was published, as well as on Hacker News and reddit.com/r/golang threads.

As Pike said [YouTube] last year, "syntax is not the problem, at least not yet". However, many of the threads on the mailing list have been immediately critical of the syntax. Admittedly, the syntax is unusual, and it adds another set of (round) parentheses to Go, which is already known for having lots of parentheses (for example, Go's method definitions use one set for the method's receiver type, and another for the method's arguments). The proposal tries to preempt the syntax bikeshedding with an explanation of why they chose parentheses instead of angle brackets:

When parsing code within a function, such as v := F<T>, at the point of seeing the < it's ambiguous whether we are seeing a type instantiation or an expression using the < operator. Resolving that requires effectively unbounded lookahead. In general we strive to keep the Go parser efficient.
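The ambiguity is easy to reproduce in Go as it exists today. The following sketch, with made-up variable names, is valid code in which the < and > tokens are ordinary comparison operators; if angle brackets also delimited type arguments, a parser reading the same tokens could not tell, without looking further ahead, whether it was seeing two comparisons or a single generic call.

```go
package main

import "fmt"

// compare returns the results of two independent comparisons.
func compare(w, x, y, z int) (bool, bool) {
	// Today this parses as (w < x) and (y > (z)).
	// With angle-bracket generics, the same tokens could also be read
	// as a call of w, instantiated with types x and y, on the argument z.
	a, b := w < x, y > (z)
	return a, b
}

func main() {
	fmt.Println(compare(1, 2, 3, 4))
}
```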

Most responders on the mailing list are proposing the use of angle brackets, as in C++, Java, and C#; for example, using List<T> instead of List(T). Taylor is much more interested in whether the semantics of the new proposal make sense, but has been patiently replying to each of these syntax threads with something like the following:

Let's see what real code looks like with the suggested syntax, before we worry about alternatives. Thanks.

This has happened so many times that one mailing list contributor, Tyler Compton, compiled a helpful list of all the syntax-related threads.

Generics will help eliminate types and functions repeated for multiple types, for example sort.Ints, sort.Float64s, and sort.Strings in the sort package. In a comment on Hacker News, Kyle Conroy showed "a four-line replacement for the various sql.Null* types in the standard library":

    type Null(type T) struct {
        Val   T
        Valid bool // Valid is true if Val is not NULL
    }

Mailing list contributor Pee Jai wondered whether there's a way to constrain a type to only allow structs, but Taylor indicated that's not possible; he noted that "generics don't solve all problems". Robert Engels said that the reflect package would still be needed for this case anyway.

In one thread, "i3dmaster" asked some questions about custom map types, and Taylor clarified that "custom container types aren't going to support len() or range". Creators of collection types won't have access to this special syntax, but will need to define their own Len() method, and their own way to iterate through the collection.
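In practice, that means a custom collection exposes methods in place of the built-in syntax. Here is a minimal sketch of that pattern in today's Go, using a concrete element type; AuthorSet and its methods are invented for illustration, with Len() standing in for len() and a callback-based Each() standing in for range.

```go
package main

import "fmt"

// AuthorSet is a hypothetical custom collection keyed by author ID.
type AuthorSet struct {
	names map[int]string
}

// NewAuthorSet returns an empty, ready-to-use set.
func NewAuthorSet() *AuthorSet { return &AuthorSet{names: map[int]string{}} }

// Add inserts or replaces the name stored for id.
func (a *AuthorSet) Add(id int, name string) { a.names[id] = name }

// Len stands in for the built-in len(), which only works on built-in types.
func (a *AuthorSet) Len() int { return len(a.names) }

// Each stands in for range: it calls fn for every entry, in no particular
// order, and stops early if fn returns false.
func (a *AuthorSet) Each(fn func(id int, name string) bool) {
	for id, name := range a.names {
		if !fn(id, name) {
			return
		}
	}
}

func main() {
	set := NewAuthorSet()
	set.Add(1, "Ben Hoyt")
	set.Add(2, "Jake Edge")

	fmt.Println("authors:", set.Len())
	set.Each(func(id int, name string) bool {
		fmt.Println(id, name)
		return true // keep iterating
	})
}
```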

Go core contributor Bryan Mills has posted insightful replies on a number of threads. He has also created his own repository with various notes and code examples from his experiments with generics, including an explanation about why he considers type lists less than ideal. The repository also includes various attempts at re-implementing the append() built-in using generics as proposed.

Timeline

In their recent blog entry, Taylor and Griesemer are clear that adding generics to the language won't be a quick process — they want to get it right, and take into account community feedback:

We will use the feedback we gather from the Go community to decide how to move forward. If the draft design is well received and doesn't need significant changes, the next step would be a formal language change proposal. To set expectations, if everybody is completely happy with the design draft and it does not require any further adjustments, the earliest that generics could be added to Go would be the Go 1.17 release, scheduled for August 2021. In reality, of course, there may be unforeseen problems, so this is an optimistic timeline; we can't make any definite prediction.

My own guess is that August 2021 (just over a year away) is optimistic for a feature of this size. It's going to take quite a while to solicit feedback, iterate on the design, and implement generics in a production-ready way instead of using the current Go-to-Go translator. But given the number of proposals and the amount of feedback so far, generics are sure to be a much-used (and hopefully little-abused) feature whenever they do arrive.


Page editor: Jonathan Corbet