
Leading items

Welcome to the LWN.net Weekly Edition for February 4, 2021

This edition contains the following feature content:

  • Avoiding "supercookie" tracking: Firefox 85 partitions browser caches to thwart a pervasive tracking technique.
  • Tackling the monopoly problem: Cory Doctorow's linux.conf.au keynote on how monopoly made computing centralized and unfree.
  • A major vulnerability in Sudo: a heap-buffer overflow lets local users gain root.
  • Finding real-world kernel subsystems: mining the MAINTAINERS file to find the kernel's actual subsystem boundaries.
  • Wayland support (and more) for Emacs: the state of the pure-GTK Emacs port.

This week's edition also includes these inner pages:

  • Brief items: Brief news items from throughout the community.
  • Announcements: Newsletters, conferences, security updates, patches, and more.

Please enjoy this week's edition, and, as always, thank you for supporting LWN.net.

Comments (none posted)

Avoiding "supercookie" tracking

By Jake Edge
February 3, 2021

The release of Firefox 85 at the end of January brought a new technique for thwarting yet-another web-tracking scheme. The use of browser cookies for tracking is well-established and the browser makers have taken steps to block the worst abuses there, but users can also take steps to manage and clear those cookies. The arms race continues, however, as tracking companies are using browser caches to store what Mozilla calls "supercookies", which allow users to be tracked across the web sites that they visit. That has led the browser makers to partition these caches by web site in order to prevent this tracking technique.

In the interest of faster browsing, web browsers cache lots of resources so that they do not need to make another network round-trip to obtain them. That includes such items as images, style sheets, fonts, HTTP resources (including JavaScript code), DNS query results, TLS certificates, and more. In addition, browsers reuse long-lived connections when another site makes a relevant request; that too can be abused by tracking companies. These companies then sell that information to advertisers and others, which are able to build up a truly creepy amount of correlated information about a user's interests and activities.

So, as described in a Mozilla security blog post that accompanied the Firefox 85 release, the new browser will be partitioning these caches and connections based on the associated top-level domain. That means there will be no reuse when other sites request the same resources (or could use existing connections). The post notes that Chrome has rolled out a similar change.
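
The core idea behind the partitioning is easy to model: instead of keying a cached entry by the resource URL alone, the cache key also includes the top-level site that triggered the load. Here is a minimal sketch of that keying scheme; the class and names are illustrative and do not reflect any browser's actual data structures.

    # Sketch of a partitioned cache; illustrative only.
    class PartitionedCache:
        def __init__(self):
            self._entries = {}

        def _key(self, top_level_site, resource_url):
            # The partition key includes the site being visited, so the
            # same resource fetched from two sites is cached twice.
            return (top_level_site, resource_url)

        def get(self, top_level_site, resource_url):
            return self._entries.get(self._key(top_level_site, resource_url))

        def put(self, top_level_site, resource_url, body):
            self._entries[self._key(top_level_site, resource_url)] = body

    cache = PartitionedCache()
    cache.put("news.example", "https://tracker.example/pixel.png", b"...")
    # A later visit to a different site misses the cache, so nothing
    # stored while browsing news.example can be observed there:
    assert cache.get("shop.example", "https://tracker.example/pixel.png") is None

The price, of course, is that shared resources are fetched and stored once per visited site rather than once per browser.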

There are timing side channels that can be used to see if a particular resource has already been accessed, but images and other resources can be directly seeded with a unique identifier, which gets cached as part of a resource when it is first retrieved. When those items are loaded from the cache at a new site, that ID can be extracted in order to track the user from site to site. The Mozilla post identifies a number of other ways that these supercookies—distinct from other uses of that term—have been stored over the years (e.g. Flash storage).
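
One concrete mechanism for seeding a cached resource with an identifier, among the several that have been used over the years, is the HTTP ETag header: the server hands each first-time visitor a unique ETag, and the browser sends it back in If-None-Match whenever it revalidates the cached copy. A rough sketch of the server side, using only the Python standard library (the details here are illustrative, not taken from any real tracker):

    # Sketch of ETag-based cache "seeding"; for illustration only.
    import secrets
    from http.server import BaseHTTPRequestHandler, HTTPServer

    class TrackingPixel(BaseHTTPRequestHandler):
        def do_GET(self):
            token = self.headers.get("If-None-Match")
            if token:
                # The browser returned the identifier it cached earlier;
                # a real tracker would log and correlate it here.
                self.send_response(304)
                self.send_header("ETag", token)
                self.end_headers()
                return
            token = secrets.token_hex(16)     # unique per first-time visitor
            self.send_response(200)
            self.send_header("ETag", token)
            self.send_header("Content-Type", "image/gif")
            self.end_headers()
            self.wfile.write(b"GIF89a")       # placeholder pixel data

    if __name__ == "__main__":
        HTTPServer(("", 8000), TrackingPixel).serve_forever()

With partitioned caches, the identifier stored while visiting one site is never sent back from another, which breaks the cross-site link.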

The general idea of the cache-timing side channel is that the faster performance resulting from reuse of a local resource is measurable, which lets script running in a page infer information about the user's activities. All of these shenanigans are done in the hopes of being able to convince users to buy more of an advertiser's product, of course.
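
As an analogy for that measurement (the real probe is JavaScript running in the page, not Python), a cache hit is distinguishable from a miss purely by how long the access takes; a minimal toy model:

    # Toy model of the timing channel; purely illustrative.
    import time

    _cache = {}

    def load_resource(url):
        if url not in _cache:
            time.sleep(0.05)          # stand-in for a network round-trip
            _cache[url] = b"resource bytes"
        return _cache[url]

    def probe(url):
        start = time.perf_counter()
        load_resource(url)
        return (time.perf_counter() - start) * 1000   # milliseconds

    url = "https://cdn.example/widget.js"
    first, second = probe(url), probe(url)
    print(f"first load: {first:.1f} ms, repeat load: {second:.3f} ms")
    # A load that completes far faster than any network round-trip means
    # "already cached", i.e. some site already fetched this resource.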

Beyond that, Google's post about Chrome's cache partitioning points out that these techniques can be used for more than just cross-site tracking. For example, a site can query to see if the user has visited a particular site (or list of sites) based on whether a known resource from those sites has been cached. An even worse privacy invasion can come via a cross-site search attack that will reveal if certain terms have been searched for at a search engine.

The obvious impact of a change like cache partitioning is in site-loading performance, but the results so far seem fairly promising:

This partitioning applies to all third-party resources embedded on a website, regardless of whether Firefox considers that resource to have loaded from a tracking domain. Our metrics show a very modest impact on page load time: between a 0.09% and 0.75% increase at the 80th percentile and below, and a maximum increase of 1.32% at the 85th percentile.

That Google post reports similar results: "The overall cache miss rate increases by about 3.6%, changes to the FCP (First Contentful Paint) are modest (~0.3%), and the overall fraction of bytes loaded from the network increases by around 4%." More data is available in a white paper on the feature as well. All of that seems to point to a fairly mild impact for an important change to remove these side channels.

There is no real doubt that tracking companies will find other mechanisms; none of the previous prevention techniques has changed their behavior much. Over the years, the browser makers have worked on preventing user tracking with browser fingerprints, blocking known third-party tracking sites, privacy sandboxes, and more. A cynic—realist?—might guess that the tracking companies are hard at work on the next privacy-busting attack. There is just too much demand for that kind of information from advertisers, which makes it a high-value target with a relatively low cost to gather; changing one or both of those attributes seems needed to truly put a stop to the practice.

Comments (24 posted)

Tackling the monopoly problem

By Jonathan Corbet
January 29, 2021

LCA
There was a time when people who were exploring computational technology saw it as the path toward decentralization and freedom worldwide. What we have ended up with, instead, is a world that is increasingly centralized, subject to surveillance, and unfree. How did that come to be? In a keynote at the online 2021 linux.conf.au event, Cory Doctorow gave his view of this problem and named its source: monopoly.

Doctorow started by saying that many see the people who pushed technology in the last century as blind, naive optimists. In this view, technologists thought that if we just gave everybody a computer, everything would be fine; they failed to foresee how technology could become a dystopian force. He knows some of those people, mostly through his 20 years working with the Electronic Frontier Foundation (EFF), and he does not agree with this view. The truth of the matter is that nobody starts an organization like EFF because they think that everything is going to be great. Those founders were excited about how amazing things could be, but also terrified about how badly it could all go. They wanted to get technology into people's hands, but also to get the technology policy right.

When all this was just getting started, he said, technology was a fringe activity; the only reason to get involved was if you were truly passionate about it. Think about the first time you wrote some code; it was about crisply expressing your will. If this were done with sufficient precision, the computer would manifest your desire forever. Add a modem to the system and you could manifest that will around the world. Solve a problem you have, and everybody else can click a button to have your code perform the same trick for them.

Programs are often compared to recipes, but the truth is not quite like that; with a recipe, others still have to gather the ingredients and actually carry out the necessary steps. Software is a self-cooking recipe, which is much more powerful.

With software and the Internet, it also became possible to discover communities. There are, out there, people with rare traits that are important to you; these may be people who understand "black lives matter" or non-binary gender — or they may be Nazis. When you find people who know the words for the things you feel, it is empowering and makes you feel passionate. This ability, like the ability to write code, delivers self-determination, the ability to decide how life will go.

Fights about technology policy are all about self-determination, he continued. He gave as examples Richard Stallman's lack of access to a printer driver (leading to the creation of the GNU project) and the classic Bill Gates rant against those who were copying his BASIC interpreter. Doctorow also talked about the USENET alt.* hierarchy as an example of how these stories can go. USENET was the first social part of the Internet; in its early days it was under the control of a group called the "backbone cabal". These were the administrators of systems that were able to get away with using large amounts of their employers' long-distance phone time for long-haul links. Worried that their bosses would discover a USENET full of "porn and bomb recipes", the cabal retained a veto on the creation of new groups (among other things).

The straw that broke the camel's back was a decision to block the creation of "rec.cooking", saying that it should be called "talk.cooking" instead. In response, the community created the "alt" hierarchy where anybody could create a new group; the first such group was "alt.gourmand". After a couple of years, the "alt" hierarchy was larger than the rest of USENET. Everybody, it seemed, wanted self-determination.

A perfect market?

We could use more self-determination now, he said. We have "total tech penetration", but also centralization, surveillance, and digital manipulation. "How did we lose so badly?" It comes down to a threat that was under-appreciated at the time. With such a dynamic environment, where new technologies could quickly supersede old ones and new companies were popping up left and right, monopolization seemed unlikely. But the technologists of the time didn't understand the role of anti-monopoly enforcers in creating the world they were seeing.

For example, he first got a modem in 1982, which happens to be the year that AT&T was broken up. Among other things, AT&T did not want people to own or use modems; managers there understood that the spread of digital technology would likely take the premium out of many telecom services. After the breakup by antitrust enforcers, though, modems and bulletin-board systems took over, thanks to the crash in long-distance rates.

Or think about the rise of the IBM PC — a machine without an operating system supplied by IBM. In those days, IBM was the bully of the computing industry and the subject of constant antitrust efforts by the US government. The company spent twelve years in "antitrust hell", but eventually settled the case. When it came time to market the PC, though, IBM was still recovering from that experience, part of which was the result of its forced bundling of hardware and software. Feeling that it couldn't create its own PC operating system without getting back into trouble, it instead bought one from Microsoft.

The first PC clones showed up almost immediately — clones that were able to run Microsoft's operating system. Soon there was a thriving, competitive industry among clone sellers, leading to the rise of companies like Dell and Compaq. These companies took over the market initially created by IBM. It seemed like that was just how the technology market worked, but there is a fundamental reason why IBM didn't crush that industry at its birth: the antitrust case. IBM did not want to become the target of renewed antitrust enforcement, so it had to stand by while the PC-clone market took off.

Then, there is Microsoft, which was an outcome of that antitrust enforcement. Microsoft, too, became a monopolist and became the target of antitrust action; that action arrived too late to save Netscape but, after enduring it for years, Microsoft was "terrorized" by the experience. So when Google was founded, Microsoft did not crush it; Google subsequently "ate Microsoft's lunch".

It seemed like we had a perfect market in technology; one just needed a cool idea and a keyboard, and there would be a global audience just waiting to hand over its money. Switching costs would always be low; if some firm became dominant, it was just a matter of reverse-engineering its formats and protocols to open things up again.

The end of antitrust

What we didn't understand, Doctorow said, was that antitrust law was destroyed in the US by a man named Robert Bork. He was a "perfect market" theorist who thought that monopolies were good. Laws against monopolies, he argued, should only be applied when it could be shown that a given situation was causing harm to consumers. At the same time, he made proving that harm nearly impossible. In this world, companies could create monopolies with impunity.

It is fashionable now to say that the concentration in the technology industry is a result of factors like network effects, first-mover advantages, and data moats. But that is not how these companies created and grew their monopolies; when you have all the money you could need, he said, you can just buy success. Google has made "1.5 successful products" in-house (the search engine and a Hotmail clone); everything else was bought from elsewhere, in acquisitions that a strong antitrust regime would have blocked. Meanwhile, many of the other things Google did try to create internally have ended up in the "Google graveyard".

Network effects are real, but they are also a double-edged sword when interoperability comes into play. One source of interoperability is technology standards, but another is what he calls "adversarial interoperability" or "competitive compatibility". AT&T used to block interoperability by forbidding the attachment of outside equipment to the phone network; once that ban went away, the market for telephone equipment took off. Myspace had a set of captive users — until Facebook created a bot to scrape users' information from the site and port it over.

Given a chance, companies will create interoperability one way or another, making the market more competitive. This kind of interoperability has been criminalized, though, through mechanisms like copyright, patents, and terms of service. Oracle's ongoing lawsuit alleging that Google violated the copyright on its Java APIs is a classic example. Companies that own this sort of monopoly are doubly fortunate, since the government will intervene to defend the monopoly against those who would try to break it.

What to do

Maybe, Doctorow allowed, early technology enthusiasts were a little blind after all and, once the money started to flow, they wanted some of it for themselves. The mythology of that age, where a bright idea developed in a garage somewhere could take over the world, helped to salve a lot of consciences. But those days are behind us; if you have a bright idea now, you are not going to be making a lot of money from it; nobody wants to invest in companies trying to compete with the big monopolists. The dream now is not to create a successful company; it's to be acquired by a technology giant. Meanwhile, big tech, which no longer needs to fear losing its customers, has pivoted to abusing them. The code to carry out this abuse is written by techies like those in the audience, he said; for many people, their full-time job is now taking away others' self-determination.

The last couple of years, though, have started to see mass walkouts from some of the big firms. There have been protests against the development of facial-recognition technology and pushes for workplace unionization. An early precedent for this kind of change of heart, he said, can be found in Robert Oppenheimer, who managed the Manhattan Project to create atomic weapons for the US. He came to deeply regret his role in the creation of those weapons and redirected his life toward fighting nuclear proliferation. Doctorow said that he has spent much of his life "trying to create more Oppenheimers" through his books.

In the early days of the ecology movement there was no "ecology movement"; instead, there was a large collection of individual causes. Some people were fighting to save owls, others were working on water issues, and so on. They all thought they were fighting for different things until the word "ecology" pulled it all together into a single movement, helped by the celebration of the first Earth Day.

We are, he said, approaching a similar moment with the fight against monopoly, which is not unique to the technology industry. Almost every field is dominated by a small number of companies, to our detriment. The fact that there is only one company in the US that makes the right sort of glass bottles has hampered vaccine distribution, for example. As people come to understand the fights against monopoly, they'll come to understand that there are allies everywhere; we are all on the same side.

The tech giants, he concluded, say that there is something special about the technology industry that leads to the creation of monopolies. But we have been there before, and we know how to deal with monopolies. We just have to mobilize and make it happen.

Comments (35 posted)

A major vulnerability in Sudo

By Jake Edge
February 3, 2021

A longstanding hole in the Sudo privilege-delegation tool that was discovered in late January is a potent local vulnerability. Exploiting it allows local users to run code of their choosing as root by way of a bog-standard heap-buffer overflow. It seems like the kind of bug that might have been found earlier via code inspection or fuzzing, but it has remained in this security-sensitive utility since it was introduced in 2011.

Qualys reported the bug on January 26; it has been in Sudo from version 1.8.2, released in August 2011, up through 1.9.5p1, which was released on January 11. At the same time as the announcement, Sudo released version 1.9.5p2 to fix the problems. The bug has been assigned CVE-2021-3156, which Qualys has dubbed "Baron Samedit". That name combines Baron Samedi, the name of the vodou loa of the dead, with sudoedit, which is integral to the exploit.

Unlike last year's Sudo vulnerability, which exploited an uncommon, non-default configuration, this time around the problem is more widespread. Any systems with untrusted users will want to upgrade Sudo to avoid the problem. The major Linux distributions have already issued updates at this point.

One fairly straightforward test to see if a system is vulnerable is shown in the report:

    $ sudoedit -s '\' `perl -e 'print "A" x 65536'` 
    malloc(): corrupted top size 
    Aborted (core dumped)

Systems that have the problem will see the "corrupted top size" warning message. Another, perhaps less worrisome test, sans scary messages, is shown in the Sudo project advisory:

    $ sudoedit -s /
    sudoedit: /: not a regular file     # or it might prompt for a password

In both cases, systems that are not vulnerable will simply give a usage string. As we will see, attackers can control what gets written beyond the end of a heap buffer, which effectively allows them to make the program do their bidding. That means attackers can subvert sudoedit, which is simply a symbolic link to the setuid-root sudo binary. Game over, as they say.

The "-s" option is important to the flaw, but it is also not really meant for sudoedit at all; it specifies that the user's shell should be used, but sudoedit invokes an editor. The "-s" (or "--shell") option is valid and does make sense for the underlying sudo command, however. Part of the fix for CVE-2021-3156 is to restrict the valid options for sudoedit, which is why the patched versions give a usage string instead of a crash or other error.

Defeating the escape code

The buffer overflow itself also needs to be addressed, even though the option fix leaves no known path to get to that code. The basic problem is that a command-line argument that ends with a backslash can break the code that escapes meta-characters during argument parsing, leading to writing past the end of a heap-based buffer.

When sudo is executing in shell mode (due to -s or the related -i), it collects up all of the command-line arguments into a single buffer and escapes any meta-characters found with a backslash. It creates a new version of the argv array that consists of the shell to be executed, "-c" as the command argument to the shell, followed by this new buffer. That new argv will then be used when the shell is executed.

Later in the processing, Sudo walks through the new argv and, if necessary, un-escapes the meta-characters into another buffer in order to match them against the sudoers file and for logging purposes. But that un-escaping can go awry if a command-line argument ends with a single backslash. Under that condition, the un-escaping code copies the character after the backslash, even if it is the NUL at the end of the string, and it merrily keeps copying until it hits an un-escaped NUL. The result is a write past the end of the destination heap buffer, whose size was calculated based on the first NUL.
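
The flawed loop is easy to model outside of C. The sketch below is a simulation, not sudo's actual code: it runs un-escaping logic of this shape over a flat buffer laid out the way the process sees memory, with the argument, its terminating NUL, and whatever happens to follow (here, an environment string). Because the lone trailing backslash makes the loop step over that NUL, the copy runs on into the neighboring data:

    # Simulation of the CVE-2021-3156 un-escape bug; not sudo's real code.
    def unescape(buf: bytes) -> bytes:
        out = bytearray()
        i = 0
        while buf[i] != 0:                 # stop only at an *un-escaped* NUL
            if buf[i] == ord("\\"):
                i += 1                     # skip the backslash...
                out.append(buf[i])         # ...and copy the next byte, even if
                i += 1                     #    it is the argument's terminator
            else:
                out.append(buf[i])
                i += 1
        return bytes(out)

    argument = b"payload\\"                # ends with a single backslash
    neighbor = b"SECRET=attacker-data"     # e.g. the environment strings
    memory = argument + b"\x00" + neighbor + b"\x00"

    print(unescape(memory))
    # b'payload\x00SECRET=attacker-data' -- far more than the destination
    # buffer, which was sized for b'payload' alone, can hold

In Python nothing overflows, of course; in sudo the same logic writes the extra bytes past the end of the heap buffer.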

In theory, every buffer that gets un-escaped has already been escaped internally, so every backslash has been escaped with another; if that were true, this condition could not occur. But, of course, it turns out not to be true in (at least) one case: the test used to decide whether to force the escaping is subtly different from the test used to decide whether to un-escape. By using sudoedit with the -s option, a path could be tickled where the arguments were not escaped but were later un-escaped, which led to the buffer overflow. That was fixed with a separate patch addressing both the mismatched tests and the broken logic when the buffer ends with a single backslash.

As described by the report, this buffer overflow is ideal from the attacker's perspective because they can control everything about the contents and size of the overflow. The last command-line argument to sudo is conveniently followed by the environment variables, so the attacker can precisely arrange the contents and even include NUL bytes since ending arguments or environment variables with a single backslash results in a NUL. The report gives an example of how to write attacker-controlled values to the data structure used by malloc():

For example, on an amd64 Linux, the following command allocates a 24-byte "user_args" buffer (a 32-byte heap chunk) and overwrites the next chunk's size field with "A=a\0B=b\0" (0x00623d4200613d41), its fd field with "C=c\0D=d\0" (0x00643d4400633d43), and its bk field with "E=e\0F=f\0" (0x00663d4600653d45):
    env -i 'AA=a\' 'B=b\' 'C=c\' 'D=d\' 'E=e\' 'F=f' sudoedit -s '1234567890123456789012\'

    --|--------+--------+--------+--------|--------+--------+--------+--------+--
      |        |        |12345678|90123456|789012.A|A=a.B=b.|C=c.D=d.|E=e.F=f.|
    --|--------+--------+--------+--------|--------+--------+--------+--------+--
                  size  <---- user_args buffer ---->  size      fd       bk
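
Those odd-looking hexadecimal values are nothing more than the environment strings read back as little-endian 64-bit integers, which a couple of lines of Python can confirm:

    # The three overwritten heap-chunk fields, decoded from the
    # environment strings shown in the report:
    for name, raw in [("size", b"A=a\0B=b\0"),
                      ("fd",   b"C=c\0D=d\0"),
                      ("bk",   b"E=e\0F=f\0")]:
        print(f"{name}: 0x{int.from_bytes(raw, 'little'):016x}")
    # size: 0x00623d4200613d41
    # fd:   0x00643d4400633d43
    # bk:   0x00663d4600653d45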

There is even more information about exploiting the flaw in the "Exploitation" section of the report. It makes for some fascinating reading for those who are curious about how these kinds of exploits can work.

In the end, though, this is caused by a typical C programming error leading to a complete compromise of a setuid-root program—but only for local users. It once again highlights the dangers of using C for these kinds of tools, but it also points to a certain amount of complacency within our community. For a bug of this sort to persist for this long in a tool of this nature would seem to indicate that we are not really scrutinizing our code as well as we should be, nor testing and fuzzing it enough.

As Hanno Böck noted, though, sudo likely has far more complexity than a tool of this sort should have; there are alternatives, but those have potential downsides as well. Wholesale replacement in a safer language (and perhaps with many fewer features) has its attractions, but someone has to do that work too; then there is a new code base, in a possibly less-familiar language, that needs a lot of scrutiny as well. As always, there are no silver bullets.

Comments (51 posted)

Finding real-world kernel subsystems

By Jonathan Corbet
February 1, 2021

LCA
The kernel development community talks often about subsystems and subsystem maintainers, but it is less than entirely clear about what a "subsystem" is in the first place. People wanting to understand how kernel development works could benefit from a clearer idea of what actually comprises a subsystem within the kernel. In an attempt to shed some light on that question, Pia Eichinger (and her colleagues Ralf Ramsauer, Stefanie Scherzinger, and Wolfgang Mauerer) spent a lot of time looking for the actual boundaries; Eichinger presented that work at the 2021 linux.conf.au online gathering.

This work was undertaken to develop a more formalized model of how kernel development works. With such an understanding, it is hoped, ways can be found to make the process work better and to provide new tools. The researchers have a particular interest in safety-critical deployments of Linux. Safety-critical environments are highly sensitive; working software can make a life-or-death difference there. So safety-critical developers have to ensure software quality by any means available.

One such means is to take a close look at the development process, on the reasonable assumption that the process impacts the quality of the final result. Assuming that the process itself makes sense, a project that adheres more closely to its defined process should produce higher-quality software. So if it can be proved that a project's developers strictly comply with their development process, the level of assurance is higher and certification — generally necessary for safety-critical systems — is easier to achieve.

The Linux kernel presents some major challenges when it comes to certification due to its open development process. Nobody documents the process or the degree to which it is adhered to. But, she said, with a bit of data mining, much of that information can be recovered after the fact. Her focus is on patch integration in particular and whether patches are being merged by the appropriate subsystem maintainers. If patches are taking "strange paths", that is a sign that the process is not being followed.

Eichinger ran into a little problem on the path to that goal, though: where can one find the subsystem hierarchy that defines this process? Where are the documents describing these subsystems; more to the point, what is a subsystem, exactly? It may seem like a trivial question, she said; that is what the MAINTAINERS file is for. But it is not that easy; as was covered in this article (which she cited during the talk), the information in this file is neither complete nor 100% accurate.

First of all, many kernel subsystems do not appear in MAINTAINERS at all. But the picture is less than clear even for those that are present. Consider, for example, the "media subsystem"; there is no entry for it. There are, however, over 100 MAINTAINERS entries with "media" in the name somewhere. Which of those is the true media subsystem? The answer is not clear for somebody who is not closely familiar with the kernel community.

Eichinger and company needed a definition of a "subsystem", so they made their own. Entries in the MAINTAINERS file do not clearly describe subsystems, so they were deemed instead to be "sections" that describe some part of the kernel code base. Many of these sections share files with each other; those were designated as "thematically related". By finding and grouping clusters of related sections, the kernel's true subsystems could be found.

To do so, she processed all of the section entries and plotted them as an undirected graph, where the sections themselves were the vertices and shared lines of code made up the edges. The initial graph looked like this (from Eichinger's slides [PDF]):

[Subsystem graph]

That was, she allowed, a bit messy. To try to create something more useful, she cut the graph down to the largest 20% of the sections in the MAINTAINERS file. The result for the aforementioned media subsystem looked like this:

[media subsystem graph]

Therein one sees a number of sections for specific drivers, including a sizeable sub-cluster in the staging directory and a small blob in the Android drivers. The section that ties it all together is "media input infrastructure" — the actual media subsystem.

The picture for the direct rendering (DRM) subsystem looks a little different:

[DRM subsystem graph]

This subsystem appears as a large collection of related small clusters, with a lot of overlap between them. She described this organization as "non-conforming" with the hierarchical subsystem model; it seems likely that what is actually seen here is the distributed, group-maintainer model used by the DRM developers.

At this point, she has some sort of definition of subsystems, twelve of which were identified at the top level. Those twelve were the Arm architecture, drivers, crypto, USB, DRM, networking, media, documentation, sound, SCSI, more Arm stuff (OMAP architecture code, for example), and Infiniband. Along with that, she has a tool that can automate this sort of subsystem detection. It is, she said, "just scratching the surface" of the problem, but it is a start.
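
The detection step itself maps naturally onto standard graph tooling. The sketch below is a rough approximation of the approach rather than PaStA's actual implementation; it uses the networkx library, toy data in place of a parsed MAINTAINERS file, and shared file patterns as a stand-in for the shared lines of code described in the talk:

    # Rough sketch of subsystem detection via section clustering;
    # not PaStA's real implementation.
    import itertools
    import networkx as nx

    # Toy stand-in for a parsed MAINTAINERS file: section name -> file patterns.
    sections = {
        "MEDIA INPUT INFRASTRUCTURE": {"drivers/media/", "include/media/"},
        "SOME MEDIA USB DRIVER":      {"drivers/media/usb/foo.c", "drivers/media/"},
        "DRM DRIVERS":                {"drivers/gpu/drm/"},
    }

    graph = nx.Graph()
    graph.add_nodes_from(sections)
    for a, b in itertools.combinations(sections, 2):
        shared = sections[a] & sections[b]
        if shared:
            # Sections maintaining overlapping code are "thematically related".
            graph.add_edge(a, b, weight=len(shared))

    # Each connected cluster of related sections approximates one subsystem;
    # on the real data the graph needs pruning (e.g. keeping only the largest
    # sections) before the clusters become readable.
    for cluster in nx.connected_components(graph):
        print(sorted(cluster))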

There are a number of ways this work could go in the future. One would be to examine historical kernel releases to build a history of how kernel subsystems have evolved over time. This model can also be used, of course, for the original purpose of determining how well the actual kernel patch flow conforms to the maintainer model. There may be scope for applying this technique to other projects as well.

For more information, readers can go to Eichinger's bachelor thesis describing the entire project. The code for performing this analysis (called "PaStA") can be found in this GitHub repository.

Comments (8 posted)

Wayland support (and more) for Emacs

By Jonathan Corbet
January 28, 2021

LCA
Jeffrey Walsh started off his 2021 linux.conf.au presentation with a statement that, while 2020 was not the greatest year ever, there were still some good things that happened; one of those was the Emacs 27.1 release. This major update brought a number of welcome new features, but also led to yet another discussion on the future of Emacs. With that starting point, Walsh launched into a fast-moving look at the history of Emacs, why users still care about it, what changes are coming, and (especially) what was involved in moving Emacs away from the X window system and making it work with the Wayland compositor.

There were a number of good things to be found in the 27.1 release, which was a "huge jump" in functionality. Perhaps at the top of the list is support for the HarfBuzz library, which brings improved text-rendering support in multiple languages — and the support for color emoji that no self-respecting 2020s application can be without. Portable dumping was finally added, leading to faster startup and less system-dependent code. Emacs also now supports a tab-based interface, something that "had been asked-for forever", Walsh said.

After that release, though, "the navel-gazing set in" with lots of questions about how Emacs could be restored to its one-time popularity. This is something that happens every year, he said, but the discussion was perhaps deeper this time around. The problem with this kind of discussion in the Emacs community is that any resulting action can take years to come to fruition; there tends to be a lot of resistance to change from "old hands" (in whose company Walsh counted himself). But if people present changes in the form of code, they can eventually gain traction.

Some ideas that were discussed are not going to happen; these include replacing elisp with JavaScript or some other "more modern" language. Similarly, porting Emacs's C code to Rust is not something to hold one's breath for. Other desired changes are more "socially easy", but are hard technically; these include making elisp faster so that more of Emacs could be written in Lisp rather than C. Another is rewriting the Unix graphical interface, which is fragile; small changes can break obscure platforms in surprising ways.

Taking a step back, Walsh asked why people bother to use Emacs in the first place. In his own case, it's simply too late to change — and he has tried a couple of times. It is his toolbox at this point. In any case, whenever some other editor demonstrates an interesting feature, it gets ported to Emacs eventually. He also likes the fact that, unlike other editors, Emacs treats C code as C, rather than as some poor-developer's C++.

It is one thing to ask why a developer like Walsh would use Emacs, but it is another to ask why he would want Emacs to work with Wayland, and why he would want to work on that project. The answer was that Wayland works well on his box; he finds it to be "less clunky" than X at this point. Emacs was the only application he had that still used X; fixing that was his opportunity to get the X libraries off his system entirely.

The Wayland port

Emacs, he said, is not really a text editor; it's a Lisp environment that has been ported to Unix, which just happens to have a useful editor built into it. Without Lisp, he said, there is no Emacs. Everything that happens in Emacs goes through Lisp; for example, if one types the letter "x", Emacs will (usually) run:

    (self-insert-command "x")

to insert that letter into the current buffer. Everything can be reconfigured at almost all levels. Among other things, that means that Emacs has no concept of a "plugin" because none is needed; add-on features are equivalent to the core of Emacs. If one looks in the source repository, 60% of what's there is Lisp, while only 15% is C. Amusingly, 16% is changelog containing 35 years of development history. The C code includes the Lisp interpreter, the Athena toolkit, and the core "redisplay" code that manages the screen.

That redisplay code, he said, is C code that was written by Lisp developers while many of the core C standards were still being developed. It is characterized by "heavy macro use" that is trying to hide a lot of history. It is efficient code, but also complex; it's one of the trickiest parts of the entire code base. But this is where changes need to be made to bring Emacs to Wayland.

The problem is complicated by the fact that Emacs was never designed to be a graphical user interface; it has always been a terminal application. To work with the X window system, Emacs pretends that an X window is, in fact, a terminal. As is its usual practice, Emacs deals with X at the lowest level, processing as much X interaction through Lisp code as possible.

At some point, the X redisplay code was ported to the GTK toolkit, but that port is a sort of false front; windows are created with GTK, but then all events in that window are forwarded directly to the Lisp engine. Even with GTK3, Emacs is still using much of its old code, handling drawing and events at the X level. Later, Cairo support was added to help with printing; it has been "periodically maintained" since.

There has been interest in Wayland support for some time; after all, Emacs now uses GTK and Cairo, both of which support Wayland, so how hard can it be? A rudimentary port was done in 2014 using Cairo for drawing; it was missing a number of features but was a useful learning experience. That exercise did lead to the adoption of Cairo as the core rendering library in 2015, though Cairo did not become the default until the 27.1 release in January 2020. Nonetheless, it was one of the first signs that Emacs may be moving away from X.

In 2019, Stefan Monnier pointed out an in-progress Wayland port and asked if anybody had looked at it. That was, it turns out, a pure GTK port that migrates all low-level X code to its GTK3 equivalents. All drawing is done through Cairo and GTK, which should be wonderful for Emacs going forward, in that the maintenance burden for that code rests with the GTK developers.

It works by creating a new terminal type that uses GTK to get a window up on the screen and handles GTK events. There is a new Emacs widget that serves as the endpoint for those events and implements the Emacs display. In the future, this widget could conceivably be used in any program that wants to be able to put up an Emacs window. This widget can hold others; things like GTK scrollbars seem to work well within it, for example.

Getting all of this working has required dealing with a number of challenges, many of which stem from the fact that Emacs is both huge and old. The redisplay code is about 20,000 lines, full of #ifdefs and macros; it's hard to work with. There are a lot of old workarounds in this code that may not be needed anymore, but it can be hard to tell.

The X code in Emacs "wants to know everything" and handle everything itself; this approach is wired deeply into Emacs as a whole. GTK, instead, wants to abstract things and deal with as much as possible itself. So, for example, Emacs wants to see and respond to raw X events, but that's not what GTK provides. Emacs is designed around having all events be handled through Lisp code, but that may not be a model that can be supported going forward.

In Wayland, all windows are independent, even when they are nested inside each other. Windows in X, instead, are interrelated, and applications can use global coordinates. That creates confusion in the Wayland switch because it's not always clear where the origin should be, leading to annoying problems where popups appear in the wrong place. Walsh lost three weeks of his life figuring this out, he said.

The pure-GTK work, in the end, came down to about 100 commits touching 58 files and adding about 16,000 lines of code. Does it work? Yes, it does. It was developed on Arch Linux and Fedora, but has been tested on FreeBSD as well; he has been using it daily for the last year. He does not see a lot of differences between the X and Wayland ports, but prefers the Wayland version. It is "bug-compatible" with X Emacs, though there are a few rendering differences that one might notice. He thinks it renders more quickly, but had no measurements to prove that.

There are a few things that will not be supported in the Wayland version, though. Xsettings has been replaced with GSettings, for example. Wayland doesn't support self-positioning of windows — that is left to the compositor — so Emacs is unable to support options that fix the position of windows.

This code currently lives in a feature branch in the Emacs Savannah repository; he would like to see more people testing it. He also said that, if there are developers who could provide GTK help, that would be appreciated.

Looking forward

Now that there is a pure GTK3 port of Emacs, GTK4 has naturally been released. There is some experimental work going on to port Emacs to GTK4, he said, but there may not be a lot of benefits to be had from that port. GTK4 is an even further departure from X, making the port harder yet. There has also been interest in using features like OpenGL, Vulkan, and DirectX, but he advised patience; it took twelve years to get the Cairo port in, for example. Beyond that, text rendering is relatively hard on a GPU, so it's not clear what Emacs would gain from this work.

What else can be expected from Emacs in general? It probably will not break away from its "ugly look", he predicted. The GTK port was not meant to change the editor's appearance. The default key bindings will stay the same; there are too many old hands who would resist changing them. Those old hands are the core Emacs user base, so there is no desire to make them unhappy. Elisp will not be replaced anytime soon either — except, maybe, with a different flavor of Lisp. Whenever that conversation comes up, though, people start arguing about which flavor, with lots of bikeshedding going on.

What is changing can be seen in the ongoing push toward common libraries rather than doing everything by hand within Emacs itself. The adoption of libraries like HarfBuzz, Cairo, and Jansson is an example; he thinks this trend could continue.

A big thing to look forward to is native compilation of Lisp code; that will have "a huge ripple effect" in the Emacs community. Andrea Corallo first asked about this capability in 2019, and delivered a prototype implementation one month later. It uses libgccjit as the back-end for the byte compiler and supports a wide range of platforms. That is important: by using GNU tools for the compilation, it meets the "GNU political requirements" and can actually be included in Emacs. This work will allow writing more of Emacs in Lisp, migrating some code back from C. That will allow for more customization; in the future, customized Emacs "distributions" may take on a bigger role as a result.

He finished by crediting Yuuki Harano as the main developer behind the pure GTK work in Emacs.

Your editor asked afterward about how interested users could try out this port. One way is to check out the branch from Savannah and build it, but there are also a number of repositories out there with builds for various distributions now. Arch Linux packages it as part of the distribution "because it's Arch". This article was written using a build from the Fedora Copr repository. The xeyes test shows that it is indeed running natively on Wayland, and it seems to work well, with the one strange exception that mouse-wheel scrolling happens at half speed.

Comments (24 posted)

Page editor: Jonathan Corbet


Copyright © 2021, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds