Leading items

Emacs and LLDB

By Jake Edge
February 11, 2015

Back in January, we looked at a dispute regarding extracting internal representation data from GCC for use by Emacs. Because of concerns about commercial entities circumventing the GPL, Richard Stallman opposed making the abstract syntax tree (AST) available from GCC, which was a feature that some Emacs developers were hoping to make use of. There are also technical barriers to using GCC's AST, it would seem, but it is the political barrier being erected that stuck in the craw of many. A more recent thread on the emacs-devel mailing list looks a bit like history repeating itself.

Andrew L. Moore obviously recognized the potential pitfalls when he asked about merging a patch adding support for the LLDB debugger (part of the LLVM project) to the code for the Emacs Grand Unified Debugger mode: "I’d be interested to know if and how this might be accepted into the Emacs distribution, RMS’s opinion of LLVM notwithstanding." Emacs maintainer Stefan Monnier was quick to respond positively to the idea. His only concern was "to clear the copyright status of this code".

On the other hand, Stallman was not in favor of the change. In fact, he saw it as an attack on the GNU project:

It looks like there is a systematic effort to attack GNU packages. The GNU Project needs to respond strategically, which means not by having each GNU package cooperate with each attack. For now, please do NOT install this change.

As might be guessed, others saw things a little differently. Several likened the situation to that of Windows and Mac OS X support in Emacs, though Stallman rejected that comparison. In addition, no one saw basic LLDB support for Emacs as the attack that Stallman made it out to be. Monnier said Stallman sounded paranoid:

LLVM is not meant to kill GCC more than Windows is meant to kill GNU/Linux.

Yes, they compete. But the intention is not to replace one with another. The intention of people working on LLVM is to solve their immediate problem, and for one reason or another, they don't consider GCC as a good way to solve their problem.

Nobody would benefit from killing GCC, really. Not even control freaks who think the GPL is the plague.

Furthermore, as David Kastrup put it: "If we try to close down every cooperation with non-GNU free software, we are sacrificing our goals for the sake of our temples." Stallman disputed that he was suggesting that course, but there is, at least, a perception problem that Stephen J. Turnbull pointed out:

What is feared is that your reflexive opposition to initiatives that you admit you don't understand ("I must study ... don't rush me") merely because they involve LLVM *are* having a chilling effect. And if you can oppose something as innocuous as adding to ELPA [Emacs Lisp Package Archive] *existing* software (which is *already widely distributed*) merely because it involves LLVM, that chilling effect could effectively become a freeze, deterring *many* potential contributors and alienating users who (think they) need the features offered by LLVM that the GNU toolchain doesn't have.

There are several issues for Emacs (and GNU) that Stallman is currently studying, including getting AST information from GCC and support for LLDB but, to some, those studies are holding development back unnecessarily. Kastrup would like to see some rules that would help reduce the amount of study required:

At any rate, it stresses my point that we don't have enough Richard to reasonably cover all of GNU's decision-making needs without some hard and fast rules.

And to me some hard and fast rules for interoperation seem feasible: after all, we can hardly "defend" ourselves against software that is licensed in GPL-compatible ways. Any measure against such software will equally well keep GPLed/GNU software at bay.

Stallman, though, seems to be convinced that Apple is attacking GCC through LLVM: "Apple intends LLVM and Clang to make GCC cease to be a signal success and a reason for all sorts of companies to work on a compiler that always gives users freedom". But, as Eric S. Raymond pointed out, Apple's adoption of LLVM is actually a victory, since LLVM is licensed under a free-software license (just not a copyleft license like the GPL):

As David Kastrup notes, the existence of the clang project is victory - it's Apple conceding in practice that it is no longer realistically possible to develop some kinds of critically important tools in a proprietary lockup.

As a result of this victory, all sorts of companies are now working on *two* compilers that always give users freedom. One is GCC. The other is clang (I haven't noticed my freedom being diminished even a little bit when I set CC=clang). That is a good thing.

Apple is not composed of angels. Apple does things that you and I would both regard as scummy. But to suppose that Apple has any desire, need, or intention to attack GCC is to attribute an importance to GCC in Apple's eyes that it has not possessed since the day clang shipped 1.0.

An interesting side note to the debate emerged when Liang Wang posted about LLVM creator Chris Lattner's offer to try to get LLVM's copyright assigned to the FSF back in 2005. It was part of an effort (that seemingly went nowhere) to integrate LLVM and GCC. Stallman had never seen the message:

I am stunned to see that we had this offer. Now, based on hindsight, I wish we had accepted it.

If I had seen it back then, I would not have had the benefit of hindsight, but it would clearly have been a real possibility. Nothing would have ruled it out.

Helmut Eller suggested contacting Lattner to see if LLVM might be interested in switching its license but, as Kastrup pointed out, LLVM is already licensed in a GPL-compatible way. If someone wanted to, they could release a GPL fork of LLVM tomorrow. On the other hand, though, LLVM is targeted at being modular so that other tools can use it in novel ways—something that GCC has generally resisted (largely at Stallman's behest). So, Kastrup continued:

For better or worse, a lot of decisions _have_ been made, _by_ the GNU project. These decisions had consequences with companies and individuals seeking their own solutions for problems that the GNU project considered too dangerous to approach. The current situation is not the outcome of a coordinated attack against the GNU project but rather the most obvious and natural consequence of our own actions, and it's time that we started to deal with the consequences of our actions in a graceful and mature and most particularly not self-destructive manner.

So far, at least, there has been no response from Stallman to that. In the meantime, however, Raymond followed up on his earlier message with a broadside entitled "Defending GCC considered futile". It is too late to save GCC, he said, so it is time to move on:

Already my own experiments suggest that LLVM is a superior compiler, by every metric I know of, at least in deployments that don't require bug-for-bug compatibility with GCC. If GCC were to vanish from existence tomorrow I'm not sure I myself would be even seriously inconvenienced. CC=clang in one dotfile; problem solved, done.

Obsolescence happens; this is nobody's fault. It will happen to clang/LLVM someday, too, but today is not that day.

Reaction to Raymond's message was fairly muted on the mailing list, which is a bit surprising, but may partly be due to the fact that it was posted to an Emacs list, rather than one for GCC. In any case, Stallman chose to focus on the narrower issue of whether supporting LLDB is right for Emacs—and to ignore the larger LLVM issue taken up by most others in the threads. But he admitted a total lack of knowledge of LLDB, though he guessed it is a "noncopylefted debugger", based on the name. Others confirmed that guess and went on to describe LLDB at some length.

Stallman is still awaiting information about LLDB to try to make a decision about whether he thinks Emacs should support it or not. As he did in the earlier thread, though, Monnier made it clear that Stallman's opinions on the matter carry no weight with him. He has said he will merge LLDB support once there is code with a proper copyright assignment available.

On multiple issues, Stallman could be seen to be standing in the way of progress in ways that don't serve Emacs, GCC, or the GNU project all that well. While Raymond's assertion that he would not personally miss GCC is, characteristically, over the top (compiling a Linux kernel still requires GCC, for example, even as the LLVMLinux project gets closer to its goal), it may not be all that long before it is true. That's unfortunate in many ways, but perhaps not completely unexpected.

By all accounts, LLVM is an excellent project that provides infrastructure for numerous compilers (including Clang) and other tools (including LLDB). While it doesn't have the license that Stallman, the FSF, and others might wish, it is certainly free software. There is plenty of room for two (or more) compiler suites in the free-software world—we just may be seeing a switch in the dominant player. Trying to stand in the way of that switch may well turn out to be a poor place to stand.

Matrix: a new specification for federated realtime chat

By Nathan Willis
February 11, 2015

The free-software community has frequently advocated the development of new decentralized, federated network services—for example, promoting XMPP as an alternative to AOL Instant Messenger, StatusNet as an alternative to Twitter, or Diaspora as an alternative to Facebook. The recently launched Matrix project takes on a different service: IRC-like multi-user chat. If the thought of replacing IRC sounds like a strange goal for a new project, though, Matrix is extensible, and the developers have already added support for one-to-one audio and video calling. Though it is still in development, Matrix is simple enough that one can already get a feel for how it works.

Matrix was launched in mid-2014, with most of the developers coming from the enterprise software firms Amdocs and OpenMarket. The Matrix site hosts the main specification as well as a description of the client-to-server API. There is also a server-to-server API which, so far, has not been published on its own (although it is referred to in the Matrix specification). At GitHub, the project maintains repositories for a reference Matrix server called synapse and for Android and iOS client libraries.

The synapse repository also includes two demo clients: one that runs as a web application, and one command-line client written in Python. The most recent release is version 0.6.1f, from February 10. Interested users can test out the system by connecting to the Matrix demo server through the web client at https://matrix.org/beta/ or through the IRC gateway at irc://irc.freenode.net/matrix.

Matrix basics

The Matrix system allows individual users to join persistent realtime chat rooms (hence the comparison to IRC and its channels). But the messages posted in any particular room are continuously synchronized between every participating Matrix server, thus protecting against the single-point-of-failure problem well known to IRC users. Whenever a new server connects to a room (which it would only do when a user attempts to join that room), the other participating servers propagate the room's history to the new server, so that all servers eventually reach a consistent state, and all users have access to the full message history.
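The history propagation described above can be pictured as servers exchanging whatever events the others lack until every copy of the room holds the same set. The Python sketch below is a toy model under stated assumptions, not synapse's algorithm: the `HomeServer` class, `join_room()`, and `sync_all()` are hypothetical, events are keyed by made-up IDs, and real servers must also order events and resolve conflicts, which this sketch ignores.

```python
from dataclasses import dataclass, field


@dataclass
class HomeServer:
    """Hypothetical toy model of one server's copy of a single room."""
    name: str
    # Events known for the room, keyed by a (made-up) event ID.
    events: dict = field(default_factory=dict)

    def post(self, event_id: str, sender: str, body: str) -> None:
        """Record a new message created by a local user."""
        self.events[event_id] = {"sender": sender, "body": body}

    def missing_from(self, other: "HomeServer") -> dict:
        """Events this server holds that `other` does not."""
        return {eid: ev for eid, ev in self.events.items() if eid not in other.events}


def join_room(newcomer: HomeServer, participants: list) -> None:
    """A joining server receives, from each participant, the history it lacks."""
    for server in participants:
        newcomer.events.update(server.missing_from(newcomer))
    participants.append(newcomer)


def sync_all(participants: list) -> None:
    """Push each server's events to every other server, so all hold the union."""
    for a in participants:
        for b in participants:
            if a is not b:
                b.events.update(a.missing_from(b))
```

Because each server only ever adds events it was missing, repeating the exchange is harmless, which is what lets the copies converge.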

Matrix rooms feature management functions similar to those found in IRC, such as inviting, kicking, and banning room members. While the general-purpose Matrix message format is text, the protocol also supports presence (so that users can advertise "here" or "away" status as desired) and public user profiles. It can also be extended to support other message types. As published, it already supports emoticon messages, geolocation references, and file attachments.

Rooms can have user-friendly names, like #help:example.org, that are similar to IRC channel names. But those user-friendly names are actually aliases: under the hood, the real room identifier is a randomly-generated string. All that matters to someone wishing to join the room is that the server running at example.org maintains the current mapping between the alias and the room identifier. Furthermore, that mapping is not fixed: the real room identifier could change over time, and a room persists even if its alias becomes unavailable (for example, if the original server goes offline).

The purpose of separating room identifiers from user-friendly aliases is twofold. First, the domain component of the alias identifies the server that advertises the alias-to-room mapping; this is intended to provide a globally unique namespace. As long as two users on example.org do not both try to start #help rooms, it does not matter that there may be other, disjoint #help rooms originating somewhere else. Second, users who want to connect to a persistent room need only remember the alias to join: if the original server is restarted, the room identifier might change, but the alias can remain the same.
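A minimal sketch of the alias scheme, assuming a simple in-memory directory per server: the alias is split at its first colon into a local part and a domain, the server named by the domain is asked for the mapping, and the mapping points at an opaque, randomly generated room ID. The `!` prefix on room IDs follows later versions of the Matrix specification; the `RoomDirectory` class, the function names, and the ID length are hypothetical.

```python
import secrets


class RoomDirectory:
    """Hypothetical alias-to-room-ID table kept by one server."""

    def __init__(self, domain: str):
        self.domain = domain
        self.aliases = {}  # alias -> opaque room ID

    def create_room(self, local_name: str) -> str:
        alias = f"#{local_name}:{self.domain}"
        if alias in self.aliases:
            # Uniqueness only has to hold within this server's own namespace.
            raise ValueError(f"{alias} is already taken on {self.domain}")
        room_id = f"!{secrets.token_urlsafe(12)}:{self.domain}"
        self.aliases[alias] = room_id
        return room_id

    def remap(self, alias: str, new_room_id: str) -> None:
        """Point an existing alias at a different room ID."""
        self.aliases[alias] = new_room_id


def split_alias(alias: str) -> tuple:
    """Split '#help:example.org' into ('help', 'example.org')."""
    if not alias.startswith("#") or ":" not in alias:
        raise ValueError(f"not a room alias: {alias!r}")
    local, _, domain = alias[1:].partition(":")
    return local, domain


def resolve(alias: str, directories: dict) -> str:
    """Ask the server named in the alias for the current room ID."""
    _, domain = split_alias(alias)
    return directories[domain].aliases[alias]
```

Two #help rooms on different domains resolve independently, and `remap()` shows an alias surviving a change of the underlying room ID.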

User identities use a syntax much like room aliases; the format is @username:example.com. Interestingly enough, the specification also describes a way for users to publicly link a Matrix user ID to a "third-party identifier" (3PID), such as an email address or phone number. To do this, the Matrix network would support a global, federated network of identity servers that maintains the links between accounts and 3PIDs. The goal would seem to be making it simpler to invite another user to join a chat. Unfortunately, the relevant section of the Matrix specification is still empty (marked with "This section is a work in progress.").

In addition to its more IRC-like facets, Matrix also includes some features geared towards one-to-one communication. Users can send invitation messages to other users, and the basic Matrix protocol supports extensions for WebRTC audio and video. Together, those functions allow Matrix to provide a VoIP-like service. Users can also create a chat room and invite others to it without advertising it via a public alias; that functionality more closely resembles traditional instant messaging.

There are also some interesting features of the Matrix system without a clear analogue in other network services. For example, each user connects by signing in from their client application to the Matrix server where they have an account, but the chat room itself is a server-to-server construct. Users can therefore log on to their server with more than one client simultaneously (e.g., a desktop machine and a mobile device), and the other servers participating in the room never need to know. The synapse reference server provides this feature. With IRC or IM, each client application connects directly to the central server, and logging in from two locations creates a variety of problems.
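The multi-device behavior can be sketched as a fan-out that happens entirely inside the user's own server: peer servers address the user ID, and the home server copies each event to every client session open for that user. The `UserHomeServer` class and its session bookkeeping are hypothetical and only illustrate the idea; they are not taken from synapse.

```python
from collections import defaultdict


class UserHomeServer:
    """Hypothetical home server that hides a user's devices from peer servers."""

    def __init__(self):
        # user ID -> {session ID: list of events delivered to that client}
        self.sessions = defaultdict(dict)

    def login(self, user_id: str, session_id: str) -> None:
        self.sessions[user_id][session_id] = []

    def logout(self, user_id: str, session_id: str) -> None:
        self.sessions[user_id].pop(session_id, None)

    def receive_from_peer(self, user_id: str, event: dict) -> int:
        """A remote server sends one event for a user; copy it to every session.

        Returns the number of client sessions that received the event.
        """
        inbox = self.sessions.get(user_id, {})
        for queue in inbox.values():
            queue.append(event)
        return len(inbox)
```

The peer server makes one delivery per user; how many devices that user has is purely a local detail of the home server.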

Networks and security

One of the more fundamental questions a new project like Matrix must answer is why an entirely new protocol is required to do what the new system does—after all, XMPP, SIP, and IRC are all well-established at this point. The Matrix project's answer consists of several pieces. First, there is the general rationale for decentralized and federated services. No central server is required to connect two users, anyone can run their own server, and users with accounts on different servers can interact easily.

Second, the project has its own list of shortcomings with IRC, XMPP, and the like. Some of these are well-known—like IRC's limitation to text-only messages and the inability of users to simultaneously connect to an IRC channel from more than one device. Others are a little more subjective, such as the claim that XMPP has "no strong identity system".

Third, and perhaps the most difficult to quantify, Matrix is designed to be friendly to web implementations—and, in fact, builds heavily on web underpinnings. Client-server and server-server connections are made with HTTP on top of TLS. All of the messages (client-server and server-server) are in JSON and are exchanged with RESTful requests.
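To make the web-friendly design concrete, here is a sketch of how a client might build a message-sending request using only Python's standard library. The endpoint path and the `msgtype`/`body` fields are modeled on later versions of the Matrix client-server API and are assumptions here, not details taken from the 2015 specification; the server name, room ID, transaction ID, and access token are placeholders, and the request is only constructed, never sent.

```python
import json
import urllib.parse
import urllib.request


def build_send_request(server: str, room_id: str, txn_id: str,
                       text: str, access_token: str) -> urllib.request.Request:
    """Build (but do not send) a PUT request posting a text message to a room.

    Path and field names follow later versions of the Matrix client-server
    API and are assumptions, not details from the spec described above.
    """
    path = "/_matrix/client/v3/rooms/{}/send/m.room.message/{}".format(
        urllib.parse.quote(room_id, safe=""),  # '!' and ':' get percent-encoded
        urllib.parse.quote(txn_id, safe=""),
    )
    body = json.dumps({"msgtype": "m.text", "body": text}).encode("utf-8")
    return urllib.request.Request(
        url=f"https://{server}{path}",
        data=body,
        method="PUT",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {access_token}",
        },
    )
```

The transaction ID in the path lets a server recognize a retried request instead of posting the same message twice.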

Speaking of TLS, the security model of Matrix is one area in which the project has several surprises in store. For starters, end-to-end encryption is not built in. The traffic on the wire is protected by TLS, but that does not prevent unwanted users from joining what is meant to be a private discussion room. Users who wish to restrict access to a room must rely on Matrix's room-management API: room moderators can kick out users or ban them (by user ID), but little else. Furthermore, although TLS encrypts the messages between servers, the servers themselves have access to the message content. Several spots in the specification and on the Matrix site as a whole make reference to optional end-to-end encryption (where only the users could decrypt the contents of messages), but this appears to be another unfinished component at present.
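Because access control rests on room management alone, a sketch of the moderation rules helps show what it does and does not cover. This Python model is a simplification: the membership values `invite`, `join`, `leave`, and `ban` match those used in the Matrix specification, but the single `moderators` set stands in for Matrix's more detailed permission rules, and the class and method names are hypothetical. Nothing here encrypts anything; once a user has joined, the model places no limit on what they can read.

```python
class Room:
    """Hypothetical, simplified model of Matrix-style room moderation."""

    def __init__(self, creator: str, public: bool = False):
        self.public = public
        self.moderators = {creator}          # stands in for real permission rules
        self.membership = {creator: "join"}  # user ID -> membership state

    @staticmethod
    def _require(ok: bool, why: str) -> None:
        if not ok:
            raise PermissionError(why)

    def invite(self, actor: str, user: str) -> None:
        # Any joined member may invite, but a banned user cannot be invited back.
        self._require(self.membership.get(actor) == "join", f"{actor} is not in the room")
        self._require(self.membership.get(user) != "ban", f"{user} is banned")
        self.membership[user] = "invite"

    def join(self, user: str) -> None:
        state = self.membership.get(user)
        self._require(state != "ban", f"{user} is banned")
        self._require(self.public or state in ("invite", "join"),
                      f"{user} needs an invitation")
        self.membership[user] = "join"

    def kick(self, actor: str, user: str) -> None:
        # A kick sets someone else's state to "leave"; they may rejoin a public
        # room, or a private one after a fresh invitation.
        self._require(actor in self.moderators, f"{actor} is not a moderator")
        self.membership[user] = "leave"

    def ban(self, actor: str, user: str) -> None:
        # A ban is recorded against the user ID and blocks any later join.
        self._require(actor in self.moderators, f"{actor} is not a moderator")
        self.membership[user] = "ban"

    def members(self) -> set:
        return {u for u, s in self.membership.items() if s == "join"}
```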

That said, Matrix does allow room moderators to "redact" old messages. An m.room.redaction event permanently removes the content of a message, but it leaves the message's event ID and other generic metadata fields in place; that allows other servers in the room to similarly strip out the (presumably offensive) contents without causing problems for the synchronization algorithm.
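The redaction idea fits in a few lines. In this sketch the set of preserved top-level keys is an assumption for illustration (the specification defines its own, more detailed rules), and the `redacts` field naming the target event is modeled on the specification; the function names are hypothetical. The point is that the event keeps its identity and place in the history while its `content` is emptied, so every server can apply the same transformation and still agree on which events exist.

```python
import copy

# Assumed, simplified allow-list of top-level fields that survive redaction.
PRESERVED_KEYS = {"event_id", "type", "room_id", "sender", "origin_server_ts"}


def redact(event: dict) -> dict:
    """Return a copy of `event` with its content stripped but its identity kept."""
    stripped = {k: copy.deepcopy(v) for k, v in event.items() if k in PRESERVED_KEYS}
    stripped["content"] = {}
    return stripped


def apply_redaction(history: dict, redaction: dict) -> None:
    """Apply an m.room.redaction event to a history keyed by event ID."""
    target = redaction["redacts"]
    if target in history:
        history[target] = redact(history[target])
```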

The final piece of the Matrix puzzle is identity management. Server-to-server Matrix messages are authenticated (in addition to TLS) using public-key signatures in the HTTP Authorization header that accompanies the message. When establishing a new connection, each Matrix server can retrieve both the TLS certificate and the signing key of its peer server from the well-known location http://foo.com/_matrix/key/v1. Thus, the system relies on the usual TLS certificate-authority system to verify the identity of a server.
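The signature step is what lets a receiving server tie a request to its origin. The sketch below is deliberately a toy and makes several labeled assumptions: it uses textbook RSA over two Mersenne primes so that it runs with the standard library alone (this is insecure, and later versions of the Matrix specification use Ed25519 instead), it signs a deterministic JSON encoding of the request, and it replaces the fetch from the well-known key location with a hypothetical in-memory `KEY_DIRECTORY`. In a real deployment, that fetch is where the TLS certificate check comes in.

```python
import hashlib
import json

# Two Mersenne primes, 2**61 - 1 and 2**89 - 1: toy parameters, not secure.
P, Q = 2**61 - 1, 2**89 - 1
N = P * Q
E = 65537
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent (needs Python 3.8+)

# Hypothetical stand-in for fetching the origin's key from /_matrix/key/v1.
KEY_DIRECTORY = {"a.example": (N, E)}


def digest(request: dict) -> int:
    """Hash a deterministic JSON encoding of the request into an integer mod N."""
    encoded = json.dumps(request, sort_keys=True, separators=(",", ":")).encode()
    return int.from_bytes(hashlib.sha256(encoded).digest(), "big") % N


def sign(request: dict) -> int:
    """The origin server signs with its private exponent D."""
    return pow(digest(request), D, N)


def verify(origin: str, request: dict, signature: int) -> bool:
    """The receiving server checks the signature with the origin's published key."""
    if origin not in KEY_DIRECTORY:
        return False
    n, e = KEY_DIRECTORY[origin]
    return pow(signature, e, n) == digest(request)
```

Sorting keys and fixing separators matters because both sides must hash exactly the same bytes; two dictionaries with the same contents in a different order produce the same signature.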

Authenticating the identity of a remote user in a chat room is a different matter. Here again, the Matrix specification is essentially blank, and all that can be gathered about the scheme that the project has in mind are bits and pieces referred to elsewhere. The synapse repository includes a brief description on its landing page, noting that:

the role of managing trusted identity in the Matrix ecosystem is farmed out to a cluster of known trusted ecosystem partners, who run 'Matrix Identity Servers' such as sydent, whose role is purely to authenticate and track 3PID logins and publish end-user public keys.

The sydent server referenced is another GitHub repository, albeit one that has not been touched in five months. But there is slightly more detail if one is willing to dig. The authenticity of any user account is the purview of the Matrix server where the account resides. An Identity Server can only verify that a given Matrix ID has been asserted to go with a particular email address (or other 3PID), and that the assertion came from the correct Matrix server. In short: the trust eventually falls back to the Matrix server anyway.
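The limited role of an identity server can be made explicit. In this hypothetical model, an identity server accepts a 3PID-to-Matrix-ID binding only when the asserting server matches the domain of the Matrix ID, and then publishes it; it has no way to check that the account itself belongs to the person named. The class and method names are invented for illustration and are not sydent's API.

```python
class IdentityServer:
    """Hypothetical toy identity server mapping 3PIDs to Matrix user IDs."""

    def __init__(self):
        self.bindings = {}  # (medium, address) -> Matrix user ID

    @staticmethod
    def home_domain(user_id: str) -> str:
        """'@alice:example.com' -> 'example.com'."""
        if not user_id.startswith("@") or ":" not in user_id:
            raise ValueError(f"not a Matrix user ID: {user_id!r}")
        return user_id.split(":", 1)[1]

    def bind(self, medium: str, address: str, user_id: str, asserted_by: str) -> bool:
        """Accept a binding only if it comes from the user's own home server."""
        if asserted_by != self.home_domain(user_id):
            return False
        self.bindings[(medium, address)] = user_id
        return True

    def lookup(self, medium: str, address: str):
        """Return the bound Matrix ID, or None; trust still rests on the home server."""
        return self.bindings.get((medium, address))
```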

But perhaps that is good enough; after all, one of the main purposes of designing a federated network like Matrix is that anyone has the freedom to set up a server and start some conversations. Much like one cannot automatically trust the validity of an unknown email address, other (non-technical) means will be required to determine whether or not a Matrix user is who they claim to be. It will be far more interesting, from a security standpoint, when Matrix's identity-management model is fleshed out to support things like end-to-end encryption and public-key cryptography.

Whether or not Matrix takes off as a realtime chat protocol will be interesting to watch. It is certainly true that IRC, despite all of its limitations, has resisted numerous challenges from up-and-coming communication schemes. But Matrix may have a key factor working in its favor: no reliance on a centralized server to coordinate conversations. In some circles, certainly, that is a long-requested feature.

Page editor: Jonathan Corbet