
Leading items

Welcome to the LWN.net Weekly Edition for March 18, 2021

This edition contains the following feature content:

  • Handling brute force attacks in the kernel: the "Brute" security module aims to detect and thwart fork()-based brute-force attacks.
  • Creating an SSH honeypot: a FOSDEM talk on using containers to safely observe SSH attackers.
  • Unprivileged chroot(): a patch set that would allow chroot() calls without privilege, under certain conditions.
  • Lockless patterns: an introduction to compare-and-swap: the second installment in the series on lockless algorithms.
  • Software platforms for open-source projects and foundations: platforms like LFX and Open Collective take on some of the administrative burden of supporting projects.

This week's edition also includes these inner pages:

  • Brief items: Brief news items from throughout the community.
  • Announcements: Newsletters, conferences, security updates, patches, and more.

Please enjoy this week's edition, and, as always, thank you for supporting LWN.net.

Comments (none posted)

Handling brute force attacks in the kernel

By Jake Edge
March 17, 2021

A number of different attacks against Linux systems rely on brute-force techniques using the fork() system call, so a new Linux security module (LSM), called "Brute", has been created to detect and thwart such attacks. Repeated fork() calls can be used for various types of attacks, such as exploiting the Stack Clash vulnerability or Heartbleed-style flaws. Version 6 of the Brute patch set was recently posted and looks like it might be heading toward the mainline.

This patch set has been in the works since it was first posted as an RFC by John Wood in September 2020 (the resend from Kees Cook a few days later may make it easier to see the whole set). It was originally called "fork brute force attack mitigation" or "fbfam", but that name was deemed too cryptic by Jann Horn and Cook. In addition, it was suggested that turning it into an LSM would be desirable. Both of those suggestions were adopted in version 2, which was posted in October.

But the idea goes back a lot further than that. The grsecurity kernel patches have long had the GRKERNSEC_BRUTE feature to mitigate brute-force exploits of server programs that use fork() as well as exploits of setuid/setgid binaries. A patch from Richard Weinberger in 2014 used a similar technique to delay fork() calls if forked children die due to a fatal error (which may imply it is part of an attack). That effort was not pushed further, so Cook added an issue to the kernel self-protection project (KSPP) GitHub repository, which is where Wood picked up the idea.

In the documentation patch, Wood described the kinds of behaviors that are targeted by the Brute LSM. The basic idea is that several types of attacks use fork() multiple times in order to obtain a desired memory layout; each forked child can be probed in various ways and, if those probes fail and cause the child to crash, another child can simply be forked to try again. Because a child created with fork() shares the same memory layout as its parent, successful probes can yield information that can be used to defeat address-space layout randomization (ASLR), determine the value of stack canaries, or serve other nefarious purposes.

Brute takes a different approach than either grsecurity or Weinberger's patch did, in that it does not simply delay subsequent fork() calls when a problem is detected. Instead, Brute kills all of the processes associated with the attack. In addition, Brute detects more types of fork()-using attacks, including those that probe the parent, rather than the child process. It also focuses on crashes in processes that have crossed a privilege boundary to try to reduce the number of false positives.

It does its detection by focusing on the rate of crashes, rather than just their occurrence. Brute collects information on the number of "faults" that occur in a group of processes that have been forked, but where nothing new has been executed with execve(). A brute_stats structure is shared between all of those processes; executing a new program results in a new structure to track faults in the new (potential) fork() hierarchy.
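
A sketch of roughly what such a structure needs to hold may make the mechanism clearer; the field names and layout here are illustrative, not copied from the patches:

    #include <linux/spinlock.h>
    #include <linux/refcount.h>

    /* Illustrative per-fork-hierarchy statistics; the actual layout in
       the Brute patches may differ. */
    struct brute_stats {
        spinlock_t lock;        /* protects concurrent updates */
        refcount_t refc;        /* processes sharing these statistics */
        unsigned char faults;   /* fatal signals seen in this hierarchy */
        u64 jiffies;            /* timestamp of the last crash */
        u64 period;             /* EMA of the time between crashes */
    };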

The period of time between a process starting and the first crash of that process (or of any of its children that share its memory layout, i.e. no execve()), or between consecutive crashes, is ultimately what is used to determine whether an attack is taking place. To avoid being too sensitive, though, the exponential moving average (EMA) of the period is only calculated once five crashes have occurred. The EMA is used to detect the "fast brute force" attack variant: if the EMA of the period between crashes drops below a threshold of 30 seconds, attack mitigation is triggered. For "slow brute force" variants, the absolute number of crashes in the hierarchy is compared against a threshold of 200. Some way to configure these values would seem like a desirable addition to Brute.
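
As a rough illustration of the bookkeeping involved (the 70/30 weighting below is an assumption for the example, not the value used in the patches), an integer EMA of the crash period could be maintained like this:

    #include <linux/math64.h>

    #define EMA_WEIGHT    70    /* hypothetical: percent kept from the old average */

    /* Fold a newly measured crash period into the running average. */
    static u64 update_period_ema(u64 ema, u64 new_period)
    {
        return div_u64(ema * EMA_WEIGHT + new_period * (100 - EMA_WEIGHT), 100);
    }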

The crashes are detected using the task_fatal_signal() LSM hook that was added as the first patch in the set. It will be called whenever the kernel is delivering a fatal signal to a process. Brute also uses the existing task_alloc() hook to detect fork() calls, the bprm_committing_creds() hook to detect execve() calls, and the task_free() hook to clean everything up.
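
For readers unfamiliar with LSM internals, registering such hooks follows a standard pattern; the sketch below shows its shape for the new hook. The handler and module names are placeholders, and the hook's signature is assumed from the patch description (it receives the siginfo of the signal being delivered):

    #include <linux/lsm_hooks.h>

    /* Called when the kernel delivers a fatal signal to a process. */
    static void demo_task_fatal_signal(const kernel_siginfo_t *siginfo)
    {
        /* update the statistics shared by this fork() hierarchy */
    }

    static struct security_hook_list demo_hooks[] __lsm_ro_after_init = {
        LSM_HOOK_INIT(task_fatal_signal, demo_task_fatal_signal),
        /* task_alloc, bprm_committing_creds, and task_free hooks go here */
    };

    static int __init demo_init(void)
    {
        security_add_hooks(demo_hooks, ARRAY_SIZE(demo_hooks), "demo");
        return 0;
    }

    DEFINE_LSM(demo) = {
        .name = "demo",
        .init = demo_init,
    };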

The security boundary checks are implemented by tracking changes to the various user and group IDs (real, effective, saved, and filesystem) that occur when executing new programs. There is no mention of Linux capabilities in the patches, but capability changes would also indicate that a privilege boundary is being crossed; perhaps that is something that will be added down the road. Beyond the ID changes, the use of networking is detected using the socket_sock_rcv_skb() LSM hook. The idea is to restrict the crash checking to those processes that are crossing privilege boundaries by either doing things like executing setuid/setgid programs or receiving data over the network. That is intended to reduce the number of false positives.

As can be seen in the changelog in the top-level patch, the last few versions (which are helpfully linked) have drawn minimal comments needing attention; this latest round has not drawn any at all as of this writing. It seems like a useful feature for some users that imposes no real burden on the rest of the kernel when it is not configured in; the new security hook that gets called when a fatal signal is delivered is the only change in that case. LSMs are often looked upon as a place to put code that some folks want, but others don't want to pay a price for in their kernels—Brute seems to fit that model well.

Comments (12 posted)

Creating an SSH honeypot

March 11, 2021

This article was contributed by Marta Rybczyńska


FOSDEM

Many developers use SSH to access their systems, so it is not surprising that SSH servers are widely attacked. During the FOSDEM 2021 conference, Sanja Bonic and Janos Pasztor reported on their experiment using containers as a way to easily create SSH honeypots — fake servers that allow administrators to observe the actions of attackers without risking a production system. The conversational-style talk walked the audience through the process of setting up an SSH server to play the role of the honeypot, showed what SSH attacks look like, and gave a number of suggestions on how to improve the security of SSH servers.

A honeypot is a network-accessible server, typically more weakly protected than ordinary servers. System administrators deploy honeypots to attract attackers and record their actions, which allows the administrators to analyze those actions and improve the defenses of their production systems based on the information gained. Honeypots may reveal new ways for attackers to get in or confirm the most common ones. They exist in different flavors for different types of servers; Bonic and Pasztor concentrated on honeypots providing a publicly accessible SSH server. A number of elements are needed to build such a honeypot: the SSH server itself, an environment that the attackers will be allowed into (and that is able to contain any damage), and a logging (audit) system that will record all of the information on the attackers' actions.

They started with the logging system, which has uses beyond honeypots. In large companies, audit trails are often recorded "in case some super-secret company stuff leaks". The solution Bonic and Pasztor chose for their honeypot was asciinema, a tool for recording and replaying console sessions. The asciinema log consists of JSON fragments, making it easy to parse. It starts with a header (with information like the format version and the terminal size); all subsequent lines are arrays with three items: a timestamp, the mode (input or output), and the content. Interested readers can see what can be done with the tool on the asciinema examples page. Bonic and Pasztor's original idea was to provide a video-like replay of attackers' sessions.
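
A hypothetical fragment of such a recording (asciinema's version-2 "cast" format; the timings and content are invented, and input ("i") events only appear when stdin recording is enabled) illustrates the structure:

    {"version": 2, "width": 80, "height": 24, "timestamp": 1614640000}
    [0.248848, "o", "$ "]
    [1.001376, "i", "ls\r"]
    [1.012345, "o", "ls\r\nflag.txt\r\n$ "]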

The second element of the configuration is the SSH server. Pasztor explained that there are multiple projects working on fake SSH servers; they simulate an environment and give simulated results. The problem, from the point of view of a honeypot builder, is that the tool has to simulate a shell, and a honeypot needs a directory structure (and content in its files, presumably). Providing all of the necessary files leads to something similar to assembling a virtual machine, Pasztor said, and that is not an easy thing to do. He added that honeypots try to prevent the attacker from actually running programs on a machine, as that may cause security problems. If the reason to run the honeypot is just to see what commands the attacker is issuing, a fake server is enough. However, for an in-depth analysis, more will be needed.

They decided to use a standard SSH server (OpenSSH), but then redirect the sessions it creates to a separate, safe environment. Pasztor explained that their initial idea was to use Docker, which, while it is not as separated from the host system as a virtual machine would be, does still create a security boundary. Setting up a Docker container to be run when someone logs in with SSH requires a standard SSH installation and just one line added to sshd_config:

    ForceCommand /usr/bin/docker run -ti ubuntu

The path to the docker executable might need adjusting and ubuntu is the name of the container image to use. When someone connects, they will "land in a container and can't do anything about it", Pasztor said.

However, adding asciinema (or probably any other logging tool) to the command leads to something like this:

    ForceCommand /usr/bin/asciinema rec /tmp/ssh.cast -c \
                 "/usr/bin/docker run -ti ubuntu"

This approach adds complexity and is prone to mistakes. The situation gets worse if the honeypot wants to simulate the execution of the command actually sent over the connection (which was overridden by ForceCommand). Without a lot of care, this setup is susceptible to command injection by way of the SSH_ORIGINAL_COMMAND environment variable, which contains the attacker's command (before it was replaced with ForceCommand). If this variable is not properly sanitized, it could allow the attacker to gain access to the host system, which is something "you absolutely do not want" in a honeypot.
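
As a hypothetical illustration of that pitfall (this is not the speakers' setup), consider a wrapper script installed as the ForceCommand target that rebuilds the client's command with eval:

    #!/bin/sh
    # DANGEROUS sketch: eval re-parses the string after expansion, so a
    # client-supplied command such as:
    #     true"; touch /pwned-on-the-host; "
    # escapes the quoting and runs on the host before docker is invoked.
    eval "/usr/bin/docker run -ti ubuntu /bin/sh -c \"$SSH_ORIGINAL_COMMAND\""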

To solve this problem, Pasztor developed ContainerSSH, which allows the SSH server to talk to a container (or any other backend) using an HTTP API. Their slides listed Docker, Podman, and Kubernetes as supported backends; all of them provide such an API. When a user connects to ContainerSSH, the server talks to the backend using its API and launches a new container to handle the connection. ContainerSSH can connect to a custom authentication server and dynamically change the configuration for each user. In the honeypot case, they used an option to send the audit log to asciinema and another one to make ContainerSSH accept all users logging in (without the usual requirement for a valid account and password).

They put the server into production and started getting various strange error messages rather than the console log they were hoping for. Pasztor was confused. It turns out most attackers are bots, he explained. They tend to send commands directly via SSH rather than starting a console session, but the original setup was meant for humans, and was designed to log a console session. To solve that problem, they switched to a different format for the audit log. The new format was binary and recorded everything (including passwords). This time the log contained the information they needed to see what was going on.

Some attackers just run a program and want its output; they are after information about the system, such as the CPU type. Half of the attackers just checked the password and did not do anything else. Others did things Bonic and Pasztor did not expect. For example, some attackers uploaded payloads using the SSH file transfer protocol (SFTP), uncompressed them, ran the resulting programs, and finally deleted their tracks.

Pasztor described another attack as "really interesting". Attackers were looking for GSM devices, or mobile phones directly connected to the system (and accessible via device files like /dev/ttyGSM*). For people who have not worked in traditional IT or in data centers, it might be hard to guess why someone would connect a phone to an SSH server. Pasztor explained that many system administrators use phones or GSM devices to send alert messages. Such a setup is useful if Internet access goes down; the mobile phone will probably still work and can send a message to the administrator. What the attackers probably want to do, instead, is to send spam messages from the monitoring mobile phone number.

SSH attacks used to be different, Pasztor said. Nowadays many programming languages offer SSH libraries or SFTP modules, which has made it easy for attacks to become more sophisticated. For example, attackers will establish a single connection, so connection-rate limits will not constrain them; within that one connection, they open multiple channels to execute their payloads.

The talk moved on to recommendations for anyone running an SSH server. The first one is to change the port used by the server (the default is 22). Once this is changed, the system will see far fewer attacks. Bonic remembered reading about port renumbering as a recommendation 10 or 15 years ago; she said it is still valid. Still, it will only protect against bots, not directed attacks.

Their second recommendation is to use keys for authentication and disable passwords "completely if you can". If it is necessary to keep passwords, a good practice is to avoid common combinations like user test and password test. They saw cases where someone used a default password and attackers were in within 20 minutes. Finally, SSH private keys should be protected by a password. If a targeted attack happens, it will often involve a developer's laptop, because developers typically have elevated privileges and direct access to production systems. If that laptop contains unprotected keys, attackers have an easy way into the server. During the question session, some audience members disagreed with the order of the suggestions, and recommended using keys instead of passwords as the main protection measure.
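
Taken together, those recommendations amount to a handful of standard sshd_config directives (the port number here is an arbitrary example):

    Port 2222
    PasswordAuthentication no
    PubkeyAuthentication yes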

Pasztor concluded by showing other uses of containers with SSH. ContainerSSH was originally built to solve a web-hosting problem when the owner of a site needs to move data between servers, but has different user names on each server. This can be done with SFTP but can be awkward. Containerized SSH allows the necessary servers to be wrapped with a simple environment and users do not need to deal with the underlying permission system at all.

Another use of ContainerSSH is in education. In traditional education systems, there will be many leftover files after a student completes a set of exercises. With containers, when the student logs out, the container is removed along with all of the leftover files. The use of containers allows easier teaching of Linux and it is now even possible to run a complete Kubernetes cluster in a container. The last use case is in high-security environments. ContainerSSH makes it easy to automatically upload session logs to an external object store, ensuring that the log never lives on the system that is being monitored and, thus, cannot be tampered with by an attacker.

The slides [PDF] and video from the talk are available.

Comments (20 posted)

Unprivileged chroot()

By Jonathan Corbet
March 15, 2021
It is probably fair to say that most Linux developers never end up using chroot() in an application. This system call puts the calling process into a new view of the filesystem, with the passed-in directory as the root directory. It can be used to isolate a process from the bulk of the filesystem, though its security benefits are somewhat limited. Calling chroot() is a privileged operation but, if Mickaël Salaün has his way with this patch set, that will not be true for much longer, in some situations at least.

Typically, chroot() is used for tasks like "jailing" a network daemon process; should that process be compromised, its ability to access the filesystem will be limited to the directory tree below the new root. The resulting security boundary is not the strongest — there are a number of ways to break out of chroot() jails — but it can still present a barrier to attackers. chroot() can also be used to create a different view of the file system to, for example, run containers within.

This system call is not available to just anybody; the CAP_SYS_CHROOT capability is required to be able to call chroot(). This restriction is in place to thwart attackers who would otherwise try to confuse (and exploit) setuid programs by running them inside a specially crafted filesystem tree. As a simple example, consider the sort of mayhem that might be possible if setuid programs saw a version of /etc/passwd or /etc/sudoers that was created by an attacker.

The limitations of chroot() have long restricted its applicability; in recent years it has fallen even further out of favor. Mount namespaces are a much more flexible mechanism for creating new views of the filesystem; they can also be harder to break out of. So relatively few developers see a reason to use chroot() for anything new.

Thus, some folks were a bit surprised when Salaün showed up with his chroot() patch. Once applied, unprivileged processes are able to call chroot(), but only if a few conditions apply:

  • The process in question must have done a prctl() call with the PR_SET_NO_NEW_PRIVS option. That prevents the process from gaining any new privileges; running setuid and setgid programs will no longer gain the privileges of the owner of the executable file, for example. Since privileged programs no longer exist in that mode, their privileges cannot be exploited.
  • The process cannot be sharing its filesystem context (struct fs_struct, which contains the root and current working directories) with any other processes; otherwise the chroot() call would affect both processes, and the other one may not be expecting its filesystem environment to change abruptly.
  • The new root must be underneath the current root in the filesystem hierarchy. This prevents trickery that could otherwise facilitate escape from an existing jail or mount namespace.

If these conditions are met, it is argued, it is safe to allow a process to call chroot().

There is still the question of why one might want to do that. Among other things, a functioning chroot() environment normally needs to have a minimally populated /dev directory; creating device nodes remains a privileged operation. And, as noted above, Linux has had better options than chroot() for some time now. But Salaün says that there are use cases where a process might want to sandbox itself after the things it needs from the wider environment (libraries, for example) have been loaded, and device files can often be done without.
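
In concrete terms, the self-sandboxing sequence would look something like the sketch below, which assumes a kernel with the patch set applied (on mainline kernels the chroot() call still fails without CAP_SYS_CHROOT):

    #include <stdio.h>
    #include <unistd.h>
    #include <sys/prctl.h>

    int main(void)
    {
        /* ... load libraries and open any needed files first ... */

        /* give up the ability to gain privileges, a precondition for
           the unprivileged chroot() in the patch set */
        if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0)) {
            perror("prctl");
            return 1;
        }
        /* confine the rest of the process to an empty directory */
        if (chroot("/var/empty") || chdir("/")) {
            perror("chroot");
            return 1;
        }
        /* from here on, only /var/empty is visible */
        return 0;
    }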

The initial reception for this patch has been a bit chilly at best. Eric Biederman worried about the security implications of unprivileged chroot() when mixed with other mechanisms:

Still allowing chroot after the sandbox has been built, a seccomp filter has been installed and no_new_privs has been enabled seems like it is asking for trouble and may weaken existing sandboxes.

Casey Schaufler argued that chroot() is obsolete and also worried about interactions: "We're still finding edge cases (e.g. ptrace) where no_new_privs is imperfect". He also pointed out that access to chroot() is already finely controlled with the CAP_SYS_CHROOT capability:

CAP_SYS_CHROOT is specific to chroot. It doesn't give you privilege beyond what you expect, unlike CAP_CHOWN or CAP_SYS_ADMIN. Making chroot unprivileged is silly when it's possibly the best example of how the capability mechanism is supposed to work.

Salaün has not answered all of these points, but seems undeterred; he posted a second version of the patch set after that discussion had occurred. Without a stronger answer, though, upstreaming this change is likely to be difficult. Security-oriented developers will need some convincing that chroot() merits any improvements at all; the bar for changes that raise worries about unexpected interactions with other security mechanisms will be higher.

The discussion is likely to come down to use cases in the end; is there truly a need for unprivileged chroot() in 2021? If there are users out there who could benefit from this feature, now would probably be a good time for them to come forward and explain why they need it. In the absence of that information, unprivileged chroot() seems likely to be one of those ideas that didn't quite make it.

Comments (40 posted)

Lockless patterns: an introduction to compare-and-swap

March 12, 2021

This article was contributed by Paolo Bonzini


Lockless patterns

In the first part of this series, I showed you the theory behind concurrent memory models and how that theory can be applied to simple loads and stores. However, loads and stores alone are not a practical tool for the building of higher-level synchronization primitives such as spinlocks, mutexes, and condition variables. Even though it is possible to synchronize two threads using the full memory-barrier pattern that was introduced last week (Dekker's algorithm), modern processors provide a way that is easier, more generic, and faster—yes, all three of them—the compare-and-swap operation.

From the point of view of a Linux kernel programmer, compare-and-swap has the following prototype:

    T cmpxchg(T *ptr, T old, T new);

where T can be either an integer type that is at most as wide as a pointer, or a pointer type. In order to support such polymorphism, cmpxchg() is defined as a macro rather than a function, but the macro is written carefully to avoid evaluating its arguments multiple times. Linux also has a cmpxchg64() macro that takes 64-bit integers as the arguments, but it may not be available on all 32-bit platforms.

cmpxchg() loads the value pointed to by *ptr and, if it is equal to old, it stores new in its place. Otherwise, no store happens. The value that was loaded is then returned, regardless of whether it matched old or not. The compare and the store are atomic: if the store is performed, you are guaranteed that no thread could sneak in and write a value other than old to *ptr. Because a single operation provides the old version of the value and stores a new one, compare-and-swap is said to be an atomic read-modify-write operation.
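
In other words, cmpxchg() behaves as if the following function executed as a single, indivisible step (a semantic model only, not the implementation; T again stands for the polymorphic type):

    T cmpxchg(T *ptr, T old, T new)
    {
        T val = *ptr;

        if (val == old)
            *ptr = new;
        return val;    /* always the value seen before any store */
    }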

In Linux, the cmpxchg() macro places strong ordering requirements on the surrounding code. A compare-and-swap operation comprises a load and a store; for the sake of this article, you can consider them to be, respectively, load-acquire and store-release operations. This means that cmpxchg() can synchronize with both load-acquire and store-release operations performed on the same location by other threads.

Lock-free stacks and queues

"Lockless algorithms for mere mortals" already mentioned the use of compare-and-swap for lock-free lists. Here, we'll look at how a lockless, singly linked list could be implemented in C, and what it could be useful for. First of all, however, let's recap how a single-threaded C program would add an item in front of a singly-linked list:

    struct thing {
        struct thing *next;
        ...
    };
    struct thing *first;

    node->next = first;
    first = node;

Armed with the knowledge from the first part of the series, we know that we should turn the assignment to first into a store-release, so that node->next is visible to other threads doing a load-acquire. This would be an instance of the pattern presented there.

However, that pattern only worked for a single producer and a single consumer; in the presence of multiple producers, the two instructions would have to be placed under a lock. This is because the value of first can change between the two instructions, for example if another element is added at the same time by another thread. If that happens, the outgoing pointer (node->next) in the new element will point to whatever first held before the assignment happened. This teaches us an important, if obvious, lesson: acquire and release semantics are just one part of designing and proving the correctness of lockless algorithms. Logic mistakes and race conditions can and will still happen.

Instead of using a lock, cmpxchg() lets us catch the other thread in the act of modifying first. Something like this would work for any number of producers:

    if (cmpxchg(&first, node->next, node) == node->next)
        /* yay! */
    else
        /* now what? */

There are still a few things to sort out, as you can see. First and foremost, what to do if the cmpxchg() notices that first has changed. The answer in that case is simply to copy the new value of first into node->next and try again. This is possible because node is still invisible to other threads; nobody will notice our stroke of bad luck.

A second and more subtle question is: how do we load first? The load need not have either acquire or release semantics, because the code is not doing other memory accesses that depend on the value of first. On the other hand, perhaps the big bad optimizing compiler might think that first cannot change across iterations of the loop? Even though Linux's cmpxchg() does prevent this kind of compiler optimization, it is a good practice to mark relaxed loads and stores of shared memory using READ_ONCE() and WRITE_ONCE().

Putting everything together, we get:

    struct thing *old, *expected;
    old = READ_ONCE(first);
    do {
        node->next = expected = old;
        old = cmpxchg(&first, expected, node);
    } while (old != expected);

This is all nice, but it's only half of the story. We still have not seen how the list can be read on the consumer side. The answer is that it depends on the relationship between producers and consumers, the number of consumers, and whether the consumers are interested in accessing elements in LIFO (last-in-first-out) or FIFO (first-in-first-out) order.

First of all, it could be that all reads happen after the producers have finished running. In this case, the synchronization between producers and consumers happens outside the code that manipulates the list, and the consumers can access the list through normal, non-atomic loads. The synchronization mechanism could be a thread-exit/thread-join pair such as the one we saw in the first article, for example.

If reads are rare or can be batched, a more tricky implementation could allow producers to proceed locklessly, while reads would be serialized. Such an implementation could use a reader-writer lock (rwlock); however, the producers would take the lock for shared access (with a read_lock() call) and the consumer(s) would take the lock for exclusive access (with write_lock())! This would also avoid reads executing concurrently with writes and, therefore, the consumer would be able to employ non-atomic loads. Hopefully, this example will show that there's no such thing as too many comments or too much documentation, even if you're sticking to the most common lockless programming patterns.
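
A sketch of that inversion (add_to_list() here stands for the cmpxchg() loop shown earlier, and process() for whatever the consumer does; both are placeholders) could look like:

    static DEFINE_RWLOCK(list_rwlock);

    void produce(struct thing *node)
    {
        read_lock(&list_rwlock);    /* shared: producers stay concurrent */
        add_to_list(node);          /* the lockless cmpxchg() loop */
        read_unlock(&list_rwlock);
    }

    void consume_all(void)
    {
        struct thing *p;

        write_lock(&list_rwlock);   /* exclusive: no producer is running */
        for (p = first; p; p = p->next)
            process(p);             /* plain, non-atomic loads suffice */
        write_unlock(&list_rwlock);
    }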

If many consumers run concurrently with the producers, but they can consume the elements in any order, the consumers can obtain a whole batch of elements (removing them from the list) with a single instruction:

    my_things = xchg_acquire(&first, NULL);

xchg(), like cmpxchg(), performs an atomic combination of a read and a write to a memory location. In this case it returns the previous head of the list and writes NULL in its place, thus emptying the list. Here I am using the xchg_acquire() variant, which has acquire semantics for its load of first, but (just like WRITE_ONCE()) does not apply release semantics when it stores NULL. Acquire semantics suffice here, since this is still basically the same store-release/load-acquire pattern from part 1. More precisely, it is a multi-producer, multi-consumer extension of that pattern.

Should we do the same on the writer side and replace cmpxchg() with cmpxchg_release()? Indeed we could: in principle, all that the writer needs is to publish the store of node->next to the outside world. However, cmpxchg()'s acquire semantics when loading the list head have a useful side effect: they synchronize each writer with the thread that wrote the previous element. In the following picture, the load-acquire and store-release operations are all part of a successful series of cmpxchg() calls:

    thread 1: load-acquire first (returns NULL)
              store-release node1 into first
                  \
      thread 2: load-acquire first (returns node1)
                store-release node2 into first
                    \
         thread 3: load-acquire first (returns node2)
                   store-release node3 into first
                       \
            thread 4: xchg-acquire first (returns node3)

Thread 3's cmpxchg() is the only one to synchronize with thread 4's xchg_acquire(). However, because of transitivity, all cmpxchg()s happen before the xchg_acquire(). Therefore, if cmpxchg() is used in the writers, the readers can go through the list with regular loads.

If, instead, the writers used cmpxchg_release(), the happens-before relation would look like this:

    thread 1: load-acquire first (returns NULL)
              store-release node1 into first

      thread 2: load first (returns node1)
                store-release node2 into first

         thread 3: load first (returns node2)
                   store-release node3 into first
                       \
            thread 4: xchg-acquire first (returns node3)

Thread 4 would always read node2 from node3->next, because it read the value that thread 3 wrote to first. However, there would be no happens-before edge from thread 1 and thread 2 to thread 4; therefore, thread 4 would need a smp_load_acquire() in order to see node1 in node2->next.

The above data structure is already implemented in Linux's linux/llist.h header. You're highly encouraged not to reinvent the wheel and to use that version instead, of course. That API, in fact, includes two more interesting functions: llist_del_first() and llist_reverse_order().

llist_del_first() returns the first element of the llist and advances the head pointer to the second element. Its documentation warns that it should only be used if there is a single reader. If, instead, there were two consumers, an intricate sequence of adds and deletes could lead to the so-called ABA problem. Since this article rests firmly on the principle of "if it hurts, don't do it", a detailed explanation is beyond its scope. However, it's worth pointing out the similarity with the earlier rwlock example: just as in that case, multiple consumers would have to use locking to serialize their access to the data structure. llist_del_first() still lets writers call llist_add() without taking a lock at all; readers can serialize among themselves with a spinlock or a mutex.

llist_del_first() provides LIFO semantics for the llist. If your application requires FIFO order, however, there is a useful trick that you can apply, and that's where llist_reverse_order() comes into play. Removing a batch of items with xchg() (as is done with llist_del_all()) does provide the batches in FIFO order; only the items within each batch are ordered back to front. The following algorithm then comes to mind:

    struct thing *first, *current_batch;

    if (current_batch == NULL) {
        current_batch = xchg_acquire(&first, NULL);
        ... reverse the order of the nodes in current_batch ...
    }
    node = current_batch;
    current_batch = current_batch->next;

Every execution of the previous pseudocode will return an element of the linked list in FIFO order. This is also a single-consumer data structure, as it assumes that only a single thread accesses current_batch at any given time. It is left as an exercise for the reader to convert the pseudocode to the llist API.
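
One possible answer to that exercise, assuming a single consumer, is sketched here (llist_del_all() removes the whole batch with an xchg(), and llist_reverse_order() turns the batch into FIFO order):

    #include <linux/llist.h>

    static LLIST_HEAD(first);
    static struct llist_node *current_batch;

    /* Return the next element in FIFO order; single consumer assumed. */
    static struct llist_node *pop_fifo(void)
    {
        struct llist_node *node;

        if (!current_batch)
            current_batch = llist_reverse_order(llist_del_all(&first));
        node = current_batch;
        if (node)
            current_batch = node->next;
        return node;
    }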

That is all for this installment. The next article in this series will continue exploring read-modify-write operations, how to build them from compare-and-swap, and how they can be put into use to speed up reference-counting operations.

Comments (26 posted)

Software platforms for open-source projects and foundations

March 17, 2021

This article was contributed by Martin Michlmayr

Open-source projects have many non-technical needs as they grow. But running a FOSS non-profit organization to support these projects is a lot of work, as anyone involved in such an organization will attest. These days, some software platforms, such as LFX from the Linux Foundation and Open Collective, are in development to provide important services, such as crowdfunding, to projects and other organizations. These platforms have the potential to improve both the quality and range of services available to projects.

Paperwork is taxing

Operational issues with project-backing FOSS foundations are not unheard of. The X.Org Foundation, for example, briefly lost its charity status in 2013 due to paperwork that was not filed. In 2016, the organization joined Software in the Public Interest (SPI), in part due to the paperwork headaches; in 2017, it dissolved its legal entity. When X.Org was considering joining SPI, LWN observed that organizations which enjoy tax-exempt status and are eligible to receive tax-deductible donations in the US need to "adhere to some strict paperwork and filing requirements at the IRS [Internal Revenue Service]". Those requirements turned out to be "a bit of a burden over the course of the past few years" for X.Org.

X.Org is not the only organization that has struggled with paperwork. The Gentoo Foundation, which lost its charter briefly in 2007, is currently mulling its future: should it continue to exist as a legal entity, join an umbrella organization (such as the Software Freedom Conservancy or SPI), use a platform like Open Collective, or simply dissolve?

One interesting thing about the X.Org Foundation change is that it kept its governance structure, including its board, intact when it joined SPI. The Open Bioinformatics Foundation, which is also part of SPI, similarly operates as a virtual foundation. Essentially, they are operating as a foundation within a foundation. This is possible because SPI's relationship with its associated projects is fairly loose; there are few restrictions imposed on the governance structure. Increasingly, projects and whole organizations join umbrella organizations in order to benefit from services without taking on too much of an administrative burden.

The Linux Foundation is probably best known for this "foundation as a service" model. The organization hosts a growing number of entities, such as Let's Encrypt, the Cloud Native Computing Foundation (CNCF), and the OpenJS Foundation. The Linux Foundation observed that overhead "goes up exponentially" and that developers sometimes have to deal with issues they never anticipated, such as setting up legal entities and bank accounts, filling out paperwork to obtain sponsorships, and creating a financial reporting process. The organization therefore offers a portfolio of support programs to its communities. The service portfolio is quite wide-ranging, spanning areas such as project operations, training, certification, and event management. Many projects and organizations find the range of services quite attractive. For example, FINOS, an organization that promotes open source for financial services, joined the Linux Foundation in 2020 in part to benefit from the "support program offerings including but not limited to training, certification and events management".

We live in a time where virtual machines can be deployed with the click of a button. Stripe, a financial technology (fintech) company, provides the Atlas platform to seamlessly form a company. The platform promises to remove "lengthy paperwork, legal complexity, and numerous fees". Would better tooling allow organizations to meet all the administrative needs of open-source projects with the click of a few buttons?

LFX

The Linux Foundation launched CommunityBridge in 2019, as a platform of tools to serve open-source developers, including for fundraising and security. The platform was rebranded as LFX in November 2020, and its scope was expanded to cover more areas. Tools that are available today include ones for crowdfunding, mentorship, community events, contributor license agreements, and more.

When CommunityBridge was originally launched, it was criticized by the Software Freedom Conservancy as a "proprietary software system". Heeding this criticism, the LFX announcement mentions plans to "release much of this code as open source in the near future". However, it's not clear why this initiative didn't start as open source from its inception, especially given Linux Foundation's focus on open collaboration through open source. To be fair, though, some LFX tools incorporate third-party applications, which may make it harder to publish the code.

The Linux Foundation is also sometimes criticized for its focus on larger projects that can attract corporate backers and funding. The LFX Platform Use Agreement reflects that focus by differentiating between projects managed by the Linux Foundation ("TLF Project") and others ("Community Project"). While the majority of the LFX tools are currently only available to Linux Foundation projects, several tools (including crowdfunding and mentorship) are available to any open-source project. It's not clear from the web site which additional tools will be available to community projects in the future, but it seems possible that the scope will be expanded.

Linux Foundation projects can benefit from Insights, a data-driven tool to measure the health and sustainability of projects. Insights shows metrics about both the source code and the community, such as top contributors; the tool can be used to identify areas where the project should focus its resources. Member Enrollment assists with the onboarding of new members, such as by offering role-based email subscriptions. Community Events enables virtual-event hosting and the management of attendee lists; other tools are available or under development.

Community projects can raise funds through LFX Crowdfunding by connecting the tool with their GitHub project repositories. The platform also allows crowdfunding for events, open-source initiatives, and travel funds. The Linux Foundation covers all of the fees for the first ten million dollars raised through the platform; after that, a 5% platform fee plus a payment processor fee will be charged. There are no fees for the other tools on the LFX platform.

Currently, the majority of projects on LFX are affiliated with the Linux Foundation. However, more community projects may evaluate the platform for their needs in the future, especially if additional tools become available to community projects. It will be interesting to watch the progress and adoption of LFX. If the tools mature and are eventually released as open source, it's possible that other umbrella organizations could integrate them into their infrastructure.

Open Collective: transparent crowdfunding

Another initiative that has developed an interesting platform is Open Collective. It allows projects to accept donations and sponsorship, pay expenses, and keep donors informed. Open Collective puts a strong emphasis on openness and the company itself operates in an open manner. It publishes metrics and other documents on Google Drive, and runs its finances through its own platform. The source code is open source. The requirements for openness extend to those using the platform: donations and expenses are visible in public (although donors can stay anonymous).

Open Collective itself is a software platform and it's made available to projects through a number of fiscal hosts. Open Source Collective brings the platform to open-source projects and currently serves around 2,500 projects ("collectives"), including Qubes OS and F-Droid. The platform charges 10% of incoming funds, plus payment processor fees. Projects using the Open Collective platform can also receive funding through GitHub Sponsors, which is an increasingly popular way to support open-source projects.

Of course, fundraising and paying expenses are only a small part of what FOSS foundations typically offer. Open Source Collective can also manage trademarks with the help of trademark expert Pamela Chestek, but other services are not currently offered. Providing a neutral organization to hold domain-name ownership would be an interesting addition. Nevertheless, funding and trademarks cover the basic needs of many projects, thereby removing the need to start their own organization and deal with the associated paperwork. Open Source Collective, as the legal entity, will take care of that.

In addition to Open Source Collective, which is open to any open-source project, there are other FOSS foundations that use Open Collective. The WordPress Foundation and the .NET Foundation use the crowdfunding platform for their organizations and member projects. Open Collective is a great example of how tooling can solve a problem that many organizations have.

Summary

FOSS foundations offer important services to open-source projects, but operating those organizations can be burdensome. Tooling has the potential to ease that burden, and to expand both the services an organization can offer and the number of projects that can be served. These platforms also reduce the need for projects to start their own organizations, which is helpful in avoiding the unexpected work that running such an organization entails.

Comments (none posted)

Page editor: Jonathan Corbet


Copyright © 2021, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds