Leading items
Welcome to the LWN.net Weekly Edition for July 15, 2021
This edition contains the following feature content:
- Planning the CentOS 8 endgame: what, if anything, should the CentOS project do about the remaining CentOS 8 systems when support ends?
- Copyleft-next and the kernel: an attempt to document another license for kernel code runs into resistance.
- The conclusion of the 5.14 merge window: the rest of what was merged for the 5.14 kernel.
- Another misstep for Audacity: a new privacy policy further upsets Audacity users.
- Syncing all the things: a review of the Syncthing synchronization system.
This week's edition also includes these inner pages:
- Brief items: Brief news items from throughout the community.
- Announcements: Newsletters, conferences, security updates, patches, and more.
Please enjoy this week's edition, and, as always, thank you for supporting LWN.net.
Planning the CentOS 8 endgame
CentOS 8 is reaching its end of life (EOL) at the end of 2021, though it was originally slated to be supported until 2029. That change was announced last December, but it may still come as a surprise to some, perhaps many, of the users of the distribution. While the systems running CentOS 8 will continue to do so, early next year they will stop getting security (and other) updates. The CentOS project sees CentOS Stream as a viable alternative, but users may not agree—should the project simply leave CentOS 8 systems as ticking time bombs in 2022 and beyond?
A discussion of the CentOS 8 EOL was kicked off by Rich Bowen in a post to the CentOS-devel mailing list. He noted that there will be more questions about the EOL process as that date approaches, so he wants "to make sure we have clear documentation, prominently displayed, that sets expectations". He outlined the process of archiving CentOS 8 to vault.centos.org and wondered if there were changes that should be made because this particular EOL event is rather different.
Alex Iribarren was concerned about "pulling the plug" right on December 31 and suggested that "an extra month or so would be nice, particularly given the holiday period". CentOS release manager Johnny Hughes said that the holiday will delay the switch a bit because people will not be working on those days, but that security updates will not happen past that point in any case. While the dates are still fluid, he described the plan in some detail:
[...] Our goal is, so long as RHEL 8.5 releases before 31 DEC 2021, we will get the files from 8.5 released before we remove CentOS Linux 8 from the mirrors. We will not be adding any updates from RHEL source code released after 31 DEC 2021 to CentOS Linux 8. This release will be put into at least vault.centos.org/8.5.xxxx/ (where xxxx is the date). Of course, if the RHEL 8.5 release happens after 01 Jan 2022, we would not be doing that release in CentOS Linux 8.
New items (to CentOS Stream 8) will be being built and still going into CentOS Stream 8 after 01 JAN 2022 until CentOS Stream 8 EOLs 5 years after the RHEL 8 release (EOL is 31 May 2024).
But Carl George wondered if a more radical plan was in order. He believes that Stream is effectively the continuation of CentOS 8, so maybe "we should have mirrorlist.centos.org respond to requests for 8 repos with 8-stream repos, which effectively converts any remaining CentOS Linux 8 systems to CentOS Stream 8". That could either be done at EOL time or, perhaps, one to three months later. The third alternative he presented is the status quo—no more security updates—but he is worried that there are quite a number of people who are unaware of the EOL.
Multiple commenters in the thread seemed to agree that switching to CentOS Stream 8 is the right path forward, at least on a technical level, but there are some obvious concerns with that approach. Bowen put it this way:
While I agree with you, that it's an upgrade, I anticipate that moving people from CentOS Linux 8 to CentOS Stream 8 automatically will result in a lot of backlash from people who continue to believe that Stream is a alpha/beta/testing/buggy/unstable/[pick your favorite complaint] distribution. Y'know, once they noticed that it had happened. Surprising users seldom goes well, even if it's an overall positive surprise.
Stephen John Smoogen thinks that auto-switching systems to CentOS Stream 8 "ends up with lawsuits and very very angry people". There may well be CentOS 8 systems controlling safety-critical infrastructure, even though that may not be a particularly smart choice for those types of systems. Given that, he does not see either of George's switching options working out "no matter how well it is messaged or done". Looking at the statistics shows lots of new CentOS systems since the announcement of the EOL change; the alternatives (e.g. AlmaLinux, Rocky Linux) are not even close to catching up, but:
While many of the current 450k CentOS 8 systems may function well on CentOS Stream, we don't know which ones are just web servers in some advertising farm and which ones are controlling the flow rates on a dam or petroleum flows at a refinery in Texas.
Fabian Arrotin split CentOS users into two categories; the first is paying attention to the announcement and making plans, while the second is not. For the former, neither of George's auto-switch options would affect them; they have presumably already arranged a switch: to Stream, one of the alternative, compatible distributions mentioned above, or to some other distribution entirely. But the latter group, whose systems will have more and more known vulnerabilities over time, should just be forced into CentOS Stream 8, he said. There are, after all, users still running even older EOL versions of CentOS:
I admit that if you deploy CentOS, a good sysadmin would be up2date with what happens in the distro land *but* also by looking at the number of mirrorlist requests even for CentOS 5, I can tell you for sure that some don't ..... (ouch).
Julien Pivotto said that there are two good reasons to auto-switch users to CentOS Stream 8:
Providing stream as a continuity of 8 is the best thing to do, to show that we are confident and that the whole "stream fits most of the CentOS use case". It also has the side effect of better protecting the internet. [...] I think that it will not backfire if we announce it soon.
Safety of the internet was also on Leif Madsen's mind. He suggested adding a "security updates only" mode that CentOS 8 systems would switch to at EOL. It is "what a good netizen would do", rather than "leave 450k+ systems idling on the internet just waiting to be scooped up into a bot net".
But there are practical considerations with switching a bunch of systems to CentOS Stream 8 at once, as Phil Perry pointed out. "Whilst I completely understand the desire to take this approach", users voluntarily switching are already creating an extra support load, in part because they are finding that kernel module packages created for the CentOS 8 kernel no longer work on CentOS Stream 8. "If you switch all C8 users en mass at EOL, you run the risk of creating a potentially huge support burden that we are simply not in a position to manage."
Josh Boyer agreed:
There are too many situations like the kernel, or internal policy compliance, or other reasons make automated migration problematic. We should encourage and advocate for people to switch to CentOS Stream, and focus on making tools and documentation for that easy to use and easily accessible, but we should not be doing that kind of migration unilaterally.
It may in fact be better for some users to move on from CentOS if they are not interested in what CentOS Stream is offering, Smoogen said. CentOS was a drop-in replacement for Red Hat Enterprise Linux (RHEL), but CentOS Stream is not that:
CentOS Stream is about building a co-operative relationship between the consumers and the distribution where what the distribution builds is evaluated and feedback is given. If a consumer is expecting never to have problems, to never have to file/track a bugzilla or ever even check to see what is being delivered.. then CentOS Stream is not the distribution for them. Because even if it doesn't look like it, there is a very very large gap between 'never' and 'once in a while' that might happen with Stream. There is a need for co-operation between the consumer of stream and the makers to get things right. If you do not have time, energy, or want to do that, then Stream is going to be a constant thorn. The consumer in that case would be better off with another rebuild.
But George disagreed strongly with that characterization of CentOS Stream; multiple people have told him that they switched and never noticed any difference. While it is recommended that users participate in the process, he said, it is not required by any means:
CS8 hasn't been a constant thorn for them. I'm not claiming it's been perfect, there have certainly been regressions, but they are fixed faster than they ever were in CL8 (or previous major versions) and I strongly feel that most users will be best served by getting switched to CS8 at or just after the CL8 EOL.
But Leon Fauster said that when he tried switching some systems over to CentOS Stream 8, he encountered problems. Applications from third-party repositories (e.g. RPM Fusion) stopped working because their dependencies were not available. Installations that are more complex run the risk of failing to switch to CentOS Stream successfully. For his purposes, complex simply means that the system "uses more [than] just CentOS artifacts (ISV/proprietary software, 3rd/custom repos, etc.)".
Whatever the technical (and internet-safety) merits of forcibly switching systems to CentOS Stream 8, it is a little hard to see a company like Red Hat taking that risk. As unfortunate as it may be for some inattentive users (and, perhaps, the rest of us on the internet at large), it is much safer to simply point to the EOL announcement as a defense against any claims of fault for insecure CentOS 8 systems in 2022—and beyond. It is worrisome that these systems will be out there spamming (and worse), but it does not seem any more so than systems still looking for updates to CentOS 5, which was released in 2007 and stopped getting updates in 2017.
Copyleft-next and the kernel
The Linux kernel is, as a whole, licensed under the GPLv2, but various parts and pieces are licensed under other compatible licenses and/or dual-licensed. That picture was much murkier only a few years back, before the SPDX in the kernel project cleaned up the licensing information in most of the kernel source by specifying the licenses, by name rather than boilerplate text, directly in the files. A recent move to add yet another license into the mix is encountering some headwinds, but the license in question was already being used in a few kernel files, and has been for four years at this point.
SPDX is more formally known as the Software Package Data Exchange; it is a Linux Foundation project that has created an "open standard for communicating software bill of material information, including provenance, license, security, and other related information". In the kernel, SPDX identifiers are used to identify the license as a comment at the top of a source file; for example:

    // SPDX-License-Identifier: GPL-2.0-only

For tooling reasons, SPDX headers in .c files use the "//" form of comments, while .h files use the more traditional "/* ... */" form; both use license identifiers that refer to licenses that are stored in the LICENSES directory of the kernel source tree.
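A header file carrying the same license would, following that convention, use the block-comment form; for example:

    /* SPDX-License-Identifier: GPL-2.0-only */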
On July 7, Luis Chamberlain posted a patch set that added the copyleft-next 0.3.1 license to the kernel tree and cleaned up four uses of that license in the tree. The copyleft-next project goes back a ways; it started in 2012 as a GPLv3 fork called GPL.next, but soon took on a more neutral name. It is an attempt to create a strong copyleft license, in the mold of the GPL, but in simpler language that is easier to understand than GPLv3. It is explicitly written to be compatible with the GPL, so one could imagine kernel contributions that were solely licensed under copyleft-next. But, at least so far, all of the contributions using copyleft-next are dual-licensed as GPLv2 and higher (GPLv2+) as well.
Backstory
Chamberlain's patch set did not come out of the blue. In an earlier patch set, he proposed a kernel self-test for sysfs. As with other tests he has written, this test was dual-licensed under GPLv2+ and copyleft-next (0.3.1). But Greg Kroah-Hartman said that the GPLv2+ boilerplate in the code was not needed, "only the spdx line is needed". He also asked that the copyleft-next license be removed: "Please no, this is a totally different license :(".
But, as Chamberlain pointed out, the use of copyleft-next in the kernel had been discussed back in 2016; Linus Torvalds had no objection to its use and the comment text was worked out with Alan Cox and Ted Ts'o at the time. In 2017, test_sysctl was merged using that text to indicate the dual license covering the code. During the discussion, Kroah-Hartman acked a patch that added copyleft-next as an option in the kernel tree.
The copyleft-next license is not listed in the kernel's LICENSES directory, however, so the SPDX lines in Chamberlain's test drivers only refer to GPLv2. That is not correct, as Kroah-Hartman noted, but he also had a more fundamental objection:
And given that this is directly interacting with sysfs, which is GPLv2-only, trying to claim a different license on the code that tests it is going to be a total mess for any lawyer who wants to look into this. Just keep it simple please.
Chamberlain, however, sees things differently with regard to the license compatibility:
The [fault] injection code I added follows the exact license for sysfs. The only interaction with the test_sysfs and sysfs is an exported symbol for a completion structure. The other dual gpl OR copyleft-next test drivers already present in the kernel also use exported symbols too, so I see nothing new here.
Adding copyleft-next
There was a problem with copyleft-next not being in the kernel's license list, though, and thus the SPDX lines not truly reflecting the license status of the four files that had already been added (lib/test_kmod.c, lib/test_sysctl.c, and the corresponding shell scripts in tools/testing/selftests). For the C files, Chamberlain's patch removes the boilerplate and updates the SPDX line as follows:
    // SPDX-License-Identifier: GPL-2.0-or-later OR copyleft-next-0.3.1

The shell scripts have the equivalent change but, naturally, use "#" for the SPDX comment.
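As an illustration of that equivalent change (the surrounding scripts are not shown here), a line in one of the affected shell scripts would read:

    # SPDX-License-Identifier: GPL-2.0-or-later OR copyleft-next-0.3.1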
Christoph Hellwig replied to the cover letter of the patch set, asking about the need for a "random weirdo license" to be added to the kernel tree. Chamberlain noted that the license is already being used on kernel code; in the patch adding the license text, he also gave a list of a dozen copyleft-next benefits that he sees:
- It is much shorter and simpler
- It has an explicit patent license grant, unlike GPLv2
- [...]
- There is a built-in inbound=outbound policy for upstream contributions (cf. Apache License 2.0 section 5)
- There are disincentives to engage in the controversial practice of copyleft/proprietary dual-licensing
- In 15 years copyleft expires, which can be advantageous for legacy code
- There are explicit disincentives to bringing patent infringement claims accusing the licensed work of infringement (see 10b)
- There is a cure period for licensees who are not compliant with the license (there is no cure opportunity in GPLv2)
- copyleft-next has a 'built-in or-later' provision
But Kroah-Hartman is concerned about adding more licenses to the kernel; instead "we should be trimming them down to be less as it makes things simpler and more obvious". He noted that Chamberlain could switch the licenses of the four files, thus avoiding the need to add copyleft-next. He also reiterated his arguments about the dual-licensing for test_sysfs, but said that he is sympathetic to proponents of copyleft-next:
[...] I do not want to see your "test_sysfs.c" module as a dual-licensed file, as that makes no sense whatsoever. It is directly testing GPL-v2-only code, so the attempt to dual license it makes no sense to me. How could anyone take that code and do anything with it under the copyleft-next license only? And where would that happen? I understand the appeal of copyleft-next in that it resolves many of the "grey" areas around gplv2, but given that no one is rushing to advise us to relicense all of the kernel with this thing, there is no need to encourage the spread of it given the added complexity and confusion that adding another license to our mix can only cause.
The main organizer behind the copyleft-next project is Richard Fontana, but Bradley M. Kuhn worked on it as well, which he was quick to point out in a disclaimer on his response to Kroah-Hartman. Kuhn noted that there is already a bunch of code in the kernel that is dual-licensed, many with either the two- or three-clause versions of the BSD license, which is evidently not a problem for kernel developers: "There is no cogent argument that I can see that says '(GPLv2-only|{2,3}-Clause-BSD) is so special that it should be grandfathered in over other forms of dual licensing'." Beyond that, though, since no one has done so, Kuhn wanted to "be the first to advise" the kernel community to switch the kernel license to copyleft-next, though he recognized the impossibility of that task.
Tim Bird pointed out that the dual-licensing with BSD has resulted in "the interchange of a lot of code between the BSD Unixes and Linux, that otherwise would not have happened". It is very much in keeping with Torvalds's "tit-for-tat compact" to allow code improvements to flow both ways, he said. Kuhn agreed with Bird and hopes to see the same happen with projects that are released under copyleft-next, though there are far fewer of those.
In the final analysis, as long as the other license is compatible with GPLv2, which copyleft-next is (so are BSD and others, of course), then it is up to the contributor to decide on the license(s), as Joe Perches said. The situation is analogous to the addition of the CC-BY-4.0 license to the kernel back in December; that was done because a documentation contributor wanted to dual-license their text.

The contributor in this case, Chamberlain, feels strongly that copyleft-next is the right license for his code. He understands that there are other considerations for a large project like Linux, so he is taking a slow approach while trying to be conscious of the needs of others and the project as a whole. "My personal development goal is I will embrace copyleft-next for anything new I write, and only use GPLv2 or another license when I am required to do so."
Of the benefits that he listed, the explicit patent grant is the most important to Chamberlain. He is concerned about a future without such a grant:
The license is one of the only few licenses (if not only?) which is GPLv2 compatible and also has an clear patent grant. I have reasons to believe, we as a community face serious challenges if we don't grow our collection of code with explicit patent grants. And so any new project I create will have such licenses. It is simply my preference, and if I can contribute code to Linux in a "safe place" to slowly build traction of it, then fantastic.
Given that the license has been present in the kernel since 2017, and that it did not come in under cover of darkness, the changes Chamberlain has proposed seem like they should be relatively uncontroversial. There are certainly valid concerns about license proliferation, both within the kernel and without, but the main issue for the kernel community would seem to be satisfied by GPLv2 compatibility. It is possible that other compatible licenses will also need to be added to the LICENSES directory from time to time, but that seems a fairly small price to pay for useful contributions.
The conclusion of the 5.14 merge window
The 5.14 merge window closed with the 5.14-rc1 release on July 11. By that time, some 12,981 non-merge changesets had been pulled into the mainline repository; nearly 8,000 of those arrived after the first LWN 5.14 merge-window summary was written. This merge window has thus seen fewer commits than its predecessor, which saw 14,231 changesets before the 5.13-rc1 release. That said, there is still a lot of interesting work that has found its way into the kernel this time around.

Some of the more significant changes pulled in the second half of the 5.14 merge window include:
Architecture-specific
- The s390 architecture now supports booting kernels compressed with the Zstandard (zstd) algorithm.
- The RISC-V architecture has gained support for transparent huge pages and support for the KFENCE memory-safety checker.
Core kernel
- The control-group kill button patch set has been merged; this feature allows the quick killing of all members of a control group by writing to the cgroup.kill virtual control file.
- There are two new options for the madvise() system call:
- MADV_POPULATE_READ will fault in all pages within the indicated mapping for read access; the effect is the same as if the caller had manually looped through the range, accessing each page. No COW mappings will be broken by this operation.
- MADV_POPULATE_WRITE, instead, will fault in the pages for write access, breaking COW mappings if need be.
The purpose of these operations, in either case, is to pay the cost of faulting in a range of memory immediately, allowing the application to run without page-fault-induced delays later on. They differ from the MAP_POPULATE option to mmap() in that they can be invoked at any time rather than just when the memory is mapped. See this commit for more information; a brief usage sketch appears after this list.
- The memfd_secret() system call has been merged. It creates a region of memory that is private to the caller; even the kernel cannot directly access it. See this commit for a bit more information.
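To give a rough feel for the new madvise() options, here is a minimal sketch (not taken from any kernel patch; the mapping size is arbitrary and the fallback loop assumes 4096-byte pages) that prefaults an anonymous mapping for write access:

    /* Sketch: prefault a mapping with MADV_POPULATE_WRITE (Linux 5.14+). */
    #define _GNU_SOURCE
    #include <sys/mman.h>
    #include <stdio.h>
    #include <string.h>
    #include <errno.h>

    #ifndef MADV_POPULATE_WRITE
    #define MADV_POPULATE_WRITE 23   /* value from the 5.14 UAPI headers */
    #endif

    int main(void)
    {
        size_t len = 64UL * 1024 * 1024;   /* hypothetical 64MB working set */
        char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (buf == MAP_FAILED) {
            perror("mmap");
            return 1;
        }

        /* Pay the page-fault cost up front rather than during later accesses. */
        if (madvise(buf, len, MADV_POPULATE_WRITE) != 0) {
            /* Older kernels: fall back to touching each page by hand. */
            fprintf(stderr, "MADV_POPULATE_WRITE failed (%s); touching pages\n",
                    strerror(errno));
            for (size_t i = 0; i < len; i += 4096)
                buf[i] = 0;
        }

        /* ... use buf without page-fault-induced delays ... */
        munmap(buf, len);
        return 0;
    }

MADV_POPULATE_READ is used the same way, but faults the pages in for read access only and leaves copy-on-write mappings unbroken.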
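Similarly, a hedged sketch of memfd_secret() usage might look like the following; glibc provides no wrapper for the new call, so it is invoked through syscall(), the system-call number shown is the x86-64 value, and the call can fail outright if the running kernel does not have the feature enabled:

    /* Sketch: a region of memory hidden from the kernel's direct map. */
    #define _GNU_SOURCE
    #include <sys/syscall.h>
    #include <sys/mman.h>
    #include <unistd.h>
    #include <string.h>
    #include <stdio.h>

    #ifndef __NR_memfd_secret
    #define __NR_memfd_secret 447   /* x86-64 system-call number in 5.14 */
    #endif

    int main(void)
    {
        long fd = syscall(__NR_memfd_secret, 0);
        if (fd < 0) {
            perror("memfd_secret");   /* kernel too old or feature disabled */
            return 1;
        }
        if (ftruncate(fd, 4096) != 0) {   /* size the secret area */
            perror("ftruncate");
            return 1;
        }
        char *secret = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                            MAP_SHARED, fd, 0);
        if (secret == MAP_FAILED) {
            perror("mmap");
            return 1;
        }

        /* Data stored here cannot be reached via the kernel's direct mapping. */
        strcpy(secret, "key material");

        munmap(secret, 4096);
        close(fd);
        return 0;
    }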
Filesystems and block I/O
- The ext4 filesystem has gained a new ioctl() command called EXT4_IOC_CHECKPOINT. This command forces all pending transactions out of the journal, and can also overwrite the space on the storage device used by the journal. This operation is part of an effort to prevent information leaks from filesystems. This documentation commit describes the new operation and its options.
- The quotactl_fd() system call has been added. This is the new form of quotactl_path() that was briefly added to 5.13 before being disabled as the result of API concerns.
- The F2FS filesystem can now compress files that are mapped with mmap(). There is also a new nocompress_extension mount option that disables compression for any file whose name matches the given extension(s).
Hardware support
- Clock: Qualcomm MDM9607 global clock controllers, Qualcomm SM6125 global clock controllers, Qualcomm SM8250 camera clock controllers, Renesas RZ/G2L family clock controllers, TI LMK04832 JESD204B-clock jitter cleaners, Ingenic JZ4760 clock controllers, and Huawei Hi3559A clocks.
- Graphics: ITE IT66121 HDMI bridges, ChromeOS EC ANX7688 bridges, Hyper-V synthetic video devices, and TI SN65DSI83 and SN65DSI84 DSI to LVDS bridges. There is also a new "simpledrm" driver that provides a direct-rendering interface for simple framebuffer devices; there are also the inevitable 200,000+ lines of new amdgpu register definitions.
- Industrial I/O: TI TMP117 digital temperature sensors, TI TSC2046 analog-to-digital converters, TAOS TSL2591 ambient light sensors, Murata SCA3300 3-axis accelerometers, Sensirion SPS30 particulate matter sensors, STMicroelectronics LSM9DS0 inertial sensors, NXP FXLS8962AF/FXLS8964AF accelerometers, and Intel quadrature encoders.
- Miscellaneous: Microchip 48L640 EERAM chips, PrimeCell SMC PL351 and PL353 NAND controllers, SparkFun Qwiic joysticks, Richtek RT4831 backlight power controllers, Qualcomm PM8008 power-management ICs, Xillybus generic FPGA interfaces for USB, Qualcomm SC7280 interconnects, generic CAN transceivers, Rockchip Innosilicon MIPI CSI PHYs, Allwinner SUN6I hardware spinlocks, and MStar MSC313e watchdogs.
- Pin control: Mediatek MT8365 pin controllers, Qualcomm SM6125 pin controllers, and IDT 79RC3243X GPIO controllers.
- Sound: NXP/Goodix TFA989X (TFA1) amplifiers, Rockchip RK817 audio codecs, and Qualcomm WCD9380/WCD9385 codecs.
- Removals: the "raw" driver, which provided unbuffered access to block devices under /dev/raw, has been removed. Applications needing this sort of access have long since moved to O_DIRECT, or at least that's the belief.
Virtualization and containers
- User-mode Linux now supports PCI drivers with a new PCI-over-virtio driver.
Testing and tracing
- The kunit self-test subsystem now supports running tests under QEMU; see this documentation commit for details.
- There are two new tracing mechanisms in 5.14. The "osnoise" tracer tracks application delays caused by kernel activity — interrupt handling and such. The "timerlat" tracer gives detailed information about delays in timer-based wakeups. The osnoise and timerlat commits have more details and instructions on how to use these features.
The 5.14 kernel is now in the stabilization phase. Unless something highly unusual happens, the final 5.14 release will happen on August 29 or September 5. There is a lot of testing and bug-fixing to be done in the meantime.
Another misstep for Audacity
While it has often been said that there is no such thing as bad publicity, the new owners of the Audacity audio-editor project may beg to differ. The project has only recently weathered the controversies around its acquisition by the Muse Group, proposed telemetry features, and imposition of a new license agreement on its contributors. Now, the posting of a new privacy policy has set off a new round of criticism, with some accusing the project of planning to ship spyware. The situation with Audacity is not remotely as bad as it has been portrayed, but it is a lesson on what can happen when a project loses the trust of its user community.
On July 2, the Audacity web site acquired a new "desktop privacy notice" describing the privacy policies for the desktop application. Alert readers immediately noticed some things they didn't like there; in particular, many eyebrows were raised at the statement that the company would collect "data necessary for law enforcement, litigation and authorities’ requests (if any)" as part of the "legitimate interest of WSM Group to defend its legal rights and interests". What data might be deemed necessary was not defined. The fact that WSM Group, the listed data controller, is based in Russia did not help the situation. And a statement that anybody under the age of 13 should not use Audacity at all was seen as a violation of the GPL by some.
A full-scale Internet red alert followed, with headlines that Audacity was becoming spyware and users should uninstall it immediately. A fork of the project was promptly launched, promising: "No telemetry, crash reports and other shenanigans like that!". Alerts were sounded in various distributions, including Debian, Fedora, openSUSE, and others, suggesting that Audacity should be dropped or at least carefully reviewed. Audacity, it seemed, had gone fully over to the dark side and needed to be excised as soon as possible.
It only took a few days for the project to issue a "clarification" to the new privacy policy, stating that "concerns are due largely to unclear phrasing" that would soon be updated. The data that is collected was enumerated; it is limited to the user's IP address, operating-system version, and CPU type. The IP address is only kept for 24 hours. The company's compliance with law enforcement is limited to what is actually required by law. The update also pointed out that this policy does not even come into effect until the upcoming 3.0.3 release; current releases perform no data collection at all.
Meanwhile, others have actually looked at the code to see what data is being collected. That is, after all, one of the major benefits of free software: we can see what a program is doing rather than depending on the assurances of some corporation. The conclusion was quite clear:
Almost every mature desktop app you have ever used does at least two if not all three of these things. I cannot emphasize enough that it's difficult to impossible to even enable these features right now, and they're completely harmless besides.
Since then, the situation would appear to have calmed down somewhat; the mob with the flaming torches broke up and went home prior to reaching the gates (though some of them appear to have found their way to the Tenacity fork instead). Audacity, it seems, has not quite become the evil menace that some people thought it might.
It is worth thinking about how this situation came about, though. Nobody who runs a free-software project, regardless of whether they are building a business around it, wants to be the subject of this sort of attention, after all. Sadly, this episode demonstrates one important aspect of life in this era: if the Internet decides that you are the entity that it is going to hate next, there is little to be done about it. The claims that Audacity is "spyware" far outpaced any efforts to correct the record, and that association will remain in the minds of many for a long time.
But it must also be said that the Muse Group has mishandled the acquisition of this project in ways that have made this kind of blowup more likely. The early attempt to add telemetry, which would have sent significant amounts of user data to third-party servers, understandably upset a lot of users and was eventually withdrawn. The disagreement over contributor license agreements has not helped either. All of this adds up to an impression, whether merited or not, that the Muse Group is looking to exploit a longstanding free-software project in unethical ways. When that is the lens through which your users see you, your actions are likely to be interpreted in the worst possible ways.
Hopefully the Muse Group will learn from these missteps and proceed a bit more carefully from here on out. A focus on real improvements for users and better communication with the user community would help to rebuild trust. It would also be nice if the Internet would learn to damp its reactions a bit — but there seems to be little hope of that. If the Audacity project can find a way to reconnect with its wider community, though, at least one thing will have gotten a little better.
Syncing all the things
Computing devices are wonderful; they surely must be, since so many of us have so many of them. The proliferation of computers leads directly to a familiar problem, though: the files we want are always on the wrong machine. One solution is synchronization services that keep a set of files up to date across a multitude of machines; a number of companies have created successful commercial offerings based on such services. Some of us, though, are stubbornly resistant to the idea of placing our data in the hands of corporations and their proprietary systems. For those of us who would rather stay in control of our data, systems like Syncthing offer a possible solution.

The core idea behind synchronization systems is essentially the same for all of them: given a list of directories and a list of systems, ensure that those directories have the same contents on each system. If a file is added on one, it is copied out to the rest; modifications and deletions are (usually) propagated as well. The trouble is always in the details, though; from fiddly setup procedures to data corruption and security problems, there are a lot of ways in which synchronization can go wrong. So users have to put a lot of trust in these systems; open source code is an important step toward that goal, but it is also necessary to believe that the developers involved have thought carefully through the issues.
Syncthing
[Screenshot: the Syncthing management screen (https://static.lwn.net/images/2021/syncthing-sm.png)]
The daemon is managed through an internal web server that shows up, by default, on port 8384; see the example image on the right. There is initially no authentication required; users will likely want to fix that as one of the first things they do. The web server is, by default, only accessible via the loopback interface; a change to a configuration file can make it available to the Internet as a whole, but that sounds like a daring thing to do even with authentication enabled. An alternative for gaining access to the web interface on a remote machine, as suggested in the documentation, is to set up an SSH tunnel from the local system.
When it starts for the first time, Syncthing generates a "device ID" identifying the local system; it looks like this:
RS5RZ7K-CORJAP3-TZECYOH-IBLFDZM-KSFWOXB-VBEIYSB-F7MWECH-VQCGLAZ
Setting up synchronization between two machines resembles the Bluetooth pairing process; it is done by providing each side with the device ID belonging to the other. Use of copy-and-paste is advisable here. Alternatively, if both systems are on the same local net, they will discover each other through broadcasts and ask (through the management interface) whether a connection should be established.
After a connection between systems is made, users must tell Syncthing which directories should be synchronized; that is a matter of setting up folders and sharing them with any or all of the known remote systems (which Syncthing calls "devices"). Once the share has been accepted on the remote end, file changes will be propagated back and forth. When possible, Syncthing requests file-change notifications from the kernel; that leads to relatively fast propagation times.
There are a lot of options that can be set to control sharing. Sharing can be made one-way, for example, so that a particular system might create files and send them out without accepting changes from the other systems. One especially interesting (though new and "beta") feature is the ability to share files to specific systems in encrypted form. If one system is, for example, a cloud server that is used primarily for backup or distribution purposes, it can be given encrypted data that it cannot read. Any other system in the sharing network that has the correct password will be able to read those files, though. There are also various ways of handling versioning, which keeps older versions of files around when one system changes them.
It's worth noting that, while it is possible to configure a set of Syncthing clients all connected to a central server, nothing in Syncthing requires that sort of architecture. Systems can be connected in any way that seems to make sense. If a system finds that it needs files that have already propagated to multiple connected peers, it can receive the needed data in blocks, BitTorrent-style, from whichever system can provide it first.
Discovery and security
Interestingly, neither host names nor IP addresses are involved in any stage of the configuration process — by default, at least; the systems find each other based only on the device ID regardless of which networks they are attached to. This, clearly, requires some third-party help. The Syncthing project runs a set of "discovery" servers that will help systems find each other based on their device IDs. There is also a set of "relay servers" that can relay data in situations where the systems involved cannot reach each other directly — when they are both behind NAT firewalls, for example.
Some thought has clearly gone into the security implications of this architecture. Data only goes through relay servers if there is no alternative, for example, and it is encrypted at the endpoints. But there is still some information that a hostile discovery or relay server could obtain that might worry some users. For anybody who is truly worried, the code for both types of server is available; anybody can set up private servers and configure their Syncthing instances to use only those.
According to the documentation, device IDs need not be kept secret, since an affirmative action is required on both sides to set up a connection. One might wonder whether an attacker might try to set up a system with a target's device ID and thus gain access to the managed files. That ID, though, is essentially a public key, and the connection process involves proving possession of the associated private key, so such an attack should not be possible. This page describes device IDs in more detail.
Syncthing on the move
Perhaps the most common use of synchronization on today's net is copying photos from a phone handset to a central server. Since Android phones, at least, are Linux-based, one need only set up a normal shell environment on it and put Syncthing there to achieve this goal; the process shouldn't take more than a day or so. Or one could just install the Android app, which is available on F-Droid and the Google Play Store as well. This app, shown on the right, comes with a folder for the camera (set for send-only sharing) configured out of the box, so it is just a matter of setting up the peers. And, lest one worry about typing one of those device IDs with an on-screen keyboard, the app can read the QR code that the web interface will helpfully provide, easing that process considerably.
One slightly surprising behavior is that the app asks for location permission, which doesn't seem like something it would need. That permission is needed to determine which WiFi network (if any) the phone is on, which is useful for the feature configuring when synchronization should (and should not) be performed. Users of metered WiFi services may want to use this mechanism to avoid synchronization when it could cost them money. In the absence of this permission, the app will, by default, perform synchronization whenever it is connected to any WiFi network.
One need not look far to find complaints from users that the Android app drains the battery quickly. Your editor has not observed this behavior in a limited amount of testing; it is possible that the worst problems have already been fixed.
Closing thoughts
The project states that "security is one of the primary project goals", and the developers do appear to have put some thought into the issue. Encryption is used in the right places, certificates are verified, etc. A quick CVE search turns up two entries over the last four years, one of which enabled the overwriting of arbitrary files. Exploiting that vulnerability would require first gaining control of one of the machines in the sharing network, at which point the battle is likely lost anyway. It does not seem that any sort of formal security audit has been done, but the Syncthing developers are at least making the right kinds of noises.
With regard to reliability, it is not hard to search for (and find) various scary stories from users who have lost data with Syncthing. It seems that many of those problems are the result of operator error; if you set up a system and allow it to delete all your data, it may eventually conclude that you want it to do exactly that. Synchronization can be amazingly efficient at propagating mistakes. Use of versioning can help, as can avoiding the use of two-way synchronization whenever possible. Syncthing doesn't seem like it has a lot of data-losing bugs, but backups are always a good idea.
Syncthing has been syncing things since at least 2013, when the first commit appears in its Git repository; LWN looked at it in 2014. The project is written mostly in Go, and is distributed under the Mozilla Public License. The current Syncthing release is 1.18.0; it came out on July 6 — while this article was being written. The project shows a nearly monthly release cadence in the last year; 1.7.0 was released on July 7, 2020. There have been 728 non-merge commits to the Syncthing repository over the last year from 40 developers; the top three developers (Simon Frei, Jakob Borg, and Jesse Lucas) account for just over 76% of those commits. The project is thus not swarming with developers, but it appears healthy enough for now.
A company called Kastelo offers support subscriptions for Syncthing and provides significant resources for Syncthing development. The company is also part of the Syncthing Foundation, which, in turn, manages the project's infrastructure and makes grants for development projects.
All told, Syncthing leaves a favorable impression. The developers seem to have done the work to create a system that is capable, reliable, and secure, and that performs reasonably well. But they have also done the work to make it all easy to set up and make use of — the place where a lot of free-software projects seem to fall down. It is an appealing tool for anybody wanting to take control of their data synchronization and replication needs.
Page editor: Jonathan Corbet