Growing pains for Fedora CoreOS

By Jake Edge
June 2, 2021

When last we looked in on Fedora CoreOS back in December, it was under consideration to become an official Fedora edition. That has not happened, yet at least, but it would seem that the CoreOS "emerging edition" is still undergoing some difficulties trying to fit in with the rest of Fedora. There are differences between the needs of a container operating system and those of more general-purpose distributions, which still need to be worked out if Fedora CoreOS is going to "graduate".

Catching up

In mid-May, Dusty Mabe posted an announcement that the stable stream of Fedora CoreOS was being updated to Fedora 34. In it, he noted a few caveats (e.g. "systemd-resolved is still enabled but not used yet [1]"), some recently added features, and some new features that are coming soon. All pretty normal stuff except that Fedora 34 was released at the end of April and Mabe's post showed that Fedora CoreOS has not really kept up.

In fact, as Tomasz Torcz pointed out, the systemd-resolved change was made for Fedora 33, while an upcoming feature ("Move to cgroup v2 by default [5]") was originally made for Fedora 31, which was released in October 2019. That seems to indicate that Fedora CoreOS is lagging the main distribution, which may cause confusion for users, he said. "Should Fedora CoreOS use the same version number while not containing all the changes from main Fedora Linux?"

But Fedora CoreOS does not have version numbers like those of the editions, Clément Verna said:

I think this is the fundamental difference here, Fedora CoreOS does not have a version number. It has 3 streams, stable, testing and next, these streams are based on a version of Fedora Linux but that should just be a detail that most end users should not have to care about.

In addition, Fedora CoreOS has automatic updates, which need to be "rock solid" so that users will trust (and enable) them. But, up until recently, Docker has not had support for version 2 of control groups (cgroups), so a container distribution, which has many users dependent on Docker, could not roll out that change without major disruption. Verna suggested that user confusion might actually be "a good thing" if it leads them to investigate Fedora CoreOS and to learn more about how it works.

Neal Gompa said that Verna's response was "a cop-out and a bad answer". The problem, he said, is that the Fedora CoreOS (or FCOS as he and others abbreviate it) working group has historically not participated in the development of Fedora, and the Changes process in particular. Instead of adapting to the feature changes made for Fedora, FCOS generally just rolls them back, "which has frustrated pretty much everyone". Beyond that, it is not just FCOS that needs to have solid upgrades; breaking upgrades for Fedora are not acceptable either, Gompa said.

But Verna believes that the working group is actually participating in the process. He pointed to four GitHub issues tracking changes for Fedora 32-35 (e.g. for Fedora 32 and for Fedora 35) that were (or need to be) incorporated into FCOS. Vít Ondruch replied that most or all of that work is not visible within the rest of Fedora, though. Verna agreed and suggested that the working group should be more vocal on mailing lists and the like.

Verna was also concerned about changes that are not backward-compatible. Regular Fedora can make those kinds of changes when the major version of the distribution changes, but there is no such opportunity for FCOS:

Breaking or non backward compatible changes are acceptable in Fedora Linux tho between major version bump. Again here the cgroups v2 is a good example, folks using Docker had to perform some manual steps to switch back to cgroups v1 to keep using their workflow working. This is fine when you have a major version bump but this does not happen in FCOS.

One of Verna's questions remained unanswered, though: what should happen if a new Fedora feature conflicts with the needs of another edition (or emerging edition for that matter)? How are those differing needs going to be resolved?

[...] what happens when a Change proposals breaks FCOS (like cgroups v2 for example) ? Should that just be rejected ? AFAIK not all changes are adopted by every Editions or Spins.

As Fedora evolves and adds more official editions, those kinds of situations are likely to become more frequent. It may be difficult to be on the forefront of new features—part of Fedora's mission is to be "First" with Linux innovations, after all—if some environments and communities are unable to move as quickly. It is something that the Fedora project will need to resolve moving forward.

What's in a name?

Joe Doss disagreed with Verna's initial reply as well. Since FCOS has the Fedora name in it, "it should have the same fundamental features and changes that ship with each Fedora release". He found Verna's arguments "pretty dismissive". Verna was apologetic, but acknowledged that he has a bias that may not be universally shared:

I am a developer and I don't have a strong interest in the OS, I just expect it to work and provide me the tools needed to do my job. To me that's the beauty of FCOS, I get a solid, tested OS that get automated updates and just works, I honestly don't care to know which version of Fedora Linux it is based on or which features it has. I want to spin-up an instance make sure that my application works and forget about it.

I also understand that there are other type of users that will care much more about the base OS than me:-).

It is the inclusion of "Fedora" in the FCOS name that is causing much of the problem, Ron Olson said. "I was surprised when I learned Fedora CoreOS didn't support cgroups v2 and that confused me; it's Fedora, of course it would have the latest-n-greatest." He noted that he had used CoreOS before Red Hat bought the company and did not have those kinds of expectations in those days. Though he recognized the likely futility of the idea, he suggested that a name change might help:

I'm guessing this is laughably not possible, but I'm going to suggest anyway that maybe it be renamed either back to simply "CoreOS" or something new like "Bowler" or whatever that indicates that it is its own special thing and expectations can be set accordingly.

Verna acknowledged that the Fedora name brought along some expectations, but he also noted that FCOS is less than two years old at this point, so it is to be expected that there will be some rough spots that need to be worked out:

FCOS has a different release model than Fedora Linux and I think it is fair to give it time to, on one hand continue to improve how features are making their way in FCOS, and on the other hand get people be more familiar with what FCOS is and what expectations to have about it.

The cgroups issue reared its head several times in the discussion, though Colin Walters thought that the issue had been beaten to death long before. In addition, as Mabe noted, FCOS does already support cgroups v2; it is just not the default. Over the next month, that will be changing so that v2 is the default going forward:

We're trying to make sure users have a good experience. Docker users are a big part of that. Changing the default before Docker supported cgroups v2 was really not an option for us at the time.

The proposal to make Fedora CoreOS into an edition was originally targeted for Fedora 34, but that was not to be. The Change entry has been pushed to Fedora 35 and the Fedora Engineering Steering Committee (FESCo) issue tracking the change proposal was closed at the end of February. So far, no change proposal has been submitted for Fedora 35, though there is still plenty of time to do so. This discussion might indicate that it is still a bit too early to make that change, but time will tell.




Growing pains for Fedora CoreOS

Posted Jun 3, 2021 5:42 UTC (Thu) by geuder (subscriber, #62854) [Link] (12 responses)

Automatic updates, new features and full backwards compatibility are an unsolvable equation.

We only recently had a small breakage in our FCOS-based system: Podman used to have some default locations that it pulled images from. Our code worked. The defaults changed (or were removed, I don't remember off the top of my head) in some automatic update and our code stopped working. Just a little detail, but it demonstrated that basically after every automatic upgrade you need to test your system and be prepared to fix something.

Of course that's not completely different from manually upgraded systems, especially if you run something that others might consider fragile code or just not consider at all.

For a rolling distro, an additional difficulty is how/when to do these bigger changes, which are more likely to break something.

Maybe some selected automatic updates should bundle bigger changes and be announced as higher risk in advance???

Growing pains for Fedora CoreOS

Posted Jun 3, 2021 15:38 UTC (Thu) by mattdm (subscriber, #18) [Link]

> Maybe some selected automatic updates should bundle bigger changes and be announced as higher risk in advance???

That's basically what a Fedora Linux release is.

Growing pains for Fedora CoreOS

Posted Jun 3, 2021 17:55 UTC (Thu) by dbnichol (subscriber, #39622) [Link] (3 responses)

For Endless we use something referred to as a checkpoint release to help handle some of these upgrade issues when you have a rolling automatic ostree process. Normally, the updater pulls the tip of the ostree ref and deploys that. However, if the commit has some additional metadata, it will see that there's a new ref it should follow, but only after deploying and booting into the tip of the current ref. This allows us to stuff some migration code into the commits on the old ref and ensure it'll run before something tries to upgrade to the current ref. This is the only way we can truly remove old features or ensure systems are prepared for a major change. In a way it acts like a traditional upgrade tool.
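The checkpoint logic described above can be sketched roughly as follows. This is only an illustration of the idea, not Endless's updater: the `Commit` class, the `redirect_ref` field standing in for the "additional metadata", and the `plan_update` function are all hypothetical names made up for this example.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Commit:
    checksum: str
    # Hypothetical stand-in for the commit metadata that names the ref to follow next.
    redirect_ref: Optional[str] = None


def plan_update(booted: Commit, current_ref: str,
                tips: Dict[str, Commit]) -> Optional[Tuple[str, str]]:
    """Return (ref, checksum) to deploy next, or None if nothing to do."""
    tip = tips[current_ref]
    if booted.checksum != tip.checksum:
        # Always reach the tip of the current ref first, so that any
        # migration code shipped there runs before the ref switch.
        return (current_ref, tip.checksum)
    if tip.redirect_ref is not None:
        # Booted into the checkpoint commit: now follow the new ref.
        return (tip.redirect_ref, tips[tip.redirect_ref].checksum)
    return None
```

The point of the two-step plan is that a system several commits behind never jumps straight to the new ref; it has to boot the checkpoint commit first.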

Growing pains for Fedora CoreOS

Posted Jun 4, 2021 3:12 UTC (Fri) by bgilbert (subscriber, #4738) [Link] (2 responses)

Fedora CoreOS has a barrier release mechanism that does something similar: all updates that traverse the barrier release must update to exactly that release before updating any further. The Fedora CoreOS update client selects the target OS release from a graph of permissible updates maintained outside of the ostree, so barrier releases can be accomplished without an ostree ref switch.
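As a rough illustration of the barrier idea, here is a minimal sketch that models the releases as a simple ordered list rather than a real update graph. It is not the Fedora CoreOS update client's algorithm; the function names and the release labels are assumptions invented for this example.

```python
from typing import List, Set


def next_update_target(current: str, releases: List[str], barriers: Set[str]) -> str:
    """Pick the next release for a node on `current`.

    `releases` is ordered oldest to newest. A node may not skip over a
    barrier release: it must land on exactly that release first.
    """
    newer = releases[releases.index(current) + 1:]
    if not newer:
        return current  # already on the newest release
    for release in newer:
        if release in barriers:
            return release  # stop at the first barrier on the way
    return newer[-1]  # no barrier in between: go straight to the newest


def update_path(current: str, releases: List[str], barriers: Set[str]) -> List[str]:
    """List every release a node passes through until it is up to date."""
    path = []
    while True:
        target = next_update_target(current, releases, barriers)
        if target == current:
            return path
        path.append(target)
        current = target
```

With hypothetical releases `r1` through `r5` and `r3` marked as a barrier, a node on `r1` would update to `r3` and then to `r5`, while a node already on `r3` would go directly to `r5`.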

Growing pains for Fedora CoreOS

Posted Jun 4, 2021 5:32 UTC (Fri) by dbnichol (subscriber, #39622) [Link] (1 responses)

Oh, that's neat. The ref switch is simple but pretty ugly. I hadn't considered anything besides "I want to be at the head of the ref", but having a client that negotiates specific commits is nice.

How often do you actually use a barrier release?

Growing pains for Fedora CoreOS

Posted Jun 4, 2021 14:47 UTC (Fri) by dustymabe (guest, #107864) [Link]

> How often do you actually use a barrier release?

The barrier releases, each with a link to the reason behind it, are kept in https://github.com/coreos/fedora-coreos-streams/blob/main... Usually about once every 6 months or so.

Growing pains for Fedora CoreOS

Posted Jun 3, 2021 19:29 UTC (Thu) by walters (subscriber, #7396) [Link] (3 responses)

> We only recently had a small breakage in our FCOS-based system: Podman used to have some defaults where it pulled images from. Our code worked. The defaults changed (or were removed, don't remember from the top of my head) in some automatic update and our code stopped working.

That's a really great example of a bug on the risk/reward spectrum around automatic updates and a relatively "fresh" Linux userspace.

Do you have a bit more detail on this? I'm guessing it was something around short names i.e. just `busybox` and not `docker.io/busybox` or so? Has it been fixed since? Did you engage with an upstream issue? How hard was the workaround?

Personally I think it's all around worse for everyone if admins stay on relatively frozen userspace or we try to lump things like this even around e.g. 6 month windows because I think in practice if it's just every 6 months, a good number of people fall out of habit of upgrading at all (when it requires manual intervention) and drop off the train entirely. And that's bad because you're not applying critical kernel security updates etc. that are particularly relevant with containers.
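For anyone wanting to audit scripts for short names before an update bites, a check along these lines may help. It is a minimal sketch that follows the convention used by Docker-style image references, where the first path component counts as a registry host only if it contains a dot or a colon, or is `localhost`. The function name is made up for this example, and the sketch does not reproduce Podman's actual short-name resolution.

```python
def is_fully_qualified(image: str) -> bool:
    """Return True if the image reference names its registry explicitly."""
    first, slash, _rest = image.partition("/")
    if not slash:
        # A bare name such as "busybox" has no registry component at all.
        return False
    # "docker.io", "registry.example.com:5000", and "localhost" look like
    # registry hosts; something like "library" or "f34" does not.
    return "." in first or ":" in first or first == "localhost"
```

Under this rule `busybox` is a short name and `docker.io/busybox` is fully qualified, so only the latter is independent of whatever default registries the host happens to configure.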

Growing pains for Fedora CoreOS

Posted Jun 4, 2021 9:37 UTC (Fri) by geuder (subscriber, #62854) [Link] (2 responses)

> Do you have a bit more detail on this? I'm guessing it was something around short names i.e. just `busybox` and not `docker.io/busybox` or so?

That was also my understanding after seeing the original error, because I have noticed the need to change that in my (very rare) manual use of podman. I neither debugged nor fixed the problem myself; our git log shows:
 source /etc/os-release
 cat <<EOF >/usr/local/foo/Dockerfile
-FROM f${VERSION_ID}/fedora-toolbox:latest
+FROM registry.fedoraproject.org/fedora-toolbox:latest
 
 RUN dnf install foo
 EOF
(This code is being run on CoreOS)

So I wonder what they did there. Before, the code fetched f34/fedora-toolbox:latest, I believe from docker.io. Now they fetch fedora-toolbox:latest from registry.fedoraproject.org. Where did the version number go??? Of course LWN is not a code review site for the code of our company, but interesting in the context of this article is
$ grep VERSION_\\\|VARIANT /etc/os-release 
VERSION_ID=34
VERSION_CODENAME=""
VARIANT="CoreOS"
VARIANT_ID=coreos
The article quoted without correction

> I think this is the fundamental difference here, Fedora CoreOS does not have a version number. It has 3 streams, stable, testing and next,

So is that really true??? No such number is being advertised AFAIK, but internally it is there and I guess at some point in the future it will change. With potentially surprising effects to those who have used it.
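For reference, reading those fields programmatically is straightforward. The following is a simplified sketch (it ignores escape sequences inside quoted values), and the `parse_os_release` name is just for this example:

```python
from typing import Dict


def parse_os_release(text: str) -> Dict[str, str]:
    """Parse KEY=value lines in the style of /etc/os-release."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments, and malformed lines
        key, _, value = line.partition("=")
        # Drop one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        fields[key] = value
    return fields
```

Anything that builds names out of `VERSION_ID`, as our old Dockerfile did, will silently change behavior on the day the stream is rebased to the next Fedora release.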

Growing pains for Fedora CoreOS

Posted Jun 4, 2021 13:27 UTC (Fri) by zdzichu (guest, #17118) [Link] (1 responses)

> So is that really true??? No such number is being advertised AFAIK, but internally it is there and I guess at some point in the future it will change. With potentially surprising effects to those who have used it.

Yes, that's true. I would expect “stable” branch is equivalent of current stable Fedora release (which today is 34), but there are Fedora features missing in FCOS.

Growing pains for Fedora CoreOS

Posted Jun 4, 2021 15:46 UTC (Fri) by geuder (subscriber, #62854) [Link]

But some day 34 will no longer be current and that day FCOS stable needs to make a bigger jump with your logic of being equivalent.

Growing pains for Fedora CoreOS

Posted Jun 4, 2021 14:44 UTC (Fri) by dustymabe (guest, #107864) [Link] (2 responses)

> Automatic updates, new features and full backwards compatibility are an unsolvable equation.

Part of the way we try to make automatic updates more reliable is by offering 3 different update streams (`next`, `testing`, and `stable`) to our users and encouraging everyone to run `next` and `testing` on a percentage of their systems. If your "testing" nodes encounter a problem you can report it and we can hopefully get it fixed before the much larger pool of "stable" nodes is affected.

More info at https://docs.fedoraproject.org/en-US/fedora-coreos/update...
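One way a larger fleet could split nodes across the streams is sketched below. This is purely illustrative: the `pick_stream` function and the 5% and 10% defaults are assumptions for this example, not a Fedora CoreOS recommendation or tool.

```python
import hashlib


def pick_stream(node_id: str, next_pct: int = 5, testing_pct: int = 10) -> str:
    """Deterministically assign a node to `next`, `testing`, or `stable`.

    The node ID is hashed into a bucket from 0 to 99, so the same node
    always lands on the same stream and the split roughly follows the
    requested percentages across a large fleet.
    """
    digest = hashlib.sha256(node_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100
    if bucket < next_pct:
        return "next"
    if bucket < next_pct + testing_pct:
        return "testing"
    return "stable"
```

Hashing the node ID rather than choosing at random keeps the assignment stable across reprovisioning, so the same machines keep acting as the early-warning group.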

Growing pains for Fedora CoreOS

Posted Jun 8, 2021 17:00 UTC (Tue) by geuder (subscriber, #62854) [Link] (1 responses)

> encouraging everyone to run `next` and `testing` on a percentage of their systems.

Good point.

However, we have (only) 2 instances, not 200. One is for production and one for testing (of our systems, not of FCOS). Running our testing system with a different version than the production system does not sound like a great idea. All test results would basically be possibly non-reproducible.

So we would need to run a 3rd one just for FCOS testing, a 50% overhead. And of course someone would need to check the instance at every update and run some test set. Which is a bit against the idea of having automatic updates.

Well, no free lunch, I know...

Growing pains for Fedora CoreOS

Posted Jun 9, 2021 5:02 UTC (Wed) by raven667 (subscriber, #5198) [Link]

In this situation you'd be doing all your changes on the test system first, right? It's not that much of a departure, as there would often be a difference between what is running in test and what is in prod; test can't guarantee repro of problems found in prod unless you reset it back to the versions used in prod. There is value in finding upgrade-related problems in test first, but as you note the initial advice is probably targeted more toward admins with dozens or hundreds of systems, where having a small cadre running bleeding-edge code is relatively low risk to the overall system health. The quality benefits of going from prod-only to prod & qa to prod, qa & test to prod, qa, test & dev environments are diminishing while the cost increases, but it's the scalability of work that goes up the most, which is mainly of benefit to larger organizations.

Growing pains for Fedora CoreOS

Posted Jun 3, 2021 17:07 UTC (Thu) by highvoltage (subscriber, #57465) [Link]

So, some new Fedora sub-project isn't quite ready for prime-time yet? Must be a very, very slow news day :)

Growing pains for Fedora CoreOS

Posted Jun 16, 2021 12:54 UTC (Wed) by geuder (subscriber, #62854) [Link]

Today I get a motd
############################################################################
WARNING: This system is using cgroups v1. For increased reliability
it is strongly recommended to migrate this system and your workloads
to use cgroups v2. For instructions on how to adjust kernel arguments
to use cgroups v2, see:
https://docs.fedoraproject.org/en-US/fedora-coreos/kernel-args/

To disable this warning, use:
sudo systemctl disable coreos-check-cgroups.service
############################################################################
So they are proceeding, but as expected that won't work fully automatically in all cases.
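To check which mode a given machine is in before touching kernel arguments, something like the following should do. It is a minimal sketch that relies on the `cgroup.controllers` file being present at the root of a cgroups v2 (unified) hierarchy and treats everything else, including hybrid setups, as v1. The `cgroup_version` name and the `root` parameter are just for this example.

```python
from pathlib import Path


def cgroup_version(root: str = "/sys/fs/cgroup") -> int:
    """Return 2 if `root` looks like a unified cgroups v2 mount, else 1."""
    # The v2 hierarchy exposes cgroup.controllers at its root; the
    # per-controller v1 layout has no such file there.
    return 2 if (Path(root) / "cgroup.controllers").is_file() else 1
```

The `root` parameter exists only so that the check can be pointed at a different directory for testing.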


Copyright © 2021, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds