Growing pains for Fedora CoreOS
Posted Jun 3, 2021 5:42 UTC (Thu) by geuder (subscriber, #62854)Parent article: Growing pains for Fedora CoreOS
We only recently had a small breakage in our FCOS-based system: Podman used to have some default registries that it pulled images from. Our code worked. The defaults changed (or were removed, I don't remember off the top of my head) in some automatic update and our code stopped working. Just a little detail, but it demonstrated that after basically every automatic upgrade you need to test your system and be prepared to fix something.
Of course that's not completely different from manually upgraded systems, especially if you run something that others might consider fragile code or just not consider at all.
For a rolling distro, an additional difficulty is how and when to make the bigger changes that are more likely to break something.
Maybe some selected automatic updates should bundle bigger changes and be announced in advance as higher risk?
Posted Jun 3, 2021 15:38 UTC (Thu)
by mattdm (subscriber, #18)
[Link]
That's basically what a Fedora Linux release is.
Posted Jun 3, 2021 17:55 UTC (Thu)
by dbnichol (subscriber, #39622)
[Link] (3 responses)
Posted Jun 4, 2021 3:12 UTC (Fri)
by bgilbert (subscriber, #4738)
[Link] (2 responses)
Posted Jun 4, 2021 5:32 UTC (Fri)
by dbnichol (subscriber, #39622)
[Link] (1 responses)
How often do you actually use a barrier release?
Posted Jun 4, 2021 14:47 UTC (Fri)
by dustymabe (guest, #107864)
[Link]
The barrier releases, and links to the reasons behind them, are kept in https://github.com/coreos/fedora-coreos-streams/blob/main... Usually about once every 6 months or so.
Posted Jun 3, 2021 19:29 UTC (Thu)
by walters (subscriber, #7396)
[Link] (3 responses)
That's a really great example of a bug on the risk/reward spectrum around automatic updates and a relatively "fresh" Linux userspace.
Do you have a bit more detail on this? I'm guessing it was something around short names i.e. just `busybox` and not `docker.io/busybox` or so? Has it been fixed since? Did you engage with an upstream issue? How hard was the workaround?
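To make the short-name distinction concrete, here is a minimal shell sketch that tells a short name from a fully qualified image reference. The function name `is_fully_qualified` is made up for illustration, and the rule it applies (the first path component counts as a registry host if it contains a `.` or a `:`, or is `localhost`) is the commonly used heuristic, not necessarily the exact check Podman performs.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if IMAGE_REF starts with a registry host.
# Assumption: a first path component containing "." or ":" (or equal to
# "localhost") is treated as a registry; anything else is a short name.
is_fully_qualified() {
    case $1 in
        */*) first=${1%%/*} ;;   # text before the first "/"
        *)   return 1 ;;         # no "/" at all, e.g. "busybox"
    esac
    case $first in
        localhost|*.*|*:*) return 0 ;;
        *)                 return 1 ;;
    esac
}

# Report which of a few sample references are short names.
for ref in busybox docker.io/busybox f34/fedora-toolbox:latest; do
    if is_fully_qualified "$ref"; then
        echo "$ref: fully qualified"
    else
        echo "$ref: short name, resolution depends on registry defaults"
    fi
done
```

A check like this could run over the `FROM` lines of build scripts before an automatic update has a chance to change how short names resolve.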
Personally I think it's worse all around for everyone if admins stay on a relatively frozen userspace, or if we try to lump changes like this into e.g. 6 month windows. In practice, if upgrades come only every 6 months, a good number of people fall out of the habit of upgrading at all (when it requires manual intervention) and drop off the train entirely. And that's bad because you're not applying critical kernel security updates etc. that are particularly relevant with containers.
Posted Jun 4, 2021 9:37 UTC (Fri)
by geuder (subscriber, #62854)
[Link] (2 responses)
Posted Jun 4, 2021 13:27 UTC (Fri)
by zdzichu (guest, #17118)
[Link] (1 responses)
Yes, that's true. I would expect the “stable” branch to be the equivalent of the current stable Fedora release (which today is 34), but there are Fedora features missing in FCOS.
Posted Jun 4, 2021 15:46 UTC (Fri)
by geuder (subscriber, #62854)
[Link]
Posted Jun 4, 2021 14:44 UTC (Fri)
by dustymabe (guest, #107864)
[Link] (2 responses)
Part of the way we try to make automatic updates more reliable is by offering 3 different update streams (`next`, `testing`, and `stable`) to our users and encouraging everyone to run `next` and `testing` on a percentage of their systems. If your "testing" nodes encounter a problem you can report it and we can hopefully get it fixed before the much larger pool of "stable" nodes is affected.
More info at https://docs.fedoraproject.org/en-US/fedora-coreos/update...
Posted Jun 8, 2021 17:00 UTC (Tue)
by geuder (subscriber, #62854)
[Link] (1 responses)
Good point.
However, we have (only) 2 instances, not 200. One is for production and one for testing (of our systems, not of FCOS). Running our testing system with a different version than the production system does not sound like a great idea. All test results would potentially be non-reproducible.
So we would need to run a 3rd one just for FCOS testing, a 50% overhead. And of course someone would need to check the instance at every update and run some test set, which is a bit against the idea of having automatic updates.
Well, no free lunch, I know...
Posted Jun 9, 2021 5:02 UTC (Wed)
by raven667 (subscriber, #5198)
[Link]
Growing pains for Fedora CoreOS
For Endless we use something referred to as a checkpoint release to help handle some of these upgrade issues when you have a rolling automatic ostree process. Normally, the updater pulls the tip of the ostree ref and deploys that. However, if the commit has some additional metadata, it will see that there's a new ref it should follow, but only after deploying and booting into the tip of the current ref.
This allows us to stuff some migration code into the commits on the old ref and ensure it'll run before something tries to upgrade to the current ref. This is the only way we can truly remove old features or ensure systems are prepared for a major change. In a way it acts like a traditional upgrade tool.
Growing pains for Fedora CoreOS
Fedora CoreOS has a barrier release mechanism that does something similar: all updates that traverse the barrier release must update to exactly that release before updating any further. The Fedora CoreOS update client selects the target OS release from a graph of permissible updates maintained outside of the ostree, so barrier releases can be accomplished without an ostree ref switch.
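The barrier rule described above can be sketched in a few lines of shell. This is only an illustration of the idea under simplifying assumptions: the releases form a simple oldest-to-newest list rather than a real update graph, the function name `next_update` and the labels `v1` to `v5` are hypothetical, and none of this is the actual Fedora CoreOS update client logic.

```shell
#!/bin/sh
# Hypothetical sketch: pick the next update target when some releases
# are barriers that must not be skipped.
# Usage: next_update CURRENT "BARRIER1 BARRIER2 ..." RELEASES...
# RELEASES are listed oldest to newest.
next_update() {
    current=$1
    barriers=$2
    shift 2
    seen_current=0
    target=
    for rel in "$@"; do
        if [ "$seen_current" -eq 0 ]; then
            # Skip everything up to and including the current release.
            if [ "$rel" = "$current" ]; then
                seen_current=1
            fi
            continue
        fi
        target=$rel
        # Stop at the first barrier newer than the current release.
        case " $barriers " in
            *" $rel "*) break ;;
        esac
    done
    # Prints an empty line when there is nothing newer to update to.
    printf '%s\n' "$target"
}

# A node on v1 must land on the barrier v3 before going any further.
next_update v1 "v3" v1 v2 v3 v4 v5
# Once on v3, it may jump straight to the newest release.
next_update v3 "v3" v1 v2 v3 v4 v5
```

Keeping the barrier list outside the release content itself mirrors the point made above: the decision lives in separately maintained update metadata, so no ref switch is needed.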
Growing pains for Fedora CoreOS
> Do you have a bit more detail on this? I'm guessing it was something around short names i.e. just `busybox` and not `docker.io/busybox` or so?
That was also my understanding after seeing the original error, because I have noticed the need to change that in my (very rare) manual use of podman. I neither debugged nor fixed the problem myself, and our git log shows:
source /etc/os-release
cat <<EOF >/usr/local/foo/Dockerfile
-FROM f${VERSION_ID}/fedora-toolbox:latest
+FROM registry.fedoraproject.org/fedora-toolbox:latest
RUN dnf install foo
EOF
(This code is being run on CoreOS)
So I wonder what they did there. Before, the code fetched f34/fedora-toolbox:latest, I believe from docker.io. Now it fetches fedora-toolbox:latest from registry.fedoraproject.org. Where did the version number go?
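One way to keep both the explicit registry and the version would be to put the release number into the tag. The sketch below assumes that the registry publishes a fedora-toolbox tag equal to `VERSION_ID` (for example `34`), which I have not verified for every release; the function name `toolbox_image_ref` is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print a fully qualified, version-pinned image
# reference based on an os-release file.
# Assumption: a tag equal to VERSION_ID exists in the registry.
# Usage: toolbox_image_ref [OS_RELEASE_FILE]
toolbox_image_ref() {
    os_release_file=${1:-/etc/os-release}
    # Source the file in a subshell so its variables do not leak
    # into the caller's environment.
    (
        . "$os_release_file"
        printf '%s\n' "registry.fedoraproject.org/fedora-toolbox:${VERSION_ID:?VERSION_ID not set}"
    )
}

# Example: show the FROM line that would be written on this host.
if [ -r /etc/os-release ]; then
    echo "FROM $(toolbox_image_ref)"
fi
```

Whether pinning to `VERSION_ID` is desirable is a separate question: the image would then change when the host moves to the next Fedora base, which may or may not be what the original code intended.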
Of course lwn is not a code review site for the code of our company, but interesting in the context of this article is
$ grep VERSION_\\\|VARIANT /etc/os-release
VERSION_ID=34
VERSION_CODENAME=""
VARIANT="CoreOS"
VARIANT_ID=coreos
The article quoted without correction
> I think this is the fundamental difference here, Fedora CoreOS does not have a version number. It has 3 streams, stable, testing and next,
So is that really true? No such number is being advertised AFAIK, but internally it is there, and I guess at some point in the future it will change, with potentially surprising effects for those who have relied on it.
