
Linux's missing CRL infrastructure

By Daroc Alden
August 25, 2025

In July 2024, Let's Encrypt, the nonprofit TLS certificate authority (CA), announced that it would be ending support for the online certificate status protocol (OCSP), which is used to determine whether a server's certificate has been revoked. Revocation checking prevents a compromised key from being used to impersonate a web server. The organization cited privacy concerns, and recommended that people rely on certificate revocation lists (CRLs) instead. On August 6, Let's Encrypt followed through and disabled its OCSP service. This poses a problem for Linux systems that must now rely on CRLs because, unlike on other operating systems, there is no standardized way for Linux programs to share a CRL cache.

CRLs are, as the name might suggest, another solution to the problem of certificate revocation. If a web server loses control of the private key for its certificate, the administrator is supposed to report this fact to the web site's certificate authority, which will publish a revocation for it. Clients periodically download lists of these revocations from the certificate authorities that they know about; if an attacker tries to use a revoked certificate in order to impersonate a web site, the client can notice that the certificate is present in the list and refuse to connect. If the certificate revocation system were ever to stop working, an attacker with stolen credentials could perform a man-in-the-middle attack in order to impersonate a web site or eavesdrop on the user's communications with it.
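To make the check concrete, here is a minimal sketch using the Python cryptography library; the file names are hypothetical, and a real client would fetch the CRL from the distribution point listed in the certificate and keep it cached:

    from cryptography import x509

    with open("site.pem", "rb") as f:
        cert = x509.load_pem_x509_certificate(f.read())
    with open("issuer.pem", "rb") as f:
        issuer = x509.load_pem_x509_certificate(f.read())
    with open("crl.pem", "rb") as f:
        crl = x509.load_pem_x509_crl(f.read())

    # The CRL is signed by the certificate authority; check that first, so an
    # attacker cannot substitute an empty list.
    if not crl.is_signature_valid(issuer.public_key()):
        raise SystemExit("CRL signature is invalid")

    # Revocations are recorded by serial number.
    revoked = crl.get_revoked_certificate_by_serial_number(cert.serial_number)
    if revoked is not None:
        raise SystemExit(f"certificate revoked on {revoked.revocation_date}")
    print("certificate is not on this CRL")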

This system worked well enough in the early days of the web, but as the internet grew, the number of certificate revocations grew along with it. In 1999, RFC 2560 was standardized, creating the online certificate status protocol; the current version is defined in RFC 6960. Clients using OCSP send a request directly to the certificate authority to validate the certificate of each site they wish to contact. This had the benefit of freeing every client from having to store the CRL of every certificate authority it trusted, but it had a number of drawbacks.

First, using OCSP means waiting to establish an entire second TCP connection to the certificate authority's potentially busy server before completing the TLS handshake with a web site. Perhaps more seriously, it also exposes which web sites a client is visiting to one or more certificate authorities. The original OCSP protocol did not even mandate that requests and responses be encrypted, so the information about which web sites a client was visiting was potentially visible to anyone who could eavesdrop on the wire somewhere along the path.
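The shape of an OCSP query makes the privacy problem easy to see. The following sketch, again using the Python cryptography library and assuming the cert and issuer objects from the earlier example, builds the request a client would send; it names the exact certificate being checked, so the responder (and, without encryption, anyone on the path) learns which site the client is visiting:

    from cryptography import x509
    from cryptography.x509 import ocsp
    from cryptography.x509.oid import AuthorityInformationAccessOID
    from cryptography.hazmat.primitives import hashes, serialization

    # Build the OCSP request: it contains the issuer name hash, the issuer key
    # hash, and the serial number of the certificate being checked.
    builder = ocsp.OCSPRequestBuilder().add_certificate(cert, issuer, hashes.SHA1())
    request = builder.build()

    # The responder URL comes from the certificate's Authority Information
    # Access extension; the DER-encoded request would be sent there over HTTP.
    aia = cert.extensions.get_extension_for_class(x509.AuthorityInformationAccess)
    ocsp_urls = [desc.access_location.value for desc in aia.value
                 if desc.access_method == AuthorityInformationAccessOID.OCSP]
    print("would query:", ocsp_urls)
    print("request size:", len(request.public_bytes(serialization.Encoding.DER)), "bytes")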

Those problems were partially solved by the introduction of "OCSP stapling", an extension whereby the server makes its own OCSP request to the certificate authority, and then includes the response (which is signed by the certificate authority, and therefore can't be tampered with) in its opening TLS handshake. The response can be cached by the server for a short time, reducing the load on the certificate authority.

This led to OCSP recapitulating the way certificates work in the first place. A server makes a request to a certificate authority for timestamped cryptographic proof that it is who it claims to be, with the proof expiring after a certain time, and then presents this proof to clients that connect to it. But since OCSP is not universally supported, the absence of a stapled OCSP response could be as the server intends, or it could be a sign that the connection is being tampered with. Similarly, if a certificate authority doesn't respond to an OCSP request, it could be that its server is down, or it could be that an attacker is blocking the response. Because of the spotty support and extra complexity, in practice some clients began to ignore OCSP timeouts from certificate authorities — which, in the face of an attacker who can interfere with packets on the network, is equivalent to giving up on OCSP entirely.

This is part of what made browser vendors and other interested parties start pushing for certificates with shorter expiration dates: if servers could make requests to OCSP servers automatically, why not do the same for certificates themselves? Short-lived certificates, with lifetimes measured in days, don't really gain anything from OCSP, and don't spend as much time taking up space in a CRL if they are compromised.

In 2023, the CA/Browser Forum decided to make OCSP optional, and CRLs required once more. The major browsers now use CRLs preferentially, although Firefox will fall back to OCSP if a certificate is not included in its CRL and an OCSP server is available. In 2015, Mozilla began publishing a consolidated CRL that combines the CRLs from various certificate authorities, so that each browser only needs to download and process a single file. It also includes a list of known-good certificates, to avoid the overhead of OCSP for many web sites. Chrome does the same with its own list. Both macOS and Windows also have an OS-wide shared CRL-downloading mechanism.

On Linux, outside the browser, there is no such shared mechanism. In practice, many programs just don't support certificate revocation. OpenSSL and other cryptographic libraries include the tools needed to verify certificates against a CRL or an OCSP response, but many programs don't use them. Curl has support for enabling OCSP as well, but it's not enabled by default. Individual programs probably shouldn't be responsible for deciding which CRLs to fetch and keeping them up to date in any case. The Let's Encrypt community forum hosted some discussion about potential solutions, but didn't come to a consensus. Jesper Kristensen noted that certificate validation on Linux is "a mess" in other ways too, with applications often using insecure root certificates:

Most [applications] fill the content of these root stores by copying Mozilla's root store and removing Mozilla-specific attributes from the roots because applications can't read these attributes. But Mozilla has repeatedly said that using their root store while ignoring these attributes is insecure.

Ellie Kanning, who started the discussion, filed follow-up bug reports with OpenSSL, NetworkManager, and Freedesktop's desktop specifications, but none of the projects thought that maintaining a system-wide CRL was in-scope for them. I emailed Kanning to request a comment, and she was clear that she isn't an expert in this area, just a Linux user who is worried about the security of her desktop. She called the situation "an incident waiting to happen", and seemed somewhat resigned to it, given the reception of the bugs she tried to file:

It seems to me like this can only be solved if some of the enterprise users, e.g. rallied behind the Linux Foundation, can be convinced to show some interest in maintaining an up-to-date cert store that has some sort of CRL subscription mechanism like Mozilla's cert push that is available for all Linux systems. Perhaps BSDs could then benefit from this as well.

Some people have suggested that the trend of shortening certificate lifetimes — with the maximum lifetime acceptable to browsers planned to be 47 days by 2029 — means that both CRLs and OCSP are becoming less important, and not worth the difficulty of supporting. Let's Encrypt does offer six-day certificates to a limited number of domains on a trial basis; that validity period is shorter than the lifetime of most OCSP responses. Andrew Hutchings, a developer of wolfSSL, said: "With OCSP being flawed in many implementations and CRLs just not scaling, there isn't really an alternative [to shortening certificate lifetimes] right now."

Kanning doesn't agree, pointing out that even the shortest lifetimes being discussed are still a fairly long time:

The problem is in my opinion that many services will simply not move to short-lived [certificates], especially big enterprises that don't have an IT focus I don't see doing that. The Chrome plan seems to be to limit it to around a month, but that's very long time if a cert were to leak right after issuing.

It's clearly not impossible to have secure TLS connections on a modern Linux desktop. Firefox has the code to support this, and Mozilla already maintains the CRL infrastructure to keep it secure. But with OCSP going away, most applications simply don't have access to the frequently updated CRL that they need in order to correctly establish a modern TLS connection. Linux systems would benefit from a standard way to update and maintain a CRL for all programs to use by default, if anyone decides to tackle the problem.




Automation

Posted Aug 25, 2025 19:47 UTC (Mon) by matp75 (subscriber, #45699) [Link]

This means more automation, both for certificate renewal and for deploying certificates in a non-impacting way. It will take some time and effort, but this is clearly the direction

Short lived certificate

Posted Aug 25, 2025 21:06 UTC (Mon) by ju3Ceemi (subscriber, #102464) [Link] (52 responses)

"The problem is in my opinion that many services will simply not move to short-lived [certificates], especially big enterprises"

But they have no choice, do they ?

This is the beauty of it : walk or die

In the end, short lived tokens are the way. Revocation is messy at best, harmful at worst, which is probably why so many other related protocols simply do not have anything like that (oauth, krb)

Short lived certificate

Posted Aug 25, 2025 22:51 UTC (Mon) by Wol (subscriber, #4433) [Link] (50 responses)

> In the end, short lived tokens are the way. Revocation is messy at best, harmful at worst, which is probably why so many other related protocols simply do not have anything like that (oauth, krb)

Until you get screwed over by unintentional / unexpected side effects.

Never mind that it should have been foreseen, and there are ways round it ... we had a short TTL for DHCP, our systems went down, and when they came back up all the IP addresses had changed. Cue massive breakage as various addresses - which were assumed fixed - changed and caused havoc.

Cheers,
Wol

Short lived certificate

Posted Aug 26, 2025 9:15 UTC (Tue) by magfr (subscriber, #16052) [Link] (49 responses)

But then the question is what the error was.

Was the error that the addresses were assumed fixed? Then the client program has a bug.

Was the error that the addresses changed? Then the DHCP setup has a bug (no, not the TTL, that the fixed addresses weren't mapped to the servers they should be mapped to)

When you do a DNS lookup you get a time to live which tells how long the result is valid, but that value is sadly not propagated through the gethostent interface, so way too many programs just retry the same old address rather than do the needed relookup.
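As an illustration of that point (a sketch only, with a hypothetical host name), the client-side fix is simply to resolve the name again for every new connection instead of reusing an address that was looked up once:

    import socket

    def connect(host: str, port: int) -> socket.socket:
        # create_connection() calls getaddrinfo() on every invocation, so an
        # address change driven by DHCP or DNS is picked up on the next
        # connection attempt instead of being masked by a stale cached address.
        return socket.create_connection((host, port), timeout=10)

    conn = connect("printer.internal.example", 9100)
    conn.close()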

Short lived certificate

Posted Aug 26, 2025 9:46 UTC (Tue) by rschroev (subscriber, #4164) [Link] (12 responses)

The root problem is obviously that either DHCP or the clients were not configured correctly. In general best that problems like that manifest early so that they can get fixed as soon as possible.

But the way DHCP works, in many cases clients get the same address they had before, which masks incorrect configurations, even with short lease times. In fact with long lease times it is quite likely the misconfiguration would never have caused a problem, and you can even start to wonder if the configuration is really that bad if it never causes problems (I would say yes, it's bad, because it's still very fragile if nothing else).

Short lived certificate

Posted Aug 26, 2025 10:24 UTC (Tue) by Wol (subscriber, #4433) [Link] (11 responses)

AIUI, leases are renewed at lifetime/2, so provided the system doesn't go down, the IP address never changes.

Here, I think the network was down for lifetime+something, so everything expired. And when it came up and servers and clients tried to exchange IPs, everything went pear-shaped because a network reboot hadn't been planned for and you can't get an IP for something that hasn't rebooted.

Somewhere in all this, was a hosts file full of dynamic addresses ...

The obvious (to me) way of handling this is what my home router has - a dhcp server that has a hard map between mac and IP for my servers and printers. I expect there are other ways, but if you have a hosts file that's the obvious one ...

It's an industrial mesh network, so it might not be as simple as that though :-)

Cheers,
Wol

Short lived certificate

Posted Aug 26, 2025 17:30 UTC (Tue) by NYKevin (subscriber, #129325) [Link] (9 responses)

> The obvious (to me) way of handling this is what my home router has - a dhcp server that has a hard map between mac and IP for my servers and printers. I expect there are other ways, but if you have a hosts file that's the obvious one ...

On IPv4, this is by far the most reasonable method (nobody wants to run around statically configuring each and every device one at a time). On IPv6, it is even easier:

1. Choose a valid ULA prefix (preferably at random).
2. Advertise it on the network, e.g. with radvd or your router's built-in functionality (if it supports ULAs).
3. That's it.

This works because SLAAC automatically picks deterministic pseudorandom addresses for each device/network pair, and devices automatically SLAAC themselves as soon as they see a router advertisement packet, so every IPv6-capable device on the network will autoconfigure itself with a unique, stable address.

Technically, your IPv6 addresses are already stable even if you don't do this, but then you are vulnerable to your ISP changing your network prefix. There is not much reason for an ISP to do that, but it is theoretically a thing they can do. The point of ULAs is to obtain private address space that you uniquely own and control, without having to pay anybody for the privilege of doing so (and without needing to use NAT66).
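For the curious, generating the random ULA prefix described above takes only a few lines; this is a sketch using the Python standard library, and the printed prefix is just an example:

    import secrets
    import ipaddress

    # RFC 4193: fd00::/8 plus a pseudo-randomly chosen 40-bit Global ID gives
    # a /48 prefix that is extremely unlikely to collide with anyone else's.
    global_id = secrets.token_bytes(5)                  # 40 random bits
    packed = bytes([0xfd]) + global_id + bytes(10)      # pad to a full 128-bit address
    ula = ipaddress.IPv6Network((packed, 48))
    print(ula)    # e.g. fd4b:9c21:3a07::/48 -- advertise this with radvd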

Short lived certificate

Posted Aug 26, 2025 17:40 UTC (Tue) by NYKevin (subscriber, #129325) [Link] (1 responses)

Actually, I'm mistaken. The truth is that it's much harder in IPv6, because most consumer devices automatically rotate their addresses frequently, and you have to turn this off separately on each device. This is (probably?) done even on a ULA prefix. In principle the privacy risks of a stable ULA are far lower than the privacy risks of a stable "regular" address (ULAs are not generally routable on the public internet, similar to RFC 1918 addresses in IPv4). But there might be situations where a ULA could be leaked even if it is not used for public addressing, so probably they ought to rotate by default too.

Short lived certificate

Posted Aug 26, 2025 20:00 UTC (Tue) by eythian (subscriber, #86862) [Link]

They shouldn't. They should have a fixed address, as you described, that can be used for incoming connections. Then besides that, they should also have the "privacy" address that rotates that is used for outgoing connections.

This said, when I've set up Linux servers on my network, privacy extensions are disabled by default, but when I connect a desktop, they're enabled. I'm not sure what controls this switch.

There may be exceptions, I've been learning IPv6 lately but it's still not up to my decades of absorbed IPv4 knowledge.

Short lived certificate

Posted Aug 26, 2025 19:01 UTC (Tue) by Cyberax (✭ supporter ✭, #52523) [Link] (6 responses)

> This works because SLAAC automatically picks deterministic pseudorandom addresses for each device/network pair

LOL, no. You get a hellscape of "privacy IPv6" addresses. So each device can (and will!) have multiple addresses active at a time.

But wait, there's more! It's very much possible for a device to prefer a public IPv6 source address even when talking to _other_ ULA addresses. So if your Internet connection goes down and the publicly routable prefix is withdrawn, you can suddenly end up not being able to print.

Short lived certificate

Posted Aug 27, 2025 3:32 UTC (Wed) by ATLief (subscriber, #166135) [Link] (1 responses)

> It's very much possible for a device to prefer a public IPv6 source address even when talking to _other_ ULA addresses.

Most Linux systems automatically use the source address with the largest common prefix to the destination address. The trouble is if your DNS records return both a ULA address and a publicly-routable address, because the selection of the destination address is just based on round-robin.

Short lived certificate

Posted Aug 27, 2025 4:14 UTC (Wed) by Cyberax (✭ supporter ✭, #52523) [Link]

Unfortunately, that's not the case. ULAs are de-prioritized because of this: https://www.ietf.org/proceedings/113/slides/slides-113-6m...

The RFC specifies that GUA-GUA is preferred, even when ULA-ULA is available: https://www.ietf.org/archive/id/draft-ietf-6man-rfc6724-u...

Short lived certificate

Posted Aug 27, 2025 11:35 UTC (Wed) by cortana (subscriber, #24596) [Link] (2 responses)

A shame there isn't a bit in the prefix information (or maybe a DHCPv6 option, I dunno) to tell clients whether privacy addressing should be enabled or not...

Short lived certificate

Posted Aug 28, 2025 16:46 UTC (Thu) by NYKevin (subscriber, #129325) [Link] (1 responses)

That would be a vulnerability - an open WiFi hotspot, that many devices will automatically connect to by default, should not be allowed to demand a trackable IP address (at least, not without the user's consent).

Short lived certificate

Posted Aug 28, 2025 19:13 UTC (Thu) by Cyberax (✭ supporter ✭, #52523) [Link]

That access point already has the client's MAC address and can NAT the privacy IPv6 addresses to a non-privacy IPv6.

Short lived certificate

Posted Aug 29, 2025 14:37 UTC (Fri) by raven667 (subscriber, #5198) [Link]

Do hosts not *also* get a stable management address, either EUI-64 based on the MAC (ff:fe old style) or the newer RFC7217 "stable privacy" address based on network prefix, interface, SSID, machine UUID? The temporary privacy addresses should be used for outbound connections based on their preferred lifetime but if you run services the management address should be usable, `scope global dynamic mngtmpaddr noprefixroute` vs `scope global temporary dynamic`.

Short lived certificate

Posted Aug 26, 2025 17:43 UTC (Tue) by pizza (subscriber, #46) [Link]

> The obvious (to me) way of handling this is what my home router has - a dhcp server that has a hard map between mac and IP for my servers and printers. I expect there are other ways, but if you have a hosts file that's the obvious one ...

The way I've traditionally dealt with this is to configure the DHCP server to restrict dynamic leases to a subset (eg x.y.z.80-240) and hand out fixed addresses (based on mac addresses) to the stuff that needs DNS entries.

The only things that I configure truly statically are the physical infrastructure -- ie switches, routers, and dns+dhcp servers.

Short lived certificate

Posted Aug 26, 2025 14:18 UTC (Tue) by dskoll (subscriber, #1630) [Link] (35 responses)

I think DHCP servers should deliberately change the IP address on each renewal if the address isn't explicitly specified as static, or at least there should be an option to get them to do this. That would quickly shake out misconfigurations before a surprise turned them into a problem.

Short lived certificate

Posted Aug 26, 2025 14:36 UTC (Tue) by ju3Ceemi (subscriber, #102464) [Link] (1 responses)

This would kill all existing TCP connections in the process

Users would probably not be happy

Short lived certificate

Posted Aug 26, 2025 14:46 UTC (Tue) by dskoll (subscriber, #1630) [Link]

Yes, there is that disadvantage. So I guess it should be an option that you only enable if you want to shake out devices that have a dynamic address that should have a static address. It's probably not something you'd want on a public access WiFi network, for example.

Short lived certificate

Posted Aug 26, 2025 14:45 UTC (Tue) by farnz (subscriber, #17727) [Link] (32 responses)

The trouble is that DHCP for IPv4 as defined has no way to allocate one machine multiple IP addresses. This, in turn, means that when DHCP changes your IP address, you have to terminate all ongoing connections immediately. Sucks to be you if you have a renewal every 16 hours, and a process that needs a connection to run for 17 hours, or if you happen to start a critical command over SSH just before renewal.

IPv6 has the concept of "deprecated" addresses, so that a host can keep hold of an address that it's preparing to stop using for as long as it takes for established connections to drop out of use, only actually dropping the address when it's idle. This could, at least in theory, play nicely with MPTCP, multipath QUIC, and anything else that allows an endpoint to add and remove IP addresses, since you can add your new IPv6 address as the primary path, then remove the deprecated address, keeping long-lived connections alive, but dropping the old address "early".

Short lived certificate

Posted Aug 26, 2025 15:08 UTC (Tue) by dskoll (subscriber, #1630) [Link] (30 responses)

But what this really means is that we have a very precarious situation. If, let's say, a DHCP server crashes and it hasn't been keeping a persistent database of address assignments, or that database is damaged somehow, then you could end up getting different IP addresses anyway, similar to Wol's situation.

Instead of this being a regular and annoying thing that would prompt a sysadmin to configure static IP addresses, it would happen rarely and most likely at a very inopportune moment.

Change address on every DHCP renewal

Posted Aug 26, 2025 15:47 UTC (Tue) by farnz (subscriber, #17727) [Link] (29 responses)

As so often, you've now got a tradeoff to make:
  • Do you have a regular (slightly randomised, since renewal is not on a strict timetable) blip at renewal time, where the server sends DHCPNAK in reply to any DHCPREQUEST not paired with a "recent" offer and requires a client to get a new DHCPOFFER and new IP? Pros: anything that depends on addresses not changing gets found. Cons: all open connections, both inbound and outbound, are forcibly disconnected when the old address is removed, which (in IPv4) happens before the new address is added.
  • Do you rely on devices using DHCPREQUEST with their existing IP when they're staying up, and change the IP when they use DHCPDISCOVER to get a new offer? Pros: no connections get dropped when a system stays alive. Cons: you now need to remove the client lease database and restart the DHCP client to get a new IP address - but scheduled semi-random reboots are a good way to discover "surprise" dependencies anyway.
  • Do you migrate to IPv6, and use privacy addressing to make it impossible to store a device's IP address, without the blip? Pros: stable outbound connectivity, inbound address changes frequently (I've configured 24 hours on my devices) unless a static address is configured. Cons: if you're not IPv6 enabled already, the migration is tricky.

And in all cases, the changing IP address gives you a second-order problem; when diagnosing a problem over time, you need to have a time-aware database of IP address to device, so that you can see if weirdness goes with the address or the device.

Change address on every DHCP renewal

Posted Aug 26, 2025 16:09 UTC (Tue) by Wol (subscriber, #4433) [Link] (25 responses)

Do you tell DHCP to always "serve this IP to this MAC"

Pros: you now do have a fixed IP, managed along with all your dynamic addresses

Cons: you actually now have to manage the stuff ...

Cheers,
Wol

Change address on every DHCP renewal

Posted Aug 26, 2025 16:35 UTC (Tue) by farnz (subscriber, #17727) [Link] (20 responses)

To make that work, you'd have to disable DHCP pools, so that only known MAC addresses can get an address at all. This is, of course, a perfectly reasonable way to run an industrial network, but means that you don't have dynamic addressing at all.

Change address on every DHCP renewal

Posted Aug 27, 2025 9:47 UTC (Wed) by paulj (subscriber, #341) [Link] (19 responses)

DHCP servers can assign static addresses from a wider dynamic pool (e.g., dnsmasq, used on openwrt and by libvirt, can). Perhaps the DHCP server you're used to is just a bit limited.

Change address on every DHCP renewal

Posted Aug 27, 2025 9:51 UTC (Wed) by farnz (subscriber, #17727) [Link] (18 responses)

I don't see how to configure dnsmasq such that having assigned an address from a pool, it will maintain that device/address link forever, even in the face of the leases database being corrupted (which, from the sounds of things, is part of what happened at Wol's site). In particular, this means that once an address has been assigned, it's always assigned to that device (even if the pool is exhausted, and even if another device explicitly requests that address), and it's never going to assign a different address to that device (even if the leases database is corrupt, and even if the device requests a different address).

Perhaps you can clarify how your DHCP server of choice handles this.

Change address on every DHCP renewal

Posted Aug 27, 2025 10:05 UTC (Wed) by Wol (subscriber, #4433) [Link] (1 responses)

On my home router, I split my (RFC1918) address space into two. Any server-type devices I manually allocate an address in the first half, and put those addresses in hosts. Any unrecognised macs get allocated (randomly) from the second half. All devices are configured to get their IP via DHCP, but because the DHCPd knows which are static, they always get the same address.

(When I started doing this, I didn't understand how to set up dynamic DNS, and this for my home network was so much simpler.)

As I understand it, what happened at my site was they didn't put the server type stuff into the static half, and relied on the fact that when leases are renewed the IP doesn't change ... except because the down-time was longer than the ttl, the leases weren't renewed, they were re-allocated. And because the IPs were in the hosts file ... whoops!

Cheers,
Wol

Change address on every DHCP renewal

Posted Aug 27, 2025 10:08 UTC (Wed) by farnz (subscriber, #17727) [Link]

Right, but the only way to prevent that being a problem is to not have the second half at all - either you have a manually allocated address, or you're not on the network.

Otherwise, we hit the same problem; you have devices that are dynamically assigned addresses, where downtime causes them to move, but the addresses are hardcoded in places you don't know about.

Change address on every DHCP renewal

Posted Aug 27, 2025 13:39 UTC (Wed) by paulj (subscriber, #341) [Link] (15 responses)

It's just a text file of leases. Now, on OpenWRT, you're not actually configuring dnsmasq directly - you configure the OpenWRT UCI system, and when OpenWRT starts dnsmasq it creates the dnsmasq configs on the fly from the authoritative UCI information. Every time dnsmasq is started, it has a freshly created lease file.

I don't understand the problem you envisage. If dnsmasq knows MAC addr X should get IP Y, it will give out IP Y if MAC X makes a request or a renew. If dnsmasq does not know (cause config was lost or corrupt) that X -> Y, then when MAC X asks, it will get some other IP. I've never had the problem you describe. Closest I've seen is that I've changed a wifi or NIC card/usb stick, and my host didn't get its usual static IP - just some random other one from the same pool. Until I update the OpenWRT config with the new MAC (OpenWRT automatically updates and restarts dnsmasq), and re-join/restart the network on the host.

Something similar can happen with IPv6 DHCP (which is not dnsmasq on OpenWRT, but a different dhcpv6 binary) with no hardware change, because in DHCPv6 static assignments are usually done via the DUID (DHCP UID), not the MAC. And the DUID sometimes can change on hosts (e.g. reinstall), which can be annoying. Still, you just end up with a different address than the static one - not "no address". So in v6 the static host suffix (note: not the full IP - the single DHCPv6 suffix can be combined with and used for multiple on-link prefixes!) survives swapping the network hardware, but not a reinstall on the same hardware.

Change address on every DHCP renewal

Posted Aug 27, 2025 14:43 UTC (Wed) by farnz (subscriber, #17727) [Link] (14 responses)

The problem statement is that the old DHCP server failed. A new one was shipped in, with configuration but not persistent state restored from their VCS; by the time the new DHCP server arrived, all devices had lost their IP address.

When the new DHCP server started up, several systems failed to operate because they had hard-coded IP addresses that were not in config, the time taken meant the client had lost its IP (and was asking for a new DHCPOFFER instead of using DHCPREQUEST to renew the same lease), and the hardware failure meant that the leases database was lost.

How do you transform this into a state where the users discover within a human-reasonable (24 hours or so) timeframe that they'd hardcoded an address without making sure config matches? The two (broad strokes) solutions above are:

  1. Ensure the address changes anyway (by various means) unless it's in config as a MAC→IP mapping, and make appropriate tradeoffs around connections blipping when the address changes.
  2. Don't allow devices to get an IP unless it's in config as a MAC→IP mapping, so that hardcoding is safe.

If you've got another solution, I'm interested, because this is a hard problem to solve.

Change address on every DHCP renewal

Posted Aug 27, 2025 15:34 UTC (Wed) by Wol (subscriber, #4433) [Link]

Given that the hosts file has to be updated and shipped around, regenerate the DHCPd persistent state by importing the hosts file and say "if the requester says 'my name is X', look up X in the hosts file and return that IP if found".

Dunno how it worked :-) but one system I worked on, you put the hostname in /etc/hostname, and added the host/IP to /etc/hosts, and that was how it discovered its own (fixed) IP address.

Cheers,
Wol

Change address on every DHCP renewal

Posted Aug 27, 2025 15:54 UTC (Wed) by paulj (subscriber, #341) [Link] (12 responses)

The scenario of the DHCP server starting up with config, but having lost lease state is the norm in OpenWRT. It's generally running on small APs and consumer routers out of RO flash with a tmpfs for most runtime state. If you reboot an OpenWRT router, clients will either DHCPDISCOVER or DHCPREQUEST, and the OpenWRT router will give them their IP. (I assume dnsmasq does some kind of DAD - seems like it does a ping-check for at least DHCPDISCOVER before OFFER, reading the internet).

If a host has been assigned a static IP in the config, that's what it gets.

Your problem seems to be that a host had an IP, from a dynamic pool, and a human and/or software on that host then took a dynamically assigned IP and hard-coded it in places, relying on the DHCP server to maintain persistent state of the dynamic IP assignment and to keep assigning the same (dynamic) IP to hosts?

Firstly, the DHCP server /could/ give out the address the client requests, if not in use. Like dnsmasq does on OpenWRT. If the client still has the address.

If the client has removed its IP, cause of whatever timeout, and doesn't know to request it cause that state is gone, and the server doesn't have that state either, well... if this kind of a thing is a critical issue, it seems like the correct fix is to just configure a static IP in the DHCP server config, if your hosts are relying on static IPs?

Change address on every DHCP renewal

Posted Aug 27, 2025 16:20 UTC (Wed) by farnz (subscriber, #17727) [Link] (11 responses)

While you've got the problem correct, you're missing one key point: the clients appear to have static IPs from the perspective of the users of the system; the IPs are technically dynamic, but can stay unchanged for years.

The correct fix is to put a static MAC→IP mapping in config for anything that's hard coded. However, that leads to a human problem; users hard-code the IP as a "temporary quick fix", and thus never ask the DHCP server admin to put a static MAC→IP mapping in place (because this is temporary and will be removed soon), and because the "dynamic" address never changes, this "temporary quick fix" can become semi-permanent because nobody gets round to doing the "proper" fix, or asking for a static address to be configured in DHCP.

So, how do you ensure that the "temporary quick fix" cannot work for more than a day or so, ensuring that the person who did a "temporary quick fix" still remembers what they did when it next breaks?

Change address on every DHCP renewal

Posted Aug 27, 2025 17:31 UTC (Wed) by paulj (subscriber, #341) [Link] (10 responses)

Hmm, your problem lies in the domain of humans, more than the machine. The machine has a way for you to express what you want. ;)

IPv6 may help fix this issue. It has a local, on-link address space large enough to let clients use their own address suffix with (next to) no fear of conflict (and having to pick something else). There are at least 2 ways an IPv6 host can give itself a stable address, without having to rely on the server a) use a MAC address derived suffix; b) using a "stable private" address.

Change address on every DHCP renewal

Posted Aug 27, 2025 17:35 UTC (Wed) by farnz (subscriber, #17727) [Link] (9 responses)

Absolutely - the root cause of the problem is humans assuming that because they've never seen something change, it will never change. The technical question is "how do you make sure that, without unacceptable disruption, IP addresses change unless configured to stay static?", so that humans see them change and know that this is normal.

And IPv6 makes the solution space much easier to work in; you're expected to have multiple addresses per host (indeed, DHCPv6 explicitly supports this when it's assigning addresses, not SLAAC or manual config), and you've got much more address space to work in, so that you can have self-organising stable addresses as you describe.

Change address on every DHCP renewal

Posted Aug 28, 2025 12:19 UTC (Thu) by paulj (subscriber, #341) [Link] (8 responses)

I don't see a technical problem, I see a human problem: "How do we get humans to formally record an expectation with regard to the assignation of a limited resource, so the wider system can grant it?". It's a training and process problem.

You either need (near) infinite addresses, or you need to train your users to follow a process.

Change address on every DHCP renewal

Posted Aug 28, 2025 13:23 UTC (Thu) by farnz (subscriber, #17727) [Link] (7 responses)

The technical problem is "human did not get actionable feedback when they failed to follow the process".

There's two things coming into play here:

  1. Humans will take shortcuts without considering the system as a whole, and unless you have a feedback mechanism to catch this, it will be blamed on the system, not the humans who made a mistake - not least because the person who did the "quick, temporary" fix might well have left the company, so there's no way to retrain them, but the system is still here, and broken.
  2. No matter how good your training and processes are, there will be a non-zero human error rate. Per this table from a journal article in the nuclear industry, you can expect a human error probability on a task between 0.009% for trivial tasks, going up as far as 30% on hard tasks.

That second point means that you need the technical process to back the human process - if the human fouls up (and they will, at some point), you need the technical process to make sure that the foul-up becomes clear quickly.

It's similar to kernel development in that regard; if I tell Linus "hey, latest rc doesn't work, rc1 did, this is the symptoms", I'm quite likely to get a diagnosis of the problem quickly, and a fix. If I say "hey, 6.7 doesn't work, latest rc doesn't work, 3.2 did, this is the symptoms", I'm not going to get a huge amount of sympathy.

Change address on every DHCP renewal

Posted Aug 28, 2025 13:40 UTC (Thu) by paulj (subscriber, #341) [Link] (5 responses)

There is no way for the system to know the human fouled up. How is a host or a DHCP server, or some combination of these and/or other distributed system supposed to - in a generalised way - know that a human has recorded the dynamically assigned IP address of a host somewhere, with the expectation that it will be stable, under the constraint that the address is assigned from a small, finite pool (i.e., the IPv4 case)?

I can't think of any way to solve that problem.

The only thing I can see is that you attack that limiting constraint, which can - for many purposes (and almost definitely if we're talking "a human could write it down somewhere", given how that rate limits things) - be achieved by going to an IPv6 network and using some deterministic stateless address assignment.

Change address on every DHCP renewal

Posted Aug 28, 2025 13:50 UTC (Thu) by farnz (subscriber, #17727) [Link] (4 responses)

There is a way to introduce feedback, however; if the host changes address frequently, then instead of the pattern being "quick temporary fix done, new fire comes up so it's never redone properly, system fails 2 years later, blame the system", it's "quick temporary fix done, new fire comes up so it's never redone properly, system fails tomorrow, remember the quick temporary fix and make it permanent".

The technical problem is that you're not getting feedback that you've fouled up in a reasonable timescale after making the mistake; the system cannot know that you've fouled up, but it can be reworked so that either it's impossible to foul up (address space large enough that all addresses are deterministic, thus static), or so that the system breaks reasonably soon after you fouled up, not years later (addresses either static, or forcibly changed every 24 hours).

Change address on every DHCP renewal

Posted Aug 28, 2025 15:29 UTC (Thu) by paulj (subscriber, #341) [Link] (3 responses)

In IPv4 there isn't a good, generally agreed way to rotate addresses - without breaking ongoing connections.

As others have pointed out, in IPv6 there is standardised support, and it would be possible. Don't know if there are common DHCPv6 servers that do it - should be possible though. There are ISP systems that deliberately change IP addresses at regular intervals, to stop residential connections from benefiting from static addresses.

Change address on every DHCP renewal

Posted Aug 28, 2025 16:50 UTC (Thu) by farnz (subscriber, #17727) [Link] (2 responses)

I'm one of the people who's pointed out that IPv6 has standardised support for rotating addresses :-)

And yes, there's no good way to do this in IPv4; which means that you've got a choice between doing it badly (thus ensuring that foul ups break near to the time of the foul up) or not doing it at all (and dealing with the fallout when there's a cascade failure that runs into the foul up).

But there's a general tendency for humans to assume that everything that works is working as intended, and that no foul ups have happened - after all, if a foul up had happened, things would stop working. Any time someone says "I've never had a problem doing this before", you're hearing "this used to work, therefore it must be the right thing to do, and not something that happened to work coincidentally".

This leads to a soft-skills requirement on technology - things that work should either be the right thing, or should break quickly so that the person setting it up doesn't go "this has been working for months - you must have done something". Otherwise, the technology gets the blame for the human problems.

Change address on every DHCP renewal

Posted Aug 29, 2025 9:28 UTC (Fri) by paulj (subscriber, #341) [Link] (1 responses)

At a certain point, you just have to teach humans how to work with the system as it is - while waiting for the AGI that will anticipate your every need, before you even know them.

Change address on every DHCP renewal

Posted Aug 29, 2025 10:06 UTC (Fri) by farnz (subscriber, #17727) [Link]

Sure, and we know that the best way to ensure that humans actually learn that lesson is to not let things appear to work for a long period before breaking due to a human error. Humans need feedback fairly close in time to the mistake in order to learn from their errors. Humans are also guaranteed to have a non-zero undetected error rate - from 0.009% for trivial tasks, to 30% for very difficult tasks - and we need feedback to get us to detect the errors.

That's been a very hard lesson for the commercial aviation and nuclear industries to learn - but we don't have to relearn it from scratch, we can learn from their mistakes.

Change address on every DHCP renewal

Posted Aug 28, 2025 14:08 UTC (Thu) by Wol (subscriber, #4433) [Link]

> That second point means that you need the technical process to back the human process - if the human fouls up (and they will, at some point), you need the technical process to make sure that the foul-up becomes clear quickly.

And what you DON'T do (cf. Dick Feynman, Challenger disaster) is leave a couple of jobs in the system for the human to do, to make them feel involved. Either automate them out of the process entirely, or involve them as much as you can.

I think most recent airline disasters are down to the fact that even very experienced pilots with thousands of hours in their log books, have actually spent very little time *at the controls*. On a ten-hour flight, George probably accrues 9.5 hours, with the humans acquiring half an hour real experience between them. So when something goes wrong, they actually have very little knowledge/feel of *how to fly*. Which in an emergency is an absolute necessity!

Cheers,
Wol

Change address on every DHCP renewal

Posted Aug 26, 2025 17:24 UTC (Tue) by dskoll (subscriber, #1630) [Link] (3 responses)

That's what I do on my home network. For machines that need a fixed IP, or at least for which it's convenient to have a fixed IP, I allocate a specific address. For others, I let them grab an address from the dynamic pool.

And sure, you have to manage it.

Change address on every DHCP renewal

Posted Sep 22, 2025 19:01 UTC (Mon) by fest3er (guest, #60379) [Link] (2 responses)

... and that was the intent. Would an admin prefer (1) to visit every system to set its addressing, or (2) add MAC<->IP mappings to a central configuration that every host queries on startup? I prefer the latter.

As to the CRL problem, I think an expired equine is being flogged. I think that end-to-end encryption is nearing the end of—or was abused beyond—its usefulness. The problem is that nearly the entire industry forces security verification to be performed on insecure systems; face it, Windows, Mac, Linux, BSD, Haiku and all other user-installed and -maintained systems are inherently insecure. Does anyone *really* expect a non-expert user to properly maintain the security of his own system?

The correct solution is OE (host-gateway/gateway-host/host-host encryption). It will be centrally managed (much like centralized IP address management). It will be managed much closer to home. If you can't trust your own gateway, you might as well unplug all of your systems and return to pencil and paper.

In short, it comes down to trusting a far-off, nameless, faceless 'authority' to do the right thing. I don't know about any of you all, but I would far more readily trust the local expert in my home or in my neighborhood with whom I can speak face-to-face than the large, central, all-controlling national government or a self-publicized charlatan expert half-way around the planet. End-to-end encryption that depends on far-away people who are more concerned about profit than security is a losing proposition. End-to-end encryption should be reserved solely for situations that truly require it, such as banking, voting, and site-to-site VPNs.

Change address on every DHCP renewal

Posted Sep 23, 2025 8:07 UTC (Tue) by taladar (subscriber, #68407) [Link] (1 responses)

Your assumption that there is a trusted local network is generally considered quite outdated by today's security models. With mobile devices it usually isn't the case and even with home networks you tend to not have any "local expert".

Change address on every DHCP renewal

Posted Sep 23, 2025 12:57 UTC (Tue) by Wol (subscriber, #4433) [Link]

Yup. I guess the hardware on the warehouse floor is "trusted" in the sense that we physically control it (but it could still be hacked).

But certainly as far of the rest of it (employee laptops for certain, pretty much all computers that I am aware of), they are considered untrusted end points that need to validate before they are allowed to do anything.

Cheers,
Wol

Change address on every DHCP renewal

Posted Aug 26, 2025 17:52 UTC (Tue) by NYKevin (subscriber, #129325) [Link] (2 responses)

> Do you have a regular (slightly randomised, since renewal is not on a strict timetable) blip at renewal time, where the server sends DHCPNAK in reply to any DHCPREQUEST not paired with a "recent" offer and requires a client to get a new DHCPOFFER and new IP? Pros: anything that depends on addresses not changing gets found. Cons: all open connections, both inbound and outbound, are forcibly disconnected when the old address is removed, which (in IPv4) happens before the new address is added.

For devices used by humans (workstations, phones, etc.): Configure this to happen at ~4 AM local time, and very few people will care. In the event that you do need some humans working those hours (e.g. at hospitals), use the next solution instead, or schedule those humans' devices to force a re-discover at a different time (if feasible).

For devices that are serving traffic, or otherwise cannot suffer a total site outage all at once: Spread the DHCPNAKs over several hours during your daily trough (the period when usage is lowest every day), such that you lose no more than one device every few minutes or so. If your error budget does not allow for this, then only flip over a fraction of all devices every day (preferably, split the devices into stable cohorts so that every device is flipped once every few days). If you do not have an error budget, then on paper, your requirements are not compatible with doing this at all (see next item). You should consider whether or not you really intend to provide that level of availability, and for that matter whether you have the necessary resources to do so.

For devices that absolutely, positively must not go down under any circumstances: Obviously, you cannot do this at all, because even the tiniest blip is a catastrophic failure, under this standard of availability. Instead, you should be hardening the network so that it does not go down (e.g. put the router on a UPS, get multiple discontiguous fiber links from separate ASes, etc.). Probably it's also worth the bother of configuring the device with a static address just in case, but that's almost the least important part. You may have noticed that this is extremely expensive. That's the point. "Absolutely must not go down ever" is a very high bar. You are not going to get there by clever configuration of software alone.
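A sketch of the "stable cohorts" idea from the middle option above, assuming nothing more than each device's MAC address; the cohort count and function names are made up for illustration:

    import hashlib

    def rotation_cohort(mac: str, num_cohorts: int = 4) -> int:
        """Return a stable cohort number in [0, num_cohorts) for a MAC address."""
        digest = hashlib.sha256(mac.lower().encode()).digest()
        return digest[0] % num_cohorts

    def due_today(mac: str, day_of_year: int, num_cohorts: int = 4) -> bool:
        """True if this device's address should be rotated today."""
        return rotation_cohort(mac, num_cohorts) == day_of_year % num_cohorts

    print(due_today("00:11:22:33:44:55", day_of_year=240))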

Change address on every DHCP renewal

Posted Sep 2, 2025 14:13 UTC (Tue) by vonbrand (subscriber, #4458) [Link] (1 responses)

My phone switches seamlessly between WiFi and 5G, even in the middle of e.g. watching YouTube. Haven't tried changing WiFi connection mid-stream, however.

Change address on every DHCP renewal

Posted Sep 3, 2025 9:05 UTC (Wed) by farnz (subscriber, #17727) [Link]

YouTube is not a good test for seamless switching at the network layer, since it makes many small requests, and copes well with connection drop (by design - YouTube is meant to work even on bad networks).

YouTube (and other streaming media platforms) work by stitching together a lot of small (typically 10 seconds or so) clips at the client side, and YouTube is set up so that if you download the first part of a clip, then change network, it's possible to download the rest of that clip with a range request, instead of a full redownload.

It also adapts to network conditions by using lower bitrate clips if requests are either not completing in full, or taking too long (since 10 seconds at 400 kbit/s is less data to transfer than 10 seconds at 2 Mbit/s).

To test properly, you'd need a server side under your control, and you'd be watching for connection drops of something long-lived like an SSH connection. And, of course, the best fix is to not depend on long-lived connections to begin with - you're in the best possible place if connections can drop freely without issue, and you always find systems by mDNS or similar, not IP.

Short lived certificate

Posted Aug 26, 2025 19:02 UTC (Tue) by Cyberax (✭ supporter ✭, #52523) [Link]

MPTCP doesn't work that well in practice, and QUIC assumes that the _server_ address stays fixed.

Short lived certificate

Posted Aug 26, 2025 9:32 UTC (Tue) by claudex (subscriber, #92510) [Link]

This depends on the definition of "short lived". From my understanding, in this case, short lived means the six-day certificates from Let's Encrypt. And the plan is at most 47 days of validity in 2029, and it won't be less than 30 days for a long time. The current consensus is that CRLs/OCSP are still needed for certificates valid for more than six days. So these "long lived" (more than six days) certificates will still be available for enterprise users.

Linux distributions

Posted Aug 27, 2025 1:23 UTC (Wed) by cesarb (subscriber, #6266) [Link] (1 responses)

> Ellie Kanning, who started the discussion, filed follow-up bug reports with OpenSSL, NetworkManager, and Freedesktop's desktop specifications, but none of the projects thought that maintaining a system-wide CRL was in-scope for them.

To me, that sounds like the sort of job traditionally done by Linux distributions. A Linux distribution can invent a shared mechanism, and patch all relevant software it distributes to conform to that. For instance, Debian invented a shared menu system, to which all applications distributed by Debian added menu entries, and all desktop environments distributed by Debian were patched to display these menu entries.

Once it's proven to work, it can be copied by other distributions (like RPM-based distributions copied the Debian "alternatives" mechanism, even though it was originally closely tied to Debian's package manager), and even standardized (like systemd standardized on Debian's /etc/hostname over other distribution's equivalents).

Linux distributions

Posted Aug 27, 2025 8:23 UTC (Wed) by Wol (subscriber, #4433) [Link]

> like systemd standardized on Debian's /etc/hostname over other distribution's equivalents

I'm pretty certain that convention pre-dates Debian; heck, it pre-dates Linux.

I remember using it on some Unix that was a bastardised SysV/BSD.

Cheers,
Wol

Mozilla's CRLite

Posted Aug 27, 2025 19:33 UTC (Wed) by jag (subscriber, #3766) [Link]

Any solution on Linux should probably use CRLite:

https://hacks.mozilla.org/2025/08/crlite-fast-private-and...

Local OCSP?

Posted Aug 27, 2025 21:13 UTC (Wed) by pj (subscriber, #4506) [Link] (1 responses)

It's mentioned that other OSs handle this... what do they do? I can think of multiple possibilities off the top of my head:

1. maybe decide/define a well-known location for CRLs, like we do for certs and cert chains. Or maybe add a layer of indirection by defining where to find a pointer to said location (config file, soft link, whatever) If FHS changes are in the works, maybe these things could show up there?
2. maybe promulgate a local OCSP server that can be the cache. Then all the apps that talk OCSP can just talk to localhost, kind of like how talking to multiple DNS upstreams is resolved by running a local nameserver.
3. maybe CRL support could get rolled into OpenSSL such that cert verification (which already is usually its own flag, I believe?) now does CRL things as well?

This seems more a social problem than a technical one.

Local OCSP?

Posted Aug 28, 2025 12:34 UTC (Thu) by daroc (editor, #160859) [Link]

As I understand it, macOS centralizes this check through their Keychain service. An app makes a request to know whether a certificate is valid, and the service walks the chain of trust to a trust anchor, checking revocation at each step along the way.

And OpenSSL already supports this kind of thing — you "just" have to enable CRL checking and tell it where to find the CRL file. So yes, you're totally right that this is more of a social problem than a technical one. In my experience, the open source community is generally quite good at solving technical problems, so all the problems that remain so for long are social.
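A minimal sketch of that opt-in, via Python's ssl module (which wraps OpenSSL); the CRL file name is hypothetical, and keeping that file fresh is exactly the piece that no shared Linux infrastructure provides today:

    import socket
    import ssl

    ctx = ssl.create_default_context()                 # system trust store
    ctx.verify_flags |= ssl.VERIFY_CRL_CHECK_LEAF      # require a revocation check for the leaf
    ctx.load_verify_locations("lets-encrypt.crl.pem")  # the CRL must be supplied locally

    # Without an up-to-date CRL for the site's issuer, the handshake fails with
    # "unable to get certificate CRL" even for a perfectly valid certificate.
    with socket.create_connection(("example.org", 443)) as sock:
        with ctx.wrap_socket(sock, server_hostname="example.org") as tls:
            print("handshake OK with CRL checking:", tls.version())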

Forgot distros?

Posted Aug 28, 2025 6:16 UTC (Thu) by zdzichu (subscriber, #17118) [Link] (2 responses)

> Most [applications] fill the content of these root stores by copying Mozilla's root store

Uhm, no? Most applications do not deal with cert stores, because there is one store provided and updated by the distribution. It often comes from Mozilla (https://fedoraproject.org/wiki/CA-Certificates , https://tracker.debian.org/pkg/ca-certificates ), but distros are competent enough not to botch it.

I don't see why CRL couldn't be managed by distros the same way.

Forgot distros?

Posted Aug 28, 2025 7:21 UTC (Thu) by taladar (subscriber, #68407) [Link] (1 responses)

Usually there are (at least) two, because the JVM insists on doing its own thing.

Forgot distros?

Posted Aug 28, 2025 8:19 UTC (Thu) by zdzichu (subscriber, #17118) [Link]

On Fedora, Java stack uses system CA list: https://docs.fedoraproject.org/en-US/quick-docs/using-sha...

fetch-crl

Posted Aug 29, 2025 7:45 UTC (Fri) by amadio (subscriber, #96152) [Link]

Some communities (e.g. EGI, WLCG) use a tool called fetch-crl to manage CRLs. It's available both in EPEL and in Debian.

GnuPG's dirmngr can do CRLs

Posted Aug 29, 2025 16:06 UTC (Fri) by ber (subscriber, #2142) [Link]

Note that dirmngr, which has shipped with GnuPG for many years, can do CRLs and is a daemon serving requests; it may be a basis for a cache.

See https://www.gnupg.org/documentation/manuals/gnupg/Invokin...

Interesting suggestion

Posted Aug 31, 2025 22:04 UTC (Sun) by SLi (subscriber, #53131) [Link]

Reading the issues in the various projects (all pretty much closed as not in scope), I found this comment quite interesting:

https://github.com/openssl/openssl/issues/28186#issuecomm...

Basically, it suggests that the only reasonable way ahead could be to build some cooperation in how distros do this: Currently there are apparently three ca-certificate distro packages (Fedora, OpenSuse, Debian), all somehow derived from the Mozilla certificate packs. And Mozilla "apparently" doesn't want to be the upstream, and there's a cryptic (but I trust warranted) "Definitely do not pick Debian's".

All this sounds to me like there's some mapping to be done in the social space. Things that are quite nonobvious to me:

1. In what way are the Mozilla specific attributes critical for security (as mentioned in the article, that stripping them is insecure)?
2. What do the three distros do differently, and why? I'd hope the maintainers even talk informally every now and then, although apparently a mid-stream "adapted from Mozilla" project does not exist.
3. What does it mean that Mozilla, allegedly, doesn't want to be the upstream? Probably that their focus is Firefox and other Mozilla products. Hopefully not that they'd go to much lengths to sabotage anything. Could beneficial-to-everyone solutions exist?

And, of course, that still leaves the question of whether this realistically solves things that are not solved by shorter certificate lifetimes. I'd certainly hope so, because 47 days is a long time, but the reality seems to be messy.

(There was also a suggestion of p11-kit as a possible upstream in an earlier comment in that issue. I have little idea what that even is.)


Copyright © 2025, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds