Short lived certificate
Posted Aug 26, 2025 15:08 UTC (Tue) by dskoll (subscriber, #1630), in reply to: Short lived certificate by farnz
Parent article: Linux's missing CRL infrastructure
But what this really means is that we have a very precarious situation. If, let's say, a DHCP server crashes and it hasn't been keeping a persistent database of address assignments, or that database is damaged somehow, then you could end up getting different IP addresses anyway, similar to Wol's situation.
Instead of this being a regular and annoying thing that would prompt a sysadmin to configure static IP addresses, it would happen rarely and most likely at a very inopportune moment.
Posted Aug 26, 2025 15:47 UTC (Tue) by farnz (subscriber, #17727)
In all cases, the changing IP address gives you a second-order problem: when diagnosing an issue over time, you need a time-aware database of IP address to device, so that you can see whether the weirdness goes with the address or with the device.
Posted Aug 26, 2025 16:09 UTC (Tue) by Wol (subscriber, #4433)
As so often, you've now got a tradeoff to make:
Pros: you now do have a fixed IP, managed along with all your dynamic addresses
Cons: you actually now have to manage the stuff ...
Cheers,
Wol
Posted Aug 26, 2025 16:35 UTC (Tue) by farnz (subscriber, #17727)
To make that work, you'd have to disable DHCP pools, so that only known MAC addresses can get an address at all. This is, of course, a perfectly reasonable way to run an industrial network, but means that you don't have dynamic addressing at all.
Posted Aug 27, 2025 9:47 UTC (Wed) by paulj (subscriber, #341)
Posted Aug 27, 2025 9:51 UTC (Wed) by farnz (subscriber, #17727)
I don't see how to configure dnsmasq such that, having assigned an address from a pool, it will maintain that device/address link forever, even in the face of the leases database being corrupted (which, from the sounds of things, is part of what happened at Wol's site). In particular, this means that once an address has been assigned, it's always assigned to that device (even if the pool is exhausted, and even if another device explicitly requests that address), and it's never going to assign a different address to that device (even if the leases database is corrupt, and even if the device requests a different address).
Perhaps you can clarify how your DHCP server of choice handles this.
Posted Aug 27, 2025 10:05 UTC (Wed) by Wol (subscriber, #4433)
(When I started doing this, I didn't understand how to set up dynamic DNS, and this for my home network was so much simpler.)
As I understand it, what happened at my site was they didn't put the server type stuff into the static half, and relied on the fact that when leases are renewed the IP doesn't change ... except because the down-time was longer than the ttl, the leases weren't renewed, they were re-allocated. And because the IPs were in the hosts file ... whoops!
Cheers,
Wol
Posted Aug 27, 2025 10:08 UTC (Wed) by farnz (subscriber, #17727)
Right, but the only way to prevent that being a problem is to not have the second half at all - either you have a manually allocated address, or you're not on the network.
Otherwise, we hit the same problem; you have devices that are dynamically assigned addresses, where downtime causes them to move, but the addresses are hardcoded in places you don't know about.
Posted Aug 27, 2025 13:39 UTC (Wed) by paulj (subscriber, #341)
I don't understand the problem you envisage. If dnsmasq knows MAC addr X should get IP Y, it will give out IP Y if MAC X makes a request or a renew. If dnsmasq does not know (cause config was lost or corrupt) that X -> Y, then when MAC X asks, it will get some other IP. I've never had the problem you describe. Closest I've seen is that I've changed a wifi or NIC card/usb stick, and my host didn't get its usual static IP - just some random other one from the same pool. Until I update the OpenWRT config with the new MAC (OpenWRT automatically updates and restarts dnsmasq), and re-join/restart the network on the host.
Something similar can happen with IPv6 DHCP (which is not dnsmasq on OpenWRT, but a different dhcpv6 binary) with no hardware change, because in DHCPv6 static assignments are usually done via the DUID (DHCP Unique Identifier), not the MAC. And the DUID can sometimes change on hosts (e.g. on a reinstall), which can be annoying. Still, you just end up with a different address than the static one - not "no address". So in v6 the static host suffix (note: not the full IP - the single DHCPv6 suffix can be combined with and used for multiple on-link prefixes!) survives swapping the network hardware, but not a reinstall on the same hardware.
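For concreteness, a minimal sketch of the kind of dnsmasq configuration being described here; the MAC address, hostname, and IP values are made up for illustration:

    # Dynamic pool for unknown devices:
    dhcp-range=192.168.1.100,192.168.1.199,12h
    # Static MAC -> IP mapping; this device always gets .10,
    # independent of the state of the leases database:
    dhcp-host=11:22:33:44:55:66,server1,192.168.1.10

On OpenWRT, the equivalent mapping is normally expressed as a "config host" section in /etc/config/dhcp, from which the dnsmasq configuration is generated.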
Posted Aug 27, 2025 14:43 UTC (Wed) by farnz (subscriber, #17727)
The problem statement is that the old DHCP server failed. A new one was shipped in, with configuration but not persistent state restored from their VCS; by the time the new DHCP server arrived, all devices had lost their IP address.
When the new DHCP server started up, several systems failed to operate: they had hard-coded IP addresses that were not in config, the downtime meant each client had lost its lease (and was therefore starting over with DHCPDISCOVER to get a fresh DHCPOFFER, instead of using DHCPREQUEST to renew the same lease), and the hardware failure meant that the leases database was lost.
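For readers who don't live in DHCP, a rough sketch of the two paths (message names as in RFC 2131; the addresses are illustrative):

    # Client still holds its lease: renewal keeps the same address
    client -> server     DHCPREQUEST 192.168.1.57
    server -> client     DHCPACK     192.168.1.57

    # Lease lost on both ends: full re-discovery, any free address
    client -> broadcast  DHCPDISCOVER
    server -> client     DHCPOFFER   192.168.1.121  (whatever is free)
    client -> server     DHCPREQUEST 192.168.1.121
    server -> client     DHCPACK     192.168.1.121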
How do you transform this into a state where the users discover, within a human-reasonable (24 hours or so) timeframe, that they'd hardcoded an address without making sure config matches? The two (broad strokes) solutions above are: force dynamically assigned addresses to change frequently, so a hardcoded address breaks within a day; or make every address static in config, so a hardcoded address always matches what the server hands out.
If you've got another solution, I'm interested, because this is a hard problem to solve.
Posted Aug 27, 2025 15:34 UTC (Wed) by Wol (subscriber, #4433)
Dunno how it worked :-) but one system I worked on, you put the hostname in /etc/hostname, and added the host/IP to /etc/hosts, and that was how it discovered its own (fixed) IP address.
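Presumably something along these lines; the hostname and address here are invented for illustration:

    # /etc/hostname
    myhost

    # /etc/hosts
    127.0.0.1      localhost
    192.168.1.10   myhost

With the usual "files" entry in /etc/nsswitch.conf, looking up the name from /etc/hostname hits /etc/hosts first, so the machine discovers its own fixed address without any DHCP involvement.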
Cheers,
Wol
Posted Aug 27, 2025 15:54 UTC (Wed) by paulj (subscriber, #341)
If a host has been assigned a static IP in the config, that's what it gets.
Your problem seems to be that a host had an IP from a dynamic pool, and a human and/or software on that host then took that dynamically assigned IP and hard-coded it in places, relying on the DHCP server to maintain persistent state of the dynamic assignment and to keep giving the host the same (dynamic) IP?
Firstly, the DHCP server /could/ give out the address the client requests, if it's not in use - like dnsmasq does on OpenWRT - assuming the client still has the address.
If the client has removed its IP, cause of whatever timeout, and doesn't know to request it cause that state is gone, and the server doesn't have that state either, well... if this kind of a thing is a critical issue, it seems like the correct fix is to just configure a static IP in the DHCP server config, if your hosts are relying on static IPs?
Posted Aug 27, 2025 16:20 UTC (Wed) by farnz (subscriber, #17727)
While you've got the problem correct, you're missing one key point: the clients appear to have static IPs from the perspective of the users of the system; the IPs are technically dynamic, but can stay unchanged for years.
The correct fix is to put a static MAC→IP mapping in config for anything that's hard coded. However, that leads to a human problem; users hard-code the IP as a "temporary quick fix", and thus never ask the DHCP server admin to put a static MAC→IP mapping in place (because this is temporary and will be removed soon), and because the "dynamic" address never changes, this "temporary quick fix" can become semi-permanent because nobody gets round to doing the "proper" fix, or asking for a static address to be configured in DHCP.
So, how do you ensure that the "temporary quick fix" cannot work for more than a day or so, ensuring that the person who did a "temporary quick fix" still remembers what they did when it next breaks?
Posted Aug 27, 2025 17:31 UTC (Wed) by paulj (subscriber, #341)
IPv6 may help fix this issue. It has a local, on-link address space large enough to let clients use their own address suffix with (next to) no fear of conflict (and of having to pick something else). There are at least 2 ways an IPv6 host can give itself a stable address without having to rely on the server: a) using a MAC-address-derived (EUI-64) suffix; b) using a "stable private" address (RFC 7217).
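For illustration, a small Python sketch of method (a): the modified EUI-64 suffix is the MAC with its universal/local bit flipped and ff:fe inserted between the two halves. The MAC address used is a made-up example:

    def eui64_suffix(mac: str) -> str:
        octets = [int(b, 16) for b in mac.split(":")]
        octets[0] ^= 0x02            # flip the universal/local bit
        octets[3:3] = [0xff, 0xfe]   # insert ff:fe between the OUI and NIC halves
        return ":".join(f"{(octets[i] << 8) | octets[i + 1]:x}"
                        for i in range(0, 8, 2))

    print(eui64_suffix("52:54:00:12:34:56"))  # -> 5054:ff:fe12:3456

The same 64-bit suffix can be appended to any on-link prefix; the "stable private" scheme (RFC 7217) instead derives the suffix from a hash of the prefix, a secret key, and the interface identity, so the address is stable per network without embedding the MAC.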
Posted Aug 27, 2025 17:35 UTC (Wed) by farnz (subscriber, #17727)
Absolutely - the root cause of the problem is humans assuming that because they've never seen something change, it will never change. The technical question is "how do you make sure that, without unacceptable disruption, IP addresses change unless configured to stay static?", so that humans see them change and know that this is normal.
And IPv6 makes the solution space much easier to work in; you're expected to have multiple addresses per host (indeed, DHCPv6 explicitly supports this when it's assigning addresses, not SLAAC or manual config), and you've got much more address space to work in, so that you can have self-organising stable addresses as you describe.
Posted Aug 28, 2025 12:19 UTC (Thu) by paulj (subscriber, #341)
You either need (near) infinite addresses, or you need to train your users to follow a process.
Posted Aug 28, 2025 13:23 UTC (Thu) by farnz (subscriber, #17727)
The technical problem is "human did not get actionable feedback when they failed to follow the process".
There's two things coming into play here: the technical design, which determines how quickly a mistake becomes visible, and the humans involved, who will sooner or later fail to follow the process, however well-trained they are.
That second point means that you need the technical process to back the human process - if the human fouls up (and they will, at some point), you need the technical process to make sure that the foul-up becomes clear quickly.
It's similar to kernel development in that regard; if I tell Linus "hey, latest rc doesn't work, rc1 did, this is the symptoms", I'm quite likely to get a diagnosis of the problem quickly, and a fix. If I say "hey, 6.7 doesn't work, latest rc doesn't work, 3.2 did, this is the symptoms", I'm not going to get a huge amount of sympathy.
Posted Aug 28, 2025 13:40 UTC (Thu) by paulj (subscriber, #341)
I can't think of any way to solve that problem.
The only thing I can see is that you attack that limiting constraint, which can - for many purposes (and almost definitely if we're talking "a human could write it down somewhere", given how that rate limits things) - be achieved by going to an IPv6 network and using some deterministic stateless address assignment.
Posted Aug 28, 2025 13:50 UTC (Thu) by farnz (subscriber, #17727)
There is a way to introduce feedback, however; if the host changes address frequently, then instead of the pattern being "quick temporary fix done, new fire comes up so it's never redone properly, system fails 2 years later, blame the system", it's "quick temporary fix done, new fire comes up so it's never redone properly, system fails tomorrow, remember the quick temporary fix and make it permanent".
The technical problem is that you're not getting feedback that you've fouled up in a reasonable timescale after making the mistake; the system cannot know that you've fouled up, but it can be reworked so that either it's impossible to foul up (address space large enough that all addresses are deterministic, thus static), or so that the system breaks reasonably soon after you fouled up, not years later (addresses either static, or forcibly changed every 24 hours).
Posted Aug 28, 2025 15:29 UTC (Thu) by paulj (subscriber, #341)
As others have pointed out, in IPv6 there is standardised support for rotating addresses, so it would be possible. I don't know if there are common DHCPv6 servers that do it - it should be possible, though. There are ISP systems that deliberately change IP addresses at regular intervals, to stop residential connections from benefiting from static addresses.
Posted Aug 28, 2025 16:50 UTC (Thu) by farnz (subscriber, #17727)
I'm one of the people who's pointed out that IPv6 has standardised support for rotating addresses :-)
And yes, there's no good way to do this in IPv4; which means that you've got a choice between doing it badly (thus ensuring that foul ups break near to the time of the foul up) or not doing it at all (and dealing with the fallout when there's a cascade failure that runs into the foul up).
But there's a general tendency for humans to assume that everything that works is working as intended, and that no foul ups have happened - after all, if a foul up had happened, things would stop working. Any time someone says "I've never had a problem doing this before", you're hearing "this used to work, therefore it must be the right thing to do, and not something that happened to work coincidentally".
This leads to a soft-skills requirement on technology - things that work should either be the right thing, or should break quickly so that the person setting it up doesn't go "this has been working for months - you must have done something". Otherwise, the technology gets the blame for the human problems.
Posted Aug 29, 2025 9:28 UTC (Fri) by paulj (subscriber, #341)
Posted Aug 29, 2025 10:06 UTC (Fri) by farnz (subscriber, #17727)
Sure, and we know that the best way to ensure that humans actually learn that lesson is to not let things appear to work for a long period before breaking due to a human error. Humans need feedback fairly close in time to the mistake in order to learn from their errors. Humans are also guaranteed to have a non-zero undetected error rate - from 0.009% for trivial tasks, to 30% for very difficult tasks - and we need feedback to get us to detect the errors.
That's been a very hard lesson for the commercial aviation and nuclear industries to learn - but we don't have to relearn it from scratch, we can learn from their mistakes.
Posted Aug 28, 2025 14:08 UTC (Thu) by Wol (subscriber, #4433)
And what you DON'T do (cf. Richard Feynman on the Challenger disaster) is leave a couple of jobs in the system for the human to do, just to make them feel involved. Either automate them out of the process entirely, or involve them as much as you can.
I think most recent airline disasters are down to the fact that even very experienced pilots with thousands of hours in their log books have actually spent very little time *at the controls*. On a ten-hour flight, George (the autopilot) probably accrues 9.5 hours, with the humans acquiring half an hour of real experience between them. So when something goes wrong, they have very little knowledge/feel of *how to fly* - which in an emergency is an absolute necessity!
Cheers,
Wol
Posted Aug 26, 2025 17:24 UTC (Tue) by dskoll (subscriber, #1630)
That's what I do on my home network. For machines that need a fixed IP, or at least for which it's convenient to have a fixed IP, I allocate a specific address. For others, I let them grab an address from the dynamic pool.
And sure, you have to manage it.
Posted Sep 22, 2025 19:01 UTC (Mon) by fest3er (guest, #60379)
As to the CRL problem, I think an expired equine is being flogged. I think that end-to-end encryption is nearing the end of—or was abused beyond—its usefulness. The problem is that nearly the entire industry forces security verification to be performed on insecure systems; face it, Windows, Mac, Linux, BSD, Haiku and all other user-installed and -maintained systems are inherently insecure. Does anyone *really* expect a non-expert user to properly maintain the security of his own system?
The correct solution is OE - opportunistic encryption done host-gateway, gateway-host, and host-host. It will be centrally managed (much like centralized IP address management), but managed much closer to home. If you can't trust your own gateway, you might as well unplug all of your systems and return to pencil and paper.
In short, it comes down to trusting a far-off, nameless, faceless 'authority' to do the right thing. I don't know about any of you all, but I would far more readily trust the local expert in my home or in my neighborhood with whom I can speak face-to-face than the large, central, all-controlling national government or a self-publicized charlatan expert half-way around the planet. End-to-end encryption that depends on far-away people who are more concerned about profit than security is a losing proposition. End-to-end encryption should be reserved solely for situations that truly require it, such as banking, voting, and site-to-site VPNs.
Posted Sep 23, 2025 8:07 UTC (Tue) by taladar (subscriber, #68407)
Posted Sep 23, 2025 12:57 UTC (Tue) by Wol (subscriber, #4433)
But certainly as for the rest of it (employee laptops for certain, pretty much all computers that I am aware of), they are considered untrusted end points that need to validate before they are allowed to do anything.
Cheers,
Wol
Posted Aug 26, 2025 17:52 UTC (Tue) by NYKevin (subscriber, #129325)
Assuming the goal is to force a periodic re-discover so that dynamic addresses visibly change, how you schedule it depends on the device class:
For devices used by humans (workstations, phones, etc.): configure the forced re-discover to happen at ~4 AM local time, and very few people will care. In the event that you do need some humans working those hours (e.g. at hospitals), use the next solution instead, or schedule those humans' devices to force a re-discover at a different time (if feasible).
For devices that are serving traffic, or otherwise cannot suffer a total site outage all at once: Spread the DHCPNAKs over several hours during your daily trough (the period when usage is lowest every day), such that you lose no more than one device every few minutes or so. If your error budget does not allow for this, then only flip over a fraction of all devices every day (preferably, split the devices into stable cohorts so that every device is flipped once every few days; see the sketch after this list). If you do not have an error budget, then on paper, your requirements are not compatible with doing this at all (see next item). You should consider whether or not you really intend to provide that level of availability, and for that matter whether you have the necessary resources to do so.
For devices that absolutely, positively must not go down under any circumstances: Obviously, you cannot do this at all, because even the tiniest blip is a catastrophic failure, under this standard of availability. Instead, you should be hardening the network so that it does not go down (e.g. put the router on a UPS, get multiple discontiguous fiber links from separate ASes, etc.). Probably it's also worth the bother of configuring the device with a static address just in case, but that's almost the least important part. You may have noticed that this is extremely expensive. That's the point. "Absolutely must not go down ever" is a very high bar. You are not going to get there by clever configuration of software alone.
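A sketch of the "stable cohorts" idea from the second item above, in Python; the cohort count and MAC address are invented for illustration. The point is to derive the cohort from a stable hash of the device identity, so a device's flip day never changes between runs:

    import hashlib

    NUM_COHORTS = 5  # each device gets flipped once every 5 days

    def cohort(mac: str) -> int:
        # sha256 rather than hash(): stable across runs and machines
        return hashlib.sha256(mac.lower().encode()).digest()[0] % NUM_COHORTS

    def due_today(mac: str, day_number: int) -> bool:
        return cohort(mac) == day_number % NUM_COHORTS

    # On day 12, only devices in cohort 2 get a forced re-discover:
    print(due_today("52:54:00:12:34:56", 12))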
Posted Sep 2, 2025 14:13 UTC (Tue) by vonbrand (subscriber, #4458)
My phone switches seamlessly between WiFi and 5G, even in the middle of e.g. watching YouTube. Haven't tried changing WiFi connection mid-stream, however.
Posted Sep 3, 2025 9:05 UTC (Wed) by farnz (subscriber, #17727)
YouTube is not a good test for seamless switching at the network layer, since it makes many small requests, and copes well with connection drop (by design - YouTube is meant to work even on bad networks).
YouTube (and other streaming media platforms) work by stitching together a lot of small (typically 10 seconds or so) clips at the client side, and YouTube is set up so that if you download the first part of a clip, then change network, it's possible to download the rest of that clip with a range request, instead of a full redownload.
It also adapts to network conditions by using lower bitrate clips if requests are either not completing in full, or taking too long (since 10 seconds at 400 kbit/s is less data to transfer than 10 seconds at 2 Mbit/s).
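A minimal sketch of that kind of resumable fetch, using only the Python standard library; the URL and byte offset are placeholders, not a real YouTube endpoint:

    import urllib.request

    req = urllib.request.Request(
        "https://example.com/clip.mp4",
        headers={"Range": "bytes=65536-"},  # resume from byte 65536
    )
    with urllib.request.urlopen(req) as resp:
        # A range-aware server answers "206 Partial Content"
        print(resp.status, resp.headers.get("Content-Range"))
        data = resp.read()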
To test properly, you'd need a server side under your control, and you'd be watching for connection drops of something long-lived like an SSH connection. And, of course, the best fix is to not depend on long-lived connections to begin with - you're in the best possible place if connections can drop freely without issue, and you always find systems by mDNS or similar, not IP.