Change address on every DHCP renewal

Posted Aug 27, 2025 14:43 UTC (Wed) by farnz (subscriber, #17727)
In reply to: Change address on every DHCP renewal by paulj
Parent article: Linux's missing CRL infrastructure

The problem statement is that the old DHCP server failed. A new one was shipped in, with configuration but not persistent state restored from their VCS; by the time the new DHCP server arrived, all devices had lost their IP address.

When the new DHCP server started up, several systems failed to operate, for three reasons: they had hard-coded IP addresses that were not in config; the time taken meant the clients had lost their IPs (and so were sending DHCPDISCOVER to solicit a new DHCPOFFER instead of using DHCPREQUEST to renew the same lease); and the hardware failure meant that the leases database was lost.

How do you transform this into a state where the users discover within a human-reasonable (24 hours or so) timeframe that they'd hardcoded an address without making sure config matches? The two (broad strokes) solutions above are:

1. Ensure the address changes anyway (by various means) unless it's in config as a MAC→IP mapping, and make appropriate tradeoffs around connections blipping when the address changes.
2. Don't allow devices to get an IP unless it's in config as a MAC→IP mapping, so that hardcoding is safe.
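
To make the two options concrete, here is a minimal Python sketch of the allocation decision. Everything in it is an assumption for illustration: the names (STATIC_MAP, POOL, choose_address), the example addresses, and the idea of deriving the dynamic address from the day number. It is not how any particular DHCP server behaves.

```python
import hashlib
import ipaddress

# Hypothetical config: MAC -> fixed IP, as it would live in version control.
STATIC_MAP = {"02:00:00:00:00:01": ipaddress.ip_address("192.0.2.10")}
# Hypothetical dynamic pool of 50 addresses.
POOL = [ipaddress.ip_address("192.0.2.100") + i for i in range(50)]


def choose_address(mac, day, policy, in_use=frozenset()):
    """Return an address for `mac`, or None if the policy refuses one.

    policy "rotate": unmapped hosts get an address that depends on the day.
    policy "strict": unmapped hosts get nothing.
    """
    mac = mac.lower()
    if mac in STATIC_MAP:
        return STATIC_MAP[mac]  # configured hosts are stable under both policies
    if policy == "strict":
        return None
    # Start from a day-dependent slot, then probe forward for a free address.
    digest = hashlib.sha256(f"{mac}/{day}".encode()).digest()
    start = int.from_bytes(digest[:4], "big") % len(POOL)
    for offset in range(len(POOL)):
        candidate = POOL[(start + offset) % len(POOL)]
        if candidate not in in_use:
            return candidate
    return None  # pool exhausted
```

With a pool this small, the "rotate" policy will occasionally hand out the same address on consecutive days, so it only makes a change likely, not certain.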

If you've got another solution, I'm interested, because this is a hard problem to solve.

Change address on every DHCP renewal

Posted Aug 27, 2025 15:34 UTC (Wed) by Wol (subscriber, #4433) [Link]

Given that the hosts file has to be updated and shipped around, regenerate the DHCPd persistent state by importing the hosts file and say "if the requester says 'my name is X', look up X in the hosts file and return that IP if found".
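
A rough sketch of that lookup, assuming the usual hosts-file layout (IP first, then one or more names). The helper names parse_hosts and address_for are made up for illustration, and a real DHCP server would need a good deal more than this:

```python
def parse_hosts(text):
    """Map each hostname in hosts-file text to its IP address string."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            table.setdefault(name.lower(), ip)  # first entry for a name wins
    return table


def address_for(hostname, table):
    """Return the configured IP for a client-supplied hostname, or None."""
    return table.get(hostname.lower())
```

A client announcing a name that is not in the table gets None back, at which point the server would have to fall back to whatever its dynamic policy is.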

Dunno how it worked :-) but one system I worked on, you put the hostname in /etc/hostname, and added the host/IP to /etc/hosts, and that was how it discovered its own (fixed) IP address.

Cheers,
Wol

Change address on every DHCP renewal

Posted Aug 27, 2025 15:54 UTC (Wed) by paulj (subscriber, #341) [Link] (12 responses)

The scenario of the DHCP server starting up with config, but having lost lease state is the norm in OpenWRT. It's generally running on small APs and consumer routers out of RO flash with a tmpfs for most runtime state. If you reboot an OpenWRT router, clients will either DHCPDISCOVER or DHCPREQUEST, and the OpenWRT router will give them their IP. (I assume dnsmasq does some kind of DAD - seems like it does a ping-check for at least DHCPDISCOVER before OFFER, reading the internet).

If a host has been assigned a static IP in the config, that's what it gets.

Your problem seems to be that a host had an IP from a dynamic pool, and a human and/or software on that host then took that dynamically assigned IP and hard-coded it in places, relying on the DHCP server to maintain persistent state of the dynamic IP assignment and to keep assigning the same (dynamic) IP to the host?

Firstly, the DHCP server /could/ give out the address the client requests, if it is not in use and the client still has the address - like dnsmasq does on OpenWRT.

If the client has removed its IP because of whatever timeout, and doesn't know to request it because that state is gone, and the server doesn't have that state either, well... if this kind of thing is a critical issue, it seems like the correct fix is to just configure a static IP in the DHCP server config, if your hosts are relying on static IPs?
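
The order of preference described above can be sketched roughly as follows. This is only an illustration with made-up names (offer_address, static_map, pool, in_use); it is not dnsmasq's actual logic.

```python
def offer_address(mac, requested, static_map, pool, in_use):
    """Pick an address to offer: static mapping, then requested, then any free one."""
    if mac in static_map:
        return static_map[mac]  # configured hosts always get their address
    if requested is not None and requested in pool and requested not in in_use:
        return requested  # the client still remembers its old address
    for candidate in pool:
        if candidate not in in_use:
            return candidate  # state lost on both sides: the choice is arbitrary
    return None  # nothing left to offer
```

The last branch is the one that bites: once neither side remembers the old address, only the static mapping can bring it back.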

Change address on every DHCP renewal

Posted Aug 27, 2025 16:20 UTC (Wed) by farnz (subscriber, #17727) [Link] (11 responses)

While you've got the problem correct, you're missing one key point: the clients appear to have static IPs from the perspective of the users of the system; the IPs are technically dynamic, but can stay unchanged for years.

The correct fix is to put a static MAC→IP mapping in config for anything that's hard coded. However, that leads to a human problem: users hard-code the IP as a "temporary quick fix", and thus never ask the DHCP server admin to put a static MAC→IP mapping in place (because this is temporary and will be removed soon). Because the "dynamic" address never changes, this "temporary quick fix" can become semi-permanent: nobody gets round to doing the "proper" fix, or to asking for a static address to be configured in DHCP.

So, how do you ensure that the "temporary quick fix" cannot work for more than a day or so, ensuring that the person who did a "temporary quick fix" still remembers what they did when it next breaks?

Change address on every DHCP renewal

Posted Aug 27, 2025 17:31 UTC (Wed) by paulj (subscriber, #341) [Link] (10 responses)

Hmm, your problem lies in the domain of humans, more than the machine. The machine has a way for you to express what you want. ;)

IPv6 may help fix this issue. It has a local, on-link address space large enough to let clients use their own address suffix with (next to) no fear of conflict (and of having to pick something else). There are at least 2 ways an IPv6 host can give itself a stable address without having to rely on the server: a) using a MAC-address-derived suffix; b) using a "stable privacy" address.
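
For illustration, both kinds of suffix can be sketched in a few lines of Python. The first function follows the well-known modified EUI-64 construction. The second is a deliberately simplified stand-in for the hash-based approach: the choice of SHA-256, the inputs, and the function names are assumptions, and real implementations mix in more (for example a counter used when a conflict is detected).

```python
import hashlib
import ipaddress


def eui64_address(prefix, mac):
    """Address whose 64-bit suffix is the modified EUI-64 form of the MAC."""
    octets = bytearray(int(part, 16) for part in mac.split(":"))
    octets[0] ^= 0x02  # flip the universal/local bit
    iid = bytes(octets[:3]) + b"\xff\xfe" + bytes(octets[3:])
    net = ipaddress.IPv6Network(prefix)
    return net.network_address + int.from_bytes(iid, "big")


def stable_opaque_address(prefix, ifname, secret):
    """Address whose suffix is a hash of the prefix, interface and a host secret.

    Simplified sketch: stable on one network, different on another.
    """
    net = ipaddress.IPv6Network(prefix)
    material = f"{net}|{ifname}|".encode() + secret
    iid = hashlib.sha256(material).digest()[:8]  # keep 64 bits
    return net.network_address + int.from_bytes(iid, "big")
```

Either way the host can compute its own address again after losing all state, which is exactly what the IPv4 clients in this scenario could not do.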

Change address on every DHCP renewal

Posted Aug 27, 2025 17:35 UTC (Wed) by farnz (subscriber, #17727) [Link] (9 responses)

Absolutely - the root cause of the problem is humans assuming that because they've never seen something change, it will never change. The technical question is "how do you make sure that, without unacceptable disruption, IP addresses change unless configured to stay static?", so that humans see them change and know that this is normal.

And IPv6 makes the solution space much easier to work in; you're expected to have multiple addresses per host (indeed, DHCPv6 explicitly supports this when it's assigning addresses, not SLAAC or manual config), and you've got much more address space to work in, so that you can have self-organising stable addresses as you describe.

Change address on every DHCP renewal

Posted Aug 28, 2025 12:19 UTC (Thu) by paulj (subscriber, #341) [Link] (8 responses)

I don't see a technical problem, I see a human problem: "How do we get humans to formally record an expectation with regard to the assignation of a limited resource, so the wider system can grant it?". It's a training and process problem.

You either need (near) infinite addresses, or you need to train your users to follow a process.

Change address on every DHCP renewal

Posted Aug 28, 2025 13:23 UTC (Thu) by farnz (subscriber, #17727) [Link] (7 responses)

The technical problem is "human did not get actionable feedback when they failed to follow the process".

There are two things coming into play here:

1. Humans will take shortcuts without considering the system as a whole, and unless you have a feedback mechanism to catch this, it will be blamed on the system, not the humans who made a mistake - not least because the person who did the "quick, temporary" fix might well have left the company, so there's no way to retrain them, but the system is still here, and broken.
2. No matter how good your training and processes are, there will be a non-zero human error rate. Per a table from a journal article in the nuclear industry, you can expect a human error probability on a task between 0.009% for trivial tasks, going up as far as 30% on hard tasks.

That second point means that you need the technical process to back the human process - if the human fouls up (and they will, at some point), you need the technical process to make sure that the foul-up becomes clear quickly.
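
As a back-of-the-envelope illustration of why a small per-task error rate still matters, here is the arithmetic under the (strong, assumed) simplification that errors on separate tasks are independent. The function name and the task counts are made up; only the 0.009% and 30% figures come from the table mentioned above.

```python
def p_at_least_one_error(p, n):
    """Chance of one or more errors in n independent tasks, each failing with probability p."""
    return 1 - (1 - p) ** n


# 0.009% per trivial task, repeated 1000 times: roughly an 8.6% chance of a slip.
trivial = p_at_least_one_error(0.00009, 1000)
# 30% per hard task, repeated 5 times: roughly an 83% chance of a slip.
hard = p_at_least_one_error(0.30, 5)
```

So even at the best-case rate, a team doing enough routine work should expect some mistakes to get through.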

It's similar to kernel development in that regard; if I tell Linus "hey, latest rc doesn't work, rc1 did, this is the symptoms", I'm quite likely to get a diagnosis of the problem quickly, and a fix. If I say "hey, 6.7 doesn't work, latest rc doesn't work, 3.2 did, this is the symptoms", I'm not going to get a huge amount of sympathy.

Change address on every DHCP renewal

Posted Aug 28, 2025 13:40 UTC (Thu) by paulj (subscriber, #341) [Link] (5 responses)

There is no way for the system to know the human fouled up. How is a host or a DHCP server, or some combination of these and/or other distributed system supposed to - in a generalised way - know that a human has recorded the dynamically assigned IP address of a host somewhere, with the expectation that it will be stable, under the constraint that the address is assigned from a small, finite pool (i.e., the IPv4 case)?

I can't think of any way to solve that problem.

The only thing I can see is that you attack that limiting constraint, which can - for many purposes (and almost definitely if we're talking "a human could write it down somewhere", given how that rate limits things) - be achieved by going to an IPv6 network and using some deterministic stateless address assignment.

Change address on every DHCP renewal

Posted Aug 28, 2025 13:50 UTC (Thu) by farnz (subscriber, #17727) [Link] (4 responses)

There is a way to introduce feedback, however; if the host changes address frequently, then instead of the pattern being "quick temporary fix done, new fire comes up so it's never redone properly, system fails 2 years later, blame the system", it's "quick temporary fix done, new fire comes up so it's never redone properly, system fails tomorrow, remember the quick temporary fix and make it permanent".

The technical problem is that you're not getting feedback that you've fouled up in a reasonable timescale after making the mistake; the system cannot know that you've fouled up, but it can be reworked so that either it's impossible to foul up (address space large enough that all addresses are deterministic, thus static), or so that the system breaks reasonably soon after you fouled up, not years later (addresses either static, or forcibly changed every 24 hours).
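
The "static, or forcibly changed every 24 hours" rule could be expressed as a renewal check along these lines. This is a sketch with assumed names (MAX_HOLD, may_renew), not a feature of any specific DHCP server; a server refusing a renewal would typically answer the DHCPREQUEST with a DHCPNAK, pushing the client back to DHCPDISCOVER.

```python
from datetime import datetime, timedelta

MAX_HOLD = timedelta(hours=24)  # assumed policy limit for dynamic addresses


def may_renew(first_assigned, now, is_static):
    """Static mappings always renew; dynamic ones only within MAX_HOLD."""
    if is_static:
        return True
    return now - first_assigned < MAX_HOLD
```

The cost is the connection blip when a renewal is refused, which is the tradeoff being argued about here.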

Change address on every DHCP renewal

Posted Aug 28, 2025 15:29 UTC (Thu) by paulj (subscriber, #341) [Link] (3 responses)

In IPv4 there isn't a good, generally agreed way to rotate addresses - without breaking ongoing connections.

As others have pointed out, in IPv6 there is standardised support, and it would be possible. Don't know if there are common DHCPv6 servers that do it - should be possible though. There are ISP systems that deliberately change IP addresses at regular intervals, to stop residential connections from benefiting from static addresses.
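
One mechanism that makes rotation workable in IPv6 is that each address carries a preferred lifetime and a valid lifetime: once the preferred lifetime has passed, the address is deprecated (not chosen for new connections, but still usable by existing ones) until the valid lifetime runs out. A minimal sketch of that state machine, with an assumed function name and seconds as the unit:

```python
def address_state(age_s, preferred_lifetime_s, valid_lifetime_s):
    """Classify an IPv6 address by its age relative to its two lifetimes."""
    if age_s < preferred_lifetime_s:
        return "preferred"   # used for new and existing connections
    if age_s < valid_lifetime_s:
        return "deprecated"  # existing connections only
    return "invalid"         # no longer usable
```

Handing out a new address while the old one is still in its deprecated window is what lets long-lived connections drain instead of breaking.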

Change address on every DHCP renewal

Posted Aug 28, 2025 16:50 UTC (Thu) by farnz (subscriber, #17727) [Link] (2 responses)

I'm one of the people who's pointed out that IPv6 has standardised support for rotating addresses :-)

And yes, there's no good way to do this in IPv4; which means that you've got a choice between doing it badly (thus ensuring that foul ups break near to the time of the foul up) or not doing it at all (and dealing with the fallout when there's a cascade failure that runs into the foul up).

But there's a general tendency for humans to assume that everything that works is working as intended, and that no foul ups have happened - after all, if a foul up had happened, things would stop working. Any time someone says "I've never had a problem doing this before", you're hearing "this used to work, therefore it must be the right thing to do, and not something that happened to work coincidentally".

This leads to a soft-skills requirement on technology - things that work should either be the right thing, or should break quickly so that the person setting it up doesn't go "this has been working for months - you must have done something". Otherwise, the technology gets the blame for the human problems.

Change address on every DHCP renewal

Posted Aug 29, 2025 9:28 UTC (Fri) by paulj (subscriber, #341) [Link] (1 responses)

At a certain point, you just have to teach humans how to work with the system as it is - while waiting for the AGI that will anticipate your every need, before you even know them.

Change address on every DHCP renewal

Posted Aug 29, 2025 10:06 UTC (Fri) by farnz (subscriber, #17727) [Link]

Sure, and we know that the best way to ensure that humans actually learn that lesson is to not let things appear to work for a long period before breaking due to a human error. Humans need feedback fairly close in time to the mistake in order to learn from their errors. Humans are also guaranteed to have a non-zero undetected error rate - from 0.009% for trivial tasks, to 30% for very difficult tasks - and we need feedback to get us to detect the errors.

That's been a very hard lesson for the commercial aviation and nuclear industries to learn - but we don't have to relearn it from scratch, we can learn from their mistakes.

Change address on every DHCP renewal

Posted Aug 28, 2025 14:08 UTC (Thu) by Wol (subscriber, #4433) [Link]

> That second point means that you need the technical process to back the human process - if the human fouls up (and they will, at some point), you need the technical process to make sure that the foul-up becomes clear quickly.

And what you DON'T do (cit. Dick Feynman, Challenger disaster) is leave a couple of jobs in the system for the human to do, to make them feel involved. Either automate them out of the process entirely, or involve them as much as you can.

I think most recent airline disasters are down to the fact that even very experienced pilots with thousands of hours in their log books, have actually spent very little time *at the controls*. On a ten-hour flight, George probably accrues 9.5 hours, with the humans acquiring half an hour real experience between them. So when something goes wrong, they actually have very little knowledge/feel of *how to fly*. Which in an emergency is an absolute necessity!

Cheers,
Wol