
A DNS flag day

By Jake Edge
January 23, 2019

A flag day for DNS is coming on February 1; it may have escaped notice even though it has been planned for nearly a year. Some DNS servers will simply be marked as "dead" by much of the rest of the internet on or after that day, which means that domain owners need to ensure their DNS records will still be available after that point. A longstanding workaround for non-compliant servers will be dropped—mostly for better performance but also in support of DNS extensions, some of which can help alleviate security problems.

The Domain Name System, or DNS, is a foundational service on the internet. It is, of course, what connects domain names, like lwn.net, with their IP addresses. Without DNS records, and a server that will provide those records in response to queries, a domain is effectively "off the net". DNS provides for lookups of various types of information beyond just IP addresses, such as policy information for Sender Policy Framework (SPF) or keys for DomainKeys Identified Mail (DKIM).
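
As a quick illustration, such lookups are easy to reproduce in code. The sketch below uses the third-party dnspython library (an assumption; any resolver API would do) to fetch the TXT records where SPF policies are published:

    # Fetch TXT records, where SPF policies live, using dnspython 2.x.
    # The domain name is only a stand-in for illustration.
    import dns.resolver

    for rdata in dns.resolver.resolve("lwn.net", "TXT"):
        text = b"".join(rdata.strings).decode()
        if text.startswith("v=spf1"):  # SPF policies begin with "v=spf1"
            print("SPF policy:", text)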

The problem that has led to the flag day stems from DNS servers that do not implement the "Extension Mechanisms for DNS", also known as EDNS(0) or just EDNS; it is specified in RFC 6891. EDNS was introduced in 1999 and finalized in 2013. Some servers do not respond properly to queries that ask whether they support various EDNS features. It is important to note that there is no requirement that servers actually support any extensions, just that they reply properly (with a normal DNS response) rather than silently ignoring EDNS queries.
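
What such a compliance probe boils down to can be sketched in a few lines; this is a rough illustration assuming the dnspython library and a placeholder server address, not the official flag-day test:

    # Minimal EDNS(0) compliance probe (a sketch, not the real test): send a
    # query carrying an EDNS OPT record and see whether the server answers at
    # all. A compliant server replies normally even if it supports no
    # extensions.
    import dns.exception
    import dns.message
    import dns.query

    NS_ADDR = "192.0.2.53"  # placeholder authoritative-server address

    query = dns.message.make_query("example.com", "SOA", use_edns=0)
    try:
        response = dns.query.udp(query, NS_ADDR, timeout=5)
        print("server answered; rcode:", response.rcode())
    except dns.exception.Timeout:
        print("no reply to the EDNS query; this server will look dead after flag day")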

The lack of any reply from the non-compliant servers to queries that use EDNS led to a workaround. When a requesting server notices that a query containing EDNS has timed out, it sends the query again without the EDNS content. That timeout ranges from five to ten seconds, depending on the server in question, which results in a major slowdown. DNS resolvers that use the workaround are also potentially subject to targeted denial-of-service exploits if an attacker can force all of their queries into the timed-out state. Beyond that, some firewalls drop DNS queries with EDNS options, which also causes the workaround to be used.
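
The workaround being retired looks roughly like the following sketch (again assuming dnspython and a placeholder address; this is no particular resolver's actual code):

    # Sketch of the retry-without-EDNS workaround that resolvers are dropping.
    # The plain retry only starts after a full timeout has already been paid.
    import dns.exception
    import dns.message
    import dns.query

    def lookup_with_fallback(qname, server, timeout=5.0):
        query = dns.message.make_query(qname, "A", use_edns=0)
        try:
            return dns.query.udp(query, server, timeout=timeout)
        except dns.exception.Timeout:
            # Assume the server (or a firewall in front of it) chokes on
            # EDNS and retry without it; this is the behavior being removed.
            plain = dns.message.make_query(qname, "A")  # no OPT record
            return dns.query.udp(plain, server, timeout=timeout)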

In order to stop that misbehavior, the major players in the DNS world have gotten together to declare the flag day. After February 1, major DNS service providers will stop working around these buggy implementations. In addition, many DNS resolvers (e.g. BIND, Unbound, PowerDNS) will be (or have been) updated to stop using the workaround as well. Once users start updating their servers to those newer versions, DNS servers beyond those run by the big providers will also stop working around the bug. It is an obvious power play to force the operators of buggy, non-compliant DNS servers to either fix their bugs or watch their users switch to something compliant.

One of the features that EDNS enables is DNS cookies, a simple mechanism that can provide some protection against denial-of-service, amplification, and cache-poisoning attacks. EDNS is also needed to implement DNSSEC. The lack of support for EDNS is holding up important defensive mechanisms that could be more widely deployed; the workaround just makes things worse, which is what led to the flag day.
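
For the curious, attaching a cookie is just a matter of adding an EDNS option (option code 10) to a query. A minimal sketch assuming a recent dnspython, where the client cookie is simply eight random bytes:

    # DNS cookies ride in the EDNS OPT record (option code 10), which is why
    # servers that cannot even parse EDNS hold this defense back.
    import os

    import dns.edns
    import dns.message

    client_cookie = os.urandom(8)  # a client cookie is 8 random bytes
    query = dns.message.make_query(
        "example.com", "A", use_edns=0,
        options=[dns.edns.GenericOption(dns.edns.COOKIE, client_cookie)])
    # A cookie-aware server echoes the client cookie back along with its own
    # server cookie, letting both sides reject off-path spoofed traffic.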

Domain owners can check whether their domains will be affected by using a web form at the "DNS flag day" site (https://dnsflagday.net/). Internet Systems Consortium (ISC), the organization behind the popular open-source BIND DNS server, also provides the EDNS Compliance Tester (ednscomp), which can give additional information to those running DNS servers. Anyone with their own domains should consult these tools—and soon.

In some ways, this provides an object lesson in the perils of working around buggy implementations. Those workarounds eventually get cast in stone, which allows oblivious users (and vendors) to keep rolling along without dealing with the shortcomings of the software they provide or run. That can stifle innovation and, ultimately, prevent more secure options from being deployed. Given that the internet threat environment is always changing (and rarely, if ever, for the better), it is important to allow maximum flexibility, especially for important protocols like DNS.




A DNS flag day

Posted Jan 24, 2019 5:41 UTC (Thu) by marcH (subscriber, #57642) [Link] (11 responses)

> In some ways, this provides an object lesson in the perils of working around buggy implementations. Those workarounds eventually get cast in stone, which allows oblivious users (and vendors) to keep rolling along without dealing with the shortcomings of the software they provide or run. That can stifle innovation and, ultimately, prevent more secure options from being deployed.

https://en.wikipedia.org/wiki/Robustness_principle#Criticism
(conservative in what you do, liberal in what you accept)

A DNS flag day

Posted Jan 24, 2019 10:26 UTC (Thu) by peter-b (guest, #66996) [Link] (7 responses)

> https://en.wikipedia.org/wiki/Robustness_principle#Criticism
> (conservative in what you do, liberal in what you accept)

Over time, I've come to be of the opposite view: be absolutely pedantic about what you accept, and occasionally evil in what you send (so that there's a cost to implementations that aren't pedantic about what they accept).

This is the real way to end up with robust, reliable and well-tested software. The "robustness principle" leads to a race to the bottom where everyone adopts a "screw it, it's good enough, so why pay attention to getting the corner cases right?" attitude.

A DNS flag day

Posted Jan 24, 2019 13:06 UTC (Thu) by farnz (subscriber, #17727) [Link] (6 responses)

There's a pattern in slightly counterintuitive results for software engineering here:

  • It's better to build your software to crash as soon as it detects a broken invariant than to try to carry on despite the problem (a small sketch of this style appears after this list).
  • It's better to build on top of multiple low-reliability computers and design the system to cope when one fails unexpectedly than to build atop a highly reliable system and cope when it goes down for maintenance.
  • It's better to be pedantic about what you accept and cope with the resulting failure, than to be liberal and then tolerate bad implementations.

I'm sure there are other cases where the long-term results of failing fast are better than trying to push on past a failure.
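
For the first point, a tiny hypothetical sketch of the crash-early style:

    # Hypothetical illustration: surface a broken invariant immediately
    # instead of limping on with corrupt state.
    def apply_payment(balance_cents: int, amount_cents: int) -> int:
        new_balance = balance_cents - amount_cents
        if new_balance < 0:
            # Fail fast: raising here beats clamping to zero and silently
            # corrupting the books.
            raise AssertionError(f"balance went negative: {new_balance}")
        return new_balance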

A DNS flag day

Posted Jan 26, 2019 19:48 UTC (Sat) by biergaizi (subscriber, #92498) [Link] (5 responses)

The Harmful Consequences of Postel's Maxim
https://tools.ietf.org/html/draft-thomson-postel-was-wron...

2. The Protocol Decay Hypothesis

[...]

An implementation that reacts to variations in the manner advised by
Postel sets up a feedback cycle:

o Over time, implementations progressively add new code to constrain
how data is transmitted, or to permit variations in what is received.

o Errors in implementations, or confusion about semantics can
thereby be masked.

o As a result, errors can become entrenched, forcing other
implementations to be tolerant of those errors.

An entrenched flaw can become a de facto standard. Any
implementation of the protocol is required to replicate the aberrant
behavior, or it is not interoperable. This is both a consequence of
applying Postel's advice, and a product of a natural reluctance to
avoid fatal error conditions. This is colloquially referred to as
being "bug for bug compatible".

3. The Long Term Costs

Once deviations become entrenched, there is little that can be done
to rectify the situation.

For widely used protocols, the massive scale of the Internet makes
large scale interoperability testing infeasible for all but a privileged
few. Without good maintenance, new implementations can be restricted
to niche uses, where the problems arising from interoperability issues
can be more closely managed.

[...]

Protocol maintenance can help by carefully documenting divergence and
recommending limits on what is both acceptable and interoperable.
The time-consuming process of documenting the actual protocol -
rather than the protocol as it was originally conceived - can restore
the ability to create and maintain interoperable implementations.

Such a process was undertaken for HTTP/1.1 [RFC7230]. Though this
effort took more than 6 years, it has been successful in documenting
protocol variations and describing what has over time become a far
more complex protocol.

4. A New Design Principle

The following principle applies not just to the implementation of a
protocol, but to the design and specification of the protocol.

Protocol designs and implementations should be maximally strict.

Though less pithy than Postel's formulation, this principle is based
on the lessons of protocol deployment. The principle is also based
on valuing early feedback, a practice central to modern engineering
discipline.

4.1. Fail Fast and Hard

Protocols need to include error reporting mechanisms that ensure
errors are surfaced in a visible and expedient fashion.

4.2. Implementations Are Ultimately Responsible

Implementers are encouraged to expose errors immediately and
prominently in addition to what a specification mandates.

4.3. Protocol Maintenance is Important

Protocol designers are strongly encouraged to continue to maintain
and evolve protocols beyond their initial inception and definition.
If protocol implementations are less tolerant of variation, protocol
maintenance becomes critical.

A DNS flag day

Posted Jan 26, 2019 23:29 UTC (Sat) by marcH (subscriber, #57642) [Link] (4 responses)

> 4.1. Fail Fast and Hard
> Protocols need to include error reporting mechanisms that ensure
> errors are surfaced in a visible and expedient fashion.

Hi, firewalls!

Is there any big company where the IT department can be made accountable for severe losses of productivity? Curious whether they have any job offers right now.

A DNS flag day

Posted Jan 27, 2019 10:30 UTC (Sun) by mpr22 (subscriber, #60784) [Link]

Define "made accountable".

Ideally, in a way that still makes people want to work in that IT department.

A DNS flag day

Posted Jan 29, 2019 23:51 UTC (Tue) by intgr (subscriber, #39733) [Link] (2 responses)

> Hi, firewalls!

This!

ICMP provides a perfectly good mechanism for reporting forbidden packets back to the sender. But for some odd reason it's considered best practice to blackhole disallowed packets instead.

In more than one case, a missing firewall rule and the blackhole approach together turned a simple mistake into a cascading failure of multiple systems waiting for timeouts.
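
The difference is easy to demonstrate from the client side; a minimal sketch with a placeholder address (TCP for simplicity):

    # Why blackholing hurts: a rejected connection fails instantly (RST or
    # ICMP unreachable), while a silently dropped one eats the full timeout.
    import socket
    import time

    def try_connect(addr, port=53, timeout=5.0):
        start = time.monotonic()
        try:
            socket.create_connection((addr, port), timeout=timeout).close()
            print(f"{addr}: connected")
        except ConnectionRefusedError:
            print(f"{addr}: refused after {time.monotonic() - start:.2f}s")
        except OSError:  # includes the timeout raised for a blackholed packet
            print(f"{addr}: gave up after {time.monotonic() - start:.2f}s")

    # try_connect("192.0.2.1")  # placeholder; outcome depends on the firewall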

A DNS flag day

Posted Feb 5, 2019 15:22 UTC (Tue) by JFlorian (guest, #49650) [Link] (1 responses)

My understanding of this is that it's all about information disclosure. In other words, it's best practice to fail hard with ICMP forbidden on internal-facing connections, but to silently drop on external ones. Is there a strong argument that ICMP forbidden in both directions doesn't really present any additional risk? I certainly get the advantages (I've been caught out by my own firewall rules more times than I can count), but the disadvantages can be of the type you don't know about until you've been burned.

A DNS flag day

Posted Feb 5, 2019 16:45 UTC (Tue) by nybble41 (subscriber, #55106) [Link]

Assuming they already know your assigned IP range, does always responding to incoming connections with ICMP forbidden (including for unknown internal IPs) really leak significantly more information than silently dropping the packets?

djbdns

Posted Jan 24, 2019 14:03 UTC (Thu) by mirabilos (subscriber, #84359) [Link] (1 responses)

I may need some help interpreting the output of the EDNS tester.

I run djbdns, which provides an axfrdns(1) utility that listens on TCP port 53.

The results are:

dns=ok edns=noopt edns1=noerror,noopt,soa edns@512=noopt ednsopt=noopt edns1opt=noerror,noopt,soa do=noopt ednsflags=noopt docookie=noopt edns512tcp=noopt optlist=noopt

Do I need to worry?

djbdns

Posted Jan 24, 2019 14:04 UTC (Thu) by mirabilos (subscriber, #84359) [Link]

Ah: “day must not have timeout result in any of plain DNS and EDNS version 0 tests” so no, I don’t.

A DNS flag day

Posted Jan 25, 2019 6:10 UTC (Fri) by jani (subscriber, #74547) [Link]

The Crapustness Principle: The software design guideline based on the illusion of ability to happily process any crap that you're given as if it was actually meaningful. This is when GIGO and the robustness principle collide.

A DNS flag day

Posted Jan 26, 2019 15:30 UTC (Sat) by naptastic (guest, #60139) [Link] (3 responses)

Hey everybody: Can we do this with IPv6 too? Please? <3

A DNS flag day

Posted Jan 26, 2019 20:29 UTC (Sat) by biergaizi (subscriber, #92498) [Link] (2 responses)

Recommended reading:

The world in which IPv6 was a good design
https://apenwarr.ca/log/20170810

IPv6 was intended to revolutionize the Internet. It was thought that deployment would take only a few years, after which people would be able to eliminate the creeping Layer-2 networking and the ugly protocol-layer violations around it: remove manual IP configuration, simplify the complicated IP headers (to make ASIC-accelerated routers that are competitive with Ethernet bridges), remove MAC addresses, ARP, DHCP, and IP broadcast from the network stack entirely, and shift to a no-bridges, all-router, Layer-3-centric approach, the ultimate software-defined network. IPv6 was also planned to be the foundation of a fully end-to-end encrypted Internet; the original IPsec was developed for IPv6.

However, this great vision never materialized, as layers are always added but never removed. An enormous amount of resources had already been spent on IPv6, though, and the only choice was to keep pushing it. So, unfortunately, the current IPv6 is a degenerate form of the original vision, with only two major features: "more IP addresses" and "restoration of end-to-end connectivity" (don't get me wrong, I'm posting this comment via IPv6).

A sad story in the history of computing. Imagine what is happening in a parallel world.

A DNS flag day

Posted Mar 5, 2019 7:42 UTC (Tue) by immibis (subscriber, #105511) [Link] (1 responses)

Removing Layer-2 networking (and therefore ARP) is just not going to happen - it's too entrenched in existing hardware and protocols.
Five years ago as a university student, I would've agreed with you.
Now, as a software engineer at a networking hardware vendor, I know there's just no way, because things are too tangled already.

Even if not for that, you'll just end up in an XKCD 927 situation. Adding the new option of {IPv6 + let's call it "IPv6L2"} does not make the {IPv6 + Ethernet} option go away. You weren't crazy enough to think I'd throw out my hundreds or thousands or millions of dollars of Ethernet NICs and switches and routers, were you? The *very best* case is that I keep all that equipment and it interoperates nicely with IPv6L2. And at that point, why shouldn't I keep getting Ethernet equipment so I only have one L2 protocol to manage?

It's not that Ethernet and ARP are good, but at the very least, they're not all that harmful and they're entrenched, so good luck getting rid of them now. It's like saying the next version of Windows will run on RISC-V.

And what the heck will you do if I want to run it over Infiniband? And in the future someone will want to unify Ethernet with IPv6L2...

A DNS flag day

Posted Mar 5, 2019 9:28 UTC (Tue) by Cyberax (✭ supporter ✭, #52523) [Link]

The problem is not Ethernet per se. It's the most ubiquitous transfer medium, but it's by no means the only one. There's also MPLS, and even VPNs can be thought of as very complicated L2 networks.

The first issue is that IPv6 has not actually solved the problem of multihoming and roaming. It totally could have, without changing the Ethernet layer. But unfortunately, the IETF did not have the foresight for this. QUIC (and HTTP/3) are attempting to fix this, but it's probably too late.

The second issue is hierarchical addressing. Ethernet addresses are flat; they don't have any structure. IPv6 -could- have been used to allow automatic hierarchical delegation: your router gets addresses from the ISP and delegates a part of the address space to a smart light switch, which then acts as a gateway for a power-line network, giving each smart light bulb a separate IPv6 subspace, and so on.

IPv6 theoretically supports this with DHCP PD, but... there aren't enough bits in IPv6 for it! You will get at most a /48 from your ISP, which only gives you 16 bits to play with. And /56 or even /60 allocations are not at all uncommon. Also, even the DHCP PD standard was only finalized in 2003.
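
The arithmetic is easy to check with Python's standard ipaddress module (2001:db8::/48 is just the documentation prefix):

    # How much room a delegated prefix leaves above the /64 subnet boundary.
    import ipaddress

    for prefix in ("2001:db8::/48", "2001:db8::/56", "2001:db8::/60"):
        net = ipaddress.ip_network(prefix)
        bits = 64 - net.prefixlen  # bits available for further delegation
        print(f"{prefix}: {bits} bits = {2 ** bits} possible /64 subnets")
    # /48 -> 65536 subnets, /56 -> 256, /60 -> 16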

A DNS flag day

Posted Jan 31, 2019 14:52 UTC (Thu) by emmanuelF (guest, #130158) [Link] (1 responses)

Please note that even a compliant DNS server dating from "before EDNS existed" will not be disturbed by an EDNS request and will give a proper/compliant answer.
You are not forced to deploy an EDNS-aware DNS server.
The problem is just non-compliant DNS servers (EDNS-aware or not) and non-compliant/non-transparent middleboxes/firewalls.
For some authoritative servers with a buggy EDNS implementation, the solution is to disable the EDNS support.

The phrase
"The problem that has led to the flag day stems from DNS servers that do not implement the "Extension Mechanisms for DNS", also known as EDNS(0) or just EDNS"
is wrong and misleading.
EDNS is completely backward compatible with plain non-EDNS-aware servers.

A DNS flag day

Posted Jan 31, 2019 15:27 UTC (Thu) by emmanuelF (guest, #130158) [Link]

Sorry, correction :
"the solution is to disable the EDNS support." -> "a (bad) solution is to disable the EDNS support."


Copyright © 2019, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds