
Potential pitfalls in DNS handling

By Jake Edge
November 14, 2012

The domain name system (DNS) seems relatively straightforward, at least from a high level, but there are some darker corners of the protocol that could easily trip up the unwary, or even the wary. A recent vulnerability in the Exim mail transfer agent shows one such example, but there are more. In fact, Exim developer Phil Pennock, who patched the recent vulnerability, has collected a number of the places where DNS parsing can go awry.

The Exim hole was a fairly standard buffer overflow, but it came about because of the way DNS messages are structured. When a program requests a TXT record (for, say, a DomainKeys Identified Mail (DKIM) public key), the reply is broken up into multiple DNS "strings". The TXT record itself can be up to 64K in size, with an overall length specified in the "resource record" (RR) header, but it is broken up into multiple strings, each of which is prefaced with a length.

Each string is a one-octet length value, followed by that many octets of data. To construct the full TXT record, one collects each of the string payloads into a buffer, which is where Exim went astray. For DKIM verification, a 4K buffer was allocated for the TXT record. Each string was length checked, so that it couldn't overrun the buffer, but the loop did not terminate once the buffer was exhausted. An attacker-controlled DNS server (or a benign server that just had a TXT record larger than 4K) could send a large record and either crash Exim or execute arbitrary code.
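The string layout described above can be sketched in a few lines of Python. This is only an illustration of the wire layout, not Exim's C code; the function name `split_txt_strings` is made up for the example, and it assumes `rdata` holds just the RDATA portion of a single TXT record.

```python
def split_txt_strings(rdata: bytes) -> list[bytes]:
    """Split TXT RDATA into its character-strings (length octet + payload)."""
    strings = []
    pos = 0
    while pos < len(rdata):
        length = rdata[pos]              # one-octet length, 0..255
        pos += 1
        if pos + length > len(rdata):    # payload would run past the RDATA
            raise ValueError("truncated character-string in TXT RDATA")
        strings.append(rdata[pos:pos + length])
        pos += length
    return strings
```

Because each length is a single octet, a record larger than 255 octets always arrives as several strings, which is why a DKIM key is typically split.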

The fix is simple and makes two changes: check for buffer exhaustion before looking at the next string, and increase the size of the buffer to 64K. Either of those would be sufficient to fix the problem; doing both just provides a more robust fix. It's not clear why the original 4K buffer size was chosen, but Pennock speculated that it seemed a reasonable limit to the original developer given that there was a test for overflow (though it turned out to be incorrect).
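The first half of that fix, stopping the loop once the buffer is full, might look roughly like the following. It is a Python rendering of the idea rather than the actual Exim patch: Python has no fixed-size buffers, so `bufsize` stands in for the allocation, and the choice to truncate silently (rather than reject the record) is an assumption made for the sketch.

```python
def copy_txt_bounded(rdata: bytes, bufsize: int) -> bytes:
    """Concatenate TXT character-strings, never keeping more than bufsize octets."""
    buf = bytearray()
    pos = 0
    # The second condition is the important one: stop as soon as the
    # buffer is exhausted instead of walking on to the next string.
    while pos < len(rdata) and len(buf) < bufsize:
        length = rdata[pos]                  # one-octet length prefix
        pos += 1
        chunk = rdata[pos:pos + length]      # a short final chunk is tolerated here
        room = bufsize - len(buf)
        buf += chunk[:room]                  # copy only what still fits
        pos += length
    return bytes(buf)
```

With a 64K `bufsize` the truncation branch can never trigger for a well-formed record, which matches the observation that either change alone closes the hole.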

The problem was found in an Exim DKIM code inspection that was done after a US-CERT advisory and a Wired article raised DKIM issues. While the specific problems reported were not present in Exim, Pennock was concerned that increased attention would be focused on that code, thus the code review.

There are other implications to consider with the strings that make up a TXT record. At first blush, joining the strings directly (rather than with a space or newline character) makes sense, but there are protocols that depend on the strings within a TXT record being treated as separate entities. DKIM and Sender Policy Framework (SPF) both explicitly say that the strings should be joined directly, but forcing that behavior for all TXT records retrieved by Exim broke some ad hoc uses.

Likewise, there is a question of how to handle multiple TXT records. Those records will be returned in random order, so two DKIM key TXT records (i.e. prefaced with "v=DKIM1;") could be returned in a query. If applications don't check for that possibility, or handle it differently than the DNS administrator creating the TXT records expected, problems could result. Once again, DKIM and SPF explicitly disallow multiple TXT records for their information, so compliant programs need to check. Other protocols may not be as clear.
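A check along those lines could be sketched as follows. The input shape (one list of strings per TXT record) and the name `select_dkim_record` are assumptions for the example, and treating only records that start with "v=DKIM1" as candidates is a simplification of what a real DKIM verifier does.

```python
def select_dkim_record(txt_records: list[list[bytes]]) -> bytes:
    """Return the single DKIM key record, or raise if there is not exactly one."""
    # DKIM and SPF join the strings of one record directly, with no separator.
    joined = [b"".join(strings) for strings in txt_records]
    candidates = [rec for rec in joined if rec.startswith(b"v=DKIM1")]
    if len(candidates) != 1:
        raise ValueError(f"expected exactly one DKIM record, found {len(candidates)}")
    return candidates[0]
```

Joining with a space instead would corrupt a base64 key that happens to be split across two strings, while refusing to guess between two candidates avoids depending on the order in which the records arrived.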

Beyond that, DNS has some surprises in the kinds of names it allows. Many believe that domain and host names are restricted to certain subsets of characters, but that is not true. As RFC 2181 specifies, the limits are purely length-based (63 octets per component, 255 octets for a domain name). Each octet of the name can contain any value from 0 to 255. Looking at the host names returned by the following command is rather interesting:

    $ host -lva test.globnix.net nlns.globnix.net
    ...
    foo\\.bar.test.globnix.net. 600 IN      A       192.0.2.8
    ...
    cr\013\010lf.test.globnix.net. 600 IN   AAAA    2a02:898:31:dead:beef::32
    ...
    i-want-nul.test.globnix.net. 600 IN     CNAME   nul\000gap.test.globnix.net.
    ...

That domain is one that Pennock has had for years, and the entries are meant to be somewhat eye-opening. For example, note that '.' is legal in the components of a host name (represented textually as foo\.bar...). Brackets ([, ]), colons, NULs (\000), newlines, backslashes, and so on are all legal as well. Any of those could pose a problem for a program that didn't expect to receive them. One of the ways that might happen is with a reverse lookup, where an IP address to host name mapping is sought.
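The textual forms in that listing follow the master-file escaping convention: a backslash before a literal character, or a backslash followed by three decimal digits for an arbitrary octet. A rough sketch of producing that form from raw labels is below; the function names are invented for the example, and it escapes only dots and backslashes by name, whereas real tools such as BIND's also escape a few other punctuation characters.

```python
def escape_label(label: bytes) -> str:
    """Render one raw DNS label in backslash-escaped text form."""
    out = []
    for octet in label:
        ch = chr(octet)
        if ch in ".\\":
            out.append("\\" + ch)            # literal dot or backslash
        elif 0x21 <= octet <= 0x7E:
            out.append(ch)                   # printable, non-space ASCII
        else:
            out.append(f"\\{octet:03d}")     # \DDD, decimal, always 3 digits
    return "".join(out)


def format_name(labels: list[bytes]) -> str:
    """Join escaped labels with dots and add the trailing root dot."""
    return ".".join(escape_label(label) for label in labels) + "."
```

Note that the digits are decimal, not octal, so the `\013\010` above is a carriage return followed by a line feed.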

For actual domain names, it may be difficult or impossible to register any with "weird" characters, but they are definitely legal as far as DNS is concerned. The registrars will shy away from those kinds of domains because they aren't legal in email addresses or URLs. But, as Pennock's examples show, domains with their own DNS can create all sorts of problematic host names.

These dark corners are hopefully well-known to DNS server and library developers, but they aren't necessarily obvious to those outside of those specialties. One can well imagine that there are bugs lurking in applications and tools that use DNS at a medium or low level. Some of those could easily result in security vulnerabilities.

[I would like to thank Phil Pennock for sharing his research and answering questions about DNS handling.]



Potential pitfalls in DNS handling

Posted Nov 15, 2012 11:02 UTC (Thu) by epa (subscriber, #39769) [Link]

Yikes! Any byte value is allowed even . and \0? That's quite a surprise. It also means that the common foo.bar.com DNS notation is inadequate; you'd have to carefully define an escaping scheme so that any hostname can be written unambiguously.

What happens when one of these odd hostnames needs to be encoded in a URI? It is not enough to say 'just %-encode it' because that does not address the issue of '.' contained in a component.

Potential pitfalls in DNS handling

Posted Nov 15, 2012 16:02 UTC (Thu) by NAR (subscriber, #1313) [Link]

Well, it looks like the "Preferred name syntax" chapter is indeed only about the preferred names. Interestingly, the inet:getaddr/2 function in Erlang accepts only characters between $21 and $7e, so no spaces or international domain names.

Potential pitfalls in DNS handling

Posted Nov 15, 2012 18:14 UTC (Thu) by paulj (subscriber, #341) [Link]

Anything is allowed in DNS labels? Was that always the case, because RFC1035 says this in section 2.3.1 on "Preferred name syntax" - on names in DNS generally:

"Note that while upper and lower case letters are allowed in domain names, no significance is attached to the case. That is, two names with the same spelling but different case are to be treated as if identical.

The labels must follow the rules for ARPANET host names. They must start with a letter, end with a letter or digit, and have as interior characters only letters, digits, and hyphen. There are also some restrictions on the length. Labels must be 63 characters or less."

Potential pitfalls in DNS handling

Posted Nov 15, 2012 18:18 UTC (Thu) by paulj (subscriber, #341) [Link]

Ah, perhaps it's RFC2673 allowing this: https://tools.ietf.org/html/rfc2673

Potential pitfalls in DNS handling

Posted Nov 15, 2012 20:10 UTC (Thu) by bjencks (subscriber, #80303) [Link]

Further up 2.3.1 it says:

"For example, when naming a mail domain, the user should satisfy both the rules of this memo and those in RFC-822. When creating a new host name, the old rules for HOSTS.TXT should be followed. This avoids problems when old software is converted to use domain names.

The following syntax will result in fewer problems with many applications that use domain names (e.g., mail, TELNET)."

and proceeds to describe the rules you quoted, indicating that those rules are guidelines for maximum interoperability, not MUST specifications of the protocol. Section 3.1 reinforces this:
"Although labels can contain any 8 bit values in octets that make up a label, it is strongly recommended that labels follow the preferred syntax described elsewhere in this memo, which is compatible with existing host naming conventions."

Potential pitfalls in DNS handling

Posted Nov 16, 2012 10:24 UTC (Fri) by paulj (subscriber, #341) [Link]

Interesting. Which means RFC1035 is surely inconsistent on this, given 2.3.1 says "the labels must" - a "must" that isn't actually a "must". But as someone else points out, RFC2181 §11 clearly states binary is allowed.

Potential pitfalls in DNS handling

Posted Nov 16, 2012 19:11 UTC (Fri) by hawk (subscriber, #3195) [Link]

I think the point there is that section 2.3.1 of RFC1035 (http://tools.ietf.org/html/rfc1035#section-2.3.1) is not describing the capabilities of the actual DNS protocol but rather what names should be used to achieve compatibility with existing systems.

This article is really about what kind of data you can get back in a (still correctly formatted) DNS response. It's important to note that even though the DNS protocol can carry anything, there may still be application-specific naming rules that prevent the full-on "any byte is valid" in a specific context.

(The article does have an unfortunate mixup (that's my take on it, anyway) where hostname name rules and DNS protocol name rules seem to be considered the same thing. See my comment regarding this: http://lwn.net/Articles/525471/)

Potential pitfalls in DNS handling

Posted Nov 22, 2012 6:40 UTC (Thu) by magfr (guest, #16052) [Link]

The problem with application-specific rules is that a cracker could choose not to adhere to them, so the problem is still there and the application has to be prepared for everything that the protocol can transport.

Note that everything the protocol can transport might be a superset of what the protocol allows.

Potential pitfalls in DNS handling

Posted Nov 16, 2012 17:57 UTC (Fri) by hawk (subscriber, #3195) [Link]

I guess the point is that "the common foo.bar.com DNS notation" is not really "DNS notation" at all but a common domain name notation used in the operating system/application layers regardless of resolution mechanism.

In fact, it's clearly a different notation from the one that the DNS protocol uses, so there being a difference in capability there is not hugely surprising.

For the actual DNS protocol it's not actually that shocking; there, dots have no special meaning and the same applies to any other byte value.
Instead of dot-separation, the DNS protocol prefixes each label with an integer specifying its length.

However, there being a difference between the capabilities of how domain names are handled by the OS/application and by DNS and possibly other resolution mechanisms in use clearly creates an opportunity for confusion/inconsistency/disaster/... when mapping between the native format of each resolution mechanism (the 8bit clean "DNS wire notation" in the case of DNS) and the "string representation with dot-separation".

I suppose the idea is that the resolver library in the operating system ought to take care of this mapping (escaping or discarding or whatever should be done for names that can not be represented in the dot-separated string notation) once and for all and let the applications just do their thing.
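To make the length-prefixed wire notation concrete, here is a small decoder sketch. It assumes an uncompressed name sitting at the start of `data`; the function name is invented, and compression pointers (length octets with the top two bits set) are simply rejected along with over-long labels.

```python
def parse_wire_name(data: bytes) -> list[bytes]:
    """Decode an uncompressed wire-format domain name into raw labels."""
    labels = []
    pos = 0
    while True:
        if pos >= len(data):
            raise ValueError("name is not terminated by a zero-length label")
        length = data[pos]
        pos += 1
        if length == 0:                      # the root label ends the name
            break
        if length > 63:                      # too long, or a compression pointer
            raise ValueError("label length above 63 octets is not handled here")
        if pos + length > len(data):
            raise ValueError("label runs past the end of the data")
        labels.append(data[pos:pos + length])    # any octet values are allowed
        pos += length
    if pos > 255:                            # total encoded size, root included
        raise ValueError("encoded name is longer than 255 octets")
    return labels
```

Nothing in the loop looks at the label contents, which is the point: a dot inside a label is just another octet until someone converts the name to the dotted string form.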

Potential pitfalls in DNS handling

Posted Nov 16, 2012 23:54 UTC (Fri) by Comet (subscriber, #11646) [Link]

The bind library does.

That doesn't stop people writing code like:

labels = result.split('.')
shorthost = labels[0]

If you split on '.', then you break when presented with the escaped form '\.' as you'll do an extra split. You might get away with it when only looking for the TLD, or just sorting data and rejoining the strings.

So, there's an escaping mechanism, it helps a lot of the time, but other times it produces surprising results and you need to at least know that the escaping mechanism is in use.
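A splitter that is aware of the escaping might look something like this sketch. It assumes the input is the ASCII, backslash-escaped form (a backslash plus one character, or a backslash plus three decimal digits); the function name is made up for the example.

```python
def split_escaped_name(name: str) -> list[bytes]:
    """Split an escaped, dotted domain name into raw labels."""
    raw = name.encode("ascii")           # the escaped form is assumed to be ASCII
    labels = []
    current = bytearray()
    i = 0
    while i < len(raw):
        c = raw[i]
        if c == 0x5C:                    # backslash starts an escape
            digits = raw[i + 1:i + 4]
            if len(digits) == 3 and digits.isdigit():
                value = int(digits)      # \DDD is decimal
                if value > 255:
                    raise ValueError("escape value out of range")
                current.append(value)
                i += 4
            elif i + 1 < len(raw):
                current.append(raw[i + 1])   # e.g. \. is a literal dot
                i += 2
            else:
                raise ValueError("dangling backslash at end of name")
        elif c == 0x2E:                  # an unescaped dot separates labels
            labels.append(bytes(current))
            current = bytearray()
            i += 1
        else:
            current.append(c)
            i += 1
    if current:                          # name without a trailing root dot
        labels.append(bytes(current))
    return labels
```

For the name `foo\.bar.test.globnix.net.` this yields a first label of `foo.bar`, where the plain `split('.')` above would return `foo\` instead.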

Potential pitfalls in DNS handling

Posted Nov 15, 2012 14:44 UTC (Thu) by rvfh (subscriber, #31018) [Link]

It is worth noting that file names in UNIX can also have any characters except NUL and /. I was unpleasantly surprised to find out that some users can be very creative with file naming, and misbehaving programs even more so, before I realised this.

Potential pitfalls in DNS handling

Posted Nov 15, 2012 21:29 UTC (Thu) by wahern (subscriber, #37304) [Link]

I once found a hacker on a large university system who kept a setuid binary in / for his backdoor, except the name had escaping sequences which kept it hidden from the typical shell listing. I found it when poking around, like inquisitive users do. I had compiled zshell, which unlike the default, proprietary shell on the system let me see and manipulate those names. I was giddy when I executed it and was dropped into a root shell.

I notified the sysadmin, who was incredulous at first. Later I found out that the hacker had penetrated many more systems, including many Bell Atlantic servers. Never did find out how he broke in, though in those days there was lots of low hanging fruit to exploit.

Potential pitfalls in DNS handling

Posted Nov 16, 2012 12:25 UTC (Fri) by cate (subscriber, #1359) [Link]

Not really. You can, in principle, have "/" in a filename: "/" has a value defined in localedef, thus you can define a private locale, create a file containing the ASCII value of "/", and return to a normal locale.

It is allowed by POSIX (but without specifying how a POSIX program should behave when it encounters such a file), but I never tested it.

'/' in filename? Really?

Posted Nov 17, 2012 2:35 UTC (Sat) by pr1268 (subscriber, #24648) [Link]

Really...? Any online examples of this?

I'm sincerely curious as to how I could overcome the inability to create a directory named "AC/DC" in my music files directory (where each subdirectory is named after the artist/band whose song files are stored within).

Back to the article, I feel somewhat re-assured that the various DNS library implementations would appear to fail given strange input that the RFCs seem to allow. And besides, those are relatively low-numbered RFCs; surely they've been around a while to shake out the bugs. </slightly ignorant observation>

Thanks to Phil Pennock and the Exim developers for looking into this.

'/' in filename? Really?

Posted Nov 17, 2012 11:47 UTC (Sat) by hummassa (subscriber, #307) [Link]

Use the codepoint 0x2215 and be happy...

Potential pitfalls in DNS handling

Posted Nov 16, 2012 18:34 UTC (Fri) by quotemstr (subscriber, #45331) [Link]

> file names in UNIX can also have any characters

And I maintain that's a bug. Kernels should be doing:

* UTF-8 normalization
* Leading and trailing space elimination
* Banning leading dashes
* Banning non-printable unicode characters

There's absolutely no reason for treating filenames as opaque strings, except that by doing so, you avoid having arguments about encodings. Now that UTF-8 has won, we should revisit that decision.

Potential pitfalls in DNS handling

Posted Nov 17, 2012 1:39 UTC (Sat) by anselm (subscriber, #2796) [Link]

UTF-8 normalisation probably makes sense, but disallowing leading dashes in filenames would disable potentially desirable features like being able to create a file called »-i« in a directory where you don't want to accidentally have »rm *« delete all your files.

Potential pitfalls in DNS handling

Posted Nov 17, 2012 2:13 UTC (Sat) by apoelstra (subscriber, #75205) [Link]

> UTF-8 normalisation probably makes sense, but disallowing leading dashes in filenames would disable potentially desirable features like being able to create a file called »-i« in a directory where you don't want to accidentally have »rm *« delete all your files.

Nor would it allow creating "-r" in directories where you want rm to be extra destructive. :)

Potential pitfalls in DNS handling

Posted Nov 17, 2012 15:13 UTC (Sat) by Jandar (subscriber, #85683) [Link]

It is not the kernel's job to judge the userspace character encodings or other aspects of filenames. If we follow this path, would more than one dot be legal? Why ban only leading and trailing spaces? And think of the children, disallow NSFW words in filenames or file-content ;-).

Potential pitfalls in DNS handling

Posted Nov 19, 2012 10:51 UTC (Mon) by cesarb (subscriber, #6266) [Link]

Potential pitfalls in DNS handling

Posted Nov 15, 2012 17:05 UTC (Thu) by RobSeace (subscriber, #4435) [Link]

> Many believe that domain and host names are restricted to certain subsets of characters, but that is not true. As RFC 2181 specifies, the limits are purely length-based (63 octets per component, 255 octets for a domain name). Each octet of the name can contain any value from 0 to 255.

Talking purely of DNS restrictions, sure... But, people get their beliefs about actual hostname restrictions from RFC 1123 and RFC 952 before it, which limit host and domain names to "characters drawn from the alphabet (A-Z), digits (0-9), minus sign (-), and period (.). Note that periods are only allowed when they serve to delimit components of 'domain style names'."

Potential pitfalls in DNS handling

Posted Nov 15, 2012 22:30 UTC (Thu) by Comet (subscriber, #11646) [Link]

Answering several points at once:
* RFC2181 clarifies what the underlying protocol supports. It didn't change anything.
* DNS has always been this open; it's up to the application layer to impose restrictions. If you don't put in the restrictions yourself, you have no restrictions, so don't forget them. Then remember that common usage doesn't always match specifications -- eg, URLs often do end up with underscores in them.
* There isn't actually a generic specification for hostnames that I've ever found. There are old rules which applied to HOSTS.TXT; there are rules for mail domains; there are rules for hostnames as encountered in URLs. That's about it.
* Escaping: with modern Bind, you might need to export IDN_DISABLE=t into the environment before running dig(1) and friends, to avoid the IDN errors that might get thrown for parse issues. With that in mind, there is actually a quoting mechanism in common use, derived from RFC1035 section 5.1. RFC4343 is the one which extends this to be used more generally.
* If you validate hostnames from user input, don't forget to consider hostnames which you get from reverse DNS or from CNAME lookups, or NAPTR. You might leave it unmolested for pass-back to the DNS layer, but if you're passing it up, consider escaping.
* Whatever beliefs people have about what's allowed in a hostname: fine, now just make sure your code enforces it. :)
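As an illustration of enforcing such a belief in code, the sketch below checks a name against the conventional letters-digits-hyphen host name rules before it is passed up to the rest of an application. The function name and the exact policy (leading digits allowed, 253 characters at most without the trailing dot) are assumptions for the example; a given application may well want different rules.

```python
import re

# One label: letters, digits and hyphens, not starting or ending with a hyphen,
# at most 63 characters long.
_LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def is_conventional_hostname(name: str) -> bool:
    """Return True only for names that fit the traditional host name syntax."""
    if name.endswith("."):
        name = name[:-1]                 # tolerate a single trailing root dot
    if not name or len(name) > 253:
        return False
    # fullmatch() matters: match() with a trailing "$" would still accept
    # a label that ends in a newline.
    return all(_LABEL.fullmatch(label) for label in name.split("."))
```

The same check applies to a name obtained from a PTR, CNAME, or NAPTR lookup as to one typed in by a user; anything that fails can be rejected, or escaped before it is logged or displayed.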

Potential pitfalls in DNS handling

Posted Nov 15, 2012 22:32 UTC (Thu) by Comet (subscriber, #11646) [Link]

Forgot I had this:
% host -lva f.e.e.b.d.a.e.d.1.3.0.0.8.9.8.0.2.0.a.2.ip6.arpa nlns.globnix.net

Sorry Jake, should have mentioned it before. That too is public DNS, it's just that it's only in IPv6 that I have enough public IPs spare to dedicate a dead:beef range to test purposes. ;)

Potential pitfalls in DNS handling

Posted Nov 16, 2012 13:10 UTC (Fri) by marcH (subscriber, #57642) [Link]

> Whatever beliefs people have about what's allowed in a hostname: fine, now just make sure your code enforces it. :)

Indeed it would be foolish not to sanitize your (network!) inputs only on the basis that some specification restricts something.

Potential pitfalls in DNS handling

Posted Nov 16, 2012 18:21 UTC (Fri) by hawk (subscriber, #3195) [Link]

Overall, nicely put together article. However, I think this particular section is problematic:

"Many believe that domain and host names are restricted to certain subsets of characters, but that is not true. As RFC 2181 specifies, the limits are purely length-based (63 octets per component, 255 octets for a domain name)."

The problem with that reasoning is that it implies that RFC2181 defines anything other than what the DNS protocol can carry.

In fact, the RFC goes out of its way pointing out that DNS is general purpose and that different applications will want different rules for what names are acceptable. From the "Name syntax" section (http://tools.ietf.org/html/rfc2181#section-11):

" Occasionally it is assumed that the Domain Name System serves only
the purpose of mapping Internet host names to data, and mapping
Internet addresses to host names. This is not correct, the DNS is a
general (if somewhat limited) hierarchical database, and can store
almost any kind of data, for almost any purpose."

...description of what the DNS protocol can carry...

" Note however, that the various applications that make use of DNS data
can have restrictions imposed on what particular values are
acceptable in their environment. For example, that any binary label
can have an MX record does not imply that any binary name can be used
as the host part of an e-mail address. Clients of the DNS can impose
whatever restrictions are appropriate to their circumstances on the
values they use as keys for DNS lookup requests, and on the values
returned by the DNS. If the client has such restrictions, it is
solely responsible for validating the data from the DNS to ensure
that it conforms before it makes any use of that data."

Potential pitfalls in DNS handling

Posted Nov 16, 2012 23:50 UTC (Fri) by Comet (subscriber, #11646) [Link]

Yes; so when you perform a reverse DNS lookup, which can return arbitrary data, it is thus your responsibility, as an application using DNS, to enforce restrictions upon that value.

If the reverse DNS for an IP contains \r\n and you emit the IP to your logs, make sure you understand what is escaped where, to ensure that your logs don't have arbitrary records injected via DNS data.
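One way to keep such data from injecting records is to escape it at the point where it is written to the log. The following is a minimal sketch with an invented function name and an arbitrary escape style (`\xNN`); what matters is that control characters never reach the log verbatim and that the backslash itself is escaped so the output stays unambiguous.

```python
def log_safe(value: str) -> str:
    """Escape a DNS-derived string so it cannot start a new log line."""
    out = []
    for ch in value:
        if ch == "\\":
            out.append("\\\\")                   # keep the escaping reversible
        elif " " <= ch <= "~":
            out.append(ch)                       # printable ASCII passes through
        elif ord(ch) <= 0xFF:
            out.append(f"\\x{ord(ch):02x}")      # control and high octets
        else:
            out.append(f"\\u{ord(ch):04x}")      # anything beyond Latin-1
    return "".join(out)
```

A name such as `evil\r\nfake entry` then appears on a single line as `evil\x0d\x0afake entry` rather than as two log records.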

Copyright © 2012, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds