hostname matching

Posted Jun 4, 2017 9:37 UTC (Sun) by tialaramex (subscriber, #21167)
In reply to: hostname matching by njs
Parent article: Python ssl module update

"The current Python hostname matching accepts a unicode hostname"

But why? You can't use this name _for_ anything here. I absolutely understand that users want the name shown as they expect it, but the user isn't feeding the name into the hostname matching code, almost always the user doesn't care about matching at all, this all needs to happen behind the scenes when they connect. If you are able to connect to the host (otherwise what are you trying to "match" against?) then somewhere you have successfully figured out the punycode DNS name for this host and _that_ is the thing you ought to be matching against the SAN dnsName inside the certificate. [[ If you connected by IP address, you should only be matching SAN ipAddress names NOT trying to contemplate dnsNames, do not repeat Microsoft's bug here ]] Doing the conversion separately in each place that it occurs just increases the chance things will break.

If the reason is just "it looks like text, so we accept Unicode" well, I guess, I don't know enough about Python style to recommend the correct way forward, in Java I would suggest labelling the Unicode API @deprecated and explaining why in the documentation. It's not useless to offer to do Punycode translation here, it just makes the API needlessly fragile to rely upon that for the usual case when we should know the exact name we're trying to match already. Given that people _shouldn't_ be calling in here with Unicode, it's probably safer to actively reject that than to try to muddle along, that way people who do need to work directly with the U-name form (e.g. maybe a test tool) will be aware of the sharp edge they're invoking because they'll need to do the encode/decode step themselves.

Over in m.d.s.policy we had discussions with Cory Benfield about the other end of this stuff - Cory sees that the CA trust relationships packaged up with a Linux distro, or with Python requests are only a crude partial summary of the actual CA trust exhibited by the browsers (in this case Mozilla's Firefox) which is implemented in software. In particular browsers often impose what we might call "sanctions" short of distrust through such code, e.g. a poorly managed French government CA is not actually trusted by Firefox to issue for TLDs that aren't controlled by the French state, and the incompetent/ deceitful WoSign CA is not actually trusted to issue new certificates. However the simple list of trusted CAs exported to software like Cory's does not reflect these nuances. In both cases the CA is simply "trusted" because the alternative is "not trusted". Alas we did not come to much conclusion, there is understandable reluctance on the Mozilla side to do more (they already do more than their fair share) and the sanctions imposed are a bit "ad hoc" so there's not much realistic chance of consistently exposing them as data so that they can be consumed by other tools.

hostname matching

Posted Jun 5, 2017 0:09 UTC (Mon) by njs (subscriber, #40338) [Link] (4 responses)

It does happen that Python functions that work with hostnames in general accept U-labels and do the A-label conversion automatically, but this isn't the problem – Python's getaddrinfo and SNI and hostname checking code all use the same routine for this, so they stay consistent. (I agree that it seems a bit fragile, but I'm not aware of it having caused any problems yet in practice.) The problem is that Python's U-label -> A-label code has gotten stuck on IDNA 2003, so if the user asks for "faß.de" then getaddrinfo helpfully gives them the IP address for fass.de, and then the hostname checking helpfully confirms that they do have a valid certificate for fass.de, etc., and there's no indication that they're not actually talking to xn--fa-hia.de like they should be. For this it doesn't really matter whether the A-label conversion happens once at the boundary or multiple times inside, because it gives the same wrong answer either way :-). If there were some standard well-maintained library for doing hostname checking that also took care of IDN encoding and Python delegated this stuff to it, then it would at least catch that fass.de does *not* have a valid certificate for faß.de. But really the solution is just to upgrade to IDNA 2008. (Possibly breaking everyone's code in the process, which I guess is why it hasn't happened yet.)

Some security-conscious libraries like requests do already do their own IDN encoding, so that the stdlib functions only see the A-label.

Yeah, this is also unfortunate. Cory's currently engaged in a herculean effort to define a new TLS API for Python that can delegate to the platform TLS implementations on Windows and MacOS. I'm not sure that they're actually any better at this in practice, but at least it would reduce the number of distinct trust databases, and shift the responsibility away from the Python devs. Of course on Linux we can't even agree on where the list of trusted CAs gets put on disk, never mind any kind of more sophisticated policy decisions...

hostname matching

Posted Jun 5, 2017 1:10 UTC (Mon) by tialaramex (subscriber, #21167) [Link] (3 responses)

"If there were some standard well-maintained library for doing hostname checking that also took care of IDN encoding and Python delegated this stuff to it, then it would at least catch that fass.de does *not* have a valid certificate for faß.de"

Arguably there is no such thing as "a valid certificate for faß.de" the certificate would be for xn--fa-hia.de, and it's purely a presentation layer decision to render this as faß.de. It certainly isn't correct to say "Oh, the user can type faß.de, we'll connect to the wrong machine, then give them a certificate error". That's not even a halfway acceptable solution.

There absolutely are CAs which will issue a certificate with a dnsName SAN for xn--fa-hia.de and then in CN they'll write faß.de because they can (the dnsName is deliberately defined with one of ASN.1's far too numerous sort-of ASCII encodings, so you can't write ß there, but CN is just arbitrary human-readable text...) However, checking CN for a Unicode version of the name is just compounding the original error, please don't do that either!

hostname matching

Posted Jun 8, 2017 7:33 UTC (Thu) by njs (subscriber, #40338) [Link] (2 responses)

I was going to say oh it's not that bad, but it turns out that was based on a misreading of the source... it's not just that they have the wrong IDNA standard implemented :-(. In fact Python's SSL module's hostname verification will encode whatever hostname you gave it to a U-label (even if you forcibly pass in an A-label yourself), and then it will compare that against the raw subjectAltNames and CN. So currently the *only* situation in which the stdlib ssl module will successfully connect to a IDN over TLS is when the CN has the U-label in it.

In conclusion, TLS is hard and software is hard and everything is terrible.

hostname matching

Posted Jun 8, 2017 22:32 UTC (Thu) by tialaramex (subscriber, #21167) [Link] (1 responses)

While I appreciate that the "and everything is terrible" line seems appropriate here, might we at least raise this as a clear bug? Can I do that somewhere? Or if it already exists, can I be told where the bug report is so I can ensure it gets tended to by others who grok this stuff and will try to "gently" direct people towards actually doing what the spec. says ?

From the Web PKI side, bugs like this mean when we say to CAs "Don't do X" they point at the bug and say "We have to because of this bug". And so another year or six goes by without the problem fixed. Python being part of the problem not the solution is disappointing.

hostname matching

Posted Jun 11, 2017 8:26 UTC (Sun) by njs (subscriber, #40338) [Link]

The determinedly broken hostname matching is: https://bugs.python.org/issue28414
The lack of IDNA 2008 is: https://bugs.python.org/issue17305

I also just alerted Cory to the issue in the hopes that his new TLS library will hopefully avoid this problem... the Python ssl maintainer(s) is (are) certainly aware of it, but the stdlib ssl module is (like everything) pretty under-resourced, and with the Python release cycle and the py2/py3 split getting this kind of complex change done can be really slow :-/

hostname matching

Posted Jun 6, 2017 3:00 UTC (Tue) by flussence (guest, #85566) [Link] (1 responses)

> Over in m.d.s.policy we had discussions with Cory Benfield about the other end of this stuff - Cory sees that the CA trust relationships packaged up with a Linux distro, or with Python requests are only a crude partial summary of the actual CA trust exhibited by the browsers (in this case Mozilla's Firefox) which is implemented in software.
Gentoo's packaging of Mozilla's CA bundle is surprisingly opinionated - not only have they given the option to trust CACert (the only root that has OV/EV practices worth a damn) but they also blacklisted the evil Symantec/Wosign/StartCom certs far earlier than the browsers did.

It caused me some mild grief, e.g. Pidgin wouldn't connect to AIM any more because its entire SSL chain was rotten. Some workaround must be in place since it still uses Symantec certs.

hostname matching

Posted Jun 6, 2017 16:17 UTC (Tue) by tialaramex (subscriber, #21167) [Link]

As I understand it Gentoo's Symantec change was masked out. Symantec's entire argument is basically that they're such a large (and more importantly visible, they only issued about 5% of the valid site certificates, but they're disproportionately on high traffic sites) provider, so just instantly switching that off will break lots of stuff. I suspect this would very quickly demonstrate that Gentoo's independence is more theoretical than actual.

I appreciate that CACert's processes may feel robust if you happen to know the core CACert people, most of us don't and never will, so what we see is just another flailing volunteer group. Ten years ago CACert looked like a reasonable way forward, but today it does not. Maybe if CACert had been in the game much earlier, say in 1998 not 2003 then they'd already have been included in key stores prior to Honest Achmed and the CA/B and so then they'd be _inside_ the tent making rules for newcomers, not outside desperately playing catch-up.

In terms of competence, I see basically the same sort of errors made by CACert as at Symantec, and I feel the same way. Yes, in principle you can take a bunch of tools and know-how and do whatever you want, issue whatever you want, and it will all work out fine. But you will very likely make lots of mistakes if you do that, so I _strongly_ recommend you instead put the effort into having machines doing just a handful of things very well, and then sit on your hands. At one point Symantec tried to create a custom tbsCertificate and in doing so they erroneously signed it, even though the _whole point_ of the exercise was not to sign anything, when you read transcripts of CACert trying to follow simple instructions for a non-standard procedure it looks much the same.