LWN: Comments on "Python ssl module update" https://lwn.net/Articles/724209/ This is a special feed containing comments posted to the individual LWN article titled "Python ssl module update". en-us Thu, 09 Oct 2025 04:13:05 +0000 lwn@lwn.net hostname matching https://lwn.net/Articles/724976/ njs <div class="FormattedComment"> The determinedly broken hostname matching is: <a rel="nofollow" href="https://bugs.python.org/issue28414">https://bugs.python.org/issue28414</a><br> The lack of IDNA 2008 is: <a rel="nofollow" href="https://bugs.python.org/issue17305">https://bugs.python.org/issue17305</a><br> <p> I also just alerted Cory to the issue in the hope that his new TLS library will avoid this problem... the Python ssl maintainer(s) is (are) certainly aware of it, but the stdlib ssl module is (like everything) pretty under-resourced, and with the Python release cycle and the py2/py3 split, getting this kind of complex change done can be really slow :-/<br> <p> </div> Sun, 11 Jun 2017 08:26:38 +0000 hostname matching https://lwn.net/Articles/724885/ tialaramex <div class="FormattedComment"> While I appreciate that the "and everything is terrible" line seems appropriate here, might we at least raise this as a clear bug? Can I do that somewhere? Or, if it already exists, can I be told where the bug report is, so I can ensure it gets tended to by others who grok this stuff and will try to "gently" direct people towards actually doing what the spec says?<br> <p> From the Web PKI side, bugs like this mean that when we say to CAs "Don't do X", they point at the bug and say "We have to because of this bug". And so another year or six goes by without the problem being fixed. 
Python being part of the problem, not the solution, is disappointing.<br> </div> Thu, 08 Jun 2017 22:32:58 +0000 hostname matching https://lwn.net/Articles/724840/ njs <div class="FormattedComment"> I was going to say "oh, it's not that bad", but it turns out that was based on a misreading of the source... it's not just that they have the wrong IDNA standard implemented :-(. In fact, Python's SSL module's hostname verification will convert whatever hostname you gave it to a U-label (even if you forcibly pass in an A-label yourself), and then it will compare that against the raw subjectAltNames and CN. So currently the *only* situation in which the stdlib ssl module will successfully connect to an IDN over TLS is when the CN has the U-label in it.<br> <p> In conclusion, TLS is hard and software is hard and everything is terrible.<br> </div> Thu, 08 Jun 2017 07:33:16 +0000 hostname matching https://lwn.net/Articles/724680/ tialaramex <div class="FormattedComment"> As I understand it, Gentoo's Symantec change was masked out. Symantec's entire argument is basically that they're such a large provider (and, more importantly, a visible one: they only issued about 5% of the valid site certificates, but those are disproportionately on high-traffic sites) that just instantly switching them off will break lots of stuff. I suspect this would very quickly demonstrate that Gentoo's independence is more theoretical than actual.<br> <p> I appreciate that CACert's processes may feel robust if you happen to know the core CACert people; most of us don't and never will, so what we see is just another flailing volunteer group. Ten years ago CACert looked like a reasonable way forward, but today it does not. 
Maybe if CACert had been in the game much earlier, say in 1998 rather than 2003, then they'd already have been included in key stores prior to Honest Achmed and the CA/B, and so they'd be _inside_ the tent making rules for newcomers, not outside desperately playing catch-up.<br> <p> In terms of competence, I see basically the same sort of errors made by CACert as at Symantec, and I feel the same way. Yes, in principle you can take a bunch of tools and know-how and do whatever you want, issue whatever you want, and it will all work out fine. But you will very likely make lots of mistakes if you do that, so I _strongly_ recommend you instead put the effort into having machines do just a handful of things very well, and then sit on your hands. At one point Symantec tried to create a custom tbsCertificate and in doing so erroneously signed it, even though the _whole point_ of the exercise was not to sign anything. When you read transcripts of CACert trying to follow simple instructions for a non-standard procedure, it looks much the same. <br> </div> Tue, 06 Jun 2017 16:17:54 +0000 hostname matching https://lwn.net/Articles/724656/ flussence <div class="FormattedComment"> <font class="QuotedText">&gt; Over in m.d.s.policy we had discussions with Cory Benfield about the other end of this stuff - Cory sees that the CA trust relationships packaged up with a Linux distro, or with Python requests are only a crude partial summary of the actual CA trust exhibited by the browsers (in this case Mozilla's Firefox) which is implemented in software.</font><br> Gentoo's packaging of Mozilla's CA bundle is surprisingly opinionated - not only have they given the option to trust CACert (the only root that has OV/EV practices worth a damn), but they also blacklisted the evil Symantec/WoSign/StartCom certs far earlier than the browsers did.<br> <p> It caused me some mild grief, e.g.
Pidgin wouldn't connect to AIM any more because its entire SSL chain was rotten. Some workaround must be in place, since it still uses Symantec certs.<br> </div> Tue, 06 Jun 2017 03:00:33 +0000 hostname matching https://lwn.net/Articles/724585/ tialaramex <div class="FormattedComment"> "If there were some standard well-maintained library for doing hostname checking that also took care of IDN encoding and Python delegated this stuff to it, then it would at least catch that fass.de does *not* have a valid certificate for faß.de"<br> <p> Arguably there is no such thing as "a valid certificate for faß.de"; the certificate would be for xn--fa-hia.de, and it's purely a presentation-layer decision to render this as faß.de. It certainly isn't correct to say "Oh, the user can type faß.de, we'll connect to the wrong machine, then give them a certificate error". That's not even a halfway acceptable solution.<br> <p> There absolutely are CAs which will issue a certificate with a dnsName SAN for xn--fa-hia.de and then write faß.de in the CN because they can (the dnsName is deliberately defined with one of ASN.1's far too numerous sort-of-ASCII encodings, so you can't write ß there, but CN is just arbitrary human-readable text...). However, checking CN for a Unicode version of the name just compounds the original error; please don't do that either!<br> </div> Mon, 05 Jun 2017 01:10:55 +0000 hostname matching https://lwn.net/Articles/724581/ njs <div class="FormattedComment"> It does happen that Python functions that work with hostnames in general accept U-labels and do the A-label conversion automatically, but this isn't the problem – Python's getaddrinfo and SNI and hostname checking code all use the same routine for this, so they stay consistent. (I agree that it seems a bit fragile, but I'm not aware of it having caused any problems yet in practice.) 
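<p> To make the faß.de example concrete, here is a minimal sketch (assuming Python 3.7+ for str.isascii()). It contrasts the stdlib "idna" codec, which implements IDNA 2003 and its nameprep mapping of ß to "ss", with a hypothetical naive_alabel() helper that lowercases and punycode-encodes each non-ASCII label. For this particular name the helper produces the A-label that IDNA 2008 expects. It skips IDNA 2008's mapping and validity rules, though, so treat it as an illustration rather than a substitute for a maintained IDNA 2008 library.

```python
# Sketch only: compares Python's built-in IDNA 2003 codec with a naive,
# hypothetical IDNA 2008-style encoder. naive_alabel() omits the UTS #46
# mapping and RFC 5892 validity checks that a real IDNA 2008 library applies.


def idna2003_alabel(hostname):
    """Encode with the stdlib "idna" codec (IDNA 2003: nameprep maps ß to "ss")."""
    return hostname.encode("idna").decode("ascii")


def naive_alabel(hostname):
    """Lowercase each label and punycode-encode the non-ASCII ones (no nameprep)."""
    labels = []
    for label in hostname.lower().split("."):
        if label.isascii():
            labels.append(label)
        else:
            # The "punycode" codec emits only the encoded label; add the ACE prefix.
            labels.append("xn--" + label.encode("punycode").decode("ascii"))
    return ".".join(labels)
```

Calling idna2003_alabel("faß.de") yields "fass.de", while naive_alabel("faß.de") yields "xn--fa-hia.de": two different hostnames, which is the mismatch described in this thread.<br> <p>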
The problem is that Python's U-label -&gt; A-label code has gotten stuck on IDNA 2003, so if the user asks for "faß.de", then getaddrinfo helpfully gives them the IP address for fass.de, and then the hostname checking helpfully confirms that they do have a valid certificate for fass.de, etc., and there's no indication that they're not actually talking to xn--fa-hia.de like they should be. For this it doesn't really matter whether the A-label conversion happens once at the boundary or multiple times inside, because it gives the same wrong answer either way :-). If there were some standard well-maintained library for doing hostname checking that also took care of IDN encoding and Python delegated this stuff to it, then it would at least catch that fass.de does *not* have a valid certificate for faß.de. But really the solution is just to upgrade to IDNA 2008. (Possibly breaking everyone's code in the process, which I guess is why it hasn't happened yet.)<br> <p> Some security-conscious libraries like requests do already do their own IDN encoding, so that the stdlib functions only see the A-label.<br> <p> <font class="QuotedText">&gt; Over in m.d.s.policy we had discussions with Cory Benfield about the other end of this stuff - Cory sees that the CA trust relationships packaged up with a Linux distro, or with Python requests are only a crude partial summary of the actual CA trust exhibited by the browsers</font><br> <p> Yeah, this is also unfortunate. Cory's currently engaged in a herculean effort to define a new TLS API for Python that can delegate to the platform TLS implementations on Windows and macOS. I'm not sure that they're actually any better at this in practice, but at least it would reduce the number of distinct trust databases and shift the responsibility away from the Python devs. 
Of course, on Linux we can't even agree on where the list of trusted CAs gets put on disk, never mind any kind of more sophisticated policy decisions...<br> </div> Mon, 05 Jun 2017 00:09:22 +0000 hostname matching https://lwn.net/Articles/724558/ tialaramex <div class="FormattedComment"> "The current Python hostname matching accepts a unicode hostname"<br> <p> But why? You can't use this name _for_ anything here. I absolutely understand that users want the name shown as they expect it, but the user isn't feeding the name into the hostname matching code; almost always the user doesn't care about matching at all, and this all needs to happen behind the scenes when they connect. If you are able to connect to the host (otherwise what are you trying to "match" against?), then somewhere you have successfully figured out the punycode DNS name for this host, and _that_ is the thing you ought to be matching against the SAN dnsName inside the certificate. [[ If you connected by IP address, you should only be matching SAN ipAddress names, NOT trying to contemplate dnsNames; do not repeat Microsoft's bug here ]] Doing the conversion separately in each place where it occurs just increases the chance that things will break.<br> <p> If the reason is just "it looks like text, so we accept Unicode", well, I guess... I don't know enough about Python style to recommend the correct way forward; in Java I would suggest labelling the Unicode API @deprecated and explaining why in the documentation. It's not useless to offer to do Punycode translation here; it just makes the API needlessly fragile to rely upon that in the usual case, when we should already know the exact name we're trying to match. Given that people _shouldn't_ be calling in here with Unicode, it's probably safer to actively reject that than to try to muddle along; that way people who do need to work directly with the U-label form (e.g.
maybe a test tool) will be aware of the sharp edge they're invoking, because they'll need to do the encode/decode step themselves.<br> <p> Over in m.d.s.policy we had discussions with Cory Benfield about the other end of this stuff - Cory sees that the CA trust relationships packaged up with a Linux distro, or with Python requests are only a crude partial summary of the actual CA trust exhibited by the browsers (in this case Mozilla's Firefox) which is implemented in software. In particular, browsers often impose what we might call "sanctions" short of distrust through such code, e.g. a poorly managed French government CA is not actually trusted by Firefox to issue for TLDs that aren't controlled by the French state, and the incompetent/deceitful WoSign CA is not actually trusted to issue new certificates. However, the simple list of trusted CAs exported to software like Cory's does not reflect these nuances. In both cases the CA is simply "trusted", because the alternative is "not trusted". Alas, we did not reach much of a conclusion: there is understandable reluctance on the Mozilla side to do more (they already do more than their fair share), and the sanctions imposed are a bit "ad hoc", so there's not much realistic chance of consistently exposing them as data that other tools can consume.<br> </div> Sun, 04 Jun 2017 09:37:19 +0000 hostname matching https://lwn.net/Articles/724557/ njs <div class="FormattedComment"> <font class="QuotedText">&gt; Bug #1 is about being backwards compatible with abuse of CN, the topic I wrote about</font><br> <p> Well, OK, yes, but the bug was that they implemented your advice in 2011 and then people complained, so they had to change it. 
Likely it would be different now, but still, this wasn't a bug that following your advice would have avoided :-).<br> <p> <font class="QuotedText">&gt; The rest is stuff that's not relevant to hostname matching</font><br> <p> It's not very clear from the slide, but #8 is that if you're using a non-default configuration, it's possible to write buggy code that, according to the docs, should raise an error but instead silently disables hostname validation. (I happen to be familiar with this one because I discovered it...) The underlying cause is that openssl will sometimes "helpfully" do an automatic handshake without Python realizing, i.e. Python's current strategy for coordinating with openssl here is just wrong.<br> <p> <font class="QuotedText">&gt; IDNA is a display issue, it may suck for your Python code to display German names wrong, but the matching in SANs doesn't care</font><br> <p> The current Python hostname matching accepts a unicode hostname and is responsible for matching it to the SAN, so IDNA issues are germane. Unfortunately, AFAICT, OpenSSL's hostname matching code doesn't do IDNA either, so Python will have to remain responsible for this bit anyway (as slide 26 notes).<br> <p> </div> Sun, 04 Jun 2017 07:04:14 +0000 hostname matching https://lwn.net/Articles/724524/ tialaramex <div class="FormattedComment"> Of those listed:<br> <p> Bug #1 is about being backwards compatible with abuse of CN, the topic I wrote about<br> <p> Bug #2 is the usual "So I used regular expressions, now I had two problems", although it's really covered by bug #3, because this abuse is forbidden in current standards; but even if you wanted to permit it, a regular expression was totally inappropriate.<br> <p> Bug #3 is about Python belatedly realising that it should implement a current standard, not a very lax one from many years ago.<br> <p> Bug #4 is actually the _same_ bug again, but now interpreted as somehow relevant to IDNA. 
Because things weren't done as described in the standard, weirdness occurs if you use IDNA. The fix is, of course, just to implement the standard.<br> <p> Bug #5 is the familiar bug where a C API presents a "string" that has embedded NUL bytes and, as a result, you trim off most of the string.<br> <p> NB: Bugs 2-5 are very low risk for the Web PKI _because_ they're about SANs, and so it would be mis-issuance for certificates which trigger these conditions even to exist.<br> <p> The rest is stuff that's not relevant to hostname matching (IDNA is a display issue, it may suck for your Python code to display German names wrong, but the matching in SANs doesn't care about display; that's the _whole point_ of this design).<br> </div> Sat, 03 Jun 2017 08:09:10 +0000 hostname matching https://lwn.net/Articles/724516/ njs <div class="FormattedComment"> There's a list in the slides that are linked below (starting on slide 16). I didn't see any that had anything to do with the issues you're talking about; instead it's a bunch of tricky cases around stuff like wildcards, or SAN fields with embedded NULs, or openssl being helpful in an unexpected way, or IDNA being a mess, etc. Individually they're all handleable, but this is just inherently tricky code, so it makes sense for Python to want to get out of the business of maintaining their own copy that no-one else uses.<br> </div> Sat, 03 Jun 2017 03:14:37 +0000 hostname matching https://lwn.net/Articles/724515/ daurnimator <div class="FormattedComment"> Even just extracting names out of the SAN and matching them against the hostname is difficult. 
See <a href="https://wiki.openssl.org/index.php/Hostname_validation#section2">https://wiki.openssl.org/index.php/Hostname_validation#se...</a><br> <br> Every project I know of that has tried to do it has ended up just extracting the code from curl.<br> </div> Sat, 03 Jun 2017 02:56:07 +0000 Python ssl module update https://lwn.net/Articles/724487/ tiran <div class="FormattedComment"> I have updated my slides from the language summit on Speaker Deck: <a href="https://speakerdeck.com/tiran/python-language-summit-2017-state-of-the-ssl-module">https://speakerdeck.com/tiran/python-language-summit-2017...</a>. The slides are slightly augmented with additional comments.<br> </div> Fri, 02 Jun 2017 16:07:49 +0000 Debian https://lwn.net/Articles/724438/ ballombe <div class="FormattedComment"> The follow-on clause 'it has also been backported to Debian stable' seems to imply that Debian 8 and Debian stable are different things.<br> </div> Fri, 02 Jun 2017 10:29:32 +0000 hostname matching https://lwn.net/Articles/724435/ tialaramex <div class="FormattedComment"> "There are a large number of bugs associated with matching hostnames against those found in TLS certificates"<br> <p> This is worrying: the "large number of bugs" aren't linked, so I haven't been able to examine them easily, but - again in the Web PKI, which I appreciate isn't everybody's domain but is the right default for software like Python - the correct thing to do here is to accept that SANs have been mandatory for a long time, not try to get fancy, and just do a simple case-insensitive ASCII byte match on the SAN dnsName field. We know this works for the Web PKI because Google and others ship clients which do this ‡. Will it break for some wacky private systems? Yes. So will almost any meaningful validation, because these are almost invariably poorly managed PKIs. 
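<p> A minimal sketch of that strict approach follows. It assumes the SAN entries have already been extracted in the tuple-of-pairs shape that Python's ssl.SSLSocket.getpeercert() uses for its "subjectAltName" key. The function names and the wildcard rule are illustrative assumptions rather than any library's API, and this is not a complete RFC 6125 implementation. Under that rule a "*" may stand for exactly one whole left-most label, and a pattern such as "*.com", with only one label after the wildcard, is refused. The sketch never consults the CN, refuses U-labels (callers must supply the A-label), treats entries with embedded NULs as non-matching, and compares IP literals only against IP address SANs.

```python
import ipaddress


def _dns_name_matches(pattern, hostname):
    """Case-insensitive ASCII match; '*' may only be the entire left-most label."""
    pattern = pattern.lower()
    if pattern == hostname:
        return True
    if not pattern.startswith("*."):
        return False  # no partial-label wildcards such as "w*.example.com"
    suffix = pattern[1:]  # e.g. ".example.com"
    if "*" in suffix or suffix.count(".") == 1:
        return False  # only one wildcard, and never "*.com"-style patterns
    head, _, rest = hostname.partition(".")
    # The wildcard covers exactly one non-empty label.
    return bool(head) and "." + rest == suffix


def match_peer_hostname(hostname, subject_alt_name):
    """Return True if hostname (an A-label or IP literal) matches a SAN entry.

    subject_alt_name follows the shape of getpeercert()["subjectAltName"],
    e.g. (("DNS", "example.com"), ("IP Address", "192.0.2.1")).
    The subject CN is deliberately never consulted.
    """
    if not hostname.isascii():
        raise ValueError("pass the A-label (punycode) form, not a U-label")
    hostname = hostname.lower()
    try:
        ip = ipaddress.ip_address(hostname)
    except ValueError:
        ip = None
    for kind, value in subject_alt_name:
        if "\x00" in value:
            continue  # embedded NUL: never match rather than truncate
        if ip is not None:
            # Connected by IP address: only IP address SANs count, never dnsNames.
            if kind == "IP Address" and ipaddress.ip_address(value.strip()) == ip:
                return True
        elif kind == "DNS" and value.isascii() and _dns_name_matches(value, hostname):
            return True
    return False
```

For example, match_peer_hostname("www.example.com", (("DNS", "*.example.com"),)) returns True, while "a.b.example.com" checked against the same entry returns False.<br> <p>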
Provide a switch for "use hot garbage validation" and default it to off.<br> <p> It's possible to throw an enormous amount of effort at trying to parse hostnames out of fields never intended to contain them, in an effort to "correctly" allow certificates that other people seem to use fine. Such compatibility has an enormous security cost, which people shouldn't be paying by default. You can end up trying to convert to or from punycode, trying to handle UTF-16, trimming out whitespace - none of this would be necessary if you just rejected the non-compliant certificates outright and pushed the problem where it belongs: at issuance.<br> <p> ‡ Chrome has an unfortunate bug as a result of this: its error naming implies that some certificate validation failures are caused by an inappropriate Common Name (CN) field, but it doesn't even look at the CN these days, so "fixing" that won't help.<br> </div> Fri, 02 Jun 2017 10:20:04 +0000 Debian https://lwn.net/Articles/724427/ xanni <div class="FormattedComment"> Then why write "also"?<br> </div> Fri, 02 Jun 2017 07:27:54 +0000 SHA-1 https://lwn.net/Articles/724398/ tialaramex <div class="FormattedComment"> "Alex Gaynor pointed out that SHA-1 is still allowed as the hash in X.509 certificates; Heimes acknowledged that, but said that it is needed to support some versions of TLS."<br> <p> It's hard to know who is barking up the wrong tree here: Gaynor, Heimes, or our reporter.<br> <p> So: the TLS protocol largely doesn't care what's inside X.509 certificates. It transports them to the peer, which examines the certificate and decides (on its own criteria) whether to continue the connection. In an X.509 certificate, the SHA-1 algorithm is (was) used in signatures, in combination with public key cryptography. 
The most usual is sha1WithRSAEncryption, serving as mathematical proof that someone who controls the Issuer's private key signed this certificate binding the Subject's identity to this public key.<br> <p> TLS does use a really broad range of cryptographic algorithms, mostly as part of the ciphersuite negotiated between peers during setup. Ciphersuites using SHA-1 continue to exist. In this role, SHA-1 (usually just identified as "SHA" in a ciphersuite name) acts as a MAC, rather than as part of a signature algorithm as it does in X.509.<br> <p> This difference (MAC vs signature algorithm) affects the security implications, and thus the community reaction to the SHA-1 collision demo earlier this year and to the long-predicted weakness of SHA-1 that it was intended to demonstrate.<br> <p> SHA-1 signatures are a big problem. The Web PKI outlawed them: there should be only a handful of new ones on the public Internet since 2015, all of those should have expired, and they aren't trusted in any popular web browser's current release. Doubtless many more exist in private applications, sadly including in garbage middleboxes. It definitely doesn't make sense for new TLS-capable software in 2017 to accept SHA-1 signatures by default. If it must be permitted, lock it behind a flag that people need to read the documentation to discover. Such signatures threaten everybody who might inadvertently trust one; that's why they're outlawed.<br> <p> Meanwhile SHA-1 MACs, while no longer state of the art, aren't a big problem. Your shiny modern web browser is probably quite happy to use TLS_DHE_RSA_WITH_AES_128_CBC_SHA, which uses SHA-1. 
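<p> To illustrate where that "SHA" sits, here is a minimal sketch that splits an IANA-style TLS 1.2 cipher-suite name into its parts. describe_suite() is a hypothetical helper. It only understands names of the form TLS_KX_AUTH_WITH_CIPHER_HASH, so TLS 1.3 names and the hash-less CCM suites are rejected. For CBC suites the trailing hash names the HMAC applied to each record. For AEAD suites such as AES-GCM it names only the PRF hash. Neither the certificate's signature algorithm (e.g. sha1WithRSAEncryption) nor the hash used for the handshake signature appears in the name at all.

```python
# Hypothetical helper: splits an IANA-style TLS 1.2 cipher-suite name to show
# what its trailing hash is used for. Not a full registry parser.

HASH_NAMES = {"MD5": "MD5", "SHA": "SHA-1", "SHA256": "SHA-256", "SHA384": "SHA-384"}
AEAD_SUFFIXES = ("_GCM", "_POLY1305")


def describe_suite(name):
    """Return the key exchange, authentication, cipher and hash role of a suite."""
    if not name.startswith("TLS_") or "_WITH_" not in name:
        raise ValueError("not a TLS 1.2-style suite name: " + name)
    kx_auth, cipher_and_hash = name[len("TLS_"):].split("_WITH_", 1)
    cipher, _, hash_tag = cipher_and_hash.rpartition("_")
    if hash_tag not in HASH_NAMES:
        raise ValueError("no recognised hash suffix in: " + name)
    key_exchange, _, auth = kx_auth.partition("_")
    aead = cipher.endswith(AEAD_SUFFIXES)
    return {
        "key_exchange": key_exchange,
        # A bare "RSA" means RSA key transport, authenticated by the same key.
        "authentication": auth or key_exchange,
        "cipher": cipher,
        "hash": HASH_NAMES[hash_tag],
        # AEAD ciphers protect records themselves; the hash then only feeds the PRF.
        # For CBC and stream suites it is the HMAC applied to every record.
        "hash_role": "PRF" if aead else "HMAC",
    }
```

describe_suite("TLS_DHE_RSA_WITH_AES_128_CBC_SHA") reports SHA-1 in the "HMAC" role, which is the MAC usage the comment describes.<br> <p>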
It would be nice to upgrade servers to do something nicer, but if you had to prioritise, upgrading the MAC function is way down the list.<br> </div> Fri, 02 Jun 2017 01:09:10 +0000 Debian https://lwn.net/Articles/724362/ corbet A quick search shows that <a href="https://packages.debian.org/jessie-backports/openssl">1.0.2 is in jessie-backports</a>, so it is, as the article says, available in Debian&#160;8. Thu, 01 Jun 2017 15:20:55 +0000 Python ssl module update https://lwn.net/Articles/724360/ itvirta <div class="FormattedComment"> <font class="QuotedText">&gt; OpenSSL 1.0.2 is available in RHEL 7.4, Debian 8 ("Jessie"), and Ubuntu 16.04; it has also been backported to Debian stable</font><br> <p> Jessie _is_ stable; should that first mention be about stretch?<br> </div> Thu, 01 Jun 2017 15:12:24 +0000
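For anyone checking which library a particular Python build is linked against (the OpenSSL 1.0.2 baseline discussed above), here is a minimal sketch using the ssl module's OPENSSL_VERSION string and OPENSSL_VERSION_INFO tuple. openssl_at_least() is a hypothetical helper. It checks the name prefix first because builds against forks such as LibreSSL may also populate these attributes, with values that do not correspond to OpenSSL releases.

```python
import ssl


def openssl_at_least(minimum=(1, 0, 2)):
    """Return True if Python's ssl module is linked against OpenSSL >= minimum.

    Hypothetical helper: LibreSSL and other forks are reported as False, since
    their version tuples are not comparable with OpenSSL release numbers.
    """
    if not ssl.OPENSSL_VERSION.startswith("OpenSSL"):
        return False
    # OPENSSL_VERSION_INFO is (major, minor, fix, patch, status); compare a prefix.
    return ssl.OPENSSL_VERSION_INFO[:len(minimum)] >= tuple(minimum)
```

The answer reflects the library the interpreter was built or loaded against, which is not necessarily the newest OpenSSL package installed on the system.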