Python ssl module update
In something of a follow-on to his session (with Cory Benfield) at the 2016 Python Language Summit, Christian Heimes gave an update on the state of the Python ssl module. In it, he covered some changes that have been made in the last year as well as some changes that are being proposed. Heimes is a co-maintainer of the ssl module.
Heimes started with a bit of a detour to the hashlib module. The SHA-1 hash algorithm is dead, he said, due to various breakthroughs over the last few years. So, in Python 3.6, the hashlib module has added support for SHA-3 and BLAKE2. The security community has been happy to see that, he said. But Alex Gaynor pointed out that SHA-1 is still allowed as the hash in X.509 certificates; Heimes acknowledged that, but said that it is needed to support some versions of TLS.
![Christian Heimes](https://static.lwn.net/images/2017/pls-heimes-sm.jpg)
The default cipher suites were next up. In the past, the ssl module needed to choose its own cipher suites because the choices made by OpenSSL were extremely poor. But OpenSSL has gotten much better recently, and Python's override actually re-enables some insecure ciphers (e.g. 3DES). Heimes is proposing that ssl start using the OpenSSL HIGH default and explicitly exclude the known insecure ciphers. That way, ssl will benefit from OpenSSL's updates to its choices, and hopefully there will be no need to backport future cipher-suite changes to the ssl module.
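The shape of that proposal can be sketched with the existing `SSLContext` API. The exact exclusion string below is an illustrative assumption, not Heimes's literal proposal:

```python
import ssl

ctx = ssl.SSLContext(ssl.PROTOCOL_TLS)
# Start from OpenSSL's own HIGH-strength set and explicitly strip
# known-insecure suites, instead of hard-coding a Python-side list.
ctx.set_ciphers("HIGH:!aNULL:!eNULL:!MD5:!3DES")

names = [c["name"] for c in ctx.get_ciphers()]
print(len(names), "cipher suites enabled")
```

Because the string starts from `HIGH`, improvements OpenSSL makes to that set are picked up automatically; only the `!`-exclusions are Python's responsibility.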
Version 1.3 of the TLS protocol will be supported in OpenSSL version 1.1.1, which is supposed to be released mid-year. New cipher suites will need to be added to support TLS 1.3. Older Python versions (2.7, 3.5, and 3.6) will use a new flag to indicate that they do not support the new protocol.
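The flag that eventually shipped for this purpose is `ssl.OP_NO_TLSv1_3`; a sketch of how code can probe for TLS 1.3 support without assuming either the flag or the protocol is present (the `getattr` defaults are the hedge for older builds):

```python
import ssl

# ssl.HAS_TLSv1_3 and ssl.OP_NO_TLSv1_3 only exist on builds that know
# about the protocol; getattr keeps this probe safe on older versions.
has_tls13 = getattr(ssl, "HAS_TLSv1_3", False)
print("TLS 1.3 available:", has_tls13)

# The opt-out flag backported to older Python versions:
no_tls13_flag = getattr(ssl, "OP_NO_TLSv1_3", None)
print("OP_NO_TLSv1_3:", no_tls13_flag)
```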
There are a large number of bugs associated with matching hostnames against those found in TLS certificates. In Python 3.2, ssl.match_hostname() was added (and backported to 2.7.9) to do so. Since that time, there has been a steady stream of hostname-matching bugs, some of which remain unfixed. His proposed solution is to let OpenSSL perform the hostname verification step. That requires a recent version of OpenSSL (1.0.2 or higher) or LibreSSL (version 2.5 or higher).
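Whether the matching ends up in OpenSSL depends on the library version linked at build time, but the knobs controlling verification are already visible on the default context. A small probe, not a guarantee of where the matching happens:

```python
import ssl

# create_default_context() enables certificate verification and
# hostname checking; with a new enough OpenSSL the matching itself
# can be done by the library rather than by Python code.
ctx = ssl.create_default_context()
print(ctx.check_hostname)                     # True
print(ctx.verify_mode == ssl.CERT_REQUIRED)   # True
```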
He would also like to drop support for older versions of OpenSSL, at least for Python 3.7. OpenSSL 1.0.2 is available in RHEL 7.4 and Ubuntu 16.04, and has been backported to Debian 8 ("Jessie"). OpenSSL 1.0.1 is no longer supported upstream, which is why he wants to drop it.
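The ssl module reports the OpenSSL it was linked against, so a version floor like the one proposed can be enforced at runtime. A sketch:

```python
import ssl

# The linked OpenSSL can be inspected at runtime; this is how code
# could refuse to run on unsupported 1.0.1-era builds.
print(ssl.OPENSSL_VERSION)
major, minor, fix = ssl.OPENSSL_VERSION_INFO[:3]
if (major, minor, fix) < (1, 0, 2):
    raise RuntimeError("OpenSSL 1.0.2 or newer required")
```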
LibreSSL is a BSD fork of OpenSSL 1.0.1 that has picked up the new features of OpenSSL 1.0.2, so it is mostly compatible, though it has removed a number of features. He would like to restrict ssl to the features that LibreSSL provides, so that it remains supported. In answer to a question from the audience, Heimes said that LibreSSL support is important for the BSDs as well as for Alpine Linux, which is popular for use in containers.
As Heimes started running out of time, he went through a few more things rather quickly. He pointed out that PEP 543, which proposes a unified TLS API, still needs a BDFL delegate to determine whether it will be adopted or not. There are some upcoming deprecations of broken parts of the ssl API. In addition, there are plans for various improvements to the module, including better hostname checking and support for international domain names encoded using IDNA.
[I would like to thank the Linux Foundation for travel assistance to
Portland for the summit.]
| Index entries for this article | |
|---|---|
| Security | Python |
| Security | Secure Sockets Layer (SSL) |
| Conference | Python Language Summit/2017 |
Posted Jun 1, 2017 15:12 UTC (Thu)
by itvirta (guest, #49997)
[Link] (3 responses)
Jessie _is_ stable, should that first mention be about stretch?
Posted Jun 1, 2017 15:20 UTC (Thu)
by corbet (editor, #1)
[Link] (2 responses)
A quick search shows that 1.0.2 is in jessie-backports, so it is, as the article says, available in Debian 8.
Posted Jun 2, 2017 10:29 UTC (Fri)
by ballombe (subscriber, #9523)
[Link]
Posted Jun 2, 2017 1:09 UTC (Fri)
by tialaramex (subscriber, #21167)
[Link]
It's hard to know who is barking up the wrong tree here, Gaynor, Heimes, or our reporter.
So: The TLS protocol largely doesn't care what's inside X.509 certificates. It transports them to the peer and they examine the certificate and decide (on their own criteria) whether to continue the connection. In an X.509 certificate the SHA-1 algorithm is (was) used in signatures, in combination with public key cryptography. Most usually sha1WithRSAEncryption, as mathematical proof that someone who controls the Issuer's private key signed this certificate binding the Subject's identity to this public key.
TLS does use a really broad range of cryptographic algorithms, mostly as part of the ciphersuite negotiated between peers during setup. Ciphersuites using SHA-1 continue to exist. In this role SHA-1 (usually just identified as "SHA" in a ciphersuite name) acts as a MAC rather than as part of a signature algorithm as it does in X.509.
This difference (MAC vs signature algorithm) impacts the security implications and thus the community reaction to the SHA-1 collision demo earlier this year and the long-predicted weakness of SHA-1 this was intended to demonstrate.
SHA-1 signatures are a big problem. The Web PKI outlawed them, there should be only a handful of new ones on the public Internet since 2015, and all those should have expired, they aren't trusted in any popular web browser's current release. Doubtless many more exist in private applications, sadly including in garbage middle boxes. It definitely doesn't make sense for new TLS-capable software in 2017 to accept SHA-1 signatures by default. If it must be permitted, lock it behind a flag that people need to read the documentation to discover, such signatures threaten everybody who might inadvertently trust one, that's why they're outlawed.
Meanwhile SHA1 MACs, while no longer state of the art, aren't a big problem. Your shiny modern web browser is probably quite happy to use TLS_DHE_RSA_WITH_AES_128_CBC_SHA which uses SHA-1. It would be nice to upgrade servers to do something nicer, but if you had to prioritise then upgrading the MAC function is way down the list.
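The MAC-versus-signature distinction the commenter draws can be made concrete: in the `..._CBC_SHA` suites, SHA-1 is used inside HMAC, whose security rests on key secrecy and (second-)preimage resistance, not on the collision resistance that the 2017 demo broke. A brief illustration (key and payload are arbitrary examples):

```python
import hmac
import hashlib

# SHA-1 as a MAC (as in TLS ..._CBC_SHA suites): the attacker does not
# get to choose colliding inputs, so collision attacks do not apply
# the way they do to certificate signatures.
key = b"session-key"
tag = hmac.new(key, b"record payload", hashlib.sha1).hexdigest()
print("HMAC-SHA1 tag:", tag)
print("tag length:", len(tag))  # 40 hex chars = 160 bits
```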
Posted Jun 2, 2017 10:20 UTC (Fri)
by tialaramex (subscriber, #21167)
[Link] (12 responses)
This is worrying, the "large number of bugs" aren't linked so I haven't trivially been able to examine them, but - again in the Web PKI which I appreciate isn't everybody's domain but it's the right default for software like Python - the correct thing to do here is accept that SANs have been mandatory for a long time, don't try to get fancy and just do a simple case-insensitive ASCII byte match on the SAN dnsName field. We know this works for the Web PKI because Google and others ship clients which do this ‡. Will it break for some wacky private systems? Yes. So will almost any meaningful validation because these are almost invariably poorly managed PKIs. Provide a switch for "use hot garbage validation" and default it to off.
It's possible to throw an enormous amount of effort at trying to parse hostnames out of fields never intended to contain them in an effort to "correctly" allow certificates that other people seem to use fine. Such compatibility has an enormous security cost which people shouldn't be paying by default. You can end up trying to convert to or from punycode, trying to handle UTF-16, trimming out whitespace - none of this would be necessary if you just reject the non-compliant certificates outright and push the problem where it belongs - at issuance.
‡ Chrome has an unfortunate bug as a result of this, its error naming implies that some certificate validation failures are caused by an inappropriate Common Name (CN) field, but it doesn't actually even look at the CN these days so "fixing" that won't help.
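A minimal sketch of the "simple case-insensitive ASCII match" the commenter advocates, against SAN dNSName entries. The whole-label leftmost-wildcard rule is my assumption about the intended behavior; the comment itself does not spell out wildcard handling:

```python
def san_dns_match(san_names, hostname):
    """Case-insensitive ASCII match of hostname against SAN dNSName
    entries, allowing only a whole-label leftmost wildcard (assumed)."""
    host_labels = hostname.lower().split(".")
    for name in san_names:
        labels = name.lower().split(".")
        if labels == host_labels:
            return True
        # "*.example.com" matches "a.example.com" but not
        # "example.com" or "a.b.example.com".
        if (labels and labels[0] == "*"
                and len(labels) == len(host_labels)
                and labels[1:] == host_labels[1:]):
            return True
    return False

print(san_dns_match(["*.example.com"], "a.example.com"))   # True
print(san_dns_match(["*.example.com"], "example.com"))     # False
```

Note that this deliberately does nothing clever: no CN fallback, no Unicode normalization, no regular expressions, which is exactly the point being made.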
Posted Jun 3, 2017 2:56 UTC (Sat)
by daurnimator (guest, #92358)
[Link]
Any projects I know of that have tried to do it have ended up just extracting the code from curl.
Posted Jun 3, 2017 3:14 UTC (Sat)
by njs (subscriber, #40338)
[Link] (10 responses)
Posted Jun 3, 2017 8:09 UTC (Sat)
by tialaramex (subscriber, #21167)
[Link] (9 responses)
Bug #1 is about being backwards compatible with abuse of CN, the topic I wrote about
Bug #2 is the usual "So I used regular expressions, now I had two problems" although it's really covered by bug #3 because this abuse is forbidden in current standards, but even if you wanted to permit it a regular expression was totally inappropriate.
Bug #3 is about Python belatedly realising that it should implement a current standard, not a very lax one from many years ago.
Bug #4 is actually the _same_ bug again, but now interpreted as somehow relevant to IDNA. Having not done things as described in the standard, weirdness occurs if you use IDNA. The fix is, of course, just to implement the standard.
Bug #5 is the familiar bug where a C API presents a "string" and it has embedded NUL bytes and you as a result trim off most of the string.
NB Bugs 2-5 are very low risk for the Web PKI _because_ they're about SANs and so it would be mis-issuance for certificates which trigger these conditions to even exist.
The rest is stuff that's not relevant to hostname matching (IDNA is a display issue, it may suck for your Python code to display German names wrong, but the matching in SANs doesn't care about display, that's the _whole point_ of this design).
Posted Jun 4, 2017 7:04 UTC (Sun)
by njs (subscriber, #40338)
[Link] (8 responses)
Well, OK, yes, but the bug was that they implemented your advice in 2011 and then people complained so they had to change it. Likely it would be different now, but still, this wasn't a bug that following your advice would have avoided :-).
> The rest is stuff that's not relevant to hostname matching
It's not very clear from the slide, but #8 is that if you're using a non-default configuration it's possible to write buggy code that according to the docs should raise an error, but instead silently disables hostname validation. (I happen to be familiar with this one because I discovered it...) The underlying cause is that openssl will sometimes "helpfully" do an automatic handshake without Python realizing, i.e. Python's current strategy for coordinating with openssl here is just wrong.
> IDNA is a display issue, it may suck for your Python code to display German names wrong, but the matching in SANs doesn't care
The current Python hostname matching accepts a unicode hostname and is responsible for matching it to the SAN, so IDNA issues are germane. Unfortunately AFAICT OpenSSL's hostname matching code doesn't do IDNA either, so Python will have to remain responsible for this bit anyway (as slide 26 notes).
Posted Jun 4, 2017 9:37 UTC (Sun)
by tialaramex (subscriber, #21167)
[Link] (7 responses)
But why? You can't use this name _for_ anything here. I absolutely understand that users want the name shown as they expect it, but the user isn't feeding the name into the hostname matching code, almost always the user doesn't care about matching at all, this all needs to happen behind the scenes when they connect. If you are able to connect to the host (otherwise what are you trying to "match" against?) then somewhere you have successfully figured out the punycode DNS name for this host and _that_ is the thing you ought to be matching against the SAN dnsName inside the certificate. [[ If you connected by IP address, you should only be matching SAN ipAddress names NOT trying to contemplate dnsNames, do not repeat Microsoft's bug here ]] Doing the conversion separately in each place that it occurs just increases the chance things will break.
If the reason is just "it looks like text, so we accept Unicode" well, I guess, I don't know enough about Python style to recommend the correct way forward, in Java I would suggest labelling the Unicode API @deprecated and explaining why in the documentation. It's not useless to offer to do Punycode translation here, it just makes the API needlessly fragile to rely upon that for the usual case when we should know the exact name we're trying to match already. Given that people _shouldn't_ be calling in here with Unicode, it's probably safer to actively reject that than to try to muddle along, that way people who do need to work directly with the U-name form (e.g. maybe a test tool) will be aware of the sharp edge they're invoking because they'll need to do the encode/decode step themselves.
Over in m.d.s.policy we had discussions with Cory Benfield about the other end of this stuff - Cory sees that the CA trust relationships packaged up with a Linux distro, or with Python requests are only a crude partial summary of the actual CA trust exhibited by the browsers (in this case Mozilla's Firefox) which is implemented in software. In particular browsers often impose what we might call "sanctions" short of distrust through such code, e.g. a poorly managed French government CA is not actually trusted by Firefox to issue for TLDs that aren't controlled by the French state, and the incompetent/ deceitful WoSign CA is not actually trusted to issue new certificates. However the simple list of trusted CAs exported to software like Cory's does not reflect these nuances. In both cases the CA is simply "trusted" because the alternative is "not trusted". Alas we did not come to much conclusion, there is understandable reluctance on the Mozilla side to do more (they already do more than their fair share) and the sanctions imposed are a bit "ad hoc" so there's not much realistic chance of consistently exposing them as data so that they can be consumed by other tools.
Posted Jun 5, 2017 0:09 UTC (Mon)
by njs (subscriber, #40338)
[Link] (4 responses)
Some security-conscious libraries like requests do already do their own IDN encoding, so that the stdlib functions only see the A-label.
> Over in m.d.s.policy we had discussions with Cory Benfield about the other end of this stuff - Cory sees that the CA trust relationships packaged up with a Linux distro, or with Python requests are only a crude partial summary of the actual CA trust exhibited by the browsers
Yeah, this is also unfortunate. Cory's currently engaged in a herculean effort to define a new TLS API for Python that can delegate to the platform TLS implementations on Windows and MacOS. I'm not sure that they're actually any better at this in practice, but at least it would reduce the number of distinct trust databases, and shift the responsibility away from the Python devs. Of course on Linux we can't even agree on where the list of trusted CAs gets put on disk, never mind any kind of more sophisticated policy decisions...
Posted Jun 5, 2017 1:10 UTC (Mon)
by tialaramex (subscriber, #21167)
[Link] (3 responses)
Arguably there is no such thing as "a valid certificate for faß.de"; the certificate would be for xn--fa-hia.de, and it's purely a presentation-layer decision to render this as faß.de. It certainly isn't correct to say "Oh, the user can type faß.de, we'll connect to the wrong machine, then give them a certificate error". That's not even a halfway acceptable solution.
There absolutely are CAs which will issue a certificate with a dnsName SAN for xn--fa-hia.de and then in CN they'll write faß.de because they can (the dnsName is deliberately defined with one of ASN.1's far too numerous sort-of ASCII encodings, so you can't write ß there, but CN is just arbitrary human-readable text...) However, checking CN for a Unicode version of the name is just compounding the original error, please don't do that either!
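The faß.de example is even trickier in Python than the thread suggests, because the stdlib's built-in `idna` codec implements IDNA 2003, whose nameprep step folds the German sharp s to "ss". It therefore cannot even produce the IDNA 2008 A-label xn--fa-hia.de that such a certificate names:

```python
# Python's built-in "idna" codec implements IDNA 2003, whose nameprep
# step folds the German sharp s to "ss" -- so the stdlib produces
# fass.de, never the IDNA 2008 A-label xn--fa-hia.de.
print("faß.de".encode("idna"))      # b'fass.de'

# Plain ASCII names round-trip as expected.
print("example.de".encode("idna"))  # b'example.de'
```

IDNA 2008 handling requires the third-party `idna` package; the stdlib gap is tracked as https://bugs.python.org/issue17305.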
Posted Jun 8, 2017 7:33 UTC (Thu)
by njs (subscriber, #40338)
[Link] (2 responses)
In conclusion, TLS is hard and software is hard and everything is terrible.
Posted Jun 8, 2017 22:32 UTC (Thu)
by tialaramex (subscriber, #21167)
[Link] (1 responses)
From the Web PKI side, bugs like this mean when we say to CAs "Don't do X" they point at the bug and say "We have to because of this bug". And so another year or six goes by without the problem fixed. Python being part of the problem not the solution is disappointing.
Posted Jun 11, 2017 8:26 UTC (Sun)
by njs (subscriber, #40338)
[Link]
I also just alerted Cory to the issue in the hopes that his new TLS library will avoid this problem... the Python ssl maintainer(s) is (are) certainly aware of it, but the stdlib ssl module is (like everything) pretty under-resourced, and with the Python release cycle and the py2/py3 split, getting this kind of complex change done can be really slow :-/
Posted Jun 6, 2017 3:00 UTC (Tue)
by flussence (guest, #85566)
[Link] (1 responses)
Gentoo's packaging of Mozilla's CA bundle is surprisingly opinionated - not only have they given the option to trust CACert (the only root that has OV/EV practices worth a damn) but they also blacklisted the evil Symantec/Wosign/StartCom certs far earlier than the browsers did.
It caused me some mild grief, e.g. Pidgin wouldn't connect to AIM any more because its entire SSL chain was rotten. Some workaround must be in place since it still uses Symantec certs.
Posted Jun 6, 2017 16:17 UTC (Tue)
by tialaramex (subscriber, #21167)
[Link]
I appreciate that CACert's processes may feel robust if you happen to know the core CACert people, most of us don't and never will, so what we see is just another flailing volunteer group. Ten years ago CACert looked like a reasonable way forward, but today it does not. Maybe if CACert had been in the game much earlier, say in 1998 not 2003 then they'd already have been included in key stores prior to Honest Achmed and the CA/B and so then they'd be _inside_ the tent making rules for newcomers, not outside desperately playing catch-up.
In terms of competence, I see basically the same sort of errors made by CACert as at Symantec, and I feel the same way. Yes, in principle you can take a bunch of tools and know-how and do whatever you want, issue whatever you want, and it will all work out fine. But you will very likely make lots of mistakes if you do that, so I _strongly_ recommend you instead put the effort into having machines doing just a handful of things very well, and then sit on your hands. At one point Symantec tried to create a custom tbsCertificate and in doing so they erroneously signed it, even though the _whole point_ of the exercise was not to sign anything, when you read transcripts of CACert trying to follow simple instructions for a non-standard procedure it looks much the same.
Posted Jun 2, 2017 16:07 UTC (Fri)
by tiran (guest, #94212)
[Link]
The lack of IDNA 2008 is: https://bugs.python.org/issue17305