The HTTPS bicycle attack

By Jake Edge
January 20, 2016

While HTTPS is an encrypted protocol, it does leak a certain amount of information about the communication—the source and destination addresses, at a minimum. But a newly reported technique can actually "see" inside of the encrypted data without requiring the key or cracking the encryption. By using the length information inherent in the protocol, some simple math can be done to determine the length of some portions of the encrypted data, which can be used to figure out things like password length. It only requires a recording of the packets in a session of interest, along with a bit of information about the target, which means it can be performed days or months later.

In a paper, Guido Vranken described the weakness that he has dubbed the "HTTPS bicycle attack". The name comes from the idea that wrapping a bicycle as a gift doesn't really hide what is inside the package. Similarly, HTTPS doesn't entirely obscure the contents of its encrypted payloads.

Vranken concentrates on stream ciphers in the paper, noting that they have a 1:1 relationship between the plain text and the cipher text; adding one byte to the plain text results in an additional encrypted byte in the HTTPS payload. The attack only considers messages that have the "application data" content type (0x17 in the first byte) and uses the length information stored at the fourth and fifth bytes of the message. From that, coupled with a little detective work, things like the length of a password submitted to the site can be derived.
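As a rough illustration of the length extraction described above, the following Python sketch walks a reassembled TLS byte stream and collects the length field of every application-data record. The helper names (application_data_lengths, make_record) and the sample record sizes are made up for this example, and the "capture" is synthetic rather than real traffic. Note that the record length also covers any MAC or authentication tag, which adds a constant overhead per record.

```python
import struct

APPLICATION_DATA = 0x17  # TLS record content type for application data
HEADER_LEN = 5           # type (1 byte) + version (2 bytes) + length (2 bytes)


def application_data_lengths(stream: bytes) -> list[int]:
    """Return the length field of each application-data record in `stream`."""
    lengths = []
    offset = 0
    while offset + HEADER_LEN <= len(stream):
        # The length lives in the fourth and fifth bytes, in network byte order.
        content_type, _version, length = struct.unpack_from("!BHH", stream, offset)
        if content_type == APPLICATION_DATA:
            lengths.append(length)
        offset += HEADER_LEN + length
    return lengths


def make_record(content_type: int, payload: bytes) -> bytes:
    """Build a fake TLS 1.2 record; the payload is filler, not real ciphertext."""
    return struct.pack("!BHH", content_type, 0x0303, len(payload)) + payload


# Synthetic capture: one handshake record followed by two application-data records.
capture = (
    make_record(0x16, b"\x00" * 40)
    + make_record(0x17, b"\xaa" * 517)
    + make_record(0x17, b"\xbb" * 1203)
)
print(application_data_lengths(capture))  # prints [517, 1203]
```

Nothing in this sketch touches the key or the plain text; the lengths are read entirely from the unencrypted record headers.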

The bicycle technique will be most effective for targeted attacks, where an eavesdropper can record the traffic to and from a host of interest. In particular, the "user agent" header being sent by the browser (or, really, its length) is helpful, though not necessarily required. It can be captured, along with other standard headers sent by the user's browser, from a regular HTTP request to any site. There may be other unknown headers in the HTTPS requests, but their length can be deduced from other encrypted requests as Vranken has shown.

The other major piece of the puzzle is that the attacker must also record their own session that exercises the web application in the same way that the victim has. Because they can decode their own traffic, the attacker gains the knowledge of the contents and lengths of various resources requested in the process. That allows the attacker to figure out which HTTPS messages correspond to the ones they are interested in.

For example, if a particular login page consists of half a dozen different resources (e.g. images, style sheets), each with a distinct length, it is relatively straightforward to isolate that part of the encrypted stream even if the requests are handled in a different order. In addition, the analysis can ignore any constant difference in the sizes of the requests that comes from additional or different headers that the victim's browser sends. (Vranken used Pearson correlation to match a WordPress login page and its resources in the paper.)
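The matching step might look something like the sketch below. It is not Vranken's code: it simply slides the attacker's reference lengths along the victim's observed lengths and keeps the window with the highest Pearson correlation coefficient. All of the numbers are invented, and the simplified version assumes the requests arrive in the same order in both sessions. Because the coefficient is unaffected by adding a constant to every value, a fixed per-request header difference (37 bytes here) does not disturb the match.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # correlation is undefined for constant data; treat as no match
    return cov / sqrt(var_x * var_y)


def best_window(reference, observed):
    """Return (start, r) for the window of `observed` that best matches `reference`."""
    width = len(reference)
    best = (0, -1.0)
    for start in range(len(observed) - width + 1):
        r = pearson(reference, observed[start:start + width])
        if r > best[1]:
            best = (start, r)
    return best


# Request lengths from the attacker's own visit to the login page (invented).
reference = [412, 388, 455, 401, 397, 430]
# Lengths seen in the victim's capture: the same six requests, each 37 bytes
# longer because of extra headers, surrounded by unrelated traffic.
observed = [510, 620, 449, 425, 492, 438, 434, 467, 580, 505]

start, r = best_window(reference, observed)
print(start)  # prints 2
```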

Once the messages of interest are identified, the request that sends the login credentials is scrutinized. Its length will consist of a mixture of known headers, unknown headers, and the actual form parameters that are being submitted. The length of the unknown headers can be derived from the other requests since the attacker knows the lengths they recorded from their own session. The difference between those other requests and what the attacker recorded can be used to adjust the length of the authentication message, which just leaves the length of the form parameters.

The login credentials consist of both a username and password, of course, so all of the analysis only gives the combined length of the two. That, again, is where targeting comes in. In general, finding out the username for a target is not that difficult. Subtracting its length gives the attacker the length of the target's password.
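The arithmetic in the last two paragraphs can be sketched as follows. The function names and every number are hypothetical, and the sketch assumes the credentials contain only characters that are not escaped in the form encoding. The constant header overhead is estimated from matched resource requests, then subtracted from the victim's login request along with the fixed portion the attacker measured in their own session.

```python
from statistics import median


def unknown_header_overhead(own_lengths, victim_lengths):
    """Estimate the constant extra bytes in each of the victim's requests."""
    diffs = [v - o for o, v in zip(own_lengths, victim_lengths)]
    return median(diffs)


def password_length(victim_login_len, own_login_len, own_user, own_password,
                    overhead, victim_user):
    """Derive the victim's password length from request lengths alone."""
    own_credentials = len(own_user) + len(own_password)
    # Everything in the attacker's own login request except the credentials.
    fixed_part = own_login_len - own_credentials
    victim_credentials = victim_login_len - overhead - fixed_part
    return victim_credentials - len(victim_user)


# Matched resource requests: attacker's session versus the victim's capture.
overhead = unknown_header_overhead([412, 388, 455], [449, 425, 492])

length = password_length(
    victim_login_len=680,    # observed length of the victim's login request
    own_login_len=640,       # attacker's own login request
    own_user="attacker",
    own_password="hunter2",
    overhead=overhead,
    victim_user="alice",     # username learned by other means
)
print(length)  # prints 13
```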

That may seem like a fair amount of work just to get the length of the password, but that can be used in various ways to potentially compromise the account (e.g. brute force, dictionary attacks). In addition, Vranken showed several other ways that the length of a string in a web request or response (e.g. geographic coordinates, IP addresses) might be used to peer inside the encrypted data to extract useful information.

Vranken offered some suggestions for mitigating the problem. Using JavaScript to hash the password (using SHA-256, say) on the client side would be one way to do that, since all passwords would hash to the same length. That would also mean that the server never has access to the plain-text password. While that would be advantageous in some ways, it would prevent the server from validating the password (e.g. that it must contain letters and digits), which might be undesirable.
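The length property behind the hashing suggestion is easy to demonstrate. The article's suggestion is JavaScript running in the browser; the Python below is only a sketch of why it works, with a made-up function name and sample passwords. It does not address salting or how the server would store and compare the value.

```python
import hashlib


def hashed_field(password: str) -> str:
    """Return the hex-encoded SHA-256 digest that would be sent in place of the password."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


for pw in ["a", "hunter2", "correct horse battery staple"]:
    # The submitted field is always 64 hex characters, whatever the input length.
    print(len(pw), len(hashed_field(pw)))
```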

Padding the password is another option, though there are some potential pitfalls there. Ensuring that the browser does not strip the padding characters is obviously essential. Variable-length padding seems attractive, but will actually leak information as well. Vranken recommended using the ASCII NUL ("\0") character for padding, then hexadecimal-encoding the password plus padding into a string to be sent to the server.
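A minimal sketch of the fixed-length padding idea follows, assuming a hypothetical 64-byte maximum and passwords that contain no NUL bytes of their own. The function names are invented for the example. Hex encoding sidesteps both the stripping of padding characters and the variable-length escaping of special characters in form data.

```python
PAD_TO = 64  # hypothetical maximum password length, in bytes


def pad_and_hex(password: str, pad_to: int = PAD_TO) -> str:
    """NUL-pad the UTF-8 password to a fixed size, then hex-encode it."""
    raw = password.encode("utf-8")
    if len(raw) > pad_to:
        raise ValueError("password longer than the fixed padded size")
    return raw.ljust(pad_to, b"\x00").hex()


def unpad(field: str) -> str:
    """Server side: decode the hex and strip the trailing NUL padding."""
    return bytes.fromhex(field).rstrip(b"\x00").decode("utf-8")


for pw in ["a", "hunter2", "pässword with spaces & symbols"]:
    field = pad_and_hex(pw)
    # Every submitted field is exactly 2 * PAD_TO characters long.
    print(len(field), unpad(field) == pw)
```

Unlike the hashing approach, the server still recovers the plain-text password here, so it can continue to apply its own validation rules.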

This attack is another reminder that encrypted communication is not necessarily a panacea. There are certainly government security agencies that have tons of HTTPS traffic stored that could be used to target a variety of web applications for a subject of interest. Placing the length parameter in the unencrypted portion of the message certainly helps the attacker here; if the message boundaries were obscured, this kind of attack would be more difficult, at a minimum.

The HTTPS bicycle attack

Posted Jan 21, 2016 8:21 UTC (Thu) by dgm (subscriber, #49227) [Link] (4 responses)

> The length of the unknown headers can be derived from the other requests since the attacker knows the lengths they recorded from their own session.

This is a flaw in the technique that can be used to easily subvert it. The trick is having the login.php form include random length garbage in the headers. I humbly propose a "X-Bicycle-Box" (or simply "Bicycle-Box") header for that.

The HTTPS bicycle attack

Posted Jan 21, 2016 8:33 UTC (Thu) by johill (subscriber, #25196) [Link]

I'm not sure it should be in the header; you want it also on the way back when the data is POSTed, so it'd have to either be a random-length cookie, or be part of the <form>.

The HTTPS bicycle attack

Posted Jan 21, 2016 9:58 UTC (Thu) by mina86 (guest, #68442) [Link] (1 responses)

As the article describes, unknown headers can be ignored by the attacker so that doesn't seem like a valid protection, but a _pad form field filled with letter a such that len(login) + len(password) + len(_pad) is constant might just work.

Then again, the POST data encoding is variable-length so this may still leak presence or absence of some special characters so padding and hex encoding seems like the best option.

The HTTPS bicycle attack

Posted Jan 26, 2016 16:06 UTC (Tue) by robbe (guest, #16131) [Link]

> unknown headers can be ignored by the attacker

As I understood it, this only holds true as long as they have constant length. A variable length header, as proposed by dgm, may mitigate the issue somewhat.

The HTTPS bicycle attack

Posted Jan 31, 2016 2:31 UTC (Sun) by jimparis (guest, #38647) [Link]

> I humbly propose a "X-Bicycle-Box" (or simply "Bicycle-Box") header for that.

Or a perfect opportunity to use "X-Bikeshed".. :)

The HTTPS bicycle attack

Posted Jan 21, 2016 10:28 UTC (Thu) by epa (subscriber, #39769) [Link] (4 responses)

Do small https requests need to be automatically padded with random data of random length?

The HTTPS bicycle attack

Posted Jan 21, 2016 11:42 UTC (Thu) by Lekensteyn (guest, #99903) [Link] (3 responses)

I would say no, do not add padding to the HTTP request. The concern from the report is only valid for stream ciphers. When block ciphers are in use, then you cannot infer the exact plaintext length due to the included padding (for CBC modes at least, see https://tools.ietf.org/html/rfc5246#section-6.2.3).

The application layer should not be unnecessarily complicated with such ad hoc workarounds; instead, fix the problem at the TLS layer: avoid stream ciphers if you are concerned about leaking the length.

The HTTPS bicycle attack

Posted Jan 21, 2016 12:15 UTC (Thu) by epa (subscriber, #39769) [Link]

I didn't mean in the application layer, no.

The HTTPS bicycle attack

Posted Jan 21, 2016 18:29 UTC (Thu) by hkario (subscriber, #94864) [Link] (1 responses)

AES-GCM is a stream cipher.

The solution is to pad the data in the application layer. It's impossible for the TLS layer to infer which data needs to be padded, and which not.

But you need to make sure that len(password_field) + len(padding) is constant. No, making the length of the padding random is not the solution - that just requires the attacker to capture more sessions. See the Lucky 13 mitigation, especially in the Amazon s2n library, and how it wasn't sufficient.

Also, this attack is hardly novel - all TLS specifications since 1.0 warned about traffic analysis revealing the lengths of individual records. TLS is not magic pixie dust that makes everything secure once applied.

The HTTPS bicycle attack

Posted Jan 22, 2016 10:02 UTC (Fri) by epa (subscriber, #39769) [Link]

> It's impossible for the TLS layer to infer which data needs to be padded, and which not.

That is surprising (to me as a non-cryptographer). I had expected it to have provision for adding some kind of chaff at the low level, if only to pad tiny bits of data to a minimum packet size or minimum block size.

> No making length of padding random is not the solution - that just requires the attacker to capture more sessions.

This is true, it's only a mitigation, not a complete fix.

I agree that padding passwords to a constant length in the application layer is the right answer. I was just wondering if there's something TLS could do to mitigate the problem for applications which aren't so carefully written (which in practice will always be the majority of them).

The HTTPS bicycle attack

Posted Jan 21, 2016 12:00 UTC (Thu) by meskio (guest, #100774) [Link] (1 responses)

> Using JavaScript to hash the password (using SHA-256, say) on the client side would be one way to do that, since all passwords would hash to the same length.

I'm surprised they recommend a hash instead of SRP. Once you put some JavaScript in the mix, why not do SRP, so that a MitM can't reuse your hash to log in again?

The HTTPS bicycle attack

Posted Jan 22, 2016 16:28 UTC (Fri) by chfisher (subscriber, #106449) [Link]

But the SHA-256 hash is then encrypted by the stream cipher, so to reuse the hash, the attacker has to have broken the cipher, and if they have done that, it's game over anyway.

The HTTPS bicycle attack

Posted Jan 26, 2016 16:50 UTC (Tue) by jhoblitt (subscriber, #77733) [Link]

One could argue that doing away with brain-dead server side password validation rules would be a feature and not a bug...

The HTTPS bicycle attack

Posted Jan 28, 2016 11:20 UTC (Thu) by ssokolow (guest, #94568) [Link]

Yet another reason to use a password manager and generated passwords. Knowing the length doesn't really help much if it has no relation to its content.