The unstoppable Perl release train?
Posted Mar 1, 2012 8:04 UTC (Thu) by Los__D (guest, #15263)
In reply to: The unstoppable Perl release train? by autarch
Parent article: The unstoppable Perl release train?
I assumed that the security issue was with a new way of handling UTF-8. If the boilerplate code needed for security is nothing new, then I agree that the release should go ahead, as it is already out there.
Encoding-related vulnerabilities
Posted Mar 1, 2012 12:36 UTC (Thu) by tialaramex (subscriber, #21167)
BUT, I do know about UTF-8 and the usual routes for bad handling of text encodings to open up vulnerabilities are stuff like:
1. Mistakenly accepting non-canonical encodings as an end run around character restrictions. For example, the forward slash '/' in Unix pathnames is U+002F, correctly expressed in UTF-8 as the single byte 0x2F, the same as ASCII. But the UTF-8 byte sequence 0xC0 0xAF (if I did my math right) also decodes to U+002F. This "overlong sequence" is prohibited by the UTF-8 standards, but a cheap-and-cheerful decoder might allow it anyway. The bytes 0xC0 and 0xAF don't match 0x2F, so it's possible you can sneak a leading forward slash past a routine intended to filter them out.
2. Error-recovery routines that mangle strings in ways useful to an attacker. The Unicode specifications explain that if you can't or won't report an encoding error (e.g. return an error code, throw an exception) during processing, you should inject the reserved codepoint U+FFFD "replacement character" in place of each code unit you couldn't handle. But it's much /simpler/ to just discard the erroneous bytes. Doing that allows attackers to reconstruct a string that would have been blacklisted earlier in processing (e.g. sneaking the word "escape" past a filter by injecting a nonsense byte between "esc" and "ape").
3. Dumb operations leading to surprising errors. A sixteen-byte ASCII string can always be safely divided into two eight-byte ASCII strings; the resulting strings may not make sense, but they're legal, and you never get an error from trying. The same is not true in UTF-8: strings may only be correctly cut at the boundary between codepoints, and dividing them anywhere else may produce illegal strings. You may get an unexpected error, or the strings may not be eight bytes long as expected. Code that's handled everything thrown at it in ASCII suddenly "dies" when attackers send gibberish that's alleged to be UTF-8 instead.
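The three routes above can be sketched with Python's UTF-8 codec, used here purely as a stand-in for illustration (the Perl internals under discussion are not shown, and the hand-rolled "naive" decoder is a hypothetical example of a sloppy implementation):

```python
# 1. Overlong encodings: a strict decoder rejects the two-byte sequence
#    0xC0 0xAF, which would otherwise smuggle in U+002F '/'.
overlong = b"\xc0\xaf"
try:
    overlong.decode("utf-8")
except UnicodeDecodeError:
    print("overlong sequence rejected")

# A naive decoder that just masks and shifts the payload bits of a
# two-byte sequence, without checking for overlong forms, accepts it:
smuggled = chr(((overlong[0] & 0x1F) << 6) | (overlong[1] & 0x3F))
print(smuggled == "/")  # True: the forward slash sneaks through

# 2. Error recovery: substituting U+FFFD keeps the damage visible,
#    while silently discarding bad bytes reassembles "escape".
tainted = b"esc\xffape"
print(tainted.decode("utf-8", errors="replace"))  # esc\ufffdape
print(tainted.decode("utf-8", errors="ignore"))   # escape -- filter bypassed

# 3. Byte-oriented slicing: cutting between the bytes of a multi-byte
#    codepoint leaves a string that is no longer legal UTF-8.
data = ("a" * 7 + "\u00e9").encode("utf-8")  # 9 bytes: 7 ASCII + 2-byte e-acute
try:
    data[:8].decode("utf-8")
except UnicodeDecodeError:
    print("cut mid-codepoint: no longer valid UTF-8")
```

Note that route 2 is exactly why the Unicode recommendation is to replace, not delete: with `errors="ignore"` the attacker's nonsense byte vanishes and the forbidden word reappears intact.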
Of course Perl is always innovating so it's possible they've genuinely found a new and exciting way to screw up, but it's more likely to be something like the above.
Encoding-related vulnerabilities
Posted Mar 1, 2012 17:38 UTC (Thu) by erwbgy (subscriber, #4104)
Thank you. Informative comments like this are what make reading LWN such a pleasure.