with malice aforethought (Re: Unicode cheatsheet for Perl)

From:  Tom Christiansen <tchrist-AT-perl.com>
To:  Christian Hansen <christian.hansen-AT-mac.com>
Subject:  with malice aforethought (Re: Unicode cheatsheet for Perl)
Date:  Sun, 26 Feb 2012 12:22:27 -0700
Message-ID:  <8662.1330284147@chthon>
Cc:  Leon Timmermans <fawaka-AT-gmail.com>, Karl Williamson <public-AT-khwilliamson.com>, Perl5 Porters Mailing List <perl5-porters-AT-perl.org>, Jarkko Hietaniemi <jhi-AT-iki.fi>, chansen-AT-cpan.org

Christian Hansen <christian.hansen@mac.com> wrote
   on Tue, 21 Feb 2012 02:07:08 +0100:

>>> I would love for this to happen, I have advocated this on #p5p several
>>> times, but there is always the battle of "backwards compatibility
>>> disease". About 10 months ago I reported a security issue regarding the
>>> relaxed UTF-8 implementation (still undisclosed and still exploitable)
>>> on the perl security mailing list.

Then we are currently in a security-through-obscurity situation, wherein
only overall ignorance of an exploit "protects" us.  That's not protection;
it's a vulnerability.  Would you estimate the vulnerability is severe
enough that in this particular case we should consider issuing patches
for old releases, such as a 5.12.5 or a 5.10.2?

>> There is absolutely no need to remain compatible with security-related
>> bugs, and every reason not to.  Indeed, security is the only thing that
>> we ever issue patches to releases that are past their end-of-life support.

> I lack the political skills to make this happen, but I'm more than willing
> to provide the proper UTF-8 implementation for this (as defined by
> Unicode/ISO/IEC 10646:2011); we could always discuss the need for the
> invented meaning of relaxed. During my years as a professional programmer
> for several high-profile financial institutions in Sweden, I have only
> encountered ill-formed UTF-8 through malicious attempts or clients that
> thought that they were sending UTF-8 but were using ISO-8859-1. That's my
> experience; perhaps yours looks different?

My own experience is of finding the wrong encoding used by accident, not by
malicious intent.  The situation you mention is therefore outside of my own
experience, which makes me all the more concerned about it.  I have gigabytes
of corrupt data because of Java having the wrong defaults for what to do
with wrong encodings.  It was a design mistake, but they locked themselves
into it forever and everyone keeps paying for that blunder.  Let's not
mimic their bad decisions.  Let's fix ours.

The thing I don't want is to have to tell people that they cannot trust
perl -C, that they cannot trust PERL_UNICODE, that they cannot trust use
utf8, that they cannot trust use open, that they cannot trust binmode,
that they cannot trust :encoding(UTF-8), and that the only thing they can
trust is laborious and error-prone manual encoding/decoding with FB_CROAK.
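
For reference, the manual route mentioned above can be sketched roughly as
follows.  This is only an illustration, not a recommended implementation:
the helper name decode_utf8_strict and the sample octets are made up for
this example.  It relies on the Encode module's documented distinction
between the strict "UTF-8" encoding name and the lax "utf8" one, and on
FB_CROAK to turn ill-formed input into an exception.

```perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK LEAVE_SRC);

# Hypothetical helper: decode octets as strict UTF-8.
# Returns (text, undef) on success or (undef, error) on ill-formed input.
sub decode_utf8_strict {
    my ($octets) = @_;
    # "UTF-8" (with the hyphen) selects Encode's strict decoder.
    # FB_CROAK makes it die on malformed data; LEAVE_SRC keeps $octets intact.
    my $text = eval { decode('UTF-8', $octets, FB_CROAK | LEAVE_SRC) };
    return (undef, $@) unless defined $text;
    return ($text, undef);
}

# Sample inputs chosen for illustration only.
my @samples = (
    [ 'well-formed e-acute'     => "\xC3\xA9" ],
    [ 'ISO-8859-1 e-acute'      => "\xE9"     ],
    [ 'overlong encoding of /'  => "\xC0\xAF" ],
);

for my $sample (@samples) {
    my ($label, $octets) = @$sample;
    my ($text, $err) = decode_utf8_strict($octets);
    printf "%-26s %s\n", $label, defined $text ? 'accepted' : 'rejected';
}
```

Wrapping the call in eval keeps the failure local, so the caller decides
whether to reject the input, log it, or abort.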

If that position is nonetheless correct, it drastically needs to be fixed.
Christian, I don't know what political skills you allude to as needed to
make this happen.  Political skills to achieve a consensus that backwards
compatibility with previous behavior known to be wrong is undesirable?

It seems to me that Python went through a transition where encoding-decoding
errors changed from some sort of non-fatal to proper exceptions.  I don't know
what sort of conniptions they experienced there, since it's not a backwards-
compatible change.  But it doesn't have to be b-c, and probably shouldn't be.
Jarkko is right.

It's better to fix bugs than to document them, and it's better to document them
than not.  Right now I'm very hazy on the real status of all this stuff, and I
am very uncomfortable with the idea of relentlessly charging ahead toward a
release like a freight train with no brakes.

Absolutely nothing depends upon any particular release date, but quite a bit
depends on correct behavior, especially if it is security-related.  I know which
one of those *I* consider immeasurably more important, but Aristotle appears to
be of the opposite opinion.  Is this the "political will" problem you mention?

--tom

Copyright © 2012, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds