May 31, 2006
This article was contributed by Jake Edge.
A recent urgent update to
PostgreSQL vividly demonstrates the problems with validating user input
that are the foundation of SQL injection attacks. Widely used techniques
to escape characters in user input can still allow SQL injection when
coupled with multibyte character encodings. While this problem was first
discovered in PostgreSQL, today's security fix
announcement for MySQL indicates
a similar problem there as well.
As discussed in the LWN SQL injection
article, inserting strings
of user input into SQL queries can be hazardous. Many applications
do little or no validation of strings entered by a user before dropping
them into a query; this negligence can
lead to a compromise of the entire database. Better behaved programs
attempt to escape various troublesome characters (typically single-quote
and backslash), but because of the multibyte-encoding problem, problems can
remain.
It is not just database clients that need to validate user input, the
database server needs to validate as well as the first bug shows.
PostgreSQL allows the "\'" (backslash + single-quote) sequence to be used to
represent a single-quote character in a query as well as the two single-quote
character sequence ("''") that is the SQL standard.
Unfortunately, the escaping code
used by database clients often ignores the character encoding and just looks
for bytes with a 0x27 ("'") value and replaces them with an
escaped version. The security hole comes about because illegal multibyte
character sequences can be used to enable quotes to slip past the
escaping process. An example provided in the
technical information describes how this can be done.
In the UTF8 encoding, the byte value 0xc8 introduces a two-byte
character; the second byte must be within the range 0xa0-0xff.
However, PostgreSQL would accept any value for the second byte and treat
both bytes as a single character. A malicious user could enter
"0xc8'text", which would be converted by the well meaning
client to "0xc8''text" (or "0xc8\'text"); the
server would then treat the
0xc8' or 0xc8\ sequence as a single character, leaving an
unescaped single-quote in the input, effectively injecting the
attacker-supplied text.
The second issue stems from certain far-eastern encodings where the value
0x5c ("\") is a valid value for the second
byte of a two-byte character. In the SJIS encoding for example,
the two-byte sequence 0x95 0x5c
is a valid character, but a client that is not encoding-aware may try
to escape the 'backslash' that it sees by doubling it. Adding single-quotes
into the mix provides a means for a SQL injection.
"0x95 0x5c'text" could become "0x95 0x5c\''text", which
effectively inserts an unescaped single-quote into the query.
It is interesting to note that 0x27 ("'") is not a valid value for the
second byte of a two-byte character and, if PostgreSQL had rigidly adhered
to the SQL standard and only accepted "''" to escape single-quotes, this
issue would not exist.
There is a straightforward fix for the first problem: do not accept illegal
multibyte character sequences and refuse to process queries that contain
them. Unfortunately, the second problem is more complicated and there is
no single simple fix on the database server side. If database clients
did their escaping in an encoding aware manner, this problem would not
exist; expecting this from all clients is hopeless, however. The PostgreSQL
developers chose to disallow "\'" for any encoding that allows embedded
0x5c characters. This closes the hole for all clients that
use "''" to escape single-quotes but still allows for injections for clients
that use "\'". This change is likely to break those clients
altogether, however.
Both of these problems could have been avoided by using
prepared statements with
placeholders (i.e. 'SELECT * FROM tbl WHERE id=?'). Even if the libraries did
not implement the quoting correctly, the SQL engine would still not allow
the parameter to be treated as anything but data for that particular
spot in the query, thereby avoiding the injection. Another way to
avoid this kind of problem is to use stored procedures. As these
bugs show, it can be very difficult to appropriately filter and/or
validate user input.
(
Log in to post comments)