
SQL injection vulnerabilities in PostgreSQL

May 31, 2006

This article was contributed by Jake Edge.

A recent urgent update to PostgreSQL vividly demonstrates the difficulties of validating user input that lie at the foundation of SQL injection attacks. Widely used techniques to escape characters in user input can still allow SQL injection when coupled with multibyte character encodings. While this problem was first discovered in PostgreSQL, today's security fix announcement for MySQL indicates a similar problem there as well.

As discussed in the LWN SQL injection article, inserting strings of user input into SQL queries can be hazardous. Many applications do little or no validation of strings entered by a user before dropping them into a query; this negligence can lead to a compromise of the entire database. Better-behaved programs attempt to escape various troublesome characters (typically single-quote and backslash), but because of multibyte encodings, holes can remain.

It is not just database clients that need to validate user input; as the first bug shows, the database server needs to validate as well. PostgreSQL allows the "\'" (backslash + single-quote) sequence to be used to represent a single-quote character in a query as well as the two single-quote character sequence ("''") that is the SQL standard. Unfortunately, the escaping code used by database clients often ignores the character encoding and just looks for bytes with a 0x27 ("'") value and replaces them with an escaped version. The security hole comes about because illegal multibyte character sequences can be used to enable quotes to slip past the escaping process. An example provided in the technical information describes how this can be done.

In the UTF8 encoding, the byte value 0xc8 introduces a two-byte character; the second byte must be within the range 0x80-0xbf. However, PostgreSQL would accept any value for the second byte and treat both bytes as a single character. A malicious user could enter "0xc8'text", which would be converted by the well-meaning client to "0xc8''text" (or "0xc8\'text"); the server would then treat the 0xc8' or 0xc8\ sequence as a single character, leaving an unescaped single-quote in the input, effectively injecting the attacker-supplied text.
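To make the mechanics concrete, here is a hypothetical sketch (not taken from PostgreSQL or any real client library) of the kind of byte-blind quote-doubling described above, showing the bytes such a client would hand to the server:

```java
import java.io.ByteArrayOutputStream;

public class NaiveEscapeDemo {
    // Hypothetical encoding-unaware escaper: doubles every 0x27 (')
    // byte it sees, without decoding the multibyte encoding first.
    static byte[] naiveEscape(byte[] input) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte b : input) {
            out.write(b);
            if (b == 0x27) out.write(0x27); // ' -> ''
        }
        return out.toByteArray();
    }

    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) sb.append(String.format("%02x ", b & 0xff));
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        // Attacker input: the invalid lead byte 0xc8 followed by a quote.
        byte[] attack = { (byte) 0xc8, 0x27, 't' };
        // The client dutifully doubles the quote, producing c8 27 27 74.
        // A server that swallows (c8 27) as a single "character" then
        // sees the second 27 as a live, unescaped quote.
        System.out.println(hex(naiveEscape(attack)));
    }
}
```

The client has done exactly what it was told, yet the server's lenient decoding undoes the escaping.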

The second issue stems from certain far-eastern encodings where the value 0x5c ("\") is a valid value for the second byte of a two-byte character. In the SJIS encoding for example, the two-byte sequence 0x95 0x5c is a valid character, but a client that is not encoding-aware may try to escape the 'backslash' that it sees by doubling it. Adding single-quotes into the mix provides a means for a SQL injection. "0x95 0x5c'text" could become "0x95 0x5c\''text", which effectively inserts an unescaped single-quote into the query. It is interesting to note that 0x27 ("'") is not a valid value for the second byte of a two-byte character and, if PostgreSQL had rigidly adhered to the SQL standard and only accepted "''" to escape single-quotes, this issue would not exist.
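The SJIS case can be sketched the same way; again this is illustrative code, not any real client's escaping routine, this time doubling both backslashes and quotes as an encoding-unaware client would:

```java
import java.io.ByteArrayOutputStream;

public class SjisEscapeDemo {
    // Hypothetical encoding-unaware escaper: doubles backslash (0x5c)
    // and single-quote (0x27) bytes, blind to multibyte characters.
    static byte[] naiveEscape(byte[] input) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte b : input) {
            out.write(b);
            if (b == 0x5c || b == 0x27) out.write(b);
        }
        return out.toByteArray();
    }

    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) sb.append(String.format("%02x ", b & 0xff));
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        // The valid SJIS character 0x95 0x5c, followed by a quote.
        byte[] attack = { (byte) 0x95, 0x5c, 0x27 };
        // Result: 95 5c 5c 27 27. An SJIS-aware server reads (95 5c) as
        // one character, (5c 27) as an escaped quote "\'", and the final
        // 27 as a live, unescaped quote.
        System.out.println(hex(naiveEscape(attack)));
    }
}
```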

There is a straightforward fix for the first problem: do not accept illegal multibyte character sequences and refuse to process queries that contain them. Unfortunately, the second problem is more complicated and there is no single simple fix on the database server side. If database clients did their escaping in an encoding-aware manner, this problem would not exist; expecting this from all clients is hopeless, however. The PostgreSQL developers chose to disallow "\'" for any encoding that allows embedded 0x5c characters. This closes the hole for all clients that use "''" to escape single-quotes but still allows for injections for clients that use "\'". This change is likely to break those clients altogether, however.
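The server-side fix amounts to a validation pass over the incoming bytes. The following is a simplified sketch handling only one- and two-byte UTF-8 sequences, not PostgreSQL's actual implementation:

```java
public class Utf8Check {
    // Simplified validity check: every continuation byte must satisfy
    // (b & 0xc0) == 0x80. Sequences longer than two bytes are rejected
    // outright here for brevity; a real validator would handle them.
    static boolean isValidUtf8(byte[] s) {
        int i = 0;
        while (i < s.length) {
            int b = s[i] & 0xff;
            if (b < 0x80) { i++; continue; }          // plain ASCII byte
            if (b >= 0xc2 && b <= 0xdf) {             // two-byte lead byte
                if (i + 1 >= s.length) return false;  // truncated sequence
                int c = s[i + 1] & 0xff;
                if ((c & 0xc0) != 0x80) return false; // bad continuation
                i += 2;
                continue;
            }
            return false; // longer sequences omitted in this sketch
        }
        return true;
    }

    public static void main(String[] args) {
        // 0xc8 0x81 is well-formed; 0xc8 0x27 is the attack from above.
        System.out.println(isValidUtf8(new byte[]{ (byte) 0xc8, (byte) 0x81 }));
        System.out.println(isValidUtf8(new byte[]{ (byte) 0xc8, 0x27 }));
    }
}
```

A server that rejects the second sequence never gives the smuggled quote a chance to take effect.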

Both of these problems could have been avoided by using prepared statements with placeholders (i.e. 'SELECT * FROM tbl WHERE id=?'). Even if the libraries did not implement the quoting correctly, the SQL engine would still not allow the parameter to be treated as anything but data for that particular spot in the query, thereby avoiding the injection. Another way to avoid this kind of problem is to use stored procedures. As these bugs show, it can be very difficult to appropriately filter and/or validate user input.



Preventing SQL injection with stored procedures

Posted Jun 1, 2006 3:53 UTC (Thu) by mrshiny (subscriber, #4266) [Link]

I'm curious as to how using stored procedures prevents SQL injection? In my experience you can create a call to a stored procedures just like a call to SQL:
String query = "{ call some_package.somefunc('" + arg + "'); }";
CallableStatement cs = connection.prepareCall(query);
cs.execute();
The above is just as vulnerable to sql injection as any normal SQL statement. Am I missing something?

Preventing SQL injection with stored procedures

Posted Jun 1, 2006 4:29 UTC (Thu) by xoddam (subscriber, #2322) [Link]

Your example prepares the procedure using untrusted input, so the
prepared query itself is untrustworthy.

Preparing stored queries without using untrusted input requires the use
of placeholders for the arguments, and passing the input *later* when the
query is executed. Then no untrusted input will ever be parsed as SQL so
there is no injection vulnerability.

Preventing SQL injection with stored procedures

Posted Jun 1, 2006 9:39 UTC (Thu) by nix (subscriber, #2304) [Link]

It's also more efficient to use prepared queries and (I think) easier to read.

So you win on all fronts.

Preventing SQL injection with stored procedures

Posted Jun 1, 2006 13:18 UTC (Thu) by mrshiny (subscriber, #4266) [Link]

Right, place-holders in the prepared query will prevent injection attacks. But the same functionality is available for normal queries, so I guess I'm not seeing how "using stored procedures" is advisable. Really it comes down to "use prepared queries/procedure-calls". A developer who doesn't understand "use prepared queries" can't be trusted to make the leap to prepared queries for procedure calls.

Preventing SQL injection with stored procedures

Posted Sep 27, 2007 18:50 UTC (Thu) by einhverfr (guest, #44407) [Link]

I know this is an old post, but there is one area where this can make a difference.

The stored procedure idea does not prevent sql injection. However, in combination with appropriate db permissions it could be used to arbitrarily restrict what a user can do in the database. If the db is locked down well enough, and all access is through stored procs, and if you use db native accounts, you may not have to worry about sql injection attacks in the application in the same way you would otherwise.

There are, however, two big caveats to this. The first is that if user-supplied input is used to create the stored procedure name, this could be exploitable as well. The second is that not all queries are parameterizable inside stored procedures on all databases. Hence you could have SQL injection *inside* your stored procedures. In some cases, you have just moved the issues of SQL injection tracking back into the stored procs.

SQL injection vulnerabilities in PostgreSQL

Posted Jun 1, 2006 8:15 UTC (Thu) by nim-nim (subscriber, #34454) [Link]

Both of these problems would have been avoided if there were a development culture where i18n is not an afterthought and software writers validated their code with something other than English ASCII.

The year-2000 bug was a joke; the we've-developed-in-english-ascii-then-turned-i18n-on bug will continue striking for years.

And the worst part is that people are writing code *today* which assumes one char=one byte, and language=english. Even with a clear example of the consequences of this attitude, your article manages to completely skip over this aspect.

i18n is *not* a translator problem.

(Another example of the way our software culture is profoundly conservative is the way GUI writers still think in terms of pixels and 75/96 dpi screens, while Dell and friends are madly shipping widescreen LCDs.)

afterthought i18n and the conservative software culture

Posted Jun 1, 2006 17:43 UTC (Thu) by rfunk (subscriber, #4054) [Link]

The problem is that the vast majority of Americans don't deal with
anything but English (and Anglicized spellings) in their everyday lives,
so i18n truly is an afterthought. Worse, most of them can't read or
write any other languages either.

As for a conservative software culture, part of it is developers who
don't (have time to?) keep up with current issues, but there's also a
strong culture of backward-compatibility. Enough people have old stuff
that developing for the latest and greatest isn't necessarily a good
idea. The trick is to be able to develop for both old and new at once,
and that's often difficult (requiring even more learning than for just
the new) or impossible.

afterthought i18n and the conservative software culture

Posted Jun 1, 2006 18:21 UTC (Thu) by nim-nim (subscriber, #34454) [Link]

So what? You don't need to speak other languages to make the effort to learn and understand their technical requirements. Developers have their software interact with hardware and software with much stranger and more complex needs.

The problem is 100% cultural, and I don't mean cultural in the sense Americans don't speak other languages, I mean cultural in the sense the software community collectively decided not to tackle some problems. The nationality of the software writer matters very little - they all code from the same textbooks which all present computers as manipulating strings of mono-byte characters. Likewise GUI writers all use the same reference material which assumes screen pixel density is a known stable variable. (Of course the fact a lot of these textbooks are written by people immersed in the American English-only worldview does not help.)

afterthought i18n and the conservative software culture

Posted Jun 2, 2006 21:16 UTC (Fri) by tjc (subscriber, #137) [Link]

> The problem is 100% cultural, and I don't mean cultural in the sense
> Americans don't speak other languages, I mean cultural in the sense the
> software community collectively decided not to tackle some problems.

It's mostly a motivation problem. The people to whom it matters most are apparently not motivated enough, or insufficient in number, to do something about it. The reason that most free/open source software only supports the ISO-Latin-1 character set is that that's what most developers use themselves.

Just wanting to be PC isn't very motivating for most people. The same goes for listening to other people bitch about what one ought to be doing with one's free time.

afterthought i18n and the conservative software culture

Posted Jun 3, 2006 8:50 UTC (Sat) by nim-nim (subscriber, #34454) [Link]

> Just wanting to be PC isn't very motivating for most people.

I don't see where the PC angle is when it ends up in security bugs (and this is not an isolated case). You're just confirming what I wrote.

afterthought i18n and the conservative software culture

Posted Jun 3, 2006 15:59 UTC (Sat) by tjc (subscriber, #137) [Link]

Yeah, you're right about the security aspect.

I was referring to the big political war that has dogged Unicode. It got ugly. It probably still is, but I don't seem to care anymore.

SQL injection vulnerabilities in PostgreSQL

Posted Jun 2, 2006 0:41 UTC (Fri) by dlang (✭ supporter ✭, #313) [Link]

frankly I want software that doesn't require i18n to be turned on. processing multi-byte characters is slower than processing single-byte characters (for a number of reasons). I don't want to pay that overhead for systems that don't need it (and face it, most systems really don't need it; that's why the world survived with ascii for so long)

as for the gui folks, it's not the pixels that are the issue, but the assumption that the gui software has the right to take the entire screen. if they didn't assume that they had the entire screen then the shape of that screen wouldn't matter

internationalisation

Posted Jun 2, 2006 2:33 UTC (Fri) by xoddam (subscriber, #2322) [Link]

> frankly I want software that doesn't require i18n to be turned on.

If the software supports internationalisation at all, then 'just ASCII
thanks' is one particular localisation, and there's nothing to switch
off. A little extra optimisation for this case wouldn't be a bad idea
though.

> processing multi-byte characters is slower then processing
> single byte characters (for a number of reasons).

UTF8 helps with this as long as the vast majority of your characters are
in the ASCII range. But UTF8 has its own minor performance headaches,
particularly in Eastern Asian locales where a 16-bit character is much
more convenient.

A significant annoyance and performance issue is when such things as
font-measuring have per-character primitive methods that take a UCS32
parameter, and the string must be expanded repeatedly to its individual
characters (or cached as an array of 32-bit values). Bringing the whole
UTF (8 or 16) string right down to the lowest level might eliminate some
of that overhead.

> that's why the world survived with ascii for so long

The world? For 'so long'? For how long exactly was the *American*
Standard the sole character set encoding in *any* non-English-language
environment? The earliest Japanese derivatives of ASCII began to be
*standardised* in 1969 (JIS C 6220), while the US computing community was
still worrying about converting between ASCII and EBCDIC.

SQL injection vulnerabilities in PostgreSQL

Posted Jun 2, 2006 7:45 UTC (Fri) by nim-nim (subscriber, #34454) [Link]

You would be right but for the internet.

When systems were not interconnected, or the interconnect was slow and limited, local encodings worked fine. Nowadays with dataflows all over the world local-encoding-only systems are the exception not the norm (show me an ascii-only system and I'm almost sure any serious investigation will find users frustrated by its encoding limitations)

And anyway, optimising for the ascii case when you end up turning i18n on anyway is wrong on so many levels (speed, security) that I won't expand on it.

Eliminating the problem

Posted Jun 1, 2006 8:28 UTC (Thu) by ncm (subscriber, #165) [Link]

Most injection holes are a result of trying to scrub user input by trying to escape characters from a list of troublemakers (or, worse, not bothering). If, instead, programs would discard (or, if necessary, escape) all characters *except* those in a known-good list, most of the subtleties would vanish. It's much better to eliminate a problem than to patch around it.
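A known-good-list scrubber along those lines might look like the following sketch; the allowed set here is purely illustrative, and any real choice would depend on the schema:

```java
public class AllowlistScrub {
    // Keep only characters from a known-good set; discard everything
    // else. Note that Character.isLetterOrDigit() is Unicode-aware,
    // so accented letters survive, though truly strict ASCII-only
    // allowlists would drop them (a trade-off discussed below).
    static String scrub(String input) {
        StringBuilder out = new StringBuilder();
        for (char c : input.toCharArray()) {
            if (Character.isLetterOrDigit(c) || c == ' ' || c == '_') {
                out.append(c);
            } // quotes, semicolons, dashes, etc. are silently dropped
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The quote, semicolon, and comment dashes all disappear.
        System.out.println(scrub("O'Neal; DROP TABLE users--"));
    }
}
```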

Eliminating the problem

Posted Jun 1, 2006 9:24 UTC (Thu) by smitty_one_each (subscriber, #28989) [Link]

Certainly the problem exists for multiple applications across arbitrary platforms. Could something like the LSB champion an unbork_string( some_string ) function that Does The Right Thing, and then just gently ridicule everyone to get on board?
The only winners in a piecemeal approach are the bad guys and the security consultants.

Eliminating the problem

Posted Jun 1, 2006 12:39 UTC (Thu) by jzbiciak (✭ supporter ✭, #5246) [Link]

We can't even get people to agree on strlcpy.... I can't imagine a more complex function getting traction easily.

Eliminating the problem

Posted Jun 1, 2006 13:27 UTC (Thu) by mrshiny (subscriber, #4266) [Link]

The thing is, SQL injection is trivial to prevent; simply use prepared statements with placeholders that are bound using an API. This improves the performance of the system AND increases security... Frankly I'm confused as to why you WOULDN'T use prepared queries.

Consider this example:

String sql = "select * from user_acct where username = '" + username + "' and password = '" + password + "'";
Statement stmt = connection.createStatement();
ResultSet rs = stmt.executeQuery(sql);

In this case the database will get a new statement every time a new user tries to log in to the system. The database usually caches compiled statements and execution plans to increase performance, but here each statement will look different. Also, it's vulnerable to the SQL injection where the password is ' or 1 = 1--.

Now consider the following example:

String sql = "select * from user_acct where username = ? and password = ?";
PreparedStatement stmt = connection.prepareStatement(sql);
stmt.setString(1, username);
stmt.setString(2, password);
stmt.executeQuery();

In this case, there is no chance for SQL injection, and furthermore the database can cache the SQL and execution plan on the server to increase performance, because for each user it is the same. Clearly anyone who DOESN'T use prepared statements is in need of some job training or a new career path.

Eliminating the problem

Posted Jun 1, 2006 14:31 UTC (Thu) by iabervon (subscriber, #722) [Link]

The thing I find even stranger is that it's pretty simple to write an equivalent for StringBuffer that has different methods for appending SQL text and constants, and automatically handles formatting for PreparedStatements. So:

SQLBuffer buffer = new SQLBuffer();
buffer.append("select * from user_acct where username = ").
       add(username).append(" and password = ").
       add(password);
PreparedStatement stmt = connection.prepareStatement(buffer.getSQL());
buffer.fill(stmt);
stmt.executeQuery();

That way, you don't have to worry about getting the variables mixed up, or dealing with the fact that you can't really trust the database driver to handle a java.util.Date.

The example you gave isn't as compact in this form, but that difference goes away if you've got queries where some clause is optional.

Eliminating the problem

Posted Jun 1, 2006 14:50 UTC (Thu) by mrshiny (subscriber, #4266) [Link]

I'd still feel better using a prepared statement, even if there are optional clauses in the statement. In such cases I normally do something like this:

List args = new ArrayList();
String sql = "select * from daily_revenue where transaction_type = ? ";
args.add(transType);
if (sinceLastLogin) {
  sql += " and trans_date >= ? "; // for lots of optional clauses, use StringBuffer
  args.add(lastLoginDate);
}

Then later on you just bind all the variables in the order they appeared in the list. Also you can use a var-args function (in Java 1.5 and other languages that support var-args) to automate things like date conversion; if the type of one of the objects in the args list is Date or Calendar or some other non-SQL-friendly type, you can convert it automatically.

Eliminating the problem

Posted Jun 1, 2006 15:12 UTC (Thu) by iabervon (subscriber, #722) [Link]

My version is using a prepared statement. My SQLBuffer contains a StringBuffer and a List, and SQLBuffer.add() appends a "?" to the buffer, and adds the argument to its list, which it goes through in fill() using the loop that you omitted from the end of your example. My version is really identical to yours, except that my SQLBuffer methods abstract the pattern that you're open-coding (and, therefore, it's harder to screw up).

Eliminating the problem

Posted Jun 2, 2006 12:05 UTC (Fri) by smitty_one_each (subscriber, #28989) [Link]

>Frankly I'm confused as to why you WOULDN'T use prepared queries.

Oh, the motives might break down along the traditional compiled/dynamic lines.
I like to have a single function that can transform the Request.Form into an arbitrary array of SQL statements, particularly for INSERT/UPDATE situations.
For generic text fields, I just replace ' with `, and I'm on my merry way. O`Neal never noticed, though I admit this could simply be "moving the problem".

Eliminating the problem

Posted Jun 2, 2006 19:02 UTC (Fri) by mrshiny (subscriber, #4266) [Link]

The only advantage, that I can think of, is that you can generate complete SQL statements ahead of time in one place, and later on execute them. However, if that is the pattern you wish to accomplish, it's trivial to wrap the generated string and the arguments to bind together in one object. Otherwise, I still don't see the problem... you can generate dynamic sql statements for prepared queries, and bind the parameters afterwards. Where I work we do this all the time; also another poster in this thread has even gone to the lengths of creating an SQL statement abstraction that generates the SQL and stores the parameters to bind in one step. It's easy and foolproof.

Eliminating the problem

Posted Jun 1, 2006 14:42 UTC (Thu) by jschrod (subscriber, #1646) [Link]

But this approach immediately leads to problems in an international context -- because most often it leads to the ban of all non-ASCII characters in names or addresses, as we have experienced so often in the past. But I live in Rödermark, and not in Rodermark or Roedermark, and I want to input that properly. The same holds surely for folks from China or Japan.

Nah, IMNSHO prepared queries with parameters are the only proper way to go.

Cheers, Joachim

Eliminating the problem

Posted Jun 2, 2006 2:45 UTC (Fri) by xoddam (subscriber, #2322) [Link]

> The same holds surely for folks from China or Japan.

Not to mention Iceland :-)

Eliminating the problem

Posted Jun 9, 2006 9:25 UTC (Fri) by aquasync (subscriber, #26654) [Link]

This shouldn't matter: the ö will be replaced with \ö, and then when evaluated as part of an SQL string, it should be turned back into ö (even if it was actually made up of multiple bytes).
That is provided that the escape policy is to replace \[\a-z]|(0-9){3} or whatever with the relevant unescaped thing, and otherwise to just copy the character verbatim into the output string.

Eliminating the problem

Posted Jun 9, 2006 9:03 UTC (Fri) by aquasync (subscriber, #26654) [Link]

Exactly, this seems a lot safer to me.
Perl's quotemeta function works in this way, (``all characters not matching "/[A-Za-z_0-9]/" will be preceded by a backslash...''), and provided SQL's string escapes work in a similar way, there should be no problems.
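A quotemeta-style escaper is easy to sketch (hypothetical code, not Perl's or any SQL library's implementation; the article's multibyte caveats still apply if the server decodes bytes differently than the client):

```java
public class QuoteMetaDemo {
    // Backslash-escape every character outside [A-Za-z_0-9], in the
    // spirit of Perl's quotemeta. Whether the result is safe depends
    // entirely on the server's unescaping rules matching this scheme.
    static String quotemeta(String s) {
        StringBuilder out = new StringBuilder();
        for (char c : s.toCharArray()) {
            boolean safe = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
                        || (c >= '0' && c <= '9') || c == '_';
            if (!safe) out.append('\\');
            out.append(c);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The single quote gets a backslash; letters pass through.
        System.out.println(quotemeta("O'Neal"));
    }
}
```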

backslashes

Posted Jun 1, 2006 17:51 UTC (Thu) by rfunk (subscriber, #4054) [Link]

The article mentions that using backslashes as escape characters
exacerbates the problem. Unfortunately, a major web-development language
(PHP) encourages using backslashes as escape characters, with its
addslashes() function and magic_quotes_gpc=on default.

The fact that these misfeatures may be deprecated or disrecommended now
doesn't help much, since there's so much old documentation and advice out
there, and so many PHP programmers who barely even understand what
they're copying let alone the concept of SQL injection or multibyte
characters.

Prepared statement syntax... is database dependent

Posted Jun 4, 2006 6:12 UTC (Sun) by dps (subscriber, #5725) [Link]

Assuming I read the PostgreSQL documentation straight, you should change that ? into $1 when using PostgreSQL (or Oracle, which I believe also allows you to use $foo).

MySQL and ODBC want a ? as shown in the article.

If you are targeting the same query at both MySQL and PostgreSQL, either doing utf8-aware string escaping, not using a multibyte character set (e.g. latin1), or implementing per-backend query syntax conversion is required. My code converts $<number> to ? because it makes the string shorter and is therefore easier to implement.

Of course the PostgreSQL fix will test your exception handling capabilities when that query containing invalid utf-8 gets rejected for this reason.

Also note that the 2nd (and 3rd, 4th, 5th and 6th) bytes in UTF-8 must satisfy ((v & 0xc0) == 0x80). 0xc8 0x81 is valid UTF-8. 0xc8 0x5c and 0xc8 0xff are not valid UTF-8. Incidentally, 0xfe and 0xff *never* appear in valid UTF-8.

SQL injection vulnerabilities in PostgreSQL

Posted Jun 8, 2006 8:57 UTC (Thu) by philips (guest, #937) [Link]

UTF-8 was specifically designed to avoid such problems: all non-ASCII characters are represented using only non-ASCII bytes. And conveniently, all standard control symbols - like single quote, double quote, slash, backslash, percent, space - are in the ASCII range.

From all my long experience, UTF-8 can be very inconvenient to handle, but still it has saved (and still saves) me from many internationalization headaches: I have to live permanently with three locales.

Copyright © 2006, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds