Amateurs
Posted Oct 29, 2014 22:48 UTC (Wed) by HelloWorld (guest, #56129) [Link]
Amateurs
Posted Oct 29, 2014 23:18 UTC (Wed) by chirlu (subscriber, #89906) [Link]
Amateurs
Posted Oct 29, 2014 23:27 UTC (Wed) by dskoll (subscriber, #1630) [Link]
The bug is indeed non-obvious and it's a direct consequence of the horror that is PHP.
Any real scripting language (Perl, Python) distinguishes between arrays and hashes. PHP conflates the two, yielding this bug.
Amateurs
Posted Oct 30, 2014 13:46 UTC (Thu) by oldtomas (guest, #72579) [Link]
Like ummm... Lua?
No -- I don't particularly like PHP. But I think the main problem PHP has is a *culture problem*.
To illustrate, my most recent war story: I inherited a PHP project here (the users _love_ it, which is important). It had one embedded copy of PDO[1] inside, so I thought "oh, good, at least that). Came sysadmins and told me to do the database on port 3307 instead of MySQL's[2] standard 3306. "No prob" I thought, "just change the DSN in the configuration file".
By now you'll guess: it didn't work. Some debugging later (and thanks to a tip-off from some PHP admin here) I discovered, to my dismay, that the app was taking apart the DSN *with regular expressions* and doing a mysql_connect itself.
Note that no language (mis)feature went into this. It's this attitude of "getting shit done, no matter what".
At this point, if someone loves PHP and wants to do a service to the language and the environment, I think the best they can do is to comb the internet in search for "bad patterns" and set them straight. To me, *this* is the language/environment's biggest liability.
[1] A database abstraction library, half-way to an ORM. Kinda DBI, for you Perl folks. Most importantly here: it allows one to do SQL without sprintf
[2] Another world of fail in itself, with high affinity to PHP.
[3] DSN is a short descriptor on how to connect to a database, which is a kind of informal standard (inspired, AFAIK on Microsoft's ODBC)
Amateurs
Posted Oct 30, 2014 14:22 UTC (Thu) by tialaramex (subscriber, #21167) [Link]
You say it's not a language problem, but in my experience a leading cause of code that tries to pull DSNs apart is libraries that don't take a DSN but instead present only a bunch of separate parameters like hostname and port number. In an expansive sense (considering the language as encompassing also libraries for common things like database access) this is a language problem.
Amateurs
Posted Oct 30, 2014 15:21 UTC (Thu) by oldtomas (guest, #72579) [Link]
It was inadequate, that's right -- but this wasn't the point I was trying to make. Rather that although the app programmers incur the cost of including PDO (which does parse the DSN correctly) they still go and hack it themselves, and in a gruesome way. No idea why, perhaps the doc was too hard to read, or it didn't occur to the developer "gee, there must be a function in PDO to do this". Or the lib was doing strange things.
Of course I do similar hacks at times, to "get things up-n-running", but then I always try to come back and clean up the mess. And in the meantime there is a comment at this spot documenting that I'm not (yet) doing the right thing.
It's exactly this feeling "there must be a better way to do this" what I miss most in this culture.
Amateurs
Posted Oct 30, 2014 14:27 UTC (Thu) by dskoll (subscriber, #1630) [Link]
Like ummm... Lua?
I've never used Lua; my only scripting experience is with Perl, Python and Tcl. Perl and Python distinguish between (arrays, lists) and (dictionaries, hashes). Tcl doesn't have numerically-indexed arrays; it only has hashes.
I think the main problem PHP has is a *culture problem*.
Oh, I agree. The average PHP "coder" is a rank amateur, and the comments on the annotated PHP manual are nothing but fodder for thedailywtf.com. But I still believe a large part of PHP's problem is the language design itself.
Amateurs
Posted Oct 30, 2014 14:46 UTC (Thu) by anselm (subscriber, #2796) [Link]
Tcl doesn't have numerically-indexed arrays; it only has hashes.
Yes it does have numerically-indexed arrays, they're called “lists”. The accessor operations are a bit wordy – what Python would call “a[0][0]“ is “[lindex a 0 0]” and “a[0][0] = "foo"” in Python becomes “lset a 0 0 "foo"” –, but under the hood Tcl lists are essentially arrays, and the accessors are O(1) operations.
Amateurs
Posted Oct 30, 2014 22:02 UTC (Thu) by dskoll (subscriber, #1630) [Link]
Yes it does have numerically-indexed arrays, they're called “lists”.
I was unaware that [lindex $list $n] was an O(1) operation in Tcl. That must be a relatively new change. However, the Tcl performance wiki is rather opaque about this. It says: "lindex is now O(1) on lists that already have an internal representation. [...] KBK: But be aware that you may inadvertently copy a Tcl_Obj if you're not careful, and destroy the performance advantage."
So... huh? :)
Amateurs
Posted Oct 30, 2014 22:45 UTC (Thu) by anselm (subscriber, #2796) [Link]
“lindex” has been O(1) on (internal) lists since Tcl 8.0, which came out in the late 1990s, so it is not exactly a new change.
The thing to watch out for is Tcl's habit of transparently converting between a list's string representation and its internal representation as essentially a C array as the need arises. These conversions are of course not O(1), and if to do “lindex” you need to convert your list from the string representation to the internal array representation first you are going to take a hit, even though Tcl will use the internal representation from then on for as long as it can. Tcl is really quite clever to avoid such conversions as far as possible but occasionally something's gotta give.
You want to make sure that you only use list operations on a list; if on every other access you treat it as a string (which Tcl will let you do because way back around the last Ice Age it basically only knew about strings) then the back-and-forth conversions will kill your performance stone dead.
Arrays
Posted Oct 31, 2014 6:43 UTC (Fri) by mp (subscriber, #5615) [Link]
What about JavaScript? It's an interesting one, isn't it? And it's growing.
Arrays
Posted Oct 31, 2014 7:34 UTC (Fri) by niner (subscriber, #26151) [Link]
Arrays
Posted Oct 31, 2014 15:09 UTC (Fri) by jezuch (subscriber, #52988) [Link]
Somewhat, somehow. I never remember if iterating an array will give you a list of values or a list of indices, like if you were iterating a hash with integer indices. I remember I got bitten by something like that but it's been some time since I did serious coding in JavaScript, so...
For the record: I rather like it as a language. But it has... warts.
Arrays
Posted Oct 31, 2014 19:28 UTC (Fri) by mathstuf (subscriber, #69389) [Link]
I prefer thinking of JavaScript as a wart with some nice bits of language in it. :)
Arrays
Posted Nov 1, 2014 14:24 UTC (Sat) by ibukanov (subscriber, #3942) [Link]
The PHP bug is still possible in JavaScript as for (i in array) loops over all all enumeratable properties of the array. For example,
var i, a = [1, 2];
a.x = 3;
for (i in a) { console.log(i); }
prints 0 1 x. However, almost all JS guides advise to use an explicit for (i = 0; i < array.length; i++) loop form that is immune to the problem.
Arrays
Posted Nov 2, 2014 0:44 UTC (Sun) by mchapman (subscriber, #66589) [Link]
Maybe it's just because I've been doing Perl for too long, but it seems to me that any language that doesn't distinguish arrays from hashes at the syntactic level is potentially vulnerable to this sort of problem. Arrays and hashes in Perl look different and are used differently because they *are* different.
Arrays
Posted Nov 5, 2014 13:00 UTC (Wed) by job (guest, #670) [Link]
Amateurs
Posted Nov 4, 2014 12:59 UTC (Tue) by gerv (subscriber, #3376) [Link]
http://blog.gerv.net/2014/10/new-class-of-vulnerability-i...
Gerv
Amateurs
Posted Nov 4, 2014 21:49 UTC (Tue) by dskoll (subscriber, #1630) [Link]
Yes, the scalar vs list context distinction in Perl is a trap for the unwary. :(
Overall, though, Perl is not a giant ball of fail the way PHP is.
Amateurs
Posted Nov 5, 2014 5:52 UTC (Wed) by gerv (subscriber, #3376) [Link]
Amateurs
Posted Oct 29, 2014 23:20 UTC (Wed) by cortana (subscriber, #24596) [Link]
New vulnerability class?
Posted Oct 30, 2014 0:03 UTC (Thu) by cesarb (subscriber, #6266) [Link]
I'm sure that whoever designed the language found it a brilliant idea to have one concept instead of two, but it's harder to explain, harder to implement, and just asking for trouble.
It's harder to explain: the same structure can be viewed as a linear sequence (append/prepend and fast indexing by position), as a mapping (set of key/value pairs, fast insertion, removal, and indexing by key), or as a bizarre hybrid of both.
It's harder to implement: it looks deceptively simple (just use the position in the sequence as an integer key), but for a higher performance implementation (which you'll want since it's a central concept in the language), you'll want to separate the sequential part (stored as a simple array of the values) from the mapping part (stored as a hash or a tree).
It's just asking for trouble: most uses will treat it either as an array (list of items) or as a key/value store (map from key to value, or sometimes set of values), but rarely as both at the same time. The corner cases are confusing; suppose you have a sequence of 3 items, add a key/value pair, and append another item to the sequence. What is the key for the new item? Does the answer change depending on the key from the just inserted pair?
In this vulnerability, the programmer expected a sequence, and was handed a mapping. And in something reminiscent of a mass assignment vulnerability, the full power of this combined data structure is available in the parameters passed through the HTTP request.
I think this is a new vulnerability class which doesn't have a name yet. I'd call it "sequence/mapping confusion", or more generally "confused data type" (by analogy with "confused deputy").
To defend against this vulnerability class, every "tainted" variable should be scrubbed into either a pure sequence or a pure mapping (trivial on languages which use separate types for both) before being used for anything, and all uses of a single variable should be consistent (never use a sequence method on a mapping variable or a mapping method on a sequence variable). As shown in this vulnerability, "foreach ($data as $i => $value)" is a mapping method; it should never be used on a sequence, even if it works.
New vulnerability class?
Posted Oct 30, 2014 11:15 UTC (Thu) by grawity (subscriber, #80596) [Link]
most uses will treat it either as an array (list of items) or as a key/value store (map from key to value, or sometimes set of values), but rarely as both at the same time
And unfortunately it's going to be hard to get rid of, since there are large widely-used PHP-based projects that use such horrors in configuration files:
array(
'saml:SP',
'authproc' => array(
50 => array(
'class' => 'core:AttributeAdd',
'%replace',
'uid' => array('guest'),
),
),
),
New vulnerability class?
Posted Oct 30, 2014 11:47 UTC (Thu) by endecotp (guest, #36428) [Link]
Let's call it "weak typing".
New vulnerability class?
Posted Oct 30, 2014 12:03 UTC (Thu) by dskoll (subscriber, #1630) [Link]
I think this is a new vulnerability class which doesn't have a name yet.
We should name this vulnerability "A knife-sharp corner-case in PHP's fractal of bad design
A "highly critical public service announcement" from Drupal
Posted Oct 29, 2014 23:44 UTC (Wed) by job (guest, #670) [Link]
Copyright © 2014, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds