Generic red-black trees
Posted Jun 16, 2012 4:16 UTC (Sat) by
daniel.santos (subscriber, #85158)
In reply to:
Generic red-black trees by ppisa
Parent article:
Generic red-black trees
Since the start of this, my documentation said to define your compare functions as accepting the actual types of your key pointers, but in version 4, I've solved the problem of the compare function's type not being verified (at least, when using the RB_DEFINE_INTERFACE macro). So it will still call compare as type rb_compare_f from the generic functions, but if the compare function you supply to the macro doesn't match up, it will generate an understandable error or warning. Here, I've given it the wrong return type (line 65 is where the macro is called):
/my/path/grbtest2.c: In function ‘__rb_sanity_check_my_objects_rel’:
/my/path/grbtest2.c:65:1: error: passing argument 1 of ‘__rb_verify_compare_fn_ret’ from incompatible pointer type [-Werror]
In file included from /my/path/grbtest.h:30:0,
from /my/path/grbtest2.c:21:
include/linux/rbtree.h:917:20: note: expected ‘long int *’ but argument is of type ‘unsigned int *’
cc1: all warnings being treated as errors
And here, I've defined the compare function as accepting types of const u32 instead of const *u32 (line 57 is the compare function's definition).
/my/path/grbtest2.c: In function ‘__rb_sanity_check_my_objects_rel’:
/my/path/grbtest2.c:65:1: error: passing argument 1 of ‘compare_u32’ makes integer from pointer without a cast [-Werror]
/my/path/grbtest2.c:57:15: note: expected ‘u32’ but argument is of type ‘u32 *’
I've implemented similar type checks for all other parameters as well, each calling a static inline function that tells you what it's checking for, i.e., __rb_verify_root(), __rb_verify_left(), etc. That way, you'll have a better chance at knowing what's actually wrong. In the generated code examinations I've made thus far, I haven't seen any compilers actually emit code for these type checks.
As far as calling compare by pointer, you will be most pleased to learn that as of version 4.6.? (4.6.0 maybe?, definitely in 4.6.2), gcc is now capable of optimizing out a call-by-constant-pointer-to-inline(able)-function. Thus, in gcc 4.6.2, optimization is perfect. I mentioned in my previous post the optimization from 4.1 to 4.5 was "acceptable", by this I mostly mean that gcc 4.5.3 did everything that it should do, except inline the call to the compare function (I don't have as many details from 4.1 to 4.4). In fact, the reason I had to add the __flatten attribute was because without it, neither gcc 4.6.3 nor 4.7.0 would inline the call-by-constant-struct-member-function-pointer (i.e., passing compare in struct rb_relationship), but it would if I passed it directly as a function parameter. I'm still not certain if this is a bug or I hit the max pseudo-instructions and it stopped inlining, but the result is actually smaller & faster code.
But anyway, thanks for these links! I've been studying everything I can find that might give me hints on how to make this better. One qunadry I'm still sorting out about my compare function is dealing with compound keys. Currently, if you have a compound key (that is, two fields that need comparing), you basically choose a key to pass for the key offset, and then in your compare function, use container_of() or rb_entry() to get a pointer to your struct and then access your other key(s). Here's an example for one rbtree in drivers/mtd/ubi/wl.c.
static inline long compare_ubi_wl_entry(const int *a_ec, const int *b_ec) {
long diff = (long)*a_ec - (long)*b_ec;
if (!diff) {
diff = (long)container_of(a_ec, struct ubi_wl_entry, ec)->pnum -
(long)container_of(b_ec, struct ubi_wl_entry, ec)->pnum;
}
return diff;
}
When you call insert, you will always have a full struct who's key will be examined by pointer. The drawback is that when you call find, you generally don't have a struct sitting around to access its key member and you run into the same issue as with Kent's implementation of needing to stick one on the stack. For this reason, I've been considering allowing the user to specify two different compare functions: one accpeting key pointers, and another accepting struct pointers, optionally using the same compare function for both (this would bloat the rbtree.h source code, but not the generated code).
I also meant to address my choice for choosing the return type of long over int in my last post. I made this decision when working on the fair scheduler, as it seemed to allow for a more optimal compare function than int when you need to compare 64-bit values. When doing this on a 64-bit build arch, you just compare them, but on a 32-bit build, you still have to do a bit more work since you have to fit the comparision result in a 32-bit long. But if I choose int, I would have to do that amount of work on both archs. Personally, I really wish there was a way in C to say "compare these two values, and use CPU's negative & zero flags on these branches", because I don't see gcc optimizing out code like this (on AMD64):
int diff = a > b ? 1 : (a < b ? -1 : 0);
if (diff > 0) {
// do something
} else if (diff < 0) {
// do something else
} else {
// do something else else
}
Maybe I just need to file a gcc bug for this, because in every case, I've seen gcc store the 1, -1 or 0 in a register and compare it to zero later, and unless I missed something, I didn't see any instructions that would change those flags in between.
(
Log in to post comments)