LWN.net Logo

Workqueues and internal API conventions

The internal kernel API has developed a number of conventions over the years. One of the most prevalent has to do with the return values from functions. In many cases, a function will return zero as an indicator of success, or a negative error code on failure. This convention goes against the normal C conventions for boolean values - a "false" value means that everything is OK. But it reflects the fact that, while all happy functions are alike, every unhappy function is unhappy in its own way. It is useful to be able to return a variety of error codes.

There are exceptions to this convention, however. One of the more famous is copy_to_user() and copy_from_user(), both of which will, on failure, return the number of bytes which were not copied. Back in 2002, Rusty Russell audited 5500 calls to these functions and determined that 415 of them interpreted the return value incorrectly. He proposed changing the interface to match the kernel's conventions, but had no success. See the May 23, 2002 LWN Kernel Page for more on this episode.

More recently, Alan Stern has been burned by the workqueue interface. Functions like queue_work() return a "normal" boolean value - zero on failure, non-zero if the requested work was actually queued. Alan suggested that these functions should be changed, and offered to fix up all in-tree callers in the process. The answer he got back was that fixing the return code would be a good thing, but that the name of the functions should be changed at the same time. Otherwise out-of-tree code could misinterpret the new return value with no indication to the programmer.

The resulting patch does just that. With this patch, the functions for adding work to an arbitrary workqueue become:

    int add_work_to_q(struct workqueue_struct *queue, 
                      struct work_struct *work);
    int add_delayed_work_to_q(struct workqueue_struct *queue,
                              struct work_struct *work,
			      unsigned long delay);
    int add_delayed_work_to_q_on(int cpu,
                                 struct workqueue_struct *queue,
				 struct work_struct *work,
				 unsigned long delay);

As expected, these functions return zero on success and a negative error code (-EBUSY) on failure. The return code makes sense because the only reason for the operation to fail in current code is if the given work_struct is already on a workqueue.

Similar changes have been made to the functions which operate on the generic, shared workqueue (schedule_work() and friends). They are now:

    int add_work(struct work_struct *work);
    int add_delayed_work(struct work_struct *work, unsigned long delay);
    int add_delayed_work_on(int cpu, struct work_struct *work,
                            unsigned long delay);

In all each case, wrapper functions with the old names have been provided so that out-of-tree code which has not been updated will not break. Most of the time, anyway. It seems that most in-tree callers never bothered to check the return value from these functions in the first place, and Alan has concluded that out-of-tree callers will be the same. So the new version of the old functions are declared as void, returning no value at all. Instead, they log a warning when an operation fails. As a result of this change, code which actually checks the return value will fail to compile, and, presumably, the author will update it to the new functions. Everything else will continue to run as it always did.

Alan has also proposed an addition to the kernel coding style document. It reads (in part):

If the name of a function is an action or an imperative command, the function should return an error-code integer. If the name is a predicate, the function should return a "succeeded" boolean.

There does not seem to be much disagreement over this proposal, so that is likely to be how things go. This convention is still not likely to extend to copy_to_user() and copy_from_user(), however.


(Log in to post comments)

Workqueues and internal API conventions

Posted Aug 31, 2006 21:57 UTC (Thu) by giraffedata (subscriber, #1954) [Link]

This seems to fly in the face of Linux kernel convention. The policy has always been, "keep in-tree code clean; screw out-of-tree code." I can't count the number of times that policy has been implemented. Here, we have in-tree code dirtied by having two functions for the same thing, for the purpose of backward compatibility with some out-of-tree code.

I wonder why there wasn't objection.

Workqueues and internal API conventions

Posted Sep 6, 2006 19:19 UTC (Wed) by roelofs (guest, #2599) [Link]

I wonder why there wasn't objection.

There's a huge difference between compile-time failures and silent runtime failures. Linus has made it very clear that he will not accept other forms of "functionality regression" at the end-user level, and I suspect this is a logical expression of that same pragmatic philosophy.

Greg

Workqueues and internal API conventions

Posted Aug 31, 2006 22:05 UTC (Thu) by giraffedata (subscriber, #1954) [Link]

This convention goes against the normal C conventions for boolean values - a "false" value means that everything is OK

This is a strange observation. First, there's no C convention that a "false" boolean value means "something's wrong," and second, there's no C convention that the return code of a function is a boolean value.

The strongest C convention for return codes, which is actually older than C, is that the return code is an error code -- it tells what's wrong. 0 is a value that naturally says, "nothing."

Workqueues and internal API conventions

Posted Sep 7, 2006 10:12 UTC (Thu) by anandsr21 (guest, #28562) [Link]

The reasoning for the Unix convention is that success can only happen in one way while Failure can happen in a number of ways. Also it can only be the unique number 0 because other's will not be able to fulfill the requirement for uniqueness as there are many negative numbers and many positive numbers. I also use this convention for my code. Unless the function name implies a test it should follow this condition.

Workqueues and internal API conventions

Posted Sep 11, 2006 20:32 UTC (Mon) by devinjones (guest, #11272) [Link]

...while all happy functions are alike, every unhappy function is unhappy in its own way
Ah yes, the Anna Karenina principle.

Copyright © 2006, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds