How Do I Make This Hard to Misuse? [LWN.net]

How Do I Make This Hard to Misuse?

Posted Mar 31, 2008 18:53 UTC (Mon) by ajross (guest, #4563) [Link] (6 responses)

I'm surprised to find a huge one missing: Don't provide two ways to do the same thing if one of them is wrong.

Rusty is talking about kernel code, so I guess he might be assuming a higher quality of developer than I am. But misunderstandings of "thick" APIs are probably the source of more "API misuse" (and other bugs and misfeatures) than anything else I'm aware of.

All library code has this disease to some extent or another (Java has it like the plague), and what it means in the real world is that coders with limited understanding of the library as a whole go thumbing through the documentation looking for a gadget that does what they want, and then plug it in, all the while failing to realize that another gadget would have been a much better choice. And since they don't understand the library as a whole, they don't have a prayer of understanding the tradeoffs and misfeatures that result from their choice.

The solutions to this are either (1) make developers understand the design of the libraries they use at a deep level, or (2) write libraries with a minimal but complete feature set, such that developers don't get stuck using the wrong tool for the job. Choice 1 is clearly preferable but very expensive -- there aren't many such developers.

Option 2 seems like a much better choice. The only downsides are that more "boilerplate" code must be written to make up for the lack of "convenience" (ugh) functionality. And, I guess, there will be objections from people who don't understand this "minimal" aesthetic and want to choose the wrong solution from the choices offered by a much larger library.

Personal plug: here's my take on "minimal but complete" functionality as expressed in a small embeddable scripting language: http://plausible.org/nasal.

How Do I Make This Hard to Misuse?

Posted Mar 31, 2008 20:43 UTC (Mon) by nix (subscriber, #2304) [Link]

Hear, hear, indeed. Forget Java: Win32 is the truly horrible example in 
this area, with dozens of ways to do some fundamental things, all with 
different (often poorly documented) shortcomings.

How Do I Make This Hard to Misuse?

Posted Mar 31, 2008 21:11 UTC (Mon) by i3839 (guest, #31386) [Link] (1 responses)

Nasal looks interesting.

> Small! 146k source code.

Is that in lines of code? ;-)

How Do I Make This Hard to Misuse?

Posted Mar 31, 2008 21:24 UTC (Mon) by ajross (guest, #4563) [Link]

Goodness no: kilobytes of C code.  And it includes only the core library stuff, and ignores
extension code (currently readline, pcre, sqlite, gtk and cairo) and soft-coded libraries
(there's an XML parser and a few other gadgets).  The current code in CVS is a little bigger
now, at 158k.

Looking at line endings as the LOC metric I count 5507 lines.  Lines with semicolons are an
easy trick to use if you want a measure that ignores comments (i.e. code complexity, not
verbosity), and there are 2620 of those.  I'm sure there are others out there, but I don't
much care.  The point is that it's small. :)

How Do I Make This Hard to Misuse?

Posted Mar 31, 2008 21:31 UTC (Mon) by rgmoore (✭ supporter ✭, #75) [Link] (2 responses)

> what it means in the real world is that coders with limited understanding of the library as a whole go thumbing through the documentation looking for a gadget that does what they want, and then plug it in, all the while failing to realize that another gadget would have been a much better choice. ... The solutions to this are either (1) make developers understand the design of the libraries they use at a deep level, or (2) write libraries with a minimal but complete feature set,

I think that you're missing an important possibility: write really good documentation. Sometimes you really want a dozen gadgets, each of which is different from the others in some small but important way. That's a lot less likely to trip up your users if you:

Group closely related functions in the documentation, so that it's easy for users to compare the different functions and pick the best one,
Cross-reference the functions, so that anyone who reads the documentation for functionA also knows that there are closely related functionA1, functionA2, etc., and/or
Provide a high level overview of the functional category, complete with a discussion of the differences between the functions and when you might want to use one instead of another.

I know that when I'm writing documentation for closely related functions, I try to do some or all of these. It's very helpful when coming back to use something I wrote a few years ago to find notes like "function_A does not guarantee that data will be written in any particular order. If output order is important, use function_B instead."

How Do I Make This Hard to Misuse?

Posted Mar 31, 2008 21:44 UTC (Mon) by ajross (guest, #4563) [Link] (1 responses)

> I think that you're missing an important possibility: write really good documentation.

I limited my treatment to techniques that have been proven to actually work in practice. :)

Seriously: what you describe would be great. I've just never seen anything like it. At best (or at least the best I've seen), you get documentation like what Sun provides for the JDK: a very clean, readable, hyperlinked guide to a true rat's nest of a library that only a tiny elite class of Java gurus actually understands.

The core problem is that great documentation does nothing for those who don't read it, and the sheer size of modern libraries guarantees that users won't read the documentation. You can get around this by finding developers who can read and distill only the core architecture from the effusive documentation, but then you're basically implementing a version of my "option 1" above.

For that matter, good developers tend to get the least relative benefit from all that "convenience code" anyway, and are happy to write the 2-3 lines of boilerplate needed to turn their iterator output into an array, or vice versa, etc... Which means that even given a guarantee that only talented developers will use your library, you're still better off making it minimal than you are adding functionality.

How Do I Make This Hard to Misuse?

Posted Mar 31, 2008 22:12 UTC (Mon) by nix (subscriber, #2304) [Link]

Oh, believe me, even when I describe exactly what functions do in the 
header files, complete with examples... they *still* don't get read, or 
people read the first line and ignore the DO NOT DO THIS in screaming 
flashing red with associated MIDI of a screaming police siren (or the 
closest I can get to that in source code). If I make whateveritis not 
compile if misused, it gets hacked by someone else so that it *does* 
compile when misused, because 'that was easier'. (No it bloody wasn't.)

Given that I work in the financial sector I'm tempted to see if I can 
write something which if misused in an unlikely way transfers the contents 
of the misuser's bank account into mine, and document this as a failure 
mode. I'd be rich within the week! ;}

reasons for kmalloc GFP_ATOMIC...

Posted Mar 31, 2008 20:10 UTC (Mon) by hmh (subscriber, #3838) [Link] (1 responses)

AFAIK, kmalloc in fact CANNOT know if it could sleep or not.  As it was explained to me in the
in_atomic() thread on LKML, that information just doesn't exist in the kernel right now.  You
simply have to know in which context you are, and tell everyone about it (thus, GFP_ATOMIC).

reasons for kmalloc GFP_ATOMIC...

Posted Mar 31, 2008 21:57 UTC (Mon) by jengelh (guest, #33263) [Link]

And you might even have a reason to call it with GFP_ATOMIC even if you have a user context
and could, theoretically, sleep!

How Do I Make This Hard to Misuse?

Posted Apr 1, 2008 5:41 UTC (Tue) by jd (guest, #26381) [Link]

A few others:

Types should cover the legal values and as little beyond that as efficiency constraints permit
Know thine invariants (...and preconditions ...and postconditions)
The argument list should be what is needed by the code and nothing more, in content and format
A clean solution is superior to an oblique one, insofar as a clean solution exists
An oblique solution is superior to an oblique bug
If it doesn't work when compiled and optimized, but tests fine the rest of the time, you are testing what you are thinking and not what you are coding

How Do I Make This Hard to Misuse?

Posted Apr 1, 2008 5:41 UTC (Tue) by jzbiciak (guest, #5246) [Link] (13 responses)

One of my favorite C APIs to love to hate: fputc(int, FILE *) and fgets(char *, int, FILE *)

Why is the file pointer the last argument?! As Rusty pointed out, "context" arguments such as handles (FILE * in this case) idiomatically belong at the front, just like in fprintf(FILE *, const char *, ...).
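
To make the inconsistency concrete, here is a minimal sketch. The standard calls are used with their real signatures; the wrappers `stream_putc` and `stream_gets` and the helper `roundtrip` are hypothetical names, showing only what a handle-first ordering could look like.

```c
#include <stdio.h>

/* Hypothetical wrappers: same behaviour as fputc()/fgets(), but with the
 * FILE * "context" argument first, as in fprintf(). */
static int stream_putc(FILE *stream, int c)
{
    return fputc(c, stream);
}

static char *stream_gets(FILE *stream, char *buf, int size)
{
    return fgets(buf, size, stream);
}

/* Writes "hi\n" to a temporary file using all three spellings, then reads
 * the line back into buf. Returns 0 on success, -1 on failure. */
static int roundtrip(char *buf, int size)
{
    FILE *f = tmpfile();
    int ok;

    if (f == NULL)
        return -1;

    fprintf(f, "%c", 'h');   /* standard: handle first */
    fputc('i', f);           /* standard: handle last */
    stream_putc(f, '\n');    /* wrapper: handle first again */

    rewind(f);
    ok = stream_gets(f, buf, size) != NULL;
    fclose(f);
    return ok ? 0 : -1;
}
```

Wrappers like these trade one inconsistency for another, since readers then have two spellings to learn, which is the "two ways to do the same thing" problem raised earlier in the thread.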

How Do I Make This Hard to Misuse?

Posted Apr 1, 2008 10:21 UTC (Tue) by xbobx (subscriber, #51363) [Link] (12 responses)

> Why is the file pointer the last argument?!

That would probably be an optimization.  C function calling convention is that arguments are
pushed onto the stack in reverse order, so with this function the FILE pointer is pushed
first.  Then the caller is free to manipulate the stack without touching the FILE pointer, and
possibly call these functions multiple times.  Otherwise, code that repeatedly gets input
from/outputs to stdout (e.g., *everything*, at least when libc was designed) has to push/pop
the stdout FILE ptr around _every_ call to these functions.

Not that this would ever be noticeable on modern hardware, just saying...

How Do I Make This Hard to Misuse?

Posted Apr 1, 2008 11:14 UTC (Tue) by jzbiciak (guest, #5246) [Link] (11 responses)

Sounds plausible (and quaint!), but I'm not sure I'm following the scenario. I'm interpreting "reverse order" as "leftmost argument pushed last." So, suppose putchar(int c) is a wrapper around fputc(int c, FILE *f):

int putchar(int c)
{
    return fputc(c, stdout);
}

So, in pseudo-code, the resultant assembly ought to look roughly like this (assuming tail-call optimization):

    POP 'c' into a register
    PUSH 'stdout'
    PUSH 'c' back on stack
    JUMP to fputc and let it return for us.

It seems like if arguments went in reverse order, then adding the FILE * argument at the beginning would be the optimization:

    PUSH 'stdout'
    JUMP to fputc and let it return for us

I must be missing something.

How Do I Make This Hard to Misuse?

Posted Apr 1, 2008 17:22 UTC (Tue) by vmole (guest, #111) [Link] (7 responses)

My guess is that it's even simpler than that: we've got a function that puts a character to stdout, and we need one that puts to an arbitrary FILE *, so we'll add an argument.

Many years ago, I worked on a Data General. The COPY and MOVE commands took, as a first argument, the destination file or directory. This made some sort of sense from an implementation point of view, but, as the TA warned at the time, "Sooner or later *all* of you are going to overwrite a source file." I think we all did.

How Do I Make This Hard to Misuse?

Posted Apr 7, 2008 7:49 UTC (Mon) by liljencrantz (guest, #28458) [Link] (6 responses)

Kind of how the ln command in unix works, then? I've always found this massively unintuitive.

How Do I Make This Hard to Misuse?

Posted Apr 7, 2008 14:38 UTC (Mon) by nix (subscriber, #2304) [Link] (5 responses)

So do I, but I think that's because of overexposure to C, where that sort 
of thing is helpfully always the other way around.

If you think about it, ln(1) is perfectly consistent with cp(1): it 
creates or updates (for directories) the last thing you list.

How Do I Make This Hard to Misuse?

Posted Apr 7, 2008 17:17 UTC (Mon) by vmole (guest, #111) [Link] (4 responses)

It's consistent, kinda. The problem comes when describing: "cp foo bar" translates as "copy _foo_ to _bar_" okay, but the obvious translation of "ln foo bar" to "link _foo_ to _bar_" doesn't; the latter seems to say _bar_ is the original, at least to my taste. You have to process it as "create a link to _foo_ named _bar_". Or just memorize it. :-)

How Do I Make This Hard to Misuse?

Posted Apr 8, 2008 8:24 UTC (Tue) by IkeTo (subscriber, #2122) [Link] (3 responses)

> but the obvious translation of "ln foo bar" to "link _foo_ to _bar_" doesn't

I see this problem as an inaccuracy of the translation "link _foo_ to _bar_".  This seems to
imply that both _foo_ and _bar_ are pre-existing, and somehow a "link" is created between them
as a result of running the command.  Obviously not what is done by "ln".  It is instead to
"build a link to _foo_ called _bar_".  The cp is to "make a copy of _foo_ called _bar_".
Pretty consistent to me.

How Do I Make This Hard to Misuse?

Posted Apr 8, 2008 12:13 UTC (Tue) by jzbiciak (guest, #5246) [Link] (2 responses)

Although, since the final argument can be a directory, perhaps the best connector for both is
"at":

Make a copy of foo _at_ bar
Make a link to foo _at_ bar

Or in the plural case:

Make copies of foo, bar, baz, quux _at_ dest
Make links to  foo, bar, baz, quux _at_ dest

How Do I Make This Hard to Misuse?

Posted Apr 9, 2008 3:06 UTC (Wed) by roelofs (guest, #2599) [Link] (1 responses)

Also keep in mind that the target for ln is optional. Thus:

   ln -s foo/bar .
   ln -s foo/bar

...are equivalent. Sadly, cp foo/bar doesn't quite work.

Greg

How Do I Make This Hard to Misuse?

Posted Apr 9, 2008 3:33 UTC (Wed) by jzbiciak (guest, #5246) [Link]

That aspect of 'cp' always drove me nuts, probably because I learned MS-DOS first.  I've found
myself tempted to write a wrapper around 'cp' to make that form work.

I won't, though, only because I know it'll wreak havoc when I go to use someone else's account
for whatever reason.  (e.g. to show them how to do something.)

How Do I Make This Hard to Misuse?

Posted Apr 1, 2008 18:00 UTC (Tue) by felixfix (subscriber, #242) [Link]

Possibly push the file handle, push the char, call, replace the char at top of stack, call,
repeat.  But I am just guessing, and hate working on code like that.  I spent two years
dealing with the memory constraints that made such code tempting, and despised it.

How Do I Make This Hard to Misuse?

Posted Apr 2, 2008 4:27 UTC (Wed) by xbobx (subscriber, #51363) [Link] (1 responses)

True, for that specific case it is better the other way around. But suppose you have a function such as:

void print_strings(FILE *stream, int num_strings, const char **list) {
    int i;
    for (i = 0; i < num_strings; i++) {
        fputs(list[i], stream);
    }
}

In this case, the assembly for this function will push stream once, then just push/pop n pointers to strings onto the stack and call fputs to print all of the strings. One could imagine that this would be useful when, say, implementing fprintf or other similar higher-level functions which all output to the same FILE *.

How Do I Make This Hard to Misuse?

Posted Apr 2, 2008 4:36 UTC (Wed) by jzbiciak (guest, #5246) [Link]

See?  I knew I was missing something.  Thanks to you and felixfix both, since you are both
describing the same particular optimization.  Cute, in an ugly, quaint way.  :-D

I'm certain I'm guilty of far worse horrors in my 2 decades of assembly programming.  This
optimization never occurred to me since I've always managed to pass arguments in registers.

(An odd fluke of history, that.  I've written whole-program assembly where I control all the
conventions, C with inline-asm only (no function calls from asm) on machines with stack-based
calling conventions, and C callable asm on machines with register based calling conventions.)

How Do I Make This Hard to Misuse?

Posted Apr 1, 2008 6:42 UTC (Tue) by olecom (guest, #42886) [Link]

> 9. The compiler/linker won't let you get it wrong.
>
>    As a C person, I like that[...]
> compile errors (it evaluates sizeof(char[1-2*!!(cond)]) which won't
> compile if cond is true).
>
>    I use this in the kernel's module_param(name, type, perm) macro to
> check that the read/write permissions for the module parameter are sane
> (a common mistake was to specify 644 instead of 0644).
[]
> 1. Read the correct mailing list thread and you'll get it right.
>
>    The reason some strange interface quirk exists might be for
> compatibility with some strange OS or compiler, weird corner case or even
> older versions of this codebase. In other words, historical reasons ("see,
> on the VAX we only had 6 characters for..."). You sometimes only find this
> when you send a patch to fix it and the original author yells at you.
>
>    Sometimes they add it to the FAQ. That does not increase the interface's
> score very much: please try harder.
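
For readers who have not met the sizeof trick quoted under item 9, here is a minimal sketch of how it can catch the 644 versus 0644 mistake. The macro names `FAIL_BUILD_IF` and `CHECKED_PERM` are hypothetical, and this is not the kernel's actual module_param() code, only the same idea in isolation.

```c
/* Hypothetical macro name. Evaluates to 0 when cond is false. When cond is
 * true the array size becomes -1, which is a compile error. */
#define FAIL_BUILD_IF(cond) (sizeof(char[1 - 2 * !!(cond)]) - 1)

/* Hypothetical wrapper: yields perm unchanged, but refuses to compile when
 * perm falls outside the range of octal permission bits 0000..0777. */
#define CHECKED_PERM(perm) \
    ((perm) + FAIL_BUILD_IF((perm) < 0 || (perm) > 0777))

unsigned int default_perm(void)
{
    return CHECKED_PERM(0644);   /* octal 0644 is within range: compiles */
    /* CHECKED_PERM(644) would not compile: decimal 644 exceeds 0777 (511) */
}
```

The check costs nothing at run time because the whole expression is an integer constant. Its limit is that it only works when the argument is a compile-time constant.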

Q: Don't you think a stream editor can handle that?
A: Our tools have no such thing.

http://article.gmane.org/gmane.linux.kernel/659995

When will they step out of the C box, or out of the programming language box altogether?
Extending gcc just to waste more time, yes!
_________

Open-before-use

Posted Apr 1, 2008 10:56 UTC (Tue) by epa (subscriber, #39769) [Link] (2 responses)

It's hard for the compiler to ensure that the user calls your "open" routine before your other routines, but an "assert()" can at least get you to this level.

In C++ it would be normal practice to make the 'open' routine the constructor, so you automatically have to call it first before any member functions. But you can do this in C too, if your functions all take a handle argument and open() is the only one that generates such a handle.
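
A minimal sketch of that handle approach in C follows. The `conn` type and its functions are hypothetical names chosen for illustration. In a real library the struct definition would live in the .c file, with only a forward declaration in the header, so that callers cannot build a handle by hand.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical resource. Callers are meant to obtain one only from
 * conn_open(); shown in full here to keep the sketch self-contained. */
struct conn {
    int is_open;
    int bytes_read;
};

/* The only function that produces a handle. Returns NULL on failure. */
struct conn *conn_open(void)
{
    struct conn *c = malloc(sizeof *c);

    if (c != NULL) {
        c->is_open = 1;
        c->bytes_read = 0;
    }
    return c;
}

/* Pretends to read nbytes and returns the running total. The assert
 * catches a NULL or already closed handle at run time. */
int conn_read(struct conn *c, int nbytes)
{
    assert(c != NULL && c->is_open);
    c->bytes_read += nbytes;
    return c->bytes_read;
}

/* Releases the handle; it must not be used afterwards. */
void conn_close(struct conn *c)
{
    assert(c != NULL && c->is_open);
    c->is_open = 0;
    free(c);
}
```

The assert covers the NULL case, but an uninitialized or already freed pointer can still slip past it, so this raises the cost of misuse rather than ruling it out.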

Open-before-use

Posted Apr 1, 2008 12:59 UTC (Tue) by nlucas (guest, #33793) [Link]

But often you can't do it in practice, because it's normal to need to "reopen" the
resource (because of a connection error, the usb device was disconnected, you don't know the
resource name beforehand, etc.), which means the added logic for this case is just the same as
not opening it in the constructor (the default constructor, at least) and providing
"open"/"close" methods.

Open-before-use

Posted Apr 1, 2008 15:35 UTC (Tue) by NAR (subscriber, #1313) [Link]

That still wouldn't solve the problem. One could write code like this, even if only an "open" would return a valid handle:

handle_t *handle;   /* never initialized: open() was not called */
read(handle);       /* still compiles, but passes a garbage pointer */

How Do I Make This Hard to Misuse?

Posted Apr 1, 2008 16:28 UTC (Tue) by piggy (guest, #18693) [Link]

> 5. Do it right or it will always break at runtime.

The person who taught me my testing skills (Kevin Curry) had a nice way to phrase this:
"Programmers fix core-dumps, so make sure that you dump core."