Zeuthen: Writing a C library, part 1

Posted Jun 28, 2011 13:13 UTC (Tue) by crazychenz (guest, #56983)
Parent article: Zeuthen: Writing a C library, part 1

This article had all kinds of goodies, but you really have to know what you're looking for to find them. To avoid being the stereotypical internet pessimist, here are some real comments I have on the article (without peer review). ;-)

-- Library initialization and shutdown --

"Avoid init() / shutdown() routines - if you can’t avoid them, do make sure they are idempotent, thread-safe and reference-counted."

This is not clear enough. Global init and shutdown should be avoided, or else treated as neither thread-safe nor fork-safe. Instead, a structure that holds all the state information for the library should have corresponding init() and shutdown() calls.
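A minimal sketch of what I mean, assuming a made-up library "foo" (foo_ctx, foo_ctx_init, foo_ctx_shutdown and foo_do_work are all hypothetical names, not anything from the article). All state lives in a caller-owned structure, so two unrelated users in the same process never touch each other's data:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical library "foo": every piece of state lives in the context,
 * nothing is stored in globals or function-level statics. */
typedef struct foo_ctx {
    int initialized;   /* set by foo_ctx_init, cleared by foo_ctx_shutdown */
    unsigned calls;    /* example of per-instance state */
} foo_ctx;

/* Initialise one caller-owned instance. Returns 0 on success. */
int foo_ctx_init(foo_ctx *ctx)
{
    if (!ctx)
        return -1;
    memset(ctx, 0, sizeof *ctx);
    ctx->initialized = 1;
    return 0;
}

/* Tear down one instance; other instances are unaffected. */
void foo_ctx_shutdown(foo_ctx *ctx)
{
    if (ctx)
        ctx->initialized = 0;
}

/* Example operation: only touches the state of the context it is given. */
unsigned foo_do_work(foo_ctx *ctx)
{
    return ++ctx->calls;
}

/* Two independent users of the library in one process. Returns 0 if their
 * state stayed separate. */
int foo_demo(void)
{
    foo_ctx a, b;
    if (foo_ctx_init(&a) != 0 || foo_ctx_init(&b) != 0)
        return -1;
    foo_do_work(&a);
    foo_do_work(&a);
    foo_do_work(&b);
    int ok = (a.calls == 2 && b.calls == 1);
    foo_ctx_shutdown(&a);
    foo_ctx_shutdown(&b);
    return ok ? 0 : 1;
}
```

Note that this sketch does no locking at all; sharing one foo_ctx between threads would still need to be documented and handled separately.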

"Use environment variables for library initialization parameters, not argc and argv."

At a low level, a library should have all parameters set with API calls. One level up would include a call such as get_env_opts() to grab environment variable settings and get_cmd_opts(char *) to parse a well-defined command line argument list. Then the library user or executable can decide how to handle the library configuration.
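Roughly, the layering could look like the sketch below. Everything here is an assumption for illustration: the foo_opts structure, the setter names, and the FOO_DEBUG / FOO_LOCALE variable names are invented. The point is only that the environment helper is a thin, optional layer on top of the plain setters:

```c
#define _POSIX_C_SOURCE 200112L  /* for setenv/unsetenv in the demo */
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical option block for a made-up library "foo". */
typedef struct foo_opts {
    int debug;
    char locale[32];
} foo_opts;

/* Lowest level: defaults plus explicit setters. */
void foo_opts_init(foo_opts *o)
{
    o->debug = 0;
    strcpy(o->locale, "C");
}

void foo_opts_set_debug(foo_opts *o, int level)
{
    o->debug = level;
}

int foo_opts_set_locale(foo_opts *o, const char *name)
{
    if (!name || strlen(name) >= sizeof o->locale)
        return -1;               /* reject names that do not fit */
    strcpy(o->locale, name);
    return 0;
}

/* One level up: optional helper that maps environment variables onto the
 * setters. The caller decides whether to invoke it at all. */
void foo_get_env_opts(foo_opts *o)
{
    const char *v = getenv("FOO_DEBUG");   /* hypothetical variable name */
    if (v)
        foo_opts_set_debug(o, atoi(v));
    v = getenv("FOO_LOCALE");              /* hypothetical variable name */
    if (v)
        (void)foo_opts_set_locale(o, v);
}

/* Returns 0 if the environment overrode debug but left locale at its default. */
int foo_opts_demo(void)
{
    foo_opts o;
    foo_opts_init(&o);
    setenv("FOO_DEBUG", "2", 1);
    unsetenv("FOO_LOCALE");
    foo_get_env_opts(&o);
    return (o.debug == 2 && strcmp(o.locale, "C") == 0) ? 0 : 1;
}
```

An application that wants full control simply never calls foo_get_env_opts() and uses the setters directly.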

"You can easily have two unrelated library users in the same process - often without the main application knowing about the library at all. Make sure your library can handle that."

Uh... this makes little to no sense. But I'll supplement it with... make sure to clearly document the capabilities and behavior of your library in a multi-threaded or multi-process environment. Simply making something "thread-safe" is pointless without context and usage guidance.

"Avoid unsafe API like atexit(3) and, if portability is a concern, unportable constructs like library constructors and destructors (e.g. gcc’s __attribute__ ((constructor)) and __attribute__ ((destructor)))."

OK... so to put this more simply: if portability (of source code) is a high concern, know your dependencies! Highly portable code should by default avoid compiler-dependent extensions and non-POSIX functions. The exception is tweaked code that is enabled conditionally through the preprocessor (CPP macros).

-- Memory management --

"Provide a free() or unref() function for each type your library introduces."

Agreed. But IMHO:
- type_init - sets an allocated type to sane values
- type_new - allocates a type and inits the new allocation
- type_free - deallocates a type
- type_ref, type_getref, type_get - creates a new reference (increments reference count)
- type_unref, type_release, type_put - removes reference (decrements reference count)

Another one I like to include (separate from type_free()) is:
void type_destroy(type_t ** t)
Usually, when freeing memory, you should always reset the pointer right after the free:

free(ptr);
ptr = NULL;

I like to simplify that to a one liner that looks like:

type_destroy(&ptr);

type_destroy exists to decrement the reference count, deallocate the memory if the reference count reaches zero, and set the pointer to NULL so subsequent "if (ptr)" checks operate as intended.
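A minimal single-threaded sketch of that naming scheme, assuming a trivial type_t with one payload field (the struct layout and the "value" member are made up for illustration, and the counter is deliberately not atomic):

```c
#include <assert.h>
#include <stdlib.h>

typedef struct type_t {
    unsigned refcount;  /* plain counter: this sketch is NOT thread safe */
    int value;          /* hypothetical payload */
} type_t;

/* type_init: set an already allocated object to sane values. */
void type_init(type_t *t)
{
    t->refcount = 1;
    t->value = 0;
}

/* type_new: allocate and initialise; returns NULL on allocation failure. */
type_t *type_new(void)
{
    type_t *t = malloc(sizeof *t);
    if (t)
        type_init(t);
    return t;
}

/* type_ref: take an additional reference. */
type_t *type_ref(type_t *t)
{
    if (t)
        t->refcount++;
    return t;
}

/* type_destroy: drop the caller's reference, free the object when the count
 * reaches zero, and always NULL the caller's pointer. */
void type_destroy(type_t **t)
{
    if (!t || !*t)
        return;
    if (--(*t)->refcount == 0)
        free(*t);
    *t = NULL;
}

/* Returns 0 if the object survives until the last reference is dropped. */
int type_demo(void)
{
    type_t *a = type_new();
    if (!a)
        return -1;
    type_t *b = type_ref(a);
    type_destroy(&a);               /* a is now NULL, object lives on via b */
    int ok = (a == NULL && b->refcount == 1);
    type_destroy(&a);               /* calling again on NULL is harmless */
    type_destroy(&b);               /* last reference: memory is freed */
    return (ok && b == NULL) ? 0 : 1;
}
```

Because the pointer is reset on every call, a double type_destroy(&ptr) on the same variable becomes a no-op instead of a double free.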

"Ensure that memory handling consistent across your library."

Memory management should be complete and well defined. As always, it should be clearly described in documentation with trivial and non-trivial examples.

"Note that multi-threading may impose certain kinds of API."

This should be inherent in the programmer's skill set, but I agree. OO-style programming with per-instance variables/references and locking mechanisms lends itself to a friendlier multi-threaded experience. Static and global variables lend themselves to a less thread-safe experience.

My rule of thumb for this is usually:
If there are no non-constant variables defined globally, the library is _capable_ of being thread safe. The effort to make the library thread safe then depends on the overhead of the locking or concurrency logic required.

"Make sure the documentation is clear on how memory is managed."

Documentation should always have trivial AND non-trivial examples and a list of potential pitfalls.

"Abort on OOM unless there are very good reasons for handling OOM."

Disagree. There may be valid exceptions to this, and one involves allocations that require _HUGE_ amounts of memory. If such an allocation fails, the cause is likely user input rather than the system being low on memory. A prime example is loading a >4GB file into memory on a 32-bit system.
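As a sketch of the alternative, assuming a hypothetical foo_buffer_alloc() whose size comes from user input (the function name and error codes are invented), the library can report the failure and let the caller decide what to do:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical error codes for a made-up library "foo". */
enum { FOO_OK = 0, FOO_ERR_NOMEM = -1 };

/* Allocate count * elem_size bytes for a user-controlled size.
 * On failure *out is NULL and an error is returned; nothing aborts. */
int foo_buffer_alloc(size_t count, size_t elem_size, void **out)
{
    *out = NULL;
    /* A request whose byte count does not fit in size_t can never succeed. */
    if (elem_size != 0 && count > SIZE_MAX / elem_size)
        return FOO_ERR_NOMEM;
    size_t bytes = count * elem_size;
    void *p = malloc(bytes ? bytes : 1);   /* avoid malloc(0) ambiguity */
    if (!p)
        return FOO_ERR_NOMEM;
    *out = p;
    return FOO_OK;
}

/* Returns 0 if an impossible request is rejected and a small one succeeds. */
int foo_alloc_demo(void)
{
    void *p;
    if (foo_buffer_alloc(SIZE_MAX, 2, &p) != FOO_ERR_NOMEM || p != NULL)
        return 1;
    if (foo_buffer_alloc(16, 1, &p) != FOO_OK || p == NULL)
        return 1;
    free(p);
    return 0;
}
```

Small internal allocations can still abort on failure if that is the library's policy; the point is only that the big, input-driven ones get an error path.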

Some extra advice (stuff I've struggled with in the past with my own libraries):
- Minimize the usage of "user-defined" void* types.
- When doing memory management, don't forget to think about where the memory your pointers are pointing to is located with respect to the heap or the stack. Nastiness can occur if you free stack memory or don't free heap memory.
- C does not do reference counting, so having a reference counting mechanism for C in a multi-threaded environment should be an absolute requirement. Realistically, you won't be able to always track when to free an allocated type without reference counting. This should especially be considered when using lists or trees to store references to memory.

-- Multiple Threads and Processes --

"Document if and how the library can be used from multiple threads."

Agreed, documentation should include trivial and non-trivial examples and a list of potential pitfalls.

"Document what steps need to be taken after fork() or if the library is now unusable."

You need to understand that your library's state is now duplicated: the copy has a different pid but is looking at the same file descriptors and streams as the original. To handle this case gracefully, you'll need advanced locking or IPC techniques. I agree that it'd be safer to just exec() if possible.
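One cheap way to at least detect the situation is to remember the pid at init time and compare it later. This is only a sketch with invented names (foo_conn, foo_conn_init, foo_conn_forked); it detects the fork but does nothing to repair shared descriptors:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical handle that remembers which process created it. */
typedef struct foo_conn {
    pid_t owner;
} foo_conn;

void foo_conn_init(foo_conn *c)
{
    c->owner = getpid();
}

/* Nonzero if the handle is being used in a process other than its creator,
 * i.e. the state was duplicated by fork(). */
int foo_conn_forked(const foo_conn *c)
{
    return c->owner != getpid();
}

/* Returns 0 if the parent sees "not forked" and the child sees "forked". */
int foo_fork_demo(void)
{
    foo_conn c;
    foo_conn_init(&c);
    if (foo_conn_forked(&c))
        return 1;
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)                          /* child: report via exit status */
        _exit(foo_conn_forked(&c) ? 0 : 1);
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : 1;
}
```

A library could use such a check to refuse further calls in the child, or to trigger whatever re-initialisation its documentation promises.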

"Document if the library is creating private worker threads."
And don't forget about child processes.

Some extra advice (again, stuff I've struggled with in the past with my own libraries):
- Another multi-threading pitfall I've found is knowing where/when to create a new reference to memory. In short, you should always create the reference that a thread will use outside of, and before, that thread's execution (if applicable). The issue is that when you have multiple threads operating on a structure, any thread can "unref" the memory at any time, potentially causing it to be freed. But as long as your current scope holds a valid reference, reference counting should prevent the memory from being freed.
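The ordering described above might look like this with POSIX threads and C11 atomics. The job type and its helpers are invented for the example; the important line is that job_ref() runs in the creating thread before pthread_create():

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical reference-counted object shared with a worker thread. */
typedef struct job {
    atomic_uint refcount;
    int payload;
} job;

static job *job_new(int payload)
{
    job *j = malloc(sizeof *j);
    if (!j)
        return NULL;
    atomic_init(&j->refcount, 1);          /* creator holds one reference */
    j->payload = payload;
    return j;
}

static job *job_ref(job *j)
{
    atomic_fetch_add(&j->refcount, 1);
    return j;
}

static void job_unref(job *j)
{
    /* fetch_sub returns the previous value: 1 means this was the last ref. */
    if (atomic_fetch_sub(&j->refcount, 1) == 1)
        free(j);
}

static void *worker(void *arg)
{
    job *j = arg;                          /* owns the reference taken for it */
    int seen = j->payload;
    job_unref(j);
    return (void *)(intptr_t)seen;
}

/* Returns 0 if the worker could read the job after the creator let go. */
int job_demo(void)
{
    job *j = job_new(42);
    if (!j)
        return -1;
    pthread_t tid;
    job_ref(j);                            /* taken BEFORE the thread exists */
    if (pthread_create(&tid, NULL, worker, j) != 0) {
        job_unref(j);                      /* the worker's unused reference */
        job_unref(j);                      /* the creator's reference */
        return -1;
    }
    job_unref(j);                          /* creator is done; worker's ref keeps it alive */
    void *ret;
    if (pthread_join(tid, &ret) != 0)
        return -1;
    return ((intptr_t)ret == 42) ? 0 : 1;
}
```

If the worker took its own reference as its first statement instead, the creator's job_unref() could win the race and the worker would be touching freed memory.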

As a footnote:
I'd advise getting a peer review on any follow-up parts of this series to make the language flow a little better. Other than the roughness of the article, it did have some good advice.

Thanks,
Chenz



Zeuthen: Writing a C library, part 1

Posted Jun 28, 2011 13:53 UTC (Tue) by davidz2525 (subscriber, #58065)

>> "You can easily have two unrelated library users in the same process - often without the main application knowing about the library at all. Make sure your library can handle that."
> Uh... this makes little to no sense

The canonical example here is that your application is using library A and library B, and both A and B are using library C as a private implementation detail. You would think that this just works out of the box, but often it doesn't, mostly because of global variables and other assumptions. I should probably include that example.

Thanks for your other suggestions - for most, I had something in the original text that I decided to cull to make the text shorter as it was already too long. Once the full series is done, my plan is to revise each blog entry in the series and then post another blog entry listing what changes I've made including giving credit (e.g. linking to your comment for changes inspired by it etc.).

Zeuthen: Writing a C library, part 1

Posted Jun 28, 2011 14:02 UTC (Tue) by crazychenz (guest, #56983)

Got it. Thanks for the clarification.

Chenz

Zeuthen: Writing a C library, part 1

Posted Jun 28, 2011 20:55 UTC (Tue) by JoelSherrill (guest, #43881)

This is actually a common thing to encounter when using existing libraries in a single-process embedded system. Different threads may be completely independent logically, but both use a library which has a single global state.

If you are lucky, the library keeps all this data in a structure so you can use what are often called "per task variables" in an RTOS. This adds the contents of a global pointer to the context of a thread. It is then switched in and out with the thread. RTEMS and VxWorks both have these.

In a similar vein, moving existing libraries that are used to dynamically linked, multi-process environments to a single-process, statically linked environment can also lead to symbol name conflicts when global symbols have common names.

Zeuthen: Writing a C library, part 1

Posted Jun 29, 2011 5:59 UTC (Wed) by malcolmt (guest, #65441)

Using environment variables to pass information directly from an outside user to a library has a long-established and fairly robust history, in particular for things like debugging, path and locale settings. I agree that libraries should *also* be able to be configured via API params in case the calling program has a need to set the same things, but if the library also knows to look in the environment, it means I can type LC_ALL=de_DE and any glibc-using program will get it right, instead of only those where the outer program bothered to allow me to set the language. So I disagree with your recommendation there.

Copyright © 2013, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds