
Kiobufs removed

One of the advantages of the new "commits" mailing list is that one can see the patches which slip quietly into the kernel without public discussion. One of those is this patch by Christoph Hellwig, via Andrew Morton, which removes the "kiobuf" infrastructure from the kernel. This patch has been merged by Linus, and will show up in the 2.5.43 development kernel.

The kiobuf structure was developed by Stephen Tweedie as a way, initially, of implementing the raw block I/O devices in the 2.3 development series. Using kiobufs, kernel code can perform operations directly to and from user-space buffers without having to worry about walking page tables, pinning pages into memory, and so on. Kiobufs did the job they were designed to do, and they found their way into a number of kernel developments.

Not everybody was happy with the kiobuf interface, however. Many saw it as a heavyweight structure, requiring a lot of time (and memory) to set up and tear down. Kiobufs also forced the splitting of large I/O operations into small chunks - often as small as a single 512-byte sector, but never larger than 64KB. As a result, kiobufs never became the high-performance I/O mechanism that they were intended to be.

So what replaces kiobufs in the 2.5 kernel? Modern direct I/O code uses the get_user_pages() function:

        int get_user_pages (struct task_struct *tsk,
                            struct mm_struct *mm,
                            unsigned long start, int len,
                            int write, int force, 
                            struct page **pages, 
                            struct vm_area_struct **vmas);

This function faults in len user pages starting at start, and pins them in memory by taking a reference to each page. Return values include the struct page pointers (in pages) and pointers to the associated VMA structures (in vmas); either can be NULL if the caller is not interested in that information. Code which used kiobufs will want the struct page pointers, which can be used to set up DMA operations or other direct transfers; most callers do not need the VMA pointers. The pages should be passed (individually) to page_cache_release() when the operation is complete.
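
The calling pattern described above (pin, use, release each page) might be sketched as follows. This is only an illustration, not code from the kernel: so that it compiles outside the kernel, the structures and the two kernel functions are replaced by trivial user-space stand-ins, and demo_direct_io(), demo_selfcheck(), fake_pages, and MAX_PAGES are hypothetical names invented for this example. It also assumes that the function returns the number of pages it pinned and that the caller holds whatever locks the real call requires.

```c
#include <assert.h>
#include <stddef.h>

/* User-space stand-ins (assumptions) so the sketch builds outside the kernel. */
struct page { int refcount; };
struct task_struct { int unused; };
struct mm_struct { int unused; };
struct vm_area_struct { int unused; };

#define MAX_PAGES 16                       /* hypothetical limit for the demo */
static struct page fake_pages[MAX_PAGES];

/* Mock with the prototype quoted in the article: takes a reference on each
 * page and returns how many pages were pinned. The real function also
 * faults the pages in, which this mock does not attempt to model. */
static int get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
                          unsigned long start, int len, int write, int force,
                          struct page **pages, struct vm_area_struct **vmas)
{
    int i;

    (void)tsk; (void)mm; (void)start; (void)write; (void)force; (void)vmas;
    if (len > MAX_PAGES)
        len = MAX_PAGES;
    for (i = 0; i < len; i++) {
        fake_pages[i].refcount++;
        if (pages)
            pages[i] = &fake_pages[i];
    }
    return i;
}

/* Mock release: drops the reference taken above. */
static void page_cache_release(struct page *page)
{
    page->refcount--;
}

/* Hypothetical driver helper: pin nr_pages pages of a user buffer,
 * hand them to the hardware, then release every page that was pinned. */
static int demo_direct_io(struct task_struct *tsk, struct mm_struct *mm,
                          unsigned long uaddr, int nr_pages)
{
    struct page *pages[MAX_PAGES];
    int pinned, i;

    if (nr_pages <= 0 || nr_pages > MAX_PAGES)
        return -1;

    /* write = 1: the device will store into the buffer; vmas not needed. */
    pinned = get_user_pages(tsk, mm, uaddr, nr_pages, 1, 0, pages, NULL);
    if (pinned <= 0)
        return pinned;

    /* ... a real driver would set up and run its DMA transfer here ... */

    for (i = 0; i < pinned; i++)
        page_cache_release(pages[i]);   /* one release per pinned page */
    return pinned;
}

/* Self-check: after the operation no page reference may remain. */
static void demo_selfcheck(void)
{
    struct task_struct t = {0};
    struct mm_struct m = {0};

    assert(demo_direct_io(&t, &m, 0x1000UL, 4) == 4);
    assert(fake_pages[0].refcount == 0);
}
```

The point of the sketch is the bookkeeping: release exactly as many pages as the call reported pinning, since that count may be smaller than the number requested.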

The asynchronous I/O patches have also, at times, included a new kvec structure which looks like a lighter, faster version of kiobufs. No patches with kvecs have been merged by Linus, however.

Kiobufs, meanwhile, have reached a dead end. It's worth remembering, though, that kiobufs were the pioneering effort into the use of struct page pointers for direct I/O. The code may be gone, but the lessons learned from kiobufs live on in the current implementation.



Kiobufs removed

Posted Oct 23, 2002 23:37 UTC (Wed) by bhepple (guest, #2581) [Link]

Judging by the acrimony over kiobufs which is still visible in usenet, they won't be missed. In their defence I would say they have often been used (well, by me at least) far outside their intended ambit because of the lack in Linux of a decent alternative for what is really a very common and simple if cryptic need - that filled, it seems, by get_user_pages(), and I say that in the fond hope of a solution without having had the chance to look through the source yet.

I'm sure there must be many other driver writers who've had to use the nasty things (kiobufs) just to accomplish what get_user_pages() promises to deliver - that is, the ability to pin user space from the driver and obtain physical addresses for direct DMA to & from PCI devices (or presumably any other type of DMA).

I have to provide support for this operation on a multitude of OS's - Solaris, OpenServer, AIX, HP-UX and UnixWare - and I can attest that the Linux solutions to date for this problem have been crude at best. My solutions over the last 3 years or so (and I qualify all this by saying that I'm no expert on the Linux kernel, I just work there, so I may well have missed something) were bounce buffers on 2.2.x and kiobufs on 2.4.4. And what a bumpy road that was.

Getting large buffers in 2.2.x was hairy - especially on SMP machines. Without pre-allocating a permanent chunk of RAM at system startup, something I could not impose on my users, the only way was the heap. But the kernel heap was often fragmented and limited to maybe as little as 1 page betimes. This boiled down to DMA failures on busy systems or after a variable period of activity in spite of drivers that initially appeared to work soon after booting. Welcome to the twilight zone.

On 2.4.x kiobufs were indeed heavyweight and needed special and hearty programming to make them perform well without sucking up all the resources in the machine. I often felt it would have been easier to walk the darn VM pages and pin them myself. But the code was just too inscrutable so I was stuck with kiobufs and the feeling that I was using them way out of their intended milieu. Of course it didn't help that the kiobufs implementation went from being fairly lightweight at 2.4.0 to a monster at 2.4.7 or thereabouts.

By comparison, the older systems such as Solaris, AIX and HP-UX have canned interfaces for this function. It's such a common need and while I would blush and cross my fingers behind my back if I said that anything in AIX was easy, the other systems handled it with panache.

I am looking forward to browsing through the get_user_pages() code and seeing if there is anything there I can scalp into my 2.4.x kernels. If someone has already done this, I'd be most grateful to hear about the experience, and I encourage Messrs. Corbet & Rubini to add their priceless prose to the subject and explain it all to us lesser mortals.

I am, after all, a very lazy programmer.

Copyright © 2002, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds