
Driver porting: Zero-copy user-space access

This article is part of the LWN Porting Drivers to 2.6 series.
The kiobuf abstraction was introduced in 2.3 as a low-level way of representing I/O buffers. Its primary use, perhaps, was to represent zero-copy I/O operations going directly to or from user space. A number of problems were found with the kiobuf interface, however; among other things, it forced large I/O operations to be broken down into small chunks, and it was seen as a heavyweight data structure. So, in 2.5.43, kiobufs were removed from the kernel.

This article looks at how to port drivers which used the kiobuf interface in 2.4. We'll proceed on the assumption that the real feature of interest was direct access to user space; there wasn't much motivation to use a kiobuf otherwise.

Zero-copy block I/O

The 2.6 kernel has a well-developed direct I/O capability for block devices. So, in general, it will not be necessary for block driver writers to do anything to implement direct I/O themselves. It all "just works."

Should you have a need to perform zero-copy block operations, it's worth noting the presence of a useful helper function:

    struct bio *bio_map_user(struct block_device *bdev,
                             unsigned long uaddr,
                             unsigned int len,
                             int write_to_vm);

This function will return a BIO describing a direct operation to the given block device bdev. The parameters uaddr and len describe the user-space buffer to be transferred; callers must check the returned BIO, however, since the area actually mapped might be smaller than what was requested. The write_to_vm flag should be set if the operation will change memory, that is, if it is a read-from-disk operation. The returned BIO is ready for submission to the appropriate device driver, but it can be NULL, so check it before use.

When the operation is complete, undo the mapping with:

    void bio_unmap_user(struct bio *bio, int write_to_vm);

Mapping user-space pages

If you have a char driver which needs direct user-space access (a high-performance streaming tape driver, say), then you'll want to map user-space pages yourself. The modern equivalent of map_user_kiobuf() is a function called get_user_pages():

    int get_user_pages(struct task_struct *task,
                       struct mm_struct *mm,
                       unsigned long start,
                       int len,
                       int write,
                       int force,
                       struct page **pages,
                       struct vm_area_struct **vmas);

task is the process performing the mapping; the primary purpose of this argument is to say who gets charged for page faults incurred while mapping the pages. This parameter is almost always passed as current. The memory management structure for the user's address space is passed in the mm parameter; it is usually current->mm. Note that get_user_pages() expects that the caller will have a read lock on mm->mmap_sem. The start and len parameters describe the user buffer to be mapped; len is in pages. If the memory will be written to, write should be non-zero. The force flag forces read or write access, even if the current page protection would otherwise not allow that access. The pages array (which should be big enough to hold len entries) will be filled with pointers to the page structures for the user pages. If vmas is non-NULL, it will be filled with pointers to the vm_area_struct structures containing each page.
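Since len counts pages rather than bytes, a driver that is handed a byte address and a byte count first has to work out how many pages the buffer touches. What follows is a minimal sketch of that arithmetic, written as ordinary user-space C so it can be tried outside the kernel. The helper name user_buffer_pages() is hypothetical, and the 4096-byte page size is an assumption; kernel code would use the PAGE_SHIFT supplied by the architecture.

```c
#include <assert.h>

/* Assumption: 4096-byte pages. A driver would use the kernel's own
 * PAGE_SHIFT and PAGE_SIZE rather than defining them. */
#define PAGE_SHIFT 12
#define PAGE_SIZE  (1UL << PAGE_SHIFT)

/*
 * Hypothetical helper: number of pages touched by the byte range
 * [uaddr, uaddr + count). The buffer need not be page-aligned, so
 * even a short buffer can straddle two pages.
 */
static unsigned long user_buffer_pages(unsigned long uaddr,
                                       unsigned long count)
{
    unsigned long first, last;

    assert(count > 0);                          /* empty buffers are not handled */
    first = uaddr >> PAGE_SHIFT;                /* page holding the first byte */
    last = (uaddr + count - 1) >> PAGE_SHIFT;   /* page holding the last byte */
    return last - first + 1;
}
```

Under these assumptions, a four-byte buffer that starts two bytes before a page boundary spans two pages, so the pages array would need two entries.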

The return value is the number of pages actually mapped, or a negative error code if something goes wrong. Assuming things worked, the user pages will be present (and locked) in memory, and can be accessed by way of the struct page pointers. Be aware, of course, that some or all of the pages could be in high memory.
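Because the user buffer rarely starts or ends on a page boundary, code that walks the pages array usually needs to know which bytes of each page belong to the buffer. The sketch below shows one way to compute that, again as plain user-space C with an assumed 4096-byte page size. The page_segment structure and the buffer_segment() helper are hypothetical names invented for illustration; the sketch covers only the arithmetic and does not deal with making a high-memory page addressable.

```c
#include <assert.h>

/* Assumption: 4096-byte pages, as in a typical i386 configuration. */
#define PAGE_SHIFT 12
#define PAGE_SIZE  (1UL << PAGE_SHIFT)

/* Hypothetical descriptor for the part of the buffer inside one page. */
struct page_segment {
    unsigned long offset;   /* byte offset of the data within the page */
    unsigned long length;   /* number of buffer bytes held in this page */
};

/*
 * Hypothetical helper: describe the slice of the user buffer
 * [uaddr, uaddr + count) that lies in the index-th mapped page,
 * where index 0 is the page containing uaddr. The caller must pass
 * an index that is within the buffer.
 */
static struct page_segment buffer_segment(unsigned long uaddr,
                                          unsigned long count,
                                          unsigned long index)
{
    struct page_segment seg;
    unsigned long first_off = uaddr & (PAGE_SIZE - 1);
    unsigned long start;    /* buffer position where this page's data begins */

    if (index == 0) {
        start = 0;
        seg.offset = first_off;     /* only the first page starts mid-page */
    } else {
        start = index * PAGE_SIZE - first_off;
        seg.offset = 0;
    }
    assert(start < count);

    seg.length = PAGE_SIZE - seg.offset;
    if (seg.length > count - start)
        seg.length = count - start; /* the last page may be partly used */
    return seg;
}
```

Summing the length fields over all pages gives back the original byte count, which makes a convenient sanity check.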

There is no equivalent put_user_pages() function, so callers of get_user_pages() must perform the cleanup themselves. Two things need to be done: marking modified pages as dirty, and releasing each page from the page cache. If your device modified the user pages, the virtual memory subsystem may not know about it, and may fail to write the pages to permanent storage (or swap). That, of course, could lead to data corruption and grumpy users. The way to avoid this problem is to call:

    SetPageDirty(struct page *page);

for each page in the mapping. Current (2.6.3) kernel code checks to ensure that pages are not reserved first with code like:

    if (!PageReserved(page))
        SetPageDirty(page);

But pages mapped from user space should not, normally, be marked reserved in the first place.

Finally, every mapped page must be released from the page cache, or it will stay there forever; simply pass each page structure to:

    void page_cache_release(struct page *page);

After you have released the page, of course, you should not access it again.
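The two cleanup steps can be combined into one loop over the pages array. The following is only a sketch: so that it compiles outside the kernel, it begins with simplified user-space stand-ins for struct page, PageReserved(), SetPageDirty() and page_cache_release(), which are assumptions and not the real kernel definitions. The helper name release_user_pages() is likewise hypothetical. In a driver, the stand-ins would be dropped and the loop would use the kernel's own versions.

```c
#include <stddef.h>

/*
 * User-space stand-ins so the sketch compiles outside the kernel.
 * These are simplified assumptions; the real definitions come from
 * the kernel headers and look nothing like this.
 */
struct page {
    int reserved;   /* models the "reserved" page flag */
    int dirty;      /* models the "dirty" page flag */
    int refcount;   /* models the reference held on the page */
};
#define PageReserved(p)         ((p)->reserved)
#define SetPageDirty(p)         ((p)->dirty = 1)
#define page_cache_release(p)   ((p)->refcount--)

/*
 * Hypothetical helper: undo a get_user_pages() mapping.
 * nr_mapped is the value get_user_pages() returned, which may be
 * smaller than the number of pages requested. dirtied is non-zero
 * if the device wrote into the pages.
 */
static void release_user_pages(struct page **pages, int nr_mapped,
                               int dirtied)
{
    int i;

    for (i = 0; i < nr_mapped; i++) {
        if (dirtied && !PageReserved(pages[i]))
            SetPageDirty(pages[i]);
        page_cache_release(pages[i]);
        pages[i] = NULL;    /* the page must not be used after release */
    }
}
```

Passing the count that get_user_pages() actually returned, rather than the count requested, keeps the loop from touching array entries that were never filled in.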

For a good example of how to use get_user_pages() in a char driver, see the definition of sgl_map_user_pages() in drivers/scsi/st.c.



Driver porting: Zero-copy user-space access

Posted Feb 13, 2004 14:34 UTC (Fri) by grisu1976 (guest, #19435)

I don't really understand why the kiobuf interface does not exist anymore. In Linux kernel 2.4 the kiobuf interface used get_user_pages, or am I wrong? The kiobuf interface was easier to use than get_user_pages; that's my opinion.

Driver porting: Zero-copy user-space access

Posted Mar 3, 2004 7:55 UTC (Wed) by bhepple (guest, #2581)

Hmmm, a quick recursive grep through the 2.6.3 driver source and include files showed exactly 0 users of set_page_dirty_lock() and 1 user of put_page() (in drivers/char/agp/generic.c)

There _is_ a
#define page_cache_release(page) put_page(page)
in include/linux/pagemap.h and it is quite a popular little chap in the device driver code with 13 hits in the entire tree.

Am I missing something or should we be using page_cache_release instead of put_page and is it (and set_page_dirty_lock) _really_ needed after all - I can hardly believe all those drivers are causing "data corruption and grumpy users"...

Driver porting: Zero-copy user-space access

Posted Nov 2, 2005 21:29 UTC (Wed) by rwbowman (guest, #33561)

TRUE or FALSE: if I map and lock a user-space buffer using get_user_pages during an ioctl, and then allow the ioctl to return, the pages will remain locked until I release them with page_cache_release.

I've heard folks say (twice now) that they don't remain locked.
Thanks!

Driver porting: Zero-copy user-space access

Posted Feb 15, 2006 17:17 UTC (Wed) by ceb (guest, #35717) (3 responses)

Does anyone have any experience in using get_user_pages in a real system? As far as I can tell it is totally unreliable when the machine is in any way loaded. As well as getting zero addresses for pages returned, the physical addresses don't seem to correspond to the data to be transferred. This is on various flavors of Linux 2.6 based kernels.

This seems to rule out performing DMA directly from user space but I would like to be told that I'm wrong.

Driver porting: Zero-copy user-space access

Posted Jun 2, 2007 1:48 UTC (Sat) by yltian (guest, #45556)

Please "grep get_user_pages linux/drivers/media/video/video-buf.c". It's a good example. V4L2 uses this method to share memory between DMA and the application.

Driver porting: Zero-copy user-space access

Posted Sep 6, 2007 15:28 UTC (Thu) by dnevil (guest, #47212) (1 responses)

Sounds like your system is allocating memory from ZONE_HIGHMEM (since you are on a heavily loaded system, and you are probably running i386). The comments in include/linux/mm_types state that if page->virtual is null then the page is not mapped into kernel virtual memory (i.e. it is HIGHMEM). You should try allocating your memory as soon after system boot as possible. Are there other 'tricks' for assuring that user-allocated memory comes from ZONE_NORMAL or ZONE_DMA?

Driver porting: Zero-copy user-space access

Posted Sep 6, 2007 16:31 UTC (Thu) by dnevil (guest, #47212)

Try booting with a kernel parameter of "highmem=0".

Driver porting: Zero-copy user-space access

Posted Jul 20, 2007 12:43 UTC (Fri) by bartjes (guest, #46350) (3 responses)

As I am quite new to Linux and have spent the entire morning searching for a way to get data out of my driver to a user application as efficiently as possible, I keep stumbling across deprecated mechanisms, function prototypes, and sometimes even a bit of explanation.
All this looks very complex to me and I REALLY could use a basic example where a piece of shared memory is allocated, freed, passed from driver to user space or vice versa, and which shows how the driver fills information into the structure and how the user process can read it.

It's sad that after searching all morning, an example is nowhere to be found, or it's way too complicated to understand, or it fails to compile and fills my terminal with errors.

Any suggestion would be helpful, Thanks, Bart

Driver porting: Zero-copy user-space access

Posted Dec 18, 2008 6:37 UTC (Thu) by bmasood (guest, #55659)

Bart,

I am facing the exact problem that you mentioned: I need to get my data from the device to user space as efficiently as possible. I was wondering if you ever found an example?

Thanks
Bilal

Driver porting: Zero-copy user-space access

Posted Mar 13, 2009 18:26 UTC (Fri) by will (guest, #46624) [Link] (1 responses)

> For a good example of how to use get_user_pages() in a char driver, see the definition of sgl_map_user_pages() in drivers/scsi/st.c.

Driver porting: Zero-copy user-space access

Posted May 8, 2013 21:37 UTC (Wed) by Ajaxelitus (guest, #56754)

Another, much more modern and very tiny example may be found at

https://gist.github.com/17twenty/2930467


Copyright © 2003, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds