
Driver porting: the BIO structure

This article is part of the LWN Porting Drivers to 2.6 series.
The block layer in 2.4 (and prior) kernels was organized around the buffer head data structure. The limits of buffer heads have long been clear, however. It is hard to create a truly high-performance block I/O subsystem when the underlying buffer head structure forces each I/O request to be split into 4K chunks. So one of the first items on the 2.5 block "todo" list was the creation of a way to represent block I/O requests that supported higher performance and greater flexibility. The result was the new BIO structure.

BIO basics

As with most real-world code, the BIO structure incorporates a fair number of tricky details. The core of the structure (as defined in <linux/bio.h>) is not that complicated, however. The BIO structure itself contains the usual collection of housekeeping information, along with a pointer (bi_io_vec) to an array of bio_vec structures. This array represents the (possibly multiple) segments which make up this I/O request. There is also an index (bi_idx) giving an offset into the bi_io_vec array; we'll get into its use shortly.

The bio_vec structure itself has a simple definition:

    struct bio_vec {
	struct page	*bv_page;
	unsigned int	bv_len;
	unsigned int	bv_offset;
    };

As is increasingly the case with internal kernel data structures, the BIO now tracks data buffers using struct page pointers. There are some implications of this change for driver writers:

  • Data buffers for block transfers can be anywhere - kernel or user space. The driver author need not be concerned about the ultimate source or destination of the data.

  • These buffers could be in high memory, unless the driver author has explicitly requested that bounce buffers be used (Request Queues I covers how to do that). The driver author cannot count on the existence of a kernel-space mapping for the buffer unless one has been created explicitly.

  • More than ever, block I/O operations are scatter/gather operations, with data coming from multiple, dispersed buffers.

At first glance, the BIO structure may seem more difficult to work with than the old buffer head, which provided a nice kernel virtual address for a single chunk of data. Working with BIOs is not hard, however.

Getting request information from a BIO

A driver author could use the information above (along with the other BIO fields) to get the needed information out of the structure without too much trouble. As a general rule, however, direct access to the bio_vec array is discouraged. A set of accessor routines has been provided which hides the details of how the BIO structure works and eases access to that structure. Use of these routines will make the driver author's job easier, and, with luck, will enable a driver to keep working in the face of future block I/O changes.

So how does one get request information from the BIO structure? The beginning sector for the entire BIO is in the bi_sector field - there is no accessor function for that. The total size of the operation is in bi_size (in bytes). One can also get the total size in sectors with:

    bio_sectors(struct bio *bio);

The function (macro, actually):

    int bio_data_dir(struct bio *bio);

returns either READ or WRITE, depending on what type of operation is encapsulated by this BIO.
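
For a feel of what these two accessors compute, here is a minimal user-space sketch. Everything prefixed with toy_ (along with the TOY_READ and TOY_WRITE constants) is a stand-in invented for this example, not kernel code. It assumes 512-byte sectors and that the low bit of a bi_rw flags word carries the direction; that is how the 2.6-era macros appear to work, but it is worth checking against <linux/bio.h>.

```c
#include <stdint.h>

/* Stand-ins for the kernel's READ and WRITE constants (assumed values). */
enum { TOY_READ = 0, TOY_WRITE = 1 };

/* A cut-down, hypothetical BIO holding only the fields used here. */
struct toy_bio {
	uint64_t      bi_sector;  /* first sector of the whole request */
	unsigned int  bi_size;    /* total size of the request, in bytes */
	unsigned long bi_rw;      /* low bit assumed to hold the direction */
};

/* Total size in 512-byte sectors, as bio_sectors() would report it. */
static unsigned int toy_bio_sectors(const struct toy_bio *bio)
{
	return bio->bi_size >> 9;
}

/* Direction of the transfer, as bio_data_dir() would report it. */
static int toy_bio_data_dir(const struct toy_bio *bio)
{
	return (int)(bio->bi_rw & 1);
}
```

With these definitions, an 8192-byte write is reported as 16 sectors with a direction of TOY_WRITE.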

Almost everything else requires working through the bio_vec array. The encouraged way of doing that is to use the special bio_for_each_segment macro:

    int segno;
    struct bio_vec *bvec;

    bio_for_each_segment(bvec, bio, segno) {
	/* Do something with this segment */
    }

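Within the loop, the integer variable segno will be the current index into the array, and bvec will point to the current bio_vec structure. A further set of macros describes the BIO's current segment, meaning the one selected by bio->bi_idx. Since bio_for_each_segment leaves bi_idx unchanged, these macros do not follow the loop variable; inside the loop, the fields of bvec are the reliable source of per-segment information. The macros are: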

struct page *bio_page(struct bio *bio)
Returns a pointer to the current page structure.

int bio_offset(struct bio *bio)
Returns the offset within the current page for this operation. Block I/O operations are often page-aligned, but that is not always the case.

int bio_cur_sectors(struct bio *bio)
The number of sectors to transfer for this bio_vec.

char *bio_data(struct bio *bio)
Returns the kernel virtual address for the data buffer. Note that this address will only exist if the buffer is not in high memory.

char *bvec_kmap_irq(struct bio_vec *bvec, unsigned long *flags)
This function returns a kernel virtual address which can be used to access the data buffer pointed to by the given bio_vec entry; it also disables interrupts and returns an atomic kmap, so the driver should not sleep until bvec_kunmap_irq() has been called. Note that the flags argument is a pointer value, which is a departure from the usual convention for macros which disable interrupts.

void bvec_kunmap_irq(char *buffer, unsigned long *flags)
Undo a mapping which was created with bvec_kmap_irq().

char *bio_kmap_irq(struct bio *bio, unsigned long *flags)
This function is a wrapper around bvec_kmap_irq(); it returns a mapping for the current bio_vec entry in the given bio. There is, of course, a corresponding bio_kunmap_irq().

char *__bio_kmap_atomic(struct bio *bio, int i, enum km_type type)
Use kmap_atomic() to obtain a kernel virtual address for the ith buffer in the bio; the kmap slot designated by type will be used.

void __bio_kunmap_atomic(char *addr, enum km_type type)
Release a kernel virtual address obtained with __bio_kmap_atomic().

A little detail which is worth noting: all of bio_data(), bvec_kmap_irq(), and bio_kmap_irq() add the segment offset (bio_offset(bio)) to the address before returning it. It is tempting to add the offset separately, but that is an error which leads to weird problems. Trust me.
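
The point about offsets can be illustrated with a small user-space model of a segment walk. This is only a sketch: the toy_ types, the toy_bio_for_each_segment macro, and the toy_bvec_data() and toy_bio_gather() helpers are all hypothetical stand-ins written for this example, and a "page" here is just a 4096-byte buffer, so no kmap is involved. What it shows is that the helper handing out the data address has already added bv_offset, so the caller must not add it a second time.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical user-space stand-in: a "page" is just a byte buffer. */
struct toy_page {
	char data[4096];
};

struct toy_bio_vec {
	struct toy_page *bv_page;
	unsigned int     bv_len;
	unsigned int     bv_offset;
};

struct toy_bio {
	struct toy_bio_vec *bi_io_vec;  /* the segment array */
	unsigned short      bi_vcnt;    /* number of entries to process */
	unsigned short      bi_idx;     /* current index into bi_io_vec */
};

/* Simplified model of the segment loop: start at bi_idx, stop at bi_vcnt. */
#define toy_bio_for_each_segment(bvec, bio, segno)                      \
	for ((segno) = (bio)->bi_idx, (bvec) = &(bio)->bi_io_vec[(segno)];  \
	     (segno) < (bio)->bi_vcnt;                                      \
	     (segno)++, (bvec)++)

/* Address of a segment's data; the offset is already included. */
static char *toy_bvec_data(struct toy_bio_vec *bvec)
{
	return bvec->bv_page->data + bvec->bv_offset;
}

/* Copy every remaining segment into one flat buffer; returns bytes copied. */
static size_t toy_bio_gather(struct toy_bio *bio, char *dst, size_t dst_len)
{
	struct toy_bio_vec *bvec;
	int segno;
	size_t done = 0;

	toy_bio_for_each_segment(bvec, bio, segno) {
		char *src = toy_bvec_data(bvec);  /* do NOT add bv_offset again */

		if (bvec->bv_len > dst_len - done)
			break;                        /* destination is full */
		memcpy(dst + done, src, bvec->bv_len);
		done += bvec->bv_len;
	}
	return done;
}
```

Because the walk starts at bi_idx, advancing that index makes the same loop skip segments which have already been handled.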

Completing I/O

Given the information from the BIO, each block driver should be able to arrange a transfer to or from its particular device. Note that a helper function (blk_rq_map_sg()) exists which makes it easy to set up DMA scatter/gather lists from a block request; we'll get into that when we look at request queue management.

When the operation is complete, the driver must inform the block subsystem of that fact. That is done with bio_endio():

    void bio_endio(struct bio *bio, unsigned int nbytes, int error);

Here, bio is the BIO of interest, nbytes is the number of bytes actually transferred, and error indicates the status of the operation; it should be zero for a successful transfer, and a negative error code otherwise.
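
The calling convention can be modeled in user space as follows. This is a toy, not the kernel's implementation: toy_bio, its bi_error field, toy_bio_endio(), toy_bio_is_complete(), and toy_driver_complete() are all invented for illustration, and the idea that completed bytes are simply subtracted from bi_size is an assumption about how partial completions are accounted for. The sketch only shows a driver reporting the bytes moved plus zero or a negative error code.

```c
#include <errno.h>

/* A cut-down, hypothetical BIO with just enough state for this model. */
struct toy_bio {
	unsigned int bi_size;   /* bytes still to be transferred */
	int          bi_error;  /* hypothetical: last error reported, 0 if none */
};

/* Model of the completion call: account for nbytes, remember any error. */
static void toy_bio_endio(struct toy_bio *bio, unsigned int nbytes, int error)
{
	if (error)
		bio->bi_error = error;
	if (nbytes > bio->bi_size)  /* never complete more than was asked for */
		nbytes = bio->bi_size;
	bio->bi_size -= nbytes;
}

/* In this model the request is finished once nothing remains to transfer. */
static int toy_bio_is_complete(const struct toy_bio *bio)
{
	return bio->bi_size == 0;
}

/* What a driver's completion path might do: zero on success, -EIO on failure. */
static void toy_driver_complete(struct toy_bio *bio, unsigned int nbytes, int hw_ok)
{
	toy_bio_endio(bio, nbytes, hw_ok ? 0 : -EIO);
}
```

A 4096-byte request completed in two pieces (1024 bytes, then 3072) would pass through toy_driver_complete() twice with hw_ok set, and only the second call leaves it complete.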

Other BIO details

The bi_private field in the BIO structure is not used by the block subsystem, and is available for the owner of the structure to use. Drivers do not own BIOs passed in to their request function and should not touch bi_private there. If your driver creates its own BIO structures (using the functions listed below, usually), then the bi_private field in those BIOs is available to it.

As mentioned above, the bi_idx BIO field is an index into the bi_io_vec array. This index is maintained for a couple of reasons. One is that it can be used to keep track of partially-complete operations. But this field (along with bi_vcnt, which says how many bio_vec entries are to be processed) can also be used to split a BIO into multiple chunks. Using this facility, a RAID or volume manager driver can "clone" a BIO into multiple structures all pointing at different parts of the bio_vec array. The operation is quick and efficient, and allows a large operation to be quickly dispatched across a number of physical drives.

To clone a BIO in this way, use:

    struct bio *bio_clone(struct bio *bio, int gfp_mask);

bio_clone() creates a second BIO pointing to the same bio_vec array as the original. This function uses the given gfp_mask when allocating memory.
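
To see why sharing the bio_vec array makes splitting cheap, consider this user-space sketch. The toy_ structures and the toy_bio_clone() and toy_bio_restrict() helpers are hypothetical and exist only for this example; malloc() stands in for the kernel allocator and the gfp_mask argument is left out. A real splitting driver would also have to adjust bi_sector for each piece, which this model omits.

```c
#include <stdlib.h>

struct toy_bio_vec {
	void         *bv_page;   /* opaque here; the model never touches the data */
	unsigned int  bv_len;
	unsigned int  bv_offset;
};

struct toy_bio {
	struct toy_bio_vec *bi_io_vec;  /* shared between a BIO and its clones */
	unsigned short      bi_vcnt;    /* one past the last entry to process */
	unsigned short      bi_idx;     /* first entry to process */
	unsigned int        bi_size;    /* bytes covered by this BIO */
};

/* Model of cloning: a new structure pointing at the same bio_vec array.
 * Returns NULL if the allocation fails. */
static struct toy_bio *toy_bio_clone(const struct toy_bio *bio)
{
	struct toy_bio *clone = malloc(sizeof(*clone));

	if (clone)
		*clone = *bio;  /* copies the pointer, not the array */
	return clone;
}

/* Narrow a BIO to 'count' segments starting at index 'first'. */
static void toy_bio_restrict(struct toy_bio *bio, unsigned short first,
			     unsigned short count)
{
	unsigned int bytes = 0;
	unsigned int i;

	for (i = first; i < (unsigned int)first + count; i++)
		bytes += bio->bi_io_vec[i].bv_len;

	bio->bi_idx = first;
	bio->bi_vcnt = (unsigned short)(first + count);
	bio->bi_size = bytes;
}
```

Two clones restricted to segments 0-1 and 2-3 of a four-segment BIO can then be sent to different drives without copying a single bio_vec entry.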

BIO structures contain reference counts; the structure is released when the reference count hits zero. Drivers normally need not manipulate BIO reference counts, but, should the need arise, functions exist in the usual form:

    void bio_get(struct bio *bio);
    void bio_put(struct bio *bio);
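
The get/put pattern itself is the familiar one, and can be sketched in user space like this. All of the names here (toy_bio, bi_cnt, toy_bio_alloc(), toy_bio_get(), toy_bio_put(), and the toy_live_bios counter) are assumptions made for the example. In particular, the count is a plain int, which is only safe in a single-threaded model; kernel code would need an atomic counter.

```c
#include <stdlib.h>

/* A hypothetical BIO reduced to its reference count. */
struct toy_bio {
	int bi_cnt;  /* number of outstanding references */
};

/* Number of toy BIOs currently allocated; exists only to make frees visible. */
static int toy_live_bios;

/* Allocate a BIO holding one reference; returns NULL on failure. */
static struct toy_bio *toy_bio_alloc(void)
{
	struct toy_bio *bio = malloc(sizeof(*bio));

	if (bio) {
		bio->bi_cnt = 1;
		toy_live_bios++;
	}
	return bio;
}

/* Take an additional reference. */
static void toy_bio_get(struct toy_bio *bio)
{
	bio->bi_cnt++;
}

/* Drop a reference; the structure is released when the count hits zero. */
static void toy_bio_put(struct toy_bio *bio)
{
	if (--bio->bi_cnt == 0) {
		toy_live_bios--;
		free(bio);
	}
}
```

Every toy_bio_get() must be balanced by a toy_bio_put(); the final put is the one which frees the structure, so the pointer must not be used afterward.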

Numerous other functions exist for working with BIO structures; most of the functions not covered here are involved with creating BIOs. More information can be found in <linux/bio.h> and block/biodoc.txt in the kernel documentation directory.



bi_private

Posted Mar 27, 2003 8:57 UTC (Thu) by axboe (subscriber, #904) [Link] (3 responses)

Good article, but one thing needs to be corrected concerning the use of bi_private. This field is _owned_ by whoever owns the bio, so it's definitely not for free use by the block driver (unless the block driver itself allocated the bio, of course)! In fact, this is a very important point as otherwise stacking drivers cannot work properly.

So in short, you may only look/modify bi_private if you are the owner of the bio.

bi_private

Posted Mar 27, 2003 16:55 UTC (Thu) by corbet (editor, #1) [Link]

Hey, if that's the only thing I messed up, I'm happy. The article has been tweaked accordingly, thanks.

bi_private

Posted Mar 28, 2003 22:13 UTC (Fri) by Peter (guest, #1127) [Link]

Good article, but one thing needs to be corrected concerning the use of bi_private.

Don't listen to this "axboe" character. He doesn't know anything about the BIO subsystem.

bi_private

Posted Dec 17, 2006 22:30 UTC (Sun) by test5073 (guest, #42204) [Link]

I am new to writing drivers. Do you have any example code? I would appreciate it if anyone could point me to an example of redirecting the I/O.

Thanks,

GKO

Driver porting: the BIO structure

Posted Apr 16, 2004 8:49 UTC (Fri) by rf (guest, #20877) [Link] (1 responses)

Hello,

I would like to read more than 8 sectors from a device at one time.

Does anybody have an idea?

thanks.

Driver porting: the BIO structure

Posted Apr 16, 2004 14:49 UTC (Fri) by rf (guest, #20877) [Link]

Reading from the device with struct bio works well, but if the read size is bigger than 16K => 32 sectors, it fails.... (32K, 64K, 128K)
I get an error.

Here I must reboot.


what is the reason for that?

bio_endio

Posted Jul 7, 2004 21:42 UTC (Wed) by ccoffing (guest, #22874) [Link]

Is the following statement outdated?
    When the operation is complete, the driver must inform the block subsystem of that fact. That is done with bio_endio()

Over on http://lwn.net/Articles/27055/ you say

    Note that drivers should not call bio_endio() as transfers complete; the block layer will take care of that.

Driver porting: the BIO structure

Posted Jul 20, 2004 8:33 UTC (Tue) by rf (guest, #20877) [Link]

Some devices allow only a fixed number of bytes to be transferred per request.
SATA allows only 100KB = 200 sectors to be read from or written to the device per request.
Can I change this value, for example to 1M or ....

maybe :
request_queue_t *q;
q = bdev_get_queue(bdev);
q->max_sectors = XXXXX;

I am unsure .... ?


thanks...

bio_for_each_segment() and bio_cur_sectors()

Posted Jan 8, 2005 1:27 UTC (Sat) by roman (guest, #24157) [Link]

Contrary to the article, it's not reliable to use bio_cur_sectors() within a bio_for_each_segment(bvec, bio, segno) loop. The problem is that bio_cur_sectors() uses bio->bi_idx to index into bio->bi_io_vec, but bio_for_each_segment() uses segno as an index and (rightfully) leaves bio->bi_idx unchanged.

This also applies to bio_page(), bio_offset(), and bio_data().

Creating BIO structure

Posted Feb 17, 2005 4:48 UTC (Thu) by explorer (guest, #27894) [Link] (2 responses)

Hi,
I want to create my own bio structure to write the data to a block disk.

I have block device details in terms of dev_t, I have data in my driver allocated pages.

How I can achieve this?

* Use bio_alloc to get bio
* Using bio_add_page add my own buffer (got using page_alloc) to the bio structure
* Construct the block_device *bi_bdev structure (actually I have the dev_t type for my block device; how can I get the corresponding block_device structure?), construct the gendisk structure
* submit_bio

Is there any way to get block_disk structure from dev_t type? get_bdisk returns an object of type block_disk, but its gendisk structure is empty.

Regards,
Explorer

Creating BIO structure

Posted Feb 22, 2005 17:28 UTC (Tue) by vabank (guest, #28037) [Link] (1 responses)

There is an entry point in fs/block_dev.c:
struct block_device *open_by_devnum(dev_t dev, unsigned mode)

Creating BIO structure

Posted Feb 23, 2005 7:14 UTC (Wed) by explorer (guest, #27894) [Link]

Hi vabank,

Thanks for your reply.

I have already tested that api open_by_devnum.
But it oops.

The problem is open_by_devnum(block_device *bd, int mode)
calls blkdev_get function which makes use of bd->bd_inode field to get the block device structure.

How I can get the inode field for a dev_t structure?

Sorry for the inconvenience.

Regards,
Explorer.

Driver porting: the BIO structure

Posted Apr 2, 2005 5:40 UTC (Sat) by transfer168 (guest, #28972) [Link]

I mount a device by " losetup /dev/loop0 file". Would you please tell me how to get information (blk_size, blksize_size, hardsect_size, read_ahead, etc.) in 2.6 kernel space?

Thanks For any suggestions.

How to support partitions in Block driver

Posted Aug 19, 2005 13:03 UTC (Fri) by jbmukund (guest, #31946) [Link]

Dear all,

I have a few basic queries regarding partition implementation in my block driver.
I am working on the 2.6.10 kernel (x86 architecture).
My driver works fine with normal devices but fails to work when trying to access partitioned devices.

How do I enable partition support in my block driver?
Can someone please comment?

Regards,
Mukund Jampala

Driver porting: the BIO structure

Posted Dec 14, 2005 15:15 UTC (Wed) by mirage.cn (guest, #34548) [Link]

Ask another question:

If my driver wants a contiguous virtual address space for all the pages in a bio (or bvec), how can I get it in an easy way?

(as the 2.4 kernel could support a 'nice kernel virtual address' for the data)

thanks

Driver porting: the BIO structure

Posted Dec 26, 2017 12:31 UTC (Tue) by SysManOne (guest, #118298) [Link] (2 responses)

Hello !

Are there any updates reflecting recent kernel changes?

Driver porting: the BIO structure

Posted Apr 12, 2018 4:20 UTC (Thu) by badman250 (guest, #123617) [Link] (1 responses)

I find nothing, the newest BIO structure changed a lot

Driver porting: the BIO structure

Posted Aug 28, 2018 8:09 UTC (Tue) by SysManOne (guest, #118298) [Link]

Too old.
Now there is a lot of macro shit that is supposed to be used to access the BIO's fields.
Block device drivers change bi_iter, so you need to save it for use in your own "end_bio()".


Copyright © 2003, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds