API changes in the 2.6 kernel series
This article will be updated to keep track of the internal changes for each 2.6 kernel release. Its permanent location is:
http://lwn.net/Articles/2.6-kernel-api/
If you are looking for changes prior to 2.6.26, you'll find them on the older version of this page.
Last update: September 9, 2009.
2.6.31 (September 9, 2009)
- There is a new workqueue function:
int __cancel_delayed_work(struct delayed_work *work);
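Unlike cancel_delayed_work(), it uses del_timer() rather than del_timer_sync(), so it will not wait for a concurrently running timer function to finish; in either case, the work function itself may still be running on return.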
- There is a new atomic function:
int atomic_dec_and_mutex_lock(atomic_t *cnt, struct mutex *lock);
This function will decrement cnt and, if cnt reaches zero, it will acquire the given lock.
- A number of block layer request
queue API changes have been merged; all drivers must now dequeue
requests before executing them. Beyond that, the merging of the
storage topology patches (in preparation for 4K-sector disks) means
that block drivers must now distinguish between the physical block
size on the disk and the logical block size used by the kernel.
- The 32-bit x86 architecture now supports the atomic64_t
type.
- The kernel memory leak
detector has been merged at last. The kmemcheck kernel memory
checker, which detects the use of uninitialized memory, has also been merged.
- The fsnotify backend has been merged. This code provides a new,
common implementation for dnotify and inotify; it also will serve as
the base for the "fanotify" code (formerly TALPA), which has not been
merged as of this writing.
- Tree read-copy update (RCU) is now the
default, though Classic RCU is still available.
- Changes to the include/asm-generic header files were merged.
These headers are meant to serve as a model for new architectures, or
to be used by them directly, so that they need not copy from an
existing architecture.
The S+core (score) architecture depends on these changes and the
MicroBlaze architecture will be using them to clean up its ABI.
- All network drivers have been converted to the new net_device_ops API and the
old API available with COMPAT_NET_DEV_OPS has been removed.
- The rfkill core has been rewritten for devices that implement a way to
stop all radio transmission from the device (in response to a laptop
key for turning off wireless, for example). Various drivers have also
been updated to use the new rfkill API.
- All references to the debugfs mount point throughout the tree have been
changed to /sys/kernel/debug/ in both documentation and code. In
addition, LWN's updated guide to
debugfs was added to the Documentation directory.
- Unicode handling in the kernel has been updated, with functions like
utf_mbstowcs() being renamed to utf8s_to_utf16s() for
better readability.
- The TTM GPU memory manager (covered a bit
over a year ago) has been merged.
- Quite a bit of Big Kernel Lock (BKL) removal code has been merged in
the fs/ tree. Now, all of the super_operations and
address_space_operations are called without holding the BKL.
- IRQF_SAMPLE_RANDOM, which governs whether a device's interrupts
are used as an entropy source, has been added to the
feature-removal-schedule.
- The memory debugging infrastructure for DRM has been
removed. "It hasn't been used in ages, and having the user tell you how much memory is being freed at free time is a recipe for disaster even if it was ever used."
- David Miller is now the IDE subsystem maintainer, taking over from Bartlomiej Zolnierkiewicz, in a friendly handoff. Miller plans to put IDE into maintenance-only mode.
2.6.30 (June 9, 2009)
- The threaded interrupt
handlers patch has been merged, making it possible for drivers to
set up an interrupt handler which runs in its own thread. Over the
long term, it is hoped that drivers will move in this direction,
eventually making it possible to remove facilities like tasklets.
- The adaptive spinning mutex patch has been merged. This change will
cause mutexes to behave more like spinlocks in the contended case. If
(and only if) the lock is held by code running on a different CPU, the
mutex code will
spin on the assumption that the lock will be released soon. This behavior
results in significant performance improvements. Btrfs, which had its own spinning mutex
implementation, has been converted to the new mutexes.
- There is a new set of functions added to the crypto API which allow
for piecewise compression and decompression of data.
- The bus_id member of struct device is gone; code
needing that information should use the dev_name() macro
instead.
- There is a new timer function:
int mod_timer_pending(struct timer_list *timer, unsigned long expires);
It is like mod_timer() with the exception that it will not reactivate an already-expired timer.
- There have been some changes around the fasync() function in
struct file_operations. This function is now responsible for
maintaining the FASYNC bit in struct file; it is
also now called without the big kernel lock held. Finally, a positive
return value from fasync() is mapped to zero, meaning that
the return value from fasync_helper() can be returned
directly by fasync().
- The SCSI layer has a new support library for object storage device
support; see Documentation/scsi/osd.txt for details.
- The x86 "subarchitecture" mechanism has been removed, now that no
architectures actually use it. The Voyager architecture has been
removed as a result of these changes.
- x86 is also the first architecture to use a new per-CPU memory
allocator merged for 2.6.30. This allocator changes little at the API
level, but it will provide for more efficient and flexible per-CPU
variable management.
- Support for compressing the kernel with the bzip2 or lzma algorithms
has been added. Support for the old zImage format has been
removed.
- The asynchronous function
call infrastructure is now enabled by default.
- The DMA operations debugging
facility has been merged.
- The owner field of struct proc_dir_entry has been
removed, causing lots of changes throughout the tree.
- There is a new memory debug tool controlled by the PAGE_POISONING
configuration variable. Turning this feature on causes a pattern to
be written to all freed pages and checked at allocation time. The
result is "a large slowdown," but also the potential to catch a number
of use-after-free errors.
- The new function:
int pci_enable_msi_block(struct pci_dev *dev, int count);
allows a driver to enable a block of MSI interrupts.
- As part of the FS-Cache work, the "slow work" thread pool mechanism
has been merged. Some have expressed the hope that it would become
the One True Kernel Thread Pool, but there seems to be little progress
in that direction. See this
article and Documentation/slow-work.txt for more
information.
- There is a pair of new printing functions:
int vbin_printf(u32 *bin_buf, size_t size, const char *fmt, va_list args);
int bstr_printf(char *buf, size_t size, const char *fmt, const u32 *bin_buf);
The difference here is that vbin_printf() places the binary value of its arguments into bin_buf. The process can be reversed with bstr_printf(), which formats a string from the given binary buffer. The main use for these functions would appear to be with Ftrace; they allow the encoding of values to be deferred until a given trace string is read by user space.
- Also added is printk_once(), which only prints its message
the first time it is executed.
- The "kmemtrace" tracing facility has been merged. Kmemtrace provides
data on how the core slab allocations function. See Documentation/vm/kmemtrace.txt for
details.
- A number of ftrace changes have been merged. There is a workqueue tracer which tracks the operations of workqueue threads. The blktrace block subsystem tracer can now be used via ftrace. The new "event" tracer allows a user to turn on specific tracepoints within the kernel; tracepoints have been added for various scheduler and interrupt events. "Raw" events (with binary-formatted data) are available now. The new "syscall" tracer is for tracing system calls.
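The behavior of printk_once(), mentioned in the 2.6.30 list above, comes down to a static flag attached to each call site. The following is a user-space sketch of that idea only; emit(), emit_once(), lines_emitted, and probe_quirky_device() are hypothetical names, and fputs() stands in for printk().

```c
#include <assert.h>
#include <stdio.h>

static int lines_emitted;    /* number of messages actually printed */

/* Stand-in for printk(): print the message and count it. */
static void emit(const char *msg)
{
    assert(msg != NULL);
    lines_emitted++;
    fputs(msg, stdout);
}

/*
 * Print the message only the first time this call site is reached.
 * The static flag lives in the block created by each macro expansion,
 * so every call site gets its own "already printed" state.
 */
#define emit_once(msg)                  \
    do {                                \
        static int already_printed;     \
        if (!already_printed) {         \
            already_printed = 1;        \
            emit(msg);                  \
        }                               \
    } while (0)

/* A code path that may run many times but should complain only once. */
static void probe_quirky_device(void)
{
    emit_once("quirk detected; working around it\n");
}
```

Calling probe_quirky_device() any number of times prints the message once. Note that this sketch, like any simple flag-based scheme, makes no attempt to handle two threads reaching the call site at the same instant.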
2.6.29 (March 23, 2009)
- The massive task credentials
patch set has been merged. This code reorganizes the handling of
process credentials (user ID, capabilities, etc.). One of the
immediate implications of this change is that direct references to
credential-oriented fields in the task structure need to be changed;
for example, current->user->uid becomes
current_uid(). See Documentation/credentials.txt for a
description of the new API.
- The ftrace code has seen a lot of internal changes. The function
tracing feature has seen a number of improvements, and the developers
have added
mechanisms to profile the behavior of if statements,
provide function call graphs,
obtain user-space stack traces, and
follow CPU power-state transitions.
- Most of the callback functions/methods associated with the
net_device structure have been moved out of that structure
and into the new struct net_device_ops. In-tree drivers
have been converted to the new API.
- The priv field has been removed from struct
net_device; drivers should use netdev_priv() instead.
- The generic PHY layer now has power management support. To that end,
two new methods - suspend() and resume() - have been
added to struct phy_driver.
- The networking layer now supports large receive offload (or
"generic receive offload") operation.
- The NAPI API has been cleaned up somewhat; in particular, functions
like netif_rx_schedule(), netif_rx_schedule_prep(),
and netif_rx_complete() have lost the unneeded struct
net_device parameter.
- The poll() file operation is now allowed to sleep; see this article for more
information on this change.
- The CPU mask mechanism, used to represent sets of processors in the
system, is in the middle of being massively reworked. The problem is
that CPU masks were often put on the stack, but, as the number of
processors grows, the stack lacks room for the mask. The new API is designed to
get these masks off the stack, and to guard against anybody ever
trying to put one back. See this
posting by Rusty Russell for details on this work.
- An infrastructure for
asynchronous function calls has been merged. This code is still a
work in progress, though, and, for 2.6.29, it will not be activated in
the absence of the fastboot command-line parameter.
- The exclusive I/O memory
allocation functions have been merged.
- There is a new synchronous hash interface called "shash." It
simplifies the use of synchronous hash operations while allowing the
same tfm to be used simultaneously in different threads. All in-tree
users have been switched to the new API.
- The hrtimer code has been simplified with the removal of variable
modes for callback functions. All processing is now done in hardirq
context.
- A new set of LSM hooks has been added; these support pathname-based
security operations. With the merging of these hooks, one major
obstacle to the inclusion of security modules like AppArmor and TOMOYO
has been removed.
- The kernel will now refuse to build with GCC 4.1.0 or 4.1.1; those
versions have unfortunate bugs which prevent the building of a working
kernel. Versions 3.0 and 3.1 have also been deemed to be too old and
will not be supported in 2.6.29.
- Video4Linux drivers now use a separate v4l2_file_operations
structure to hold their VFS-like callbacks. The prototypes of a
number of these functions have been changed to remove the
inode argument.
- Video4Linux2 has also acquired a new "subdevice" concept, meant to
reflect the fact that video "devices" tend to be, in reality, a set of
cooperating devices. See the new
document for a description of how this mechanism works.
- Two new functions - stop_machine_create() and
stop_machine_destroy() - allow the independent creation of
the threads used by stop_machine(). That, in turn, lets
those threads be created before trying to actually stop the machine,
making that operation more resistant to failure.
- The exports for a number of SUNRPC functions have been changed to
GPL-only.
- The internal MTD (memory technology device) API has seen significant changes aimed at supporting larger devices (those requiring 64-bit sizes).
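The removal of the priv field from struct net_device (see the 2.6.29 list above) makes sense once one remembers where the private data lives. The sketch below is a user-space model, assuming (as in kernels of this era) that the driver's private area is allocated immediately after the device structure at an aligned offset, so that it can be found by pointer arithmetic alone. All of the toy_ names, align_up(), and the alignment value of 32 are illustrative assumptions, not kernel definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define TOY_PRIV_ALIGN 32    /* assumed alignment; must be a power of two */

/* Stand-in for struct net_device: note that it has no priv pointer. */
struct toy_net_device {
    char name[16];
    unsigned int mtu;
};

/* Example of driver-private state stored behind the device structure. */
struct toy_driver_state {
    unsigned long rx_packets;
    unsigned long tx_packets;
};

/* Round n up to the next multiple of align (a power of two). */
static size_t align_up(size_t n, size_t align)
{
    return (n + align - 1) & ~(align - 1);
}

/* Allocate the device and its private area as one zeroed block. */
static struct toy_net_device *toy_alloc_netdev(size_t sizeof_priv)
{
    size_t total = align_up(sizeof(struct toy_net_device), TOY_PRIV_ALIGN)
                   + sizeof_priv;

    return calloc(1, total);
}

/* Find the private area from the device pointer alone. */
static void *toy_netdev_priv(struct toy_net_device *dev)
{
    assert(dev != NULL);
    return (char *)dev + align_up(sizeof(*dev), TOY_PRIV_ALIGN);
}
```

Since the private area's location is a pure function of the device pointer, a stored pointer to it is redundant; that is the reasoning behind asking drivers to call netdev_priv() instead.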
2.6.28 (December 24, 2008)
- Discard request
and request timeout handling have been added to the block layer; a
number of other internal API changes have been made as well. See this article for details.
- Video4Linux2 drivers no longer have their open() function
called with the big kernel lock held. The lock_kernel()
calls have been pushed down into individual drivers within the
mainline tree; external drivers will need to be fixed.
- A number of tracing-related patches have been merged. These include
the tracepoints
mechanism, some instrumentation in the core scheduler code,
improvements to the ftrace function tracing feature,
a new ftrace-based stack tracer,
a new ftrace-based boot (initcall) tracer, and
the low-level trace
buffer code.
- The sysctl strategy() function prototype has changed: the
unused name and nlen parameters have been removed.
- Asynchronous I/O support can now be configured out of the kernel,
saving about 7KB of space on systems where AIO is not needed.
- As planned, device_create_drvdata() has been renamed to
device_create(), with the same parameters.
- There is now a mechanism to enable and disable output from
pr_debug() and dev_dbg() calls on a per-module
basis. Control is through a virtual file in debugfs. There is no
documentation file associated with this change; instructions on how
to use this feature can be found in the
patch changelog.
- The new dev_WARN() function:
dev_WARN(struct device *dev, char *format, ...);
will output the formatted warning, along with a full stack trace. This will allow the warnings to be collected at kerneloops.org and incorporated into the reports there.
- The new %pR formatting directive allows printk() and
friends to output the contents of resource structures.
- There is a new function intended to make life easier for PCI driver
writers:
static inline void *pci_ioremap_bar(struct pci_dev *pdev, int bar);
This function will remap the entire PCI I/O memory region, as selected by the bar argument.
- There is a new core_param() macro:
core_param(name, var, type, perm);
Its purpose is to define "core" parameters and let them be represented in /sys/module/kernel/parameters.
- It is now possible to create a workqueue running at realtime priority
with:
struct workqueue_struct *create_rt_workqueue(const char *name);
- The block driver API has changed considerably, with the inode
and file parameters being removed from most block device
operations. The new API looks like this:
struct block_device_operations {
    int (*open) (struct block_device *bdev, fmode_t mode);
    int (*release) (struct gendisk *gd, fmode_t mode);
    int (*locked_ioctl) (struct block_device *bdev, fmode_t mode, unsigned cmd, unsigned long arg);
    int (*ioctl) (struct block_device *bdev, fmode_t mode, unsigned cmd, unsigned long arg);
    int (*compat_ioctl) (struct block_device *bdev, fmode_t mode, unsigned cmd, unsigned long arg);
    int (*direct_access) (struct block_device *bdev, sector_t sector, void **kaddr, unsigned long *pfn);
    int (*media_changed) (struct gendisk *gd);
    int (*revalidate_disk) (struct gendisk *gd);
    int (*getgeo)(struct block_device *bdev, struct hd_geometry *geo);
    struct module *owner;
};
The new prototypes do away with the file and inode structure pointers which were passed in previous kernels. Note that the ioctl() method is now called without the big kernel lock; code needing BKL protection must explicitly define a locked_ioctl() function instead.
- The range timer API has been merged; callers can now specify a time period in which they would like the timeout to be delivered. The kernel can then take advantage of the range to coalesce wakeups and keep the processor idle for longer periods.
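The benefit of range timers, the last item in the 2.6.28 list above, can be shown with a toy model. This sketch is not the kernel's hrtimer code; it is a simplified simulation in which each timer carries an acceptable window, the "CPU" wakes at the earliest outstanding deadline, and every timer whose window has already opened is run during that wakeup. The names toy_range_timer, soft, hard, and count_wakeups() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* A timer that may fire anywhere in the window [soft, hard]. */
struct toy_range_timer {
    long soft;     /* earliest acceptable expiry time */
    long hard;     /* latest acceptable expiry time */
    int fired;     /* set once the timer has been run */
};

/* Count the wakeups needed to run all n timers in this model. */
static int count_wakeups(struct toy_range_timer *t, size_t n)
{
    int wakeups = 0;

    for (;;) {
        int found = 0;
        long wake = 0;
        size_t i;

        /* Sleep as long as possible: until the earliest hard deadline. */
        for (i = 0; i < n; i++) {
            assert(t[i].soft <= t[i].hard);
            if (!t[i].fired && (!found || t[i].hard < wake)) {
                wake = t[i].hard;
                found = 1;
            }
        }
        if (!found)
            return wakeups;

        /* One wakeup runs every timer whose window is already open. */
        wakeups++;
        for (i = 0; i < n; i++)
            if (!t[i].fired && t[i].soft <= wake)
                t[i].fired = 1;
    }
}
```

In this model, three timers with exact expirations at times 10, 15, and 18 cost three wakeups, while the same three timers with windows of [10, 20], [15, 25], and [18, 30] are all handled by a single wakeup at time 20.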
2.6.27 (October 9, 2008)
- The register_security() function has been removed. Security
modules which wish to implement stacking must now do so explicitly.
- The request_queue_t type is gone at last; block drivers
should use struct request_queue instead.
- Quite a bit of big kernel
lock removal work has been merged. For
char devices, the open() method from struct
file_operations is no longer protected by the BKL. Calls to
fasync() have also lost BKL protection.
- Many drivers have been converted to use the firmware loader, making it
possible to strip the firmware from the kernel for those who are
inclined to do so. See this
article for more information on the firmware work.
- The API work in the i2c layer continues; there is now an autodetection
capability which allows new-style drivers to detect devices on their
buses automatically.
- The SCSI layer has gained new support for "device handlers," which are
mostly concerned with multipath management. Some of this code has
been moved over from the device mapper.
- The new suspend and
hibernate infrastructure has been merged, providing a wider set of
callbacks for power management events. The PCI and platform bus
interfaces have been enhanced with support for this new
infrastructure.
- The TTY layer continues to evolve; significant changes include the
introduction of a new tty_port structure meant to hold
information common to all TTY ports and a rework of the line
discipline code.
- The mac80211 code has a new module which can simulate any number of
IEEE 802.11 radios; it is suitable for testing mac80211 functionality
and associated user-space tools.
- There is a new "rfkill" mechanism for unified handling of "radio off"
switches on wireless devices.
- A number of Video4Linux2 format-related callbacks have been renamed to
make them match the names used with the associated buffer types.
In addition, the vidioc_enum_fmt_vbi_cap() callback has been
deprecated and marked for removal in 2.6.28.
- The videobuf layer now has support for controllers which cannot do
scatter/gather I/O.
- The USB "gadget" framework has been massively reworked to provide
better support for composite devices.
- The prototype for device_create() has changed:
struct device *device_create(struct class *class, struct device *parent, dev_t devt, void *drvdata, const char *fmt, ...);
Those who see a resemblance to device_create_drvdata() are right; all in-tree users were converted over to that interface, the old device_create() was removed, and device_create_drvdata() was renamed. For now, a macro makes calls to device_create_drvdata() do the right thing, but that macro will probably go away before the 2.6.27 final release.
- User-space UIO drivers can now write a signed value to the
/dev/uioX device to enable and disable interrupts.
- Debugfs (finally) has a function for removing an entire directory
tree:
void debugfs_remove_recursive(struct dentry *dentry);
As a result, code creating hierarchies in debugfs no longer needs to remember the dentry of every file it creates.
- The tracehook mechanism for defining static trace points (described in
this article) has been
merged, along with a number of trace points in the core kernel.
- A new, lockless form of get_user_pages() has been added:
int get_user_pages_fast(unsigned long start, int nr_pages, int write, struct page **pages);
Details of this interface can be found in this article, with the one note that early versions were called fast_gup() instead. (See also the related lockless page cache work, which was also merged.)
- The long-debated mmu-notifiers patch has
been merged. The notifiers
allow external memory management units (as may be seen in some
graphics cards or in virtualized guests) to be told about decisions
made by the core memory management code.
- There is a new framework for debugging boot-time memory
initialization; there's also "a few basic defensive measures" intended
to prevent difficult-to-debug boot problems.
- The new function:
int object_is_on_stack(void *obj);
returns a true value if the pointed-to object is on the current kernel stack.
- There is a new macro for issuing warnings:
WARN(condition, format, ...);
It's much like WARN_ON() in that it will produce a full oops listing; the difference is the added printk()-style format string and arguments.
- A new helper function:
int flush_work(struct work_struct *work);
waits for the specific workqueue job work to finish executing.
- dma_mapping_error() and pci_dma_mapping_error() have
new prototypes:
int dma_mapping_error(struct device *dev, dma_addr_t dma_addr);
int pci_dma_mapping_error(struct pci_dev *hwdev, dma_addr_t dma_addr);
In each case, they have gained a new argument specifying which device the mapping is being done for.
- There are a couple of new radix tree functions:
unsigned int radix_tree_gang_lookup_slot(struct radix_tree_root *root, void ***results, unsigned long first_index, unsigned int max_items);
unsigned int radix_tree_gang_lookup_tag_slot(struct radix_tree_root *root, void ***results, unsigned long first_index, unsigned int max_items, unsigned int tag);
They are useful for looking up multiple items in a single call.
- Slab cache constructors no longer have a pointer to the cache itself
as an argument; they now take a single void * pointer to
the object itself.
- The long list of Video4Linux2 ioctl() callbacks has been moved into its own structure (struct v4l2_ioctl_ops) which is pointed to by the ioctl_ops member of struct video_device.
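The WARN() macro from the 2.6.27 list above is most convenient when used directly inside a conditional. The following user-space sketch models that usage pattern only: toy_warn(), warnings_issued, and toy_set_ring_size() are hypothetical names, the message goes to stderr rather than the kernel log, and no stack trace is produced. It assumes that WARN(), like WARN_ON(), evaluates to the truth value of its condition.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

static int warnings_issued;    /* how many warnings have been printed */

/*
 * Model of WARN(condition, format, ...): if the condition holds, print
 * the formatted message; in all cases, return the condition's truth value.
 */
static int toy_warn(int condition, const char *fmt, ...)
{
    assert(fmt != NULL);
    if (condition) {
        va_list args;

        warnings_issued++;
        va_start(args, fmt);
        vfprintf(stderr, fmt, args);
        va_end(args);
    }
    return !!condition;
}

/* Typical use: complain about a "cannot happen" value and bail out. */
static int toy_set_ring_size(int entries)
{
    if (toy_warn(entries <= 0, "bad ring size %d\n", entries))
        return -1;
    return 0;
}
```

The format string lets the warning carry the offending value, which a bare WARN_ON() cannot do.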
2.6.26 (July 13, 2008)
- At long last, support for the KGDB interactive debugger has been
added to the x86 architecture. There is a DocBook document in the
Documentation directory which provides an overview of how to use this
new facility.
- Page attribute table (PAT) support is also (again, at long last)
available for the x86 architecture. PATs allow for fine-grained
control of memory caching behavior with more flexibility than the
older MTRR feature. See Documentation/x86/pat.txt for more
information.
- ioremap() on the x86 architecture will now always return an
uncached mapping. Previously, it had taken a more relaxed approach,
leaving the caching as the BIOS had set it up. The practical result
was to almost always create uncached mappings, but with
occasional exceptions. Drivers which depend on a cached mapping will
now break; they will need to use ioremap_cache() instead.
- The nopage() virtual memory area operation has been removed;
all in-tree code is now using fault() instead.
- Two new functions (inode_getsecid() and
ipc_getsecid()), added to support security modules and the
audit code, provide general access to security IDs associated with
inodes and IPC objects. A number of superblock-related LSM callbacks
now take a struct path pointer instead of struct
nameidata. There is also a new set of hooks providing
generic audit support in the security module framework.
- The now-unused ieee80211 software MAC layer has been removed; all of
the drivers which needed it have been converted to mac80211. Also
removed are the sk98lin network driver (in favor of skge) and bcm43xx
(replaced by b43 and b43legacy).
- The generic semaphores
patch has been merged. The semaphore code also has new
down_killable() and down_timeout() functions.
- The ata_port_operations structure used by libata drivers now
supports a simple sort of operation inheritance, making it easier to
write drivers which are "almost like" existing code, but with small
differences.
- A new function (ns_to_ktime()) converts a time value in
nanoseconds to ktime_t.
- The final users of struct class_device have been converted to
use struct device instead. The class_device type
has been removed.
- The seq_file code now accepts a return value of SEQ_SKIP from
the show() callback; that value causes any accumulated output
from that call to be discarded.
- The Video4Linux2 API now defines a set of controls for camera devices;
they allow user space to work with parameters like exposure type, tilt
and pan, focus, and more.
- On the x86 architecture, there is a new configuration parameter which
allows gcc to make its own decisions about the inlining of functions,
even when functions are declared inline. In some cases, this
option can reduce the size of the kernel's text segment by over 2%.
- The legacy IDE layer has gone through a lot of internal changes which
will break any remaining IDE drivers.
- The SLUB allocator supports a new sysfs file
(/sys/kernel/slab/name/order) which allows system
administrators to change the size of page allocations used by the
named slab.
- A condition which triggers a warning from WARN_ON will now
also taint the kernel.
- The get_info() interface for /proc files has been
removed. There is also a new function for creating /proc
files:
struct proc_dir_entry *proc_create_data(const char *name, mode_t mode, struct proc_dir_entry *parent, const struct file_operations *proc_fops, void *data);
This version adds the data pointer, ensuring that it will be set in the resulting proc_dir_entry structure before user space can try to access it.
- The object debugging
infrastructure has been merged.
- The klist type now has the usual-form macros for declaration and
initialization: DEFINE_KLIST() and KLIST_INIT().
Two new functions (klist_add_after() and
klist_add_before()) can be used to add entries to a klist in
a specific position.
- kmap_atomic_to_page() is no longer exported to modules.
- There are some new generic functions for performing 64-bit integer
division in the kernel:
u64 div_u64(u64 dividend, u32 divisor);
u64 div_u64_rem(u64 dividend, u32 divisor, u32 *remainder);
s64 div_s64(s64 dividend, s32 divisor);
s64 div_s64_rem(s64 dividend, s32 divisor, s32 *remainder);
Unlike do_div(), these functions are explicit about whether signed or unsigned math is being done. The x86-specific div_long_long_rem() has been removed in favor of these new functions.
- There is a new string function:
bool sysfs_streq(const char *s1, const char *s2);
It compares the two strings while ignoring an optional trailing newline.
- The prototype for i2c probe() methods has changed:
int (*probe)(struct i2c_client *client, const struct i2c_device_id *id);
The new id argument supports i2c device name aliasing.
- There is a new configuration option (MODULE_FORCE_LOAD) which controls whether the loading of modules can be forced if the kernel thinks something is not right; it defaults to "no."
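The comparison performed by sysfs_streq(), from the 2.6.26 list above, is simple enough to model in user space. This is a sketch of the described behavior (equality while ignoring one optional trailing newline on either string); toy_sysfs_streq() is a hypothetical name and the code should not be taken as the kernel's exact implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Compare two strings, treating a single trailing newline as optional. */
static bool toy_sysfs_streq(const char *s1, const char *s2)
{
    assert(s1 != NULL && s2 != NULL);

    /* Skip the common prefix. */
    while (*s1 && *s1 == *s2) {
        s1++;
        s2++;
    }

    /* Both strings ended at the same point: plain equality. */
    if (*s1 == *s2)
        return true;
    /* s1 ended; s2 has exactly one newline left. */
    if (!*s1 && *s2 == '\n' && !s2[1])
        return true;
    /* s2 ended; s1 has exactly one newline left. */
    if (*s1 == '\n' && !s1[1] && !*s2)
        return true;
    return false;
}
```

The motivating case is a sysfs store() method: a user running "echo on" into an attribute file delivers "on\n", which a plain strcmp() against "on" would reject.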
LDD needs a section on PPC, and a couple of embedded type issues.
Posted Oct 11, 2010 15:36 UTC (Mon)
by Wowbagger (guest, #69958)
However, there is a great big PowerPC shaped hole in the book. There are a great deal of areas in doing device drivers for the PPC that, as far as I can tell, are not documented ANYWHERE (including the source, which has a paucity of comments, making it less than useful as a reference guide).
I cannot help write the section (because if I understood it well enough to write it, I wouldn't need the section!), but I can tell you some areas that I think could use a good write-up:
0) General embedded PPC type issues - working with some of the Freescale embedded devices, for example.
1) OpenFirmware and the OF device tree - what it is, who sets it up, how to extend it for your own devices.
2) MPIC interrupts, and how to get from the MPIC hardware to request_irq().
3) Memory space management: how to map devices that occupy address space into RAM when they aren't bound to some PCI device.
I also think some more general topics need to be covered better:
1) Memory space management: how to map devices that occupy address space into RAM when they aren't bound to some PCI device. (yes, this is a repeat of #3 above, but this is more generic).
2) Allocating large blocks of memory aligned by size (e.g. how to allocate a 64K block that is aligned on 64K) - this comes up with a number of DMA devices and address space translators (like PCI interfaces).
3) Dealing with PCI devices that can change their configuration at run time (e.g. FPGAs that, upon being programmed, export new BARS). A set of "best practices" for dealing with such devices ("should they generate hot plug events? if so, how?") would be invaluable to people creating such hardware.
Posted Jun 1, 2013 3:54 UTC (Sat)
by duxing2007 (guest, #91235)
Interactive map of Linux kernel functions with links: http://www.makelinux.net/kernel_map