API changes in the 2.6 kernel series
This article will be updated to keep track of the internal changes for each 2.6 kernel release. Its permanent location is:
http://lwn.net/Articles/2.6-kernel-api/
This page will, doubtless, remain incomplete for a while. If you see an omission, please let us know by sending a note to kernel@lwn.net rather than by posting comments here. The chances of a prompt update are higher, the article will not become cluttered with redundant comments, and we'll be more than happy to credit you here.
If you are a Linux Device Drivers, Third Edition reader looking for information on changes since the book was published: LDD3 covers version 2.6.10 of the kernel, so only the changes starting with 2.6.11 are relevant.
Last update: April 29, 2008
2.6.25 (April 16, 2008)
- There have been a great many changes to the low-level device model
APIs dealing with kobjects and ksets. These changes have, in turn,
forced a large number of adjustments throughout the tree. See
Documentation/kobject.txt for an
overview of the new API.
- There is a new set of security module functions for dealing with
filesystem mount and unmount operations.
- The chained scatterlist API has been augmented with the sg_table patches.
- There have been some changes to the block request completion API. See
this article for a
description of the new way of doing things.
- A large number of SUNRPC symbols (rpc_* and
rpcauth_*) have been changed to GPL-only exports.
- The "flatmem" and "discontigmem" memory models have been removed on
the 64-bit x86 architecture; "sparsemem" is now used for all builds.
- The fastcall function attribute didn't do anything on the x86
architecture, so it has been removed.
- x86 has a new set of functions for easily manipulating page
attributes. They are:
    set_memory_uc(unsigned long addr, int numpages);  /* Uncached */
    set_memory_wb(unsigned long addr, int numpages);  /* Cached */
    set_memory_x(unsigned long addr, int numpages);   /* Executable */
    set_memory_nx(unsigned long addr, int numpages);  /* Non-executable */
    set_memory_ro(unsigned long addr, int numpages);  /* Read-only */
    set_memory_rw(unsigned long addr, int numpages);  /* Read-write */
There is also a set of set_pages_* functions which take a struct page pointer rather than a beginning address.
- Early-boot debugging of x86 systems via the FireWire port is now
supported.
- Bidirectional command support has been added to the SCSI layer.
- There is a new process state called TASK_KILLABLE. It is a
blocked state similar to TASK_UNINTERRUPTIBLE, with the
difference that a wakeup will happen upon delivery of a fatal signal.
The idea is to allow (almost) uninterruptible sleeps, but to still
allow the process to be killed outright - thus ending the problem of
unkillable processes stuck in the "D" state. There is a new set of
functions for using this state: wait_event_killable(),
schedule_timeout_killable(), mutex_lock_killable(),
etc.
- add_disk_randomness() has been unexported as there are no
more in-tree users.
- pci_enable_device_bars() has been replaced by two
more-specific functions: pci_enable_device_io() and
pci_enable_device_mem().
- The high-resolution timer API has been augmented with:
    unsigned long hrtimer_forward_now(struct hrtimer *timer, ktime_t interval);
It will move the given timer's expiration forward past the current time as determined by the associated clock.
- The device structure now holds a pointer to a
device_dma_parameters structure:
    struct device_dma_parameters {
        unsigned int max_segment_size;
        unsigned long segment_boundary_mask;
    };
These parameters are used by the DMA mapping layer (and the IOMMU mapping code in particular) to ensure that I/O operations are set up within the device's constraints. The PCI layer supports this feature with two new functions:
    int pci_set_dma_max_seg_size(struct pci_dev *dev, unsigned int size);
    int pci_set_dma_seg_boundary(struct pci_dev *dev, unsigned long mask);
Drivers for devices with unusually strict DMA limitations should probably use these functions to ensure that those restrictions are respected.
- Many nopage() methods have been replaced by the newer
fault() API; the near-term plan is to remove
nopage() altogether. See this article for a
description of the new way of "page not present" handling.
- A generic resource counter mechanism was merged as part of the memory
controller patch set; see <linux/res_counter.h> for the
details.
- reserve_bootmem() has a new flags parameter. Most
callers will set it to BOOTMEM_DEFAULT; the kdump code,
though, uses BOOTMEM_EXCLUSIVE to ensure that it is the only
one to touch the memory.
- Most architectures now have support for cmpxchg64() and
cmpxchg_local().
- There is a new set of string functions:
    extern int strict_strtoul(const char *string, unsigned int base, unsigned long *result);
    extern int strict_strtol(const char *string, unsigned int base, long *result);
    extern int strict_strtoull(const char *string, unsigned int base, unsigned long long *result);
    extern int strict_strtoll(const char *string, unsigned int base, long long *result);
These functions convert the given strings to various forms of long values, but they will return an error status if the given string value, as a whole, does not represent a proper integer value. These functions are now used in the parsing of kernel parameters.
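The whole-string checking described for the strict_strto*() functions can be illustrated with a minimal user-space sketch. This is not the kernel implementation: strict_strtoul_sketch() is a hypothetical name, it is built on the C library's strtoul(), and both the -EINVAL return value and the tolerance for a single trailing newline are assumptions made for illustration.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

/*
 * User-space approximation of a "strict" string-to-unsigned-long parser.
 * Hypothetical helper; the error code and newline handling are assumptions.
 */
int strict_strtoul_sketch(const char *string, unsigned int base,
                          unsigned long *result)
{
    char *end;
    unsigned long val;

    assert(result != NULL);

    if (string == NULL || *string == '\0')
        return -EINVAL;

    /* strtoul() would silently accept leading blanks and signs; refuse them. */
    if (isspace((unsigned char)*string) || *string == '-' || *string == '+')
        return -EINVAL;

    errno = 0;
    val = strtoul(string, &end, (int)base);
    if (end == string || errno == ERANGE)
        return -EINVAL;

    /* Assumption: allow one trailing newline, as text written by a user often has one. */
    if (*end == '\n')
        end++;

    /* Anything left over means the string, as a whole, is not an integer. */
    if (*end != '\0')
        return -EINVAL;

    *result = val;
    return 0;
}
```

With this sketch, "42" (base 10) and "0x1f" (base 16) parse successfully, while "42abc" is rejected because characters remain after the number.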
2.6.24 (January 24, 2008)
- The i386/x86_64
architecture merger has happened during this kernel series. The result is a single
architecture, called "x86," which can be built for 32-bit and 64-bit
processors.
- The Video4Linux layer has some new internal support for composite
devices involving more than one driver (many V4L2 devices involve, at
a minimum, separate drivers for the controller and the sensor).
- Also in Video4Linux: the video-buf layer has been replaced with a more
generic implementation which works with a wider range of devices
(including USB devices and those which do not support scatter/gather
DMA).
- The NAPI interface used in network drivers has been reworked to better
support devices with multiple transmit queues.
- The networking layer has a new function for printing MAC addresses:
    char *print_mac(char *buf, const u8 *addr);
The buf buffer should be declared with DECLARE_MAC_BUF(); the output is suitable for formatting in printk() with "%s".
- The NETIF_F_LLTX (lockless transmit) flag for network devices
has been deprecated and should not be used in new code.
- The functions ktime_sub_us() and ktime_sub_ns() have
been added; they subtract the given number of microseconds or
nanoseconds from a ktime_t value.
- The hard_header() method has been removed from struct
net_device; it has been replaced by a per-protocol
header_ops structure pointer.
- The debugfs filesystem has some new functions
(debugfs_create_x8(), debugfs_create_x16(),
debugfs_create_x32()) which make it easy to export files
containing hexadecimal numbers.
- Various small sysfs-related API changes have been made. The
name field has been removed from the kobject
structure. The prototypes of the user-event callbacks have been
changed. Many of the subsystem-related calls have been removed -
subsystems never really did much of anything anyway;
get_bus() and put_bus() are also gone.
- A new value DMA_MASK_NONE can be stored in the
device structure dma_mask field to indicate that the
device is incapable of performing DMA.
- The VFS has a couple of new address space operations
(write_begin() and write_end()) aimed at fixing some
deadlock scenarios; see this article
for more information.
- The scatterlist chaining
patches have been merged and many parts of the kernel have been
updated to use this feature.
- The CFLAGS= and CPPFLAGS= options now work with the
kernel build system in the expected way: they add flags to be passed
to the C compiler and preprocessor, respectively.
- The prototype for slab constructor callbacks has changed to:
    void (*ctor)(struct kmem_cache *cache, void *object);
The unused flags argument has been removed and the order of the other two arguments has been reversed to match other slab functions.
- The DECLARE_MUTEX_LOCKED() macro has been removed.
- The long-deprecated SA_* interrupt flags have been removed in
favor of the IRQF_* equivalents.
- A number of block layer utilities have seen prototype changes. The
most evident change, perhaps, is bio_endio() and the
associated bio_end_io_t callback:
    void bio_endio(struct bio *bio, int error);
    typedef void (bio_end_io_t) (struct bio *, int);
These functions now always complete the entire BIO, so the size argument has been removed.
- The paravirt_ops structure has been split into several smaller, more
specialized operations vectors. These include pv_init_ops
(boot-time operations), pv_time_ops (for time-related
operations), pv_cpu_ops (privileged instructions),
pv_irq_ops (interrupt handling), pv_mmu_ops (page
table management), and a few others.
- There are some new bit operations which have been added:
    int test_and_set_bit_lock(unsigned long nr, unsigned long *addr);
    void clear_bit_unlock(unsigned long nr, unsigned long *addr);
    void __clear_bit_unlock(unsigned long nr, unsigned long *addr);
These operations are intended to be used in the creation of single-bit locks; they work without the need for any additional memory barriers.
- There is a new KERN_CONT priority level for
printk(). It is, in fact, empty; it is meant to serve as a
marker for printk() calls which continue a previous (not
terminated with a newline) printed line.
- The filesystem export operations, used to make filesystems available
over protocols like NFS, have been reworked. Two new methods
(fh_to_dentry() and fh_to_parent()) replace the old
get_dentry() interface. There is a new structure (struct
fid) used to describe file handles. This work is aimed at making
the export interface easier to use and (eventually) supporting 64-bit
inode numbers.
- The virtio patches - providing an infrastructure for I/O into and out of virtualized guests - have been merged.
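The single-bit lock operations listed for this release pair an acquiring test-and-set with a releasing clear. A rough user-space analogue, using C11 atomics instead of the kernel primitives, might look like the sketch below. The *_sketch names are hypothetical, only one word of bits is handled, and the mapping onto acquire/release memory orderings is an assumption about the intent, not a copy of any architecture's implementation.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>

/*
 * Try to take the lock held in bit nr of *addr.
 * Returns the previous value of the bit: 0 means the lock was acquired,
 * 1 means somebody else already holds it.
 */
int test_and_set_bit_lock_sketch(unsigned long nr, atomic_ulong *addr)
{
    unsigned long mask;

    assert(nr < sizeof(unsigned long) * CHAR_BIT);
    mask = 1UL << nr;

    /* Acquire ordering: accesses in the critical section stay after this. */
    return (atomic_fetch_or_explicit(addr, mask, memory_order_acquire) & mask) != 0;
}

/* Release the lock held in bit nr of *addr. */
void clear_bit_unlock_sketch(unsigned long nr, atomic_ulong *addr)
{
    assert(nr < sizeof(unsigned long) * CHAR_BIT);

    /* Release ordering: accesses in the critical section stay before this. */
    atomic_fetch_and_explicit(addr, ~(1UL << nr), memory_order_release);
}
```

A caller would spin (or back off) while test_and_set_bit_lock_sketch() returns 1, do its work, then call clear_bit_unlock_sketch(); no separate barrier calls appear anywhere in that sequence.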
2.6.23 (October 9, 2007)
- The UIO interface for
the creation of user-space drivers has been merged. While UIO is
aimed at user space, there is a kernel-space component for driver
registration and interrupt handling.
- unregister_chrdev() now returns void.
- There is a new notifier chain which can be used (by calling
register_pm_notifier()) to obtain notification before and
after suspend and hibernate operations.
- The new "lockstat" infrastructure provides statistics on the amount of
time threads spend waiting for and holding locks.
- The new fault() VMA operation replaces nopage() and
populate(). See this article
for a description of the current fault() API.
- The generic netlink API now has the ability to register (and
unregister) multicast groups on the fly.
- The destructor argument has been removed from
kmem_cache_create(), as destructors are no longer supported.
All in-kernel callers have been updated.
- There is a new clone() flag - CLONE_NEWUSER - which
creates a new user namespace for the process; it is intended for use
with container systems.
- There is a new rtnetlink API for managing software network devices.
- The networking core can now work with devices which have more than one
transmit queue. This is a feature which was needed to properly
support some wireless devices.
- The sysfs core has been significantly rewritten to weaken the
connection between sysfs entries and internal kobjects. The new code
should make life easier for driver writers who will have fewer object
lifecycle issues to worry about.
- The never-used enable_wake() PCI driver method has been
removed.
- Drivers wanting to get the revision ID from the PCI config space
should now just use the value found in the new revision
member of the pci_dev structure. All in-tree drivers have
been changed to use this new approach.
- The SCSI layer has picked up a couple of scatter/gather accessor
functions - scsi_dma_map() and scsi_dma_unmap() - in
preparation for chained scatter/gather lists and bidirectional
requests. Most drivers in the kernel have been updated to use these
functions.
- The idr code has a
couple of new helper functions:
idr_for_each() and idr_remove_all().
- sys_ioctl() is no longer exported to modules.
- The page table helper functions ptep_establish(),
ptep_test_and_clear_dirty()
and ptep_clear_flush_dirty() have been removed - they had no
in-kernel users.
- Kernel threads are non-freezable by default; any kernel thread which
should be frozen for a suspend-to-disk operation must now call
set_freezable() to arrange for that to happen.
- The SLUB allocator is now the default.
- The new function is_owner_or_cap(inode) tests for access
permission based on the current fsuid and capabilities; it replaces
the open-coded test previously found in several filesystems.
- There is a new utility function:
    char *kstrndup(const char *s, size_t max, gfp_t gfp);
This function duplicates a string along the lines of the user-space strndup().
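For readers unfamiliar with strndup(), the behavior described for kstrndup() can be approximated in user space as follows. This is only a sketch: kstrndup_sketch() is a hypothetical name, malloc() stands in for the kernel allocator, and the gfp argument is dropped because it has no user-space equivalent.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Duplicate at most max characters of s into freshly allocated memory.
 * The copy is always NUL-terminated; NULL is returned on allocation failure.
 */
char *kstrndup_sketch(const char *s, size_t max)
{
    size_t len = 0;
    char *copy;

    assert(s != NULL);

    /* Length of s, capped at max, without reading past the terminator. */
    while (len < max && s[len] != '\0')
        len++;

    copy = malloc(len + 1);
    if (copy == NULL)
        return NULL;

    memcpy(copy, s, len);
    copy[len] = '\0';
    return copy;
}
```

Calling kstrndup_sketch("hello", 3) yields a new string "hel"; a max larger than the string length simply copies the whole string.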
2.6.22 (July 8, 2007)
- The mac80211 (formerly "Devicescape") wireless stack has been merged,
creating a whole new API for the creation of wireless drivers,
especially those requiring software MAC support.
- The eth_type_trans() function now sets the
skb->dev field, consistent with how similar functions for
other link types operate. As a result, many Ethernet drivers have
been changed to remove the (now) redundant assignment.
- The header fields in the sk_buff structure have been renamed
and are no longer unions. Networking code and drivers can now just
use skb->transport_header,
skb->network_header, and skb->mac_header.
There are new functions for finding specific headers within packets:
tcp_hdr(), udp_hdr(), ipip_hdr(), and
ipipv6_hdr().
- Also in the networking area: the packet scheduler has been reworked to
use ktime values rather than jiffies.
- The i2c layer has seen significant new changes meant to make i2c
drivers look more like drivers for other buses. There are, for
example, new probe() and remove() methods for
notifying devices when i2c peripherals come and go. Since i2c is not
a self-describing bus, the support code still needs help to know where
i2c devices might be; for many classes of device, this information can
be had from the system BIOS.
- The crypto API has a new set of functions for use with asynchronous
block ciphers. There is also a new cryptd kernel thread
which can run any synchronous cipher in an asynchronous mode.
- The subsystem structure has been removed from the Linux
device model; there never really was any need for it. Most code which
was expecting a struct subsystem argument has been changed to
use the relevant kset instead.
- There is a new version of the in-kernel rpcbind (portmapper) client
which supports versions 2-4 of the rpcbind protocol. The portmapper
API has changed as a result.
- Numerous changes to the paravirt_ops methods have been made.
Additionally, paravirt_ops is no longer a GPL-only export.
- There is a new memory function:
    void *krealloc(const void *p, size_t new_size, gfp_t flags);
As one would expect, it changes the size of the allocated memory, moving it if need be.
- The SLUB allocator has
been merged as an experimental (for now) alternative to the slab
code. The SLUB API generally matches slab, but the handling of
zero-length allocations has
changed somewhat.
- A new macro has been added to make the creation of slab caches easier:
    struct kmem_cache *KMEM_CACHE(struct_type, flags);
The result is the creation of a cache holding objects of the given struct_type, named after that type, and with the additional slab flags (if any).
- The SLAB_DEBUG_INITIAL flag has been removed, along with the
associated SLAB_CTOR_VERIFY flag passed to constructors. The
result is a set of changes which ripples through quite a few source
files. The unused SLAB_CTOR_ATOMIC flag is also gone.
- The SuperH architecture has working kgdb support again.
- The ia64 architecture has a new tool which will inject machine check
errors into a running system. Not recommended for production
machines.
- The deferrable timers
patch has been merged. There is also a new macro for initializing
workqueue entries (INIT_DELAYED_WORK_DEFERRABLE()) which
causes the job to be queued in a deferrable manner.
- The old SA_* interrupt flags have not been removed as
originally scheduled, but their use will now generate warnings at
compile time.
- There is a new list_first_entry() macro which, surprisingly,
gets the first entry from a list.
- The atomic64_t and local_t types are now fully
supported on a wider set of architectures.
- Workqueues have been reworked again. There is a new
function:
    void cancel_work_sync(struct work_struct *work);
This function tries to cancel a single workqueue entry, be it on the shared (keventd) or a private workqueue. Meanwhile run_scheduled_work() has been removed.
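The list_first_entry() macro mentioned in this release is a thin wrapper around the usual "embedded list node" idiom. The self-contained sketch below reimplements just enough of that idiom in user space to show what the macro returns. All of the *_sketch names are hypothetical stand-ins, not the kernel's definitions, and, as with the real macro, the result is only meaningful for a non-empty list.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal stand-in for the kernel's doubly-linked list node. */
struct list_head_sketch {
    struct list_head_sketch *next, *prev;
};

/* Recover the enclosing structure from a pointer to one of its members. */
#define container_of_sketch(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* First entry: the structure embedding the node that follows the head. */
#define list_first_entry_sketch(head, type, member) \
    container_of_sketch((head)->next, type, member)

/* An empty list is a head that points at itself. */
void list_init_sketch(struct list_head_sketch *head)
{
    assert(head != NULL);
    head->next = head;
    head->prev = head;
}

/* Link entry in just before head, i.e. at the tail of the list. */
void list_add_tail_sketch(struct list_head_sketch *entry,
                          struct list_head_sketch *head)
{
    assert(entry != NULL && head != NULL);
    entry->prev = head->prev;
    entry->next = head;
    head->prev->next = entry;
    head->prev = entry;
}

/* Example payload type with an embedded list node. */
struct item_sketch {
    int value;
    struct list_head_sketch link;
};
```

After adding two items with list_add_tail_sketch(), list_first_entry_sketch(&head, struct item_sketch, link) evaluates to a pointer to the item that was added first.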
2.6.21 (April 25, 2007)
- Sysfs now supports the concept of "shadow directories" - multiple
versions of a directory with the same name. This feature is to be
used with container applications, allowing each namespace to have
resources (network interfaces, for example) with the same name. To
that end, two new functions have been added:
    int sysfs_make_shadowed_dir(struct kobject *kobj,
                                void *(*follow_link)(struct dentry *, struct nameidata *));
    struct dentry *sysfs_create_shadow_dir(struct kobject *kobj);
sysfs_make_shadowed_dir() takes the existing directory for a kobject and makes it shadowed - capable of having multiple instantiations. The follow_link() method must be able to pick out the right version for any given situation. A call to sysfs_create_shadow_dir() will create a new instantiation for a directory which has been made shadowed.
Note that this feature is likely to change somewhat in 2.6.22.
- Quite a few kobject functions - kobject_init(),
kobject_del(), kobject_unregister(),
kset_register(), kset_unregister(),
subsystem_register(), subsystem_unregister(), and
subsys_create_file() - now return harmlessly if passed a
NULL pointer.
- Many kernel subsystems which once used class_device
structures have been changed to use struct device instead;
this work is toward a long-term goal of getting rid of the class tree
and having a single device tree in sysfs.
- There is a new function:
    int device_schedule_callback(struct device *dev, void (*func)(struct device *));
This function will arrange for func() to be called at some future time in process context. It's meant to enable device attributes to unregister themselves, but one can imagine other applications as well.
- The ALSA system on chip ("ASoC") layer provides extensive support for
the implementation of sound drivers on embedded systems; see the
documentation files packaged with the kernel for details.
- Significant changes have been made to the crypto support interface.
- The device resource
management patches, making a lot of driver code
easier to write, have been merged.
- The DMA memory zone (ZONE_DMA) is now optional and may not be
present in all kernels.
- The local_t type has been made consistent across
architectures and has gained some documentation.
- The nopfn() address space operation can now return
NOPFN_REFAULT to indicate that the faulting instruction
should be re-executed.
- A new function, vm_insert_pfn(), enables the insertion of a
new page into a process's address space by page-frame number.
- A new driver API for general-purpose I/O signals has been added.
- The sysctl code has been heavily reworked, leading to a number of
internal API changes.
- The clockevents and dynamic tick patches have been merged. Most code will not require changes, but kernel developers should be aware of code which depends on jiffies.
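The deferred-call pattern behind device_schedule_callback() in this release (record a function now, run it later from a safe context) can be modeled in a few lines of user-space C. The sketch below is purely illustrative: every name in it is hypothetical, the fixed-size queue and the explicit run_pending_callbacks_sketch() call stand in for whatever the kernel actually uses to reach process context, and callbacks are assumed not to schedule further callbacks.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PENDING_SKETCH 8

/* Hypothetical stand-in for struct device. */
struct device_sketch {
    const char *name;
    int unregistered;
};

struct pending_sketch {
    struct device_sketch *dev;
    void (*func)(struct device_sketch *);
};

static struct pending_sketch pending[MAX_PENDING_SKETCH];
static size_t pending_count;

/* Queue func(dev) for later; returns 0 on success, -1 if the queue is full. */
int schedule_callback_sketch(struct device_sketch *dev,
                             void (*func)(struct device_sketch *))
{
    assert(dev != NULL && func != NULL);

    if (pending_count == MAX_PENDING_SKETCH)
        return -1;

    pending[pending_count].dev = dev;
    pending[pending_count].func = func;
    pending_count++;
    return 0;
}

/* Stand-in for the later, safe context: run everything queued so far. */
size_t run_pending_callbacks_sketch(void)
{
    size_t i, ran = pending_count;

    for (i = 0; i < ran; i++)
        pending[i].func(pending[i].dev);

    pending_count = 0;
    return ran;
}

/* Example callback: something that would be unsafe to do "right now". */
void unregister_self_sketch(struct device_sketch *dev)
{
    dev->unregistered = 1;
}
```

The point of the pattern is visible in the ordering: after schedule_callback_sketch() returns, the device is still registered; the change only happens once run_pending_callbacks_sketch() is invoked.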
2.6.20 (February 4, 2007)
- The workqueue API has seen a
major rework which requires changes in almost any code using
workqueues. In short: there are now two different types of
workqueues, depending on whether the delay feature is to be used or
not. The work function no longer gets an arbitrary data pointer; its
argument, instead, is a pointer to the work_struct structure
describing the job. If you have code which is broken by these
changes, this set of
instructions by David Howells is likely to be helpful.
- Some additional workqueue changes have been merged as well. There is
a new "freezable" workqueue type, indicating a workqueue which can be
safely frozen during the software suspend process. The new function
create_freezeable_workqueue() will create one. Another new
function, run_scheduled_work(), will cause a
previously-scheduled workqueue entry to be run synchronously. Note
that run_scheduled_work() cannot be used with delayed
workqueues.
- Much of the sysfs-related code has been changed to use struct
device in place of struct class_device. The latter
structure will eventually go away as the class and device mechanisms
are merged.
- There is a new function:
    int device_move(struct device *dev, struct device *new_parent);
This function will reparent the given device to new_parent, making the requisite sysfs changes and generating a special KOBJ_MOVE event for user space.
- A number of kernel header files which included other headers no longer
do so. For example, <linux/fs.h> no longer includes
<linux/sched.h>. These changes should speed kernel
build times by getting rid of a large number of unneeded includes, but
might break some out-of-tree modules which do not explicitly include
all the headers they need.
- The internal __alloc_skb() function has a new parameter,
being the number of the NUMA node on which the structure should be
allocated.
- The slab allocator API has been cleaned up somewhat. The old
kmem_cache_t typedef is gone;
struct kmem_cache should be used instead. The various
slab flags (SLAB_ATOMIC, SLAB_KERNEL, ...) were all
just aliases for the equivalent GFP_ flags, so they have been
removed.
- A new boot-time parameter (prof=sleep) causes the kernel to
profile the amount of time spent in uninterruptible sleeps.
- dma_cache_sync() has a new argument: the device
structure for the device doing DMA.
- The paravirt_ops code
has gone in, making it easier for the kernel to support multiple
hypervisors. Anybody wanting to port a hypervisor to this code should
note that it is somewhat volatile and likely to remain that way for
some time.
- The struct path
changes have been merged, with changes rippling through the
filesystem and device driver subsystems. In short, code accessing the
dentry pointer from a struct file pointer, which used to read
file->f_dentry, should now read
file->f_path.dentry. There are defines making the older
style of code work - for now.
- There is now a generic layer for human input devices; the USB HID code
has been switched over to this new layer.
- A new function, round_jiffies(), rounds a jiffies value up to
the next full second (plus a per-CPU offset). Its purpose is to
encourage timeouts to occur together, with the result that the CPU
wakes up less frequently.
- The block "activity function," a callback intended for the implementation of disk activity lights in software, has been removed; nobody was actually using it.
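The workqueue change in this release means a work function receives a pointer to its own work_struct and must find its data from there, typically because the work_struct is embedded in a larger structure. The user-space sketch below models that pattern. It uses hypothetical *_sketch types rather than the kernel's, and run_work_sketch() is merely a stand-in for the workqueue machinery invoking the function.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct work_struct: just the function pointer. */
struct work_struct_sketch {
    void (*func)(struct work_struct_sketch *work);
};

/* Recover the enclosing structure from a pointer to one of its members. */
#define container_of_sketch(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Driver-private data with the work item embedded in it. */
struct my_device_sketch {
    int events_handled;
    struct work_struct_sketch work;
};

/* The work function gets the work item, not an arbitrary data pointer. */
void my_work_handler_sketch(struct work_struct_sketch *work)
{
    struct my_device_sketch *dev =
        container_of_sketch(work, struct my_device_sketch, work);

    dev->events_handled++;
}

/* Stand-in for the workqueue core calling the queued function. */
void run_work_sketch(struct work_struct_sketch *work)
{
    assert(work != NULL && work->func != NULL);
    work->func(work);
}
```

Code that previously passed its device structure as the data pointer would, under this scheme, embed the work item in that structure and recover the device pointer inside the handler as shown.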
2.6.19 (November 29, 2006)
- The prototype for interrupt handler functions has changed. In short, the
regs argument has been removed, since almost nobody used it.
Any interrupt handler which needs the pre-interrupt register state can
use get_irq_regs() to obtain it.
- The latency tracking
infrastructure patch has been merged.
- The readv() and writev() methods in the
file_operations structure have been removed in favor of
aio_readv() and aio_writev() (whose prototypes have been
changed). See this
article for more information.
- The nopfn()
address space operation has been added.
- SRCU - a version of read-copy-update which allows read-side blocking -
has been merged. See this
article by Paul McKenney for lots of details.
- The CHECKSUM_HW value has long been used in the networking
subsystem to support hardware checksumming. That value has been
replaced with CHECKSUM_PARTIAL (intended for outgoing packets
where the job must be completed by the hardware) and
CHECKSUM_COMPLETE (for incoming packets which have been
completely checksummed by the hardware).
- A number of memory management changes have been merged, including tracking of dirty
pages in shared memory mappings, making the DMA32 and HIGHMEM zones
optional, and an architecture-independent mechanism for tracking
memory ranges (and the holes between them).
- The pud_page() and pgd_page() macros now return a
struct page pointer, rather than a kernel virtual address.
Code needing the latter should use pud_page_vaddr() or
pgd_page_vaddr() instead.
- A number of driver core
changes including experimental parallel device probing and some improvements to
the suspend/resume process.
- There is now a notifier chain for out-of-memory situations; the idea
here is to set up functions which might be able to free some memory
when things get very tight.
- The semantics of the kmap() API have been changed a bit: on
architectures with complicated memory coherence issues,
kmap() and kunmap() are expected to manage coherency
for the mapped pages, thus eliminating the need to explicitly flush
pages from cache.
- PCI Express Advanced Error Reporting is now supported in the PCI
layer.
- A number of changes have been made to the inode structure in
an effort to make it smaller.
- Much improved suspend and resume support for the USB layer.
- A new set of functions has been added to allow USB drivers to quickly
check the direction and transfer mode of an endpoint.
- A somewhat reduced version of Wireless Extensions version 21. Most of
the original functionality has been removed with the idea that the
wireless extensions will soon be superseded by something else.
- Vast numbers of annotations enabling the sparse utility to
detect big/little endian errors.
- The flags field of struct request has been split
into two new fields: cmd_type and cmd_flags. The
former contains a value describing the type of request (filesystem
request, sense, power management, etc.) while the latter has the flags
which modify the way the command works (read/write, barriers, etc.).
- The block layer can be disabled entirely at kernel configuration
time; this option can be useful in some embedded situations.
- The kernel now has a generic boolean type, called bool; it
replaces a number of homebrewed boolean types found in various parts
of the kernel.
- There is a new function for allocating a copy of a block of memory:
    void *kmemdup(const void *src, size_t len, gfp_t gfp);
A number of allocate-then-copy code sequences have been updated to use kmemdup() instead.
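The allocate-then-copy sequence that kmemdup() replaces is short, which is why it was so often open-coded. A user-space sketch of the same idea follows; kmemdup_sketch() is a hypothetical name, malloc() stands in for the kernel allocator, and the gfp argument is omitted.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Allocate len bytes and fill them with a copy of src.
 * Returns NULL if the allocation fails; the caller frees the result.
 */
void *kmemdup_sketch(const void *src, size_t len)
{
    void *copy;

    assert(src != NULL);

    copy = malloc(len);
    if (copy != NULL)
        memcpy(copy, src, len);
    return copy;
}
```

A typical use is duplicating a template structure or table: the call returns an independent copy that can be modified without touching the original.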
2.6.18 (September 19, 2006)
- The Video4Linux 2 API has changed: the huge ioctl() method
has been moved into the V4L2 code. Video drivers provide a very long
list of methods specific to the individual ioctl() commands.
See <media/v4l2-dev.h>.
- The generic IRQ layer
has been merged. The SA_* flags to request_irq()
have been renamed; the new prefix is IRQF_. A long series of patches
has converted in-tree drivers over to the new names; the old names
are scheduled for removal in January, 2007.
- 64-bit resources are now
supported. This change affects a number of users of the resource
management API.
- The kernel lock
validator has gone in, along with a number of fixes for potential
deadlocks found by the validator.
- At long last, the devfs subsystem has been removed.
- An API and support for
the Intel I/OAT DMA engine.
- The skb_linearize() function has been reworked, and no longer
has a GFP flags argument. There is also a new
skb_linearize_cow() function which ensures that the resulting
SKB is writable.
- Network drivers should no longer manipulate the xmit_lock
spinlock in the net_device structure; instead, the following
new functions should be used:
    void netif_tx_lock(struct net_device *dev);
    void netif_tx_lock_bh(struct net_device *dev);
    void netif_tx_unlock(struct net_device *dev);
    void netif_tx_unlock_bh(struct net_device *dev);
    int netif_tx_trylock(struct net_device *dev);
- The long-deprecated inter_module API has finally been removed
altogether.
- A new kernel API providing access to the "inotify" functionality has
been added.
- The old scsi_request infrastructure has been removed, since
there are no longer any in-tree drivers which use it.
- The include file <linux/usb_input.h> is now
<linux/usb/input.h>.
- The VFS get_sb() filesystem method has a new prototype:
    int (*get_sb)(struct file_system_type *fstype, int flags,
                  const char *dev_name, void *data, struct vfsmount *mnt);
The mnt parameter is new; it allows the filesystem to receive a pointer to the target mount point structure. The mount point should be associated with the superblock in the get_sb() method with a call to:
    int simple_set_mnt(struct vfsmount *mnt, struct super_block *sb);
The return value of get_sb() has also been changed to an int error status. The various get_sb_*() convenience functions have had the same changes applied. The purpose of all this work is to allow NFS to share superblocks across mount points.
- The statfs() superblock operation has a new prototype:
    int (*statfs)(struct dentry *dentry, struct kstatfs *stats);
The old struct super_block pointer is now a dentry pointer instead.
- Some functions have been added to make it easy for kernel code to
allocate a buffer with vmalloc() and map it into user space.
They are:
    void *vmalloc_user(unsigned long size);
    void *vmalloc_32_user(unsigned long size);
    int remap_vmalloc_range(struct vm_area_struct *vma, void *addr, unsigned long pgoff);
The first two functions are a form of vmalloc() which obtain memory intended to be mapped into user space; among other things, they zero the entire range to avoid leaking data. vmalloc_32_user() allocates low memory only. A call to remap_vmalloc_range() will complete the job; it will refuse, however, to remap memory which has not been allocated with one of the two functions above.
- The read-copy-update API is now accessible only to GPL-licensed
modules. The deprecated function synchronize_kernel() has
also been removed.
- There is a new strstrip() library function which removes
leading and trailing white space from a string.
- A new WARN_ON_ONCE macro will test a condition and complain
if that condition evaluates true - but only once per boot.
- A number of crypto API changes have been merged, the biggest being a
change to most algorithm-specific functions to take a pointer to the
crypto_tfm structure, rather than the old "context" pointer.
This change was necessary to support parameterized algorithms.
- There is a new make target "headers_install". Its purpose is to install a set of kernel headers useful for libraries and user-space tools. A limited set of headers is installed, and those headers are sanitized on their way to the destination directory. It is hoped that distributors will use this mechanism to set up kernel headers for inclusion from user space in the future.
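The strstrip() function added in this release removes white space from both ends of a string. The user-space sketch below shows one way such a function can work in place. strstrip_sketch() is a hypothetical name, and the convention used here (truncate the buffer at the end, return a pointer past the leading white space) is an assumption about the interface, not a quotation of the kernel source.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/*
 * Strip leading and trailing white space from s, modifying it in place.
 * Trailing white space is cut off by writing a NUL terminator; leading
 * white space is skipped by returning a pointer into the same buffer.
 */
char *strstrip_sketch(char *s)
{
    size_t len;

    assert(s != NULL);

    /* Trim the tail by moving the terminator backward. */
    len = strlen(s);
    while (len > 0 && isspace((unsigned char)s[len - 1]))
        len--;
    s[len] = '\0';

    /* Skip the head; the caller must keep the original pointer for freeing. */
    while (*s != '\0' && isspace((unsigned char)*s))
        s++;

    return s;
}
```

Because the returned pointer may differ from the one passed in, a caller that allocated the buffer should free the original pointer, not the stripped one.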
2.6.17 (June 17, 2006)
- Support for the SPARC "Niagara" architecture.
- EXPORT_SYMBOL_GPL_FUTURE()
has been merged.
- The safe notifier patch has been
merged, creating a new API for all notifier users.
- The SLAB_NO_REAP slab cache option, which ostensibly caused
the slab not to be cleaned up when the system is under memory
pressure, has been removed. The kmem_cache_t typedef is also
being phased out in favor of struct kmem_cache.
- The "softmac" 802.11 subsystem has been merged. This code may
eventually be phased out, however, in favor of the Devicescape code.
- There is a new real-time clock subsystem, providing generalized RTC
support and a well-defined driver interface.
- A new utility function has been added:
    int execute_in_process_context(void (*fn)(void *data), void *data,
                                   struct execute_work *work);
This function will arrange for fn() to be called in process context (where it can sleep). Depending on when execute_in_process_context() is called, fn() could be invoked immediately or delayed by way of a work queue.
- The SMP alternatives
patch has been merged.
- A rework of the relayfs API - but the sysfs interface has been left
out for now.
- There is a new tracing mechanism for developers debugging block
subsystem code.
- There is a new internal flag (FMODE_EXEC) used to indicate
that a file has been opened for execution.
- The obsolete MODULE_PARM() macro is gone forevermore.
- A new function, flush_anon_page(), can be used in conjunction
with get_user_pages() to safely perform DMA to anonymous
pages in user space.
- Zero-filled memory can now be allocated from slab caches with
kmem_cache_zalloc(). There is also a new slab debugging
option to produce a /proc/slab_allocators file with detailed
allocation information.
- There are four new ways of creating mempools:
mempool_t *mempool_create_page_pool(int min_nr, int order);
mempool_t *mempool_create_kmalloc_pool(int min_nr, size_t size);
mempool_t *mempool_create_kzalloc_pool(int min_nr, size_t size);
mempool_t *mempool_create_slab_pool(int min_nr, struct kmem_cache *cache);
The first creates a pool which allocates whole pages (the number of which is determined by order), while the second and third create a pool backed by kmalloc() and kzalloc(), respectively. The fourth is a shorthand form of creating slab-backed pools.
- The prototype for hrtimer_forward() has changed:
unsigned long hrtimer_forward(struct hrtimer *timer, ktime_t now, ktime_t interval);
The new now argument is expected to be the current time. This change allows some calls to be optimized. The data field has also been removed from the hrtimer structure.
- A whole set of generic bit operations (find first set, count set bits,
etc.) has been added, helping to unify this code across architectures
and subsystems.
- The inode f_ops pointer - which refers to the
file_operations structure for the open file - has been marked
const. Quite a bit of code, which used to change that
structure, has been changed to compensate. Similar changes have been
made in many filesystems.
"The goal is both to increase correctness (harder to accidentally write to shared datastructures) and reducing the false sharing of cachelines with things that get dirty in .data (while .rodata is nicely read only and thus cache clean)."
- local_t is now a signed type.
- Attributes in sysfs can be
pollable.
- A class_device can now have attribute groups created at
registration time; to take advantage of this capability, store the
desired groups in the new groups field.
- The splice(), vmsplice(), and tee() system calls have been merged. Supporting those calls requires implementing two new file_operations methods. See this article for the final form of the splice_read() and splice_write() functions.
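The "run now or defer" behavior described for execute_in_process_context() in the 2.6.17 list above can be modelled in user space. This is only a sketch: the in_atomic_context flag, struct deferred_work, the one-slot queue, and the 0/1 return convention are all assumptions made for the illustration, and none of the names below are kernel interfaces.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a queued work item. */
struct deferred_work {
    void (*fn)(void *data);
    void *data;
};

static bool in_atomic_context;         /* the caller sets this in the model */
static struct deferred_work *pending;  /* one-slot "work queue", for brevity */

/* Returns 0 if fn() ran immediately, 1 if it was queued for later. */
static int run_in_process_context(void (*fn)(void *data), void *data,
                                  struct deferred_work *work)
{
    assert(fn != NULL && work != NULL);

    if (!in_atomic_context) {
        fn(data);              /* safe to call (and sleep) right away */
        return 0;
    }

    work->fn = fn;             /* otherwise remember the call for later */
    work->data = data;
    pending = work;
    return 1;
}

/* Models the worker thread running the queued item in process context. */
static void run_pending_work(void)
{
    struct deferred_work *work = pending;

    pending = NULL;
    if (work)
        work->fn(work->data);
}

/* Example callback: increments the int that data points to. */
static void bump(void *data)
{
    ++*(int *)data;
}
```

The caller supplies the storage for the deferred item, so the deferral path needs no memory allocation; that is presumably why the real function takes a work structure argument as well.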
2.6.16 (March 19, 2006)
- The mutex code has been
merged. The use of semaphores for mutual exclusion is now deprecated,
and the current semaphore API may go away altogether.
- The high-resolution kernel
timer code has been merged. The new API allows for greater
precision in timer values, though the underlying implementation is
still limited by the timer interrupt resolution.
- A new list function, list_for_each_entry_safe_reverse(), does
just what one would expect.
- An atomic type the size of a long (64 bits on 64-bit systems), atomic_long_t, has been added.
Supported functions are:
- long atomic_long_read(atomic_long_t *l);
- void atomic_long_set(atomic_long_t *l, long i);
- void atomic_long_inc(atomic_long_t *l);
- void atomic_long_dec(atomic_long_t *l);
- void atomic_long_add(long i, atomic_long_t *l);
- void atomic_long_sub(long i, atomic_long_t *l);
- The "SLOB" memory allocator has been merged. SLOB is a drop-in
replacement for the slab allocator, intended for very low-memory
systems.
- The dentry structure has been changed: the d_child
and d_rcu fields are now overlaid in a union. This change
shrinks this heavily-used structure and improves its cache behavior.
- The usb_driver structure has a new field
(no_dynamic_id) which lets a driver disable the addition of
dynamic device IDs. The owner field has also been removed
from this structure.
- The device probe() and remove() methods have been moved
from struct device_driver to struct bus_type. The
bus-level methods will override any remaining driver methods.
- Some significant changes to the SCSI subsystem aimed at eliminating
the use of the old scsi_request structure. The SCSI software
IRQ is no longer used; postprocessing happens via the generic block
software IRQ instead.
- Much of the core device model code has been reeducated to use the term
"uevent" instead of "hotplug." Some changes which are visible outside
of the core code include:
- kobject_hotplug() becomes kobject_uevent()
- struct kset_hotplug_ops becomes struct kset_uevent_ops, and its hotplug() member is now uevent()
- add_hotplug_env_var() becomes add_uevent_var()
- The block I/O barrier code has been rewritten. This
patch changes the barrier API and also adds a new parameter to
end_that_request_last().
- The block_device_operations structure has a new method
getgeo(); its job is to fill in an hd_geometry
structure with information about the drive. With this operation in
place, many block drivers will not need an ioctl() function
at all.
- Linas Vepstas's PCI error
recovery patch has been merged.
- Compilers prior to gcc 3.2 can no longer be used to build
kernels.
- When the kernel is configured to be optimized for size, gcc (if it's version 4.x) is given the freedom to decide whether inline functions should really be inlined. The __always_inline attribute now truly forces inlining in all cases. This is an outcome from the discussion on inline functions held at the beginning of the year.
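For readers who want to experiment with the atomic_long_t operations listed in the 2.6.16 section above, here is a user-space analogue built on C11 <stdatomic.h>. It is a sketch only: the sketch_ prefix marks every name as hypothetical, and the kernel functions are implemented quite differently. The argument order (value first, pointer second for add and sub) follows the prototypes in the list.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical user-space wrapper; the kernel type is atomic_long_t. */
typedef struct {
    atomic_long counter;
} sketch_atomic_long_t;

static long sketch_atomic_long_read(sketch_atomic_long_t *l)
{
    return atomic_load(&l->counter);
}

static void sketch_atomic_long_set(sketch_atomic_long_t *l, long i)
{
    atomic_store(&l->counter, i);
}

static void sketch_atomic_long_inc(sketch_atomic_long_t *l)
{
    atomic_fetch_add(&l->counter, 1);
}

static void sketch_atomic_long_dec(sketch_atomic_long_t *l)
{
    atomic_fetch_sub(&l->counter, 1);
}

static void sketch_atomic_long_add(long i, sketch_atomic_long_t *l)
{
    atomic_fetch_add(&l->counter, i);
}

static void sketch_atomic_long_sub(long i, sketch_atomic_long_t *l)
{
    atomic_fetch_sub(&l->counter, i);
}

/* Usage: 10 + 1 + 5 - 2 - 1 leaves the counter at 13. */
static long sketch_atomic_long_demo(void)
{
    sketch_atomic_long_t v;

    atomic_init(&v.counter, 0);
    sketch_atomic_long_set(&v, 10);
    sketch_atomic_long_inc(&v);
    sketch_atomic_long_add(5, &v);
    sketch_atomic_long_sub(2, &v);
    sketch_atomic_long_dec(&v);
    assert(sketch_atomic_long_read(&v) == 13);
    return sketch_atomic_long_read(&v);
}
```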
2.6.15 (January 2, 2006)
- The nested class device
patch was merged, allowing class_device structures to
have other class_devices as parents. This patch is a hack to
make the input subsystem work with sysfs. This code will change again
in the future; see Greg
Kroah-Hartman's article for more information on what is planned.
- The prototypes for the driver model class "interface" methods
add() and remove() have changed; there is now a new
parameter pointing to the relevant interface structure.
- A new platform_driver structure has been added to describe
drivers for devices built into the core "platform."
- The prototypes for the suspend() and resume()
methods in struct device_driver have changed. They are also
only called once per event, rather than three times as in previous
kernels.
- Two new fields have been added to the device_pm_info which
control how drivers should act on hardware-created wakeup events; see
this article for
details.
- There is a notification mechanism which lets interested modules know
when a USB device is added to (or removed from) the system. This
system is used by some core code; drivers do not normally need to hook
in to it.
- The gfp_t type
is now used throughout the kernel. If you have a function which takes
memory allocation flags, it should probably be using this type.
- Code using reader/writer semaphores can now use
rwsem_is_locked() to test the (read) state of the semaphore
without blocking.
- The new vmalloc_node() function allocates memory on a
specific NUMA node.
- The "reserved" bit for memory pages has, for all practical purposes,
been removed.
- vm_insert_page()
has been added to make it easier for drivers to remap RAM into user
space VMAs.
- There is a new kthread_stop_sem() function which can be used
to stop a kernel thread which might be currently blocked on a specific
semaphore.
- RapidIO bus support has
been merged into the mainline.
- The netlink connector
mechanism makes netlink code easier to write. Independently, a
type-safe netlink interface has been added and is used in parts of the
networking subsystem.
- These kernel symbols have been unexported and are no longer available
to modules: clear_page_dirty_for_io,
console_unblank, cpu_core_id,
hugetlb_total_pages, idle_cpu,
nr_swap_pages, phys_proc_id,
reprogram_timer, swapper_space,
sysctl_overcommit_memory, sysctl_overcommit_ratio,
sysctl_max_map_count, total_swap_pages,
user_get_super, uts_sem, vm_acct_memory,
and vm_committed_space.
- Version 1 of the Video4Linux API is now officially scheduled for
removal in July, 2006.
- The owner field has been removed from the pci_driver
structure.
- A number of SCSI subsystem typedefs (Scsi_Device,
Scsi_Pointer, and Scsi_Host_Template) have been
removed.
- The DMA32 memory zone has been added to the x86-64
architecture; its purpose is to make it easy to allocate memory below
the 4GB barrier (with the new GFP_DMA32 flag).
- A call to rcu_barrier() will block the calling process until all current RCU callbacks have completed.
2.6.14 (October 27, 2005)
- A new PHY abstraction layer has been added for network drivers.
- The sk_buff structure has changed again; the changes will
force a recompile but shouldn't otherwise be a problem.
- Version 19 of the
wireless extensions has been merged. Among other things, this version
deprecates the get_wireless_stats() method in the
net_device structure.
- The klist API has
changed. The order of the parameters has been reversed for
klist_add_head() and klist_add_tail(). It is now
necessary to provide a pair of reference counting functions when
setting up a list with klist_init().
- The relayfs virtual filesystem, which enables high-rate data transfers
between the kernel and user space, has been merged.
- kzalloc() has
been added as a way of obtaining pre-zeroed memory.
- Two new versions of
schedule_timeout() have been added.
- The new TASK_INTERACTIVE state flag tells the scheduler not
to perform the usual accounting on sleeping processes.
- SKB's which are expected to be cloned can be efficiently allocated
with alloc_skb_fclone().
- A few new helper functions for mapping block I/O requests have been
added; see this article
for details.
- Securityfs, a virtual filesystem intended for use with security modules, has been merged.
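The idea behind kzalloc() in the 2.6.14 list above (allocate and zero in a single call) has a simple user-space analogue. The sketch below uses malloc() and memset(); zalloc_sketch() and struct sample are hypothetical names, and the kernel function additionally takes allocation flags, which this model leaves out.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical user-space stand-in for a zeroing allocator. */
static void *zalloc_sketch(size_t size)
{
    void *p = malloc(size);

    if (p)
        memset(p, 0, size);   /* hand back pre-zeroed memory */
    return p;
}

struct sample {
    int id;
    char name[16];
};

/* Returns 1 if a freshly allocated struct sample starts out zeroed. */
static int zalloc_sketch_demo(void)
{
    struct sample *s = zalloc_sketch(sizeof(*s));
    int ok;

    assert(s != NULL);        /* the demo simply gives up on failure */
    ok = (s->id == 0 && s->name[0] == '\0');
    free(s);
    return ok;
}
```

Folding the zeroing into the allocator removes the separate memset() call that is easy to forget, or to pass the wrong size to, at each call site.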
2.6.13 (August 28, 2005)
- The HZ constant is now configurable at kernel build time.
- The timer API now includes try_to_del_timer_sync(), which
makes a best effort to delete the timer; it is safe to call in atomic
context.
- The block_device_operations structure now has an
unlocked_ioctl() member.
- The return value from netif_rx() has changed; it now will
return one of only two values: NETIF_RX_SUCCESS or
NETIF_RX_DROP.
- pci_dma_burst_advice can be used by PCI drivers to learn the
optimal way of bursting DMA transfers.
- The text searching API has been
added.
- A new memory allocation function, kzalloc(), has been added.
- A big set of driver core changes, including the removal of the class_simple interface and new prototypes for the device structure sysfs methods. [FR]
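Since HZ became a build-time choice in 2.6.13 (see the first item in the list above), a delay written as a raw tick count means different things on differently configured kernels. The arithmetic can be sketched as follows; ms_to_ticks() is a hypothetical helper written for this illustration, and rounding up is an assumption of the sketch. Kernel code would normally rely on existing helpers such as msecs_to_jiffies() instead.

```c
#include <assert.h>

/*
 * Hypothetical helper: convert milliseconds to timer ticks for a given
 * tick rate (hz), rounding up so that a delay is never shortened.
 * Overflow for very large ms values is ignored in this sketch.
 */
static unsigned long ms_to_ticks(unsigned long ms, unsigned long hz)
{
    assert(hz > 0);
    return (ms * hz + 999) / 1000;
}
```

With this helper, a 10ms delay is 1 tick at 100 ticks per second, 3 ticks at 250 (2.5 rounded up), and 10 ticks at 1000, which is why a hard-coded tick count cannot be right for every configuration.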
2.6.12 (June 17, 2005)
- cancel_rearming_delayed_work()
was added to the workqueue API.
- The timeout value passed to usb_bulk_msg() and
usb_control_msg() is now expressed in milliseconds instead of
jiffies.
- An interrupt-disabling spinlock is used in the rwsem implementation.
It was never correct to call one of the variants of
down_read() or down_write() with interrupts
disabled, but it is even less correct now.
- The fields in the net_device structure have been rearranged,
which will break binary-only drivers.
- kref_put() now returns an int value: nonzero if the
kref was actually released.
- kobject_add() and kobject_del() no longer generate
hotplug events. If you need these events, you must call
kobject_hotplug() explicitly. The wrapper functions
kobject_register() and kobject_unregister() do still
generate hotplug events.
- kobj_map() no longer takes a subsystem argument; instead, it
needs a pointer to a semaphore which it can use for mutual exclusion.
- A new function, sysfs_chmod_file(), allows permissions to be
changed on existing sysfs attributes.
- There is a new generic
sort() function which should be used in preference to
creating yet another implementation.
- A new attribute (__nocast) is being used with sparse
to disable a number of implicit casts and find probable bugs.
- io_remap_page_range() is now deprecated; use
io_remap_pfn_range() instead.
- A set of functions has
been added to work with big-endian I/O memory.
- synchronize_kernel() is deprecated. Callers
should instead use either synchronize_sched() (to verify that
all processors have quiesced) or synchronize_rcu() (to verify
that all processors have exited RCU critical sections).
- The flag argument to blk_queue_ordered() has changed
to indicate how ordered writes are handled by the device. Possible
values are QUEUE_ORDERED_NONE (ordering is not possible),
QUEUE_ORDERED_TAG (ordering is forced with request tags), and
QUEUE_ORDERED_FLUSH (ordering is done with explicit flush
commands). For the last case, the request queue has two new methods,
prepare_flush_fn() and end_flush_fn(), which are
called before and after a barrier request.
- A new function, valid_signal(), can (and should) be used to
test whether signal numbers from user space are valid.
- The Developer's Certificate of Origin, the document acknowledged by all those "Signed-off-by:" headers, has changed. The new version adds a clause noting that contributions - and the information that goes with them - are public information which can be redistributed.
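The kref_put() change in the 2.6.12 list above (a nonzero return when the object was actually released) can be illustrated with a small user-space model. Everything here is hypothetical: struct ref_model is not struct kref, the counter is a plain int rather than an atomic type, and passing the release callback to the put function is a design choice of this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, non-atomic model of a reference count. */
struct ref_model {
    int count;
};

static void ref_model_init(struct ref_model *r)
{
    r->count = 1;             /* the creator holds the first reference */
}

static void ref_model_get(struct ref_model *r)
{
    r->count++;
}

/*
 * Drops one reference. Returns nonzero only if this call released the
 * object, meaning that release() was invoked.
 */
static int ref_model_put(struct ref_model *r,
                         void (*release)(struct ref_model *r))
{
    assert(r != NULL && release != NULL && r->count > 0);

    if (--r->count == 0) {
        release(r);
        return 1;
    }
    return 0;
}

/* Example release callback: just counts how often it has been called. */
static int released_objects;

static void count_release(struct ref_model *r)
{
    (void)r;
    released_objects++;
}
```

The return value lets the caller know that the object is gone and that any pointer to it must not be used again.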
2.6.11 (March 2, 2005)
- The kernel now performs access
checking for read() and
write() calls before invoking the driver- or
filesystem-specific file_operations method.
- The bcopy() function, unused in the mainline kernel, has been
removed.
- The prototype of the suspend() method in struct
pci_driver has changed; the
state parameter is now of type pm_message_t.
- The rwlock_is_locked() macro has been removed; instead, use
either read_can_lock() or write_can_lock(). There
is also a new spin_can_lock() for regular spinlocks.
- Three new ways of waiting for completions have been added:
wait_for_completion_interruptible(),
wait_for_completion_timeout(), and
wait_for_completion_interruptible_timeout().
- For USB drivers: the usb_device_descriptor and
usb_config_descriptor structures now keep all fields in the
wire (little-endian) form. [GKH]
- pci_set_power_state() and pci_enable_wake()
have new prototypes: power states are represented with the
pci_power_t type rather than an int. [GKH]
- The Big Kernel Semaphore
patch was merged. As a result, code which is protected by
lock_kernel() is now preemptible. This change should not
affect most code developed in this century, but there are always
exceptions.
- The file_operations structure now contains an unlocked_ioctl() member. If that member
is non-NULL, it will be called in preference to the regular
ioctl() method - and the big kernel lock will not be held.
New code should use unlocked_ioctl() and the programmer
should ensure that the proper locking has been performed.
There is also a new compat_ioctl() method which is called, if present, when a 32-bit process calls ioctl() on a 64-bit system.
- Run-time initialization of spinlocks is being converted away from the
assignment form (using SPIN_LOCK_UNLOCKED) to explicit
spin_lock_init() calls. No noises have yet been made about
removing SPIN_LOCK_UNLOCKED, but the writing should be considered
to be on the wall. If and when the real-time preemption patches are merged,
the assignment form may no longer be possible.
- debugfs has been merged;
it is a virtual filesystem intended for use by kernel hackers who want
to export debugging information from their code.
- Binary attributes in sysfs can now offer mmap() support; see
this patch for the details.
- Four-level page tables
have been merged. This change affects surprisingly little code, but,
if you are manually walking through the page table tree, you will have
to take the new level into account.
- Socket buffers can be obtained from alloc_skb_from_cache(),
which uses a slab cache.
- A new memory allocation flag (__GFP_ZERO) was added; it
allows kernel code to request that the allocated memory be zeroed. It
is part of the larger prezeroing patch
which has not, yet, been merged.
- Linus has reimplemented
pipes with a circular buffer construct which
will, eventually, be mutated into a more generic form.
- Work is being done toward the goal of removing the semaphore from struct subsystem. If your code depends on this semaphore, which it shouldn't, expect to have to change it soon.
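The dispatch rule given for unlocked_ioctl() in the 2.6.11 list above (use it when it is non-NULL, otherwise fall back to ioctl() with the big kernel lock held) can be modelled in user space. This is a sketch under stated assumptions: struct fops_model, dispatch_ioctl(), the big_lock_held flag, and the -ENOTTY result for a file with neither method are all inventions of this example, and the handler signatures are simplified.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified model of the two ioctl methods. */
struct fops_model {
    long (*unlocked_ioctl)(unsigned int cmd, unsigned long arg);
    int (*ioctl)(unsigned int cmd, unsigned long arg);
};

static int big_lock_held;     /* models the big kernel lock */

static long dispatch_ioctl(const struct fops_model *ops,
                           unsigned int cmd, unsigned long arg)
{
    long ret;

    assert(ops != NULL);

    if (ops->unlocked_ioctl)
        return ops->unlocked_ioctl(cmd, arg);   /* preferred, no lock */

    if (!ops->ioctl)
        return -ENOTTY;       /* assumption: neither method is present */

    big_lock_held = 1;        /* the old method runs under the lock */
    ret = ops->ioctl(cmd, arg);
    big_lock_held = 0;
    return ret;
}

/* Example handlers: each reports whether the lock was held when it ran. */
static long new_handler(unsigned int cmd, unsigned long arg)
{
    (void)cmd;
    (void)arg;
    return big_lock_held ? 1 : 0;
}

static int old_handler(unsigned int cmd, unsigned long arg)
{
    (void)cmd;
    (void)arg;
    return big_lock_held ? 1 : 0;
}
```

In this model a handler installed as unlocked_ioctl() always sees big_lock_held as 0, so it has to bring its own locking.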
2.6.10 (December 24, 2004)
- Calling pci_enable_device() is required to get interrupt
routing to work. [GKH]
- A new function, pci_dev_present(), can be used to determine
whether a specific device is present or not. [GKH]
- The prototypes to pci_save_state() and
pci_restore_state() have changed: the buffer
argument is no longer needed (the space has been allocated in
struct pci_dev instead). [GKH]
- The kernel build system was tweaked; the preferred name for kernel
makefiles is now Kbuild. The change is meant to highlight
the fact that kernel makefiles are rather different than the
user-space variety, but very few makefiles, if any, have been renamed.
- add_timer_on(), sys_lseek(), and a number of other
kernel functions are no longer exported to modules. Most of the
driver core functions have been changed to GPL-only exports.
- I/O space
write barriers are now supported.
- The prototype of kunmap_atomic() has changed. This change should not
affect properly-written code, but should generate warnings when a
struct page pointer is (erroneously) passed to that
function.
- atomic_inc_return() was added as a way to increment the value
of an atomic_t variable and get the new value.
- The little-used "BIO walking" helper functions
(process_that_request_first()) have been removed.
- The venerable remap_page_range() function has been changed to
remap_pfn_range(); the new function uses a page frame number
for the physical address, rather than the actual address.
remap_page_range() is still supported - for now.
- wake_up_all_sync(), unused in the mainline tree, was
removed.
- A simple, stream-oriented circular buffer
implementation was added.
- The kernel event
mechanism was merged, making it possible to notify user space of
relevant kernel events.
- vfs_permission() was replaced by generic_permission(), which has an optional callback for ACL checking. [MS]
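The move from remap_page_range() to remap_pfn_range() in the 2.6.10 list above replaces a physical address with a page frame number. The relationship between the two is a shift by the page size, sketched below in user space. The SKETCH_ constants assume 4096-byte pages, and phys_to_pfn() and pfn_to_phys() are hypothetical helper names; real code would use the kernel's own PAGE_SHIFT.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12   /* assumption: 4096-byte pages */
#define SKETCH_PAGE_SIZE  (UINT64_C(1) << SKETCH_PAGE_SHIFT)

/* Convert a page-aligned physical address to a page frame number. */
static uint64_t phys_to_pfn(uint64_t phys)
{
    assert((phys & (SKETCH_PAGE_SIZE - 1)) == 0);   /* must be aligned */
    return phys >> SKETCH_PAGE_SHIFT;
}

/* Convert a page frame number back to the address of its first byte. */
static uint64_t pfn_to_phys(uint64_t pfn)
{
    return pfn << SKETCH_PAGE_SHIFT;
}
```

One consequence of working in page frame numbers is that a value of a given width can describe more physical memory than a byte address of the same width.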
2.6.9 (October 18, 2004)
- Kprobes
was merged, making another debugging technique available.
- Spinlocks are implemented completely out of line now. This change
should not affect any code.
- wait_event_timeout() was added.
- Kobjects now use the kref type to handle reference counting.
Most code should be unaffected by this change.
- A new set of functions for accessing I/O
memory was introduced. The new functions are cleaner and
type-safe, and should be used in preference to readb() and
friends. The new ioport_map() function makes it possible to
treat I/O ports as if they were I/O memory.
- The NETIF_F_LLTX feature for
net_devices tells the
networking subsystem that the driver code performs its own locking and
does not require that the xmit_lock be taken before
hard_start_xmit() can be called.
- dma_declare_coherent_memory() was added to allow the DMA
functions to hand out memory located on a specific device.
- msleep_interruptible() was added.
- The prototype of kref_put() changed; a pointer to the release() function is now required.
2.6.8 (August 13, 2004)
- The fcntl() method in the file_operations structure,
just added in 2.6.6, was removed. It has been
replaced by two new methods: check_flags() and
dir_notify().
- nonseekable_open() was
added as a way of indicating that a given file is not seekable.
- wait_event_interruptible_exclusive() was added.
- dma_get_required_mask()
was added as a way for drivers to determine the optimal DMA mask.
- Module section information was added under
/sys/module, making it easier to use symbolic debuggers with
modules.
- The VFS follow_link() method saw some (compatible) changes. Filesystems should use the new symlink lookup method so that the kernel can, eventually, support a greater link depth. [MS]
(We are still in the process of filling in the earlier API changes - stay tuned).
Acknowledgements
Thanks to the following people who have helped keep this page current:
| [FR] | Farzad Raiyat |
| [GKH] | Greg Kroah-Hartman |
| | Michael Hayes |
| [MS] | Miklos Szeredi |
