LWN Weekly Edition
Kernel development

Kernel release status

The current 2.6 prepatch is 2.6.24-rc7, released by Linus on January 6. It contains a fair number of fixes and an implementation of /proc/slabinfo for the SLUB allocator (which was discussed in last week's Kernel Page). About the long release cycle, he says "I'll be charitable and claim it's because it's all stabilizing, and not because we've all been in a drunken stupor over the holidays." The short-form changelog can be found in the release announcement; see the long-format changelog for all the details.

The mainline git repository contains, as of this writing, a few dozen post-rc7 patches.

The current stable 2.6 kernel is 2.6.23.13, released on January 9. This update is only of interest to people using the w83627ehf hardware monitoring driver, but they should be very interested: "I have had a private report that this bug might have caused permanent hardware damage. There is no definitive proof at this point, but unfortunately due to the lack of documentation I really can't rule it out."

For older kernels: 2.6.16.58-rc1 was released on January 6 with about a dozen fixes, a few of which are security-related.
Kernel development news

Quotes of the week

What guarantees that it doesn't happen before we get to callback? AFAICS, nothing whatsoever...

And if it does happen, we'll get rdev happily freed (by rdev_free(), as ->release() of &rdev->kobj) by the time we get to delayed_delete(). Which explains what's going on just fine.

-- Al Viro shows how to debug kernel problems

I consider the fact that I can spend full-time working on Linux to be a blessing. But if you don't feel that way, my condolences, and please do what you need to do so you can stay in your happy place.

-- Ted Ts'o shows how to respond with class to trolls
2.6.24 - some statistics

As of this writing, the 2.6.24 kernel is getting close to a release - though there is likely to be one more -rc version to look at first. The rate of change has slowed significantly, though, and the final regressions are being chased down. So it seems like a suitable time to look at the patches which went into this kernel and where they came from.

This is, in many ways, a record-breaking development cycle. Over 10,000 individual changesets have been merged this time around, with a net growth of almost 300,000 lines of code. 950 developers contributed this code; of those, 358 contributed just one patch. By comparison, the previous cycle (2.6.23) merged some 6200 patches from about 860 developers. Given that, it's not surprising that the 2.6.24 cycle has been a little longer than some of its predecessors.

Without further ado, here is the list of top contributors to this kernel:

By either method of counting, Thomas Gleixner comes out at the top of the list by virtue of his work on the i386/x86_64 architecture merger. Bringing those architectures together and making the result work well was a huge job; this effort will continue into future development cycles. (For the curious, simply renamed files were not counted as "changed lines" in the generation of these numbers.) Note that many of these patches also carry a signoff by Ingo Molnar, but git only stores the name of a single "author" for a changeset.

Other contributors of large numbers of changesets in 2.6.24 include Bartlomiej Zolnierkiewicz (lots of IDE driver patches), Adrian Bunk (cleanups all over the kernel tree), Ralf Baechle (MIPS architecture work), Pavel Emelyanov (mostly network and PID namespaces), Tejun Heo (serial ATA and a number of sysfs cleanups), Johannes Berg (wireless networking), and Al Viro (mostly annotation patches and related fixes).

If one looks at the number of changed lines, the list of developers changes almost completely: Zhu Yi (iwlwifi driver), Auke Kok (e1000 driver), Michael Buesch (wireless networking and the b43 driver), Ivo van Doorn (rt2x00 wireless driver), Matthew Wilcox (SCSI, especially advansys and sym53c8xx drivers), Adrian Bunk (cleanups and code deletions), Larry Finger (mainly addition of the b43 legacy driver), and David Miller (networking and SPARC64).

If one assigns developers' contributions to employers and totals the results, the following numbers emerge (note that these tables have been updated since initial publication to fix an error):
In many ways, these lists look similar to those posted for past kernels. But there are a few things which jump out this time around:
All told, some 130 distinct employers were identified for the contributors to 2.6.24. That is a lot of companies to be working on one body of code.

Looking at the Signed-off-by headers of patches is always interesting; if one removes the signoffs added by the authors themselves, what is left is a list of the gatekeepers - those who channel the code into the mainline. The people who signed off on the most patches which they did not write are:

There are not a lot of changes here from previous development cycles. While quite a few developers add signoffs to code and pass it on, they work for a relatively small number of companies - 7 employers account for 70% of the non-author signoffs.

Finally, given that we are starting a new year, it is worth taking a quick look back at the entirety of 2007. In 2007, Linus merged just over 30,000 changesets (more than 80 per day, every day) from 1900 developers working for (at least) 200 companies. All told, they changed over 2 million lines of code, growing the kernel by more than 750,000 lines. The kernel developers are, in other words, touching over 5,000 lines of code every day - that is a high rate of change. The top contributors over the course of the year (by changesets) were:

It should be noted that the employer numbers are more approximate than usual. Some developers changed employers in 2007, but LWN, as a matter of policy, does not maintain a database of developers and their employers over time. Still, the picture is relatively constant - the same companies continue to contribute approximately the same percentage of the patches going into the kernel over relatively long periods of time.

Overall, the picture that results from all these numbers is one of a widespread and healthy development community. There appears to be no shortage of jobs for kernel developers, but also room for those who work outside of the office. The kernel truly is a common resource, with literally thousands of people working to improve it. And it shows no signs of slowing down anytime soon.

Your editor would like to profusely thank Greg Kroah-Hartman for his help in improving these statistics.
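The per-day rates quoted for 2007 follow directly from the yearly totals. The short sketch below redoes that arithmetic; the constants are the rounded lower bounds given in the text ("just over 30,000", "over 2 million", "more than 750,000"), so the results are lower bounds as well. The helper names are made up for this illustration.

```c
#include <stdio.h>

/* Yearly totals as quoted in the text; all are rounded lower bounds. */
#define CHANGESETS_2007  30000L    /* changesets merged in 2007 */
#define LINES_CHANGED    2000000L  /* lines of code changed */
#define LINES_GROWTH     750000L   /* net growth of the kernel in lines */
#define DAYS_IN_2007     365L      /* 2007 was not a leap year */

/* Average per calendar day, rounded down to a whole number. */
static long per_day(long yearly_total)
{
    return yearly_total / DAYS_IN_2007;
}

/* Print the daily averages implied by the yearly totals. */
static void print_daily_rates(void)
{
    printf("changesets per day:    %ld\n", per_day(CHANGESETS_2007));
    printf("lines changed per day: %ld\n", per_day(LINES_CHANGED));
    printf("lines added per day:   %ld\n", per_day(LINES_GROWTH));
}
```

Calling print_daily_rates() reports 82 changesets and 5479 changed lines per day, consistent with the "more than 80" and "over 5,000" figures above, and a net growth of 2054 lines per day.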
The Linux trace toolkit's next generation

Instrumenting a running kernel for debugging or profiling is on the wish list of many administrators and developers. Advocates of OpenSolaris like to point to DTrace as a feature that Linux lacks, though SystemTap has started to close that gap. The Linux Trace Toolkit next generation (LTTng) takes a different approach and was recently submitted for inclusion in the kernel (in two patches: arch independent and arch dependent).

LTTng relies upon kernel markers to provide static probe points for its kernel tracing activities. It also provides the ability to trace userspace programs and combine that data with kernel tracing data to give a detailed view of the internals of the system. Unlike other tools, LTTng takes a post-processing approach, storing the data away as efficiently as possible for later analysis. This is in contrast to SystemTap and DTrace, which have their own mini-languages that specify what to do as each trace point is reached.

One of the major design goals of LTTng is to have as little impact on the system as possible, not only when it is actually tracing events, but also when it is disabled. Kernel hackers are quite resistant to debugging solutions that add any significant performance penalty when not in use. In addition, any significant delays while enabled may change the system timing such that the bug or condition being studied does not occur. For this reason, LTTng does not take the path that various dynamic tracing solutions have used and avoids the expense of a breakpoint interrupt by using the static markers.

Another major design goal is to provide monotonically increasing timestamp values for events. The original LTT uses timestamps derived from the kernel Network Time Protocol (NTP) time, which can fluctuate somewhat as adjustments are made – sometimes going backward. LTTng uses a timestamp derived from the hardware clocks that will work on various processor architectures and clock speeds. In addition, the timestamps can be correlated between different processors in a multi-processor system.

As LTTng gathers its data, it uses relayfs to get the data to a userspace daemon (lttd) that writes the data to disk. The daemon is started from the lttctl command-line tool, which controls the tracing settings in the kernel via a netlink socket. A user wishing to investigate tracing could use lttctl to start and stop a trace; once the trace is complete, the data could be viewed and analyzed.

The LTT viewer (LTTV) is the program that is used to analyze the data gathered. It provides both GUI and text-based viewers to interpret the binary data generated by LTTng and present it to the user. Multi-gigabyte files of tracing data are not uncommon when using LTTng, so a tool like LTTV is indispensable for visualization and filtering to allow the user to focus on the events of interest. LTTV has a plugin mechanism that allows users to develop their own display and analysis tools, while using the LTTV framework and filtering capabilities.

An advantage of using static probe points – though some may see it as a disadvantage – is that they can be maintained with the kernel code they are targeting. If the kernel markers patch is merged, subsystems can add probe points at places they find interesting or useful, and those markers will be carried along in the kernel source and updated as the kernel changes. Other solutions rely on matching an external list of probes with the version of the running kernel, which can result in mismatches and incorrect traces. Also, SystemTap will be able to use any markers that get added to the kernel as is, so users who want the abilities that it provides will also benefit.

LTTng is being developed at the École Polytechnique de Montréal with support from quite a few Linux companies. It has the looks of a very well thought out framework that builds upon the tracing work that has been done before. It certainly won't make it into 2.6.24, but it would seem to have a good chance of making it into a future mainline kernel.
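The design goals described above (static probe points that cost almost nothing when disabled, compact binary records kept for later analysis, and timestamps that never go backward) can be illustrated with a small userspace sketch. This is not the kernel markers or LTTng API: every name below (TRACE_POINT, trace_record, trace_buf, handle_request, and so on) is hypothetical, the buffer is an ordinary in-memory array standing in for the relayfs channel, and POSIX clock_gettime(CLOCK_MONOTONIC) stands in for the hardware-derived clock.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L  /* for clock_gettime() in strict -std= modes */
#endif
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* One fixed-size binary record; interpretation is left to post-processing. */
struct trace_event {
    uint64_t ts_ns;  /* monotonic timestamp in nanoseconds */
    uint32_t id;     /* which probe point fired */
    uint32_t arg;    /* one small payload word */
};

#define TRACE_BUF_EVENTS 1024u

static struct trace_event trace_buf[TRACE_BUF_EVENTS];
static size_t trace_head;   /* total number of events recorded so far */
static int trace_enabled;   /* 0: each probe costs only a test and a branch */

/* Read a clock that is not affected by NTP or wall-clock adjustments. */
static uint64_t trace_clock_ns(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * UINT64_C(1000000000) + (uint64_t)ts.tv_nsec;
}

/* Slow path: store one record in the ring buffer, with no formatting. */
static void trace_record(uint32_t id, uint32_t arg)
{
    struct trace_event *ev = &trace_buf[trace_head % TRACE_BUF_EVENTS];

    ev->ts_ns = trace_clock_ns();
    ev->id = id;
    ev->arg = arg;
    trace_head++;
}

/* Static probe point, compiled into the code path it instruments. */
#define TRACE_POINT(id, arg)                \
    do {                                    \
        if (trace_enabled)                  \
            trace_record((id), (arg));      \
    } while (0)

/* Hypothetical instrumented function with an entry and an exit probe. */
static int handle_request(int n)
{
    int result;

    TRACE_POINT(1u, (uint32_t)n);
    result = n * 2;
    TRACE_POINT(2u, (uint32_t)result);
    return result;
}

/* Post-processing check: timestamps of retained events never decrease. */
static int trace_is_monotonic(void)
{
    size_t start = trace_head > TRACE_BUF_EVENTS ?
                   trace_head - TRACE_BUF_EVENTS : 0;
    size_t i;

    for (i = start + 1; i < trace_head; i++) {
        if (trace_buf[i % TRACE_BUF_EVENTS].ts_ns <
            trace_buf[(i - 1) % TRACE_BUF_EVENTS].ts_ns)
            return 0;
    }
    return 1;
}
```

With trace_enabled left at 0, handle_request() records nothing; setting it to 1 makes each call append two records. A real tracer would additionally need per-CPU buffers and protection against concurrent writers, which this sketch leaves out.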
RCU part 3: the RCU API

[Editor's note: this is the third and final installment in Paul McKenney's "What is RCU?" series. The first and second parts remain available for those who might have missed them. Many thanks to Paul for letting LWN run these articles.]

Introduction

Read-copy update (RCU) is a synchronization mechanism that was added to the Linux kernel in October of 2002. RCU is most frequently described as a replacement for reader-writer locking, but has also been used in a number of other ways. RCU is notable in that RCU readers do not directly synchronize with RCU updaters, which makes RCU read paths extremely fast, and also permits RCU readers to accomplish useful work even when running concurrently with RCU updaters. This leads to the question "what exactly is RCU?", a question that this document addresses from the viewpoint of the Linux kernel's RCU API.

These sections are followed by a references section and the answers to the Quick Quizzes.

RCU has a Family of Wait-to-Finish APIs

The most straightforward answer to "what is RCU" is that RCU is an API used in the Linux kernel, as summarized by the pair of tables in this section (the first table shows the wait-for-RCU-readers portions of the API, while the second table shows the publish/subscribe portions of the API). Or, more precisely, RCU is a family of APIs as shown in the first table, with each column corresponding to a member of the RCU API family.

If you are new to RCU, you might consider focusing on just one of the columns in the following table. For example, if you are primarily interested in understanding how RCU is used in the Linux kernel, "RCU Classic" would be the place to start, as it is used most frequently. On the other hand, if you want to understand RCU for its own sake, "SRCU" has the simplest API. You can always come back for the other columns later.

If you are already familiar with RCU, the following pair of tables can serve as a useful reference.
Quick Quiz 1: Why are some of the cells in the above table colored green?

The "RCU Classic" column corresponds to the original RCU implementation, in which RCU read-side critical sections are delimited by

In the "RCU BH" column,

Quick Quiz 2: What happens if you mix and match? For example, suppose you use

In the "RCU Sched" column, anything that disables preemption acts as an RCU read-side critical section, and

Quick Quiz 3: What happens if you mix and match RCU Classic and RCU Sched?

The "Realtime RCU" column has the same API as does RCU Classic, the only difference being that RCU read-side critical sections may be preempted and may block while acquiring spinlocks. The design of Realtime RCU is described in the LWN article "The design of preemptible read-copy-update".

Quick Quiz 4: What happens if you mix and match Realtime RCU and RCU Classic?

The "SRCU" column displays a specialized RCU API that permits general sleeping in RCU read-side critical sections, as was described in the LWN article "Sleepable RCU". Of course, use of

The "QRCU" column presents an RCU implementation with the same API structure as SRCU, but optimized for extremely low-latency grace periods in absence of readers, as described in the LWN article "Using Promela and Spin to verify parallel algorithms". As with SRCU, use of

Quick Quiz 5: Why do both SRCU and QRCU lack asynchronous

Quick Quiz 6: Under what conditions can

The Linux kernel currently has a surprising number of RCU APIs and implementations. There is some hope of reducing this number, evidenced by the fact that a given build of the Linux kernel currently has at most three implementations behind four APIs (given that RCU Classic and Realtime RCU share the same API). However, careful inspection and analysis will be required, just as would be required for one of the many locking APIs.

RCU has Publish-Subscribe and Version-Maintenance APIs

Fortunately, the RCU publish-subscribe and version-maintenance primitives shown in the following table apply to all of the variants of RCU discussed above. This commonality can in some cases allow more code to be shared, which certainly reduces the API proliferation that would otherwise occur.
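As a rough illustration of the publish-subscribe idea at the pointer level, the sketch below uses C11 atomics in userspace. It is an analogy under stated assumptions, not the kernel's implementation: in the kernel the corresponding roles are played by rcu_assign_pointer() and rcu_dereference(), while here a release-ordered exchange and an acquire load stand in for them. The structure and function names (struct config, global_config, publish_config, read_sum) are hypothetical, and the sketch makes no attempt to wait for readers before an old version is freed.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical example structure; not taken from the kernel. */
struct config {
    int a;
    int b;
};

/* The shared pointer; NULL until a version has been published. */
static _Atomic(struct config *) global_config;

/*
 * Publish a new version. All fields are written before the pointer is
 * exchanged with release semantics, so a reader that observes the new
 * pointer also observes the initialized fields. The previous version
 * (possibly NULL) is handed back through *old; it may be freed only once
 * no reader can still be using it, which this sketch does not track.
 */
static int publish_config(int a, int b, struct config **old)
{
    struct config *p = malloc(sizeof(*p));

    if (p == NULL)
        return -1;
    p->a = a;
    p->b = b;
    *old = atomic_exchange_explicit(&global_config, p, memory_order_acq_rel);
    return 0;
}

/*
 * Subscribe: fetch the pointer exactly once, then use only the local copy.
 * memory_order_acquire is stronger than the dependency ordering a real
 * RCU reader needs, but it is simple and portable.
 */
static int read_sum(int *sum)
{
    struct config *p = atomic_load_explicit(&global_config,
                                            memory_order_acquire);

    if (p == NULL)
        return -1;
    *sum = p->a + p->b;
    return 0;
}
```

A caller would publish with publish_config(1, 2, &old) and read with read_sum(&sum). Fetching the pointer once matters: a reader that reloaded global_config for each field could mix fields from two different versions.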
The first pair of categories operate on Linux

Quick Quiz 7: Why doesn't

The second pair of categories operate on Linux's

The final pair of categories operate directly on pointers, and are useful for creating RCU-protected non-list data structures, such as RCU-protected arrays and trees. The

Quick Quiz 8: Normally, any pointer subject to

Quick Quiz 9: Are there any downsides to the fact that these traversal and update primitives can be used with any of the RCU API family members?

So, What is RCU Really?

At its core, RCU is nothing more nor less than an API that supports publication and subscription for insertions, waiting for all RCU readers to complete, and maintenance of multiple versions. That said, it is possible to build higher-level constructs on top of RCU, including the reader-writer-locking, reference-counting, and existence-guarantee constructs listed in the companion article. Furthermore, I have no doubt that the Linux community will continue to find interesting new uses for RCU, just as they do for any of a number of synchronization primitives throughout the kernel.

Finally, a complete view of RCU would also include all of the things you can do with these APIs.

Acknowledgements

We are all indebted to Andy Whitcroft, Jon Walpole, and Gautham Shenoy, whose review of an early draft of this document greatly improved it. I owe thanks to the members of the Relativistic Programming project and to members of PNW TEC for many valuable discussions. I am grateful to Dan Frye for his support of this effort.

This work represents the view of the author and does not necessarily represent the view of IBM.

Linux is a registered trademark of Linus Torvalds.

Other company, product, and service names may be trademarks or service marks of others.

References

This section gives a short annotated bibliography describing using RCU, Linux-kernel RCU implementations, background, and historical perspectives. For more information, see Paul E. McKenney's RCU Page.

Using RCU
Linux-Kernel RCU Implementations
Background
Historical Perspectives on RCU and Related Mechanisms
Answers to Quick Quizzes

Quick Quiz 1: Why are some of the cells in the above table colored green?

Answer: The green API members (

Quick Quiz 2: What happens if you mix and match? For example, suppose you use

Answer: If there happened to be no RCU read-side critical sections delimited by

This vulnerability disappears in -rt kernels, where RCU Classic and RCU BH both map onto a common implementation.

Quick Quiz 3: What happens if you mix and match RCU Classic and RCU Sched?

Answer: In a non-PREEMPT or a PREEMPT kernel, mixing these two works "by accident" because in those kernel builds, RCU Classic and RCU Sched map to the same implementation. However, this mixture is fatal in PREEMPT_RT builds using the -rt patchset, due to the fact that Realtime RCU's read-side critical sections can be preempted, which would permit

In fact, the split between RCU Classic and RCU Sched was inspired by the need for preemptible RCU read-side critical sections.

Quick Quiz 4: What happens if you mix and match Realtime RCU and RCU Classic?

Answer: That would be up to you, because you would have to code up changes to the kernel to make such mixing possible. Currently, any kernel running with RCU Classic cannot access Realtime RCU and vice versa.

Quick Quiz 5: Why do both SRCU and QRCU lack asynchronous

Answer: Given an asynchronous interface, a single task could register an arbitrarily large number of SRCU or QRCU callbacks, thereby consuming an arbitrarily large quantity of memory. In contrast, given the current synchronous

Quick Quiz 6: Under what conditions can

Answer: In principle, you can use

    idx = srcu_read_lock(&ssa);
    synchronize_srcu(&ssb);
    srcu_read_unlock(&ssa, idx);

    /* . . . */

    idx = srcu_read_lock(&ssb);
    synchronize_srcu(&ssa);
    srcu_read_unlock(&ssb, idx);

Quick Quiz 7: Why doesn't

Answer: Poisoning the

Quick Quiz 8: Normally, any pointer subject to

Answer: One such exception is when a multi-element linked data structure is initialized as a unit while inaccessible to other CPUs, and then a single

However, unless this initialization code is on an impressively hot code-path, it is probably wise to use

Quick Quiz 9: Are there any downsides to the fact that these traversal and update primitives can be used with any of the RCU API family members?

Answer: It can sometimes be difficult for automated code checkers such as "sparse" (or indeed for human beings) to work out which type of RCU read-side critical section a given RCU traversal primitive corresponds to. For example, consider the following:

    rcu_read_lock();
    preempt_disable();
    p = rcu_dereference(global_pointer);

    /* . . . */

    preempt_enable();
    rcu_read_unlock();

Is the
Page editor: Jonathan Corbet
Copyright © 2008, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds