Brief items
The current 2.6 development kernel is 2.6.25-rc2,
released by Linus on
February 15. We are in the stabilization period now, so, as one would
expect, most of the changes are fixes. This prepatch also contains some
tweaks to the realtime group scheduling interface and the addition of a
multiple-probes capability to kernel markers. Says Linus: "
I'm
optimistic that this release cycle won't be anywhere near the pain of what
24 was, which is why I'm just going to go off for the long weekend and stay
at the beach." See
the
long-format changelog for the details.
As of this writing, just over 300 patches have gone into the mainline
repository since 2.6.25-rc2. They are mostly fixes, but there's also some
new watchdog drivers, a SMACK security module enhancement, and some fairly
large Video4Linux driver updates.
The current -mm tree is 2.6.25-rc2-mm1. Recent changes
to -mm include ext4 online defragmentation support and read-only bind mount
support.
For older kernels: 2.4.36.1 was released on
February 16; it contains a number of low-priority security fixes.
Comments (none posted)
Kernel development news
I claim that there's just not a single valid case of doing
wide-scale changes atomically and departing from the current
to-be-stabilized kernel tree materially. _Every_ large-scale API
change can be done in a staged way, with each subsystem adopting to
the change at their own pace, it just has to be planned well and
tested well enough and has to be executed persistently.
--
Ingo Molnar
Comments (1 posted)

It is not often that one sees kernel developers in suits, so it is
impossible to resist putting up the following picture. Andrew Morton is attending
(along with your editor) the 2008 Linux Developer Symposium in Beijing,
China. This event, co-sponsored by the Linux Foundation and the Chinese
Open Source Promotion Union, is aimed at bringing Chinese developers more
deeply into the global Linux community. Look for a report in LWN next
week.
Full Story (comments: 15)
By Jake Edge
February 20, 2008
The kernel source level debugger, kgdb, has been around for a long time, but
never in the mainline tree. Linus Torvalds is not much of a fan of
debuggers in general and has always resisted the inclusion of kgdb. That
looks like it might be changing somewhat, with the inclusion of kgdb in
2.6.26 now a distinct possibility.
Over the years, Torvalds has made various pronouncements about debuggers,
particularly kernel debuggers, a long message to
linux-kernel in 2000 seems to outline his objections:
I happen to believe that not having a kernel debugger forces people to
think about their problem on a different level than with a debugger. I
think that without a debugger, you don't get into that mindset where you
know how it behaves, and then you fix it from there. Without a debugger,
you tend to think about problems another way. You want to understand
things on a different _level_.
An attempt to sneak kgdb into the mainline via x86 architecture updates
failed, but Torvalds did open the
door a crack towards merging the kgdb changes: "I won't even
consider pulling it unless it's offered as a separate tree, not mixed up
with other things. At that point I can give a look." That has
spawned the kgdb-light effort, spearheaded by Ingo Molnar.
The original hope to get it
included into 2.6.25 has been dashed, but with Molnar rapidly iterating
to address kernel hacker concerns, the amount of complaints seems to be
decreasing. Molnar is up to version 10 of the
kgdb-light patchset in something like three days since the first was
posted. The various linux-kernel threads show a number of very
hopeful developers waiting with bated breath to see if kgdb can finally
make its way into the mainline.
The light version of kgdb still has most of the capabilities of the
original kgdb and any additional, possibly more intrusive, features can be
added later. Molnar is clearly trying to do things the right way, with a
merge of the non-intrusive kgdb functionality that can eventually be used by multiple
architectures. He points out that there are already gdb remote stubs in
three separate architectures in the mainline, continuing:
So we could have done it the same way, by doing cp kernel/kgdb.c
arch/x86/kernel/gdb-stub.c and merging that. Nobody could have said a
_single_ word - we already have lowlevel UART code in early_printk.c
that we could have reused.
But we wanted to do it _right_ and not add an arch/x86/kernel/gdb-stub.c
special hack.
Discussions about the patches have been mostly to point out problems or
areas that need cleaning up. The philosophical objections have been mostly
avoided, quite possibly because Molnar has been scrupulously trying to make
a "no impact" set of patches:
this kgdb series has _obviously_ zero impact on the kernel,
because it just does not touch any dangerous codepath. From this point
on KGDB can evolve in small, well-controlled baby steps, as all other
kernel code as well.
To that end, the patch changes 22 files (rather than the 41 touched by the
original kgdb submission), removing "_all_ critical path
impact" and the low-level serial drivers—as Molnar points out,
kgdb should not be in the driver business. In addition, the "kgdb over
polled consoles" support has been reworked and cleaned up. Various hacks
to get at module symbols have been removed as a better solution for
that problem is needed. So far, no show stopping problems have been
identified, so it really seems to come down to what Torvalds thinks; for that,
we may have to wait until the 2.6.26 merge window opens in April or May.
Comments (none posted)
February 15, 2008
This article was contributed by B. Rathmann (KoalaBR)
[
Editor's note: the following is the first in a two-part article on the
status of the Nouveau project. This installment is an introductory piece
describing the problem; the second part (to appear in one week) looks at
how Nouveau development is being done and its current status.]
Nouveau is an effort to
create a complete open source driver for NVidia
graphics cards for X.org. It aims to support 2D and 3D acceleration from
the early NV04 cards up to the latest G80 Cards and work across all
supported architectures like x86-64, PPC and x86.
The project originated when Stéphane Marchesin set out to de-obfuscate parts
of the NVidia-maintained nv driver. However, NVidia had corporate policies
in place about the nv driver, and had no plans to change them at the
time. So they refused Stéphane's patches.
This left Stéphane with the greatest open source choice:
"fork it"! At FOSDEM in February 2006, Stéphane unveiled his plans for an
open source driver for NVidia hardware called Nouveau. The name was
suggested by his IRC client's French autoreplace feature which suggested
the word "nouveau" when he typed "nv". People liked it, so the name
stuck. The FOSDEM presentation got the project enough publicity to engage
the curiosity of other developers.
Ben Skeggs was one of the first developers to sign up. He had worked on reverse
engineering the R300 (one of ATI graphics chips) shader components and
writing parts of the R300 driver; as a result, he had great experience with graphics
drivers. He initially showed interest in the NV40 shaders only, but he got
caught in the event horizon and has worked on every aspect of the driver
for NV40 and later cards.
The project engaged other developers with short and long term interest. It
also generated a large amount of interest due to a pledge drive that an
independent user started.
However, the project was mainly developed on IRC and it was quite difficult
for newcomers to get any insight into previous development; reading
IRC logs is unpractical at best. With this in mind, KoalaBR decided to
start summarizing development in a series of articles known as the TiNDC
(The irregular Nouveau Development Companion). This series of articles
proved very useful for attracting developers and testers to the
project. TiNDC issues are published every two to four weeks; as of this
writing, the current issue is TiNDC
#34.
Linux.conf.au 2007 saw the first live demo of Nouveau. Dave Airlie had signed up to
give a talk on the subject; he managed to persuade Ben Skeggs that showing a
working glxgears demo would be a great finish to the talk. Ben toiled furiously
with the other developers to get the init code into shape for his laptop
card and the presentation was a great success.
After missing a Google Summer of Code place, X.org granted Nouveau a
Vacation of Code alternative. This saw Arthur Huillet join the team to
complete proper Xv support on Nouveau. Arthur saw the light and continued
with the project once the VoC ended.
In autumn 2007 Stuart Bennett and Maarten Maathuis vowed to get Nouveau's
RandR1.2 into a better shape. Since then a steady stream of patches has
advanced the code greatly.
The project now has 8 regular contributors (Stéphane Marchesin, Ben Skeggs,
Patrice Mandin, Arthur Huillet, Pekka Paalanen, Maarten Maathuis, Peter
Winters, Jeremy Kolb, Stuart Bennett) with many more part time
contributors, testers, writers and translators.
NVidia card families
This article will use the NVidia GPU technical names as opposed to marketing names.
| GPU name | Product name(s) |
| NV04/05 | Riva TNT, TNT2 |
| NV1x | GeForce 256, GeForce 2, GeForce 4 MX |
| NV2x | GeForce 3, GeForce 4 Ti |
| NV3x | GeForce 5 |
| NV4x(G7x) | GeForce 6, GeForce 7 |
| NV5x(G8x) | GeForce 8 |
Where there are "N" and "G" naming the "N" variant (NV4x, NV5x) will be used.
Further information can be found on the Nouveau site.
Graphic Stack Overview
Before jumping into the Nouveau driver, this section provides a short
background on the mess that is the Linux graphics stack.
This stack has a long history dating back to Unix X
servers and the XFree86 project. This history has lead to a situation quite unlike
the driver situation for any other device on a Linux system. The graphics
drivers existed mainly in user space, provided by the XFree86 project, and
little or no kernel interaction was required. The user-space component known
as the DDX (Device-Dependant X) was responsible for initializing the card,
setting modes and providing acceleration for 2D operations.
The kernel also provided framebuffer drivers on certain systems to allow a
usable console before X started. The interaction between these drivers
and the X.org drivers was very complex and often caused many problems
regarding which driver "owned" the hardware.
The DRI project was started to add support for direct rendering of 3D
applications on Linux. This meant that an application could talk to the 3D
hardware directly, bypassing the X server. OpenGL was the standard 3D API, but
it is a complex interface which is definitely too large to
implement in-kernel. GPUs also provided completely different low-level
interfaces. So, due to the complexity of the higher level interface and
nonstandard nature of the hardware APIs, a kernel component (DRM) and a
userspace driver (DRI) were required to securely expose the hardware interfaces
and provide the OpenGL API.
Shortcomings of the current architecture have been noted over the past few
years; the current belief is that GPU initialization, memory management,
and mode setting need to migrate to the kernel in order to provide better
support for features such as suspend/resume, proper cohabitation of X and
framebuffer driver, kernel error reporting, and future graphics card
technologies.
The GPU memory manager implemented by Tungsten Graphics is known as TTM. It was originally designed as a
general VM memory manager but initially targeted at Intel hardware.
On top of this memory manager, a new modesetting architecture for the
kernel is being implemented. This is based on the RandR 1.2 work found in
the X.org server.
GPU architecture
Graphics cards are programmed in numerous ways, but most initialization and
mode setting is done via memory-mapped IO. This is just a set of registers
accessible to the CPU via its standard memory address space. The registers
in this address space are split up into ranges dealing with various
features of the graphics card such as mode setup, output control, or clock
configuration.
A longer explanation can be found on Wikipedia.
Most recent GPUs also provide some sort of command processing ability where
tasks can be offloaded from the CPU to be executed on the GPU, reducing the
amount of CPU time required to execute graphical operations. This
interface is commonly a FIFO implemented as a circular ring buffer into which
commands are pushed by the CPU for processing by the GPU. It is
located somewhere in a shared memory area (AGP memory, PCIGART, or video
RAM). The GPU will also have a set of state information that is used to
process these commands, usually known as a context.
Most modern GPUs only contain a single command processing state
machine. However NVidia hardware has always contained multiple independent
"channels" which consist of a private FIFO (push buffer), a graphics
context and a number of context objects. The push buffer contains the
commands to be processed by the card. The graphics context stores
application specific data such as matrices, texture unit configuration,
blending setup, shader information etc. Each channel has 8 subchannels to
which graphics objects are bound in order to be addressed by FIFO
commands.
Each NVidia card provides between 16 and 128 channels, depending on model;
these are assigned to different rendering-related tasks. Each 3D client has
an associated channel, while some are reserved for use in the kernel and
the X
server. Channels are context-switched by software via an interrupt (on older
cards) or automatically by the hardware on cards after the NV30.
Now what to store within the FIFO? Each NVidia card offers a set of
objects, each of which provide a set of methods related to a given task,
e.g. DMA memory transfers or rendering. Those methods are the ones used by
the driver (or on a higher level, the rendering application). Whenever a
client connects, it uses an ioctl() to create the channel. After that the
client creates the objects it needs via an additional ioctl().
Currently we do have two types of possible clients: X (via the DDX driver)
and OpenGL via DRI/MESA. An accelerated framebuffer using the new
mode setting architecture (nouveaufb) will also be a future client to avoid
conflicts with nvidiafb.
Let's have a look at a small number of objects:
| object name | Description | Available on |
| NV_IMAGE_BLIT | 2D engine, blit image from
one image into another one |
NV03 NV04 NV10 NV20 |
| NV12_IMAGE_BLIT | An enhanced version of the
above | NV11 NV20 NV20
NV30 NV40 |
| NV_MEMORY_TO_MEMORY_FORMAT | DMA memory transfer
| NV04 NV10 NV20 NV30 NV40 NV50 |
From this list, you can see that there are object types which are
available on all cards (NV_MEMORY_TO_MEMORY_FORMAT) while others are only
available on certain cards. For example, each class of card has its own
3D-engine object, such as NV10TCL on NV1x and NV20TCL on NV2x. An object
is identified by a unique number: its "class". This id is 0x5f for
NV_IMAGE_BLIT, 0x9f for NV12_IMAGE_BLIT and 0x39 for
NV_MEMORY_TO_MEMORY_FORMAT. If you want to use functionality provided by a
given object, you must first bind this object to a subchannel. The card
provides a certain number of subchannels which correspond to a certain
number of "active" (or "bound") objects.
A command in the FIFO is made of a command header, followed by one or more
parameters. The command header usually contains the subchannel number, the
method offset to be called, and the number of parameters (a command header
can also define a jump in the FIFO but this is outside the scope of this
document). Each method the object provides has an offset which has to be set in the
command.
In order to limit the number of command headers to be written, thereby
improving performance, NVidia cards will call several subsequent methods in
a row if you provide several parameters.
How do we refer to an object? The data written to the FIFO doesn't hold any
info about that... Binding an object to a subchannel is done by writing
the object ID as an argument to method number 0. For example: 00044000
5c00000c binds object id 5c00000c to subchannel 2. This object ID is used
as a key in a hash table kept in the card's memory which is filled up when
creating objects.
The creation of an object relies on special memory areas.
RAMIN is "instance memory", an area of memory through which the graphics
engines of the card are configured. A RAMIN area is present on all NVIDIA
chipsets in some form, but it has evolved quite a bit as newer chipsets have
been released. Basically, RAMIN is what contains the objects. An object is
usually not big (128 bytes in general, up to a few kilobytes in case of DMA
transfer objects).
| Card-specific RAMIN areas |
| Pre-NV40
|
Area of dedicated internal memory accessible through the card's MMIO
registers.
|
| NV4x
| A 16MiB PCI resource is used to access PRAMIN. This
resource maps over the last
16MiB of VRAM. The first 1MiB of PRAMIN is also accessible through the (now "legacy")
MMIO PRAMIN aperture.
|
|
NV5x
|
A 32MiB PCI resource, which is unusable in the default power-on state of the card. It
can be configured in a variety of different ways through the NV5x virtual memory.
The legacy MMIO aperture can be re-mapped over any 1MiB of VRAM that's desired.
|
There are also a few specific areas in RAMIN that are worth mentioning:
- RAMFC, the FIFO Context Table. It is a global table that stores the
configuration/state of the FIFO engine for each channel. It doesn't exist
in the same way on NV5x, where the FIFO has registers that contain pointers to each
channel's PFIFO state, rather than a single global table.
- RAMHT, the FIFO hash table. A global table, used by PFIFO to locate context
objects, except on NV5x, where each channel has its own hash table.
Additional information can be found on the Nv object
types and Honza Havlicek
pages on the Nouveau site.
Comments (3 posted)
February 20, 2008
This article was contributed by Valerie Henson
When I was but a wee computer science student at New Mexico Tech, a
graduate student in OS handed me an inch-thick print-out and told me
that if I was really interested in operating systems, I had to read
this. It was something about a completely lock-free operating system
optimized using run-time code generation, written from scratch in
assembly running on a homemade two-CPU SMP with a two-word
compare-and-swap instruction - you know, nothing fancy. The print-out
I was holding was Alexia (formerly Henry) Massalin's PhD thesis, Synthesis: An
Efficient Implementation of Fundamental Operating Systems Services
(html version
here). Dutifully, I read the entire 158 pages. At the end, I
realized that I understood not a word of it, right up to and including
the cartoon of a koala saying "QUA!" at the end. Okay, I exaggerate -
lock-free algorithms had been a hobby of mine for the previous few
months - but the main point I came away with was that there was a lot
of cool stuff in operating systems that I had yet to learn.
Every year or two after that, I'd pick up my now bedraggled copy of
"Synthesis" and reread it, and every time I would understand a little
bit more. First came the lock-free algorithms, then the run-time code
generation, then quajects. The individual techniques were not always
new in and of themselves, but in Synthesis they were developed,
elaborated, and implemented throughout a fully functioning UNIX-style
operating system. I still don't understand all of Synthesis, but I
understand enough now to realize that my grad student friend was
right: anyone really interested in operating systems should read this
thesis.
Run-time code generation
The name "Synthesis" comes from run-time code generation - code
synthesis - used to optimize and re-optimize kernel routines in
response to changing conditions. The concept of optimizing code
during run-time is by now familiar to many programmers in part from
Transmeta's processor-level code optimization, used to lower power
consumption (and many programmers are familiar with Transmeta as the
one-time employer of Linus Torvalds.)
Run-time code generation in Synthesis begins with some level of
compile-time optimization, optimizations that will be efficient
regardless of the run-time environment. The result can thought of as a
template for the final code, with "holes" where the run-time data will
go. The run-time code generation then takes advantage of
data-dependent optimizations. For example, if the code evaluates A *
B, and at run-time we discover that B is always 1, then we can generate
more efficient code that skips the multiplication step and run that
code instead of the original. Fully optimized versions of the code
pre-computed for common data values can be simply swapped in without
any further run-time computation. Another example from the thesis:
[...] Suppose that the compiler knows, either through static
control-flow analysis, or simply by the programmer telling it through
some directives, that the function f(p1, ...) = 4 * p1 +
... will be specialized at run-time for constant p1. The compiler can
deduce that the expression 4 * p1 will reduce to a constant, but it
does not know what particular value that constant will have. It can
capture this knowledge in a custom code generator for f that
computes the value 4 * p1 when p1 becomes known and stores it in the
correct spot in the machine code of the specialized function
f, bypassing the need for analysis at run-time.
Run-time code generation in Synthesis is a fusion of compile-time and
run-time optimizations in which useful code templates are created at
compile time that can later be optimized simply and cleanly at run
time.
Quajects
Understanding run-time code generation is a prerequisite for
understanding quajects, the basic unit out of which the Synthesis
kernel is constructed. Quajects are almost but not quite entirely
unlike objects. Like objects, quajects come in types - queue quaject,
thread quaject, buffer quaject - and encapsulate all the data
associated with the quaject. Unlike objects, which contain pointers
to functions implementing their methods, quajects contain the code
implementing their methods directly. That's right - the actual
executable instructions are stored inside the data structure of the
quaject, with the code nestled up against the data it will operate on.
In cases where the code is too large to fit in the quaject, the code
jumps out to the rest of the method located elsewhere in memory. The
code implementing the methods is created by filling in pre-compiled
templates and can be self-modifying as well.
Quajects interact with other quajects via a direct and simple system
of cross-quaject calls: callentries, callouts, and callbacks. The
user of quaject invokes callentries in the quaject, which implement
that quaject's methods. Usually the callentry returns back to the
caller as normal, but in exceptional situations the quaject will
invoke a method in the caller's quaject - a callback. Callouts are
places where a quaject invokes some other quaject's callentries.
Synthesis implements a basic set of quajects - thread, queue, buffer,
clock, etc. - and builds higher-level structures by combining
lower-level quajects. For example, a UNIX process is constructed out
of a thread quaject, a memory quaject, and some I/O quajects.
As an example, let's look at the queue quaject's interface. A queue
has two callentries, queue_put and queue_get. These
are invoked by another quaject wanting to add or remove entries to and
from the queue. The queue quaject also has four callbacks into the
caller's quaject, queue_full, queue_full-1,
queue_empty, and queue_empty-1. When a caller
invokes the queue_put method and the queue is full, the queue
quaject invokes the queue_full callback in the caller's
quaject. From the thesis:
The idea is: instead of returning a condition code for interpretation
by the invoker, the queue quaject directly calls the appropriate
handling routines supplied by the invoker, speeding execution by
eliminating the interpretation of return status codes.
The queue_full-1 method is executed when a queue has
transitioned from full to not full, queue_empty when the queue doesn't
contain anything, and queue_empty-1 when the queue
transitions from empty to not empty. With these six callentries and
callbacks, a queue is implemented in a generic, extensible, yet
incredibly efficient manner.
Pretty cool stuff, huh? But wait, there's more!
Optimistic lock-free synchronization
Most modern operating systems use a combination of interrupt disabling
and locks to synchronize access to shared data structures and
guarantee single-threaded execution of critical sections in general.
The most popular synchronization primitive in Linux is the spinlock,
implemented with the nearly universal test-and-set-bit atomic
operation. When one thread attempts to acquire the spinlock guarding
some critical section, it busy-waits, repeatedly trying to acquire the
spinlock until it succeeds.
Synchronization based on locks works well enough but it has several
problems: contention, deadlock, and priority inversion. Each of these
problems can be (and is) worked around by following strict rules: keep
the critical section short, always acquire locks in the same order,
and implement various more-or-less complex methods of priority
inheritance. Defining, implementing, and following these rules is
non-trivial and a source of a lot of the pain involved in writing code
for modern operating systems.
To address these problems, Maurice Herlihy proposed a system of
lock-free synchronization using an atomic compare-and-swap
instruction. Compare-and-swap takes the address of a word, the
previous value of the word, and the desired new value of the word. It
swaps the previous and new values of the word if and only if the
previous value is the same as the current value. The bare
compare-and-swap instruction allows atomic updates of single pointers.
To atomically switch between larger data structures, a new copy of the
data structure is created, updated with the changes, and the addresses
of the two data structures swapped. If the compare-and-swap fails
because some other thread has updated the value, the operation is
retried until it succeeds.
Lock-free synchronization eliminates deadlocks, the need for strict
lock ordering rules, and priority inversion (contention on the
compare-and-swap instruction itself is still a concern, but rarely
observed in the wild). The main drawback of Herlihy's algorithms is
that they require a lot of data copying for anything more complex than
swapping two addresses, making the total cost of the operation greater
than the cost of locking algorithms in many cases. Massalin took
advantage of the two-word compare-and-swap instruction available in
the Motorola 68030 and expanded on Herlihy's work to implement
lock-free and copy-free synchronization of queues, stacks, and linked
lists. She then took a novel approach: Rather than choose a general
synchronization technique (like spinlocks) and apply it to arbitrary
data structures and operations, instead build the operating system out
of data structures simple enough to be updated in an efficient
lock-free manner.
Synthesis is actually even cooler than lock-free: Given the system of
quajects, code synthesis, and callbacks, operations on data structures
can be completely synchronization-free in common situations. For
example, a single-producer, single-consumer queue can be updated
concurrently without any kind of synchronization as long as the queue
is non-empty, since each thread operates on only one end of the queue.
When the callback for queue empty happens, the code to operate on the
queue is switched to use the lock-free synchronization code. When the
quaject's queue-not-empty callback is invoked, the quajects switch
back to the synchronization-free code. (This specific algorithm is
not, to my knowledge, described in detail in the thesis, but was
imparted to me some months ago by Dr. Massalin herself at one of those
wild late-night kernel programmer parties, so take my description with
a grain of salt.)
The approach to synchronization in Synthesis is summarized in the
following quote:
- Avoid synchronization whenever possible.
- Encode shared data into one or two machine words.
- Express the operation in terms of one or more fast lock-free data
structure operations.
- Partition the work into two parts: a part that can be done
lock-free, and a part that can be postponed to a time when there can
be no interference.
- Use a server thread to serialize the operation. Communications
with the server happens using concurrent, lock-free queues.
The last two points will sound familiar if you're aware of Paul McKenney's
read-copy-update (RCU) algorithm. In Synthesis, thread structures
to be deleted or removed from the run queue are marked as such, and
then actually deleted or removed by the scheduler thread during normal
traversal of the run queue. In RCU, the reference to a list entry is
removed from the linked list while holding the list lock, but the
removed list entry is not actually freed until it can be guaranteed
that no reader is accessing that entry. In both cases, reads are
synchronization-free, but deletes are separated into two phases, one
that begins the operation in an efficient low-contention manner, and a
second, deferred, synchronization-free phase to complete the
operation. The two techniques are by no means the same, but share a
similar philosophy.
Synthesis: Operating system of the future?
The design principles of Synthesis, while powerful and generic, still
have some major drawbacks. The algorithms are difficult to understand
and implement for regular human beings (or kernel programmers, for
that matter). As Linux has demonstrated, making kernel development
simple enough that a wide variety of people can contribute has some
significant payoffs. Another drawback is that two-word
compare-and-swap is, shall we say, not a common feature of modern
processors. Lock-free synchronization can be achieved without this
instruction, but it is far less efficient. In my opinion, reading
this paper is valuable more for retraining the way your brain thinks
about synchronization than for copying the exact algorithms. This
thesis is especially valuable reading for people interested in
low-latency or real-time response, since one of the explicit goals of
Synthesis is support for real-time sound processing.
Finally, I want to note that Synthesis contains many more elegant
ideas that I couldn't cover in even the most superficial detail -
quaject-based user/kernel interface, per-process exception tables,
scheduling based on I/O rates, etc., etc. And while the exact
implementation details are fascinating, the thesis is also peppered
with delightful koan-like statements about design patterns for
operating systems. Any time you're feeling bored with operating
systems, sit down and read a chapter of this thesis.
[ Valerie Henson is a Linux file
systems consultant and proud recipient of a piggy-back ride from
Dr. Alexia Massalin. ]
Comments (27 posted)
Patches and updates
Kernel trees
Core kernel code
Development tools
Device drivers
Documentation
Filesystems and block I/O
Memory management
Architecture-specific
Security-related
Virtualization and containers
Benchmarks and bugs
Miscellaneous
Page editor: Jake Edge
Next page: Distributions>>