<?xml version="1.0" encoding="UTF-8"?>

<rdf:RDF 
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns="http://purl.org/rss/1.0/"
  xmlns:dc="http://purl.org/dc/elements/1.1/"
  xmlns:syn="http://purl.org/rss/1.0/modules/syndication/"
>

  <channel rdf:about="http://lwn.net/headlines/270081/">
    <title>LWN: Comments on "KHB: Synthesis: An Efficient Implementation of Fundamental Operating Systems Services"</title>
    <link>http://lwn.net/Articles/270081/</link>
    <description>
This is a special feed containing comments posted
to the individual LWN article titled &quot;KHB: Synthesis: An Efficient Implementation of Fundamental Operating Systems Services&quot;.

    </description>

    <syn:updatePeriod>hourly</syn:updatePeriod>
    <syn:updateFrequency>2</syn:updateFrequency>
    <items>
      <rdf:Seq>
	<rdf:li resource="http://lwn.net/Articles/343504/rss" />
	<rdf:li resource="http://lwn.net/Articles/343457/rss" />
	<rdf:li resource="http://lwn.net/Articles/283672/rss" />
	<rdf:li resource="http://lwn.net/Articles/276445/rss" />
	<rdf:li resource="http://lwn.net/Articles/274353/rss" />
	<rdf:li resource="http://lwn.net/Articles/271265/rss" />
	<rdf:li resource="http://lwn.net/Articles/271249/rss" />
	<rdf:li resource="http://lwn.net/Articles/271157/rss" />
	<rdf:li resource="http://lwn.net/Articles/271112/rss" />
	<rdf:li resource="http://lwn.net/Articles/271108/rss" />
	<rdf:li resource="http://lwn.net/Articles/271067/rss" />
	<rdf:li resource="http://lwn.net/Articles/270899/rss" />
	<rdf:li resource="http://lwn.net/Articles/270618/rss" />
	<rdf:li resource="http://lwn.net/Articles/270557/rss" />
	<rdf:li resource="http://lwn.net/Articles/270485/rss" />
	<rdf:li resource="http://lwn.net/Articles/270467/rss" />
	<rdf:li resource="http://lwn.net/Articles/270439/rss" />
	<rdf:li resource="http://lwn.net/Articles/270436/rss" />
	<rdf:li resource="http://lwn.net/Articles/270427/rss" />
	<rdf:li resource="http://lwn.net/Articles/270416/rss" />
	<rdf:li resource="http://lwn.net/Articles/270399/rss" />
	<rdf:li resource="http://lwn.net/Articles/270394/rss" />
	<rdf:li resource="http://lwn.net/Articles/270370/rss" />
	<rdf:li resource="http://lwn.net/Articles/270299/rss" />
	<rdf:li resource="http://lwn.net/Articles/270297/rss" />
	<rdf:li resource="http://lwn.net/Articles/270276/rss" />
	<rdf:li resource="http://lwn.net/Articles/270274/rss" />
      
      </rdf:Seq>
    </items>

  </channel>
    <item rdf:about="http://lwn.net/Articles/343504/rss">
      <title>KHB: Synthesis: An Efficient Implementation of Fundamental Operating Systems Services</title>
      <link>http://lwn.net/Articles/343504/rss</link>
      <dc:date>2009-07-27T08:27:43+00:00</dc:date>
      <dc:creator>xoddam</dc:creator>
      <description>
      &lt;div class=&quot;FormattedComment&quot;&gt;
Any concurrency primitive for shared-memory multiprocessors on commodity hardware must for performance reasons be implemented at the cache-coherency level, using message passing (rather than having to hit the underlying slow, shared RAM on every synchronous operation) and is therefore several orders of magnitude more complex in silicon than anything that can be done &quot;with 1 bit&quot;.&lt;br&gt;
&lt;p&gt;
See &lt;a href=&quot;http://lwn.net/Articles/252125/&quot;&gt;http://lwn.net/Articles/252125/&lt;/a&gt; (scroll down to section 3.3.4, Multi-processor support).&lt;br&gt;
&lt;p&gt;
CAS and the spicier double-word CAS are obviously 'trickier' in some sense than LL/SC but both must be implemented by grabbing exclusive access over a cache line (maybe two in the case of double-word, I suppose).  The underlying mechanics are the same.&lt;br&gt;
&lt;/div&gt;

      
      </description>
    </item>
    <item rdf:about="http://lwn.net/Articles/343457/rss">
      <title>KHB: Synthesis: An Efficient Implementation of Fundamental Operating Systems Services</title>
      <link>http://lwn.net/Articles/343457/rss</link>
      <dc:date>2009-07-26T16:28:12+00:00</dc:date>
      <dc:creator>asdlfiui788b</dc:creator>
      <description>
      &lt;div class=&quot;FormattedComment&quot;&gt;
Perhaps this is trolling but... I used to think that lock free algorithms based on CAS were a good idea, until I saw how the Alpha handled this.  Load-Linked / Store-Conditional is more natural for hardware and also a damn sight more useful as it prevents the ABA problem (I've played with some quiescence algorithms for memory allocators to get round this but it's still ugly and difficult to implement).  If you're interested in this sort of thing I'd strongly recommend looking up LL/SC and this kind of problem.  While you're at it, look at Hans Boehm's paper on implementing LL/SC with 1 bit and a trivial amount of circuitry; it's so much easier that CAS and pretty much trivial to implement.&lt;br&gt;
&lt;p&gt;
&lt;p&gt;
&lt;/div&gt;

      
      </description>
    </item>
    <item rdf:about="http://lwn.net/Articles/283672/rss">
      <title>re: Synthesis: Operating system of the future?</title>
      <link>http://lwn.net/Articles/283672/rss</link>
      <dc:date>2008-05-24T03:23:03+00:00</dc:date>
      <dc:creator>olecom</dc:creator>
      <description>
      &lt;div class=&quot;FormattedComment&quot;&gt;&lt;pre&gt;
As a matter of fact eight-year developer of PowerPC Linux port Cort Dougan has citations of
Massalin's work in his research and application publications.

Bottom line, as far as i can see, can be read here:

Cort Dougan: Good Programmers are not Lazy. unpublished draft.
&lt;a rel=&quot;nofollow&quot; href=&quot;http://hq.fsmlabs.com/~cort/papers/lazy/lazy.nohead.html&quot;&gt;http://hq.fsmlabs.com/~cort/papers/lazy/lazy.nohead.html&lt;/a&gt;

A must read.

Ouch, fsmlabs and software patents? Microsoft has patent on double-click, so what? Now Novel,
RH and others have all that protecting-patents dance. Thus, sw patents in this case are
irrelevant. About sw freedom and money, please read the article.

&lt;/pre&gt;&lt;/div&gt;

      
      </description>
    </item>
    <item rdf:about="http://lwn.net/Articles/276445/rss">
      <title>Great writeup!</title>
      <link>http://lwn.net/Articles/276445/rss</link>
      <dc:date>2008-04-03T19:50:14+00:00</dc:date>
      <dc:creator>PaulMcKenney</dc:creator>
      <description>
      &lt;div class=&quot;FormattedComment&quot;&gt;&lt;pre&gt;
Let's see, page 139:123...

Fully coherent instruction caches do exist on some systems.  Replicated registers for fast
interrupt handling were present in some 8080-based CPUs back in the late 70s and early 80s,
but have fallen out of favor, though one might get the same effect on modern multithreaded
hardware by reserving one thread to process interrupts on behalf of the other, but not sure
that this would be worthwhile.  Double-compare-and-swap is indeed useful, but doesn't exist on
the systems I have access to, despite its appearing in the 68K line quite some time ago.
Large caches, large cacheline sizes, and very high memory bandwidths have reduced the urgency
of partial-register saving, and again, in some cases you could assign processes to hardware
threads -- but perhaps this will become more compelling in the future.  Some CPUs have vector
operations that do some of the multi-byte operations (e.g., SSE, 3DNow, Altivec, ...), and
perhaps compilers will become more aggressive about implementing them.  Some of the fancier
bitwise operations are indeed useful in some contexts, so who knows.

But I do apologize for my extreme bigotry in favor of hardware features that I can use today
across a wide variety of CPU types.  ;-)

The operating systems I work with -do- have priorities, so I don't get to ignore them.
Perhaps more sophisticated scheduling policies will come into vogue.  There are certainly
people working on this.  In any case, stock Linux running on commodity microprocessors does
context switches orders of magnitude faster than the one-millisecond figure called out on page
142:126 -- old-style Moore's law at work.

Hardware cache-coherence protocols coupled with high software memory contention really can
result in priority-inversion-like effects.  I have seen this, logic analyzer and everything.
Would you care to outline your objections in more detail?
&lt;/pre&gt;&lt;/div&gt;

      
      </description>
    </item>
    <item rdf:about="http://lwn.net/Articles/274353/rss">
      <title>Great writeup!</title>
      <link>http://lwn.net/Articles/274353/rss</link>
      <dc:date>2008-03-20T19:24:02+00:00</dc:date>
      <dc:creator>olecom</dc:creator>
      <description>
      &lt;div class=&quot;FormattedComment&quot;&gt;&lt;pre&gt;
&lt;font class=&quot;QuotedText&quot;&gt;&amp;gt; NBS algorithms rely heavily on hardware arbitration, which is usually&lt;/font&gt;
&lt;font class=&quot;QuotedText&quot;&gt;&amp;gt; unaware of process priorities.&lt;/font&gt;

There are no priorities and scheduler is I/O rate-based, PLL-managed.
On pdf.page 139 (123) find some ideas for hardware and open yor eyes
(pdf.page 142 (126).

&lt;font class=&quot;QuotedText&quot;&gt;&amp;gt; This can result in priority-inversion-like effects when the hardware&lt;/font&gt;
&lt;font class=&quot;QuotedText&quot;&gt;&amp;gt; gives the contended cache line to the low-priority process.&lt;/font&gt;

Oh, read whole dissertation, please.
_____

&lt;/pre&gt;&lt;/div&gt;

      
      </description>
    </item>
    <item rdf:about="http://lwn.net/Articles/271265/rss">
      <title>KHB: Synthesis: An Efficient Implementation of Fundamental Operating Systems Services</title>
      <link>http://lwn.net/Articles/271265/rss</link>
      <dc:date>2008-02-28T20:59:02+00:00</dc:date>
      <dc:creator>nix</dc:creator>
      <description>
      &lt;div class=&quot;FormattedComment&quot;&gt;&lt;pre&gt;
Insane?

Oh.

So, er, I should be ashamed that I did almost exactly that (via a custom 
allocator layered atop big mmap()ed hunks) in C about a year ago? (Well, 
it was more like a twisted sort of RCU: I'd construct a new instance that 
was a copy of the old one with a zero reference count, then changing 
pointers in the new instance's VMT-analogue, translating the object's data 
into a new representation at the same time. The old instance got 
reference-zapped when nobody was executing methods via those pointers 
anymore.)

(yes, there were pressing reasons. It was a bundle of fun, as well. :) )

&lt;/pre&gt;&lt;/div&gt;

      
      </description>
    </item>
    <item rdf:about="http://lwn.net/Articles/271249/rss">
      <title>The Kernel Hacker's Bookshelf: Synthesis: An Efficient Implementation of Fundamental Operating Systems Services</title>
      <link>http://lwn.net/Articles/271249/rss</link>
      <dc:date>2008-02-28T19:11:43+00:00</dc:date>
      <dc:creator>renox</dc:creator>
      <description>
      &lt;div class=&quot;FormattedComment&quot;&gt;&lt;pre&gt;
&lt;font class=&quot;QuotedText&quot;&gt;&amp;gt;&amp;gt; When the callback for queue empty happens, the code to operate on the queue is switched to
use the lock-free synchronization code. When the quaject's queue-not-empty callback is&lt;/font&gt;
invoked, the quajects switch back to the synchronization-free code.
&amp;gt;
&lt;font class=&quot;QuotedText&quot;&gt;&amp;gt;Whoa...  that ran shivers down my spine.  What an outrageously cool idea.&lt;/font&gt;

Uh? I don't think that it's a particulary original idea..
But implementing this efficiently on the other hand, this seems quite hard to do!
I have some difficulties downloading the PDF, so I cannot read the paper yet..
&lt;/pre&gt;&lt;/div&gt;

      
      </description>
    </item>
    <item rdf:about="http://lwn.net/Articles/271157/rss">
      <title>Lock-free operations</title>
      <link>http://lwn.net/Articles/271157/rss</link>
      <dc:date>2008-02-28T11:48:46+00:00</dc:date>
      <dc:creator>forthy</dc:creator>
      <description>
      &lt;p&gt;There are a number of ways to do efficient lock-free operations even 
without resorting to a locked operation in some circumstances. Take e.g. 
the single-writer, single-reader queue with a finite size: All you need 
is a circular buffer and two pointers. The writer only writes to the end 
of the queue and updates only the high pointer after writing. The reader 
reads from the start of the queue and updates the low pointer after 
reading.&lt;/p&gt;

&lt;p&gt;Or allocation/freeing of objects in a multi-processor system: Each 
core frees only those objects it allocated itself. Other cores can 
post &quot;free this object&quot; into a single-writer, single-reader queue, or 
mark objects as free and have them scanned while idle. You can even use a 
linked list if you keep at least one deleted object alive to avoid the 
empty-linked-list queue condition.&lt;/p&gt;

&lt;p&gt;Another point is certainly serialization and deferring actions. One 
thing that comes in mind here is the user-kernel communication. In Linux, 
the user invokes a context switch into the kernel for every system call, 
and to make this context switch light-weight, the system call uses the 
user process, so in fact there are many kernel threads, which will 
require synchronization (in early Linux days, there was the BKL, so the 
kernel was serialized). If you change the syscall interface to a command 
queue (think of the X server), you can probably afford a more 
heavy-weight context switch, and less kernel threads. This however 
significantly changes the interface paradigm to the kernel - Unix-like 
syscalls do not fit well. Even the X server has too many functions which 
require round-trips. The upside of this change is that a network 
transparent kernel interface becomes feasible - and performance in 
clusters would improve significantly.&lt;/p&gt;
      
      </description>
    </item>
    <item rdf:about="http://lwn.net/Articles/271112/rss">
      <title>KHB: Synthesis: An Efficient Implementation of Fundamental Operating Systems Services</title>
      <link>http://lwn.net/Articles/271112/rss</link>
      <dc:date>2008-02-27T22:51:02+00:00</dc:date>
      <dc:creator>pphaneuf</dc:creator>
      <description>
      &lt;p&gt;My way is ugly. That way is &lt;em&gt;insane&lt;/em&gt;.

&lt;p&gt;Still, I wish it was possible to manipulate vptrs and vtbls more in C++. Things like having two different implementations of a virtual method with the same signature inherited from two parents would make sense, for example (say, the &quot;draw&quot; methods you get while deriving from both Drawable and Cowboy).
      
      </description>
    </item>
    <item rdf:about="http://lwn.net/Articles/271108/rss">
      <title>KHB: Synthesis: An Efficient Implementation of Fundamental Operating Systems Services</title>
      <link>http://lwn.net/Articles/271108/rss</link>
      <dc:date>2008-02-27T21:58:34+00:00</dc:date>
      <dc:creator>zlynx</dc:creator>
      <description>
      &lt;div class=&quot;FormattedComment&quot;&gt;&lt;pre&gt;
Your solution is not as ugly as another I've seen in C++ where the programmer used placement
new to reconstruct the object with a different derived type (but the same base type) in-place.

Now, that was scary.
&lt;/pre&gt;&lt;/div&gt;

      
      </description>
    </item>
    <item rdf:about="http://lwn.net/Articles/271067/rss">
      <title>KHB: Synthesis: An Efficient Implementation of Fundamental Operating Systems Services</title>
      <link>http://lwn.net/Articles/271067/rss</link>
      <dc:date>2008-02-27T17:30:08+00:00</dc:date>
      <dc:creator>pphaneuf</dc:creator>
      <description>
      &lt;p&gt;The trick of changing between two algorithms is pretty awesome, and something more or less like that could easily be done with many pieces of code in the kernel. I did similar things in user-space C++ code, by using callbacks for some things, where I basically encoded part of the state in the function pointer itself (in state A, point at do_state_A, in state B, point at do_state_B, something like that), which was kind of ugly in C++, as it basically was hand-coded virtual methods, so that I could change them at runtime (and per object).

&lt;p&gt;But in the kernel, we're &lt;em&gt;already&lt;/em&gt; implementing virtual methods by hand, and on top of it, we don't use per-class virtual method tables, but rather embed them in each objects already, which is &lt;em&gt;exactly&lt;/em&gt; what we need! It's just that they are usually initialized once at object creation, we'd just need to use some atomic CAS to update a function pointer when some algorithm changes.

&lt;p&gt;The trick part is when more than one method needs to change. Damn, I actually expect this would be common, too. Argh.

&lt;p&gt;Oh well, it's pretty cool!
      
      </description>
    </item>
    <item rdf:about="http://lwn.net/Articles/270899/rss">
      <title>Callbacks and quajects</title>
      <link>http://lwn.net/Articles/270899/rss</link>
      <dc:date>2008-02-26T16:55:25+00:00</dc:date>
      <dc:creator>pdc</dc:creator>
      <description>
      &lt;div class=&quot;FormattedComment&quot;&gt;&lt;pre&gt;
I think you would need to come up with some new programming language syntax that encapsulates
the special conventions of quajects and the locking primitives (sort of how Occam encapsulates
the CSP-style synchronization primitives of the Transputer), doing all the bookkeeping and
self-modifying code invisibly for you. Without a higher-level language to write programs in
you presumably are left with hand-edited and fragile assembly-language code...
&lt;/pre&gt;&lt;/div&gt;

      
      </description>
    </item>
    <item rdf:about="http://lwn.net/Articles/270618/rss">
      <title>The Kernel Hacker's Bookshelf: Synthesis: An Efficient Implementation of Fundamental Operating Systems Services</title>
      <link>http://lwn.net/Articles/270618/rss</link>
      <dc:date>2008-02-24T13:42:31+00:00</dc:date>
      <dc:creator>nix</dc:creator>
      <description>
      &lt;div class=&quot;FormattedComment&quot;&gt;&lt;pre&gt;
Ah, but with Synthesis, the kernel can recompile iself! :)
&lt;/pre&gt;&lt;/div&gt;

      
      </description>
    </item>
    <item rdf:about="http://lwn.net/Articles/270557/rss">
      <title>Great writeup!</title>
      <link>http://lwn.net/Articles/270557/rss</link>
      <dc:date>2008-02-22T18:06:59+00:00</dc:date>
      <dc:creator>PaulMcKenney</dc:creator>
      <description>
      &lt;div class=&quot;FormattedComment&quot;&gt;&lt;pre&gt;
Having this write-up, to say nothing of an online copy of Massalin's dissertation, would have
saved me much time back when I was doing my own dissertation.  ;-)  Good to see both now
available!!!

One of the key strengths of Massalin's work is the focus on determining what techniques work
well in given situations, and then matching up techniques with the corresponding situations.
&quot;Use the right tool for the job!&quot;

Although lock-free techniques can be quite valuable for situations requiring real-time
response, as can other non-blocking-synchronization (NBS) techniques, these techniques are not
a panacea.  NBS algorithms rely heavily on hardware arbitration, which is usually unaware of
process priorities.  This can result in priority-inversion-like effects when the hardware
gives the contended cache line to the low-priority process.
&lt;/pre&gt;&lt;/div&gt;

      
      </description>
    </item>
    <item rdf:about="http://lwn.net/Articles/270485/rss">
      <title>The Kernel Hacker's Bookshelf: Synthesis: An Efficient Implementation of Fundamental Operating Systems Services</title>
      <link>http://lwn.net/Articles/270485/rss</link>
      <dc:date>2008-02-22T10:46:20+00:00</dc:date>
      <dc:creator>mingo</dc:creator>
      <description>
      &lt;div class=&quot;FormattedComment&quot;&gt;&lt;pre&gt;
Yes. Especially as today's CPUs move towards annotating cache lines and doing certain
optimizations of the generic functions by observing their usage and annotating specific
instructions. Creating _more_ code goes into the opposite direction.
&lt;/pre&gt;&lt;/div&gt;

      
      </description>
    </item>
    <item rdf:about="http://lwn.net/Articles/270467/rss">
      <title>The Kernel Hacker's Bookshelf: Synthesis: An Efficient Implementation of Fundamental Operating Systems Services</title>
      <link>http://lwn.net/Articles/270467/rss</link>
      <dc:date>2008-02-22T02:30:35+00:00</dc:date>
      <dc:creator>tcoppi</dc:creator>
      <description>
      &lt;div class=&quot;FormattedComment&quot;&gt;&lt;pre&gt;
Thanks for this very insightful writeup, now I have yet another thing to read and become
immersed in instead of doing real work :P
&lt;/pre&gt;&lt;/div&gt;

      
      </description>
    </item>
    <item rdf:about="http://lwn.net/Articles/270439/rss">
      <title>The Kernel Hacker's Bookshelf: Synthesis: An Efficient Implementation of Fundamental Operating Systems Services</title>
      <link>http://lwn.net/Articles/270439/rss</link>
      <dc:date>2008-02-21T21:42:30+00:00</dc:date>
      <dc:creator>vaurora</dc:creator>
      <description>
      &lt;div class=&quot;FormattedComment&quot;&gt;&lt;pre&gt;
&lt;font class=&quot;QuotedText&quot;&gt;&amp;gt; After realizing they were anything but maintainable, I chucked lock-free algorithms on my&lt;/font&gt;
&lt;font class=&quot;QuotedText&quot;&gt;&amp;gt; shelf of stuff that is academically fascinating but unrealistic for the real world.  Should
I&lt;/font&gt;
&lt;font class=&quot;QuotedText&quot;&gt;&amp;gt; take them back down and re-examine them?&lt;/font&gt;

My personal opinion is that lock-free algorithms are not a good generic synchronization
technique, and are definitely very very complex and difficult to understand.  However, in
certain specific cases, lock-free can be simple, elegant, and a huge performance advantage
over traditional approaches.  It's much like RCU - you wouldn't want to use RCU for every
synchronization problem, but when it comes to highly-shared read-mostly data in the hot path
(e.g., the dcache), it's worth the trouble.

It's kind of like my advice on choosing a file system: Use ext3 unless it's not
fast/big/whatever enough for you, in which case use XFS.  My recommendation is use locks
unless they have too much contention/complexity/whatever, in which case look into lock-free.

&lt;font class=&quot;QuotedText&quot;&gt;&amp;gt; &amp;gt; When the callback for queue empty happens, the code to operate on the queue is switched to&lt;/font&gt;
&lt;font class=&quot;QuotedText&quot;&gt;&amp;gt; &amp;gt; use the lock-free synchronization code. When the quaject's queue-not-empty callback is&lt;/font&gt;
&lt;font class=&quot;QuotedText&quot;&gt;&amp;gt; &amp;gt; invoked, the quajects switch back to the synchronization-free code.&lt;/font&gt;
&lt;font class=&quot;QuotedText&quot;&gt;&amp;gt; &lt;/font&gt;
&lt;font class=&quot;QuotedText&quot;&gt;&amp;gt; Whoa...  that ran shivers down my spine.  What an outrageously cool idea.&lt;/font&gt;

I know... The whole damn dissertation is like that.
&lt;/pre&gt;&lt;/div&gt;

      
      </description>
    </item>
    <item rdf:about="http://lwn.net/Articles/270436/rss">
      <title>Advanced reading</title>
      <link>http://lwn.net/Articles/270436/rss</link>
      <dc:date>2008-02-21T21:26:32+00:00</dc:date>
      <dc:creator>vaurora</dc:creator>
      <description>
      For people who want to get into detailed analysis of the applicability of the ideas in Synthesis, a good place to start is &lt;a href=http://valhenson.org/synthesis/SynthesisOS/ch7.html&gt;Section 7.5, &quot;Objections.&quot;&lt;/a&gt; It's organized around the following questions:
&lt;p&gt;

&lt;blockquote&gt;
&lt;ul&gt; 

&lt;li&gt; &quot;How much of the performance improvement is due to my ideas, and how much is due to writing in assembler, and tuning the hell out of the thing?&quot;

&lt;li&gt; &quot;Self-modifying data structures are troublesome on pipelined machines, and code generation has problems with machines that don't allow finegrained control of the instruction cache. In other words, Synthesis techniques are dependent on hardware features that aren't present in all machines, and, worse, are becoming increasingly scarce.&quot;

&lt;li&gt; &quot;Does this matter? Hardware is getting faster, and anything that is slow today will probably be fast enough in two years.&quot; [*snicker* - Ed.]

&lt;li&gt; &quot;Why is Synthesis written in assembler? How much of the reason is that you wanted no extraneous instructions? How much of the reason is that code synthesis requires assembler? How much of Synthesis could be re-written in a high-level language?&quot;

&lt;/ul&gt;
&lt;/blockquote&gt;
      
      </description>
    </item>
    <item rdf:about="http://lwn.net/Articles/270427/rss">
      <title>Code size and cache thrash</title>
      <link>http://lwn.net/Articles/270427/rss</link>
      <dc:date>2008-02-21T21:18:42+00:00</dc:date>
      <dc:creator>vaurora</dc:creator>
      <description>
      The issue of code size and instruction cache usage is a very important one - one I didn't have space to talk about in this article, but which is addressed in the original paper/dissertation/book.  The executive summary is that if you only do inline code generation when it makes sense and share code otherwise, the code size is fine.  In fact, for certain types of functions, the code size is reduced by inlining the code directly instead of setting up all the arguments and calling the shared function code. (gcc can do this statically at compile time, Synthesis can do it at run-time too.)
&lt;p&gt;

For more detail, see &lt;a href=http://valhenson.org/synthesis/SynthesisOS/ch3.html&gt;section 3.4.1&lt;/a&gt;.  I feel like I read another section addressing this but I can't find it right now...
      
      </description>
    </item>
    <item rdf:about="http://lwn.net/Articles/270416/rss">
      <title>The Kernel Hacker's Bookshelf: Synthesis: An Efficient Implementation of Fundamental Operating Systems Services</title>
      <link>http://lwn.net/Articles/270416/rss</link>
      <dc:date>2008-02-21T19:33:53+00:00</dc:date>
      <dc:creator>ikm</dc:creator>
      <description>
      &lt;div class=&quot;FormattedComment&quot;&gt;&lt;pre&gt;
The bubble sort vs. quicksort would be an incorrect analogy. The idea here is to inject
several optimized versions of the same code. My arguing was that one generic version might
perform better than a multitude of specialized copies of the same code.
&lt;/pre&gt;&lt;/div&gt;

      
      </description>
    </item>
    <item rdf:about="http://lwn.net/Articles/270399/rss">
      <title>The Kernel Hacker's Bookshelf: Synthesis: An Efficient Implementation of Fundamental Operating Systems Services</title>
      <link>http://lwn.net/Articles/270399/rss</link>
      <dc:date>2008-02-21T19:10:39+00:00</dc:date>
      <dc:creator>bronson</dc:creator>
      <description>
      &lt;div class=&quot;FormattedComment&quot;&gt;&lt;pre&gt;
More code does not imply slower.  Think bubble sort vs. quicksort.

Besides, it sounds like the recompilations only happen at exceptional times.  I highly doubt
that they result in the sort of cache thrashing that you imply.
&lt;/pre&gt;&lt;/div&gt;

      
      </description>
    </item>
    <item rdf:about="http://lwn.net/Articles/270394/rss">
      <title>The Kernel Hacker's Bookshelf: Synthesis: An Efficient Implementation of Fundamental Operating Systems Services</title>
      <link>http://lwn.net/Articles/270394/rss</link>
      <dc:date>2008-02-21T19:03:45+00:00</dc:date>
      <dc:creator>bronson</dc:creator>
      <description>
      &lt;div class=&quot;FormattedComment&quot;&gt;&lt;pre&gt;
I played around with lock-free algorithms a year ago and came away with a rather dim view of
them.  It's easy to write a provably correct lock-free queue when you assume infinite memory
(something that most papers seem to do).  It's a lot messier when you need to ensure the safe
disposal of list items when you're done with them.

You mention he's using an atomic CAS.  Does he explicitly solve CAS's ABBA problem?  It's
easily solved if you assume infinite memory of course.  If you're on a real-world system, if
you're not careful, you can easily open yourself up to situations where CAS assures you that a
value is unchanged when in reality you lost out a long time ago and the ring has just wrapped.
(yes, it's not hard to solve if you're watching for it, but always having to worry about this
sort of subtlety is what makes working lock-free such a pain)

After realizing they were anything but maintainable, I chucked lock-free algorithms on my
shelf of stuff that is academically fascinating but unrealistic for the real world.  Should I
take them back down and re-examine them?

&lt;font class=&quot;QuotedText&quot;&gt;&amp;gt; When the callback for queue empty happens, the code to operate on the queue is switched to
use the lock-free synchronization code. When the quaject's queue-not-empty callback is
invoked, the quajects switch back to the synchronization-free code.&lt;/font&gt;

Whoa...  that ran shivers down my spine.  What an outrageously cool idea.

&lt;/pre&gt;&lt;/div&gt;

      
      </description>
    </item>
    <item rdf:about="http://lwn.net/Articles/270370/rss">
      <title>The Kernel Hacker's Bookshelf: Synthesis: An Efficient Implementation of Fundamental Operating Systems Services</title>
      <link>http://lwn.net/Articles/270370/rss</link>
      <dc:date>2008-02-21T18:06:59+00:00</dc:date>
      <dc:creator>ikm</dc:creator>
      <description>
      &lt;div class=&quot;FormattedComment&quot;&gt;&lt;pre&gt;
The approaches described here seem to completely ignore the fact that the increased amount of
code would result in increased cache misses, possibly penalizing the program more than
benefiting it. Instead of using the special-case function f(p) for p == 1, it might be faster
to just call the generic one, just because it's in the cache already. Cache is everything
nowadays, so the algorithms should be tailored for cache efficiency. Having more code hardly
helps here.

Furthermore, it seems to be unclear when to optimize and for what data  the time spent
creating new code, the memory consumed to hold it, and, once again, the cache penalties
resulting from having it executed once in a while should all be worth it. The whole approach
feels like it can be simplified to the idea of 'having a compiler inside of the running
program used to recompile its parts in runtime', but somehow I fail to see too many benefits
in that. To my understanding, if one wants to optimize, the better way to do is to think more
about caches/cpus interactions. The &quot;network channels&quot; apporach is one nice example of that.
&lt;/pre&gt;&lt;/div&gt;

      
      </description>
    </item>
    <item rdf:about="http://lwn.net/Articles/270299/rss">
      <title>The Kernel Hacker's Bookshelf: Synthesis: An Efficient Implementation of Fundamental Operating Systems Services</title>
      <link>http://lwn.net/Articles/270299/rss</link>
      <dc:date>2008-02-21T13:54:21+00:00</dc:date>
      <dc:creator>nix</dc:creator>
      <description>
      &lt;div class=&quot;FormattedComment&quot;&gt;&lt;pre&gt;
I'm not sure whether to thank you or curse you for posting this fascinating article (and
linking to the thesis). I've been infected now. I'm not going to be able to write hardly
anything in future without borrowing ideas from this.

I shall have to spread the pain by telling everyone I have ever met about it. :)

&lt;/pre&gt;&lt;/div&gt;

      
      </description>
    </item>
    <item rdf:about="http://lwn.net/Articles/270297/rss">
      <title>Mine too..</title>
      <link>http://lwn.net/Articles/270297/rss</link>
      <dc:date>2008-02-21T13:51:16+00:00</dc:date>
      <dc:creator>nix</dc:creator>
      <description>
      &lt;div class=&quot;FormattedComment&quot;&gt;&lt;pre&gt;
It's definitely more elegant. It's more like the Emacs hooks-everywhere scheme than exception
handling, though.

(I've seen one program that approaches things in the same way as this does with its quaject
callbacks, and that's the ERC IRC client, which has a fairly conventional server core, but all
the actual processing is done by invoking hooks with names constructed from the type of the
server response; said hooks can then send messages back again, or whatever. Of course ERC
doesn't have all the *other* nifty stuff in Synthesis.)

(btw, isn't the cartoon at the end a picture of a quala, not a koala? --- sorry.)

&lt;/pre&gt;&lt;/div&gt;

      
      </description>
    </item>
    <item rdf:about="http://lwn.net/Articles/270276/rss">
      <title>Mine too..</title>
      <link>http://lwn.net/Articles/270276/rss</link>
      <dc:date>2008-02-21T11:39:11+00:00</dc:date>
      <dc:creator>deleteme</dc:creator>
      <description>
      &lt;div class=&quot;FormattedComment&quot;&gt;&lt;pre&gt;
But I like those quaject callbacks I think it's more beautiful than doing exception throwing
and signal passing. But again maybe that's just my ignorance showing.. ;-)
&lt;/pre&gt;&lt;/div&gt;

      
      </description>
    </item>
    <item rdf:about="http://lwn.net/Articles/270274/rss">
      <title>The Kernel Hacker's Bookshelf: Synthesis: An Efficient Implementation of Fundamental Operating Systems Services</title>
      <link>http://lwn.net/Articles/270274/rss</link>
      <dc:date>2008-02-21T11:09:20+00:00</dc:date>
      <dc:creator>leighbb</dc:creator>
      <description>
      &lt;div class=&quot;FormattedComment&quot;&gt;&lt;pre&gt;
Ow, my brain hurts.  I'll stick to the occasional kernel recompile.

&lt;/pre&gt;&lt;/div&gt;

      
      </description>
    </item>
</rdf:RDF>

