The second day of the summit began with a two-hour panel of microprocessor
architects, who used the opportunity to talk about what is coming and to
communicate their thoughts to the developer community. The panelists were:
- Ravi Arimilli, IBM PowerPC
- Richard Brunner, AMD Opteron
- Ken Shoemaker, Intel IA-64
- Michael Fetterman, Intel IA-32
Each of the panelists took a bit of time to talk to the group about where
his particular processor line is going, and what it means for operating
system kernels.
First up was Michael Fetterman. After talking about upcoming x86
processors, he made the point that, in the near future, most or all
processors will use hyperthreading (or "simultaneous multithreading")
technology. SMP will not be just for big systems anymore. As processors
pick up hyperthreading, and multiple processors are put onto the same chip,
schedulers will have to become more "interference aware." Different
processors will share different resources with each other, and schedulers
will want to make decisions which make the best use of the shared resources
while keeping processes from interfering with each other. It is likely to
be an interesting challenge.
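To make the idea of an "interference aware" scheduler a little more
concrete, here is a minimal sketch in C of one possible placement policy:
prefer an idle hardware thread on a core whose other threads are also idle,
and fall back to an idle thread on a busy core only when nothing better is
available. The structure logical_cpu, its fields, and the function
pick_cpu() are all invented for illustration; this is an assumption about
what such a policy could look like, not a description of any real scheduler.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of one logical CPU (hardware thread). */
struct logical_cpu {
    int core_id;  /* logical CPUs with the same core_id share resources */
    int busy;     /* nonzero if a task is currently running here */
};

/* Return 1 if another logical CPU on the same core as cpus[i] is busy. */
static int sibling_busy(const struct logical_cpu *cpus, size_t n, size_t i)
{
    for (size_t j = 0; j < n; j++)
        if (j != i && cpus[j].core_id == cpus[i].core_id && cpus[j].busy)
            return 1;
    return 0;
}

/*
 * Pick a logical CPU for a new task.  An idle thread on a fully idle core
 * is returned immediately; otherwise the first idle thread found on a
 * partly busy core is used.  Returns -1 if every logical CPU is busy.
 */
int pick_cpu(const struct logical_cpu *cpus, size_t n)
{
    int fallback = -1;

    assert(cpus != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (cpus[i].busy)
            continue;
        if (!sibling_busy(cpus, n, i))
            return (int)i;      /* whole core is idle: no interference */
        if (fallback < 0)
            fallback = (int)i;  /* idle, but shares a core with a busy thread */
    }
    return fallback;
}
```

With two cores of two threads each, and only the first thread of core 0
busy, pick_cpu() skips the idle sibling on core 0 and chooses a thread on
core 1 instead.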
Another thing to be aware of is the increasing use of variable performance
technology. It is normal for modern processors to slow themselves down in
response to overheating or, in the laptop case, as a way of extending
battery life. It is no longer true that all clock cycles are the same.
Some processor features, said Michael, are not currently used to their
fullest potential. One of those features is page attributes. Modern
IA-32 processors offer finer control over page behavior via these
attributes, but even Michael said that it's "not clear" whether they are
really useful. Performance counters were
also mentioned; in the future, perhaps the processor performance counters
could be used for scheduling decisions.
Future IA-32 processors will also have a pair of synchronization
instructions which could be useful. The "monitor" and "mwait" instructions
halt the current thread until a given region of memory is modified. These
instructions are intended to solve a problem that comes up with spinlocks
on a hyperthreaded system: one thread may be spinning on a lock and
blocking another thread which is trying to release that lock. These
instructions might be able to provide spinlock functionality without the
need to actually spin.
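The problem being described can be seen in a conventional test-and-set
spinlock, sketched below in portable C11. The sketch does not issue the
monitor or mwait instructions themselves; instead, the busy-wait is isolated
in a helper, wait_until_changed(), which marks the spot where a "sleep until
this memory is written" primitive could be substituted. The names spinlock,
spin_lock(), spin_unlock(), and wait_until_changed() are made up for this
example and are not the kernel's own implementation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical lock type: 0 means free, 1 means held. */
struct spinlock {
    atomic_int locked;
};

/*
 * Wait until *addr no longer holds the value 'seen'.  Here this is plain
 * polling, which keeps the hardware thread busy.  This loop is the part
 * that a monitor/mwait-style primitive is meant to replace.
 */
static void wait_until_changed(atomic_int *addr, int seen)
{
    while (atomic_load_explicit(addr, memory_order_relaxed) == seen)
        ;  /* spin */
}

void spin_lock(struct spinlock *l)
{
    assert(l != NULL);
    /* atomic_exchange returns the old value; 0 means we took the lock. */
    while (atomic_exchange_explicit(&l->locked, 1, memory_order_acquire) != 0)
        wait_until_changed(&l->locked, 1);
}

void spin_unlock(struct spinlock *l)
{
    assert(l != NULL);
    atomic_store_explicit(&l->locked, 0, memory_order_release);
}
```

On a hyperthreaded processor, the polling loop in wait_until_changed()
competes for execution resources with the sibling thread, which may be the
very thread that holds the lock.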
The 4G/4G patch was brought up again with regard to the cost of flushing
the translation buffer on x86 (and other) processors. Michael said that
future processors might include a process-aware TLB, so that context
switches no longer would require a TLB flush. That is not a near-future
development, however. (Richard Brunner noted that the AMD Opteron
architecture has this feature now.)
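As a rough illustration of what "process-aware" could mean, the following C
sketch models a tiny TLB whose entries are tagged with an address-space
identifier. Because a lookup must match both the tag and the virtual page
number, entries belonging to different processes can coexist, and nothing
needs to be flushed when switching between them. The structure layout, the
eight-entry size, the round-robin replacement, and all of the names here
are assumptions made for the example; they do not describe any particular
processor.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TLB_ENTRIES 8  /* arbitrary size chosen for the example */

struct tlb_entry {
    int      valid;
    uint32_t asid;  /* hypothetical address-space (process) tag */
    uint64_t vpn;   /* virtual page number */
    uint64_t pfn;   /* physical frame number */
};

struct tlb {
    struct tlb_entry e[TLB_ENTRIES];
    size_t next;    /* next slot to overwrite (round-robin) */
};

/* Install a translation for the given address space. */
void tlb_insert(struct tlb *t, uint32_t asid, uint64_t vpn, uint64_t pfn)
{
    assert(t != NULL);
    struct tlb_entry *s = &t->e[t->next];

    t->next = (t->next + 1) % TLB_ENTRIES;
    s->valid = 1;
    s->asid = asid;
    s->vpn = vpn;
    s->pfn = pfn;
}

/* Return 1 and fill *pfn on a hit; return 0 on a miss. */
int tlb_lookup(const struct tlb *t, uint32_t asid, uint64_t vpn, uint64_t *pfn)
{
    assert(t != NULL && pfn != NULL);
    for (size_t i = 0; i < TLB_ENTRIES; i++) {
        const struct tlb_entry *s = &t->e[i];

        if (s->valid && s->asid == asid && s->vpn == vpn) {
            *pfn = s->pfn;
            return 1;
        }
    }
    return 0;
}
```

Two processes can map the same virtual page to different frames; each sees
its own translation simply by presenting its own tag, with no flush in
between.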
Next at the podium was Mr. Brunner, talking about the AMD64 architecture.
He expressed his gratitude toward Linux, which was the first operating
system to support this processor. His talk noted that the integration
process - where functionality of (once) peripheral devices is brought onto
the processor die - would continue.
Richard noted that a fair amount of effort goes into working around buggy
BIOS code which fails to set up the (undocumented) model-specific registers
(MSRs) properly. His request: rather than expend that energy, please
complain to AMD and give them a chance to apply a cluestick to the BIOS
vendor. He also promised, to general applause, that AMD would document all
of its MSR bits in the near future.
He complained about the lack of a standard driver versioning scheme in the
kernel. When a vendor complains about a particular device not working, AMD
has no way of knowing which driver version they are working with. Rusty Russell
noted that creating a MODULE_VERSION() macro would be easy, and
that he would do it. It is harder, however, to create a standard form for
version numbers, to force developers to use it, and to ensure that it gets
updated when the driver is patched.
Richard also made the point that digital rights management schemes cannot
be ignored - they are coming. What needs to be done is to come up with a
way that people wanting DRM can have it without messing things up for
everybody else.
The biggest problem, however (according to Richard), is that "Taiwan,
Inc. doesn't get it." There is still no end to compatibility problems with
motherboards, BIOSes, devices, etc. He suggested that, at the next summit,
the processor architect panel should be replaced with a panel of the
biggest motherboard vendors, who could be encouraged to be a little more
Linux-friendly.
Ken Shoemaker's IA-64 talk pointed out that the number of threads per
socket will increase greatly as hyperthreading increases and more
processors are put onto each die. Might it be possible, he asked, that
some of those threads could be put to more creative uses than simply
running more user processes? He didn't really have any suggestions for
what those uses could be, however.
Ken also suggested using the performance counters for run-time
optimization, and talked about variable performance issues. Itanium
processors, too, will throttle themselves if they get too hot. Power
management is not just for laptops anymore. Finally, Ken said that Linux
should be making better use of large pages. Memory sizes are growing
faster than translation lookaside buffer sizes; without better large page use,
performance will suffer.
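The arithmetic behind that concern is simple: the amount of memory a TLB
can map at once (sometimes called its "reach") is the number of entries
multiplied by the page size. The helper below is a small sketch of that
calculation; the function name tlb_reach() and the 128-entry figure used in
the note afterward are illustrative assumptions, not numbers given by the
panel.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Bytes of memory that can be mapped without a TLB miss, given the
 * number of TLB entries and the size of each page in bytes.
 */
uint64_t tlb_reach(uint64_t entries, uint64_t page_size)
{
    /* Guard against overflow of the 64-bit product. */
    assert(page_size == 0 || entries <= UINT64_MAX / page_size);
    return entries * page_size;
}
```

Assuming a hypothetical 128-entry TLB, tlb_reach(128, 4096) gives 512KB
with 4KB pages, while tlb_reach(128, 4194304) gives 512MB with 4MB pages.
If memory grows while the entry count stays roughly fixed, only larger
pages keep the mapped fraction from shrinking.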
Ravi Arimilli's talk was mostly concerned with the great features of the
Power4 and (upcoming) Power5 processors; he had little in the way of
suggestions for the Linux developers.
Linus brought up the problem of getting different types of processors into
the hands of developers and users. He suggested that the vendors might
want to put a bit more attention into producing low-end systems that users
will actually want to buy. Jon 'maddog' Hall said that the various
processor architectures would also benefit from paying more attention to
gcc. The architects responded uniformly with complaints about how
difficult it is to work with the gcc team. They all understand their
interest in having gcc work well with their processors, but actually
getting patches into the gcc code base is difficult.
Making greater use of the performance counters came up again, with the
kernel developers pointing out that these counters tend to be hard to work
with. They are often poorly documented, different for every processor
model, and highly complex. Things may get better with some counters, but
there seems to be a certain resistance to standardizing the counters and
turning them into "architecture" features. Once something becomes part of
the architecture, it cannot be changed again, and must be carried forward
indefinitely. This seemed to be an especially sore point for Michael
Fetterman, the IA-32 designer.