Brendan Gregg currently works as the lead performance engineer at Joyent, a cloud computing provider that (among other things) maintains the Solaris-derived operating system SmartOS. SmartOS is a relative newcomer, but Gregg has a long history with performance on other Solaris systems, too. For his SCALE 12x keynote, Gregg discussed what the Linux and Solaris camps can learn from one another—including both positive and negative lessons—where measuring performance is concerned.
The basic question, he began, is that of trying to understand why OSes differ; why does one application perform differently when running on a Linux-based system than on an Illumos-based one? Illumos, he explained for the unfamiliar, is the open-source derivative of OpenSolaris. SmartOS uses the Illumos kernel, and Joyent offers both SmartOS and Linux-based images to customers. So in some ways his talk was a kernel-to-kernel comparison, but in many cases, other pieces of the system architecture had a bigger impact on performance, and he distinguished between the two categories of differences.
Gregg began his comparison by showing a one-line Perl script that looped 100,000,000 times, setting a string variable on each iteration. One might think that such a simple program would not perform significantly differently on two different Unix-like OSes, he said, but in fact one OS was 14% faster than the other (which one, he did not give away, though he joked "if this was your system, wouldn't you want to know?"). But such a one-liner is actually pretty complex to analyze for performance, he said. Between a Linux system and a SmartOS system, there could be different versions of Perl, different compilers used to build Perl, different optimization options used by the compilers, different system libraries, and different background tasks—any combination of which might cause the performance difference. It could also be the kernel: setting the string involves memory I/O; the kernel determines memory placement of the code; the kernel can control CPU clock speed; the kernel could be affected by handling interrupts; the kernel could even cost time by migrating the process to different CPUs.
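The article does not reproduce Gregg's Perl one-liner, so the following is only a rough Python analogue of that kind of micro-benchmark. The function names, the reduced iteration count, the sample timings, and the definition of "percent faster" (the ratio of run times, minus one) are all assumptions made for illustration.

```python
import time


def time_string_loop(iterations):
    """Time a loop that does nothing but assign a string on each iteration."""
    start = time.perf_counter()
    for _ in range(iterations):
        s = "a"  # the only work: set a string variable
    return time.perf_counter() - start


def percent_faster(fast_seconds, slow_seconds):
    """How much faster the quicker run is, relative to its own run time."""
    return (slow_seconds / fast_seconds - 1.0) * 100.0


if __name__ == "__main__":
    # Far fewer iterations than the 100,000,000 used in the talk.
    elapsed = time_string_loop(1_000_000)
    print(f"loop took {elapsed:.3f}s on this system")
    # Hypothetical timings from two systems, chosen to show a 14% gap.
    print(f"{percent_faster(10.0, 11.4):.1f}% faster")
```

Even a loop this small depends on the interpreter build, the libraries, and the kernel's scheduling and memory placement, which is why a single number from it says little about where the difference comes from.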
The question that customers want answered when trying to correct such a performance difference is where the root of the 14% difference lies. As a performance engineer, he wants to see if he can even trace the difference to its root (which is not always observable), determine if it actually is a difference between the kernels, and then determine if it is fixable. These questions are not easy to answer, he said—in a lot of ways, being asked to compare Linux and SmartOS is like being asked to compare the US and Australia (where Gregg is from): there are so many differences and similarities (small and large) that it is hard to enumerate them all in a meaningful way.
He can list some of the big differences that could impact performance, however. Linux usually has more up-to-date packages, it gets considerably more testing through its larger community, there are more (and better) device drivers available, and it is far more configurable. At a technical level, he cited Linux's read-copy-update (RCU), futexes, and the dynamic tick mechanism as important. Some of SmartOS's positives include its mature Zones virtualization system, the ZFS filesystem, the DTrace tracing framework, the wide array of symbols that are exposed by default for profiling tools, excellent CPU scalability, and the fact that the Solaris kernel is regularly tested and analyzed on "large, large multi-core" machines. In addition, he said, there are many small differences in tunables and features, although these differences (and their impact) tend to change frequently.
Gregg then cautioned the audience that, as he entered into the "what A can learn from B" portion of the talk, the content might not be suitable for those suffering from Not Invented Here syndrome or who are easily trolled.
What Solaris can learn from Linux distributions included the non-technical differences already touched on, such as the well-stocked and frequently updated package repositories. Solaris is often unfairly blamed for SmartOS performance problems that end up getting traced back to old MySQL or OpenSSL packages, he said.
But discussing the technical lessons occupied most of the session. He cited Linux's likely()/unlikely() macros, which are turned into branch-prediction hints for the compiler. Solaris has no equivalent so far and, he noted, if anyone is building the Solaris kernel with profile feedback (which could be even better), he has never encountered it. Linux's tickless kernel also improves performance, he said: Solaris still has a clock() routine, and he occasionally encounters performance problems involving a 10ms latency (the default clock() interval, corresponding to 100Hz) that becomes a 1ms latency if he changes the clock() frequency to 1000Hz.
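The relationship between tick frequency and latency can be sketched with a little arithmetic. This is a deliberately simplified model and not a description of how the Solaris clock() routine is implemented: it assumes that a timed wake-up can only be serviced on a tick boundary, and the function names are hypothetical.

```python
import math


def tick_period_ms(hz):
    """Length of one clock tick in milliseconds."""
    return 1000.0 / hz


def rounded_wakeup_ms(requested_ms, hz):
    """Delay actually seen if a wake-up is rounded up to the next tick.

    Assumption: timers are only checked once per tick, so any request
    shorter than one tick still waits for a full tick boundary.
    """
    period = tick_period_ms(hz)
    return math.ceil(requested_ms / period) * period


if __name__ == "__main__":
    for hz in (100, 1000):
        print(hz, "Hz:", tick_period_ms(hz), "ms per tick;",
              "a 1 ms request waits", rounded_wakeup_ms(1, hz), "ms")
```

Under this model a 1ms request waits 10ms at 100Hz but only 1ms at 1000Hz, which matches the tenfold change Gregg described. A tickless kernel avoids the periodic tick when idle rather than simply raising its frequency.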
Solaris also swaps entire processes, a feature that has been in the code since the PDP-11/20 days. Back then, swapping processes made sense, when the maximum process size was 64KB. Support for paging was added later, he said, and perhaps it is time to drop process swapping entirely. On the flip side, Solaris has a virtual memory limit, while Linux allows more memory to be allocated than can be stored, relying on the out-of-memory (OOM) killer to free up space. Solaris engineers cannot imagine implementing such a feature, he said—it may be fine for Linux running on phones, the argument goes, but not for servers. It is also a cautionary tale for Solaris, he noted, because a lot of new code does not check for ENOMEM on Linux.
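On Linux, the overcommit policy is exposed in /proc/sys/vm/overcommit_memory, where 0 selects heuristic overcommit, 1 always overcommits, and 2 enforces strict accounting. The sketch below reads that setting where it exists and shows the kind of out-of-memory check that the paragraph says new code often skips. The helper names are hypothetical, and the file is absent on non-Linux systems, in which case the reader function returns None.

```python
import errno

# Values documented for the Linux vm.overcommit_memory sysctl.
OVERCOMMIT_MODES = {
    0: "heuristic overcommit (the default)",
    1: "always overcommit",
    2: "strict accounting, no overcommit",
}


def describe_overcommit(mode):
    """Translate the numeric policy into a short description."""
    return OVERCOMMIT_MODES.get(mode, "unknown mode")


def read_overcommit_mode(path="/proc/sys/vm/overcommit_memory"):
    """Return the current policy as an int, or None if it cannot be read."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


def is_out_of_memory(exc):
    """True if an exception reports ENOMEM or is Python's MemoryError."""
    if isinstance(exc, MemoryError):
        return True
    return isinstance(exc, OSError) and exc.errno == errno.ENOMEM


if __name__ == "__main__":
    mode = read_overcommit_mode()
    if mode is None:
        print("overcommit policy not available on this system")
    else:
        print("overcommit policy:", describe_overcommit(mode))
```

With overcommit enabled, an allocation may appear to succeed and the failure only surfaces later through the OOM killer, so a check like is_out_of_memory() is most useful on systems that enforce a limit up front.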
An interesting case is Linux's SLUB allocator. It is a simplified version of Solaris's slab allocator, he said, and its improvements seem good enough that Solaris should consider merging it back in. Solaris also lacks a "lazy" translation lookaside buffer (TLB) mode, which on Linux gives noticeable performance improvements over regular TLB mode. Linux's System Activity Report (sar) is another "awesome" feature, he said, with more options and more statistics than its Solaris counterparts, and fewer bugs. Solaris should consider learning from both lazy TLB and sar, he said, although he noted that, internally, lazy TLB was a "war starter" among Solaris engineers.
Lastly, Gregg noted that although Solaris Zones were a mature and reliable virtualization mechanism, Zones can only run one kernel. KVM, on the other hand, can run multiple guest OSes. The Joyent team had ported KVM to the Illumos kernel, he said, "so Solaris is already learning from Linux," but Oracle has not merged KVM in upstream.
There are also quite a few things Linux can learn from Solaris, Gregg continued, both in terms of things to do and of things not to do. The ZFS filesystem is great, he said; "it has more performance features than you can shake a stick at." And Linux has learned from it, although license incompatibility means it cannot be merged in directly. But Btrfs and the ZFS on Linux project are doing well. Similarly, Solaris's Zones virtualization is high-performance, and in recent years Linux has picked up a lot of the same concepts for itself, like LXC containers, control groups, and Docker.
A "cautionary tale" from Solaris is STREAMS, the kernel messaging module first introduced in the "rarely-discussed" Unix 8th Edition. Solaris utilizes STREAMS for its TCP/IP stack, which resulted in poor performance that Gregg said was responsible for many of the "Slowlaris" jokes of years past.
On the other hand, he said, Solaris is much easier to analyze for performance problems because compilers on Linux strip out symbols by default. Thus, profiler output is usually filled with inscrutable hex codes. Similarly, compilers drop frame pointers, so stacks are hard to profile. Those who care about performance should "stop the madness," he said, and use options like -fno-omit-frame-pointer. On the tooling side, prstat -mLc on Solaris provides excellent thread-state analysis. There is no microstate accounting in Linux, he said, which makes analysis more difficult. Linux could learn from Solaris's tooling, perhaps adding more features to htop. SmartOS (although not upstream Solaris) also has a virtual filesystem iostat tool called vfsstat that can reveal lock contention, resource control throttling, and other discrepancies between what a process asks from the VFS system and what performance it ultimately sees.
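The thread-state breakdown that microstate accounting provides can be pictured as a set of per-state time counters turned into percentages. The sketch below uses the column names that prstat -m prints (USR, SYS, TRP, TFL, DFL, LCK, SLP, LAT); the counter values and function names are hypothetical, and this does not reproduce how prstat itself gathers its data.

```python
# Column names as shown by prstat -m: user, system, traps, text faults,
# data faults, waiting on user locks, sleeping, and waiting for a CPU.
MICROSTATES = ("USR", "SYS", "TRP", "TFL", "DFL", "LCK", "SLP", "LAT")


def microstate_percentages(ns_by_state):
    """Convert per-state nanosecond counters into percentages of the total."""
    total = sum(ns_by_state.get(state, 0) for state in MICROSTATES)
    if total == 0:
        return {state: 0.0 for state in MICROSTATES}
    return {
        state: 100.0 * ns_by_state.get(state, 0) / total
        for state in MICROSTATES
    }


def dominant_state(ns_by_state):
    """Name the state in which the thread spent the most time."""
    pct = microstate_percentages(ns_by_state)
    return max(MICROSTATES, key=lambda state: pct[state])


if __name__ == "__main__":
    # Hypothetical counters for one thread over a sampling interval.
    sample = {"USR": 200, "SLP": 700, "LAT": 100}
    for state, value in microstate_percentages(sample).items():
        print(f"{state}: {value:.1f}%")
    print("dominant state:", dominant_state(sample))
```

A breakdown like this answers the first question in many investigations: whether a thread is slow because it is running, blocked on a lock, sleeping, or waiting for a CPU.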
Arguably the biggest performance-analysis tool Solaris has going for it is DTrace, which is programmable, real-time, and supports both dynamic and static tracing. It can solve "virtually any" performance issue, he said, and it is reliable enough to run on production systems. There are now two Linux implementations of DTrace, of course, but Gregg argued that the biggest lesson Linux needs to learn from Solaris's DTrace success is that "production safety is feature number one." DTrace needs to be free from risk of freezes or kernel panics, he said, and be an everyday tool like top.
Several other projects may offer similar functionality, he said, such as perf_events and ktap, although none is quite as ready. perf_events is not programmable, he said; ktap looks impressive so far, but not all of its features are ready for production yet. SystemTap also looks impressive and is the most feature-filled of the options, but he has found it problematic to use on any system other than Red Hat's (in fairness, he said, Red Hat is developing it, so that is the developer's focus). Finally, he pointed out LTTng, apologizing that he has not had time to use it properly yet and so could not offer an informed opinion.
Gregg also directed some words of advice to Oracle in particular. He finds DTrace to be one of Solaris's greatest strengths, he said, but Oracle's Solaris team needs to learn that all dynamic tracing is crippled without source code. Oracle can hand customers some scripts to execute, but the customers cannot write their own. If DTrace4Linux achieves feature parity, he said, it will be better than DTrace on Oracle Solaris.
As he wrapped up, Gregg noted that Solaris also has one other lesson to teach Linux: the value of a culture that demands performance. Solaris has long had good performance analysis tools because it was popular with high-paying customers who demanded answers; out of necessity Solaris adapted to be able to provide them. Perhaps Linux needs the same motivator. Too often, he said, Linux performance is debugged with top, strace, and tcpdump—that leaves too many areas uncovered.
As to the ultimate question everyone wants a simple answer to (which is faster, Solaris or Linux?), Gregg called it a crapshoot. Everyone asks about out-of-the-box performance, he said, but out of the box he routinely sees performance differences between Linux and SmartOS systems from 5% to 500%. More importantly, "out of the box" is an irrelevant question. What matters is the performance that you see on your own system, and your ability to tune it until you are satisfied.
Copyright © 2014, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license