
What Linux and Solaris can learn from each other

By Nathan Willis
February 26, 2014
SCALE 2014

Brendan Gregg currently works as the lead performance engineer at Joyent, a cloud computing provider that (among other things) maintains the Solaris-derived operating system SmartOS. SmartOS is a relative newcomer, but Gregg has a long history with performance on other Solaris systems, too. For his SCALE 12x keynote, Gregg discussed what the Linux and Solaris camps can learn from one another—including both positive and negative lessons—where measuring performance is concerned.

The basic question, he began, is that of trying to understand why OSes differ; why does one application perform differently when running on a Linux-based system than on an Illumos-based one? Illumos, he explained for the unfamiliar, is the open-source derivative of OpenSolaris. SmartOS uses the Illumos kernel, and Joyent offers both SmartOS and Linux-based images to customers. So in some ways his talk was a kernel-to-kernel comparison, but in many cases, other pieces of the system architecture had a bigger impact on performance, and he distinguished between the two categories of differences.

Performance, not made simple

Gregg began his comparison by showing a one-line Perl script that looped 100,000,000 times, setting a string variable on each iteration. One might think that such a simple program would not perform significantly differently on two different Unix-like OSes, he said, but in fact one OS was 14% faster than the other (which one, he did not give away, though he joked "if this was your system, wouldn't you want to know?"). But such a one-liner is actually pretty complex to analyze for performance, he said. Between a Linux system and a SmartOS system, there could be different versions of Perl, different compilers used to build Perl, different optimization options used by the compilers, different system libraries, and different background tasks—any combination of which might cause the performance difference. It could also be the kernel: setting the string involves memory I/O; the kernel determines memory placement of the code; the kernel can control CPU clock speed; the kernel could be affected by handling interrupts; the kernel could even cost time by migrating the process to different CPUs.
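
To make the measurement problem concrete, here is a minimal sketch of a comparable micro-benchmark. It is written in Python rather than Perl and is not Gregg's script; the function names, the iteration count, and the run-time-based definition of "percent faster" are all assumptions chosen for illustration. Repeating the run several times gives a rough sense of the run-to-run noise that has to be accounted for before two systems can be compared at all.

```python
import statistics
import time


def time_string_loop(iterations: int) -> float:
    """Return the wall-clock seconds taken to assign a string `iterations` times."""
    start = time.perf_counter()
    for _ in range(iterations):
        s = "a"  # the work being measured: one string assignment per iteration
    return time.perf_counter() - start


def percent_faster(slow_seconds: float, fast_seconds: float) -> float:
    """Run-time saved by the faster system, as a percentage of the slower run.

    This is only one possible reading of "X% faster" (an assumption here);
    a throughput-based definition would give a different number.
    """
    return (slow_seconds - fast_seconds) / slow_seconds * 100.0


if __name__ == "__main__":
    # Hypothetical iteration count, far smaller than the one in the talk.
    runs = [time_string_loop(1_000_000) for _ in range(5)]
    print(
        f"median {statistics.median(runs):.4f}s, "
        f"min {min(runs):.4f}s, max {max(runs):.4f}s"
    )
```

If the spread between the minimum and maximum on a single machine is already a sizable fraction of the difference being investigated, the comparison between two systems needs more runs or a better-controlled environment before any conclusion can be drawn.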

The question that customers want answered when trying to correct such a performance difference is where the root of the 14% difference lies. As a performance engineer, he wants to see if he can even trace the difference to its root (which is not always observable), determine if it actually is a difference between the kernels, and then determine if it is fixable. These questions are not easy to answer, he said—in a lot of ways, being asked to compare Linux and SmartOS is like being asked to compare the US and Australia (where Gregg is from): there are so many differences and similarities (small and large) that it is hard to enumerate them all in a meaningful way.

He can list some of the big differences that could impact performance, however. Linux usually has more up-to-date packages, it gets considerably more testing through its larger community, there are more (and better) device drivers available, and it is far more configurable. At a technical level, he cited Linux's read-copy-update (RCU), futexes, and the dynamic tick mechanism as important. Some of SmartOS's positives include its mature Zones virtualization system, the ZFS filesystem, the DTrace tracing framework, the wide array of symbols that are exposed by default for profiling tools, excellent CPU scalability, and the fact that the Solaris kernel is regularly tested and analyzed on "large, large multi-core" machines. In addition, he said, there are many small differences in tunables and features, although these differences (and their impact) tend to change frequently.

Gregg then cautioned the audience that, as he entered into the "what A can learn from B" portion of the talk, the content might not be suitable for those suffering from Not Invented Here syndrome or who are easily trolled.

What Solaris can learn from Linux

What Solaris can learn from Linux distributions included the non-technical differences already touched on, such as the well-stocked and frequently updated package repositories. Solaris is often unfairly blamed for SmartOS performance problems that end up getting traced back to old MySQL or OpenSSL packages, he said.

But discussing the technical lessons occupied most of the session. He cited Linux's likely()/unlikely() mechanism for branch prediction, which turns into compiler hints. Solaris has no equivalent so far and, he noted, if anyone is building the Solaris kernel with profile feedback (which could be even better), he has never encountered it. Linux's tickless kernel also improves performance, he said: Solaris still has a clock() routine, and he occasionally encounters performance problems involving a 10ms latency (the default interval between clock() invocations) that becomes a 1ms latency if he changes the clock() frequency to 1000Hz.
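
The 10ms and 1ms figures follow directly from the tick rate. As a rough sketch (a simplified model with illustrative function names, not Solaris code), a timeout that can only expire on a clock tick is effectively rounded up to a whole number of tick periods:

```python
import math


def tick_period_ms(hz: int) -> float:
    """Length of one clock tick in milliseconds at the given tick rate."""
    return 1000.0 / hz


def rounded_timeout_ms(requested_ms: float, hz: int) -> float:
    """Timeout after rounding up to a whole number of ticks.

    Simplified model (an assumption): expiry is only noticed on a tick
    boundary, so any request shorter than one tick still waits a full tick.
    """
    period = tick_period_ms(hz)
    ticks = max(1, math.ceil(requested_ms / period))
    return ticks * period


if __name__ == "__main__":
    for hz in (100, 1000):
        print(f"{hz}Hz: tick period {tick_period_ms(hz)}ms, "
              f"1ms request waits {rounded_timeout_ms(1, hz)}ms")
```

Under this model a 1ms request at 100Hz waits 10ms, while at 1000Hz it waits 1ms, which matches the latencies described in the talk.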

Solaris also swaps entire processes, a feature that has been in the code since the PDP-11/20 days. Back then, swapping processes made sense, when the maximum process size was 64KB. Support for paging was added later, he said, and perhaps it is time to drop process swapping entirely. On the flip side, Solaris has a virtual memory limit, while Linux allows more memory to be allocated than can be stored, relying on the out-of-memory (OOM) killer to free up space. Solaris engineers cannot imagine implementing such a feature, he said—it may be fine for Linux running on phones, the argument goes, but not for servers. It is also a cautionary tale for Solaris, he noted, because a lot of new code does not check for ENOMEM on Linux.
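
On Linux, the overcommit behavior described here is controlled by the vm.overcommit_memory sysctl, which is readable at /proc/sys/vm/overcommit_memory. The following is a small sketch for checking which policy a machine is using; the helper names are illustrative, and the reader returns None on systems where the file does not exist.

```python
from pathlib import Path
from typing import Optional

OVERCOMMIT_PATH = "/proc/sys/vm/overcommit_memory"

# Meanings of the three documented values of vm.overcommit_memory.
MODES = {
    0: "heuristic overcommit (the default)",
    1: "always overcommit",
    2: "strict accounting; allocations beyond the commit limit fail",
}


def read_overcommit_mode(path: str = OVERCOMMIT_PATH) -> Optional[int]:
    """Return the overcommit mode, or None if it cannot be read or parsed."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None


def describe_overcommit(mode: Optional[int]) -> str:
    """Human-readable description of an overcommit mode."""
    if mode is None:
        return "unknown (not Linux, or /proc unavailable)"
    return MODES.get(mode, f"unrecognized mode {mode}")


if __name__ == "__main__":
    print(describe_overcommit(read_overcommit_mode()))
```

Only in mode 2 does an allocation reliably fail with ENOMEM once the commit limit is reached; in the other modes, code that never checks for allocation failure may run for a long time before the OOM killer intervenes.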

An interesting case is Linux's SLUB allocator. It is a simplified version of Solaris's SLAB allocator, he said, and its improvements seem good enough that Solaris should consider merging it back in. Solaris also lacks a "lazy" translation lookaside buffer (TLB) mode, which on Linux gives noticeable performance improvements over regular TLB mode. Linux's System Activity Report (sar) is another "awesome" feature, he said, with more options and more statistics than its Solaris counterparts—and fewer bugs. Solaris should consider learning from both lazy TLB and sar, he said—although he noted that, internally, lazy TLB was a "war starter" among Solaris engineers.

Lastly, Gregg noted that although Solaris Zones were a mature and reliable virtualization mechanism, Zones can only run one kernel. KVM, on the other hand, can run multiple guest OSes. The Joyent team had ported KVM to the Illumos kernel, he said, "so Solaris is already learning from Linux," but Oracle has not merged KVM in upstream.

What Linux can learn from Solaris

There are also quite a few things Linux can learn from Solaris, Gregg continued, both in terms of things to do and of things not to do. The ZFS filesystem is great, he said; "it has more performance features than you can shake a stick at." And Linux has learned from it, although license incompatibility means it cannot be merged in directly. But Btrfs and the ZFS on Linux project are doing well. Similarly, Solaris's Zones virtualization is high-performance, and in recent years Linux has picked up a lot of the same concepts for itself, like LXC containers, control groups, and Docker.

A "cautionary tale" from Solaris is STREAMS, the kernel messaging module first introduced in the "rarely-discussed" Unix 8th Edition. Solaris utilizes STREAMS for its TCP/IP stack, which resulted in poor performance that Gregg said was responsible for many of the "Slowlaris" jokes of years past.

On the other hand, he said, Solaris is much easier to analyze for performance problems because compilers on Linux strip out symbols by default. Thus, profiler output is usually filled with inscrutable hex codes. Similarly, compilers drop frame pointers, so stacks are hard to profile. Those who care about performance should "stop the madness," he said, and use options like -fno-omit-frame-pointer. Similarly, prstat -mLc on Solaris provides excellent thread-state analysis. There is no microstate accounting in Linux, he said, which makes analysis more difficult. Linux could learn from Solaris's tooling, perhaps adding more features to htop. SmartOS (although not upstream Solaris) also has a virtual filesystem iostat tool called vfsstat that can reveal lock contention, resource control throttling, and other discrepancies between what a process asks from the VFS system and what performance it ultimately sees.
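
The "inscrutable hex codes" are what a profiler is left with when it has sampled addresses but no symbol table to resolve them against. The sketch below illustrates the lookup in general terms; the addresses and function names are hypothetical, and real tools read this information from the binary's symbol sections rather than from a hard-coded list.

```python
import bisect

# Hypothetical symbol table: (start address, name), sorted by address.
SYMBOLS = [
    (0x1000, "main"),
    (0x1400, "parse_input"),
    (0x1900, "compute"),
]


def symbolize(address: int, symbols=SYMBOLS) -> str:
    """Map a sampled address to "name+offset", or to bare hex if unknown.

    Without symbol sizes, any address at or above the last start address is
    attributed to the last symbol; a real symbolizer would also check sizes.
    """
    starts = [start for start, _ in symbols]
    index = bisect.bisect_right(starts, address) - 1
    if index < 0:
        return hex(address)  # no symbol covers this address
    start, name = symbols[index]
    return f"{name}+{hex(address - start)}"


if __name__ == "__main__":
    sample = 0x1912
    print("with symbols:   ", symbolize(sample))
    print("stripped binary:", symbolize(sample, symbols=[]))
```

With the table present the sample resolves to a function name and offset; with an empty table (the stripped case) the same sample can only be reported as a raw address.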

Arguably the biggest performance-analysis tool Solaris has going for it is DTrace, which is programmable, real-time, and supports both dynamic and static tracing. It can solve "virtually any" performance issue, he said, and it is reliable enough to run on production systems. There are now two Linux implementations of DTrace, of course, but Gregg argued that the biggest lesson Linux needs to learn from Solaris's DTrace success is that "production safety is feature number one." DTrace needs to be free from risk of freezes or kernel panics, he said, and be an everyday tool like top.

Several other projects may offer similar functionality, he said, such as perf_events and ktap, although none is quite as ready. perf_events is not programmable, he said; ktap looks impressive so far, but not all of its features are ready for production yet. SystemTap also looks impressive and is the most feature-filled of the options, although he has found it problematic to use on any system other than Red Hat's (in fairness, he said, Red Hat is developing it, so that is the developer's focus). Finally, he pointed out LTTng. He apologized, however, that he has not had time to properly use LTTng yet, so he could not offer an informed opinion.

Gregg also directed some words of advice to Oracle in particular. He finds DTrace to be one of Solaris's greatest strengths, he said, but Oracle's Solaris team needs to learn that all dynamic tracing is crippled without source code. Oracle can hand customers some scripts to execute, but the customers cannot write their own. If DTrace4Linux achieves feature parity, he said, it will be better than DTrace on Oracle Solaris.

As he wrapped up, Gregg noted that Solaris also has one other lesson to teach Linux: the value of a culture that demands performance. Solaris has long had good performance analysis tools because it was popular with high-paying customers who demanded answers; out of necessity Solaris adapted to be able to provide them. Perhaps Linux needs the same motivator. Too often, he said, Linux performance is debugged with top, strace, and tcpdump—that leaves too many areas uncovered.

As to the ultimate question everyone wants a simple answer to (which is faster, Solaris or Linux?), Gregg called it a crapshoot. Everyone asks about out-of-the-box performance, he said, but out of the box he routinely sees performance differences between Linux and SmartOS systems from 5% to 500%. More importantly, "out of the box" is an irrelevant question. What matters is the performance that you see on your own system, and your ability to tune it until you are satisfied.


What Linux and Solaris can learn from each other

Posted Feb 27, 2014 10:21 UTC (Thu) by renox (subscriber, #23785) [Link]

I've read his slides and I was really impressed by the quality of his presentation, thanks for making this nice article on it.

What Linux and Solaris can learn from each other

Posted Feb 27, 2014 12:23 UTC (Thu) by mabshoff (guest, #86444) [Link]

Seconded. Of all the illumos people, he has always come across as the most realistic about its chances competing against Linux and FreeBSD. He does great presentations and makes great points about Solaris/illumos as well as Linux regarding strengths and weaknesses. Some of the other members of the illumos community do not reach quite the same level of honesty IMHO.

The one thing I disagree with him on is the integration of KVM into the illumos kernel. I cannot fathom how anybody would come to the conclusion that combining CDDL and GPL V2 code into one piece of work is legal. The FSF and SFLC are pretty adamant that it isn't legal; if you read the OpenSolaris mailing list archives you will see that then Sun would absolutely not touch GPL V2 kernel code from Linux like fuse.

Even Oracle seems to have some reservations about the legality of its DTrace port to Linux with portions of it GPL V2 while the DTrace module remains CDDL. While you can argue along the lines of the OpenAFS kernel module in that case, the reverse, i.e. merging KVM code under the GPL V2 into illumos, seems to be a GPL violation IMHO. IANAL and all that and hope to avoid setting off an epic licensing flame war, but we will see how that goes.

Cheers,

Michael

What Linux and Solaris can learn from each other

Posted Feb 27, 2014 19:39 UTC (Thu) by paulmlewis (subscriber, #72886) [Link]

This LWN article from a ways back has a bit about the licenses

https://lwn.net/Articles/459754/

What Linux and Solaris can learn from each other

Posted Feb 27, 2014 20:20 UTC (Thu) by mabshoff (guest, #86444) [Link]

> This LWN article from a ways back has a bit about the licenses
> https://lwn.net/Articles/459754/

Yeah, I read that one a while back and Cantrill's argument is quite weak, i.e. OpenSolaris' kernel is not a derivative of Linux on one hand and on the other hand they did not use the export_gpl'ed symbols in the kvm code, so there is no derived work and the license does not matter/apply. With that kind of argument I can put any code base together, add some glue code and argue away.

The comment thread on 'More DTrace envy' at [1] sheds some light on Cantrill's position in 2008 regarding DTrace, the CDDL and GPL compatibility, and the comment thread quickly devolves into a flame fest. I used to think he was a decent guy until I learned that it was him uttering the famous 'Have you ever kissed a girl' quip to David Miller [2]. That was actually after I heard him calling the Linux fs development 'clown college' in his IIRC LISA 11 talk [3], which already made me question his judgement. There is more, but I will stop myself since there is no point beating this guy up; he already has more than enough on his plate.

Cheers,

Michael

[1] http://lwn.net/Articles/287906/
[2] https://groups.google.com/forum/#!msg/comp.sys.sun.hardwa...
[3] http://www.youtube.com/watch?v=-zRN7XLCRhc

What Linux and Solaris can learn from each other

Posted Feb 28, 2014 11:29 UTC (Fri) by khim (subscriber, #9252) [Link]

export_gpl is just a hint. You still need a full-blown audit to make sure you've not included too much GPLed code (from headers and other sources) in your module to make it a violation of the GPL.

What Linux and Solaris can learn from each other

Posted Feb 28, 2014 13:21 UTC (Fri) by mabshoff (guest, #86444) [Link]

> export_gpl is just a hint. You still need a full-blown audit to make sure you've not included too much GPLed code (from headers and other sources) in your module to make it a violation of the GPL.

Yep, I agree. It is all some magical smokescreen some companies/people put up: since the symbol was not exported GPL, it is supposedly fine to use by some proprietary or incompatibly licensed code. So far no one has 'won' against the GPL in US or German courts and nearly all cases settled, and on top of that the vast majority of maintainers, kernel contributors as well as Linus have been clear as day that proprietary modules are illegal and/or immoral aside from special cases like OpenAFS.

I personally think that the day the KVM port was merged into illumos was the day the project died from a commercial standpoint, since no larger company would touch that code base due to potential GPL violation lawsuits. If you look at Ohloh statistics you have around 10 different people committing code to the illumos kernel, down from 120 or so back in the day when Sun still contributed to OpenSolaris. Joyent isn't doing too well either from a commercial standpoint, so I think they will just slowly fade away over the next couple of years.

Cheers,

Michael

What Linux and Solaris can learn from each other

Posted Mar 3, 2014 12:37 UTC (Mon) by nix (subscriber, #2304) [Link]

A 'full-blown audit' will not help, since this is a matter of law, not code: the only way to tell if you have included 'too much' is to take the thing to court and see what the outcome is. This is generally not an appealing course of action!

What Linux and Solaris can learn from each other

Posted Mar 6, 2014 15:17 UTC (Thu) by kmacleod (guest, #88058) [Link]

You're looking at GPL or CDDL source code leaking into the other.

It's simpler and far more blatant than that.

SmartOS combines the work of illumos (CDDL) and KVM (GPLv2) and distributes them as a whole, where the work as a whole is not on the terms of the GPL.

What Linux and Solaris can learn from each other

Posted Mar 6, 2014 21:49 UTC (Thu) by khim (subscriber, #9252) [Link]

This can still be declared a "mere aggregation" case if, e.g., the KVM part can actually be used with the Linux kernel, not just with the Solaris kernel. This will still be an extremely tricky case and I can understand why companies will not want to deal with that mess, but it's still possible that the end result is legit.

What Linux and Solaris can learn from each other

Posted Mar 7, 2014 14:20 UTC (Fri) by kmacleod (guest, #88058) [Link]

"Mere aggregation" doesn't apply in this case, it's not just sharing the distribution medium.

The illumos KVM module is built exclusively for an illumos kernel ("Building illumos KVM requires several recent additions to illumos"[1]) and the SmartOS distribution combines the KVM kernel module with the illumos kernel as part of its featured product claims.

This is no different than building a GNU Readline library (GPL-only, not LGPL), building other packages against it, distributing the combined whole, and then claiming that it's "mere aggregation" that the GNU Readline dynamically-loaded library happens to be on the same distribution medium as the programs that link against it at runtime. The distributor is required to distribute the combined whole of GNU Readline+readline programs under the terms of the GPL, or not ship GNU Readline.

[1] https://github.com/joyent/illumos-kvm , Divergences from KVM

What Linux and Solaris can learn from each other

Posted Mar 7, 2014 17:50 UTC (Fri) by sfeam (subscriber, #2841) [Link]

[linking to readline] - Many people do indeed claim that it is permissible to distribute non-GPL binaries that link to libreadline or other GPL dynamically-loaded libraries. The FSF doesn't agree, but that can be attributed as much to wishful thinking as to reasoned analysis. The argument does not necessarily hinge on "mere aggregation" however. If the act of linking to a shared library does not create a derived work then the question of aggregation is moot. You are on stronger ground when you argue that the application in question is "built exclusively for ..." such linkage, but that comes down to arguing about individual cases rather than a general rule that you cannot freely link to dynamically loaded GPL code.

What Linux and Solaris can learn from each other

Posted Mar 7, 2014 18:22 UTC (Fri) by khim (subscriber, #9252) [Link]

You are mixing two significantly different cases:
1. A case where you distribute your program (linked with libreadline or otherwise) separately from libreadline itself.
2. A case where you distribute your program and libreadline as a ready-to-use bundle.

In the latter case the compilation (compilation in copyright terms) of libreadline and your program is quite obviously derived from both libreadline and your program, thus only the “mere aggregation” claim is under discussion. In the former case the situation is much less clear because the thing which is distributed does not include any GPL-licensed code and thus it's not clear if the GPL is in play or not (you need to dig much deeper to determine if the process which produced the binary has contaminated your binary with GPL-licensed code or not).

SmartOS distributes both KVM and Solaris-derived kernel which means that it must rely on "mere aggregation" case and it's hard to do that when modules were specifically altered to work together.

I kind of assumed that the KVM module is not actually distributed by SmartOS but must be installed separately (similarly to how nVidia does it with its drivers), but if, indeed, they just ship binaries for both pieces and both pieces are altered to work together then it's quite a sizable legal mine in the foundation.

What Linux and Solaris can learn from each other

Posted Mar 7, 2014 20:42 UTC (Fri) by vonbrand (guest, #4458) [Link]

I believe you are mixing up the licences on the pieces and the license on the collection. I.e, if everybody here writes a story, and our esteemed editor collects them all to publish, each of us has a right on our story, and he has a right on the collection (selection, order, ...), but not on the stories.

And the readline case is further murkied by the editline BSD clone of the library, so the claim that any program using readline is derived from it is moot.

What Linux and Solaris can learn from each other

Posted Mar 7, 2014 21:11 UTC (Fri) by khim (subscriber, #9252) [Link]

That's true but you miss a subtler difference. When someone writes a story using the settings of a different story then it may be legal or not; it's a tricky question. But if your story is included in a large compilation then your license must be compatible with all other licenses from all authors of said compilation. For example, if Alice and Bob had an affair and then a falling-out then Alice may very well pass the rights to Carol with an attachment “it's Ok to publish that story in any collection except if Bob's stories are also there”. Note that Bob's stories can be absolutely original and totally unrelated to Alice's stories. Yet if the only license Carol has is that “I can't stand Bob anymore” license then she could not combine these two stories. The GPL is like that: it states that all the source for all the code in the program (complete corresponding machine-readable source code) must be distributed under GPL-compatible terms. But it adds a stipulation: “mere aggregation of another work not based on the Program” is unrestricted. If you distribute GPLed code and CDDLed code together then of course the compilation is a derived work from both pieces! Yes, there is an additional license for the compilation, sure, but first and foremost the licenses for all parts of the compilation must be compatible!

What Linux and Solaris can learn from each other

Posted Feb 27, 2014 23:11 UTC (Thu) by rodgerd (guest, #58896) [Link]

I'd have been more impressed if he hadn't resorted to some nonsense early on, characterising Solaris as "running on mainframes" and "scaling to many CPUs" around slide 40.

I run Linux on *actual* mainframes. I know the Sine Nomina guys have run up some old copies of OpenSolaris on mainframe hardware. But that's actual mainframes, not E- and M- class boxes.

And as far as CPUs go, well, I've yet to see a shipping Solaris box with as large a single system image as SGI have been running Linux on for years.

Also, given my experience with Sun support and a chunk of the Sun sysadmin community, I find it hard to credit the notion around slide 100 that there's a community culture of rigor and excellence around SunOS/Solaris that doesn't exist elsewhere. I work in the Australasian region, so maybe that explains the difference, though.

What Linux and Solaris can learn from each other

Posted Feb 28, 2014 0:38 UTC (Fri) by davecb (subscriber, #1574) [Link]

I've worked for multiple years each on both Linux-on-system-390 and Solaris-on-M-series, and am impressed with both. For both raw performance and price-performance, though, Ms and to a degree the later Ts win out.

I think that's an economics question, though!

The article and I agree: both can benefit from cross-fertilization at the technical level.

--dave

What Linux and Solaris can learn from each other

Posted Feb 28, 2014 13:34 UTC (Fri) by mabshoff (guest, #86444) [Link]

> I'd have been more impressed if he hadn't resorted to some nonsense early on, characterising Solaris as "running on mainframes" and "scaling to many CPUs" around slide 40.

Even Sun compared their high-end gear to mainframes, and most people not drinking the Sun Kool-Aid just rolled their eyes.

> I run Linux on *actual* mainframes. I know the Sine Nomina guys have run up some old copies of OpenSolaris on mainframe hardware. But that's actual mainframes, not E- and M- class boxes.

Hehe, that port like all the others (arm for example) all went away quickly and aside from some single people trying to keep illumos alive on Sparc it is all x86-64 these days. And even there it is only Intel since the illumos KVM port was never ported to AMD's virtualization extension. That tells you something about the depth of their community right there.

> And as far as CPUs go, well, I've yet to see a shipping Solaris box with as large a single system image as SGI have been running Linux on for years.

You should not mention those pesky facts either on The Register or on any Sun and/or OpenSolaris focused mailing list since people will incorrectly claim that those 6.1k CPU monsters are clusters and so not comparable to all the Sun goodness, even though they are all NUMA machines. And those huge SGI monsters are deployed in those large configs in the field, judging from patches and problem reports you see on lkml.

> Also, given my experience with Sun support and a chunk of the Sun sysadmin community, I find it hard to credit the notion around slide 100 that there's a community culture of rigor and excellence around SunOS/Solaris that doesn't exist elsewhere. I work in the Australasian region, so maybe that explains the difference, though.

Another one of those Sun Kool-Aid factors IMHO, i.e. many people in the illumos community can to this day not accept that Linux stole Solaris' lunch despite it being inferior back in the day. Today I think it is pretty clear that the technological advantage depends on which aspect of the system you depend on, i.e. I would prefer ZFS over btrfs if I had to deploy some storage box today, but give it a couple more years and that will also change. But then again, Solaris 11.1 had some serious bug in the dedup code IIRC that required a patch that you had to have a service contract for, and those aren't free. So running Solaris without a service contract would make me rather queasy.

Cheers,

Michael

What Linux and Solaris can learn from each other

Posted Feb 28, 2014 15:10 UTC (Fri) by raven667 (subscriber, #5198) [Link]

Well, that's all well and good but let's not be easily trolled and forget that the point isn't Linux vs. Solaris advocacy but how to chase down performance and diagnostic information on different systems. If you look at Brendan Gregg's website there is a bunch of information on the Linux tools and how they relate to the system; one gets the impression that he is equally competent on both systems, even though he had a hand in designing DTrace on Solaris, because performance tuning is about the process and not some magic bullet tool. He has a page on his site about how to apply process to isolate performance issues by analyzing the computer schematic for one of the Apollo vehicle computer systems, something he knew nothing about, to show how the process can be used to bisect problems without previous competency in the particular system you are working on.

What Linux and Solaris can learn from each other

Posted Feb 28, 2014 18:26 UTC (Fri) by mabshoff (guest, #86444) [Link]

> Well, that's all well and good but let's not be easily trolled and forget that the point isn't Linux vs. Solaris advocacy but how to chase down performance and diagnostic information on different systems.

I do not think it was Brendan's intent to troll anybody with those slides mentioning mainframes or 'community excellence.' It is just a standard slide thrown into many of the presentations of various illumos community members and I more or less filter that out when I read their slides. As mentioned down thread I do think Brendan is doing an excellent job and is quite fair to Linux even though his first love is Solaris ;).

Cheers,

Michael

What Linux and Solaris can learn from each other

Posted Feb 28, 2014 19:45 UTC (Fri) by raven667 (subscriber, #5198) [Link]

I don't think it was his intent to troll anyone either, it's just that the subject matter invites everyone to chime in with their "Linux is faster" or "Solaris is faster", "Slowlaris is slow", "Linux ate my baby", etc. etc. comments which are distracting digressions.

What Linux and Solaris can learn from each other

Posted Mar 1, 2014 6:01 UTC (Sat) by k8to (subscriber, #15413) [Link]

I view it as an accidental troll.

What Linux and Solaris can learn from each other

Posted Feb 28, 2014 15:15 UTC (Fri) by Cyberax (✭ supporter ✭, #52523) [Link]

>Another one of those Sun coolaid factors IMHO, i.e. many people in the illumos community can to this day not accept that Linux stole Solaris' lunch despite it being inferior back in the day.

Linux wasn't even _inferior_ in some areas. Everyone should read the famous: http://cryptnet.net/mirrors/texts/kissedagirl.html

What Linux and Solaris can learn from each other

Posted Feb 28, 2014 18:40 UTC (Fri) by mabshoff (guest, #86444) [Link]

> Linux wasn't even _inferior_ in some areas. Everyone should read the famous: http://cryptnet.net/mirrors/texts/kissedagirl.html

Funnily enough I linked that one exchange earlier and I probably should have made myself clearer by either putting inferior in quotes or making the point that it was the perspective of most of the Solaris engineers at Sun as well as Sun sysadmins back in the wild 90s before the dot.com crash. To shine some light on that here is my favourite quote from Valerie Aurora's blog [1] discussing Sun's internal culture and contrasting it with the 'real' world:

“Run Solaris on my desktop? Are you f—ing kidding me?“

You should really read that blog post; it makes me smile and recall my contact with my former coworkers back at university when I worked tech support in a math department that spent a whole lot of money on Sun hardware to get not too great performance compared to the Linux boxen we also had. I started teasing them around 2003 that Sun would be bought by IBM since Sun's stock had fallen so much and they were not exactly thrilled by those jokes. I am not sure what they did when by a cat's whisker Sun was bought by Oracle instead of IBM a few years later, but I gather they are less than thrilled.

Cheers,

Michael

[1] http://blog.valerieaurora.org/2010/02/13/sleeping-with-th...

What Linux and Solaris can learn from each other

Posted Feb 28, 2014 17:58 UTC (Fri) by vonbrand (guest, #4458) [Link]

We migrated from Solaris to Linux back in 2000. Linux was so much faster it wasn't funny. And the reason for migrating was a gaping security hole, unfixed for years.

What Linux and Solaris can learn from each other

Posted Mar 3, 2014 5:31 UTC (Mon) by dlang (subscriber, #313) [Link]

Back in the day (around 2000) I really annoyed my Sun sales rep when he noticed a Red Hat sticker on a Sun SPARC box in my datacenter and asked about it. I demonstrated that for the monitoring task it was doing (mrtg) it was _much_ faster running Linux than it was running Solaris.

I never looked back.

What Linux and Solaris can learn from each other

Posted Feb 27, 2014 19:16 UTC (Thu) by madscientist (subscriber, #16861) [Link]

I suspect the frame pointer thing is a result of Linux's roots in older 32bit Intel chips, which are notoriously register-poor, vs. Solaris's roots in SPARC where that wasn't such a concern. Having an extra register available to the compiler on register-poor architectures can make a big difference in some situations.

I would agree that it would be nice if people building 64bit Intel applications did not omit frame pointers by default; I doubt that the performance difference there is that noticeable.

What Linux and Solaris can learn from each other

Posted Feb 28, 2014 11:26 UTC (Fri) by khim (subscriber, #9252) [Link]

Such an explanation sounds logical but then you recall that i386 mode (with its 8 registers) by default uses frame pointers while x86-64 mode (with 16 registers) does not use them (unless alloca is involved) and it suddenly becomes much less believable.

What Linux and Solaris can learn from each other

Posted Mar 3, 2014 12:33 UTC (Mon) by nix (subscriber, #2304) [Link]

The explanation is correct, nonetheless (though the proof of such is scattered in various bits of the GCC list archives).

Brendan is quite right that "Solaris is much easier to analyze for performance problems because compilers on Linux strip out symbols by default". The lack of rtld_db is also painful: if you want to reliably look up symbols in a ptrace()d child (handling things like symbol interposition and dlmopen() properly) you have to dig painfully into glibc internals that are not ABI-stable and in fact change offset whenever support for a new ELF dynamic type is added to glibc. Rather than using rtld_db, Roland suggests using a Python GDB plugin shipped with glibc to dig inside those internals: this is great iff the ptracer happens to be GDB but useless for everything else.

Solaris also has an .ldynsym non-loaded section which contains the symbol table for ELF executables and is not stripped by strip(1). With that, tracers can give backtraces for ELF executables that are as useful as those available for shared libraries. It costs only a few KiB: I'm appalled that Linux doesn't have it. (Yet. I may have to add it and try to get it upstream...)

What Linux and Solaris can learn from each other

Posted Mar 4, 2014 15:58 UTC (Tue) by fuhchee (guest, #40059) [Link]

"Solaris also has an .ldynsym non-loaded section which contains the symbol table for ELF executables and is not stripped by strip(1). [...] I'm appalled that Linux doesn't have it."

See https://fedoraproject.org/wiki/Features/MiniDebugInfo and https://sourceware.org/gdb/onlinedocs/gdb/MiniDebugInfo.html for information about the .gnu_debugdata bits in executables. (systemtap supports it too.)

What Linux and Solaris can learn from each other

Posted Mar 4, 2014 19:32 UTC (Tue) by dlang (subscriber, #313) [Link]

actually, my reaction to this was that I'm appalled that strip leaves such garbage still in the file by default :-)

It may be that the cost is worth it for you, but it should be a parameter to strip to leave it on rather than the default.

sounds like a bug in the solaris strip program to me ;-)

What Linux and Solaris can learn from each other

Posted Mar 6, 2014 11:44 UTC (Thu) by nix (subscriber, #2304) [Link]

The cost is *tiny*: the section isn't loaded except by debuggers, all the big stuff is gone, almost all you have is a symbol table, which is in any case present in all your shared libraries anyway, and I haven't seen you protesting about that. Plus it's LZMA-compressed! (Actually it's an LZMA-compressed ELF executable, which seems annoying to use to me, but I guess makes it possible to extend the set of debugging information available without going through the trouble of defining a new ELF section every time.)

A random program I just plucked out: groff. 98KiB unstripped debuginfo: <4KiB debugdata. 100KiB program. A few percent, and that's only in executables, not shared libraries, and is only disk space: the memory consumption of this section is zero. Every new release of every distro grows by much more than that anyway, and that's mostly stuff which *is* loaded. Because it's LZMA-compressed even C++ programs don't have huge debugdata: the huge repetitive symbol names get very heavily scrunched by LZMA.

The presence of this section, and its preservation by strip(1), is *not* a bug: it's the whole reason for the section's existence in the first place. Dynamic, pervasive tracing and bug-reporting tools *need* function name info out of a symbol table for useful backtraces. Having that for shared libraries but not executables is a useless historical wart that is better off gone. Hiving it off into the debuginfo packages is useless: you don't want to be forced to download a new package whenever abrt notes a program crash, or SystemTap or DTrace ustack() fires for some process: in the latter case having a package download at that point would add utterly intolerable latency to the tracing process. It's supposed to be noninvasive, remember? Installing a package on the fly is anything but!

Annoying though the representation of .gnu_debugdata is, it is there, which is gratifying. I plan/hope to implement support for this in Oracle's DTrace for Linux (though it obviously won't do anything visible unless the executables in the distro have support for it). Executable backtraces at last, without having to go through the nearly-impossible task of implementing support for a Sun invention like .ldynsym and pushing it upstream :)

What Linux and Solaris can learn from each other

Posted Mar 6, 2014 13:41 UTC (Thu) by dlang (subscriber, #313) [Link]

I don't doubt that it's useful, but I'm also sure that if you didn't strip the binary at all you would have even more useful data, and with current storage, unstripped binaries wouldn't be _that_ much larger anyway.

My argument is that strip is supposed to remove all such debugging data from the binary, so having something there that it misses is a bug.

I have no problem with leaving this data there (unless I'm trying to cram the entire distro into the flash on a router, then I want to remove every byte I can to fit into the 16MB flash), I just am saying that having debugging data missed by strip entirely is strip not doing its advertised job.

think about tools that remove metadata from text documents or PDFs, all that data is there because it's useful under some conditions, but the purpose of the tool is to sanitize the files by removing that data. If there is another copy of some of the data that's not removed, that's a bug in the tool, no matter how useful that data can be under the right conditions.

What Linux and Solaris can learn from each other

Posted Mar 7, 2014 11:42 UTC (Fri) by nix (subscriber, #2304) [Link]

strip was, when it was written, supposed to remove all debugging data from the binary. Times have changed: disk space is much cheaper, shared libraries (with their symbol tables) have become pervasive without anyone complaining about the space used by the symbol tables, and many tools want to be able to map addresses to symbols in executables too, without requiring users to take special steps like installing (enormous) debuginfo packages.

If you're protesting about this, why aren't you also protesting that strip doesn't remove the symbol table from shared libraries, and the exception info too? After all, the exception unwinding info is DWARF2, which is a format meant for debugging -- it just so happens that part of the C++ runtime library can use it to identify things that need destructors calling on them. Why does that make it 'not debugging data', while abrt and systemtap and dtrace using .gnu_debugdata remains 'debugging data'?

(This is not, practically, the equivalent of PDF metadata. If your function names are secret you're in trouble anyway the moment you make your secret sauce into a shared library, or use __func__.)

Obviously if you have insane space constraints on embedded platforms on which no pervasive tracers are expected to run you will probably build your own toolchain, and you will not build in this section. But for 99.99% of uses, .gnu_debugdata seems like an excellent addition.

What Linux and Solaris can learn from each other

Posted Mar 7, 2014 15:30 UTC (Fri) by raven667 (subscriber, #5198) [Link]

I don't think he was disagreeing with you that this information was useful; he was just surprised that strip didn't remove it. As you point out, the space constraints are different now. Maybe stripping binaries at all just doesn't make as much sense as it used to, at least for some use cases: embedded and VM servers probably still benefit from minimal disk usage because every byte of storage costs money, desktops and stand-alone servers less so.

What Linux and Solaris can learn from each other

Posted Mar 7, 2014 20:17 UTC (Fri) by dlang (subscriber, #313) [Link]

exactly.

I am not at all opposed to distros choosing not to strip binaries, or to only partially strip binaries.

I am opposed to adding debug info with the intention to make it non-strippable.

instead of doing this, modify strip. Make a default run of strip not remove everything, but have a --really-strip option that removes everything for those who want it.

While disk space is less of an issue, doing deep debugging of systems is also far less common (especially as a percentage of the systems running)

What Linux and Solaris can learn from each other

Posted Mar 7, 2014 20:21 UTC (Fri) by fuhchee (guest, #40059) [Link]

strip(1) already has several levels of operation. These were not changed for the minidebuginfo work. Instead, the new stuff works by stripping out all the debug/symbol info into a separate file, then using objcopy to add the special compressed section back.
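As a rough illustration of the sequence described above (not the actual distro tooling), the steps can be sketched in Python, loosely following the recipe in the GDB MiniDebugInfo documentation linked earlier in the thread. The helper names, the `.debug` and `.mini_debuginfo` file suffixes, and the `keep_symbols` file name are assumptions made for this example; the sketch only builds the command lines and does not run them.

```python
def parse_nm_posix(output, wanted_types=None):
    """Return the set of symbol names found in `nm --format=posix` output.

    Each line looks like "name type [value [size]]". If wanted_types is
    given, only symbols whose type letter is in that set are kept.
    """
    names = set()
    for line in output.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        name, sym_type = fields[0], fields[1]
        if wanted_types is None or sym_type in wanted_types:
            names.add(name)
    return names


def symbols_to_keep(dynsym_output, fullsym_output):
    """Function/data symbols that are not already in the dynamic symbol table.

    dynsym_output:  text from `nm -D --format=posix --defined-only BINARY`
    fullsym_output: text from `nm --format=posix --defined-only BINARY`
    """
    dynamic = parse_nm_posix(dynsym_output)
    functions = parse_nm_posix(fullsym_output, wanted_types={"T", "t", "D"})
    return sorted(functions - dynamic)


def minidebuginfo_commands(binary, keep_file="keep_symbols"):
    """Build (but do not run) the binutils/xz commands for one executable.

    keep_file is assumed to hold the names from symbols_to_keep(), one per line.
    """
    debug = binary + ".debug"          # hypothetical name for the full debug file
    mini = binary + ".mini_debuginfo"  # hypothetical name for the reduced copy
    return [
        # 1. Split the full debug/symbol info out into a separate file.
        ["objcopy", "--only-keep-debug", binary, debug],
        # 2. Reduce that copy to little more than the wanted symbol names.
        ["objcopy", "-S", "--remove-section", ".gdb_index",
         "--remove-section", ".comment",
         "--keep-symbols=" + keep_file, debug, mini],
        # 3. Strip the original binary as usual.
        ["strip", "--strip-all", "-R", ".comment", binary],
        # 4. Compress the reduced copy (xz writes mini + ".xz").
        ["xz", mini],
        # 5. Add the compressed blob back as the .gnu_debugdata section.
        ["objcopy", "--add-section", ".gnu_debugdata=" + mini + ".xz", binary],
    ]
```

Calling `minidebuginfo_commands("groff")` yields five argument lists that could be handed to `subprocess.run` one after another; strip(1) itself is used unmodified, which matches the point that its levels of operation were not changed.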

What Linux and Solaris can learn from each other

Posted Mar 8, 2014 22:16 UTC (Sat) by nix (subscriber, #2304) [Link]

That's kind of a weird way for it to work, but I suppose if you're trying to get an LZMA-compressed ELF file (rather than just an extra symbol table, as Solaris does) that's probably the simplest approach.

(It is clear that the section is useless if it gets stripped into the debuginfo packages: the thing's only worth having if the 'average executable' has it.)

What Linux and Solaris can learn from each other

Posted Mar 6, 2014 11:45 UTC (Thu) by nix (subscriber, #2304) [Link]

So the question now becomes, how did I miss this going past on the binutils list? Clearly I read too fast and skipped over it.

Thanks for the heads-up!

What Linux and Solaris can learn from each other

Posted Mar 4, 2014 15:52 UTC (Tue) by fuhchee (guest, #40059) [Link]

What Brendan is saying is that we should slow down our operating system, because some tools are incapable of doing proper backtraces.

I don't buy that.

What Linux and Solaris can learn from each other

Posted Feb 28, 2014 2:03 UTC (Fri) by NightMonkey (subscriber, #23051) [Link]

I do think he should upgrade his Linux toolset to properly compare. Tools like 'atop' and 'dstat' (vs. vmstat) and even collectd (vs. SYSTAT) are really much better than top.

Cheers.

What Linux and Solaris can learn from each other

Posted Mar 1, 2014 0:43 UTC (Sat) by marcH (subscriber, #57642) [Link]

> Gregg began his comparison by showing a one-line Perl script that looped 100,000,000 times, setting a string variable on each iteration. One might think that such a simple program would not perform significantly differently on two different Unix-like OSes, he said, but in fact one OS was 14% faster than the other [...]. But such a one-liner is actually pretty complex to analyze for performance, he said.

Indeed, it's actually surprising that the performance difference is that small considering the huge number of moving parts involved in a high-level language example like this.

This example is interesting but a bit artificial. I mean, if a single line of Perl running 100,000,000 times in a loop is actually the main bottleneck of your application, then it's very likely to be very soon "re-designed" into something completely different from a performance perspective.

I do realize this example is just used as an introduction and not much more.

What Linux and Solaris can learn from each other

Posted Mar 7, 2014 4:51 UTC (Fri) by acoopersmith (subscriber, #72107) [Link]


Copyright © 2014, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds