
Kernel prepatch 4.0-rc7

Linus has released 4.0-rc7 after a delay of a couple of days for the holiday. "But it's still pretty small, and things are on track for 4.0 next weekend. There's a tiny chance that I'll decide to delay 4.0 by a week just because I'm traveling the week after, and I might want to avoid opening the merge window. We'll see how I feel about it next weekend."


Kernel prepatch 4.0-rc7

Posted Apr 7, 2015 15:05 UTC (Tue) by kloczek (guest, #6391) [Link] (63 responses)

Seems everything is now upside down.

Oracle, in its latest SRUs, released quite a heavy bag of ZFS improvements (which IMO should have shipped as a major release, or as the next GA), while Linux in the upcoming 4.0 seems to deliver only a handful of minor cleanups/bug fixes/improvements.

O tempora! O mores!

Looking at the number of significant changes in Solaris, I'm confident about Solaris's future, but Linux seems to be weakening more and more, and/or evolving into an OS that runs inside a KVM cage or acts as a KVM manager, without any significant new technology (especially in the storage area).
As the core Linux developers grow older, there is still no clear idea of how to hand current Linux work over to the next generation of developers.
We cannot count on a Torvalds dynasty either, as Linus has only daughters ;)

Sooner or later this will cause a major Linux fork, or a decline as other OSes take over more and more market share.
A really sad conclusion ..

Kernel prepatch 4.0-rc7

Posted Apr 7, 2015 15:42 UTC (Tue) by dskoll (subscriber, #1630) [Link] (12 responses)

We cannot count on a Torvalds dynasty either, as Linus has only daughters ;)

Umm, WTF? I know you put in a smiley, but that's pretty misogynistic.

Kernel prepatch 4.0-rc7

Posted Apr 7, 2015 15:48 UTC (Tue) by fandom (subscriber, #4028) [Link] (9 responses)

You mean that the rest of the comment made sense?

Kernel prepatch 4.0-rc7

Posted Apr 7, 2015 18:29 UTC (Tue) by alvieboy (guest, #51617) [Link] (8 responses)

I have read the comment four times, and I still cannot make any sense of it.

What I extract from it is that our fellow commenter believes age plays an important role in our reasoning and our ability to improve things, and that as age advances we become less able to do this (and other) sorts of technical work.

I could not disagree more.

I can give you a quick example from another area: Manoel de Oliveira, a Portuguese filmmaker, passed away on 2 April 2015. He was 106 years old and was the oldest active filmmaker in the world. He was still making movies, and had plans for quite a few more. He was much appreciated in the field.
According to you, he should have stopped at age 40 or 50, which is actually less than half of his life.
That is stupid.

Kernel prepatch 4.0-rc7

Posted Apr 7, 2015 23:02 UTC (Tue) by kloczek (guest, #6391) [Link] (7 responses)

> According to you, he should have stopped at age 40 or 50, which is actually less than half of his life.
> That is stupid.

Of course this is stupid, and I did not write anything remotely close to it.
However, as Linux history began 25 years ago, the problem of the next generation of Linux *core* developers will only become more obvious.

PS. The problem with analogies is that they cannot be used directly to prove something. An analogy can only be used, for example, to change the point of view on some aspect of a discussion.
And yes .. exceptions exist. However, I have not been writing about isolated Linux cases, but about something like a general problem standing in the path of future Linux development.

Kernel prepatch 4.0-rc7

Posted Apr 7, 2015 23:19 UTC (Tue) by viro (subscriber, #7872) [Link] (6 responses)

*sniff* I smell a bored sociology wank^Wmajor

If you want to contribute and _have_ something to contribute - go ahead and do that, to whichever project you want. If not, who the hell needs you or your opinion, pardon the bluntness? It's like "sociology of science" clowns - nice for party small talk (Feyerabend, blah, blah, post-normal, wank, wank, paradigm shift, yadda, yadda...) and for excusing one's, er, devotion to academical politics instead of research, absolutely worthless for everything else...

Kernel prepatch 4.0-rc7

Posted Apr 8, 2015 2:43 UTC (Wed) by kloczek (guest, #6391) [Link] (5 responses)

> If you want to contribute and _have_ something to contribute - go ahead and do that, to whichever project you want.

Sorry, but I don't have a few spare years (or more) of my life to spend on developing something as big as ZFS or DTrace.

Most Linux developers don't have enough time because, as full-time developers, they are working on their own JFDIs.
I call it the "running with empty barrels" syndrome. Most Linux developers are so busy that they have no time to start working on long-term projects which may (or may not) produce some new technologies.
This is the main reason why Linux is a system that only tries to catch up with the changes around it, and why almost none of the new technologies of the last decade or more have been developed on top of Linux. The only exception seems to be KVM.

Kernel prepatch 4.0-rc7

Posted Apr 8, 2015 18:33 UTC (Wed) by flussence (guest, #85566) [Link] (4 responses)

>This is the main reason why Linux is a system that only tries to catch up with the changes around it, and why almost none of the new technologies of the last decade or more have been developed on top of Linux.

Linux isn't trying to chase dead businesses and no amount of astroturfing will resurrect them.

Kernel prepatch 4.0-rc7

Posted Apr 8, 2015 23:09 UTC (Wed) by kloczek (guest, #6391) [Link] (3 responses)

> Linux isn't trying to chase dead businesses and no amount of astroturfing will resurrect them.

Really?
So why have so many man-hours been spent on developing KTrace, LLT, LTTng, SystemTap, and now perf?
Why did people start working on btrfs, trying to mimic some ZFS capabilities?
What about containerization, which tries to mimic Solaris zones?

I'm betting that in the next year or two some Linux developers will probably discover something similar to kernel zones.

The whole network-layer redesign done in Solaris 10 is being delivered on Linux in pieces, making everything more or less similar to a pile of yeast.

Kernel prepatch 4.0-rc7

Posted Apr 8, 2015 23:28 UTC (Wed) by Cyberax (✭ supporter ✭, #52523) [Link] (2 responses)

> So why have so many man-hours been spent on developing KTrace, LLT, LTTng, SystemTap, and now perf?
Because of a recognized need to get some kind of tracing/sampling solution?

> Why did people start working on btrfs, trying to mimic some ZFS capabilities?
Because people realized that Linux needs a good modern FS? You might also note that BTRFS design is significantly better.

> What about containerization, which tries to mimic Solaris zones?
What about Solaris zones that mimic IBM zOS partitions?

Virtualization has a loooooong history, it's far older than SunOS itself.

Kernel prepatch 4.0-rc7

Posted Apr 9, 2015 0:27 UTC (Thu) by kloczek (guest, #6391) [Link] (1 responses)

> Because of a recognized need to get some kind of tracing/sampling solution?

Really? How is it possible that something like DTrace is needed on Linux?

BTW: it seems that at the end of this year Oracle will offer Oracle Linux with DTrace fully integrated into the @base KickStart profile.
It will be interesting to see what Red Hat says about that :)

> You might also note that BTRFS design is significantly better.

Really better?
As long as btrfs is not redesigned to use free lists instead of allocation structures, it will not be possible for it to come anywhere close to ZFS.
Try reading https://rudd-o.com/linux-and-free-software/ways-in-which-...

> What about Solaris zones that mimic IBM zOS partitions?

IBM zOS partitions (LPARs) are more similar to LDOMs on T- and M-series SPARC hardware, and have nothing to do with zones.
LPARs need hardware support. Solaris kernel zones can work completely without HW support (and in the last few months, kernel-zone virtualization using storage controllers' hardware virtualization has been introduced).
BTW: when will Linux be able to do such things?

BTW: with LDOMs you can partition the CPU cache (something which is not possible on Intel HW .. so far)

> Virtualization has a loooooong history, it's far older than SunOS itself.

Of course .. but the LDOM and LPAR bits are not about virtualization but more about partitioning.
Zones likewise are not about virtualization but more about separation and limiting privileges .. and about partitioning modern, quite powerful HW.

Kernel prepatch 4.0-rc7

Posted Apr 9, 2015 1:35 UTC (Thu) by kloczek (guest, #6391) [Link]

Ehhh, I must apologize .. it is too late and I'm making too many typos per sentence :->
Good night :-)

Kernel prepatch 4.0-rc7

Posted Apr 7, 2015 16:53 UTC (Tue) by kloczek (guest, #6391) [Link] (1 responses)

Maybe you didn't notice that it was a kind of soft joke, as I added ";)"

The problem is not the almost purely male Linux core-development community, but the lack of will to solve the growing problem of adding new blood to the Linux Top SAS (Self-Adoration Society :))

Kernel prepatch 4.0-rc7

Posted Apr 7, 2015 18:16 UTC (Tue) by scientes (guest, #83068) [Link]

His daughters will not be kernel programmers. He jokes "I will have to adopt." Source:

https://www.youtube.com/watch?v=5PmHRSeA2c8

Kernel prepatch 4.0-rc7

Posted Apr 7, 2015 21:14 UTC (Tue) by nix (subscriber, #2304) [Link] (43 responses)

Yeah. Nothing new's being done in the storage area for Linux, as the existence of *two entire conferences* devoted to the new stuff being done in the storage area for Linux *in the last month alone* makes quite obvious.

Have you been asleep for, I don't know, the last ten years or something? Or perhaps 500 years, since you seem to have dropped out of a Europe still shackled to the Salic law...

Kernel prepatch 4.0-rc7

Posted Apr 7, 2015 22:34 UTC (Tue) by kloczek (guest, #6391) [Link] (42 responses)

OK. Let's assume for a few minutes that I just woke up.

Could you please write a short list of what these two conferences announced as new Linux features?

Before you send your list, please compare the weight of your bag with the points at https://blogs.oracle.com/roch/entry/rearc

Kernel prepatch 4.0-rc7

Posted Apr 8, 2015 7:00 UTC (Wed) by niner (subscriber, #26151) [Link] (1 responses)

So, they fixed some architectural problems with ZFS's caching infrastructure. But where's the innovation? What are the new features?

Kernel prepatch 4.0-rc7

Posted Apr 8, 2015 9:44 UTC (Wed) by kloczek (guest, #6391) [Link]

With the latest SRUs, the ARC is able to keep records in compressed form, which effectively means that the memory dedicated to the ARC is multiplied by the average compression ratio of your ZFS pool data. What does that mean? You can stay on cheaper hardware, with growing needs, much longer than before. In the end it generates lower costs.
The same is possible with zfs send/receive. For example, when making backups you don't need to decompress everything while making an off-site copy of your data. That saves CPU power, memory, and network bandwidth.
Most of these latest improvements open new areas for using ZFS.
And just one clarification: all of these areas are completely out of reach for any Linux technology.
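A rough back-of-the-envelope sketch of that effective-cache arithmetic (a simplification: it assumes one uniform average compression ratio across the whole pool, and ignores ARC metadata overhead):

```python
def effective_arc_bytes(physical_arc_bytes, avg_compress_ratio):
    """Approximate logical data that fits in the ARC when cached records
    stay compressed: the same physical memory holds roughly
    `avg_compress_ratio` times more logical data than before."""
    return physical_arc_bytes * avg_compress_ratio

GIB = 1024 ** 3
# Example: 64 GiB of ARC with an average 2.5x compression ratio
# caches roughly 160 GiB of logical data.
print(effective_arc_bytes(64 * GIB, 2.5) / GIB)  # prints 160.0
```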

Kernel prepatch 4.0-rc7

Posted Apr 8, 2015 7:05 UTC (Wed) by jospoortvliet (guest, #33164) [Link] (39 responses)

You really believe the handful of people working on Solaris have a chance in hell to keep up with Linux development? I won't say they don't do something interesting here or there - all the while falling behind everywhere else. That does not have to be a big deal - Solaris has a narrow focus, Linux is everywhere. But that Solaris is good at one thing does not mean it is remotely relevant to anything else...

Kernel prepatch 4.0-rc7

Posted Apr 8, 2015 9:15 UTC (Wed) by kloczek (guest, #6391) [Link] (38 responses)

> You really believe the handful of people working on Solaris have a chance in hell to keep up with Linux development?

It is not a matter of belief. NFS, DTrace, ZFS .. all of these you can use on Linux, and all of these technologies were developed not on Linux but on Solaris, and you can use them now.

> That does not have to be a big deal - Solaris has a narrow focus, Linux is everywhere

What do you mean, narrow focus? Please try to compare the revenue generated by Solaris support with that generated by Linux.

> Linux is everywhere.

That is really far from being even partially true.
It is probably everywhere you happen to be looking.
Linux *dominates* on systems with 1-2 CPU sockets, but is still too fat to run on all the smaller kits. Everything above and below this area is not Linux's dominion.

> But that Solaris is good at one thing does not mean it is remotely relevant to anything else

Solaris, just like Linux, virtually does not exist on the desktop. Please show me one company which relies on income generated by supporting Linux desktops.
Linux lost its chance to be a desktop system when ALSA was chosen.

Kernel prepatch 4.0-rc7

Posted Apr 8, 2015 9:41 UTC (Wed) by tao (subscriber, #17563) [Link] (32 responses)

"Everything above and below this area is not Linux's dominion."

Out of the top 500 supercomputers in the world, 485 (!) of them run Linux. One runs a mix of Linux and something else. One runs Windows. The rest run AIX.

None of them run Solaris...

Kernel prepatch 4.0-rc7

Posted Apr 8, 2015 10:07 UTC (Wed) by kloczek (guest, #6391) [Link] (31 responses)

> Out of the top 500 supercomputers in the world, 485 (!) of them run Linux

These are not supercomputers but HPC clusters, usually built from horizontally scaled 2-CPU-socket kits. Effectively these supercomputers are sometimes running thousands of Linux/Windows/AIX systems.
Even the few kits with a few hundred CPUs in a single machine cannot run a single Linux kernel across all those CPUs. Those boxes work under a hypervisor or partitioning software which allows you to run any system inside a partition. Effectively, such systems work like sets of 1-2 CPU machines with faster interconnects.

A few months ago a bug was fixed in Solaris that affected systems with >=32TB of RAM (terabytes). Please show me an example of a single Linux kernel running on a kit with that amount of RAM.
Nevertheless, the number of workloads requiring such big memory in a single-image system is very small at the moment (we are probably talking about only a few hundred such systems across the whole world). Linux is not ready for that scale, and so far the only system able to run on such big-ass kits is Solaris.
At that scale it is not only a problem for Linux but for the hardware as well. Intel CPUs cannot be used here, not because they have address-space limitations, but because the bandwidth to the memory subsystem is too low.
Try comparing the maximum memory bandwidth of the biggest Intel CPUs and the SPARC M7 CPU.

Kernel prepatch 4.0-rc7

Posted Apr 8, 2015 13:58 UTC (Wed) by zdzichu (subscriber, #17118) [Link] (10 responses)

Kernel prepatch 4.0-rc7

Posted Apr 8, 2015 22:11 UTC (Wed) by kloczek (guest, #6391) [Link] (9 responses)

This hardware is highly optimized for a very specific HPC workload, where applications run on a NUMA-like architecture and the main method of exchanging data between nodes is not sharing memory (for example between threads running on different physical CPUs), but exchanging data over the UV's MPI-oriented, optimized interconnect between internal nodes.
Each CPU/memory node has 8 DIMM slots per CPU, so your application is limited by those 8 DIMMs.

To utilize this hardware you must have an MPI-oriented application, which additionally needs to be recompiled and linked against the MPI libraries delivered by SGI.

So if anyone asks: can I have one of those kits to run my in-memory MySQL DB? The answer will be: no.

Kernel prepatch 4.0-rc7

Posted Apr 8, 2015 22:36 UTC (Wed) by Cyberax (✭ supporter ✭, #52523) [Link] (8 responses)

> So if anyone asks: can I have one of those kits to run my in-memory MySQL DB? The answer will be: no.
You most certainly can. SGI's MPI library is a nice-to-have thingie that simply utilizes the cache-coherence protocols more efficiently than most regular software. But it's by no means essential.

And ANY system of that scale is NUMA-ish, because you simply can't have one central controller overseeing all the action (simple light-speed delay in long conductors becomes an issue at this scale!).

Kernel prepatch 4.0-rc7

Posted Apr 9, 2015 0:32 UTC (Thu) by kloczek (guest, #6391) [Link] (7 responses)

> You most certainly can. SGI's MPI library is a nice-to-have thingie that simply utilizes the cache-coherence protocols more efficiently than most regular software. But it's by no means essential.

Theoretically? Of course :)
In practice .. so where can I download MySQL source code redesigned to use MPI for accessing the shared InnoDB pool?

Kernel prepatch 4.0-rc7

Posted Apr 9, 2015 1:51 UTC (Thu) by Cyberax (✭ supporter ✭, #52523) [Link] (6 responses)

SGI machines are single-image Linux machines. So simply go and download MySQL. All the RAM is completely cache coherent and accessible to applications.

Moreover, you can run multiple apps in parallel without any problems, NUMALink will simply migrate the required RAM pages onto the nodes with the CPUs used.

You need NUMA awareness if you want really heavy communication between multiple concurrent threads/processes.

Kernel prepatch 4.0-rc7

Posted Apr 9, 2015 3:53 UTC (Thu) by kloczek (guest, #6391) [Link] (5 responses)

> Moreover, you can run multiple apps in parallel without any problems, NUMALink will simply migrate the required RAM pages onto the nodes with the CPUs used.

OKi doki .. let's say I have a 1TB in-memory database with only one table. Such a memory region cannot fit within a single UV node's memory (8 DIMMs per node).
So on a full table scan, either my DB process will be jumping between NUMA nodes or the interconnects will be constantly in use, and that will be the biggest bottleneck. Each interconnect transaction probably takes a few thousand CPU cycles (much more than an L2 cache transaction).
Remember that HPC interconnects are usually optimized for the lowest possible latency (not for maximum bandwidth).

It would be interesting to see how big the ratio is between time spent scanning memory and time spent waiting for the right pages to be delivered over the interconnect to the right node.

(googling) .. OK, I found: http://www.hpcresearch.nl/euroben/Overview/web12/altix.php

"The distinguishing factor of the UV systems is their distributed shared memory that can be up to 8.2 TB. Every blade can carry up to 128 GB of memory that is shared in a ccNUMA fashion through hubs and the 6th generation of SGI's proprietary NumaLink6. A very high-speed interconnect with a point-to-point bandwidth of 6.7 GB/s per direction; doubled with respect to the former NumaLink5."

WOW .. "6.7 GB/s per direction"!!!
So .. the UV interconnect really is not about bandwidth :)

IIRC a single socket of the latest v3 Intel CPUs can do a memory scan at something like 150 or 250 GB/s.

The current SPARC M7 can do this with a single CPU socket, which can handle up to 2TB of RAM per socket. Each socket has 16 cores, and each core can run up to 8 hardware threads. Each M7 can scan its own node's memory at up to 1TB/s per CPU socket.

$ echo "1024/6.7" | bc
152

So such an operation on a UV 2000 can (theoretically) be up to 150 times slower than on a single-socket M7 kit.

Let's temporarily forget that the M7 CPU can scan a database compressed with columnar compression using a special CPU accelerator subsystem for such operations, which means my 1TB in-memory table can take less than 1TB of RAM (usually a compression ratio like 6-7 is typical, but sometimes it can even be 20 or 30).
Less physical memory used means that warming up the database can be done faster by the factor of the compression ratio (and let's forget that each UV node has only one x16 PCIe slot, which will probably be a real pain in the a*s when warming up the DB).

During such scans on the M7, the CPU caches are not used, which means such in-memory operations will not evict my DB application code and other data from the CPU caches .. which means -> fewer other hiccups.

It still looks like investing in M7 hardware may make more sense, by a factor of 10 if not close to 100 per buck or quid, if someone is running TB-scale in-memory DBs. All of the above is not relevant to the Linux-vs-Solaris dilemma :)

MPI could easily avoid saturating the interconnects by spreading such a memory scan across multiple NUMA nodes. However, there is no MySQL or Oracle DB using the MPI API to access the SGA or InnoDB memory pool :->

I know I've made a few assumptions above, but these calculations are probably not far from the real numbers :)
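The back-of-the-envelope arithmetic in this comment can be laid out explicitly. A small sketch, taking the 6.7 GB/s NumaLink6 figure from the quoted page and the ~1 TB/s per-socket local scan rate as the commenter's own claim (neither figure is verified here):

```python
def scan_seconds(table_gb, bandwidth_gb_per_s):
    """Time to stream a table of `table_gb` GB at a sustained bandwidth."""
    return table_gb / bandwidth_gb_per_s

TABLE_GB = 1024.0     # the 1 TB in-memory table from the comment
NUMALINK6 = 6.7       # GB/s per direction, quoted point-to-point figure
M7_LOCAL = 1024.0     # ~1 TB/s local scan rate per socket, commenter's claim

remote = scan_seconds(TABLE_GB, NUMALINK6)  # scan limited by the interconnect
local = scan_seconds(TABLE_GB, M7_LOCAL)    # scan of local node memory
print(int(remote / local))  # prints 152, the "up to ~150x slower" figure
```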

Kernel prepatch 4.0-rc7

Posted Apr 9, 2015 9:41 UTC (Thu) by tao (subscriber, #17563) [Link] (4 responses)

It's fascinating how you're conflating operating system with hardware.

What's relevant is:

Out of the top 500 supercomputers, *none* of them use Solaris. Zip. Zilch. Nada. Hell, even Windows is more popular than Solaris on such hardware.

According to you Solaris performs much better than Linux on identical hardware. Hence it'd make sense for all those supercomputers to run Solaris rather than Linux. Why don't they?

Surely by now at least a few of them should've discovered that Solaris is oh so much better than Linux, right?

Kernel prepatch 4.0-rc7

Posted Apr 9, 2015 11:16 UTC (Thu) by kloczek (guest, #6391) [Link] (3 responses)

> According to you Solaris performs much better than Linux on identical hardware. Hence it'd make sense for all those supercomputers to run Solaris rather than Linux. Why don't they?

Because my workloads are usually DB-generated IO-hog workloads.

Again: using the name "supercomputer" here is very misleading.
Example: the Top500 tests do not test interconnects. A bunch of 1k boxes connected over RS-232 cables would generate almost the same index as the same boxes connected over IB.

HPC installations are usually highly optimized for one exact, CPU-intensive, well-localized workload. You can do such optimization well enough if you can customize parts of the base OS or userspace environment. However, there is a good chance that such an optimization cannot be repeated in another customer's HPC environment (because other customers may have different HPC needs). Most HPC environments are supported at the OS/application layer by an on-site engineering team, and all they need is decent hardware support.
The strongest aspects of Solaris, like HA, security, and high data integrity, are usually not at the top of HPC priorities, as some calculations are simply run many times (software-specific factors).
Security? The whole HPC environment is usually secured at the "physical layer" (receptions, guards, bars, doors, network segments physically separated from the outside world, etc).
HA? During the longest calculations, checkpointing is very often used. If in the middle of processing there is a total power failure, or a single node burns out, the whole computation can quickly be continued from the last checkpoint state. A few hours or even days of downtime is not a big deal, as almost all of the hourly cost of running the whole environment is power and cooling.

HPC is all about doing massive computations as cheaply as possible, using any possible trick, which quite often involves substantial customization. Software support and/or the help of a professional OS support team are not at the top of the external-help priority list. Try observing the Linux kernel lists to see how often the HPC guys need help (I don't remember even one case in more than the last decade).

Part of a single HPC ecosystem is usually a set of systems with IO-hog workloads (backups, data import/export/(pre/post)processing, etc). It is quite common for such parts to run under Solaris (ZFS is a very welcome friend here).
However, such parts are not counted in the overall Top500 index calculations.

Kernel prepatch 4.0-rc7

Posted Apr 11, 2015 18:05 UTC (Sat) by Wol (subscriber, #4433) [Link] (1 responses)

> > According to you Solaris performs much better than Linux on identical hardware. Hence it'd make sense for all those supercomputers to run Solaris rather than Linux. Why don't they?

> Because my workloads are usually DB generated IO hog workloads.

So why not ditch relational and move from an (inappropriate) theory-based database to an engineering-based database that actually scales? :-)

Cheers,
Wol

Kernel prepatch 4.0-rc7

Posted Apr 14, 2015 14:03 UTC (Tue) by mathstuf (subscriber, #69389) [Link]

As I've told you before, point me to a well-licensed implementation and an application which uses it as a backend and your Pick databases suddenly get interesting (as an end-user). Currently, I have zero interest in them since I see them in use ~nowhere (though I admit all I do with databases currently is set them up for other apps to use). Has the no-Free-implementation situation changed at all?

Kernel prepatch 4.0-rc7

Posted Apr 14, 2015 11:30 UTC (Tue) by nix (subscriber, #2304) [Link]

You're talking about mainframes (all about interconnects and I/O rates). Supercomputers are something completely different.

Kernel prepatch 4.0-rc7

Posted Apr 8, 2015 17:40 UTC (Wed) by Wol (subscriber, #4433) [Link] (19 responses)

What was the first OS to run native 64-bit on SPARC hardware? That's right, LINUX!

Cheers,
Wol

Solaris and Linux

Posted Apr 8, 2015 19:17 UTC (Wed) by vonbrand (subscriber, #4458) [Link] (18 responses)

Around 2000 we migrated our ageing Suns from Solaris to Linux. The difference in performance was not funny (it gave the machines a couple of years' life extension).

I remember some discussions here about Solaris, illumos, and related software. None for quite some time now (except for trolls around ZFS). In the same vein, I've seen nothing about Oracle Linux here either. Looks to me like all of that is dead as a doornail.

Solaris and Linux

Posted Apr 8, 2015 19:22 UTC (Wed) by dlang (guest, #313) [Link] (2 responses)

Back when RedHat supported SPARC (long before they abandoned the desktop and created RHEL), I was running it on a Sun box, with the RedHat sticker on it. My Sun sales rep took me to task about it, and I gave him a simple demo of how much faster Linux was to boot and get running on that box (running a large MRTG instance); his jaw hit the floor.

Solaris "scaled" better than Linux, in that a 64-CPU box was closer to 64x better than a 1-CPU box, but the base performance of Linux was so much better that, until you got to a massive box, Linux would still outperform Solaris. And over time Linux has gained scalability, far beyond anything Solaris ever ran on.

Solaris and Linux

Posted Apr 9, 2015 13:15 UTC (Thu) by pr1268 (guest, #24648) [Link] (1 responses)

I gave him a simple demo of how much faster Linux was to boot and get running on that box (running a large MRTG instance); his jaw hit the floor.

I find that surprising, seeing how Sun designed and built the hardware, OS kernel, a matching suite of shell utilities, and even the compiler for all this. Perfect vertical integration! (Except for third-party userspace apps.)

You'd think that Sun would have optimized all this to Timbuktu and back given their knowledge of the architecture.

Or, I could propose a conspiracy theory that Sun intentionally crippled performance in an attempt to get their customers to upgrade frequently. Or perhaps everything was rushed out the door by the marketing department. ;-)

Solaris and Linux

Posted Apr 9, 2015 18:03 UTC (Thu) by dlang (guest, #313) [Link]

no conspiracy, just that they had a huge emphasis on 'linear scalability' to 64 cpus, so they were looking at the curve and were happy with how it looked.

At the time, Linux wouldn't perform well with 8 CPUs and would have been horrible on 64 CPUs

but on 1-2 cpus, Linux absolutely trounced Solaris

As Linux has matured, there has been an ongoing emphasis on keeping performance good on small systems while improving it on large systems. When something new is introduced, Linus is very vocal in demanding that it not hurt performance when it's not needed/used.

Yes, the smallest systems have been getting larger. I don't like that, but the range between the smallest systems that work well and the largest keeps growing dramatically.

Solaris and Linux

Posted Apr 8, 2015 21:36 UTC (Wed) by kloczek (guest, #6391) [Link] (14 responses)

> Around 2000 we migrated our ageing Suns from Solaris to Linux. The difference in performance was not funny (gave the machines a couple of years life extension).

> I remember some discussions here about Solaris, IllumnOS, and related software. None for quite some time now (except for trolls around ZFS). In the same vein, I've seen nothing about Oracle Linux either here. Looks to me that all that is dead as a doornail.

ZFS was introduced in November 2005. Whatever you were doing 5 years earlier, you could not have been evaluating ZFS on Sun hardware.
In 2005, all the Sun UltraSPARC hardware on which Linux worked was at EOS or very close to EOS.

If you are talking about comparing Linux vs Solaris on very old hardware that was not supported by Solaris 10, it is really pure bollocks.
Just try taking 5-year-old (or older) disks, assembling some x86 hardware around them, and running a few benchmarks.
What would be the sense of doing that? I have completely no idea.

Solaris and Linux

Posted Apr 8, 2015 22:09 UTC (Wed) by vonbrand (subscriber, #4458) [Link] (13 responses)

Just that I won't believe that Solaris suddenly got performant after that little affair.

Solaris and Linux

Posted Apr 8, 2015 22:33 UTC (Wed) by kloczek (guest, #6391) [Link] (12 responses)

> Just that I won't believe that Solaris suddenly got performant after that little affair

Since the Solaris 10 Express development cycle (pre-GA Solaris 10), every microbenchmark showing that something is slower on Solaris than on Linux has been treated by Solaris support as a *critical bug*, and nothing has changed up to now.

The first Solaris 10 GA was released in January 2005.
It seems you've missed almost 10 years of new Solaris features.

In my last 10 years of using Solaris I have seen many examples where Solaris running on the same commodity x86 hardware with paid support (even on non-Oracle/Sun HW) was cheaper than free Linux, simply because it was possible to stay longer on the same hardware, whereas with Linux it was necessary to buy more powerful hardware.

If you think that, for example, ZFS is worthless, I can only tell you that a few months ago I migrated some MySQL DBs on the same hardware, and just by switching from ext4 to ZFS it was possible to observe a drop in physical IOs/s by a factor of up to 3 (not 3%, but up to three times fewer).
Such an example is not at all unique.

Try having a look at:

https://www.youtube.com/watch?v=HRnGZYEBpFg&list=PLH8...

You can start from:

https://www.youtube.com/watch?v=TrfD3pC0VSs&index=6&...

FYI: at the moment OpenSolaris/illumos is behind the latest Oracle Solaris in many areas.

Solaris and Linux

Posted Apr 8, 2015 22:50 UTC (Wed) by Cyberax (✭ supporter ✭, #52523) [Link] (11 responses)

> Since the Solaris 10 Express development cycle (pre-GA Solaris 10), every microbenchmark showing that something is slower on Solaris than on Linux has been treated by Solaris support as a *critical bug*, and nothing has changed up to now.
Yeah, sure. I worked with Slowlaris and ZFS back in 2008-2009 and it definitely was waaaay slower than Linux in many aspects. In particular, we had very concurrent workloads and Linux scheduling was vastly superior.

And of course, let's not forget the historical: http://cryptnet.net/mirrors/texts/kissedagirl.html which sums it up perfectly.

Solaris and Linux

Posted Apr 9, 2015 0:04 UTC (Thu) by kloczek (guest, #6391) [Link] (10 responses)

> Yeah, sure. I worked with Slowlaris and ZFS back in 2008-2009 and it definitely was waaaay slower than Linux in many aspects. In particular, we had very concurrent workloads and Linux scheduling was vastly superior.

Sorry, do you really want to say that IO scheduling can beat using free lists instead of allocation structures, or using COW semantics?
So maybe you can explain why ext4 still has no COW?
Do you really understand the impact of using COW semantics?
A few quotes:

https://storagegaga.wordpress.com/tag/copy-on-write/

"btrfs is going to be the new generation of file systems for Linux and even Ted T’so, the CTO of Linux Foundation and principal developer admitted that he believed btrfs is the better direction because “it offers improvements in scalability, reliability, and ease of management”."

In case you don't know: btrfs uses COW semantics by default.

Just below that is the next paragraph:

"For those who has studied computer science, B-Tree is a data structure that is used in databases and file systems. B-Tree is an excellent data structure to store billions and billions of objects/data and is able to provide fast data retrieval in logarithmic time. And the B-Tree implementation is already present in some of the file systems such as JFS, XFS and ReiserFS. However, these file systems are not shadow-paging filesystems (popularly known as copy-on-write file systems).

You see, B-Tree, in its native form of implementation, is very incompatible with COW file systems. In fact, the implementation was thought of impossible, until someone by the name of Ohad Rodeh came along. He presented a paper in Usenix FAST ’07 which described the challenges of combining the B-Tree concept with shadow paging file systems. The solution, as he described it, was to introduce insert/remove key into the tree structure, and removing the dependency of intra-leaf linking"

And a second one, from a few years earlier:

https://lkml.org/lkml/2008/9/27/217

"What do you mean by "copy on write", precisely? Do you mean at the
file level, directory level, or the filesystem level?
***We don't have any plans to implement "copy on write" in ext4***, although
you can create a copy-on-write snapshot at the block device level
using LVM/devicemapper. For many things (database backups, etc.) this
is quite suitable."

How was it possible that Ted Ts'o changed his mind about COW in the last few years?
Maybe you don't care about using COW, but a core Linux fs developer does.

In real scenarios LVM snapshotting is not enough, because every new LVM snapshot slows down interaction with the snapshotted block device. With ZFS you can create as many snapshots as you want and performance stays the same. That effect is a combination of using free lists and COW.
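The difference can be sketched with a toy cost model (my own illustration; the per-chunk IO counts are simplified, not measured, and this is not actual LVM or ZFS code):

```python
# Toy model of why origin writes get slower as LVM snapshots pile up,
# while a free-list + COW design stays flat.
#
# Classic LVM snapshots are copy-before-write: the first write to an
# origin chunk must first copy the old data into every snapshot's COW
# area.  ZFS never overwrites in place, so a write costs one allocation
# from the free list and one write, however many snapshots reference
# the old block.

def lvm_write_cost(num_snapshots: int) -> int:
    """Physical IOs for one first-touch write to an LVM origin chunk."""
    if num_snapshots == 0:
        return 1                      # plain in-place write
    read_old = 1                      # read the chunk being overwritten
    copy_out = num_snapshots          # copy it into each snapshot store
    write_new = 1                     # finally write the new data
    return read_old + copy_out + write_new

def zfs_write_cost(num_snapshots: int) -> int:
    """COW + free lists: new data goes to fresh space; the old block is
    simply kept around for the snapshots.  One write, always."""
    return 1

for n in (0, 1, 10):
    print(n, lvm_write_cost(n), zfs_write_cost(n))
```

With ten snapshots the modelled LVM origin write costs twelve physical IOs, while the COW write still costs one; the real-world numbers differ, but the scaling behaviour is the point.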

BTW: try to read https://rudd-o.com/linux-and-free-software/ways-in-which-...

If you want to continue this discussion, please tell us more about what you were really testing. I'm really interested in what exactly you did and what kind of results you got :)

> And of course, let's not forget the historical: http://cryptnet.net/mirrors/texts/kissedagirl.html which sums it up perfectly

Of course you didn't notice that the above link points to a text containing the line:

Date: 1996/10/29

Do you want to say that you were testing ZFS or Solaris 10 in 1996?
Solaris 10 GA was in Jan 2005, ~8 years after that email was sent.

Solaris and Linux

Posted Apr 9, 2015 0:18 UTC (Thu) by Cyberax (✭ supporter ✭, #52523) [Link] (8 responses)

> Sorry do you really want to say that IO scheduling can beat using free list instead allocation structures or using COW semantics?
Errr... I haven't really parsed the meaning of this.

We had a project doing lots and lots of IO interspersed with heavy multithreading. Lots of this IO was O_DIRECT, so it didn't care a fig about COW.

And most certainly, ZFS is not the _fastest_ FS. Ext4 or XFS are usually faster in many benchmarks than either ZFS or btrfs simply because they need to do much less work for a typical IO request, doubly so for many metadata-heavy workloads.

> Do want to say that you been testing ZFS or Solaris 10 in 1996?
There's a reason why Solaris disappeared from Top500. Think about it.

Solaris and Linux

Posted Apr 9, 2015 1:09 UTC (Thu) by kloczek (guest, #6391) [Link] (4 responses)

> We had a project doing lots and lots of IO interspersed with heavy multithreading. Lot's of this IO was O_DIRECT, so it didn't care a fig about COW.

It seems you don't understand COW. It has to be plugged in below the VFS layer, at the block allocation stage.

Using COW does mean that random reads at the VFS layer cause random reads at the block layer as well. But that is not the case for write IOs: COW can transform random VFS write operations into sequential write IOs, or into a smaller number of such IOs, and with clever IO scheduling you can reduce the number of physical write IOs.
THIS is the main advantage, even on SSDs: doing fewer, bigger write IOs instead of batches of small IOs gives better performance (and reduces the number of interrupts as well).
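That transformation can be sketched as follows (my own illustration, not real filesystem code): an in-place filesystem must touch every dirty block where it lives, while a COW allocator can satisfy all of them from one contiguous run of free space.

```python
# Illustration of COW write coalescing: five random logical writes
# become either five scattered physical writes (in-place filesystem)
# or one sequential extent (COW filesystem).

def inplace_physical_writes(dirty_blocks):
    """In-place FS: each dirty logical block is overwritten where it
    lives, so scattered blocks mean scattered seeks and writes."""
    return sorted(set(dirty_blocks))

def cow_physical_writes(dirty_blocks, next_free):
    """COW FS: every new block version goes to freshly allocated space
    taken from the free list, so all of them can be written out as a
    single contiguous (start, length) extent."""
    length = len(set(dirty_blocks))
    return [(next_free, length)]

dirty = [7, 1900, 42, 51000, 3]                       # random logical blocks
print(len(inplace_physical_writes(dirty)))            # 5 scattered IOs
print(cow_physical_writes(dirty, next_free=80000))    # one extent: [(80000, 5)]
```

The price, as noted above, is that later random reads of those files may themselves become scattered, since the data no longer lives where it logically "belongs".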

> We had a project doing lots and lots of IO interspersed with heavy multithreading. Lot's of this IO was O_DIRECT, so it didn't care a fig about COW.

At its lower layers, ZFS does not care whether IOs come from multiple threads; from the application's point of view it is multithread-agnostic. At the same time ZFS, as a kernel-space application, is internally multithreaded and can much better balance or spread even a single stream of IOs across pooled storage using threads.

O_DIRECT was designed for in-place filesystems, to let IO bypass the filesystem layer and its caching. Blindly bypassing ZFS caching is probably the most stupid thing you can do if you don't understand ZFS or don't understand what exactly the application is doing.

However, if you really understand your application, really know what you are doing, and want to bypass ZFS caching, you can do it without magic: change the per-volume setting to primarycache=none or primarycache=metadata. Everything OOTB :)
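For reference, here is what O_DIRECT demands of an application on an in-place filesystem (a minimal Python sketch; it assumes Linux, and falls back to the buffered result where the filesystem, e.g. tmpfs, rejects O_DIRECT):

```python
import mmap
import os
import tempfile

BLOCK = 4096                          # assume a 4 KiB IO granularity
path = os.path.join(tempfile.mkdtemp(), "direct.dat")

# O_DIRECT bypasses the page cache, but in exchange the buffer address,
# the transfer length and the file offset must all be suitably aligned.
flags = os.O_RDWR | os.O_CREAT | getattr(os, "O_DIRECT", 0)

result = None
fd = os.open(path, flags, 0o600)
try:
    buf = mmap.mmap(-1, BLOCK)        # anonymous mmap is page-aligned
    buf[:5] = b"hello"
    os.write(fd, buf)                 # aligned buffer, aligned length
    os.lseek(fd, 0, os.SEEK_SET)
    back = mmap.mmap(-1, BLOCK)       # aligned destination for the read
    os.readv(fd, [back])
    result = back[:5]
except OSError:
    # Some filesystems (tmpfs among them) refuse O_DIRECT with EINVAL;
    # a real application would retry with buffered IO here.
    result = b"hello"
finally:
    os.close(fd)

print(result)
```

The alignment bookkeeping is exactly the kind of burden ZFS avoids by handling caching itself and exposing the primarycache knob instead.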

Solaris and Linux

Posted Apr 9, 2015 1:30 UTC (Thu) by kloczek (guest, #6391) [Link] (2 responses)

BTW, with ZFS it is possible to ignore O_DIRECT.
For example, during the initial import of a MySQL database from a text dump, you can switch the volume to sync=disabled, which can make such an import waaaay faster :)

The feature above was implemented by a friend of mine when we were working together at the same company.
http://milek.blogspot.co.uk/2010/05/zfs-synchronous-vs-as...

BTW, the claim that other FSes are as fast as ZFS is really not true, as none of them can concatenate write IOs.

Solaris and Linux

Posted Apr 9, 2015 1:53 UTC (Thu) by Cyberax (✭ supporter ✭, #52523) [Link] (1 responses)

> BTW claiming that other FSes are the same fast as ZFS as none of them can concatenate write IOs is really not true.
WTF is "io concatenation"?

If you mean "IO request coalescing" then Linux can do it since 2.4 days.

Solaris and Linux

Posted Apr 9, 2015 2:28 UTC (Thu) by kloczek (guest, #6391) [Link]

> If you mean "IO request coalescing" then Linux can do it since 2.4 days.

No, that is not what I mean.

If something at the VFS layer performs two update operations on two different files, and those files use blocks in separate locations (i.e. one at the front of the block device and the other at the end of the same bdev), COW at the block layer means that neither of those two regions is overwritten during the random updates inside those files; instead, new space is allocated and written with a single (bigger) IO.
This reduces the bandwidth needed at the physical IO layer.

Again: a consequence of using COW is a high probability of transforming a random-write workload into one with sequential-write characteristics: fewer seeks, and a high probability of concatenating VFS write IOs when issuing IOs at the block device layer.

http://faif.objectis.net/download-copy-on-write-based-fil...

Solaris and Linux

Posted Apr 9, 2015 1:59 UTC (Thu) by Cyberax (✭ supporter ✭, #52523) [Link]

> However it is not the case in case write IOs. COW can transform random VFS write operation to sequential write IOs or smaller number of such IOs. With clever IO scheduling you are able to reduce number of physical write IOs.
That's not an advantage if applications themselves can do it. Applications like, you know, Oracle databases.

> ZFS on lower layers does not care about MTs sources of IOs. From this point of view ZFS is multithread agnostics (from app point of view). In the same time ZFS as in kernel space application internally is multithreaded and able much better balance or spread even single stream of IOs across pooled storage using threads.
No, nothing is "multithread agnostic". ZFS has its own complicated metadata that needs additional locking compared to simplistic filesystems like ext4/ext3.

Sure, CoW and the other tricks in ZFS make it easy to take snapshots, do checksumming, and so on.

Except that quite often I don't care about them - right now I'm tuning a Hadoop cluster and ext4 with disabled barriers and journaling totally wins over XFS and btrfs.

Solaris and Linux

Posted Apr 9, 2015 1:44 UTC (Thu) by kloczek (guest, #6391) [Link] (2 responses)

> There's a reason why Solaris disappeared from Top500. Think about it.

Try to estimate how much income is generated by all these HPC systems for all software vendors.

Solaris and Linux

Posted Apr 9, 2015 2:42 UTC (Thu) by rodgerd (guest, #58896) [Link] (1 responses)

If you're going to talk financial success, you'll need to explain why Solaris is so successful that Sun went broke and Oracle refuse to break out their Solaris sales on their earnings reports. Which, incidentally, they've been failing to meet for the last year or so.

Solaris and Linux

Posted Apr 9, 2015 4:10 UTC (Thu) by kloczek (guest, #6391) [Link]

> If you're going to talk financial success

No, I'm not. The only thing worth discussing here is the financial aspect of supporting Solaris or another OS on HPC platforms, from both the consumers' and the hardware/software vendors' points of view.

> you'll need to explain why Solaris is so successful that Sun went broke and Oracle refuse to break out their Solaris sales on their earnings reports.

I have no idea about the real reasons for that, but I know that since the Sun days the number of developers involved in Solaris development has grown severalfold. I don't think Oracle hired more developers to work on Solaris just for fun. On that fact alone, I don't think your suspicions are correct or relevant.

Solaris and Linux

Posted Apr 10, 2015 0:22 UTC (Fri) by cesarb (subscriber, #6266) [Link]

> ***We don't have any plans to implement "copy on write" in ext4***

> How it was possible that Ted Tso changed his mind about COW in last few years????

As far as I know, there are still no plans to implement copy-on-write in ext4. So no, he didn't change his mind.

Kernel prepatch 4.0-rc7

Posted Apr 8, 2015 11:59 UTC (Wed) by mathstuf (subscriber, #69389) [Link] (3 responses)

We make software used by national labs and industry to visualize results from super computers while the simulation runs. I can't remember a single instance of anyone asking for Solaris support at all while Linux is everywhere. And I don't know the details, but it has definitely been asked how our stuff scales up to thousands of cores in a single kernel instance (particularly in memory usage because disks are slow and memory is tight; maybe Solaris/SPARC is better there, but that's not what they had).

You sound a lot like a shill paid to say these kinds of things ignoring all kinds of evidence to the contrary. Money made from support certainly isn't something I'd compare kernel suitability with personally. It seems to me like comparing fish and then proclaiming yours is the best because it has bigger gills than the others. It's not something anyone who compares fish ever really cares about.

Kernel prepatch 4.0-rc7

Posted Apr 8, 2015 17:43 UTC (Wed) by Wol (subscriber, #4433) [Link] (1 responses)

> Money made from support certainly isn't something I'd compare kernel suitability with personally.

Surely the money spent on support has an INVERSE correlation with suitability for the job? After all, if it works as intended, you DON'T NEED support - you only need to pay megabucks for support when you're trying to drive screws with a hammer :-) (or store real-world data in an RDBMS :-)

Cheers,
Wol

Kernel prepatch 4.0-rc7

Posted Apr 8, 2015 18:02 UTC (Wed) by rahulsundaram (subscriber, #21946) [Link]

Well, that depends on your notion of support. If you are calling only to troubleshoot problems, then yes, you shouldn't need much support for a quality product. But support can also mean getting guidance on how best to do something, pushing for new features that you want, understanding the roadmap, and so on; then a good product only enhances the need for support, because you are using it in many places, in potentially innovative ways.

Kernel prepatch 4.0-rc7

Posted Apr 14, 2015 11:35 UTC (Tue) by nix (subscriber, #2304) [Link]

You sound a lot like a shill paid to say these kinds of things ignoring all kinds of evidence to the contrary.
He's doing such a bad job that even I don't believe him, and I work for Oracle. Hm, maybe he's a shill paid by the competition to make Oracle look bad :P (yeah right, far more likely he's just, ahem, over-eager).

Kernel prepatch 4.0-rc7

Posted Apr 14, 2015 11:26 UTC (Tue) by nix (subscriber, #2304) [Link]

It is not matter of believe. NFS, DTrace, ZFS .. all this you can use on Linux and all these technologies have been developed not on Linux but on Solaris and you can use them now.
NFS was developed on SunOS, not Solaris. Most of the development and new-features effort on NFS is coming out of the Linux world by now (including the Linux part of Oracle :) ).

Kernel prepatch 4.0-rc7

Posted Apr 7, 2015 21:52 UTC (Tue) by pr1268 (guest, #24648) [Link] (5 responses)

While you're bashing Linux developers because of their age, and seeing how you appear to be an Oracle fanboy, then please know that Larry Ellison is 70 years old.

Kernel prepatch 4.0-rc7

Posted Apr 7, 2015 22:01 UTC (Tue) by kloczek (guest, #6391) [Link] (4 responses)

There is a significant difference between Mr. Torvalds and Mr. Ellison:
Ellison has nothing to do with the development of any software owned by Oracle :)

Kernel prepatch 4.0-rc7

Posted Apr 7, 2015 22:27 UTC (Tue) by tao (subscriber, #17563) [Link] (1 responses)

OK, shoot: how old is the lead developer of Solaris?

Kernel prepatch 4.0-rc7

Posted Apr 7, 2015 22:49 UTC (Tue) by kloczek (guest, #6391) [Link]

That is the problem: most of the Sun-era Solaris developers are no longer actively working on Solaris. In the meantime Solaris development has changed significantly; many individual Solaris kernel-space projects now have more developers working on them than the whole kernel had when Solaris was owned by Sun.

In a commercial environment, replacing developers is more a business decision than a political one. Linux kernel development relies much more on political decisions than on technical ones, as many core developers have spent most, if not all, of their careers working only on the Linux kernel.

Kernel prepatch 4.0-rc7

Posted Apr 14, 2015 11:37 UTC (Tue) by nix (subscriber, #2304) [Link] (1 responses)

Ellison has nothing to do with development of any software owned by Oracle :)
That's actually wrong. There is public record that this is one reason why he stepped down from the chairmanship -- to get back closer to the coalface again.

Kernel prepatch 4.0-rc7

Posted Apr 14, 2015 11:37 UTC (Tue) by nix (subscriber, #2304) [Link]

Argh. I meant from the CEOship of course.

I wish there was an edit function on LWN...


Copyright © 2015, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds