
KS2010: Scalability

By Jonathan Corbet
November 3, 2010
2010 Kernel Summit
Scalability is an ongoing challenge for any operating system; the recent shift toward ever-increasing processor counts has only made that challenge harder. Paul McKenney led a Kernel Summit session on where we stand with scalability and where things are likely to go. He started with a straw poll, asking the members of the audience where they thought the logical limit of scalability would be. Most said that 64 CPUs would be the limit, but responses were all over the map, ranging from 4096 at the high end down to one bow-tied joker who said the limit should be zero.

Paul talked for a little while about the various attributes of systems that affect scalability - their degree of NUMAness or the types of peripherals attached, for example - and the different kinds of workloads that might be encountered; scalability means different things in different situations. Linux currently has a long list of concurrency control mechanisms (enough to fill a slide) and various data structures which are meant to be scalable. Then he asked: what do we need now?
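Linux's toolbox here includes mechanisms such as spinlocks, reader-writer locks, sequence locks, RCU, and per-CPU data. As a purely illustrative aside (not code shown in the session), here is a minimal userspace analogue of the per-CPU counter idea in C: each thread updates only its own cache-line-padded slot, so updates never contend, and a reader obtains a total by summing all the slots.

    /*
     * Illustrative userspace analogue of a per-CPU counter; hypothetical
     * example code, not taken from the kernel. Each thread increments its
     * own padded slot (no contention, no false sharing); a reader sums
     * the slots to get a total.
     */
    #include <pthread.h>
    #include <stdio.h>

    #define NTHREADS 4
    #define NITERS   1000000

    struct padded_count {
        unsigned long count;
        char pad[64 - sizeof(unsigned long)];   /* keep slots on separate cache lines */
    };

    static struct padded_count counters[NTHREADS];

    static void *worker(void *arg)
    {
        struct padded_count *c = arg;

        /* Relaxed atomics so a reader could sum concurrently. */
        for (long i = 0; i < NITERS; i++)
            __atomic_fetch_add(&c->count, 1, __ATOMIC_RELAXED);
        return NULL;
    }

    static unsigned long read_total(void)
    {
        unsigned long sum = 0;

        for (int i = 0; i < NTHREADS; i++)
            sum += __atomic_load_n(&counters[i].count, __ATOMIC_RELAXED);
        return sum;
    }

    int main(void)
    {
        pthread_t tids[NTHREADS];

        for (int i = 0; i < NTHREADS; i++)
            pthread_create(&tids[i], NULL, worker, &counters[i]);
        for (int i = 0; i < NTHREADS; i++)
            pthread_join(tids[i], NULL);

        printf("total = %lu\n", read_total());
        return 0;
    }

In the kernel itself, the analogous facilities are per-CPU variables and constructs like percpu_counter, which trade exact instantaneous totals for uncontended updates.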

James Bottomley asserted that we still have not hit the scalability wall: the reductionist, lock-splitting approach continues to work. Christoph Hellwig responded that the wall has indeed been hit in the filesystem area, necessitating different approaches; filesystem developers have, for example, had to create separate threads to handle certain kinds of work in a serialized manner.
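For readers unfamiliar with the term, "lock splitting" means replacing one coarse lock with many finer-grained ones so that unrelated operations stop contending. The following sketch is a hypothetical userspace illustration, not code from any kernel subsystem: a hash table that protects each bucket with its own mutex rather than taking a single table-wide lock.

    /*
     * Hypothetical sketch of lock splitting: one mutex per hash bucket
     * instead of a single table-wide lock, so threads working on
     * different buckets no longer serialize on one another.
     */
    #include <pthread.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define NBUCKETS 256

    struct node {
        struct node *next;
        unsigned long key;
        void *value;
    };

    struct bucket {
        pthread_mutex_t lock;   /* previously: one global lock for the whole table */
        struct node *head;
    };

    static struct bucket table[NBUCKETS];

    static void table_init(void)
    {
        for (int i = 0; i < NBUCKETS; i++) {
            pthread_mutex_init(&table[i].lock, NULL);
            table[i].head = NULL;
        }
    }

    static struct bucket *bucket_of(unsigned long key)
    {
        return &table[key % NBUCKETS];
    }

    static int table_insert(unsigned long key, void *value)
    {
        struct bucket *b = bucket_of(key);
        struct node *n = malloc(sizeof(*n));

        if (!n)
            return -1;
        n->key = key;
        n->value = value;

        pthread_mutex_lock(&b->lock);   /* contention is now per bucket */
        n->next = b->head;
        b->head = n;
        pthread_mutex_unlock(&b->lock);
        return 0;
    }

    static void *table_lookup(unsigned long key)
    {
        struct bucket *b = bucket_of(key);
        void *value = NULL;

        pthread_mutex_lock(&b->lock);
        for (struct node *n = b->head; n; n = n->next) {
            if (n->key == key) {
                value = n->value;
                break;
            }
        }
        pthread_mutex_unlock(&b->lock);
        return value;
    }

    int main(void)
    {
        table_init();
        table_insert(42, "hello");
        printf("lookup(42) = %s\n", (char *)table_lookup(42));
        return 0;
    }

The filesystem case Christoph described goes in the other direction: when finer locking stops helping, certain work is funneled to a dedicated thread so that serialization happens in exactly one place.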

Back around 2000 or so, Ted Ts'o said, there was a large, coordinated effort to improve the scalability of the system, then things stopped for a while. People are beginning to pay attention to the problem again, but the work is much more ad hoc, without a visible grand plan. He suggested that all maintainers should be looking at how well their subsystems will work on systems with dozens of cores. Those systems are out there and becoming more common; even if the maintainer doesn't see the problems, users will. Steve Rostedt said that the realtime preemption patches can be a good way to make scalability problems more visible without the need to track down a 64-core system to test kernels on.

How much farther can scalability work go? Arjan van de Ven feared that we will hit the locking cliff at some point. Ben Herrenschmidt thought that things are going fairly well, and that improvements in instrumentation have made finding scalability problems easier; there is not much we have to change in how we are doing things. Tony Luck said that, with 64 cores and 128 threads, every benchmark suffers; there are lots of "little BKLs" and each application hits a different one. It can be hard to find them all before customers do.

Linus was relatively sanguine about the whole picture. He said that silicon technology is getting closer to the absolute physical limits, so we'll not have to worry about CPU scaling for a whole lot longer. So it's not worth doing a whole lot on the kernel side; unless some wild new technology comes along, there are limits to how much more scalability work will be required.




KS2010: Scalability

Posted Nov 3, 2010 22:55 UTC (Wed) by cesarb (subscriber, #6266)

> He said that silicon technology is getting closer to the absolute physical limits, so we'll not have to worry about CPU scaling for a whole lot longer.

Wouldn't getting closer to the physical limits make things _worse_, because instead of just making a faster CPU, people start having to add more and more separate cores? That to me looks like needing even more scaling, not less. Even more if the latency between the cores increases because there are so many of them.

KS2010: Scalability

Posted Nov 4, 2010 2:34 UTC (Thu) by PaulMcKenney (subscriber, #9624)

You can argue this any number of ways:
  1. As you say, if we really need exponentially increasing numbers of CPUs, then silicon limitations will require more sockets and thus more latency.
  2. If the limits are economic, then most people will buy as large a system as makes economic sense for their workload. This has historically resulted in very many very cheap small systems and a very few very expensive large systems, effectively limiting system size more severely than the pure technical limits.
  3. There is the question of whether the limits are good or bad, in other words, do you favor increased CPU counts on the one hand or existing software on the other?

KS2010: Scalability

Posted Nov 4, 2010 12:40 UTC (Thu) by dgm (subscriber, #49227)

Fear not. Costs increase with the number of sockets a system can support, quickly reaching the point where it is not economically viable except for very specialized (and thus expensive) systems.

KS2010: Scalability

Posted Nov 4, 2010 12:45 UTC (Thu) by nix (subscriber, #2304)

Yes, costs increase with the number of sockets... which is why chip manufacturers are adding extra cores instead. Hence the problem.

KS2010: Scalability

Posted Nov 11, 2010 22:11 UTC (Thu) by jnareb (subscriber, #46500)

There is also a speed-of-communication limit which, coupled with miniaturization limits (meaning that the length of communication channels grows with the number of cores), makes increasing the number of cores a moot point. Adding cores would not increase performance, except in rare cases where synchronization is not needed.

Copyright © 2010, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds