
Single-threading not considered harmful


Posted May 7, 2011 2:10 UTC (Sat) by quotemstr (subscriber, #45331)
Parent article: Scale Fail (part 1)

When the author admonishes us for using "single-threading", we naturally suppose that we should use "multi-threading" instead, but because a "multi-threaded program" is commonly understood to be composed of many shared-memory lightweight processes, following this advice will in fact tempt us to create programs that become expensive to scale. In a sense, multi-threading, not single-threading, is the true enemy of scalability.

In any massively parallel system, it's the communication between processing nodes that ultimately limits the size and performance of the system. When we express concurrency using multiple threads, we naturally use the memory shared by these threads as the communication medium. But because shared memory scales poorly, the cost of using ever-larger coherent-memory systems quickly overwhelms any possible benefit.

Having run into this wall, we transition to a communication medium that scales much better, although (or because) it offers fewer features and less coherency compared to shared memory; examples include databases, clustered filesystems, and specialized message queues. After this expensive and painful process, costs again increase linearly with capacity: processing nodes can be spread across multiple machines instead of having to share a single increasingly powerful machine. Because the communication medium is no longer shared memory, the possibility of multiple threads sharing a single process becomes irrelevant, and we see that the work we invested in using this kind of threading was wasted.

So to avoid these ends, let's avoid these beginnings: avoid multi-threading. Use single-threaded programs, which are easier to design, write, and debug than their shared-memory counterparts. Instead, use multiple processes to extract concurrency from the hardware. Choose a communication medium that works just as well on a single machine as it does between machines, and make sure the individual processes comprising the system are blind to the difference. This way, deployment becomes flexible and scaling becomes simpler. Because communication between processes by necessity has to be explicitly designed and specified, modularity almost happens by itself.
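As a rough illustration of the shape described above (not code from this comment), here is a minimal Python sketch: each worker is an ordinary single-threaded process that connects to a coordinator over TCP using the standard library's `multiprocessing.connection` module, so the same `worker` function would work unchanged if the coordinator were on another machine. The names `worker`, `run`, and `AUTHKEY` are hypothetical, and for convenience the workers are forked locally, which assumes a POSIX system.

```python
from multiprocessing import get_context
from multiprocessing.connection import Client, Listener

AUTHKEY = b"demo-secret"  # hypothetical shared secret for this sketch


def worker(address):
    # One thread, one loop: receive a task, compute, send the result back.
    # Nothing here depends on sharing memory with the coordinator.
    with Client(address, authkey=AUTHKEY) as conn:
        while True:
            task = conn.recv()
            if task is None:  # sentinel: no more work
                break
            conn.send(sum(x * x for x in task))


def run(chunks, n_workers=2):
    ctx = get_context("fork")  # assumption: POSIX system with fork()
    with Listener(("localhost", 0), backlog=n_workers, authkey=AUTHKEY) as listener:
        procs = [ctx.Process(target=worker, args=(listener.address,))
                 for _ in range(n_workers)]
        for p in procs:
            p.start()
        conns = [listener.accept() for _ in procs]
        results = []
        # Hand out chunks round-robin; each worker answers in order.
        for i in range(0, len(chunks), n_workers):
            batch = list(zip(conns, chunks[i:i + n_workers]))
            for conn, chunk in batch:
                conn.send(chunk)
            for conn, _ in batch:
                results.append(conn.recv())
        for conn in conns:
            conn.send(None)  # tell each worker to exit
            conn.close()
        for p in procs:
            p.join()
    return results


if __name__ == "__main__":
    print(run([[1, 2], [3, 4], [5, 6]]))
```

Calling `run([[1, 2], [3, 4], [5, 6]])` should return `[5, 25, 61]`. In a real deployment the workers would more likely be started independently and talk through a message queue or database, as the comment suggests; the point is only that `worker` never relies on shared memory.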

I believe the author had these points in mind when he wrote his article, but by denouncing "single-threading", he risks sending some readers down an unproductive path. Concurrency is the ultimate goal, and it's usually achieved best by a set of cooperating single-threaded programs. The word "thread" refers to a concept that resides at a level of abstraction not appropriate for this discussion, and its use can only muddle our thinking.



Single-threading not considered harmful

Posted May 7, 2011 8:10 UTC (Sat) by NAR (subscriber, #1313)

Welcome to Erlang.

Single-threading not considered harmful

Posted Jul 4, 2011 11:40 UTC (Mon) by csamuel (✭ supporter ✭, #2624)

Or MPI, which lets you span nodes and is frequently used for highly parallel High Performance Computing (HPC) codes.

It also works well within a single system; I've seen a particular HPC crash simulation code that came in both SMP and MPI variants, and even within a single multi-core system the MPI version scaled better than the SMP version.

Single-threading not considered harmful

Posted May 7, 2011 16:58 UTC (Sat) by Aliasundercover (subscriber, #69009)

My complaint with threading is correctness more than performance. With all resources shared between all threads, the scope of an error is the whole application. Use more explicit sharing, and the really dangerous errors get confined to the smaller portions of the program actually dealing with the sharing.
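A small Python sketch of that difference, assuming a POSIX system for the fork start method; `buggy_worker` and the two `run_in_*` helpers are hypothetical. The same buggy function clobbers the caller's data when run in a thread, but only its own private copy when run in a separate process.

```python
import threading
from multiprocessing import get_context


def buggy_worker(config):
    # Bug: clobbers the configuration it was handed.
    config.clear()


def run_in_thread():
    config = {"threads": 4, "port": 8080}
    t = threading.Thread(target=buggy_worker, args=(config,))
    t.start()
    t.join()
    return config  # memory is shared, so the bug reached the caller's dict


def run_in_process():
    ctx = get_context("fork")  # assumption: POSIX system with fork()
    config = {"threads": 4, "port": 8080}
    p = ctx.Process(target=buggy_worker, args=(config,))
    p.start()
    p.join()
    return config  # only the child's copy was wiped; the parent's is intact
```

Here `run_in_thread()` returns an empty dict while `run_in_process()` returns the original configuration. Anything the process version needs to share has to go through an explicit channel such as a pipe, which is exactly where the comment suggests the dangerous errors would be confined.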

Performance scaling? Sure, shared memory doesn't scale to the largest jobs. For that you need methods that cross machines. Those methods don't scale to the smallest latencies. For that you need shared memory.

Still, I agree. Threading is too much the method of fashion, chosen more because everyone is doing it than on technical merit. Of course, everyone doing it means you get to use libraries other people wrote and spend less time swimming upstream.

Single-threading not considered harmful

Posted May 12, 2011 0:34 UTC (Thu) by jberkus (subscriber, #55561)

Yes, sorry for the confusion of terminology. "Parallel programming" is what I want to encourage. I'm a little unclear on what the official term for non-parallel programming is (is there one?) so I used "single-threaded".

Single-threading not considered harmful

Posted May 12, 2011 1:57 UTC (Thu) by Trelane (subscriber, #56877)

serial programming.

Single-threading not considered harmful

Posted May 12, 2011 15:47 UTC (Thu) by kamil (subscriber, #3802)

No, it's sequential programming.

Single-threading not considered harmful

Posted May 12, 2011 15:52 UTC (Thu) by Trelane (subscriber, #56877)

Single-threading not considered harmful

Posted May 19, 2011 9:11 UTC (Thu) by renox (subscriber, #23785)

I think that 'single-threading' is correct; the issue is that when we think about multi-threading, we only think about multi-threading within one process, but multi-processing is also multi-threading (because each process contains at least one thread).

Maybe multi/single-tasking would be better terms?

Single-threading not considered harmful

Posted May 22, 2011 10:47 UTC (Sun) by kplcjl (guest, #75098)

Actually, I don't see multi-threading as anything but handling single processes multiple concurrent times. It doesn't matter whether you are talking about 5 services, each doing a single, different thing, or a single service setting up multiple threads to do the same process several times at once.

The problem is recognizing when a dedicated single process is the best answer to your problem and when multiple threads on the same server would be best. Generally, when your problem is data-driven, the multiple-thread solution would be better; when your problem is CPU-intensive, a single thread dedicated to solving the problem, combined with a data storage mechanism so that multiple servers can attack several individual problems concurrently, may be the way to go.
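One way to read this advice, sketched in Python with the standard `concurrent.futures` module: treat "data-driven" work as work that mostly waits on I/O (simulated here with `time.sleep`) and give it a thread pool, and hand independent CPU-bound problems to a pool of single-threaded processes. That interpretation and the helper names are assumptions; the process pool uses the fork start method, so this assumes a POSIX system.

```python
import time
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor
from multiprocessing import get_context


def fetch_record(key):
    # Stand-in for an I/O wait (database, disk, network): mostly idle time.
    time.sleep(0.05)
    return key, key * 10


def count_primes(limit):
    # CPU-bound: pure computation by trial division, no waiting on data.
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def run_io_bound(keys):
    # Waiting threads are cheap, so many of them can overlap their I/O.
    with ThreadPoolExecutor(max_workers=len(keys)) as pool:
        return dict(pool.map(fetch_record, keys))


def run_cpu_bound(limits):
    # Independent problems go to separate single-threaded processes.
    ctx = get_context("fork")  # assumption: POSIX system with fork()
    with ProcessPoolExecutor(mp_context=ctx) as pool:
        return list(pool.map(count_primes, limits))
```

For example, `run_cpu_bound([10, 20, 100])` gives `[4, 8, 25]`. In CPython the split is also forced by the global interpreter lock, which keeps threads from running Python code in parallel; in other languages the trade-off is less clear-cut.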

This article seems to cover the situation where one of those solutions fits, and therefore it becomes the "best" solution for every situation. It doesn't seem to apply to dumb coding mistakes that make good design go bad. For instance, I saw one case where it should have used "-" instead of "+" in one place in a mathematical equation. It took me a week and a half to convince the manager there was a mistake in the equation.

Single-threading not considered harmful

Posted May 22, 2011 11:08 UTC (Sun) by kplcjl (guest, #75098)

I misspoke when I said "multiple concurrent times". On a single core nothing works that way. I was thinking of the OS scheduling the core's use so that, to a human onlooker, it appears to be doing concurrent processing, as well as of multiple cores doing truly concurrent work.

Single-threading not considered harmful

Posted May 20, 2011 19:48 UTC (Fri) by mikernet (guest, #75071)

You can achieve both multi-threading and multi-process/multi-machine scalability, and should for a large application. A single thread serving many concurrent users will have very high latency. Spinning up a separate process for each user is incredibly resource-heavy; processes are much heavier than threads.

Your application should be multi-threaded so that all the processor cores and idle time on any one machine are fully utilized. It should also be able to scale across machines so you can increase total processing capacity easily.
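A rough Python sketch of the hybrid layout being described, with hypothetical names throughout: users are split into shards, each shard goes to its own process (standing in for a separate machine), and within each process a thread pool overlaps requests that spend their time waiting. The `time.sleep` call stands in for waiting on a backend, and the fork start method assumes a POSIX system.

```python
import time
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor
from multiprocessing import get_context


def handle_request(user):
    time.sleep(0.01)  # stand-in for waiting on a database or backend
    return f"hello {user}"


def serve_shard(users):
    # Inside one process: several threads so waiting requests overlap.
    with ThreadPoolExecutor(max_workers=8) as pool:
        return list(pool.map(handle_request, users))


def serve_all(users, n_procs=2):
    # Across processes (or machines, given a network transport): one shard each.
    shards = [users[i::n_procs] for i in range(n_procs)]
    ctx = get_context("fork")  # assumption: POSIX system with fork()
    with ProcessPoolExecutor(max_workers=n_procs, mp_context=ctx) as pool:
        return [reply for shard in pool.map(serve_shard, shards)
                for reply in shard]
```

Because the shards interleave, replies come back grouped by shard rather than in input order. In CPython the threads here only help with waiting, not with CPU-bound work, so the process level is what actually spreads computation across cores.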

Single-threading not considered harmful

Posted May 20, 2011 22:44 UTC (Fri) by raven667 (subscriber, #5198)

I don't have any test data in front of me, but I'm not sure your statement is entirely true. On Linux, threads are not significantly lighter than processes, because processes are extremely lightweight. Both are scheduled the same way by the kernel, and with copy-on-write memory management there isn't that much difference in the in-kernel housekeeping between creating a thread and forking (cloning) a process.
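For anyone who wants test data, here is a crude Python harness, assuming Linux or another POSIX system with `os.fork`, that times creating and reaping threads against forking and reaping processes. The helper names are hypothetical, no particular numbers are implied, and interpreter overhead is included in both measurements, so a C program calling `pthread_create` and `fork` directly would give a cleaner picture of the kernel-side cost.

```python
import os
import threading
import time


def _noop():
    pass


def time_threads(n):
    # Create, start and join n threads one at a time.
    start = time.perf_counter()
    for _ in range(n):
        t = threading.Thread(target=_noop)
        t.start()
        t.join()
    return time.perf_counter() - start


def time_forks(n):
    # Fork n children one at a time; each exits at once and is reaped.
    start = time.perf_counter()
    for _ in range(n):
        pid = os.fork()
        if pid == 0:
            os._exit(0)  # child: exit immediately, skipping Python cleanup
        os.waitpid(pid, 0)
    return time.perf_counter() - start


if __name__ == "__main__":
    n = 1000
    print(f"threads: {time_threads(n):.3f}s  forks: {time_forks(n):.3f}s")
```

Results will vary with the machine, the kernel, and the size of the parent process, since fork has to duplicate the parent's page tables even when the pages themselves are shared copy-on-write.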

On a modern multi-socket, multi-core machine, each socket is its own largely independent computer and the whole machine is a NUMA cluster. That means that each process is assigned to a particular node and that's where its memory lives; splitting memory between nodes or bouncing a process between different nodes reduces performance. My guess is that threading will scale weirdly when you get beyond what can be handled by all the cores in one socket, whereas a multi-process model can keep more memory local to the socket the process is running on.
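A Linux-only Python sketch of the multi-process layout being suggested: read the CPUs belonging to each NUMA node from `/sys/devices/system/node/node*/cpulist`, fork one worker per node, and pin it there with `os.sched_setaffinity` before it does any work. Under Linux's default local allocation policy, memory a process touches first tends to come from the node it is running on, so pinning early helps keep each worker's memory local; this does not strictly bind memory (tools such as `numactl --membind` do that). The function names are hypothetical.

```python
import glob
import os


def parse_cpulist(text):
    """Parse a kernel cpulist string such as '0-3,8,10-11' into a set."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus


def numa_nodes():
    """Map NUMA node id -> usable CPUs; fall back to a single node."""
    allowed = os.sched_getaffinity(0)
    nodes = {}
    for path in sorted(glob.glob("/sys/devices/system/node/node[0-9]*/cpulist")):
        node_id = int(path.split("/")[-2][len("node"):])
        with open(path) as f:
            cpus = parse_cpulist(f.read()) & allowed  # respect cpusets
        if cpus:
            nodes[node_id] = cpus
    return nodes or {0: set(allowed)}


def spawn_pinned_workers(work):
    """Fork one worker per node, pinned before work(node_id) runs."""
    pids = []
    for node_id, cpus in numa_nodes().items():
        pid = os.fork()
        if pid == 0:
            status = 0
            try:
                os.sched_setaffinity(0, cpus)
                work(node_id)
            except BaseException:
                status = 1
            os._exit(status)
        pids.append(pid)
    # Return each worker's exit code (0 on success).
    return [os.waitstatus_to_exitcode(os.waitpid(pid, 0)[1]) for pid in pids]
```

On a single-node machine this degrades to one worker using every allowed CPU, so the same code runs unchanged on laptops and multi-socket servers.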

I would suggest starting with a multi-process model, because you get better fault tolerance with memory protection, then considering threading if that doesn't test out well for concurrent performance.


Copyright © 2017, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds