Interview with Donald Knuth (InformIT)
"The success of open source code is perhaps the only thing in the computer field that hasn't surprised me during the past several decades. But it still hasn't reached its full potential; I believe that open-source programs will begin to be completely dominant as the economy moves more and more from products towards services, and as more and more volunteers arise to improve the code."
Posted Apr 27, 2008 16:55 UTC (Sun)
by jordanb (guest, #45668)
[Link] (5 responses)
Posted Apr 27, 2008 18:02 UTC (Sun)
by vonbrand (subscriber, #4458)
[Link] (2 responses)
The same ideas as in WEB also appear in "document in the source" systems like doxygen.
Sure, Knuth is one of the "write it all by myself" people, and so he has no use for "reusable code". It is somewhat surprising that he thinks so highly of open source in this light. But academia is open source in a way (publish to get your ideas reviewed/corrected/built upon by others), so...
Posted Apr 28, 2008 8:07 UTC (Mon)
by rsidd (subscriber, #2582)
[Link]
Sure, Knuth is one of the "write it all by myself" people, and so he has no use for "reusable code". It is somewhat surprising that he thinks so highly of open source in this light.
If you read the fine article, he says he prefers "re-editable code" to "reusable code". From that point of view, open source makes perfect sense.
If all you want is reusable code, you can link against a proprietary library, pay whatever licence fees are required, swallow their bugs. You don't need open source.
Posted Apr 28, 2008 17:31 UTC (Mon)
by felixfix (subscriber, #242)
[Link]
Posted May 6, 2008 1:23 UTC (Tue)
by jschrod (subscriber, #1646)
[Link] (1 responses)
Posted May 6, 2008 6:25 UTC (Tue)
by nix (subscriber, #2304)
[Link]
Posted Apr 27, 2008 21:14 UTC (Sun)
by khim (subscriber, #9252)
[Link] (10 responses)
For the last quarter of a century, hardware designers have been fighting a supreme adversary. Basically, as CPUs get faster, the "hot spot" where computation can actually occur gets smaller and smaller. In a contemporary CPU it's a tiny pimple on the die - and there is no way to make it bigger! First caches were introduced, then a lot of speculation, etc. Finally we've reached the point where you cannot actually speed up typical programs by going from 50mm2 to 100mm2 - and mechanical and marketing limitations mean that a "normal" CPU must have a die of 100mm2-200mm2. At this point there is nothing left except the introduction of multicore architectures. So no, multicores are not a fad and they are not going away: if someone invents a clever new way to speed up linear programs by another 50% by adding a hundred million transistors - we'll go from four cores to two - and then back to four. The number of transistors on a die keeps growing, but the number of transistors in the "hot spot" is essentially fixed - multicores are an admission of defeat: the hardware people cannot give us anything else. You can choose one core, two cores, four cores and so on - but they will all be of more-or-less the same speed! If you can use them - well, it's great; if you cannot - tough luck.
Posted Apr 28, 2008 4:46 UTC (Mon)
by nevyn (guest, #33129)
[Link] (5 responses)
Yes, it's going to be really hard to make HW faster in the future. All the HW people are saying that the big obvious gains have been found, and there are no more to come (for serialized instructions). Indeed there hasn't been a must-have CPU upgrade in the last year or two, although CPUs have got faster, and in some cases the commonality of dual cores has been a boon. However, that doesn't mean you can wave a magic wand and say "all software will be multi-threaded, and run correctly". On the other hand, we've had the possibility of doing multi-tasking for at least 20 years (via fork() + large Unix boxes); mmap with a unified cache might be a bit more recent. pthread_create() is a bit more recent still, and having it all be accessible via Linux is even more recent. But it's fair to say that you could "fairly easily" get access to 2-CPU boxes 10 years ago.
But with 10 years of lead time the SW has basically made zero progress; it's still just as hard and just as error-prone to write C+pthread code. Now maybe, due to dual cores by default, there will be some breakthrough in the next 10 years ... or maybe we'll all magically start (re-)writing in Erlang/whatever. But personally I doubt it; I find it much easier to believe that the answer from the SW people will be "128 core CPUs are irrelevant, start getting used to things not getting faster". The combustion engine didn't get significantly faster forever, and the world didn't end ... I imagine the same will be true here.
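For reference, a minimal sketch of the C + pthread style in question (build with cc -pthread); the work() payload is hypothetical filler, and the create/join boilerplate is the part that hasn't changed in those 10 years:

    #include <pthread.h>
    #include <stdio.h>

    #define NWORKERS 4

    /* Hypothetical per-thread payload: real code would carve up shared data
     * here, which is exactly where the hard, error-prone part begins. */
    static void *work(void *arg)
    {
        long id = (long)arg;
        printf("worker %ld running\n", id);
        return NULL;
    }

    int main(void)
    {
        pthread_t t[NWORKERS];
        for (long i = 0; i < NWORKERS; i++)
            pthread_create(&t[i], NULL, work, (void *)i);
        for (long i = 0; i < NWORKERS; i++)
            pthread_join(t[i], NULL);
        return 0;
    }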
Posted Apr 28, 2008 5:54 UTC (Mon)
by khim (subscriber, #9252)
[Link] (2 responses)
We had no progress for 50 years with multi-core algorithms because there was no need: the hardware people did most of the work. Now they've stopped doing it. And of course there are a huge number of programs which can benefit from 128 cores - at least in theory. Which ones will be rewritten depends on the speed of the program in question: sure, you can rewrite ls to be multicore-aware, but in the real world few uses of ls would be accelerated, so no, there is nothing to gain; with convert (from ImageMagick) it's a different story. There are tons of programs which can (and will) be multithreaded, and more still which are not really needed but will be used anyway (how many people need games? yet today's GPU is mostly the result of this pseudo-need). It's just that for a long time software people had the luxury of faster and faster CPUs every few years and had no real pressing need to use SMP. Today they are forced to use SMP. Different situation - and it'll lead to a different outcome.
Posted Apr 28, 2008 6:44 UTC (Mon)
by drag (guest, #31333)
[Link] (1 responses)
Posted Apr 28, 2008 7:29 UTC (Mon)
by ekj (guest, #1524)
[Link]
The loads that -matter- are those where you would like to do more, but are limited by CPU. Or where you'd like to do what you do today quicker.
Of course in principle you -always- want to go quicker, but if "ls" already spends 0.042s waiting for I/O and 0.002s doing CPU-work, then it really is of no practical importance if those 0.002s could be efficiently 128-way parallelized.
I think that -most- loads that matter can be parallelized easily. In some cases TRIVIALLY. Can you give a few examples of real-world cases where waiting for the CPU is a real concern, but the problem is not parallelizable?
I know that the things where I spend time waiting for my CPU are easily parallelizable:
The lightspeed-limit bites for IO too of course, particularly the type where I'm doing IO off some device in Australia.
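Plugging the ls figures quoted above into Amdahl's law shows just how little 128-way parallelism buys when the CPU share is that small (the numbers are the ones from the comment, not measurements):

    #include <stdio.h>

    int main(void)
    {
        double io  = 0.042;   /* seconds spent waiting for I/O (serial)  */
        double cpu = 0.002;   /* seconds of CPU work (parallelizable)    */
        double total = io + cpu;

        for (int cores = 1; cores <= 128; cores *= 2) {
            double t = io + cpu / cores;
            printf("%3d cores: %.4fs  (speedup %.3fx)\n", cores, t, total / t);
        }
        return 0;
    }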
Posted Apr 28, 2008 11:28 UTC (Mon)
by smitty_one_each (subscriber, #28989)
[Link]
Posted Apr 28, 2008 16:08 UTC (Mon)
by ncm (guest, #165)
[Link]
Posted Apr 28, 2008 7:17 UTC (Mon)
by ekj (guest, #1524)
[Link] (2 responses)
Posted Apr 28, 2008 11:34 UTC (Mon)
by khim (subscriber, #9252)
[Link] (1 responses)
It's hard enough dealing with the multiple layers we have on today's CPUs, making an awful LOT more layers would -not- simplify design. Before we can do that we'll need the technology for at least two layers. Currently we use one and only one transistor layer in our CPUs. Sure, there is talk of 7, 9, even 15 layers - but those are metalization layers; all the transistors (which do the actual work) sit in a single layer... You can easily glue many dies together - but again, that gives you multicores, not a bigger hot spot...
Posted Apr 28, 2008 11:56 UTC (Mon)
by ekj (guest, #1524)
[Link]
Posted Apr 28, 2008 16:34 UTC (Mon)
by iabervon (subscriber, #722)
[Link]
Posted Apr 28, 2008 1:15 UTC (Mon)
by jamesm (guest, #2273)
[Link] (1 responses)
Posted Apr 28, 2008 6:20 UTC (Mon)
by flewellyn (subscriber, #5047)
[Link]
Posted Apr 28, 2008 16:29 UTC (Mon)
by forthy (guest, #1525)
[Link] (1 responses)
Contrary to what Don Knuth thinks, TeX has a lot of inherent parallelism, at least in the typesetting process (let's forget the compilation process of \def definitions - if that needs to be sped up, precompiling and caching is sufficient; .sty files don't change that often).
First of all, TeX typesets paragraphs in near-complete isolation. Each paragraph can be typeset independently of all the others - in parallel. TeX also tries several times to typeset a paragraph with different spacing - those attempts can run in parallel too; whichever turns out best wins.
What I agree with him on is that the "pipe" abstraction for parallel processes is probably the best we have found so far. Verilog/VHDL (synchronous parallelism) is difficult; shared memory is easier to write but even more difficult to debug (asynchronous parallelism, race conditions, deadlock, livelock, etc.). The pipe abstraction, however, has been used successfully even by simple shell script programmers.
The tape-sorting stuff probably is outdated, but if you look at how algorithms with pipes work, you end up with the same principles as with tapes: your input and your output are basically sequential. Maybe some of these algorithms can be reused for pipe-sorting.
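A rough sketch of that per-paragraph parallelism using the pipe abstraction endorsed above: one process per paragraph, results collected in order. The typeset_paragraph() function is a hypothetical stand-in for TeX's actual paragraph breaker, and error handling is omitted:

    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>
    #include <sys/wait.h>

    /* Hypothetical stand-in for TeX's paragraph breaker. */
    static void typeset_paragraph(const char *para, char *out, size_t len)
    {
        snprintf(out, len, "[typeset: %.40s]", para);
    }

    int main(void)
    {
        const char *paras[] = { "First paragraph ...",
                                "Second paragraph ...",
                                "Third paragraph ..." };
        int n = 3, fds[3][2];

        for (int i = 0; i < n; i++) {
            pipe(fds[i]);
            if (fork() == 0) {              /* child: typeset one paragraph */
                char buf[128];
                typeset_paragraph(paras[i], buf, sizeof buf);
                write(fds[i][1], buf, strlen(buf) + 1);
                _exit(0);
            }
            close(fds[i][1]);               /* parent keeps only read ends  */
        }
        for (int i = 0; i < n; i++) {       /* collect results in order     */
            char buf[128];
            read(fds[i][0], buf, sizeof buf);
            printf("paragraph %d -> %s\n", i, buf);
            wait(NULL);
        }
        return 0;
    }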
Posted Apr 28, 2008 21:05 UTC (Mon)
by jordanb (guest, #45668)
[Link]
Posted Apr 28, 2008 16:30 UTC (Mon)
by zooko (guest, #2589)
[Link] (2 responses)
I generally agree with Knuth that the current excitement about multiple cores and new parallelism techniques is unfounded. But rather than argue theory, can we generate numbers? Does anyone have a machine in which they can swap out a CPU for another of like architecture but a different number of cores? Then we can measure the times to do things that we actually care about -- things like boot-up time, time to launch applications, speed of web browsing, responsiveness when using local interactive applications like text editors.
My guess is that these measurements would be slightly better or slightly worse (yes -- worse) when the multiple-core CPU is in, but probably they would be within the margin of error.
This wouldn't prove, of course, that some future software invention wouldn't make the system use multiple cores to advantage (nothing can disprove that hypothesis), but it would give us a way to find out if such an invention ever does work.
Other ways to measure besides "human holding stop-watch" might include LatencyTop or the recently announced Phoronix Test Suite.
In the meantime all of those people that are buying multiple core CPUs for their workstations, laptops, personal servers, etc. are probably just throwing away money.
My blog entry: https://zooko.com/log-2008.html#d2008-04-28-multiple_cores_considered_wasteful
Regards,
Zooko
Posted Apr 28, 2008 20:28 UTC (Mon)
by drag (guest, #31333)
[Link]
Posted Apr 29, 2008 6:44 UTC (Tue)
by graydon (guest, #5009)
[Link]
Interview with Donald Knuth (InformIT)
I'm generally a fan of document-driven and document-centered development (and agree with
everything he says about XP). I've never managed to get into WEB though. The main problem is
that the source files all look like a mess. It's as if the documentation and the source had a
really messy head-on collision. I can't imagine them being anything other than a maintenance
nightmare.
Of course, what he says about not seeing the point of test suites suggests that he really
doesn't spend much time dealing with maintainability issues, and sees development as a very
personal thing, which probably explains the weird restrictions he has on TeX.
Interview with Donald Knuth (InformIT)
Interview with Donald Knuth (InformIT)
Interview with Donald Knuth (InformIT)
There is a difference between reusable and editable. Reusable code has to be so thoroughly
documented that you never need to look inside it to know exactly what happens under corner
cases. I have run into this problem so often that I feel, like Knuth, that editable source is
better. Perl's LWP is a great example. I love it, it usually does just what I want, but then
there are conditions where I don't know what it does, and I have to inspect the source or poke
around in the debugger to find out exactly what is going on. If it were closed, I would be
stuck. If it were proprietary but with source available, like Microsoft's useless "shared
source", I would still be stuck if it didn't do the right thing or had a bug.
Interview with Donald Knuth (InformIT)
If you read the interview again, you might note that he doesn't write that test suites are bad
in general -- he writes that *his* development style doesn't fit with unit tests a la JUnit or
so.
Please remember: TeX was probably the very first open source program that ever came with a
test suite, the trip test, and where every port must pass this torture test, otherwise it
can't be named TeX. And the trip test was already there in 1978. I'd like it if half of the other
open source software had test suites that are 10% as good as those of TeX and friends.
And I don't know what you mean by "weird restrictions". If you mean the overall use of
global variables, reluctance to use structures, and other stuff that's not up-to-par with our
current software methods -- that's caused by the restrictions of Pascal compilers at the time
of TeX's creation. Back in 1981/1982, when we did our first TeX port, we actually had to
change the Pascal compiler to be able to get TeX running.
Interview with Donald Knuth (InformIT)
A testsuite, let us not forget, whose values were determined the way
values in testsuites *should* be determined, by working out in some other
way (in this case by hand) what the values should be and plonking them in.
I have seen all too many testsuites in which this is not the case. (Oddly
enough, these tests rarely fail, unless run in a universe in which 1 !=
1.)
Multicores are admission of defeat - and they are here to stay...
Multicores are admission of defeat - and they are here to stay...
128 core CPUs are irrelevant, start getting used to things not getting faster? Wrong answer!
128 core CPUs are irrelevant, start getting used to things not getting faster? Wrong answer!
Some things, like graphics, are natural things to parallelize. You're dealing with multiple tiles in an MPEG video, or some number of discrete mathematical functions being used over and over again on a large number of 3D objects or whatnot.
Then there are other things, like the ImageMagick case you mentioned, that can be set up to run in batch mode where you have lots of images or calculations to do over and over again. With that there probably isn't much need to do any multithreading... just fork it for however many images you're working on. Batch programming at its best.
Even for static renderings with raytracing this is the way to go... the multithreaded portion would be relatively small; just fork it and stitch the images back together after they've been rendered.
For lots of this stuff it would make sense to have to do multithreading if you're in a Windows-only world, but with Linux the overhead of fork() is much smaller and it leads to simpler and less buggy programs.
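A sketch of that fork-per-image batch style, just shelling out to ImageMagick's convert once per file; the file names and conversion arguments here are illustrative only:

    #include <stdio.h>
    #include <unistd.h>
    #include <sys/wait.h>

    int main(void)
    {
        const char *inputs[] = { "a.png", "b.png", "c.png", "d.png" };
        int n = 4;

        for (int i = 0; i < n; i++) {
            if (fork() == 0) {                  /* one child per image */
                char out[64];
                snprintf(out, sizeof out, "out-%d.jpg", i);
                execlp("convert", "convert", inputs[i], out, (char *)NULL);
                _exit(127);                     /* exec failed         */
            }
        }
        while (wait(NULL) > 0)                  /* reap all children   */
            ;
        return 0;
    }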
And of course there is multitasking and all that. Even then, how many things do you have going on at once? 3 things? 4 things? How is that going to translate into big savings when you're dealing with an entry-level Dell desktop computer with 8-16 cores?
Do you really want to see something with the complexity of Firefox or OpenOffice.org have its code base refactored to support multithreaded programming? They have a tough enough job as it is trying to maintain their code base without all that extra overhead.
How many programs that you use on a daily basis could benefit from it?
I did see big benefits from moving from one CPU to two. I always wanted to do that, but I couldn't justify the expense of an SMP board and such for home use.
Going from two CPUs to four will probably show some benefits, too.
For rendering games and doing other such things, integrating the GPU into the CPU die will help. So you're going to take advantage of maybe another 4 cores.
If people start doing crazy stuff with raytracing and graphics then I can see taking advantage of up to 16 cores... Plus it'll make video processing much less buggy, since we can hopefully get away from the proprietary hell that Nvidia and (historically) ATI have driven us into.
I dunno.
Intel and friends are talking about _EIGHTY_ cores. (Maybe they'll use 4-8 for normal tasks and then the rest for graphics, plus a few specialized cores for giggles and grins?)
It seems that they want programmers to start using parallel processing for even basic tasks. I think that this is a tall order given that, in general, software is in such a lousy state as it is right now. I mean, it's not as if even very advanced programmers have perfected single-threaded application programming.
But the interesting question isn't what percentage of -all- loads can be parallelized. The interesting question is what percentage of the loads that MATTER can be parallelized.
128 core CPUs are irrelevant, start getting used to things not getting faster? Wrong answer!
But really, mostly when I'm waiting for my computer, I'm waiting for it to do IO, not waiting for it to compute. Thus advances in IO, such as seek-less solid-state discs and faster internet links, are much more relevant to me than faster CPUs. I suspect this is true for most personal computer users.
Multicores are admission of defeat - and they are here to stay...
It struck me that the distinction between the hardware and the operating system is blurring a
bit.
If the kernel is judiciously sprinkling processes over multiple cores, then it seems hasty to
decry this industry direction, even if the human brain has trouble grasping how to span a
single process across multiple cores: so what?
Multicores with a powerful language rule
You can say we've made zero progress only if, in fact, you pay no attention to the progress
that has been made.
VSIPL++ transparently parallelizes array and image processing operations, just about
completely hiding its use of MPI libraries underneath. It scales linearly with the number of
processors. It does this using the very powerful abstraction mechanisms that only became
widely usable in Standard C++, and are still available only in C++ and in a few researchy
languages like Haskell and OCaml.
The lesson here is that progress on fundamentally difficult problems comes not just from hard
work (and the developers of VSIPL++ implementations have certainly worked hard) but from
fundamentally powerful language mechanisms. Most problems are easy, and popular scripting
languages make it quick to deal with easy problems, but hard problems fall only to powerful
languages and people skilled in using powerful languages. (Reader should note that
"object-oriented" features were not especially helpful in solving this problem.)
Multicores are admission of defeat - and they are here to stay...
True. Lightspeed means that at 1GHz, no signal can travel more than about 30cm. At 3GHz, no signal can travel more than 10cm. And switching times aren't instantaneous, so the practical limits on die size are smaller.
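The arithmetic behind those figures is just distance per clock = c / f:

    #include <stdio.h>

    int main(void)
    {
        double c = 299792458.0;          /* speed of light, m/s */
        double freqs[] = { 1e9, 3e9 };   /* 1GHz and 3GHz       */

        for (int i = 0; i < 2; i++)
            printf("%.0f GHz: about %.0f cm per cycle\n",
                   freqs[i] / 1e9, 100.0 * c / freqs[i]);
        return 0;
    }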
I'd like to point out, though, that the *physical* limit on serially dependent computation is a *volume* and not an *area*.
If lightspeed allows a 100mm^2 die (10mm by 10mm), then lightspeed also allows a cube of 10mm by 10mm by 10mm, or 1000mm^3.
There are reasons we don't do it that way; both construction and cooling would be a real bitch. It's hard enough dealing with the multiple layers we have on today's CPUs, and making an awful LOT more layers would -not- simplify design.
Volume? Not yet...
Volume? Not yet...
True.
My point was just that if you're only concerned with PHYSICAL limitations, like lightspeed, then the limit is a volume, and not an area.
I agree with you that we are nowhere near being technically capable of designing and manufacturing a true 3D CPU (one that uses the entire volume of, say, 1000mm^3 to do computation).
But this is a technical limitation, and one that can, at least in principle, be overcome. It is not a physical one that, as far as we know, is final -- like lightspeed.
Multicores are admission of defeat - and they are here to stay...
Consumer systems aren't doing single linear tasks any more. There's generally a bunch of
intermittent stuff going on throughout the session, and a single task that the user's
attention is on, whose reaction time determines the quality of the user experience. For this
sort of workload, adding cores helps up to the maximum number of possible
simultaneously-running tasks. That is, people will watch movies while waiting for their
spreadsheets to recalculate, and they notice how long it takes and whether the movie is
smooth.
That means that, while O(N) cores isn't yet useful for most people, the small constant number
of useful cores is more than one. I expect that there will be demand for about 8 cores when
software developers get used to being able to do CPU-hungry background tasks without impacting
the interactive thread at all.
Interesting how his main work computer is not connected to the net. I wonder if that's for security reasons or to aid productivity (or both?).
How does he manage without twitter/facebook/reddit ?
How does he manage without twitter/facebook/reddit ?
From what I understand, it's to improve productivity. He has not used email since 1990, after
using it for fifteen years. On his web page, he wrote that he does his work in "batch mode",
and responds to correspondence the same way. So I'd gather he doesn't use a net-connected
computer because it limits distractions and lets him do things in batches.
TeX is highly parallelizable
TeX is highly parallelizable
One thing that I think is missed here is that if you look at what the benefits of Moore's law
have gone to in the past, they've been split between making the *computer* more capable by
exploiting the increased power for more sophisticated tasks (like real-time video decoding)
and making *programming* the computer easier, by permitting programmers to write less
efficient programs in higher level languages.
Now hardware manufacturers are saying "we're going to produce chips where the only way to
exploit the benefits of the increased power is to adopt significantly more complex and
difficult programming models, and we're going to expect programmers to recode the applications
to do that."
Well, so basically what they're saying is that they're not going to (be able to) continue to
make computers easier to program. That sounds like Moore's Law hitting a wall to me. Sure by
going massively parallel we could perhaps continue to reduce time to compute in those rare
situations where it's worth it to do so, but that's only half of the benefit we've
traditionally gotten from Moore's Law.
Instead of saying "rewrite your programs to take advantage of multiple cores" they could *not*
give us more cores and say "rewrite your programs to be more efficient with the hardware you
have," but that would be blatantly admitting that Moore's Law is dead.
let's see some numbers
let's see some numbers
You can probably go into the kernel and disable support for more than one or two cores, then enable them later and see if there is any difference. On a quad-core machine, of course - which is pretty affordable nowadays, in terms of development machines.
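As an alternative to rebuilding the kernel, you could pin the whole workload to a single core with taskset(1) or, programmatically, with sched_setaffinity(); a Linux-specific sketch, not a complete benchmarking methodology:

    #define _GNU_SOURCE
    #include <sched.h>
    #include <stdio.h>

    int main(void)
    {
        cpu_set_t set;

        CPU_ZERO(&set);
        CPU_SET(0, &set);                       /* allow CPU 0 only */
        if (sched_setaffinity(0, sizeof set, &set) != 0) {
            perror("sched_setaffinity");
            return 1;
        }
        printf("pinned to CPU 0; launch the workload to compare from here\n");
        return 0;
    }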
let's see some numbers
For what it's worth, I regularly run testsuites and build jobs that are embarrassingly
parallel, so when I purchased a quad-core for work this fall, make -j made a nearly linear
improvement on this particular bottleneck in my work cycle. Didn't make me any smarter or
quicker with emacs, but made a half hour cycle between major edits collapse to about 7
minutes. That was very noticeable. I'll happily support buying 256 cores on the same line of reasoning, when they're available, because 7 seconds is much better than 7 minutes!
(And this is a tolerably fast job; I've had embarrassingly parallel workloads that take
*weeks* on the provided hardware. That's when you start snooping around for cross-organization
job queues and whatnot, to steal idle time from other machines.)
I'll grant that developers are probably not who the multicore "revolution" is aimed at. But
the numbers I've seen suggest that transaction-bound network servers of various forms see a
good win too: particularly those that were built around some sort of process-pool or
thread-pool model. More of the parallel thingies run at the same time. QED.
