
Galaxy in-memory data grid released

Posted Jul 12, 2012 22:40 UTC (Thu) by pron (guest, #85696)
In reply to: Galaxy in-memory data grid released by zlynx
Parent article: Galaxy in-memory data grid released

Author of Galaxy here.

Let me explain the choice of Java (or rather, the JVM) for the project. For large-scale, high-performance software, the JVM is almost the only reasonable choice. First, the number of available high-quality components, especially open-source ones, is unparalleled. Second, the JVM has excellent concurrency support, from low-level (memory barriers, CAS) to high-level (thread pools and tasks), and incredible parallelism support through the fork/join framework (which our commercial product, built on top of Galaxy, uses); in C/C++ that is available only if you use Cilk, and very few projects do. Java gives you all this fine-tuned concurrency on all processors and OSs. Finally, for large enough systems, the JVM provides better performance than anything else out there, including C++ in most cases. In addition, some interesting concurrent data structures require garbage collection, and Java's GC is without rival. All of these reasons combined make the JVM the most practical and responsible choice for a large class of software systems.
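
For readers unfamiliar with the fork/join framework mentioned above, here is a minimal sketch of its divide-and-conquer style using `java.util.concurrent.ForkJoinPool` and `RecursiveTask` (both in the standard library since Java 7). It is an illustration only, not code from Galaxy; the class name `ForkJoinSum` and the cutoff of 1,000 elements are hypothetical choices.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Sums an array by recursively splitting it until the pieces are small
// enough to add up sequentially.
class ForkJoinSum extends RecursiveTask<Long> {
    // Hypothetical cutoff below which splitting costs more than it saves.
    private static final int THRESHOLD = 1_000;
    private static final ForkJoinPool POOL = new ForkJoinPool();

    private final long[] data;
    private final int from, to; // half-open range [from, to)

    ForkJoinSum(long[] data, int from, int to) {
        this.data = data;
        this.from = from;
        this.to = to;
    }

    @Override
    protected Long compute() {
        if (to - from <= THRESHOLD) {
            long sum = 0;
            for (int i = from; i < to; i++) {
                sum += data[i];
            }
            return sum;
        }
        int mid = (from + to) >>> 1;
        ForkJoinSum left = new ForkJoinSum(data, from, mid);
        ForkJoinSum right = new ForkJoinSum(data, mid, to);
        left.fork();                     // queue the left half; an idle worker may steal it
        long rightSum = right.compute(); // work on the right half in this thread
        return rightSum + left.join();   // wait for the left half and combine
    }

    static long parallelSum(long[] data) {
        return POOL.invoke(new ForkJoinSum(data, 0, data.length));
    }

    public static void main(String[] args) {
        long[] data = new long[1_000_000];
        for (int i = 0; i < data.length; i++) {
            data[i] = i;
        }
        System.out.println(parallelSum(data)); // prints 499999500000
    }
}
```

The `fork()`/`compute()`/`join()` pattern lets idle worker threads steal queued subtasks, which is what distinguishes fork/join from handing tasks to a plain thread pool.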

Good opportunity for Java update

Posted Jul 15, 2012 17:24 UTC (Sun) by man_ls (guest, #15091) [Link] (14 responses)

What do you think about these issues, and are they relevant to your project?
  • Memory management: the JVM comes with its own memory allocation scheme, which often is at odds with the memory management of the host operating system.
  • Performance: although not noticeable in benchmarks these days due to JIT, there is still a cost associated with the first execution of code. Also, there are no low-level optimizations possible.
  • Multithreading: synchronization between threads is effective, but usually very slow.
  • Latency: the garbage collector induces noticeable delays and unpredictabilities in performance, for no apparent reason.
I would also ask about the lack of functional programming, just out of curiosity: before I drifted away from Java-land, I always felt a need for it that Java did not meet. But if you haven't felt that need, there is no point.

Good opportunity for Java update

Posted Jul 15, 2012 19:30 UTC (Sun) by bjartur (guest, #67801) [Link]

Well, you could compile Common Lisp to JVM bytecode.

See http://common-lisp.net/project/armedbear/

Good opportunity for Java update

Posted Jul 16, 2012 5:09 UTC (Mon) by pron (guest, #85696) [Link] (12 responses)

> the JVM comes with its own memory allocation scheme, which often is at odds with the memory management of the host operating system

I'm not entirely sure what you mean. Can you elaborate?

> there is still a cost associated with the first execution of code

Sure, JVM applications' startup time is not stellar, but for server apps that are seldom re-started, what's the problem?

> Also, there are no low-level optimizations possible.

Here you're wrong. For most common cases, low-level optimizations are better with Java than with C. That's because CPUs are so advanced these days, and there are optimizations that can only be done at runtime. The only low-level optimization not currently available for Java is SSE (though that will change).

> synchronization between threads is effective, but usually very slow

Simple thread synchronization is just as slow as on other platforms. But the JVM gives you access to low-level synchronization like memory fences and CAS-es, and fork-join is plain awesome.
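
As a small illustration of the low-level primitives mentioned above, the sketch below builds a lock-free "maximum so far" value from a compare-and-set retry loop on `java.util.concurrent.atomic.AtomicLong`. The class `AtomicMax` is hypothetical; it shows only the CAS pattern and does not demonstrate explicit memory fences.

```java
import java.util.concurrent.atomic.AtomicLong;

// A lock-free "record the maximum" built from a compare-and-set (CAS) retry loop.
class AtomicMax {
    private final AtomicLong max = new AtomicLong(Long.MIN_VALUE);

    // Stores `candidate` if it is larger than the current maximum.
    // Returns true if this call changed the stored value.
    boolean offer(long candidate) {
        while (true) {
            long current = max.get();
            if (candidate <= current) {
                return false; // nothing to do; no lock is ever taken
            }
            if (max.compareAndSet(current, candidate)) {
                return true;  // our CAS won the race
            }
            // Another thread changed `max` between get() and compareAndSet(); retry.
        }
    }

    long get() {
        return max.get();
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicMax m = new AtomicMax();
        Thread[] threads = new Thread[4];
        for (int t = 0; t < threads.length; t++) {
            final int offset = t;
            threads[t] = new Thread(() -> {
                for (int i = 0; i < 100_000; i++) {
                    m.offer(i * 4L + offset);
                }
            });
            threads[t].start();
        }
        for (Thread th : threads) {
            th.join();
        }
        System.out.println(m.get()); // prints 399999
    }
}
```

Because no thread ever blocks on a lock, a thread that loses a race simply rereads the value and retries.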

> the garbage collector induces noticeable delays and unpredictabilities in performance

True, but modern garbage collectors are more predictable. Azul even has what they call a "pauseless" GC, that has a (quite low) upper bound on GC pauses.

As for functional programming - the Java language doesn't have that, but plenty of JVM languages do, some of them (particularly Clojure) are quite elegant.

So when you add everything up and also consider JVM monitoring, I think that for most large, performance-wary application, choosing anything but the JVM requires explanation. The JVM is pretty much the default choice.
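
One concrete piece of the JVM monitoring mentioned above is the standard `java.lang.management` API, which exposes per-collector statistics at runtime (the same MXBeans can also be read remotely over JMX). The hypothetical `GcReport` class below is a minimal sketch that reads each garbage collector's collection count and accumulated collection time in milliseconds.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Reports, per collector, how many collections have run so far and the
// approximate accumulated collection time in milliseconds.
class GcReport {
    static List<String> snapshot() {
        List<String> lines = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Either value may be -1 if the JVM does not support reporting it.
            lines.add(gc.getName() + ": count=" + gc.getCollectionCount()
                    + ", timeMs=" + gc.getCollectionTime());
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : snapshot()) {
            System.out.println(line);
        }
    }
}
```

Collector names vary with the JVM and the GC selected, so code like this should treat them as opaque labels.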

Good opportunity for Java update

Posted Jul 16, 2012 6:10 UTC (Mon) by wahern (subscriber, #37304) [Link]

How do you go from writing an apologia with explicit admissions that the JVM is suboptimal to "so when you add everything up and also consider JVM monitoring, I think that for most large, performance-wary application, choosing anything but the JVM requires explanation. The JVM is pretty much the default choice."?

I've nothing against the JVM and Java per se, notwithstanding integration and compatibility issues, just like I've nothing against using Perl or Fortran.

But it's worth pointing out that people who use proprietary systems (which unfortunately must include Java, since the only high-performance examples are proprietary) perceive only the very best inside the black box they're presented. They're handed a bunch of, say, atomic or threading primitives and think, "hey, my vendor chose this so it must be the best algorithm, and they must have implemented it in the sanest, most performant manner possible". And you _have_ to think that way because the only alternative is to become bitter and unenthusiastic about something you have to use day-in and day-out.

The biggest pain regarding Java is GC and unix integration. Neither of those will ever be better than plain C or C++. It's not even worth arguing about. Either they matter or they don't matter, for the most part.

Good opportunity for Java update

Posted Jul 16, 2012 9:07 UTC (Mon) by man_ls (guest, #15091) [Link] (4 responses)

> Can you elaborate?

Sure. The virtual machine allocates memory in chunks, for efficiency; but this means that the host operating system is often not able to optimize its use of pages. For example, allocations done using mmap on Linux can be tracked and optimized, but the JVM is a black box as far as the Linux kernel is concerned.

The same goes for all other low-level optimizations, but it is most noticeable for memory allocation. For example, there was talk here on LWN about making Xen guests communicate better with the host operating system about their memory needs. Having a (virtual) operating system inside another is bad enough; having a (virtual) machine inside another will have higher memory costs. Of course, if you are not worried about memory, then it is no problem; but big data applications, for example, will have to be concerned (since they eat memory up like crazy).
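
The black-box point can be seen from inside a Java program: the JVM obtains its heap from the OS as a large region (sized by flags such as `-Xms` and `-Xmx`) and decides by itself where objects live within it. The hypothetical `HeapView` class below is a minimal sketch using `java.lang.Runtime` to compare the heap the JVM currently holds with the part occupied by objects; the kernel has no view of that split.

```java
// Compares the heap the JVM currently holds with the portion occupied by
// objects (live or not yet collected). Only the JVM knows this split.
class HeapView {
    static long usedBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        long mb = 1024 * 1024;
        // maxMemory() may return Long.MAX_VALUE if no limit is configured.
        System.out.println("max heap (ceiling):       " + rt.maxMemory() / mb + " MB");
        System.out.println("heap currently held:      " + rt.totalMemory() / mb + " MB");
        System.out.println("occupied by objects:      " + usedBytes() / mb + " MB");
    }
}
```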

> Simple thread synchronization is just as slow as on other platforms.

Not really true; Java forces the generic semaphore implementation, which cannot compete with things like spinlocks. IIRC using synchronized imposed a penalty of about 100ms.

> But the JVM gives you access to low-level synchronization like memory fences and CAS-es, and fork-join is plain awesome.

Thanks, didn't know that. From what I've read, fork-join is a variety of map-reduce, which is available for other languages, right? I've used it in JavaScript on MongoDB in a completely different context. Having easy access to the fork and join functions is interesting. Of course, the devil is in the details.

> Azul even has what they call a "pauseless" GC, that has a (quite low) upper bound on GC pauses. As for functional programming - the Java language doesn't have that, but plenty of JVM languages do, some of them (particularly Clojure) are quite elegant.

Are you actually using Azul, which is a proprietary implementation? And are you actually using Clojure or Scala?

> So when you add everything up and also consider JVM monitoring, I think that for most large, performance-wary application, choosing anything but the JVM requires explanation.

This point of view is quite common in the Java world, and I can understand why some people complain about it (e.g. the "virtual machine trap" mentioned by nix). Paraphrasing, it goes like this: "the JVM ecosystem solves all problems: first concern is addressed by [x proprietary system], second concern by [y weird functional-only language], third by [z optional Oracle-only feature], fourth by [w optimization on a proprietary operating system]. And the JVM is open source and has several Free implementations!" So on paper "Java" (or rather the JVM) solves all problems. But in reality, those concerns cannot be all solved at the same time. What you usually have is a Java monoculture using tens or hundreds of Apache libraries and a few proprietary ones on the Oracle JVM, running on just one operating system for fear of having it all crumble upon you.

That is not what we are used to in the Free software world. Perl, Python, C, C++ (and even stepchildren like PHP or JavaScript) are fiercely free and include everything needed for development, from day one. If one optimization is available on one platform, it is usually available on all of them. The languages solve only one subset of problems, but that subset is always solved. The toolchain is also free, and there are no proprietary augmentations. Or rather, people do not use them as arguments that "C++ is great (with Intel's compiler)" or "Python rules (with Apple's extensions)". Or rather, a subset of people use these arguments but they are quickly disregarded.

For me, strong typing, compilation and the lack of functional programming are enough to rule Java out, and the other functional JVM languages are not interesting (there is a limited pool of developers). I suspect many other people are in the same situation, as it is unusual to see startups use Java as their primary platform. But I am intrigued by, e.g., Twitter using Scala. Perhaps the cost of entry is high, but (as you say) for large applications the JVM is a good choice. But as such, it would be positioned as a replacement for C++, not for server-side web applications, if I am reading the situation correctly.

Good opportunity for Java update

Posted Jul 16, 2012 10:55 UTC (Mon) by jezuch (subscriber, #52988) [Link] (3 responses)

> Not really true; Java forces the generic semaphore implementation, which cannot compete with things like spinlocks. IIRC using synchronized imposed a penalty of about 100ms.

I'm curious: where did you get this number? I'm pretty sure that if it was true, I would have noticed it by now.

Good opportunity for Java update

Posted Jul 16, 2012 11:01 UTC (Mon) by man_ls (guest, #15091) [Link] (2 responses)

I did my own benchmark, and I am citing from memory. It was a long time ago, things have probably improved a lot since 2004 or so...

Good opportunity for Java update

Posted Jul 16, 2012 16:58 UTC (Mon) by raven667 (guest, #5198) [Link] (1 responses)

Probably 8-)

One thing to point out is that the environments with the most optimization effort being put into them are the JVM and the various JavaScript runtimes. All the resources spent on research and implementation show: in the various benchmarks I've seen, Java and JS tend to bubble up near the top, closer to compiled languages than other VM environments like Perl, Python, PHP or Ruby.

Good opportunity for Java update

Posted Jul 16, 2012 21:33 UTC (Mon) by man_ls (guest, #15091) [Link]

I think that my results might have been 100ns per synchronization, not 100ms. Sorry about the confusion; funny how memory can make you twist things around to suit your arguments. So let's check out how things behave in practice.

This bit of data seems to suggest that things have improved about 4-fold since then. My own results are a bit higher than those on the page (core i5; IcedTea6): 0.035s vs 0.002s for 1M synchronizations, suggesting a cost of about 30ns for each synchronization. Calling a synchronized method is even cheaper, about 20ns. Not too shabby...

My updated code yields:

1M computations: 0.035s
1M synchronized code blocks: 0.004s
1M synchronized method calls: 0.023s
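
The benchmark code itself is not shown, so the following is a hypothetical reconstruction of the kind of test described: 1M plain increments, 1M `synchronized` blocks and 1M `synchronized` method calls, timed with `System.nanoTime()`. It is single-threaded and uncontended, so it measures only the uncontended locking path, and HotSpot optimizations such as lock elision or coarsening can distort the numbers; a harness such as OpenJDK's JMH is the usual tool for trustworthy microbenchmarks. No particular results are implied.

```java
// Hypothetical reconstruction of a naive synchronization microbenchmark.
// Single-threaded and uncontended: it times only the lock fast path.
class SyncCost {
    static final int N = 1_000_000;

    private final Object lock = new Object();
    private long counter;

    void plain() { counter++; }                         // no synchronization
    void block() { synchronized (lock) { counter++; } } // synchronized block
    synchronized void method() { counter++; }           // synchronized method

    long count() { return counter; }

    // Runs body N times and returns the elapsed wall-clock time in nanoseconds.
    static long time(Runnable body) {
        long start = System.nanoTime();
        for (int i = 0; i < N; i++) {
            body.run();
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        SyncCost s = new SyncCost();
        // Warm-up rounds so the JIT compiles the code paths before measuring.
        for (int round = 0; round < 5; round++) {
            time(s::plain);
            time(s::block);
            time(s::method);
        }
        System.out.printf("1M computations:              %.3fs%n", time(s::plain) / 1e9);
        System.out.printf("1M synchronized code blocks:  %.3fs%n", time(s::block) / 1e9);
        System.out.printf("1M synchronized method calls: %.3fs%n", time(s::method) / 1e9);
    }
}
```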

Good opportunity for Java update

Posted Jul 17, 2012 4:29 UTC (Tue) by wahern (subscriber, #37304) [Link] (5 responses)

Azul's technology explodes the amount of RAM used. For example, the minimum memory spec is 32GB:

http://www.azulsystems.com/products/zing/specs

If Azul Zing were as magical as they say it is, the company would have been scooped up by Sun, Oracle, IBM, HP, or even Microsoft long ago, instead of perpetually taking in millions from investors. The company originally implemented their technique in hardware. They specially designed processors with a unique memory architecture. Emulating this architecture on commodity hardware is, clearly, quite expensive.

Zing (like many other techniques with serious downsides ignored by the cult) is just a crutch for Java/JVM zealots eager to believe that Java is the perfect language. (I mean, I'm sure it's amazing work, just not in the way people think it is.)

I don't understand why people keep buying into Java and JavaScript hype hook, line, and sinker. How many JIT implementations have HotSpot and V8 chewed through? Remember a couple of years ago when Google brought aboard Lars Bak? There was a huge, highly technical media campaign describing all the awesome and revolutionary techniques being applied to accelerate JavaScript. Then a year later half of those techniques were silently discarded. People rarely talk about the costs, just the benefits. Benefits are easy to describe (fast!); costs are more difficult to describe (both technically and because it deflates egos and expectations).

Which is not to imply that HotSpot and V8 don't have incredibly powerful and performant JIT compilers. They're amazing feats. Just not quite as amazing as people think they are. Their language specifications impose costs that can't just be willed away, no matter how many engineers you throw at the problem, and no matter how many graduate thesis papers get written.

Good opportunity for Java update

Posted Jul 17, 2012 5:47 UTC (Tue) by Cyberax (✭ supporter ✭, #52523) [Link] (4 responses)

Azul uses dirty VM tricks to achieve incredible performance feats. They're incredibly inefficient on x86-based hardware, which is why it requires a lot of RAM. And since most of their customers are financial institutions, they don't care much about it.

Azul also produces their own hardware with a hardware acceleration of garbage collection. It works MUCH better.

Good opportunity for Java update

Posted Jul 17, 2012 8:16 UTC (Tue) by man_ls (guest, #15091) [Link] (3 responses)

Throwing memory at problems is always a good way to speed them up. That is essentially what IBM does with WebSphere and its many Frankensteinian by-products: initialize everything in advance, take ages to start and consume lots of RAM. After that it goes fast. It looks like a good idea until you find out that dev machines also have to be top-of-the-line machines with 4 GB of RAM. That may not look like much today, but not everyone can buy hardware every two years, or provide VMs with so much memory.

Good opportunity for Java update

Posted Jul 17, 2012 8:24 UTC (Tue) by Cyberax (✭ supporter ✭, #52523) [Link] (2 responses)

It's not that they need a lot of RAM as such; it's just that their GC algorithms have very high overhead. In fact, large amounts of RAM with normal GCs are a recipe for failure because of lengthy pauses during full collections.

Good opportunity for Java update

Posted Jul 17, 2012 8:29 UTC (Tue) by man_ls (guest, #15091) [Link] (1 responses)

Yes, GC can be a problem if you allocate and free large amounts of RAM (especially in a lot of objects or string manipulations). The IBM trick is to never release the objects (i.e. keep object references around) so the GC does not have to work.

I understand that what Azul does is keep lots of object references so GC doesn't have to do much work. It might be inferred that object allocations must also be penalized, right?
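
The "keep references around" approach described above is essentially object pooling. The generic `ObjectPool` below is a minimal, single-threaded sketch of the idea, not IBM's or Azul's actual implementation: released objects stay reachable in a free list and are handed out again, so steady-state work creates little new garbage.

```java
import java.util.ArrayDeque;
import java.util.function.Supplier;

// Minimal single-threaded object pool: released objects stay reachable in a
// free list, so the collector never has to reclaim them.
class ObjectPool<T> {
    private final ArrayDeque<T> free = new ArrayDeque<>();
    private final Supplier<T> factory;
    private int created;

    ObjectPool(Supplier<T> factory) {
        this.factory = factory;
    }

    T acquire() {
        T obj = free.poll();     // reuse a previously released object, if any
        if (obj == null) {
            obj = factory.get(); // otherwise allocate a new one
            created++;
        }
        return obj;
    }

    void release(T obj) {
        free.push(obj);          // keep a reference so it never becomes garbage
    }

    int createdCount() {
        return created;
    }

    public static void main(String[] args) {
        ObjectPool<StringBuilder> pool = new ObjectPool<>(StringBuilder::new);
        for (int i = 0; i < 1_000; i++) {
            StringBuilder sb = pool.acquire();
            sb.setLength(0);     // callers must reset pooled state themselves
            sb.append("request ").append(i);
            pool.release(sb);
        }
        System.out.println(pool.createdCount()); // prints 1
    }
}
```

The trade-off is that pooled objects can live long enough to be promoted to the old generation of a generational collector, and every caller must reset their state by hand, so pooling is not automatically a win.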

Good opportunity for Java update

Posted Jul 17, 2012 8:44 UTC (Tue) by Cyberax (✭ supporter ✭, #52523) [Link]

Nope, they need a lot of RAM to make their GC pauseless (in tech speak: to allow the mutator to work in parallel with the collector). They do this by simulating forwarding pointers using virtual memory tricks.
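
Forwarding pointers let a collector move an object while the application (the mutator) keeps running: each object refers to its current copy, and every access follows that reference. The toy `Cell` class below models this Brooks-style read barrier in plain Java with an `AtomicReference`. It is a conceptual sketch only, not how Azul implements the idea (which, per the comment above, relies on virtual memory tricks), and it ignores the extra protocol real collectors need so that writes made during a copy are not lost.

```java
import java.util.concurrent.atomic.AtomicReference;

// Toy model of a Brooks-style forwarding pointer (conceptual only).
final class Cell {
    // Points at the current copy of this object; initially the object itself.
    final AtomicReference<Cell> forward = new AtomicReference<>();
    volatile int value;

    Cell(int value) {
        this.value = value;
        forward.set(this);
    }

    // Read barrier: follow forwarding pointers until reaching the current copy.
    Cell resolve() {
        Cell current = this;
        Cell next = current.forward.get();
        while (next != current) {
            current = next;
            next = current.forward.get();
        }
        return current;
    }

    // Mutator-side read, always performed through the barrier.
    int read() {
        return resolve().value;
    }

    // Collector-side move: copy the object, then atomically install the
    // forwarding pointer. If another thread already moved it, use that copy.
    // Concurrent writes during the copy are NOT handled in this toy.
    static Cell evacuate(Cell from) {
        Cell copy = new Cell(from.value);
        if (from.forward.compareAndSet(from, copy)) {
            return copy;
        }
        return from.resolve();
    }

    public static void main(String[] args) {
        Cell original = new Cell(42);
        Cell moved = evacuate(original);
        // Old references still work: reads are redirected to the new copy.
        System.out.println(original.read() + " " + (original.resolve() == moved)); // prints "42 true"
    }
}
```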


Copyright © 2026, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds