Good opportunity for Java update

Posted Jul 16, 2012 9:07 UTC (Mon) by man_ls (subscriber, #15091)
In reply to: Good opportunity for Java update by pron
Parent article: Galaxy in-memory data grid released

> Can you elaborate?
Sure. The virtual machine allocates memory in chunks, for efficiency; but this means that the host operating system is often not able to optimize its use of pages. For example, allocations done using mmap on Linux can be tracked and optimized, but the JVM is a black box as far as the Linux kernel is concerned.

The same goes for all other low-level optimizations, but it is most noticeable for memory allocation. For example, there was talk here on LWN about making Xen guests communicate better with the host operating system about their memory needs. Running a (virtual) operating system inside another is bad enough; running a (virtual) machine inside another adds even more memory overhead. Of course, if you are not worried about memory then it is no problem; but big-data applications, for example, do have to worry, since they eat memory up like crazy.
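To make the "chunks" point concrete, here is a small Java sketch that uses only the standard Runtime API (the class name HeapChunks is hypothetical). totalMemory() is the heap the JVM has currently committed, freeMemory() is the unused part of it, and maxMemory() is the ceiling it may grow to. Individual objects are carved out of that committed heap without a system call per object, so the kernel sees a few large regions rather than the application's allocations. The exact numbers depend on the JVM, the collector and heap flags such as -Xms/-Xmx.

```java
// Hypothetical HeapChunks: compares the heap the JVM holds from the OS
// ("committed") with the part of it actually occupied by objects.
class HeapChunks {
    /** Bytes of heap the JVM has currently obtained from the OS. */
    static long committedBytes() {
        return Runtime.getRuntime().totalMemory();
    }

    /** Approximate bytes of the committed heap occupied by objects. */
    static long usedBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    public static void main(String[] args) {
        long before = usedBytes();
        // Many small allocations: served from memory the JVM already holds,
        // not through one mmap/brk call per object.
        int[][] blocks = new int[10_000][];
        for (int i = 0; i < blocks.length; i++) {
            blocks[i] = new int[16];
        }
        long after = usedBytes();
        System.out.printf("committed heap: %d KiB%n", committedBytes() / 1024);
        System.out.printf("used before: %d KiB, after: %d KiB%n", before / 1024, after / 1024);
        System.out.printf("max heap: %d KiB%n", Runtime.getRuntime().maxMemory() / 1024);
        // Keep the arrays reachable until after the measurement.
        System.out.println("allocated " + blocks.length + " arrays");
    }
}
```

Running it with different -Xms values shows the committed figure moving in large steps while the used figure tracks the program's objects.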

> Simple thread synchronization is just as slow as on other platforms.
Not really true; Java forces the generic semaphore implementation, which cannot compete with things like spinlocks. IIRC using synchronized imposed a penalty of about 100ms.
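For comparison, a spinlock can be built on the JVM directly from a CAS primitive. Below is a minimal sketch (SpinLock and LockComparison are hypothetical names) using AtomicBoolean.compareAndSet, next to the same shared counter guarded by a plain synchronized block. It only checks that both are correct, not which is faster: HotSpot already implements uncontended monitors with CAS and spins briefly before blocking a contended thread, so the winner has to be measured on the actual workload.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Minimal test-and-test-and-set spinlock built on CAS.
class SpinLock {
    private final AtomicBoolean locked = new AtomicBoolean(false);

    void lock() {
        // Try to flip the flag false -> true; retry until we win.
        while (!locked.compareAndSet(false, true)) {
            // Spin on a volatile read so waiters don't keep issuing CAS
            // instructions on the shared cache line.
            while (locked.get()) { }
        }
    }

    void unlock() {
        locked.set(false); // volatile write publishes the critical section
    }
}

class LockComparison {
    private static final SpinLock SPIN = new SpinLock();
    private static final Object MONITOR = new Object();
    private static long spinCount = 0;    // guarded by SPIN
    private static long monitorCount = 0; // guarded by MONITOR

    // `threads` threads each increment a shared counter `perThread` times.
    static long countWithSpinLock(int threads, int perThread) {
        spinCount = 0;
        runAll(threads, () -> {
            for (int i = 0; i < perThread; i++) {
                SPIN.lock();
                try {
                    spinCount++;
                } finally {
                    SPIN.unlock();
                }
            }
        });
        return spinCount;
    }

    static long countWithSynchronized(int threads, int perThread) {
        monitorCount = 0;
        runAll(threads, () -> {
            for (int i = 0; i < perThread; i++) {
                synchronized (MONITOR) {
                    monitorCount++;
                }
            }
        });
        return monitorCount;
    }

    private static void runAll(int threads, Runnable task) {
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(task);
            workers[t].start();
        }
        try {
            for (Thread w : workers) {
                w.join(); // join() also makes the workers' writes visible here
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(countWithSpinLock(4, 100_000));
        System.out.println(countWithSynchronized(4, 100_000));
    }
}
```

Spinlocks only pay off when critical sections are very short; under heavy contention they burn CPU that a blocking monitor would give back to the scheduler.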
> But the JVM gives you access to low-level synchronization like memory fences and CAS-es, and fork-join is plain awesome.
Thanks, I didn't know that. From what I've read, fork-join is a variety of map-reduce, which is available for other languages, right? I've used map-reduce in JavaScript on MongoDB, in a completely different context. Having easy access to the fork and join functions is interesting. Of course, the devil is in the details.
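For reference, this is roughly what the fork/join framework in java.util.concurrent (Java 7 and later) looks like: a RecursiveTask splits its range, fork()s one half, computes the other half itself and join()s the result. Like map-reduce it splits the work and combines partial results, but it runs inside a single JVM on a work-stealing thread pool. ForkJoinSum is a hypothetical name, the THRESHOLD value is an arbitrary assumption, and ForkJoinPool.commonPool() needs Java 8.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Sums a long[] by recursively splitting it (fork) and combining partial sums (join).
class ForkJoinSum extends RecursiveTask<Long> {
    private static final int THRESHOLD = 10_000; // assumed cut-off; tune per workload
    private final long[] data;
    private final int from, to; // half-open range [from, to)

    ForkJoinSum(long[] data, int from, int to) {
        this.data = data;
        this.from = from;
        this.to = to;
    }

    @Override
    protected Long compute() {
        if (to - from <= THRESHOLD) {
            // Small enough: sum sequentially.
            long sum = 0;
            for (int i = from; i < to; i++) {
                sum += data[i];
            }
            return sum;
        }
        int mid = (from + to) >>> 1;
        ForkJoinSum left = new ForkJoinSum(data, from, mid);
        ForkJoinSum right = new ForkJoinSum(data, mid, to);
        left.fork();                     // schedule the left half asynchronously
        long rightSum = right.compute(); // work on the right half in this thread
        return left.join() + rightSum;   // wait for the left half and combine
    }

    static long sum(long[] data) {
        return ForkJoinPool.commonPool().invoke(new ForkJoinSum(data, 0, data.length));
    }

    public static void main(String[] args) {
        long[] data = new long[1_000_000];
        for (int i = 0; i < data.length; i++) {
            data[i] = i + 1;
        }
        System.out.println(sum(data)); // 500000500000
    }
}
```

Forking one half and computing the other in the current thread keeps the worker busy instead of parking it while both subtasks run elsewhere.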
> Azul even has what they call a "pauseless" GC, that has a (quite low) upper bound on GC pauses. As for functional programming - the Java language doesn't have that, but plenty of JVM languages do, some of them (particularly Clojure) are quite elegant.
Are you actually using Azul, which is a proprietary implementation? And are you actually using Clojure or Scala?
> So when you add everything up and also consider JVM monitoring, I think that for most large, performance-wary application, choosing anything but the JVM requires explanation.
This point of view is quite common in the Java world, and I can understand why some people complain about it (e.g. the "virtual machine trap" mentioned by nix). Paraphrasing, it goes like this: "the JVM ecosystem solves all problems: first concern is addressed by [x proprietary system], second concern by [y weird functional-only language], third by [z optional Oracle-only feature], fourth by [w optimization on a proprietary operating system]. And the JVM is open source and has several Free implementations!" So on paper "Java" (or rather the JVM) solves all problems. But in reality, those concerns cannot be all solved at the same time. What you usually have is a Java monoculture using tens or hundreds of Apache libraries and a few proprietary ones on the Oracle JVM, running on just one operating system for fear of having it all crumble upon you.

That is not what we are used to in the Free software world. Perl, Python, C, C++ (and even stepchildren like PHP or JavaScript) are fiercely free and include everything needed for development from day one. If an optimization is available on one platform, it is usually available on all of them. The languages solve only one subset of problems, but that subset is always solved. The toolchain is also free, and there are no proprietary augmentations; or rather, people do not use them in arguments like "C++ is great (with Intel's compiler)" or "Python rules (with Apple's extensions)". The few who do are quickly disregarded.

For me, strong typing, compilation and the lack of functional programming are enough to rule Java out, and the functional JVM languages are not attractive either (the pool of developers is limited). I suspect many other people are in the same situation, since it is unusual to see startups use Java as their primary platform. But I am intrigued by, e.g., Twitter using Scala. Perhaps the cost of entry is high but, as you say, the JVM is a good choice for large applications. If so, it would be positioned as a replacement for C++, not for server-side web applications, if I am reading the situation correctly.


Good opportunity for Java update

Posted Jul 16, 2012 10:55 UTC (Mon) by jezuch (subscriber, #52988)

> Not really true; Java forces the generic semaphore implementation, which cannot compete with things like spinlocks. IIRC using synchronized imposed a penalty of about 100ms.

I'm curious: where did you get this number? I'm pretty sure that if it was true, I would have noticed it by now.

Good opportunity for Java update

Posted Jul 16, 2012 11:01 UTC (Mon) by man_ls (subscriber, #15091)

I did my own benchmark, and I am citing from memory. It was a long time ago; things have probably improved a lot since 2004 or so...

Good opportunity for Java update

Posted Jul 16, 2012 16:58 UTC (Mon) by raven667 (subscriber, #5198)

Probably 8-)

One thing to point out is that the environments with the most optimization effort being put into them are the JVM and the various JavaScript runtimes. All the resources spent on research and implementation show: in the benchmarks I've seen, Java and JS tend to bubble up near the top, closer to compiled languages than other VM environments such as Perl, Python, PHP or Ruby.

Good opportunity for Java update

Posted Jul 16, 2012 21:33 UTC (Mon) by man_ls (subscriber, #15091)

I think that my results might have been 100ns per synchronization, not 100ms. Sorry about the confusion; funny how memory can make you twist things around to suit your arguments. So let's check out how things behave in practice.

This bit of data seems to suggest that things have improved about fourfold since then. My own results are a bit higher than those on the page (Core i5, IcedTea6): 0.035s vs 0.002s for 1M synchronizations, suggesting a cost of about 30ns per synchronization. Calling a synchronized method is even cheaper, at about 20ns. Not too shabby...

My updated code yields:

1M computations: 0.035s
1M synchronized code blocks: 0.004s
1M synchronized method calls: 0.023s
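The code behind those numbers is not shown, so the following is only a hypothetical reconstruction of that kind of measurement (SyncCost is an invented name): N plain increments, N increments inside a synchronized block, and N calls to a synchronized method, each timed with System.nanoTime(). Naive single-threaded loops like this are sensitive to JIT warm-up, lock coarsening and dead-code elimination, which is why main() repeats the run a few times; for serious numbers a harness such as JMH is the usual tool.

```java
// Hypothetical SyncCost: naive single-threaded timing of plain increments,
// synchronized blocks and synchronized method calls (not the original code).
class SyncCost {
    static final int N = 1_000_000;
    private static final Object LOCK = new Object();
    private static long counter = 0;

    // Static synchronized method: locks on SyncCost.class.
    private static synchronized void incrementSynchronized() {
        counter++;
    }

    // Converts a total elapsed time for n operations into nanoseconds per operation.
    static double nsPerOp(double seconds, int n) {
        return seconds / n * 1e9;
    }

    // Runs the three loops once, prints their timings, returns the final counter.
    static long runAll() {
        counter = 0;
        long t0 = System.nanoTime();
        for (int i = 0; i < N; i++) {
            counter++;                 // baseline: plain computation
        }
        long t1 = System.nanoTime();
        for (int i = 0; i < N; i++) {
            synchronized (LOCK) {      // monitor enter/exit on LOCK
                counter++;
            }
        }
        long t2 = System.nanoTime();
        for (int i = 0; i < N; i++) {
            incrementSynchronized();   // monitor enter/exit on SyncCost.class
        }
        long t3 = System.nanoTime();

        report("1M computations", t1 - t0);
        report("1M synchronized code blocks", t2 - t1);
        report("1M synchronized method calls", t3 - t2);
        return counter;
    }

    private static void report(String label, long nanos) {
        double seconds = nanos / 1e9;
        System.out.printf("%s: %.3fs (%.1f ns/op)%n", label, seconds, nsPerOp(seconds, N));
    }

    public static void main(String[] args) {
        // Later rounds run after the JIT has compiled the loops.
        for (int round = 0; round < 5; round++) {
            runAll();
        }
    }
}
```

Dividing a total by the number of operations gives the per-operation cost, e.g. 0.023s / 1,000,000 ≈ 23ns; subtracting the baseline loop's time from a synchronized loop's time isolates the cost of the lock itself.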

Copyright © 2013, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds