
Facebook's "HipHop" PHP translator

Facebook has announced the release of its "HipHop" tool under the PHP license. "HipHop for PHP isn't technically a compiler itself. Rather it is a source code transformer. HipHop programmatically transforms your PHP source code into highly optimized C++ and then uses g++ to compile it. HipHop executes the source code in a semantically equivalent manner and sacrifices some rarely used features - such as eval() - in exchange for improved performance. HipHop includes a code transformer, a reimplementation of PHP's runtime system, and a rewrite of many common PHP Extensions to take advantage of these performance optimizations." These optimizations are said to double the speed of PHP code.


Facebook's "HipHop" PHP translator

Posted Feb 2, 2010 19:34 UTC (Tue) by ncm (guest, #165) [Link] (49 responses)

There's no difference between a "compiler" and a "source code transformer". It doesn't matter if you "transform" PHP to C++, or C++ to assembly, or (as Digital did in the '90s) x86 machine code to Alpha machine code. It's the same process, with the same problems, the same algorithms, the same solutions, and the same outcome. The only difference, for less direct routes, is lost optimization opportunities.

It's disappointing that this thing only doubles the speed of the code, rather than making it 100x faster, or 20x. There should be another 10x worth of easy pickings lying there on the table.

Facebook's "HipHop" PHP translator

Posted Feb 2, 2010 19:38 UTC (Tue) by dskoll (subscriber, #1630) [Link] (16 responses)

I'm also surprised it's only a 2x improvement. That either means the PHP interpreter is very efficient (*snigger*) or that most of the slowness is in library functions implementing all the PHP built-in functions.

(Or they didn't benchmark it properly and something else has become the bottleneck.)

Facebook's "HipHop" PHP translator

Posted Feb 2, 2010 20:09 UTC (Tue) by elanthis (guest, #6227) [Link] (14 responses)

Or, I dunno, the language semantics. But hey, you armchair compiler/runtime enthusiasts go
on thinking that C/C++ gives a 20x performance advantage when executing dynamically-typed
language semantics. I mean, obviously the compiler can just turn $x+$y into a single
machine op, right? It's not like it has to translate that into a function call that dispatches into
the correct + operator implementation for the operands after possibly converting them to
compatible types at runtime or anything, and then encapsulating the result into a dynamic
type. Nice and simple machine code translation like native C math just isn't even remotely
possible here, guys. That's why there is so much hoopla over tracing JIT engines for
dynamic languages, although a tracing engine for a statically-typed language is still even
faster.
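
To make the dispatch cost described above concrete, here is a minimal sketch in Python. It is not HipHop's or PHP's actual runtime: `Value`, `to_number`, and `dynamic_add` are hypothetical names, and the coercion rules are heavily simplified compared with real PHP. It only shows the steps a generic add has to perform on boxed values: inspect the tags, coerce, compute, and box the result again.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Value:
    """A boxed dynamic value: a type tag plus a payload (hypothetical layout)."""
    tag: str         # "int", "float" or "string"
    payload: object


def to_number(v: Value) -> Value:
    """Coerce an operand to int/float at run time (simplified, not real PHP rules)."""
    if v.tag in ("int", "float"):
        return v
    if v.tag == "string":
        text = v.payload.strip()
        try:
            return Value("int", int(text))
        except ValueError:
            pass
        try:
            return Value("float", float(text))
        except ValueError:
            return Value("int", 0)  # assumption: non-numeric text counts as 0
    raise TypeError(f"unsupported operand tag: {v.tag}")


def dynamic_add(x: Value, y: Value) -> Value:
    """Generic '+': dispatch on the run-time tags, then box the result."""
    a, b = to_number(x), to_number(y)
    if a.tag == "int" and b.tag == "int":
        return Value("int", a.payload + b.payload)
    return Value("float", float(a.payload) + float(b.payload))
```

Every call pays for the tag checks and the re-boxing even when both operands turn out to be plain integers, which is the overhead the comment is pointing at.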

Facebook's "HipHop" PHP translator

Posted Feb 2, 2010 20:15 UTC (Tue) by JoeBuck (subscriber, #2330) [Link] (3 responses)

It's possible to do better. But to win, you need to have good enough dataflow analysis to know that, for example, $x is always an integer and $y is always a string, so your compiled code can omit the dynamic dispatch.

Facebook's "HipHop" PHP translator

Posted Feb 2, 2010 23:33 UTC (Tue) by bk (guest, #25617) [Link] (2 responses)

Right, so your whole server blows up when a malicious user enters "$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$#!/bin/sh\nrm-rf" instead of "1/1/72".

Facebook's "HipHop" PHP translator

Posted Feb 3, 2010 9:51 UTC (Wed) by alankila (guest, #47141) [Link]

The point of the dataflow analysis is to know when that is possible and act accordingly.

Facebook's "HipHop" PHP translator

Posted Feb 4, 2010 23:48 UTC (Thu) by efexis (guest, #26355) [Link]

No, but now I have to horizontally scroll to read this thread... thanks for that :-/

Facebook's "HipHop" PHP translator

Posted Feb 2, 2010 20:52 UTC (Tue) by dskoll (subscriber, #1630) [Link] (9 responses)

> I mean, obviously the compiler can just turn $x+$y into a single machine op, right?

Did you read the article? Among other things, it claimed:

> Whenever possible our generated code uses static binding for functions and variables. We also use type inference to pick the most specific type possible for our variables and thus save memory.

So yes, I would think that in a lot of cases, it could generate very fast code for $x+$y.

Facebook's "HipHop" PHP translator

Posted Feb 3, 2010 0:09 UTC (Wed) by elanthis (guest, #6227) [Link] (8 responses)

Reading an article and a site's home page does not make you an expert,
clearly. Let's pretend I don't have actual real experience with this kind
of tech and just think this through logically instead of quoting soundbites
from a web page. If you are writing code like:

$x = 1;
$y = 3;
$z = $x + $y;

Then sure, it can use type inference. When you have something like:

function foo($x, $y) {
    return $x + $y;
}

Then the compiler has no freaking clue what is going to be passed in.

Generally, I find, people do not create variables to hold static values,
but instead they tend to hold data from various other sources, many of
which by PHP semantics simply cannot have their types inferred. Some code
can be highly optimized by such techniques, some cannot, and the latter
just happens to be more common in at least the apps I've worked with.

Unless the compiler is compiling multiple versions of these functions (like
how a tracing JIT compiler might), then most of the code in a real-life app
(not a micro-benchmark) is going to be ineligible for type lowering, and
the runtime has to fall back on full dynamic type semantics.

This is one reason that, no matter what advances the dynamic language folks
working on JS or Python or whatever might possibly make, the speed of a
statically-typed language can always be greater by just using the same
techniques the dynamic runtimes use but without the type guards (for
tracing implementations) or dynamic fallbacks.
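
The "type guard plus dynamic fallback" structure mentioned above can be sketched as follows. This is an illustrative Python model only, with hypothetical function names; in Python itself the guard buys nothing, but the shape is what a specializing compiler or tracing JIT would emit for `foo($x, $y)`.

```python
def add_generic(x, y):
    """Slow path: full dynamic semantics (simplified to numbers and numeric strings)."""
    def coerce(v):
        if isinstance(v, (int, float)) and not isinstance(v, bool):
            return v
        if isinstance(v, str):
            try:
                return int(v)
            except ValueError:
                return float(v)  # raises ValueError for non-numeric text
        raise TypeError(f"unsupported operand: {v!r}")
    return coerce(x) + coerce(y)


def add_int_int(x, y):
    """Fast path: valid only when both operands are known to be ints."""
    return x + y


def add_guarded(x, y):
    """What a dynamic runtime can emit: check the guard, else fall back."""
    if type(x) is int and type(y) is int:   # the type guard
        return add_int_int(x, y)
    return add_generic(x, y)                # the dynamic fallback
```

A compiler for a statically typed language that already knows both operands are integers could call the equivalent of `add_int_int` directly and skip the guard, which is the advantage the comment describes.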

Facebook's "HipHop" PHP translator

Posted Feb 3, 2010 10:31 UTC (Wed) by PO8 (guest, #41661) [Link] (3 responses)

Static analysis techniques used in the last 20 years can generally infer the type of parameters to a procedure or function call from context. Often they can do it even without looking at the whole program. They are good at looking at small clues and following chains of reasoning.

I'm a huge fan of static typing, but from an efficiency point of view it's just not a huge win in 2010. There are too many well-known clever tricks for making dynamically-typed languages run fast. I like static typing because it serves as a kind of formal analysis that increases the odds of my writing a correct program. If it didn't do that I'd have ditched it for dynamic typing long ago.

Facebook's "HipHop" PHP translator

Posted Feb 3, 2010 12:06 UTC (Wed) by HelloWorld (guest, #56129) [Link] (2 responses)

If it is "not a huge win in 2010", then why are statically typed languages still vastly faster in any benchmark?
For example, compare JavaScript in Google's V8 to Haskell:
http://shootout.alioth.debian.org/u32/benchmark.php?test=...
JavaScript is between three and one hundred times slower. And this is a state-of-the-art VM using tracing and all that.

Facebook's "HipHop" PHP translator

Posted Feb 3, 2010 14:50 UTC (Wed) by darthscsi (guest, #8111) [Link] (1 responses)

Google's V8 implementation of JavaScript is not the fastest dynamically
typed language implementation. PLT Scheme is hardly the fastest
implementation of Scheme and it handily beats Google's V8.

http://shootout.alioth.debian.org/u32/benchmark.php?test=all&lang=v8&lang2=mzscheme

I wish Chez Scheme was on the list, as it's the fastest scheme I know of,
but it is a commercial implementation. It does all sorts of tricks with
object padding, stashing type tags for the common types inside pointers,
placing objects in per type regions (middle bits become type tags), type
inference and unboxing, and more.
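
As a rough illustration of "stashing type tags inside pointers", the sketch below models a machine word as a Python integer and keeps a tag in the low bits that an 8-byte-aligned address leaves unused. The tag values and helper names are assumptions made for this example; this is not Chez Scheme's actual object layout.

```python
TAG_BITS = 3                      # 8-byte alignment leaves the low 3 bits free
TAG_MASK = (1 << TAG_BITS) - 1

TAG_FIXNUM = 0b000                # small integers stored directly in the word
TAG_PAIR = 0b001                  # pointer to a heap-allocated pair


def box_fixnum(n: int) -> int:
    """Store a small integer in the word itself, shifted past the tag bits."""
    return (n << TAG_BITS) | TAG_FIXNUM


def unbox_fixnum(word: int) -> int:
    return word >> TAG_BITS


def tag_pointer(address: int, tag: int) -> int:
    """Combine an aligned address with a type tag."""
    if address & TAG_MASK:
        raise ValueError("address must be 8-byte aligned")
    return address | tag


def tag_of(word: int) -> int:
    return word & TAG_MASK


def address_of(word: int) -> int:
    return word & ~TAG_MASK


def add_fixnums(a: int, b: int) -> int:
    """With a zero tag, adding two tagged fixnums is one plain integer add."""
    if tag_of(a) != TAG_FIXNUM or tag_of(b) != TAG_FIXNUM:
        raise TypeError("both operands must be fixnums")
    return a + b
```

The point of the zero fixnum tag is that the common integer case needs a tag check and a single add, with no heap allocation and no unboxing.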

Facebook's "HipHop" PHP translator

Posted Feb 5, 2010 20:47 UTC (Fri) by HelloWorld (guest, #56129) [Link]

Yes, perhaps it is possible to make a dynamically typed language almost as fast as a statically typed one. The question is, why bother? It makes the implementation much more complex in order to achieve the questionable goal of dynamic typing. I don't think that's a good idea.

Facebook's "HipHop" PHP translator

Posted Feb 3, 2010 14:08 UTC (Wed) by dskoll (subscriber, #1630) [Link] (3 responses)

> function foo($x, $y) { return $x + $y; }
>
> Then the compiler has no freaking clue what is going to be passed in.

Possibly. But still, I'd expect a better than 2x speedup. Parsing the PHP code, especially all the included and required files has to be pretty slow.

Also, the "+" operator in PHP can only work on numbers. So $x and $y are either integers or floats; sorting out the possible cases is pretty quick.

Facebook's "HipHop" PHP translator

Posted Feb 3, 2010 15:10 UTC (Wed) by liljencrantz (guest, #28458) [Link] (2 responses)

Parsing is done ahead of time, so it is in fact more or less free. The plus operator also works on arrays.

Facebook's "HipHop" PHP translator

Posted Feb 3, 2010 15:18 UTC (Wed) by dskoll (subscriber, #1630) [Link] (1 responses)

> Parsing is done ahead of time, so it is in fact more or less free.

Eh? Not in standard PHP. Parsing is free only if you're using an opcode cache like APC or similar.

> The plus operator also works on arrays.

So it does. Still, it's relatively restricted. It doesn't work on strings or objects.
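
For readers unfamiliar with `+` on PHP arrays: it is a union that keeps the left-hand value whenever a key exists on both sides and appends the remaining right-hand entries. A minimal Python model of that behaviour, using an insertion-ordered `dict` to stand in for a PHP array (the function name is hypothetical):

```python
def php_array_union(left: dict, right: dict) -> dict:
    """Model of PHP's `$left + $right` for arrays.

    Keys already present in `left` keep their left-hand value; keys that
    appear only in `right` are appended in their original order.
    """
    result = dict(left)
    for key, value in right.items():
        result.setdefault(key, value)
    return result
```

So even once the operands are known to be arrays, the operator has more work to do than a numeric add, and the runtime still has to tell the two cases apart.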

Facebook's "HipHop" PHP translator

Posted Feb 4, 2010 14:17 UTC (Thu) by jond (subscriber, #37669) [Link]

I think it can be assumed a site dealing with "400 billion PHP-based page
views every month" is using an opcode cache already.

Facebook's "HipHop" PHP translator

Posted Feb 3, 2010 9:17 UTC (Wed) by srepmub (guest, #47230) [Link]

the speedup one can get from compiling a (restricted) dynamic language varies greatly, depending on what a given program is doing. if a program is IO or memory bound, or string intensive (serving web pages sure sounds like it would be string intensive), there may not be a significant speedup at all. the example set of shed skin, a (restricted-)Python-to-C++ compiler clearly demonstrates this: while most programs become a lot faster (up to 350 times), some actually become slower.

Facebook's "HipHop" PHP translator

Posted Feb 2, 2010 22:00 UTC (Tue) by ejr (subscriber, #51652) [Link] (5 responses)

Remember, Scheme->C worked from a well-defined language. This tool targets the equivalent for PHP, which is, um, not easily defined.

Of course PHP->Scheme or PHP->Lisp would have benefited from all those years of optimization in other systems. But it would be seen as a dead-end and impossible to use in "real life". Some enterprising PHP hoster could base such a system off of the HipHop work, show performance advantages over other hosters, and...

Lisp

Posted Feb 2, 2010 23:06 UTC (Tue) by ncm (guest, #165) [Link]

It would be seen as very, very funny -- particularly by Lisp fans. Or, possibly, just in bad taste. A Forth -> Lisp translator would be funnier.

Facebook's "HipHop" PHP translator

Posted Feb 2, 2010 23:48 UTC (Tue) by chax (guest, #52122) [Link] (3 responses)

And this is what the Roadsend PHP Compiler is doing (php -> scheme (bigloo) ->
c). Roadsend is now producing a new compiler based on llvm.

Facebook's "HipHop" PHP translator

Posted Feb 3, 2010 1:07 UTC (Wed) by ejr (subscriber, #51652) [Link]

I suspect many of these would better be served by working atop a JavaScript runtime rather than dropping all the way to LLVM. The compiling JS runtimes already handle runtime specialization that would benefit PHP, and JS likely is a closer semantic match type-wise. LLVM's specific strengths over other JIT engines are in expressing vector operations, imho.

But I'm glad someone went a more sensible initial route to re-use an existing dynamic language implementation (although gambit's more my preference). Thanks!

Lisp comedy

Posted Feb 3, 2010 1:48 UTC (Wed) by ncm (guest, #165) [Link] (1 responses)

It writes itself.

Lisp comedy

Posted Feb 3, 2010 10:34 UTC (Wed) by PO8 (guest, #41661) [Link]

(! LOL)

Facebook's "HipHop" PHP translator

Posted Feb 2, 2010 22:01 UTC (Tue) by xorbe (guest, #3165) [Link] (22 responses)

> There's no difference between ...
> The only difference ...

Anyways, most people consider a compiler to generate something a bit lower level than C++, and would consider "source code transformation" to be more informative about its true nature.

Facebook's "HipHop" PHP translator

Posted Feb 2, 2010 22:19 UTC (Tue) by ccshan (guest, #2723) [Link] (21 responses)

Your quotes are out of context; please read what you quoted again. There's no difference between a compiler and a source-code transformer. There are differences between a compiler/source-code-transformer that uses less direct routes and one that uses more direct routes. A compiler can use less or more direct routes just as a source-code transformer can use less or more direct routes, because, after all, there is no difference between a compiler and a source-code transformer.

Separately, you suggested that a compiler is considered "to generate something a bit lower level than C++". According to this definition, and assuming that assembly code is "lower level than C++", you would call a preprocessor for assembly code (such as HLA "the High-Level Assembler" perhaps) a compiler, right? After all, it generates assembly code.

Facebook's "HipHop" PHP translator

Posted Feb 3, 2010 4:41 UTC (Wed) by xorbe (guest, #3165) [Link] (20 responses)

What? Yeah so technically the terms are interchangeable, but I'd use compiler, source transform, and preprocessor as others would expect within a tool chain.

PHP code -> source code transform -> g++ compiler

That's how 99.9% would talk about it.

PHP code -> compiler -> compiler

That's you confusing the 99.9%! That's it, I'm referring to all input/output programs as transmogrification binaries!

A compiler's a compiler, no matter how small

Posted Feb 3, 2010 7:11 UTC (Wed) by ncm (guest, #165) [Link] (19 responses)

99.9% of statistics are made up on the spot.

Regardless, a program that takes source code and gives you a binary executable is a compiler. If it happens to use G++ inside, that's none of the user's business. So, 99.9% of people who have the faintest idea what a compiler is (i.e., 0.1% of normal people) would call it a compiler, because that's what it is. Be proud, you're in the remaining 0.001% of the population who call cars "four-wheeled self-propelled vehicles" and hamburgers "layered bovine handmeals" or something.

A compiler's a compiler, no matter how small

Posted Feb 3, 2010 10:25 UTC (Wed) by dgm (subscriber, #49227) [Link] (18 responses)

Close, but no cookie. A program that takes source code and gives you an executable binary is usually a compiler+assembler+linker, or at least a compiler+linker.
There's more than opcodes in an executable file.

A compiler's a compiler, no matter how small

Posted Feb 3, 2010 19:55 UTC (Wed) by ncm (guest, #165) [Link] (17 responses)

Gcc produces binary executables. Gcc uses gas and gold underneath, and that's none of my business. Gcc is a compiler.

Enjoy your layered bovine handmeals.

Compiling is not translating

Posted Feb 4, 2010 22:15 UTC (Thu) by man_ls (guest, #15091) [Link] (16 responses)

The difference is important. I am currently trying to write a program to translate Java code to Python code, and no sane mind would call that a compiler. Even if it generates code and then tries to run it. I don't know if "code transformer" is a good description, but "compiler" is not it. I prefer calling the process "translation" myself.

My mother makes hamburgers with 50% beef, 50% pork. They are delicious. You may choose to call them something else than "hamburgers" and I will not be offended.

Compiling is not translating

Posted Feb 4, 2010 23:19 UTC (Thu) by nix (subscriber, #2304) [Link] (15 responses)

Indeed. Compilers are a special kind of translator, not vice versa. (GCC
is a translator: so is your Java->Python transformer.)

Compiling is not translating

Posted Feb 4, 2010 23:22 UTC (Thu) by ccshan (guest, #2723) [Link] (14 responses)

> Compilers are a special kind of translator
So, which special kind is that?

Compiling is not translating

Posted Feb 4, 2010 23:53 UTC (Thu) by nix (subscriber, #2304) [Link] (13 responses)

A translator which targets something which can be executed, typically.
Things are blurred somewhat when one considers compilers that yield
various forms of bytecode which are then interpreted by something else.

Translators can yield all sorts of output: e.g. IDL 'compilers' which in
one mode of operation can take IDL descriptions and yield (say) C++ header
files (which do not themselves produce executable code, in general). Hell,
even TeX counts -- although it's a very strange translator (parser? what's
that? we don't use normal terminology here, we have the 'mouth'
and 'stomach' instead) it even has an optimization pass of sorts
(reflowing text) and *it* certainly doesn't yield executable code.
(Although dvips can take the DVI file TeX produces and create PostScript
out of it, which is of course itself a full-blown computer language which
is then typically interpreted by the GhostScript translator to yield a
bunch of page images.)

Compiling is not translating

Posted Feb 5, 2010 1:44 UTC (Fri) by ccshan (guest, #2723) [Link]

I don't see what you mean by "execute". For example, I'm not sure why you think DVI is not executable but PostScript is. Maybe you mean that DVI is not Turing-complete but PostScript is, but lots of things are compiled to finite-state machines (if only large ones).

And C++ header files *are* Turing-complete, if we ignore implementation limits on template instantiation (just as we ignore physical-computer limits on memory use).

Compiling is not translating

Posted Feb 5, 2010 9:36 UTC (Fri) by anselm (subscriber, #2796) [Link]

TeX is no more or less of a compiler than the Sun Java »compiler« is. Both create machine code for a »virtual« processor that is then executed by a specialised interpreter. Read sections 583-591 of »TeX: The Program« by Donald E. Knuth.

Compiling is not translating

Posted Feb 5, 2010 15:44 UTC (Fri) by marcH (subscriber, #57642) [Link] (10 responses)

> A translator which targets something which can be executed, typically. Things are blurred somewhat when one considers compilers that yield various forms of bytecode which are then interpreted by something else.

The "executable" road will never reach any satisfying definition. It will always be "blurred somewhat".

You might have a bit more success with "high-level vs low-level", as found in Wikipedia. Maybe.

Compiling is not translating

Posted Feb 5, 2010 17:13 UTC (Fri) by man_ls (guest, #15091) [Link] (9 responses)

It is not so blurred; since modern processors have a native mode of execution there is an absolute: a native executable format. Even if internally they translate the instruction set to microcode, that layer is inaccessible from the outside. So there is at least one public native executable format per processor, although there can be more (for instance x86 and x86_64). "Compiling" is going from a human-readable file to this executable format. Merriam-Webster has a clear definition:

> 2 : a computer program that translates an entire set of instructions written in a higher-level symbolic language (as C) into machine language before the instructions can be executed

From there up files are not literally executable. However judging by the similarity to a processor instruction set people can create virtual machines, bytecode languages and the like. So moving from human-readable files to machine-readable files can be called "compiling" by generalization. Still there is no blurring: going from human-readable to human-readable I would call "translating", and from machine-readable to machine-readable I would call "emulation".

Compiling is not translating

Posted Feb 5, 2010 17:41 UTC (Fri) by ccshan (guest, #2723) [Link] (8 responses)

The two blurry parts of your definition of "compile" are what is human-readable and what is machine-readable. If whether a layer is "inaccessible from the outside" is relevant as you suggested, then would a translator from PHP to C++ become a compiler as soon as someone builds a sealed box that takes C++ code as input and runs it (or even type-checks it)? If whether a language is "similar" to a processor instruction set is relevant as you suggested, then is JVM bytecode really all that similar to a processor instruction set and all that dissimilar to C++ code? (Compare: LISP machine.) Also, humans can understand machine code.

Compiling is not translating

Posted Feb 5, 2010 18:07 UTC (Fri) by man_ls (guest, #15091) [Link] (7 responses)

I don't agree with the last part: humans can maybe understand machine code, but not read it in the usual sense. Just as there is usually no ambiguity between text files and binary files.

As to the rest, I'm fine with it: as far as something can be said to be human-readable and something else can be said to be machine-readable, "compiling" is converting between the two. A PHP -> C++ translator, combined with a C++ compiler, yields a PHP compiler.

Compiling is not translating

Posted Feb 5, 2010 18:14 UTC (Fri) by ccshan (guest, #2723) [Link] (6 responses)

> humans can maybe understand machine code, but not read it in the usual sense. Just as there is usually no ambiguity between text files and binary files.
So, which usual sense is that?

> as far as something can be said to be human-readable and something else can be said to be machine-readable, "compiling" is converting between the two
Given that all programming languages are machine-readable, no distinction has been made between translators and compilers. A PHP->C++ translator is a PHP->C++ compiler.

Continuing with the obvious

Posted Feb 5, 2010 18:45 UTC (Fri) by man_ls (guest, #15091) [Link] (5 responses)

Text files are strings of values that map to letters and symbols. ASCII for instance is restricted to 0-9, A-Z, a-z, some punctuation marks and a few other symbols; Unicode is quite wider but still maps code points to printable symbols. Binary files are streams of bytes with no such intent.

For "machine-readable" I obviously mean "readable to execute". While "human-readable" is "readable to understand". Again, the intent is what counts. And before you continue pointing out silly flaws: unlambda or brainf*ck are not really made to be readable, but are still meant to be understood by people.

Continuing with the obvious

Posted Feb 6, 2010 0:47 UTC (Sat) by marcH (subscriber, #57642) [Link] (3 responses)

> Text files are strings of values that map to letters and symbols. [...] Binary files are streams of bytes with no such intent.

Sorry but the text/binary boundary does not work any better. It's just an uninteresting and irrelevant matter of representation and encoding. You can store very high-level languages in binary and very low-level languages in ASCII.

Consider assembly for instance. It is text. So following your logic "gcc" is just a "translator" and "as" is a compiler?

Text and assembly

Posted Feb 6, 2010 10:57 UTC (Sat) by man_ls (guest, #15091) [Link] (2 responses)

Let me answer both of you at the same time since you make similar arguments. Text encodings are not just a trivial matter of representation, any more than the alphabet is just a trivial matter of drawing pretty lines. Text encodings are the essence of symbolic representation with computers. The level of the language is not really relevant.

> Consider assembly for instance.

Assembly language is not an instance; it is a special case of a language designed to mimic machine instructions and still be understandable by humans. To the extent that "return" is as much a symbol as "0x4e75", you are right that both things are equivalent; but assembly is not a real language so much as a mere transposition of opcodes. In fact, it is so much a boundary case that a special term has been invented for the conversion between assembly and machine code, also called "assembly" -- and this process is so distinct that it gives its name to the language itself, "assembly language". Aren't words fun?

Text and assembly

Posted Feb 6, 2010 22:22 UTC (Sat) by ccshan (guest, #2723) [Link] (1 responses)

You have not answered any question, even explicit questions about your supposedly non-blurry distinction between compilers and mere translators, nor dispelled any concern, even explicit concerns about your distinction between compilers and mere translators being blurry. Questions to answer are signaled by the question-mark (?) punctuation.

Text and assembly

Posted Feb 7, 2010 0:26 UTC (Sun) by man_ls (guest, #15091) [Link]

I'm sorry that you feel that way, but I have tried to clarify (not answer) what I felt were the important questions (implicit or explicit) in the discussion. The rest has no interest for me, and I would probably be the wrong person to answer them. I did learn something in the discussion and the issue of compiling vs translating vs mere assembling is now clearer for me, so thanks for that.

Continuing with the obvious

Posted Feb 6, 2010 0:47 UTC (Sat) by ccshan (guest, #2723) [Link]

> Binary files are streams of bytes with no such intent.

I assume that by "such intent" you mean intent to be "readable to understand". But do you mean a file as a sequence of bytes or a file as displayed by, say, the Linux console to the human retina? If the former (which is the text/binary distinction that file(1) tries to make), then a text file (such as 116 104 105 115) is no more intended to be "readable to understand" than a binary file. If the latter, then isn't README.gz a text file? In any case, I don't see an answer to the question "humans can maybe understand machine code, but not read it in *which* sense".

> For "machine-readable" I obviously mean "readable to execute". While "human-readable" is "readable to understand". Again, the intent is what counts.

Isn't the PDP-11 machine language intended to be "readable to understand"? If someone were to design a programming language to be "readable to understand", but that language ended up isomorphic to another language designed to be "readable to execute", then would translators that produce this/these language(s) be compilers? (It has happened.) Is an assembly-language preprocessor a compiler but cpp a mere translator?

Given that all programming languages are intended to be "readable to execute", no distinction has been made between translators and compilers. A PHP->C++ translator is a PHP->C++ compiler.
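
The "116 104 105 115" example above can be checked directly. The short Python sketch below shows the same four bytes viewed once as numbers and once through the ASCII mapping, which is all that separates the "binary" view from the "text" view here.

```python
# The same four bytes, seen two ways.
raw = bytes([116, 104, 105, 115])

as_numbers = list(raw)          # the "stream of bytes" view
as_text = raw.decode("ascii")   # the "letters and symbols" view
```

Nothing in the stored data changes between the two views; only the interpretation applied by the reader (or the display) does.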

Facebook's "HipHop" PHP translator

Posted Feb 2, 2010 23:28 UTC (Tue) by bk (guest, #25617) [Link] (1 responses)

Let:

A be a theoretical 100% perfectly optimizing C++ to machine code compiler
B be a theoretical 100% perfectly optimizing PHP to C++ "code transformer"/compiler
C be a theoretical 100% perfectly optimizing PHP to machine code compiler

...would it not be true that A + B = C ?

(IOW, there need not necessarily be 'lost optimization opportunities')

100%

Posted Feb 3, 2010 0:33 UTC (Wed) by ncm (guest, #165) [Link]

No. If the A knew that its input could only come from B, then A could apply additional optimizations forbidden for the general case. However, the construct "100% perfectly optimizing" is strictly meaningless; many optimizations imply tradeoffs, potentially slowing down unlikely sequences in exchange for improving more common ones.

Facebook's "HipHop" PHP translator

Posted Feb 3, 2010 17:06 UTC (Wed) by marcH (subscriber, #57642) [Link]

> There's no difference between a "compiler" and a "source code transformer".

A "source code transformer" is a very special class of compiler, because it produces something I can read.

If some people can read assembly code, then they have a different definition of "source code transformer" than me. That's fine, because I never talk to them anyway.

Facebook's "HipHop" PHP translator

Posted Feb 2, 2010 19:42 UTC (Tue) by brad@vaxxine.com (guest, #6399) [Link] (2 responses)

How does this compare with APC which simply caches the P-code? I know APC gives greater than 50% speedup.

Facebook's "HipHop" PHP translator

Posted Feb 2, 2010 20:26 UTC (Tue) by oseemann (subscriber, #6687) [Link]

APC merely caches the bytecode. The speedup is achieved through omitting the compilation of the php script sources for each request. Without a bytecode cache the complete php application, which can be hundreds of kilobytes of code, will have to be compiled for each request.
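
The idea of a bytecode cache can be sketched in a few lines. The example below is in Python rather than PHP, and it is not how APC is implemented: the class name, the choice of a content hash as cache key, and the `render` entry point are all assumptions made for illustration. It shows only the principle of compiling once and reusing the compiled code on later requests.

```python
import hashlib


class BytecodeCache:
    """Compile each script once; later requests reuse the code object."""

    def __init__(self):
        self._entries = {}
        self.compilations = 0   # counts how often we really had to compile

    def get(self, filename: str, source: str):
        # Assumption: key on a hash of the source so edits invalidate the entry.
        digest = hashlib.sha256(source.encode("utf-8")).hexdigest()
        key = (filename, digest)
        code = self._entries.get(key)
        if code is None:
            code = compile(source, filename, "exec")   # the expensive step
            self.compilations += 1
            self._entries[key] = code
        return code


def handle_request(cache: BytecodeCache, filename: str, source: str):
    """Run the (possibly cached) script and call its render() function."""
    namespace = {}
    exec(cache.get(filename, source), namespace)
    return namespace["render"]()


PAGE = "def render():\n    return 'hello'\n"
```

Serving `PAGE` twice through the same cache compiles it once; execution itself is unchanged, which is why such a cache removes the per-request compilation cost but does not make the running code any faster.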

Facebook's "HipHop" PHP translator

Posted Feb 2, 2010 21:13 UTC (Tue) by cowsandmilk (guest, #55475) [Link]

based on the contributions Facebook has made in the past to APC (eg Brian Shire's work), I wouldn't be surprised if the 2x speedup was actually in reference to using that on their servers, not naively running PHP.

Other compilers

Posted Feb 2, 2010 20:47 UTC (Tue) by linuxbox (guest, #6928) [Link]

How does it compare with:

http://www.roadsend.com/home/index.php?pageID=compiler (now open source, I guess)

and

http://www.phpcompiler.org/

?

Facebook's "HipHop" PHP translator

Posted Feb 2, 2010 20:49 UTC (Tue) by cdamian (subscriber, #1271) [Link] (2 responses)

My guess is that we will see a lot more closed source PHP projects now. For many people the speed gain will be secondary.

Facebook's "HipHop" PHP translator

Posted Feb 2, 2010 20:53 UTC (Tue) by tzafrir (subscriber, #11501) [Link] (1 responses)

Source code obfuscators have been around for a while.

Facebook's "HipHop" PHP translator

Posted Feb 2, 2010 23:05 UTC (Tue) by drag (guest, #31333) [Link]

I can compile Python into bytecode and distribute that if I wanted to hide
the source code.

I expect that there is a similar way to do it with PHP.
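
For the Python half of that claim, a minimal sketch using only the standard library: compile a module to a `.pyc` file, delete the source, and load the bytecode on its own. The module name `secret` and the helper names are made up for this example. Note that the bytecode can still be inspected with the standard `dis` module, so this hides the source text rather than protecting it.

```python
import importlib.machinery
import importlib.util
import pathlib
import py_compile
import tempfile


def compile_to_pyc(source_text: str, workdir: str) -> pathlib.Path:
    """Write the source, compile it to bytecode, then remove the source."""
    src = pathlib.Path(workdir) / "secret.py"
    pyc = pathlib.Path(workdir) / "secret.pyc"
    src.write_text(source_text)
    py_compile.compile(str(src), cfile=str(pyc), doraise=True)
    src.unlink()   # only the bytecode file would be distributed
    return pyc


def load_pyc(pyc_path: pathlib.Path):
    """Import a module from a bytecode file that has no source next to it."""
    loader = importlib.machinery.SourcelessFileLoader("secret", str(pyc_path))
    spec = importlib.util.spec_from_loader("secret", loader)
    module = importlib.util.module_from_spec(spec)
    loader.exec_module(module)
    return module


def roundtrip(source_text: str):
    """Compile in a temporary directory and return the loaded module."""
    with tempfile.TemporaryDirectory() as workdir:
        return load_pyc(compile_to_pyc(source_text, workdir))
```

A bytecode file produced this way is tied to the Python version that wrote it, so it is less portable than the source it replaces.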

Facebook's "HipHop" PHP translator

Posted Feb 2, 2010 21:59 UTC (Tue) by paragw (guest, #45306) [Link] (44 responses)

There is a very decent solution to speed up PHP - Java! Caucho Quercus is a 100% Java-based, open source PHP runtime that runs most PHP scripts unmodified. You get all standard J2EE features for free - distributed sessions, clustering ability etc. and you also get JIT for free. It also allows you to use Java classes in PHP.

Plus the Quercus runtime runs on any decent J2EE app server - Resin, Weblogic, Tomcat, Jboss etc.

I wonder whether the Facebook folks hated Java so much that they ignored such an attractive solution, or whether the C++ translator provides a significantly larger speed increase than the Java solution. It would be good to hear if they had any pros/cons.

Java

Posted Feb 2, 2010 22:19 UTC (Tue) by ncm (guest, #165) [Link] (38 responses)

One might guess that when trying to make something faster, translating it into something inherently slow seemed unlikely to be the best approach.

Java

Posted Feb 2, 2010 23:18 UTC (Tue) by paragw (guest, #45306) [Link] (32 responses)

Java is not really all that slower than C++ - See here
http://blogs.azulsystems.com/cliff/2009/09/java-vs-c-perf...
again.html for extensive proof.

For server side apps especially so.

Java

Posted Feb 3, 2010 0:29 UTC (Wed) by clugstj (subscriber, #4020) [Link] (31 responses)

Extensive proof? Stuff on some guy's blog? You have quite a different idea of what "extensive proof" is than others (at least I).

Java

Posted Feb 3, 2010 0:50 UTC (Wed) by paragw (guest, #45306) [Link] (30 responses)

You should have at least read the blog post in its entirety. It's not some guy's blog - the guy wrote the better part of the Sun HotSpot JVM and many people contributed to the post with sample programs, various rebuttals and stuff.

If that is not extensive for you I am not sure what is. (But read the post first, completely before commenting.)

Java

Posted Feb 3, 2010 2:08 UTC (Wed) by ncm (guest, #165) [Link] (29 responses)

It always kills me when somebody compares the performance of some language or other to C and then insists they've compared it to C++ too. Here's what I wrote back when I first saw that "extensive" comparison.
Each posting or comment that uses the construction "C/C++" invalidates itself. C and C++ are different languages with very different performance characteristics. It used to be that scientists and engineers used Fortran for serious calculations because their Fortran programs and libraries ran much faster than C programs. Nowadays, serious work is done in C++ because it's notably faster than Fortran. This is not because Fortran got slower.

For similar reasons, we may also dispose of comparisons with C. A Java (or Lisp) program "as fast as C" or "faster than C" has achieved little. That this is seen, instead, as a major achievement is telling. "As fast as C" is a low bar.

...

It's not surprising to find these sorts of misunderstandings promoted on a Java forum. People who are serious about performance aren't using Java, and people who are using Java are, perforce, not serious about performance. If you would like to be serious about performance, you will need to look elsewhere.

Java

Posted Feb 3, 2010 2:55 UTC (Wed) by paragw (guest, #45306) [Link] (21 responses)

> People who are serious about performance aren't using Java, and people who are using Java are, perforce, not serious about performance. If you would like to be serious about performance, you will need to look elsewhere.

This is puzzling to me. You state it as an absolute fact that the 'else' performs better than Java but I think when we say performance we should first be clear about what it really means - raw execution speed, how fast it feels (UI), how much memory is required to do a certain task in certain time etc.

Even if we stick to the easiest to deal with metric - raw execution speed - the JVM is not interpreted anymore - it has a JIT which is directly executing native instructions on the CPU. How could that be any slower than the static compilation? I would argue a decent JIT should run the code faster than the statically compiled native binary because it can make use of the run-time information - for example profile it on the fly and optimize later invocations.

The article I pointed to covers nearly all aspects of performance and even if you ignore a whole bunch of other stuff that favors Java (speed of development, security/sandboxing, maturity of development/debugging tools etc.) - writing server side applications in Java is a no-brainer decision in my opinion and it has no earth shattering performance penalties that one needs to be actively worried about.

But please explain to me purely on technical grounds - why would a JIT JVM be any slower than native code? Or do we need to look at other factors to complete the picture (such as some language features that allow for better code generation that is not possible to do with JIT, native code compilers are simply better due to whatever reasons etc.)?

(BTW I am totally willing to ignore the memory factor - it's just not that big of a deal for server side applications for obvious reasons.)

Java

Posted Feb 3, 2010 7:39 UTC (Wed) by ncm (guest, #165) [Link] (10 responses)

This isn't really the forum for it, but (1) Java bytecode semantics are badly designed for machine-code optimization; (2) JVMs have a huge memory footprint that plays hell with caches at all levels; (3) garbage collected memory plays hell with caches at all levels; (4) Java doesn't offer the control needed to minimize cache footprint of user-level operations; (5) Java is not expressive enough to enable serious high-level library optimizations; finally, (6) "should" is very different from "does".

You may well say Java is "fast enough for server-side applications", meaning, I suppose, faster than PHP or Ruby. But that is not being "serious about performance" at all, so it amounts to changing the subject. Changing the subject is what usually happens right about here in the discussion, so you've jumped the gun a bit. In any case, some people need good performance (however you measure it) on their servers, too. Java is not where they find it.

Java

Posted Feb 3, 2010 9:39 UTC (Wed) by renox (guest, #23785) [Link] (1 responses)

(3) depends on the GC used, obviously.
There has been research into 'VM-aware' GCs that improve cooperation between the virtual memory manager and the GC. As far as I know they aren't used, though, as they require modification of the host OS.

GC

Posted Feb 3, 2010 20:03 UTC (Wed) by ncm (guest, #165) [Link]

"There has been research into", in a discussion of demonstrable performance, always means "I'd like to change the subject, please."

Java

Posted Feb 3, 2010 10:02 UTC (Wed) by alankila (guest, #47141) [Link] (1 responses)

These are all good criticisms. These are probably part of the reason why Java runs at about 1/2 to 1/3
the speed of C, rather than surpassing the speed of C.

I know the speed fairly well because I'm working on a project that is a Java port of a C++ source base,
with identical implementation, just translated by hand. (Or well, it used to be: Java is so much nicer
to work with than C++ that I have since abandoned the C++ codebase.) My experience also agrees
with various references for the speed of Java, such as the Debian Alioth shootout tests that also fairly
consistently place the Java runtime speed around that figure.

Memory usage is much worse than the C++ version, but still, I can spare ~100 MB of memory
instead of 20 MB of memory for this application. A bit like an operating system, Java uses as much
memory as you give to it, and additional memory is spent in making the GC cycles larger, but
otherwise doesn't really benefit it. However, too little memory will cause it to crash, which forces me
to err on the safe side. The runtime system is clearly more memory hungry, but perhaps not more
than a factor of 2-3 again.

Java

Posted Feb 3, 2010 16:28 UTC (Wed) by dlang (guest, #313) [Link]

on modern CPUs where the memory access is an order of magnitude slower than cache access, a larger memory footprint translates pretty directly into worse performance.

one of the complaints about Java is that it doesn't give you the control needed to avoid this.
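
A rough illustration of the footprint point, as a C++ sketch (the function names `sum_contiguous` and `sum_linked` are made up for this example). Both functions add up the same values, but the first walks one contiguous array while the second follows a pointer per element, which is roughly the layout a collection of separately allocated objects tends to have. No timings are claimed here; how large the gap is depends on the machine and the data size.

```cpp
#include <cstdint>
#include <list>
#include <numeric>
#include <vector>

// Contiguous storage: neighbouring elements share cache lines, so one fetch
// from main memory serves several consecutive elements.
std::int64_t sum_contiguous(const std::vector<std::int32_t>& values) {
    return std::accumulate(values.begin(), values.end(), std::int64_t{0});
}

// Node-based storage: each element is a separate allocation reached through
// a pointer, so each step may land on a different cache line.
std::int64_t sum_linked(const std::list<std::int32_t>& values) {
    return std::accumulate(values.begin(), values.end(), std::int64_t{0});
}
```

The results are identical; only the memory access pattern differs, and that is the part the programmer controls in C or C++ by choosing the container.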

Java

Posted Feb 3, 2010 22:57 UTC (Wed) by robilad (guest, #27163) [Link] (5 responses)

(1) Java bytecode semantics are badly designed for machine-code optimization;

Java bytecode is a transfer format for executable code written in Java. JVMs take bytecode and transform it into internal representations suitable for general purpose and machine-specific optimizations. You may want to read up on the scientific papers describing Hotspot client and server JIT compiler from this decade.

(2) JVMs have a huge memory footprint that plays hell with caches at all levels;

Modern JVMs in the embedded space take up a few hundred kilobytes, at worst. Do check out the scientific papers on Squawk, for example.

(3) garbage collected memory plays hell with caches at all levels;

Well, no, not in this decade ;). You must have missed the research from the Jikes RVM team around locality & garbage collection from back in 2004. Copying garbage collectors have a potential performance advantage over free-list allocators, as they place objects with similar lifetime expectations and access patterns together. Combined with online object reordering (check which fields are hot, and lay out chains of hot objects together in memory during the copy phase), garbage collection in a dynamically executed language like Java can actually have much better locality than a static one.

(4) Java doesn't offer the control needed to minimize cache footprint of user-level operations;

The only hit for "minimize cache footprint of user-level operations" on Google is this thread, so I think you're alone with that claim, whatever it is supposed to mean.

(5) Java is not expressive enough to enable serious high-level library optimizations;

And again, the only hit for "serious high-level library optimizations" is your post in this thread, so I think you're not particularly convincing, whatever it is you are trying to say.

Java

Posted Feb 3, 2010 23:25 UTC (Wed) by ncm (guest, #165) [Link] (4 responses)

Ah, "research", "potential", "can actually have", google fun.

Your remarks fail to explain why Java programs, despite all of the above, still run so much slower even than C and C++ programs. (Paragw has the "speed doesn't matter" field covered, so you don't need to go there.) I could be all wrong about why Java programs are so slow; certainly there are people who know more about Java implementations than I do. Maybe you can elucidate the true reasons.

Java

Posted Feb 4, 2010 11:04 UTC (Thu) by robilad (guest, #27163) [Link] (1 responses)

Given that Java programs don't run much slower than C or C++ programs there is nothing for me to explain. It's like asking me to explain why Linux doesn't work, or similar 'Get the facts' type of marketing mythology.

With Java now steadily extending its foothold in both the embedded and the high performance computing world, I'd suggest that actually getting C++0x out of the door sometime this decade is a better strategy to keep C++ as relevant as it is today in the coming years.

Focusing on rapidly improving the technology rather than on smearing the 'competition' worked great for the past decade for Java, after all.

Java

Posted Feb 4, 2010 20:59 UTC (Thu) by ncm (guest, #165) [Link]

Denial is a popular approach too.

Slow Java programs

Posted Feb 4, 2010 22:44 UTC (Thu) by man_ls (guest, #15091) [Link] (1 responses)

They are so slow because of the reasons you mention, basically because of the JVM. For a bit of fun you can compile your Java code to a binary with gcj and see how much faster it gets.

By the way, AFAIK gcj uses no intermediate language, so it is a proper compilation and not a transformation or a translation.

Slow Java programs

Posted Feb 5, 2010 5:55 UTC (Fri) by ncm (guest, #165) [Link]

I would say it's all three.

Java

Posted Feb 3, 2010 16:53 UTC (Wed) by nye (guest, #51576) [Link] (9 responses)

>BTW I am totally willing to ignore the memory factor - it's just not that big of a deal for server side applications for obvious reasons

What are those obvious reasons?

Memory use is a major deciding factor for me when it comes to a choice of server-side software, since the cost of VPSs or dedicated server rental is largely proportional to the amount of RAM you want. I'm certain this is not an uncommon concern, especially with the current popularity of virtualised servers.

In this case, Java is only worthwhile if a) you have no other option or b) you're running a lot of applications, all of which are JVM based, so at least the humungous cost of the JVM can be amortised, if not the generally increased memory usage of the individual applications.

Java

Posted Feb 3, 2010 18:23 UTC (Wed) by paragw (guest, #45306) [Link] (8 responses)

I think you answered your own question. But you have to consider that the memory footprint of a JVM is nowhere near humongous - not at all for server applications which need some features that you get for free with a JVM/AppServer based solution.

But consider this -

1) We run multiple applications on Java - the app server runs with a maximum heap size of 200M and it does lots of things - provides all the app server functions, load balancing/failover, in memory session replication, caching, connection pooling etc. If you were to build all of this in native code - do you think it will work in a few KB of memory? For the record we have native code apps that do some of these things and routinely consume hundreds of megabytes of memory.

2) Memory is cheap. So if your Java application requires 100 MB more per server and you have 4 of them - what is the cost of that extra used memory?

3) Tell me where I can buy an application server written in C or C++ that gives me all of what the JVM based App servers provide. Then tell me what it costs to have a solution based off of that - consider the feature parity with Java app servers that you need, also consider the costs of development time, programmer costs, dev/debug/profiling tools, off-the-shelf component availability etc.

So no, memory is nowhere near being the most common concern when developing Enterprise applications. All the other factors are - slowness is one of them and Java ain't slow for server apps. And all the overheads are a) mostly amortized by the features you get with a Java based app server and b) the equivalent solutions built in native code are not going to use much less resources - not enough to care about anyways.

Sure you can make a case for niche apps with specific requirements to be better off done in C/C++/what have you but for everything else Java is just fine.

Changing the subject

Posted Feb 3, 2010 19:51 UTC (Wed) by ncm (guest, #165) [Link] (4 responses)

Yes, that was the place where everybody else changes the subject, too.

But, pray tell, what could "Java ain't slow for server apps" possibly mean, in a discussion of language performance? Server apps are database-bound anyway? Server app performance is limited by network roundtrip time? Server app users aren't actually customers, so they had better learn to wait and like it? My experience is that the former excuses shade insensibly into the later ones.

But this is drifting off topic.

Changing the subject

Posted Feb 3, 2010 20:21 UTC (Wed) by paragw (guest, #45306) [Link] (3 responses)

Not really - our subject is the same: whether Java is slower than <language of your choice>, and if it is, whether it is slow enough to give up on entirely.

You ask what Java is not slow means - but on the other hand you claim everything else is faster than Java and then don't prove a) what it means and b) where the proof is for the same. Which to me is the same thing - I could ask "Pray tell - what C is faster than Java could possibly mean in a discussion of language performance". To that end I picked up a subject very relevant to this LWN article - Server side applications.

I know it was a meme for a long time that Java was slow - and older JVMs were slower than C - but that has changed. Take a modern JVM and have a competent Java programmer write a program that does something meaningful, and have a competent C programmer do the same. Decide on what "being fast" means - that could mean many different things depending on the person, situation, hardware, functionality, viewpoints etc. Compare on a reasonable set of "speed points" and then comment on how slow Java really is.

Even for CPU bound applications, a recent JVM with JIT is not going to be substantially slower than native code. I have a dozen Java based server side applications running on an application server, that load near instantaneously in the web browser. These apps do not just load a pre-formatted HTML page from the DB and send it to the client - they do a lot of dynamic processing. So it is quite adequate to use Java in this case - and I am not alone in doing that.

At the very least one could safely conclude that Java is fast enough and the benefits of using Java for large scale server side projects far outweigh the small loss of speed or increase in memory consumption.

Changing the subject

Posted Feb 3, 2010 23:07 UTC (Wed) by ncm (guest, #165) [Link] (2 responses)

I see what you meant now. "Server app" means, here, "app that doesn't do much, so how fast it is doesn't matter much". In a discussion involving performance comparisons, uses where speed doesn't matter much are way off-topic. I know of server apps, though, where absolute speed is enormously important.

If you have to "decide on what being fast" means, you're already blowing smoke. Measure anything you please. If the numbers come out different, then you've measured something. If they come out the same, you probably didn't. The "Java is slow" meme endures because it is backed by fact. You may insist that Java (or Lisp, or what-have-you) is less slow than it once was, and that would be a fact, too, but it doesn't change the other.

There are enormous benefits, that might offset the slowness, to using Java when your code is obliged to run in a Java environment. Likewise, there are enormous benefits to writing with Chinese glyphs when your audience is in China. That amounts to a less than ringing endorsement of Chinese glyphs. Certainly there are plenty of people who read Chinese glyphs, but that says nothing about the inherent merits of massively redundant, massively ambiguous syllabaries vs. alphabets. As in any area, it's easy to muddy the water, but muddying the water calls attention to what it seeks to obscure.

We don't need more muddied water.

Changing the subject

Posted Feb 4, 2010 0:51 UTC (Thu) by paragw (guest, #45306) [Link] (1 responses)

> "Server app" means, here, "app that doesn't do much, so how fast it is doesn't matter much"

You just made that up. I never said that. I am not the only person managing/writing enterprise Java applications that really do a whole lot and are fast enough. I have never ever heard anyone complain about slowness due to the language being sucky.

> The "Java is slow" meme endures because it is backed by fact.

And I say show me the facts that are relevant today/tomorrow. And your perception (or mine for that matter) does not qualify as a fact. Sure you can say Java programs are slower to start up than C programs and thus I would not write frequently exec'd programs in Java. I can say long running server programs are perfectly suited to be written in Java. And we can list these things and add them up and that still would not make Java absolutely slower than C or other way around. So we have to pick a thing to measure - startup speed, execution speed, Features/Speed or Features/Memory ratio.

> If you have to "decide on what being fast" means, you're already blowing smoke.

So you would much rather rely on everyone's differing perceptions of "fast"? I am not sure that it will take this discussion anywhere.

And where people really muddy waters in the Java vs. C discussion is when they don't consider the "Features offered" part. Once you start adding features and reach feature parity with what Java offers - you are going to bloat code, use more memory - it's a given. That does not however mean we should stick to trivial programs for comparing languages.

And there is no slowness as far as run time execution is concerned - it's all in your perception or it is just another optimization opportunity for the JVM that is untapped. A JIT can do equally well or more optimization than the C compiler. All speed critical programs are run time profiled anyways - I get that for free with a JVM.

So if you want to continue the discussion, first let's decide what we are going to measure - otherwise I don't see how this can be fruitfully concluded.

Changing the subject

Posted Feb 4, 2010 1:36 UTC (Thu) by ncm (guest, #165) [Link]

You have illustrated my point adequately already, thank you.

Java

Posted Feb 3, 2010 20:01 UTC (Wed) by dlang (guest, #313) [Link] (2 responses)

you have stopped saying that Java is as fast/efficient as C and are now saying that the loss in performance/efficiency is worth the benefits you get from Java.

just because something is written in C doesn't mean it's efficient. If you are having apps that take hundreds of megs of memory when written in C and take less than that when written in Java, your developers messed up. It may be that they aren't making appropriate use of shared libraries, it may be that they have made the apps into monolithic programs that try and do everything rather than having separate programs that communicate with each other. The ways to mess up an application architecture are endless and, other than some languages not letting you implement some methods (by not giving you enough control to implement them), pretty much independent of the language used.

Java

Posted Feb 3, 2010 20:11 UTC (Wed) by ncm (guest, #165) [Link]

Yes, you don't get valid results by comparing badly-written programs. You only get valid results by comparing well-written programs. How well a program can be written, though, is limited by the language, and Java has myriad ways to encourage writing badly, so it's hard to find a real well-written Java program. It can be done, but when it's done you still end up with a Java program, so it's hardly worth the trouble.

Java

Posted Feb 3, 2010 21:13 UTC (Wed) by paragw (guest, #45306) [Link]

No I am not changing what I am saying - you are twisting it :)

I said "at the very least" - it is fast enough. So that means it could be faster in some cases and a little bit slower in others, completely usable in some cases and totally unusable in others. Of course no one can make an absolute statement to the effect that "some such is always greater than some such for all possible situations/variations/probabilities".

Show me an application server written in C that runs in a few megs of memory, please. Make sure it has all the features the Java App servers provide. Reality is you cannot make a significant difference unless you start wandering into the land of impracticality. So why bother - just write in Java and enjoy all the benefits.

Java

Posted Feb 3, 2010 23:13 UTC (Wed) by cmccabe (guest, #60281) [Link] (6 responses)

> Each posting or comment that uses the construction "C/C++" invalidates
> itself. C and C++ are different languages with very different performance
> characteristics. It used to be that scientists and engineers used Fortran
> for serious calculations because their Fortran programs and libraries ran
> much faster than C programs. Nowadays, serious work is done in C++ because
> it's notably faster than Fortran. This is not because Fortran got slower.

Ok, I'll bite. Under what circumstances do you think C++ is faster than regular C?

I guess clever use of templates might give you an edge in certain circumstances. I doubt that use of virtual functions rather than function pointers will give you any performance increase, since the C++ vtable is basically just a table of function pointers. The whole streams vs. printf thing is just a total loss.
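
To make the vtable comparison concrete, here is a small C++ sketch. All names (`Shape`, `ShapeOps`, `CSquare`, and so on) are hypothetical and exist only for illustration. The first half lets the compiler build the dispatch table; the second half writes the same table out by hand, the way a C program would. Both calls end up as an indirect call through a pointer.

```cpp
// C++ style: the compiler generates the table of function pointers (vtable).
struct Shape {
    virtual ~Shape() = default;
    virtual double area() const = 0;
};

struct Square : Shape {
    explicit Square(double s) : side(s) {}
    double area() const override { return side * side; }
    double side;
};

// C style: the same table, written out explicitly.
struct ShapeOps {
    double (*area)(const void* self);
};

struct CSquare {
    const ShapeOps* ops;  // plays the role of the hidden vtable pointer
    double side;
};

static double csquare_area(const void* self) {
    const CSquare* sq = static_cast<const CSquare*>(self);
    return sq->side * sq->side;
}

static const ShapeOps kSquareOps = {csquare_area};

// Both of these perform one indirect call at run time.
double area_virtual(const Shape& s) { return s.area(); }
double area_manual(const CSquare& s) { return s.ops->area(&s); }
```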

Meta-troll: If I were doing "serious calculations" I would ignore choice of programming language and focus on plugging my code into some sort of massively parallel framework, like Hadoop, MapReduce, or one of the bioinformatics ones.

I now return you to your regularly scheduled holy war over Java performance. Since there haven't been any car analogies or comparisons to Hitler and the Nazis, I guess it hasn't run its course yet.

Java

Posted Feb 4, 2010 1:30 UTC (Thu) by ncm (guest, #165) [Link]

A framework is a response to language failure: the languages they used weren't powerful enough to express what they needed as an ordinary library. If you're seriously interested in where C++ can be much faster than C, and faster than Fortran -- not to mention massively parallel -- using an ordinary library, you may look into VSIPL++. It's GPL, so you can learn from it and do the same things in your own libraries.

There's no holy war. Exaggerations of Java performance pop up regularly, and each needs to be capped individually.

Java

Posted Feb 4, 2010 5:45 UTC (Thu) by sbishop (guest, #33061) [Link] (4 responses)

> Under what circumstances do you think C++ is faster than regular C?

The commonly claimed circumstance is when operator overloading is used. Unless unspeakable acts are performed using preprocessor tricks, generic sorting functions written in C have to take a function pointer. And that will most likely result in a function pointer being dereferenced at runtime.

On the other hand, the C++ standard library has you define the less-than operator for the type you're sorting. When that standard library code is compiled to sort your type (using templates, like you said), you're likely to end up with no function call at all. (If the compiler can statically determine the type you want to sort, it can inline the comparison logic.) And in my experience, as the need for speed increases, the more likely you are to know what you're sorting at compile time.
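
A minimal side-by-side sketch of the two approaches, assuming plain `int` elements (the helper names `compare_ints`, `sort_with_qsort` and `sort_with_std_sort` are invented for this example). Whether the comparison is actually inlined is up to the compiler and optimization level, so this shows the structure of the argument rather than a measurement.

```cpp
#include <algorithm>
#include <cstdlib>
#include <vector>

// C approach: qsort only sees an opaque function pointer, so each
// comparison is normally an indirect call made at run time.
static int compare_ints(const void* a, const void* b) {
    const int lhs = *static_cast<const int*>(a);
    const int rhs = *static_cast<const int*>(b);
    return (lhs > rhs) - (lhs < rhs);  // avoids overflow of lhs - rhs
}

std::vector<int> sort_with_qsort(std::vector<int> v) {
    if (!v.empty()) {
        std::qsort(v.data(), v.size(), sizeof(int), compare_ints);
    }
    return v;
}

// C++ approach: std::sort is a template, so the element type and its
// operator< are known at compile time and the comparison can be inlined.
std::vector<int> sort_with_std_sort(std::vector<int> v) {
    std::sort(v.begin(), v.end());
    return v;
}
```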

I'm sure there are other circumstances where typical, well-written C++ code will outperform typical, well-written C code. I can think of a few others, but that is the only one I'm confident about enough to defend in public. ;)

Java

Posted Feb 5, 2010 21:44 UTC (Fri) by cmccabe (guest, #60281) [Link] (3 responses)

> The commonly claimed circumstance is when operator overloading is used.
> Unless unspeakable acts are performed using preprocessor tricks, generic
> sorting functions written in C have to take a function pointer. And that
> will most likely result in a function pointer being dereferenced at
> runtime.

Generic type-safe sorting in C (without function pointers) is a solved problem.
http://www.corpit.ru/mjt/qsort/qsort.h

I do agree that there are some times when code written with C++ templates can be more readable than code written with C-style macros. But template code is no cakewalk, either. You have to be skilled to write it, and reading it can be harder than parsing Perl. The error messages you get from templates are hard for anyone to understand (as an aside, I heard they are introducing a feature called 'concepts' to mitigate this.)

It seems like all of this has already been discussed on LWN here: http://lwn.net/Articles/286539/

The quick summary was
* virtual function dispatch performance in C++ is the same as function pointer overhead in C on modern architectures.
* templates can produce fast code in some cases, cf. VSIPL++.

> I'm sure there are other circumstances where typical, well-written C++
> code will outperform typical, well-written C code.

typical != well-written.

One of the major criticisms of C++ is that you can make a much bigger mess, much faster, than in C. I think even Bjarne S. himself acknowledges this. I'm too lazy to look up the quote though.

Nowhere near C++

Posted Feb 5, 2010 22:49 UTC (Fri) by quotemstr (subscriber, #45331) [Link] (1 responses)

So that pile of preprocessor hacks can sort an array. What about a linked list? What about a collection of pointers to shared memory? What about elements in a ring buffer? C++'s std::sort handles all those easily, if you have iterators defined. Plus, you can automatically look up the comparison routine (via operator<) if you don't want to specify one explicitly.
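
A short sketch of the operator< lookup, using a made-up `Employee` type. One caveat worth stating: `std::sort` itself requires random-access iterators, so it works on containers such as `std::vector` or `std::deque`, while a `std::list` is sorted with its own `sort()` member. Both pick up the same `operator<` without being told about it.

```cpp
#include <algorithm>
#include <deque>
#include <list>
#include <string>

// Hypothetical element type, for illustration only.
struct Employee {
    std::string name;
    int id;
};

// Defining operator< once is enough; the sorts below find it automatically.
bool operator<(const Employee& a, const Employee& b) { return a.id < b.id; }

// std::deque is not one contiguous array, but it offers random-access
// iterators, which is all std::sort asks for.
std::deque<Employee> sorted_deque(std::deque<Employee> d) {
    std::sort(d.begin(), d.end());
    return d;
}

// A linked list only has bidirectional iterators, so the member function
// std::list::sort is used instead of std::sort.
std::list<Employee> sorted_list(std::list<Employee> l) {
    l.sort();
    return l;
}
```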

Nowhere near C++

Posted Feb 10, 2010 1:36 UTC (Wed) by cmccabe (guest, #60281) [Link]

What I like about C is that it's basically a portable assembly language. It has abstractions, but only the ones that the CPU itself gives you. There is a kind of elegance in the minimalism.

If I wanted abstraction and elegance, I would use something like Ruby, or OCaml. After using these languages, C++ just feels like an awkward compromise.

I've read all of the Effective C++ books and written many KLOCS of C++. I believe that sometimes C++ really is the right tool for the job. In general, though, I think that people tend to overuse the language, sticking it in niches that would better be filled by a C core surrounded by a truly high-level language.

Java

Posted Feb 6, 2010 1:38 UTC (Sat) by sbishop (guest, #33061) [Link]

> typical != well-written.

Certainly. But that's independent of the programming language! Perhaps I should have said "most well-written C (and C++)" programs.

Your qsort example proves my point, by the way. Certainly, "typical" C programs aren't written that way.

> One of the major criticisms of C++ is that you can make a much bigger mess, much faster, than in C. I think even Bjarne S. himself acknowledges this. I'm too lazy to look up the quote though.

Yes. When it's up to me, I choose between C and C++ by trying to decide whether the headaches of C++ (compiler and ABI incompatibilities, higher complexity, etc) outweigh the benefits (language support for OOP, easy access to high-quality data structures and algorithms, etc). "Right tool for the job" and all that...

Java performance

Posted Feb 3, 2010 14:00 UTC (Wed) by robilad (guest, #27163) [Link] (4 responses)

See http://scribblethink.org/Computer/javaCbenchmark.html for a serious look at the "Java is slow compared to C++" meme from more than 5 years ago. Hint: it wasn't slow at all back then any more, and it only kept getting faster since.

To give you a tiny example: What's the fastest way today to sort a terabyte of data? Using Apache Hadoop - which is free software written in Java and runs on the JVM:

http://developer.yahoo.net/blogs/hadoop/2008/07/apache_ha...

But of course, that was all back in 2008, it could have been a one hit wonder. But Hadoop proceeded to set two records in 2009:

http://developer.yahoo.net/blogs/hadoop/2009/05/hadoop_so...

'inherently slow'? Not really, no. Do check out some of the recent papers on modern JVMs, you may find a few things you may like.

Java performance

Posted Feb 3, 2010 17:21 UTC (Wed) by jwb (guest, #15467) [Link] (2 responses)

An interesting comparison. What do you think the fastest way is to sort a
kilobyte of data? It sure as hell isn't Hadoop. Sorting a terabyte of data is a
problem so large as to mask all of the common Java performance problems.

Java performance

Posted Feb 3, 2010 23:47 UTC (Wed) by robilad (guest, #27163) [Link] (1 responses)

If Java was 'inherently slow', as ncm believes, then it'd be quite unlikely for that 'inherent' disadvantage to disappear by making the problems solved by it harder.

May C & C++ developers sort their kilobytes in peace, maybe they'll even adopt the dual-pivot quicksort algorithm improvement from Java someday. ;)

Java performance

Posted Feb 4, 2010 11:01 UTC (Thu) by jwakely (guest, #60262) [Link]

You had a reasonable point, but ...

> May C & C++ developers sort their kilobytes in peace

DEMsort, psort and OzSort used Java too then?

And http://googleblog.blogspot.com/2008/11/sorting-1pb-with-m... too?

Java performance

Posted Feb 5, 2010 1:20 UTC (Fri) by njs (subscriber, #40338) [Link]

That's a pretty silly example, though -- MapReduce (including Hadoop) isn't even an architecture for CPU-bound problems, it's (more or less) a strategy for parallelizing IO.

Now, granted, many people think of programming language speed as this mystical pixie dust that rubs off on whatever system that programming language is embedded in, but those people are silly too. If a programming language is slow in the sense that it takes more CPU cycles to execute a given algorithm, but the overall system is not bottlenecked by the CPU spent on that algorithm... then the programming language speed is irrelevant.

Anyway, in conclusion, Java has disadvantages when doing HPC work, but most of us don't do HPC work, so, whatever. Personally I dislike its aesthetics, but that's just me.

Facebook's "HipHop" PHP translator

Posted Feb 3, 2010 6:34 UTC (Wed) by doogie (guest, #2445) [Link] (4 responses)

Hate to burst your bubble, but no. The free version of Quercus is just a standard AST-like interpreter. You have to pay for a commercial version that can convert to Java bytecode.
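
For readers unfamiliar with the distinction, here is a generic sketch of what an AST-style interpreter looks like, written in C++ rather than Java and not taken from Quercus; every name in it (`Node`, `Const`, `Var`, `Add`, `Mul`, `build_tree`, `compiled`) is invented for illustration. The interpreter evaluates `x * x + 1` by walking a tree of objects with one virtual call per node, while `compiled` is what the same expression looks like once it has been translated into ordinary code that an optimizer or JIT can see through.

```cpp
#include <memory>
#include <utility>

// One node of the syntax tree; evaluation is a virtual call per node.
struct Node {
    virtual ~Node() = default;
    virtual long eval(long x) const = 0;
};

struct Const : Node {
    explicit Const(long v) : value(v) {}
    long eval(long) const override { return value; }
    long value;
};

struct Var : Node {
    long eval(long x) const override { return x; }
};

struct Add : Node {
    Add(std::unique_ptr<Node> l, std::unique_ptr<Node> r)
        : lhs(std::move(l)), rhs(std::move(r)) {}
    long eval(long x) const override { return lhs->eval(x) + rhs->eval(x); }
    std::unique_ptr<Node> lhs, rhs;
};

struct Mul : Node {
    Mul(std::unique_ptr<Node> l, std::unique_ptr<Node> r)
        : lhs(std::move(l)), rhs(std::move(r)) {}
    long eval(long x) const override { return lhs->eval(x) * rhs->eval(x); }
    std::unique_ptr<Node> lhs, rhs;
};

// Interpreted form: the tree for x * x + 1, walked on every evaluation.
std::unique_ptr<Node> build_tree() {
    return std::make_unique<Add>(
        std::make_unique<Mul>(std::make_unique<Var>(), std::make_unique<Var>()),
        std::make_unique<Const>(1));
}

// Translated form: the same expression as plain code.
long compiled(long x) { return x * x + 1; }
```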

Facebook's "HipHop" PHP translator

Posted Feb 3, 2010 15:41 UTC (Wed) by paragw (guest, #45306) [Link] (3 responses)

I don't understand - it translates PHP to Java code which is compiled to bytecode and then to native code by the JIT. So you are saying the open version does not cache the PHP->Java->Bytecode phase - it certainly does not "interpret" PHP from what I know.

But even if that is the case, it already runs PHP faster than the normal PHP runtime does - as fast as APC.

But the point here is that if they spent significant effort on building a C++ translator, they could also have chosen to spend it optimizing the open version.

Facebook's "HipHop" PHP translator

Posted Feb 3, 2010 20:41 UTC (Wed) by doogie (guest, #2445) [Link] (2 responses)

"Standard" bytecode compilers in Java produce a Class by creating an array of bytes, then using a custom ClassLoader to load it. The free version of Quercus doesn't do this; it just maintains an AST-like object graph, then loops over that at runtime. This keeps the Java JIT from doing certain kinds of analysis and optimizations.

Facebook's "HipHop" PHP translator

Posted Feb 3, 2010 21:20 UTC (Wed) by paragw (guest, #45306) [Link] (1 responses)

From the Caucho site -

1) PHP code is interpreted/compiled into Java and 2) Quercus and its libraries are written entirely in Java. This architecture allows PHP applications and Java libraries to talk directly with one another at the program level. To facilitate this new Java/PHP architecture, Quercus provides an API and interface to expose Java libraries to PHP.

What does "compiled into Java" mean? Source code translation. The Facebook guy also refers to this being the case. And if you look at the second point, the libraries are already standard Java, so the JIT can do whatever it wants with them. No?

Facebook's "HipHop" PHP translator

Posted Feb 3, 2010 21:22 UTC (Wed) by paragw (guest, #45306) [Link]

But anyway, I have not checked the code yet. Even assuming what you said is true, my original point was that Facebook developers could have taken the open source Quercus and added optimizations to it, which would have been relatively trivial compared with writing a completely new C++ translator from scratch.

Facebook's "HipHop" PHP translator

Posted Feb 2, 2010 22:25 UTC (Tue) by bronson (subscriber, #4806) [Link] (4 responses)

Hope someone sends the output through CFront. Language to language to language...

Cfront

Posted Feb 3, 2010 0:29 UTC (Wed) by ncm (guest, #165) [Link] (3 responses)

Cfront is an excellent example of a compiler that translates from one source language to another. It was routinely trivialized at the time, and still is today, as a "preprocessor". In fact, from the very first version it parsed completely into an internal semantic representation, and then generated C code from there using the same method that would have been used to produce assembly code. It was perhaps the first example of what became absolutely the normal way to implement the first compiler for any new language.

Cfront

Posted Feb 3, 2010 4:05 UTC (Wed) by jzbiciak (guest, #5246) [Link] (2 responses)

Amusingly, most C programs are also valid C++ programs. It's certainly possible that the output of Cfront is in this subset.

So, you could run the output of Cfront through Cfront and so on, and it'd be "turtles all the way down!"

Cfront

Posted Feb 3, 2010 12:23 UTC (Wed) by HelloWorld (guest, #56129) [Link] (1 responses)

> Amusingly, most C programs are also valid C++ programs.

In fact, most C programs are not valid C++ programs. In C++ there is no implicit conversion from void* to T* for any T other than void. Thus trivial stuff found in every C program (such as T *t = malloc(42);) will break C++ compatibility.
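
As a concrete illustration, here is a minimal sketch of an allocation written so that it compiles as both C and C++. The helper names make_counter and counter_next are hypothetical and exist only for this example.

```c
#include <assert.h>
#include <stdlib.h>

/* Allocates an int initialized to `start`; returns NULL on failure. */
int *make_counter(int start)
{
    /*
     * C accepts "int *p = malloc(sizeof *p);" as written, but C++ rejects
     * that implicit void* -> int* conversion. The explicit cast below is
     * accepted by both languages.
     */
    int *p = (int *)malloc(sizeof *p);
    if (p != NULL)
        *p = start;
    return p;
}

/* Returns the current value and then increments the counter. */
int counter_next(int *counter)
{
    assert(counter != NULL);
    return (*counter)++;
}
```

The caller releases the counter with free(). In code that is only ever compiled as C, the cast is usually left out; it is needed only when the same source must also be accepted by a C++ compiler.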

Cfront

Posted Feb 3, 2010 15:27 UTC (Wed) by jzbiciak (guest, #5246) [Link]

Fair enough.

It isn't terribly hard to add all these casts, but yes, you wouldn't be able to take the output of Cfront and feed it back into Cfront unless Cfront put all these casts in place ahead of time, or you wrote something to add them.

I guess since I regularly compile with -Wc++-compat I've developed a number of very minor habits that make my C programs also valid C++ programs, and so I forget about such things.

Facebook's "HipHop" PHP translator

Posted Feb 3, 2010 4:24 UTC (Wed) by pabs (subscriber, #43278) [Link] (1 responses)

Anyone able to find where the code has moved to? The github link in the article seems to redirect to the github homepage.

Code release

Posted Feb 3, 2010 16:25 UTC (Wed) by markh (subscriber, #33984) [Link]

The code hasn't been released yet. "Soon", "keep an eye on github this week".


Copyright © 2010, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds