There's no difference between a "compiler" and a "source code transformer". It doesn't matter if you "transform" PHP to C++, or C++ to assembly, or (as Digital did in the '90s) x86 machine code to Alpha machine code. It's the same process, with the same problems, the same algorithms, the same solutions, and the same outcome. The only difference, for less direct routes, is lost optimization opportunities.
It's disappointing that this thing only doubles the speed of the code, rather than making it 100x faster, or 20x. There should be another 10x worth of easy pickings lying there on the table.
Posted Feb 2, 2010 19:38 UTC (Tue) by dskoll (subscriber, #1630)
[Link]
I'm also surprised it's only a 2x improvement. That either means the PHP interpreter is very efficient (*snigger*) or that most of the slowness is in library functions implementing all the PHP built-in functions.
(Or they didn't benchmark it properly and something else has become the bottleneck.)
Facebook's "HipHop" PHP translator
Posted Feb 2, 2010 20:09 UTC (Tue) by elanthis (guest, #6227)
[Link]
Or, I dunno, the language semantics. But hey, you armchair compiler/runtime enthusiasts go
on thinking that C/C++ gives a 20x performance advantage when executing dynamically-typed
language semantics. I mean, obviously the compiler can just turn $x+$y into a single
machine op, right? It's not like it has to translate that into a function call that dispatches into
the correct + operator implementation for the operands after possibly converting them to
compatible types at runtime or anything, and then encapsulating the result into a dynamic
type. Nice and simple machine code translation like native C math just isn't even remotely
possible here, guys. That's why there is so much hoopla over tracing JIT engines for
dynamic languages, although a tracing engine for a statically-typed language is still even
faster.
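To make the cost described above concrete, here is a minimal Python sketch of what a runtime has to do for `$x + $y` when it knows nothing about the operands. It is a simplified model and not PHP's actual implementation: `php_add` and `to_number` are hypothetical names, PHP arrays are modeled as dicts, and non-numeric strings simply raise an error.

```python
def to_number(value):
    """Coerce a scalar to int or float, roughly as a dynamic '+' must (simplified)."""
    if isinstance(value, bool):
        return int(value)
    if isinstance(value, (int, float)):
        return value
    if isinstance(value, str):
        try:
            return int(value)
        except ValueError:
            return float(value)  # raises ValueError for non-numeric strings
    raise TypeError(f"unsupported operand type: {type(value).__name__}")


def php_add(x, y):
    """Dynamic '+': inspect operand types, convert, dispatch, return a boxed result."""
    # Array + array is a union in which the left operand's keys win.
    if isinstance(x, dict) and isinstance(y, dict):
        result = dict(x)
        for key, val in y.items():
            result.setdefault(key, val)
        return result
    if isinstance(x, dict) or isinstance(y, dict):
        raise TypeError("unsupported operand types: array + scalar")
    # Otherwise convert both operands to numbers, then add.
    return to_number(x) + to_number(y)
```

Every call pays for the type checks and conversions, even when both operands turn out to be plain integers.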
Facebook's "HipHop" PHP translator
Posted Feb 2, 2010 20:15 UTC (Tue) by JoeBuck (subscriber, #2330)
[Link]
It's possible to do better. But to win, you need to have good enough dataflow analysis to know that, for example, $x is always an integer and $y is always a string, so your compiled code can omit the dynamic dispatch.
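A toy version of that kind of analysis, as a hedged Python sketch: it walks straight-line assignments and records a type for each variable, falling back to "unknown" for anything it cannot prove. The statement encoding and the name `infer_types` are assumptions made for illustration, and integer overflow to float is ignored.

```python
def infer_types(statements, param_names=()):
    """Forward pass over straight-line code mapping each variable to a type name.

    Each statement is (target, expr). An expr is an int or float literal,
    a variable name starting with '$', or a tuple ('+', left, right).
    """
    # Parameters come from callers we cannot see, so their types are unknown.
    types = {name: "unknown" for name in param_names}

    def type_of(expr):
        if isinstance(expr, tuple):
            _, left, right = expr
            lt, rt = type_of(left), type_of(right)
            if lt == rt == "int":
                return "int"
            if {lt, rt} <= {"int", "float"}:
                return "float"
            return "unknown"  # dynamic dispatch is still required
        if isinstance(expr, str) and expr.startswith("$"):
            return types.get(expr, "unknown")
        if isinstance(expr, int):
            return "int"
        if isinstance(expr, float):
            return "float"
        return "unknown"

    for target, expr in statements:
        types[target] = type_of(expr)
    return types
```

When every operand of a `+` comes out as "int", the generated code can use a plain machine add; an "unknown" result means the dispatch has to stay.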
Facebook's "HipHop" PHP translator
Posted Feb 2, 2010 23:33 UTC (Tue) by bk (guest, #25617)
[Link]
Right, so your whole server blows up when a malicious user enters "$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$#!/bin/sh\nrm-rf" instead of "1/1/72".
Facebook's "HipHop" PHP translator
Posted Feb 3, 2010 9:51 UTC (Wed) by alankila (subscriber, #47141)
[Link]
The point of the dataflow analysis is to know when that is possible and act accordingly.
Facebook's "HipHop" PHP translator
Posted Feb 4, 2010 23:48 UTC (Thu) by efexis (guest, #26355)
[Link]
No, but now I have to horizontally scroll to read this thread... thanks for that :-/
Facebook's "HipHop" PHP translator
Posted Feb 2, 2010 20:52 UTC (Tue) by dskoll (subscriber, #1630)
[Link]
> I mean, obviously the compiler can just turn $x+$y into a single
> machine op, right?
Did you read the article? Among other things, it claimed:
> Whenever possible our generated code uses static binding for functions and variables. We also use type inference to pick the most specific type possible for our variables and thus save memory.
So yes, I would think that in a lot of cases, it could generate very fast code for $x+$y.
Facebook's "HipHop" PHP translator
Posted Feb 3, 2010 0:09 UTC (Wed) by elanthis (guest, #6227)
[Link]
Reading an article and a site's home page does not make you an expert,
clearly. Let's pretend I don't have actual real experience with this kind
of tech and just think this through logically instead of quoting soundbites
from a web page. If you are writing code like:
$x = 1;
$y = 3;
$z = $x + $y;
Then sure, it can use type inference. When you have something like:
function foo($x, $y) {
return $x + $y;
}
Then the compiler has no freaking clue what is going to be passed in.
Generally, I find, people do not create variables to hold static values,
but instead they tend to hold data from various other sources, many of
which by PHP semantics simply cannot have their types inferred. Some code
can be highly optimized by such techniques, some cannot, and the latter
just happens to be more common in at least the apps I've worked with.
Unless the compiler is compiling multiple versions of these functions (like
how a tracing JIT compiler might), then most of the code in a real-life app
(not a micro-benchmark) is going to be ineligible for type lowering, and
the runtime has to fall back on full dynamic type semantics.
This is one reason that, no matter what advances the dynamic language folks
working on JS or Python or whatever might possibly make, the speed of a
statically-typed language can always be greater by just using the same
techniques the dynamic runtimes use but without the type guards (for
tracing implementations) or dynamic fallbacks.
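The "multiple versions plus guards" idea mentioned above can be sketched roughly as follows. This is an illustrative Python model only: `Specializing`, `generic_foo`, and `foo_int_int` are hypothetical names, and a real compiler or tracing JIT would generate the fast versions itself instead of taking them from a table.

```python
class Specializing:
    """Wrap a function and keep one specialized version per argument-type tuple."""

    def __init__(self, generic, specializers):
        self.generic = generic            # fallback with full dynamic semantics
        self.specializers = specializers  # maps a tuple of types to a fast version
        self.hits = 0
        self.fallbacks = 0

    def __call__(self, *args):
        key = tuple(type(a) for a in args)  # the type guard, paid on every call
        fast = self.specializers.get(key)
        if fast is not None:
            self.hits += 1
            return fast(*args)
        self.fallbacks += 1
        return self.generic(*args)


def generic_foo(x, y):
    """Slow path: coerce numeric strings before adding."""
    def num(v):
        if isinstance(v, str):
            return float(v) if "." in v else int(v)
        return v
    return num(x) + num(y)


def foo_int_int(x, y):
    """Fast path: both operands are known to be ints, so no coercion is needed."""
    return x + y


foo = Specializing(generic_foo, {(int, int): foo_int_int})
```

Calling `foo(1, 3)` takes the fast path and `foo("1.5", 2)` falls back to the generic one. The guard in `__call__` is the part a statically-typed language gets to skip.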
Facebook's "HipHop" PHP translator
Posted Feb 3, 2010 10:31 UTC (Wed) by PO8 (guest, #41661)
[Link]
Static analysis techniques used in the last 20 years can generally infer the type of parameters to a procedure or function call from context. Often they can do it even without looking at the whole program. They are good at looking at small clues and following chains of reasoning.
I'm a huge fan of static typing, but from an efficiency point of view it's just not a huge win in 2010. There are too many well-known clever tricks for making dynamically-typed languages run fast. I like static typing because it serves as a kind of formal analysis that increases the odds of my writing a correct program. If it didn't do that I'd have ditched it for dynamic typing long ago.
Facebook's "HipHop" PHP translator
Posted Feb 3, 2010 12:06 UTC (Wed) by HelloWorld (guest, #56129)
[Link]
If it is "not a huge win in 2010", then why are statically typed languages still vastly faster in any benchmark?
For example, compare JavaScript in Google's V8 to Haskell: http://shootout.alioth.debian.org/u32/benchmark.php?test=...
JavaScript is between three and one hundred times slower. And this is a state-of-the-art VM using tracing and all that.
Facebook's "HipHop" PHP translator
Posted Feb 3, 2010 14:50 UTC (Wed) by darthscsi (subscriber, #8111)
[Link]
Google's V8 implementation of JavaScript is not the fastest dynamically
typed language implementation. PLT Scheme is hardly the fastest
Scheme implementation, and it handily beats Google's V8.
I wish Chez Scheme was on the list, as it's the fastest Scheme I know of,
but it is a commercial implementation. It does all sorts of tricks with
object padding, stashing type tags for the common types inside pointers,
placing objects in per type regions (middle bits become type tags), type
inference and unboxing, and more.
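For readers unfamiliar with the pointer-tagging trick, here is a rough Python sketch of the general idea. It is not Chez Scheme's actual layout: the tag width and tag values are assumptions chosen for illustration, and Python's unbounded ints stand in for machine words, so overflow is ignored.

```python
TAG_BITS = 3
TAG_MASK = (1 << TAG_BITS) - 1
TAG_FIXNUM = 0b000   # small integer stored directly in the word
TAG_POINTER = 0b001  # address of an 8-byte-aligned heap object


def box_fixnum(n):
    """Store a small integer in the upper bits, with the fixnum tag below it."""
    return (n << TAG_BITS) | TAG_FIXNUM


def box_pointer(address):
    """Tag an aligned address; alignment guarantees the low bits are free."""
    if address & TAG_MASK:
        raise ValueError("address must be 8-byte aligned")
    return address | TAG_POINTER


def is_fixnum(word):
    return (word & TAG_MASK) == TAG_FIXNUM


def unbox_fixnum(word):
    return word >> TAG_BITS


def add_fixnums(a, b):
    """With a zero tag, adding two tagged words yields the tagged sum directly."""
    if not (is_fixnum(a) and is_fixnum(b)):
        raise TypeError("slow path needed: operand is not a fixnum")
    return a + b
```

The type check is a single mask-and-compare, and the common integer case needs no heap allocation at all.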
Facebook's "HipHop" PHP translator
Posted Feb 5, 2010 20:47 UTC (Fri) by HelloWorld (guest, #56129)
[Link]
Yes, perhaps it is possible to make a dynamically typed language almost as fast as a statically typed one. The question is, why bother? It makes the implementation much more complex in order to achieve the questionable goal of dynamic typing. I don't think that's a good idea.
Facebook's "HipHop" PHP translator
Posted Feb 3, 2010 14:08 UTC (Wed) by dskoll (subscriber, #1630)
[Link]
> function foo($x, $y) {
> return $x + $y;
> }
> Then the compiler has no freaking clue what is going to be passed in.
Possibly. But still, I'd expect a better than 2x speedup. Parsing the PHP code, especially all the included and required files has to be pretty slow.
Also, the "+" operator in PHP can only work on numbers. So $x and $y
are either integers or floats; sorting out the possible cases is pretty quick.
Facebook's "HipHop" PHP translator
Posted Feb 3, 2010 15:10 UTC (Wed) by liljencrantz (subscriber, #28458)
[Link]
Parsing is done ahead of time, so it is in fact more or less free. The plus operator also works on arrays.
Facebook's "HipHop" PHP translator
Posted Feb 3, 2010 15:18 UTC (Wed) by dskoll (subscriber, #1630)
[Link]
> Parsing is done ahead of time, so it is in fact more or less free.
Eh? Not in standard PHP. Parsing is free only if you're using an opcode cache like APC or similar.
> The plus operator also works on arrays.
So it does. Still, it's relatively restricted. It doesn't work on strings or objects.
Facebook's "HipHop" PHP translator
Posted Feb 4, 2010 14:17 UTC (Thu) by jond (subscriber, #37669)
[Link]
I think it can be assumed a site dealing with "400 billion PHP-based page
views every month" is using an opcode cache already.
Facebook's "HipHop" PHP translator
Posted Feb 3, 2010 9:17 UTC (Wed) by srepmub (subscriber, #47230)
[Link]
The speedup one can get from compiling a (restricted) dynamic language varies greatly, depending on what a given program is doing. If a program is IO or memory bound, or string intensive (serving web pages sure sounds like it would be string intensive), there may not be a significant speedup at all. The example set of Shed Skin, a (restricted-)Python-to-C++ compiler, clearly demonstrates this: while most programs become a lot faster (up to 350 times), some actually become slower.
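The arithmetic behind that observation is Amdahl's law. As a small sketch (the function name and the fractions used below are illustrative assumptions, not measurements of any real workload):

```python
def overall_speedup(fraction_sped_up, factor):
    """Amdahl's law: total speedup when only part of the runtime gets faster.

    fraction_sped_up: share of the original runtime that the compiler improves.
    factor: how many times faster that share becomes.
    """
    if not 0.0 <= fraction_sped_up <= 1.0:
        raise ValueError("fraction_sped_up must be between 0 and 1")
    if factor <= 0:
        raise ValueError("factor must be positive")
    return 1.0 / ((1.0 - fraction_sped_up) + fraction_sped_up / factor)
```

If, hypothetically, only half of a request's time were spent in code the compiler can improve, then `overall_speedup(0.5, 10)` is about 1.8, and even `overall_speedup(0.5, 1000)` stays just under 2. The rest of the time (IO, memory, library string handling) sets the ceiling.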
Facebook's "HipHop" PHP translator
Posted Feb 2, 2010 22:00 UTC (Tue) by ejr (subscriber, #51652)
[Link]
Remember, Scheme->C worked from a well-defined language. This tool targets the equivalent for PHP, which is, um, not easily defined.
Of course PHP->Scheme or PHP->Lisp would have benefited from all those years of optimization in other systems. But it would be seen as a dead-end and impossible to use in "real life". Some enterprising PHP hoster could base such a system off of the HipHop work, show performance advantages over other hosters, and...
Lisp
Posted Feb 2, 2010 23:06 UTC (Tue) by ncm (subscriber, #165)
[Link]
It would be seen as very, very funny -- particularly by Lisp fans. Or, possibly, just in bad taste. A Forth -> Lisp translator would be funnier.
Facebook's "HipHop" PHP translator
Posted Feb 2, 2010 23:48 UTC (Tue) by chax (guest, #52122)
[Link]
And this is what the Roadsend PHP Compiler is doing (PHP -> Scheme (Bigloo) ->
C). Roadsend is now producing a new compiler based on LLVM.
Facebook's "HipHop" PHP translator
Posted Feb 3, 2010 1:07 UTC (Wed) by ejr (subscriber, #51652)
[Link]
I suspect many of these would better be served by working atop a JavaScript runtime rather than dropping all the way to LLVM. The compiling JS runtimes already handle runtime specialization that would benefit PHP, and JS likely is a closer semantic match type-wise. LLVM's specific strengths over other JIT engines are in expressing vector operations, imho.
But I'm glad someone went a more sensible initial route to re-use an existing dynamic language implementation (although gambit's more my preference). Thanks!
Lisp comedy
Posted Feb 3, 2010 1:48 UTC (Wed) by ncm (subscriber, #165)
[Link]
It writes itself.
Lisp comedy
Posted Feb 3, 2010 10:34 UTC (Wed) by PO8 (guest, #41661)
[Link]
(! LOL)
Facebook's "HipHop" PHP translator
Posted Feb 2, 2010 22:01 UTC (Tue) by xorbe (subscriber, #3165)
[Link]
> There's no difference between ...
> The only difference ...
Anyways, most people consider a compiler to generate something a bit lower level than C++, and would consider "source code transformation" to be more informative about its true nature.
Facebook's "HipHop" PHP translator
Posted Feb 2, 2010 22:19 UTC (Tue) by ccshan (guest, #2723)
[Link]
Your quotes are out of context; please read what you quoted again. There's no difference between a compiler and a source-code transformer. There are differences between a compiler/source-code-transformer that uses less direct routes and one that uses more direct routes. A compiler can use less or more direct routes just as a source-code transformer can use less or more direct routes, because, after all, there is no difference between a compiler and a source-code transformer.
Separately, you suggested that a compiler is considered "to generate something a bit lower level than C++". According to this definition, and assuming that assembly code is "lower level than C++", you would call a preprocessor for assembly code (such as HLA "the High-Level Assembler" perhaps) a compiler, right? After all, it generates assembly code.
Facebook's "HipHop" PHP translator
Posted Feb 3, 2010 4:41 UTC (Wed) by xorbe (subscriber, #3165)
[Link]
What? Yeah so technically the terms are interchangeable, but I'd use compiler, source transform, and preprocessor as others would expect within a tool chain.
PHP code -> source code transform -> g++ compiler
That's how 99.9% would talk about it.
PHP code -> compiler -> compiler
That's you confusing the 99.9%! That's it, I'm referring to all input/output programs as transmogrification binaries!
A compiler's a compiler, no matter how small
Posted Feb 3, 2010 7:11 UTC (Wed) by ncm (subscriber, #165)
[Link]
99.9% of statistics are made up on the spot.
Regardless, a program that takes source code and gives you a binary executable is a compiler. If it happens to use G++ inside, that's none of the user's business. So, 99.9% of people who have the faintest idea what a compiler is (i.e., 0.1% of normal people) would call it a compiler, because that's what it is. Be proud, you're in the remaining 0.001% of the population who call cars "four-wheeled self-propelled vehicles" and hamburgers "layered bovine handmeals" or something.
A compiler's a compiler, no matter how small
Posted Feb 3, 2010 10:25 UTC (Wed) by dgm (subscriber, #49227)
[Link]
Close, but no cookie. A program that takes source code and gives you an executable binary is usually a compiler+assembler+linker, or at least a compiler+linker.
There's more than opcodes in an executable file.
A compiler's a compiler, no matter how small
Posted Feb 3, 2010 19:55 UTC (Wed) by ncm (subscriber, #165)
[Link]
Gcc produces binary executables. Gcc uses gas and gold underneath, and that's none of my business. Gcc is a compiler.
Enjoy your layered bovine handmeals.
Compiling is not translating
Posted Feb 4, 2010 22:15 UTC (Thu) by man_ls (subscriber, #15091)
[Link]
The difference is important. I am currently trying to write a program to translate Java code to Python code, and no sane mind would call that a compiler. Even if it generates code and then tries to run it. I don't know if "code transformer" is a good description, but "compiler" is not it. I prefer calling the process "translation" myself.
My mother makes hamburgers with 50% beef, 50% pork. They are delicious. You may choose to call them something else than "hamburgers" and I will not be offended.
Compiling is not translating
Posted Feb 4, 2010 23:19 UTC (Thu) by nix (subscriber, #2304)
[Link]
Indeed. Compilers are a special kind of translator, not vice versa. (GCC
is a translator: so is your Java->Python transformer.)
Compiling is not translating
Posted Feb 4, 2010 23:22 UTC (Thu) by ccshan (guest, #2723)
[Link]
> Compilers are a special kind of translator
So, which special kind is that?
Compiling is not translating
Posted Feb 4, 2010 23:53 UTC (Thu) by nix (subscriber, #2304)
[Link]
A translator which targets something which can be executed, typically.
Things are blurred somewhat when one considers compilers that yield
various forms of bytecode which are then interpreted by something else.
Translators can yield all sorts of output: e.g. IDL 'compilers' which in
one mode of operation can take IDL descriptions and yield (say) C++ header
files (which do not themselves produce executable code, in general). Hell,
even TeX counts -- although it's a very strange translator (parser? what's
that? we don't use normal terminology here, we have the 'mouth'
and 'stomach' instead) it even has an optimization pass of sorts
(reflowing text) and *it* certainly doesn't yield executable code.
(Although dvips can take the DVI file TeX produces and create PostScript
out of it, which is of course itself a full-blown computer language which
is then typically interpreted by the GhostScript translator to yield a
bunch of page images.)
Compiling is not translating
Posted Feb 5, 2010 1:44 UTC (Fri) by ccshan (guest, #2723)
[Link]
I don't see what you mean by "execute". For example, I'm not sure why you think DVI is not executable but PostScript is. Maybe you mean that DVI is not Turing-complete but PostScript is, but lots of things are compiled to finite-state machines (if only large ones).
And C++ header files *are* Turing-complete, if we ignore implementation limits on template instantiation (just as we ignore physical-computer limits on memory use).
Compiling is not translating
Posted Feb 5, 2010 9:36 UTC (Fri) by anselm (subscriber, #2796)
[Link]
TeX is no more or less of a compiler than the Sun Java »compiler« is. Both
create machine code for a »virtual« processor that is then executed by a
specialised interpreter. Read sections 583-591 of »TeX: The Program« by
Donald E. Knuth.
Compiling is not translating
Posted Feb 5, 2010 15:44 UTC (Fri) by marcH (subscriber, #57642)
[Link]
> A translator which targets something which can be executed, typically. Things are blurred somewhat when one considers compilers that yield various forms of bytecode which are then interpreted by something else.
The "executable" road will never reach any satisfying definition. It will always be "blurred somewhat".
You might have a bit more success with "high-level vs low-level", as found in Wikipedia. Maybe.
Compiling is not translating
Posted Feb 5, 2010 17:13 UTC (Fri) by man_ls (subscriber, #15091)
[Link]
It is not so blurred; since modern processors have a native mode of execution there is an absolute: a native executable format. Even if internally they translate the instruction set to microcode, that layer is inaccessible from the outside. So there is at least one public native executable format per processor, although there can be more (for instance x86 and x86_64). "Compiling" is going from a human-readable file to this executable format. Merriam-Webster has a clear definition:
2 : a computer program that translates an entire set of instructions written in a higher-level symbolic language (as C) into machine language before the instructions can be executed
From there up files are not literally executable. However judging by the similarity to a processor instruction set people can create virtual machines, bytecode languages and the like. So moving from human-readable files to machine-readable files can be called "compiling" by generalization. Still there is no blurring: going from human-readable to human-readable I would call "translating", and from machine-readable to machine-readable I would call "emulation".
Compiling is not translating
Posted Feb 5, 2010 17:41 UTC (Fri) by ccshan (guest, #2723)
[Link]
The two blurry parts of your definition of "compile" are what is human-readable and what is machine-readable. If whether a layer is "inaccessible from the outside" is relevant as you suggested, then would a translator from PHP to C++ become a compiler as soon as someone builds a sealed box that takes C++ code as input and runs it (or even type-checks it)? If whether a language is "similar" to a processor instruction set is relevant as you suggested, then is JVM bytecode really all that similar to a processor instruction set and all that dissimilar to C++ code? (Compare: LISP machine.) Also, humans can understand machine code.
Compiling is not translating
Posted Feb 5, 2010 18:07 UTC (Fri) by man_ls (subscriber, #15091)
[Link]
I don't agree with the last part: humans can maybe understand machine code, but not read it in the usual sense. Just as there is usually no ambiguity between text files and binary files.
As to the rest, I'm fine with it: as far as something can be said to be human-readable and something else can be said to be machine-readable, "compiling" is converting between the two. A PHP -> C++ translator, combined with a C++ compiler, yields a PHP compiler.
Compiling is not translating
Posted Feb 5, 2010 18:14 UTC (Fri) by ccshan (guest, #2723)
[Link]
> humans can maybe understand machine code, but not read it in the usual sense. Just as there is usually no ambiguity between text files and binary files.
So, which usual sense is that?
> as far as something can be said to be human-readable and something else can be said to be machine-readable, "compiling" is converting between the two
Given that all programming languages are machine-readable, no distinction has been made between translators and compilers. A PHP->C++ translator is a PHP->C++ compiler.
Continuing with the obvious
Posted Feb 5, 2010 18:45 UTC (Fri) by man_ls (subscriber, #15091)
[Link]
Text files are strings of values that map to letters and symbols. ASCII for instance is restricted to 0-9, A-Z, a-z, some punctuation marks and a few other symbols; Unicode is considerably wider but still maps code points to printable symbols. Binary files are streams of bytes with no such intent.
For "machine-readable" I obviously mean "readable to execute". While "human-readable" is "readable to understand". Again, the intent is what counts. And before you continue pointing out silly flaws: unlambda or brainf*ck are not really made to be readable, but are still meant to be understood by people.
Continuing with the obvious
Posted Feb 6, 2010 0:47 UTC (Sat) by marcH (subscriber, #57642)
[Link]
> Text files are strings of values that map to letters and symbols. [...] Binary files are streams of bytes with no such intent.
Sorry but the text/binary boundary does not work any better. It's just an uninteresting and irrelevant matter of representation and encoding. You can store very high-level languages in binary and very low-level languages in ASCII.
Consider assembly for instance. It is text. So following your logic "gcc" is just a "translator" and "as" is a compiler?
Text and assembly
Posted Feb 6, 2010 10:57 UTC (Sat) by man_ls (subscriber, #15091)
[Link]
Let me answer both of you at the same time since you make similar arguments. Text encodings are not just a trivial matter of representation, any more than the alphabet is just a trivial matter of drawing pretty lines. Text encodings are the essence of symbolic representation with computers. The level of the language is not really relevant.
> Consider assembly for instance.
Assembly language is not an instance; it is a special case of a language designed to mimic machine instructions and still be understandable by humans. To the extent that "return" is as much a symbol as "0x4e75", you are right that both things are equivalent; but assembly is not a real language so much as a mere transposition of opcodes. In fact, it is so much a boundary case that a special term has been invented for the conversion between assembly and machine code, also called "assembly" -- and this process is so distinct that it gives its name to the language itself, "assembly language". Aren't words fun?
Text and assembly
Posted Feb 6, 2010 22:22 UTC (Sat) by ccshan (guest, #2723)
[Link]
You have not answered any question, even explicit questions about your supposedly non-blurry distinction between compilers and mere translators, nor dispelled any concern, even explicit concerns about your distinction between compilers and mere translators being blurry. Questions to answer are signaled by the question-mark (?) punctuation.
Text and assembly
Posted Feb 7, 2010 0:26 UTC (Sun) by man_ls (subscriber, #15091)
[Link]
I'm sorry that you feel that way, but I have tried to clarify (not answer) what I felt were the important questions (implicit or explicit) in the discussion. The rest has no interest for me, and I would probably be the wrong person to answer them. I did learn something in the discussion and the issue of compiling vs translating vs mere assembling is now clearer for me, so thanks for that.
Continuing with the obvious
Posted Feb 6, 2010 0:47 UTC (Sat) by ccshan (guest, #2723)
[Link]
> Binary files are streams of bytes with no such intent.
I assume that by "such intent" you mean intent to be "readable to understand". But do you mean a file as a sequence of bytes or a file as displayed by, say, the Linux console to the human retina? If the former (which is the text/binary distinction that file(1) tries to make), then a text file (such as 116 104 105 115) is no more intended to be "readable to understand" than a binary file. If the latter, then isn't README.gz a text file? In any case, I don't see an answer to the question "humans can maybe understand machine code, but not read it in *which* sense".
> For "machine-readable" I obviously mean "readable to execute". While "human-readable" is "readable to understand". Again, the intent is what counts.
Isn't the PDP-11 machine language intended to be "readable to understand"? If someone were to design a programming language to be "readable to understand", but that language ended up isomorphic to another language designed to be "readable to execute", then would translators that produce this/these language(s) be compilers? (It has happened.) Is an assembly-language preprocessor a compiler but cpp a mere translator?
Given that all programming languages are intended to be "readable to execute", no distinction has been made between translators and compilers. A PHP->C++ translator is a PHP->C++ compiler.
Facebook's "HipHop" PHP translator
Posted Feb 2, 2010 23:28 UTC (Tue) by bk (guest, #25617)
[Link]
Let:
A be a theoretical 100% perfectly optimizing C++ to machine code compiler
B be a theoretical 100% perfectly optimizing PHP to C++ "code transformer"/compiler
C be a theoretical 100% perfectly optimizing PHP to machine code compiler
...would it not be true that A + B = C ?
(IOW, there need not necessarily be 'lost optimization opportunities')
100%
Posted Feb 3, 2010 0:33 UTC (Wed) by ncm (subscriber, #165)
[Link]
No. If the A knew that its input could only come from B, then A could apply additional optimizations forbidden for the general case. However, the construct "100% perfectly optimizing" is strictly meaningless; many optimizations imply tradeoffs, potentially slowing down unlikely sequences in exchange for improving more common ones.
Facebook's "HipHop" PHP translator
Posted Feb 3, 2010 17:06 UTC (Wed) by marcH (subscriber, #57642)
[Link]
> There's no difference between a "compiler" and a "source code transformer".
A "source code transformer" is a very special class of compiler, because it produces something I can read.
If some people can read assembly code, then they have a different definition of "source code transformer" than me. That's fine, because I never talk to them anyway.