
Compiling is not translating


Posted Feb 5, 2010 15:44 UTC (Fri) by marcH (subscriber, #57642)
In reply to: Compiling is not translating by nix
Parent article: Facebook's "HipHop" PHP translator

> A translator which targets something which can be executed, typically. Things are blurred somewhat when one considers compilers that yield various forms of bytecode which are then interpreted by something else.

The "executable" road will never reach any satisfying definition. It will always be "blurred somewhat".

You might have a bit more success with "high-level vs low-level", as found in Wikipedia. Maybe.



Compiling is not translating

Posted Feb 5, 2010 17:13 UTC (Fri) by man_ls (subscriber, #15091)

It is not so blurred; since modern processors have a native mode of execution, there is an absolute: a native executable format. Even if internally they translate the instruction set to microcode, that layer is inaccessible from the outside. So there is at least one public native executable format per processor, although there can be more (for instance x86 and x86_64). "Compiling" is going from a human-readable file to this executable format. Merriam-Webster has a clear definition:
> 2 : a computer program that translates an entire set of instructions written in a higher-level symbolic language (as C) into machine language before the instructions can be executed
From there up, files are not literally executable. However, by analogy with a processor instruction set, people can create virtual machines, bytecode languages and the like. So moving from human-readable files to machine-readable files can be called "compiling" by generalization. Still, there is no blurring: going from human-readable to human-readable I would call "translating", and going from machine-readable to machine-readable I would call "emulation".

Compiling is not translating

Posted Feb 5, 2010 17:41 UTC (Fri) by ccshan (guest, #2723)

The two blurry parts of your definition of "compile" are what is human-readable and what is machine-readable. If whether a layer is "inaccessible from the outside" is relevant as you suggested, then would a translator from PHP to C++ become a compiler as soon as someone builds a sealed box that takes C++ code as input and runs it (or even type-checks it)? If whether a language is "similar" to a processor instruction set is relevant as you suggested, then is JVM bytecode really all that similar to a processor instruction set and all that dissimilar to C++ code? (Compare: LISP machine.) Also, humans can understand machine code.

Compiling is not translating

Posted Feb 5, 2010 18:07 UTC (Fri) by man_ls (subscriber, #15091)

I don't agree with the last part: humans can maybe understand machine code, but not read it in the usual sense. Just as there is usually no ambiguity between text files and binary files.

As to the rest, I'm fine with it: as far as something can be said to be human-readable and something else can be said to be machine-readable, "compiling" is converting between the two. A PHP -> C++ translator, combined with a C++ compiler, yields a PHP compiler.
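
The composition claim above (translator plus compiler yields a compiler) can be sketched in a few lines. This is only an illustration under stated assumptions: the two-operation toy language and the names `translate`, `compile_target` and `toy_compiler` are hypothetical, and Python's built-in `compile()` merely stands in for the C++ compiler in the argument.

```python
from types import CodeType

def translate(src: str) -> str:
    """Hypothetical toy translator: text in, text out ('add 2 3' -> '(2 + 3)')."""
    op, a, b = src.split()
    symbols = {"add": "+", "mul": "*"}
    return f"({int(a)} {symbols[op]} {int(b)})"

def compile_target(text: str) -> CodeType:
    """Stand-in for the second stage: turn target-language text into a code object."""
    return compile(text, "<toy>", "eval")

def toy_compiler(src: str) -> CodeType:
    """Composition of the two stages: toy-language text in, executable object out."""
    return compile_target(translate(src))

result = eval(toy_compiler("add 2 3"))
```

Whether `translate` alone deserves the name "compiler" is exactly what the thread disputes; the sketch only shows that the composed pipeline turns source text into something executable.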

Compiling is not translating

Posted Feb 5, 2010 18:14 UTC (Fri) by ccshan (guest, #2723)

> humans can maybe understand machine code, but not read it in the usual sense. Just as there is usually no ambiguity between text files and binary files.
So, which usual sense is that?

> as far as something can be said to be human-readable and something else can be said to be machine-readable, "compiling" is converting between the two
Given that all programming languages are machine-readable, no distinction has been made between translators and compilers. A PHP->C++ translator is a PHP->C++ compiler.

Continuing with the obvious

Posted Feb 5, 2010 18:45 UTC (Fri) by man_ls (subscriber, #15091)

Text files are strings of values that map to letters and symbols. ASCII for instance is restricted to 0-9, A-Z, a-z, some punctuation marks and a few other symbols; Unicode is much wider but still maps code points to printable symbols. Binary files are streams of bytes with no such intent.

For "machine-readable" I obviously mean "readable to execute". While "human-readable" is "readable to understand". Again, the intent is what counts. And before you continue pointing out silly flaws: unlambda or brainf*ck are not really made to be readable, but are still meant to be understood by people.

Continuing with the obvious

Posted Feb 6, 2010 0:47 UTC (Sat) by marcH (subscriber, #57642)

> Text files are strings of values that map to letters and symbols. [...] Binary files are streams of bytes with no such intent.

Sorry, but the text/binary boundary does not work any better. It's just an uninteresting and irrelevant matter of representation and encoding. You can store very high-level languages in binary and very low-level languages in ASCII.

Consider assembly for instance. It is text. So, following your logic, "gcc" is just a "translator" and "as" is a compiler?

Text and assembly

Posted Feb 6, 2010 10:57 UTC (Sat) by man_ls (subscriber, #15091)

Let me answer both of you at the same time since you make similar arguments. Text encodings are not just a trivial matter of representation, any more than the alphabet is just a trivial matter of drawing pretty lines. Text encodings are the essence of symbolic representation with computers. The level of the language is not really relevant.
> Consider assembly for instance.
Assembly language is not an instance; it is a special case of a language designed to mimic machine instructions and still be understandable by humans. To the extent that "return" is as much a symbol as "0x4e75", you are right that both things are equivalent; but assembly is not a real language so much as a mere transposition of opcodes. In fact, it is so much a boundary case that a special term has been invented for the conversion between assembly and machine code, also called "assembly" -- and this process is so distinct that it gives its name to the language itself, "assembly language". Aren't words fun?
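
The "mere transposition of opcodes" idea above can be sketched as a table lookup. This is a rough illustration, not a real assembler: the two-entry table and the `assemble` helper are hypothetical, and the sketch assumes that 0x4e75 is the Motorola 68000 return-from-subroutine word (mnemonic RTS) and that 0x4e71 is its NOP.

```python
import struct

# Hypothetical opcode table. 0x4E75 is the value quoted in the comment above;
# 0x4E71 is assumed here to be the Motorola 68000 NOP encoding.
OPCODES = {"rts": 0x4E75, "nop": 0x4E71}

def assemble(source: str) -> bytes:
    """Map each non-empty mnemonic line to one 16-bit big-endian opcode word."""
    words = [OPCODES[line.strip().lower()]
             for line in source.splitlines() if line.strip()]
    return struct.pack(f">{len(words)}H", *words)

machine_code = assemble("nop\nrts\n")
```

A real assembler also resolves labels, operands and addressing modes, so the one-to-one mapping shown here is the limiting case the comment describes rather than the general one.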

Text and assembly

Posted Feb 6, 2010 22:22 UTC (Sat) by ccshan (guest, #2723)

You have not answered any question, even explicit questions about your supposedly non-blurry distinction between compilers and mere translators, nor dispelled any concern, even explicit concerns about your distinction between compilers and mere translators being blurry. Questions to answer are signaled by the question-mark (?) punctuation.

Text and assembly

Posted Feb 7, 2010 0:26 UTC (Sun) by man_ls (subscriber, #15091)

I'm sorry that you feel that way, but I have tried to clarify (not answer) what I felt were the important questions (implicit or explicit) in the discussion. The rest holds no interest for me, and I would probably be the wrong person to answer it. I did learn something in the discussion, and the issue of compiling vs translating vs mere assembling is now clearer to me, so thanks for that.

Continuing with the obvious

Posted Feb 6, 2010 0:47 UTC (Sat) by ccshan (guest, #2723)

> Binary files are streams of bytes with no such intent.

I assume that by "such intent" you mean intent to be "readable to understand". But do you mean a file as a sequence of bytes or a file as displayed by, say, the Linux console to the human retina? If the former (which is the text/binary distinction that file(1) tries to make), then a text file (such as 116 104 105 115) is no more intended to be "readable to understand" than a binary file. If the latter, then isn't README.gz a text file? In any case, I don't see an answer to the question "humans can maybe understand machine code, but not read it in *which* sense".

> For "machine-readable" I obviously mean "readable to execute". While "human-readable" is "readable to understand". Again, the intent is what counts.

Isn't the PDP-11 machine language intended to be "readable to understand"? If someone were to design a programming language to be "readable to understand", but that language ended up isomorphic to another language designed to be "readable to execute", then would translators that produce this/these language(s) be compilers? (It has happened.) Is an assembly-language preprocessor a compiler but cpp a mere translator?

Given that all programming languages are intended to be "readable to execute", no distinction has been made between translators and compilers. A PHP->C++ translator is a PHP->C++ compiler.
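
The byte-level reading of the text/binary question raised above (the bytes 116 104 105 115, and README.gz) can be made concrete with a short sketch. The `looks_like_text` heuristic is a hypothetical stand-in, far cruder than what file(1) does, and the README contents are made up for illustration.

```python
import gzip

# The four byte values quoted in the comment above.
raw = bytes([116, 104, 105, 115])
decoded = raw.decode("ascii")

def looks_like_text(data: bytes) -> bool:
    """Crude heuristic: only printable ASCII plus tab, LF and CR counts as text."""
    return all(32 <= b < 127 or b in (9, 10, 13) for b in data)

# Made-up README contents, compressed the way README.gz would be.
readme = b"This is a README.\n"
compressed = gzip.compress(readme)

text_before = looks_like_text(readme)       # the uncompressed bytes pass the check
text_after = looks_like_text(compressed)    # the gzip stream does not
```

The same content passes or fails the check depending on its encoding, which is the point being pressed: the check classifies byte sequences, not intent.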

Copyright © 2012, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds