Alan Schmitt <alan.schmitt-AT-polytechnique.org>
Attn: Development Editor, Latest Caml Weekly News
Tue, 24 Mar 2009 09:56:07 +0000
cwn <cwn-AT-lists.idyll.org>
Here is the latest Caml Weekly News, for the week of March 17 to 24, 2009.
1) XML output
2) ocaml-http is looking for a new maintainer
3) Google summer of Code proposal
1) XML output
** Rémi Dewitte asked and Gerd Stolpmann answered:
> I have used pxp to parse xml and I am happy with it. I'd like now to
> produce xml and wonder what are the options to do so (possibly the
Maybe not the simplest: Use the PXP preprocessor to create the output
tree, and print the tree:
> I think I am going to start with the Printf module. I wonder how well
> it handles utf8 for example.
UTF-8 is just bytes to Printf.
> And I'll have to write a kind of xml_encode function. I am pretty
> sure it has already been done somewhere!
let xml_encode =
That would assume the input is UTF-8 encoded, and the output is
ASCII-encoded. You can control which ASCII characters get the special
XML representation &...; with the unsafe_chars optional argument.
Docs are at
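
For readers who only need the escaping step, here is a minimal sketch that uses
nothing but the standard library. It is a hand-written stand-in, not the
library function referred to above, and it differs from it in one respect: it
escapes only the five characters XML treats specially and passes every other
byte through, so UTF-8 input stays UTF-8 rather than becoming ASCII. The name
xml_encode is reused purely for illustration.

```ocaml
(* Hand-written stand-in for illustration only: escape the five
   characters that XML treats specially, copy everything else as-is. *)
let xml_encode (s : string) : string =
  let b = Buffer.create (String.length s) in
  String.iter
    (function
      | '&' -> Buffer.add_string b "&amp;"
      | '<' -> Buffer.add_string b "&lt;"
      | '>' -> Buffer.add_string b "&gt;"
      | '"' -> Buffer.add_string b "&quot;"
      | '\'' -> Buffer.add_string b "&apos;"
      | c -> Buffer.add_char b c)
    s;
  Buffer.contents b

let () =
  (* Printf treats the UTF-8 bytes of "é" like any other bytes. *)
  Printf.printf "<name>%s</name>\n" (xml_encode "R\xc3\xa9mi & <co>")
```

Because the function works byte by byte, it never splits a multi-byte UTF-8
sequence: all bytes of such a sequence are above 127 and none of the escaped
characters are.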
** Sylvain Le Gall also replied:
Maybe it is a bit of overkill, but there is also ocamlduce.
(dev for ocaml 3.11:)
OCamlduce can also be used with Eliom/OCsigen.
AFAIK, using ocamlduce can help you type check your output tree
directly within the OCaml compiler...
** Matthieu Wipliez also replied:
Yet another solution is Xmlm by Daniel Bünzli.
This is probably the easiest and most lightweight solution: Xmlm comes as a
module and its interface, and it's BSD-licensed, so you can just copy/paste it
** Michael Ekstrand then added:
I second the xmlm suggestion. Polling event-based parsing is very slick
and maps well into the functional paradigm, and its XML writing support
(generating a stream of events identical to those you read) makes
generation quite intuitive and reliable.
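
To make the event-stream idea concrete, here is a small standard-library-only
sketch. The signal type and the render function are hypothetical and only
loosely modelled on the description above; they are not Xmlm's actual
interface. The writer keeps a stack of open elements, so each closing event
closes the right tag.

```ocaml
(* Hypothetical event type for illustration; not Xmlm's interface. *)
type signal =
  | El_start of string * (string * string) list  (* tag name, attributes *)
  | Data of string                               (* character data *)
  | El_end                                       (* closes the innermost element *)

(* Escape the characters that are unsafe in text and attribute values. *)
let escape s =
  let b = Buffer.create (String.length s) in
  String.iter
    (function
      | '&' -> Buffer.add_string b "&amp;"
      | '<' -> Buffer.add_string b "&lt;"
      | '>' -> Buffer.add_string b "&gt;"
      | '"' -> Buffer.add_string b "&quot;"
      | c -> Buffer.add_char b c)
    s;
  Buffer.contents b

(* Render a list of signals; the accumulator is the stack of open tags. *)
let render (signals : signal list) : string =
  let b = Buffer.create 64 in
  let step stack = function
    | El_start (name, attrs) ->
        Buffer.add_string b ("<" ^ name);
        List.iter
          (fun (k, v) ->
            Buffer.add_string b (Printf.sprintf " %s=\"%s\"" k (escape v)))
          attrs;
        Buffer.add_char b '>';
        name :: stack
    | Data s ->
        Buffer.add_string b (escape s);
        stack
    | El_end ->
        (match stack with
         | name :: rest ->
             Buffer.add_string b ("</" ^ name ^ ">");
             rest
         | [] -> invalid_arg "render: El_end without matching El_start")
  in
  match List.fold_left step [] signals with
  | [] -> Buffer.contents b
  | _ -> invalid_arg "render: unclosed element"
```

For example, render [El_start ("a", [("k", "v")]); Data "x<y"; El_end] builds
the string <a k="v">x&lt;y</a>. Since the same kind of events could come from a
parser, a transformation can be written as a function from signals to signals.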
2) ocaml-http is looking for a new maintainer
** Stefano Zacchiroli announced:
Hi all, ocaml-http is looking for a new maintainer.
More details have been written on my blog.
If you are interested in taking over the maintenance, please mail me
3) Google summer of Code proposal
** In this long thread full of technical posts, Andrey Riabushenko
said and Xavier Leroy replied:
> I would like to develop an LLVM frontend for the OCaml language. LLVM does
> in GSoC. LLVM does not mind an OCaml frontend being developed as LLVM
> I want to discuss details with you before I make an official
> LLVM. [...]
> Do the authors of OCaml have something to say about the idea?
Da. A number of things, actually.
1- I know of at least 3, maybe 4 other projects that want to do
something with OCaml and LLVM. Clearly, some coordination between
these efforts is needed.
2- If you're applying for funding through a summer of code project,
you need to articulate good reasons why you want to combine OCaml and
LLVM. "Ocaml is cool, LLVM is cool, so OCaml+LLVM would be extra
cool" is not enough. "It will generate PIC-16 code" isn't either.
Run-time code generation could be a good enough reason, but you still
need to weigh the development cost of the LLVM approach against, for
example, hacking the current OCaml code generator so that it emits
machine code in memory instead of assembly code.
3- A language implementation like OCaml breaks down in four big parts:
    1- Front-end compiler
    2- Back-end compiler and code emitter
    3- Run-time system
    4- OS interface
Of these four, the back-end is not the biggest part nor the most
complicated part. LLVM gives you part 2, but you still need to
consider the other 3 parts. Saying "I'll do 1, 3 and 4 from scratch",
Harrop-style, means a 5-year project. To get somewhere within a
reasonable amount of time, you really need to reuse large parts of the
existing OCaml code base.
4- From a quick look at LLVM specs, the two aspects that appear most
problematic w.r.t. Caml are exception handling and GC interface.
LLVM seems to implement C++/Java "zero-cost" exceptions, which are
known to be quite costly for Caml. (Been there, done that, in the
early 1990s.) I regret that LLVM does not provide support for
alternate implementations of exceptions, like the C-- intermediate
language of Ramsey et al does:
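
As background for why exception cost matters here: Caml programs commonly
raise exceptions as ordinary control flow, so a scheme that makes raising
expensive is felt in everyday code. A small sketch of two such idioms follows;
the function names are made up for illustration.

```ocaml
(* Exception used for early exit: stop scanning at the first match. *)
exception Found of int

let index_of (x : 'a) (a : 'a array) : int option =
  try
    Array.iteri (fun i y -> if y = x then raise (Found i)) a;
    None
  with Found i -> Some i

(* Standard-library lookups signal absence by raising Not_found. *)
let lookup_or tbl key default =
  try Hashtbl.find tbl key with Not_found -> default
```

In both functions a raise is part of the normal path rather than an error
path, which is the usage pattern that "zero-cost" schemes (cheap to enter a
handler, expensive to raise) serve poorly.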
The big issue, however, is GC interface. There are GC techniques that
need no support from the back-end: stack maps and conservative
collection. Stack maps are very costly in execution time.
Conservative GC like the Boehm-Weiser GC is already much better. But
for full efficiency, back-end support is required. LLVM documents a
minimal interface to produce stack maps and make them available to the
GC at run-time:
The first thing you want to investigate is whether this interface is
enough to support an exact, relocating collector such as
OCaml's. According to
Gordon Henriksen did succeed in interfacing OCaml's GC with LLVM.
Maybe there is still some more work to do here, in which case I
recommend you look into this first.
6- Here is a schematic of the Caml compiler. (Use a fixed-width font.)

             |
             | parsing and preprocessing
             v
      Parsetree (untyped AST)
             |
             | type inference and checking
             v
      Typedtree (type-annotated AST)
             |
             | pattern-matching compilation, elimination of modules,
             v
          Lambda
           /  \
          /    \ closure conversion, inlining, uncurrying,
         v      \ data representation strategy
     Bytecode    \
                  v
                 Cmm
                  |
                  | code generation
                  v
            Assembly code

In my opinion, the simplest approach is to start at Cmm and generate
LLVM code from there. Starting at one of the higher-level
intermediate forms would entail reimplementing large chunks of the
OCaml compiler. Note that Cmm is quite close to the C-- intermediate
language mentioned earlier, so there is a lot to learn from Fermin
Reig's experience with an OCaml/C-- back-end.
7- To finish, I'll preventively deflect some likely reactions by Jon
Harrop:
"But you'll be tied to OCaml's data representation strategy!"
Right, but 1- implementing your own data representation strategy is
a lot of work, especially if it is type-based, and 2- OCaml's
strategy is close to optimal for symbolic computing.
"But LLVM assembly is typed, so you must have types!"
Just use int32 or int64 as your universal type and cast to
appropriate pointer types in loads or stores.
"But your code will be tainted by OCaml's evil license!"
It is trivial to make ocamlopt dump Cmm code in a file or pipe.
(The -dcmm debug option already does this.) Then, you can have a
separate, untainted program that reads the Cmm code and transforms it.
"But shadow stacks are the only way to go for GC interface!"
No, it's probably the worst approach performance-wise; even a
conservative GC should work better.
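
The "universal type" suggestion in point 7 can be sketched as a toy
translator. Everything here is an assumption made for illustration: the expr
type is a tiny invented stand-in for Cmm-like expressions, not OCaml's real
Cmm, and the emitted text follows the instruction spelling of recent LLVM
releases (older releases write pointer types and loads differently). Every
value is an i64, and an address is cast to a pointer only at the point of the
load.

```ocaml
(* Toy expression language, invented for this sketch. *)
type expr =
  | Const of int
  | Var of string
  | Add of expr * expr
  | Load of expr  (* read the word at the address computed by the operand *)

(* Returns the emitted instructions and the operand holding the result.
   All values are i64; a pointer appears only right before a load. *)
let compile (e : expr) : string list * string =
  let counter = ref 0 in
  let fresh () =
    incr counter;
    Printf.sprintf "%%t%d" !counter
  in
  let rec go = function
    | Const n -> ([], string_of_int n)
    | Var x -> ([], "%" ^ x)
    | Add (a, b) ->
        let ia, va = go a in
        let ib, vb = go b in
        let r = fresh () in
        (ia @ ib @ [ Printf.sprintf "%s = add i64 %s, %s" r va vb ], r)
    | Load addr ->
        let ia, va = go addr in
        let p = fresh () in
        let r = fresh () in
        ( ia
          @ [ Printf.sprintf "%s = inttoptr i64 %s to ptr" p va;
              Printf.sprintf "%s = load i64, ptr %s" r p ],
          r )
  in
  go e
```

Compiling Load (Add (Var "x", Const 8)) yields an add, an inttoptr and a load,
in that order. The sketch ignores everything that makes the real problem hard
(GC roots, exceptions, calling conventions).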
** Jon Harrop replied:
> 3- A language implementation like OCaml breaks down in four big
> parts:
> 1- Front-end compiler
> 2- Back-end compiler and code emitter
> 3- Run-time system
> 4- OS interface
> Of these four, the back-end is not the biggest part nor the most
> complicated part. LLVM gives you part 2, but you still need to
> consider the other 3 parts. Saying "I'll do 1, 3 and 4 from scratch",
> Harrop-style, means a 5-year project.
On the contrary, my "style" was to provide the features that I value
(multicore & FFI) in a usable form (stop-the-world) with the shortest
possible development time (i.e. <<6 months to create something useful).
1- Front-end compiler: use camlp4 to provide an embedded DSL for
high-performance parallel numerics and/or reuse front-ends from existing
compilers like OCaml, PolyML, MosML, NekoML to build completely new
2- Back-end compiler and code emitter: reuse LLVM.
3- Run-time system: write the simplest possible precise GC and use
stop-the-world to apply it to threads that may then run in parallel.
4- OS interface: make it as easy as possible to call C directly.
HLVM had solved (2), (3) and (4) after only 3 months of part-time
vindicated my style!
> 7- To finish, I'll preventively deflect some likely reactions by Jon
> Harrop:
> "But you'll be tied to OCaml's data representation strategy!"
> Right, but 1- implementing your own data representation strategy is
> a lot of work, especially if it is type-based, and
Actually I found that easy, not least because I wanted a user-friendly
I just used C's data representation whenever possible.
> 2- OCaml's strategy is close to optimal for symbolic computing.
Is MLton not several times faster than OCaml for symbolic computing?
> "But LLVM assembly is typed, so you must have types!"
> Just use int32 or int64 as your universal type and cast to
> appropriate pointer types in loads or stores.
That is entirely possible and could be useful as an incremental
OCaml's existing bytecode interpreter but it is not a step toward my
> "But your code will be tainted by OCaml's evil license!"
> It is trivial to make ocamlopt dump Cmm code in a file or pipe.
> (The -dcmm debug option already does this.) Then, you can have a
> separate, untainted program that reads the Cmm code and
> transforms it.
Again, that is another technically-feasible step away from my goals
OCaml's CMM has already been mangled for its data representation (e.g.
ints, boxed floats).
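
To illustrate what "mangled for its data representation" means for integers,
here is a small simulation of the usual OCaml encoding, in which an integer n
is stored as the machine word 2n+1 so that its low bit distinguishes it from a
word-aligned pointer. The function names are invented for this sketch, and
plain OCaml ints stand in for machine words.

```ocaml
(* Simulation only: ordinary ints play the role of machine words. *)
let tag n = (n lsl 1) lor 1   (* n  ->  2n+1 *)
let untag w = w asr 1         (* 2n+1  ->  n, arithmetic shift keeps the sign *)

(* Addition performed directly on tagged words:
   (2a+1) + (2b+1) - 1 = 2(a+b) + 1. *)
let tagged_add wa wb = wa + wb - 1

let () =
  Printf.printf "%d\n" (untag (tagged_add (tag 20) (tag 22)))
```

Code at this level has the tagging arithmetic already baked in, so a consumer
sees expressions like the body of tagged_add rather than a plain addition.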
> "But shadow stacks are the only way to go for GC interface!"
> No, it's probably the worst approach performance-wise; even a
> conservative GC should work better.
Building a state-of-the-art optimized concurrent GC Leroy-style means an
infinity-year project. =:-p
Seriously though, I think it is essential to get a first working version of a
GC that permits parallel threads. HLVM will be useful to a lot of people
before its GC gets optimized.
Using folding to read the cwn in vim 6+
Here is a quick trick to help you read this CWN if you are viewing it using
vim (version 6 or greater).
If you know of a better way, please let me know.
If you happen to miss a CWN, you can send me a message
(email@example.com) and I'll mail it to you, or go take a look at
the archive (<http://alan.petitepomme.net/cwn/>) or the RSS feed of the
archives (<http://alan.petitepomme.net/cwn/cwn.rss>). If you also wish
to receive it every week by mail, you may subscribe online at
Alan Schmitt <http://alan.petitepomme.net/>
The hacker: someone who figured things out and made something cool