guile and unicode

Posted Oct 22, 2015 17:53 UTC (Thu) by Wol (subscriber, #4433)
In reply to: guile and unicode by wingo
Parent article: Fedora opens up to bundling

> There is no such thing as a deprecated codepoint.

AIUI, there is a Unicode character for a-acute. There is also the sequence <compose><acute><a>. What are you going to do when one string uses one encoding and another string uses the other? Apparently, the Unicode spec now says you are supposed to use the <compose> sequence, which Guile v2 implements.

Hence LilyPond blowing up when what it thinks is a string COPY is turned by v2 into a string TRANSLATION :-( Please note that BOTH the input AND the output in this case are not some random encoding, but are quite explicitly Unicode character strings.

Cheers,
Wol

guile and unicode

Posted Oct 22, 2015 20:18 UTC (Thu) by wingo (guest, #26929)

This is a bit far afield from the original article, but you persist in a misunderstanding about a project that I maintain :) To Guile 1.x, a character is a byte. To Guile 2.x, a character is a Unicode code point, not a grapheme.

So when Guile reads a byte sequence that, according to the given locale, decodes as U+0065 LATIN SMALL LETTER E followed by U+0301 COMBINING ACUTE ACCENT, those are the code points it stores internally. It does not normalize the code point sequence, although the string-normalize-nfc, string-normalize-nfd, string-normalize-nfkc, and string-normalize-nfkd procedures are available if the application chooses to do so, for whatever reason.
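
For readers who want to see the code point versus grapheme distinction concretely, here is a rough sketch in Python rather than Guile, using the standard unicodedata module. It only illustrates the general Unicode behaviour described above and is not a claim about Guile's internals. The helper name code_points and the variable names are made up for this example.

```python
import unicodedata

def code_points(s):
    """Return the code points of s as a list of 'U+XXXX' labels (hypothetical helper)."""
    return [f"U+{ord(ch):04X}" for ch in s]

# One user-visible "e with acute accent", stored two different ways.
decomposed = "e\u0301"   # U+0065 LATIN SMALL LETTER E + U+0301 COMBINING ACUTE ACCENT
precomposed = "\u00e9"   # U+00E9 LATIN SMALL LETTER E WITH ACUTE

# A plain copy keeps exactly the code points it was given; nothing is normalized.
copied = "".join(list(decomposed))

# Normalization happens only when the application asks for it.
as_nfc = unicodedata.normalize("NFC", decomposed)   # composes where possible
as_nfd = unicodedata.normalize("NFD", precomposed)  # decomposes where possible

print(code_points(decomposed))    # two code points
print(code_points(precomposed))   # one code point
print(copied == decomposed)       # the copy is code point for code point identical
print(decomposed == precomposed)  # differs until one side is normalized
print(as_nfc == precomposed, as_nfd == decomposed)
```

The comparison of the two un-normalized strings fails even though both render as the same grapheme, which is why an application that needs them to compare equal has to pick a normalization form and apply it explicitly.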