Fedora opens up to bundling
Posted Oct 22, 2015 9:34 UTC (Thu) by Wol (subscriber, #4433)
In reply to: Fedora opens up to bundling by cry_regarder
Parent article: Fedora opens up to bundling
I use lilypond, which is still stuck on Guile v1.
"cat input | guile > output"
Assuming all guile does is read from stdin and copy to stdout, Guile v2 unfortunately breaks the expectation that "input == output", and this breaks lilypond in very subtle and hard-to-fix ways :-(
So this is another, pretty classic, example of a program that bundles its dependencies because the mainstream version no longer provides features it relies on.
For info, as I understand it, Guile v2 converts deprecated code points on the fly to up-to-date ones. All composite characters (e.g. a-umlaut, e-acute, etc.) are now deprecated and should be <compose><accent><letter>, or whatever the correct form is. Lilypond assumes that a character offset in the input will point to the same character in the output, and of course if any of these conversions has happened, that assumption breaks :-(
Cheers,
Wol
Posted Oct 22, 2015 13:26 UTC (Thu)
by wingo (guest, #26929)
Before Guile 2.0, Guile's strings were byte strings, like in Python 2. Guile 2 changed strings to be sequences of characters, and to Guile, a character is a Unicode code point.
When reading strings from a byte stream, such as a file descriptor, those characters have an encoding, which is usually taken from your locale. In some encodings, like ISO-8859-1, every byte sequence is valid, so reading data in and writing it out will produce the same byte sequence. In others, like UTF-8, Guile may encounter an invalid sequence. In that case it can raise an error or replace the character with "?", depending on what the application chose to do. Likewise, when writing, Guile may try to write a code point that can't be expressed in the desired encoding, at which point it can raise an error or write a "?". The application decides which strategy to take.
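As a rough illustration of the byte-string vs. character-string distinction, here is a sketch in Python 3, where `bytes` plays the role of a Guile 1.x string and `str` plays the role of a Guile 2.x string. This is an analogy chosen for illustration, not Guile's own API:

```python
# Python 3 analogue (an assumption, not Guile code):
# bytes ~ Guile 1.x strings, str ~ Guile 2.x strings of code points.
text = "caf\u00e9"          # "café" spelled with precomposed U+00E9

raw = text.encode("utf-8")  # the same text as a byte string

print(len(raw))   # 5: c, a, f, 0xC3, 0xA9
print(len(text))  # 4: c, a, f, U+00E9

# Indexing a byte string gives a byte value; indexing a str gives a character.
print(raw[3])     # 195 (0xC3, the first byte of the UTF-8 encoding of é)
print(text[3])    # é
```

The point is that "length" and "offset" mean different things depending on whether the string is a sequence of bytes or a sequence of code points.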
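The decode/encode strategies described above can be sketched with Python's codecs standing in for Guile's port encodings. The error-handler names (`errors="replace"`) are Python's, not Guile's, and Python substitutes U+FFFD rather than "?" when decoding, so treat this as an analogy only:

```python
# Sketch: Python codecs as a stand-in for Guile port encodings (assumption).
data = bytes(range(256))  # every possible byte value

# ISO-8859-1 maps each byte to exactly one code point, so the round trip is lossless.
roundtrip = data.decode("iso-8859-1").encode("iso-8859-1")
assert roundtrip == data

invalid = b"abc\xffdef"  # 0xFF never appears in well-formed UTF-8

# Strategy 1: raise an error on the invalid sequence.
error_position = None
try:
    invalid.decode("utf-8")
except UnicodeDecodeError as err:
    error_position = err.start
print("decode error at byte", error_position)  # decode error at byte 3

# Strategy 2: substitute a replacement character (Python uses U+FFFD here).
lossy = invalid.decode("utf-8", errors="replace")

# Writing: U+20AC EURO SIGN has no ISO-8859-1 representation, so "?" is written.
substituted = "\u20ac".encode("iso-8859-1", errors="replace")
print(substituted)  # b'?'
```

Only the lossless case (ISO-8859-1) guarantees that bytes in equal bytes out; with UTF-8, the result depends on both the input and the strategy the application picks.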
There is no such thing as a deprecated codepoint.
Posted Oct 22, 2015 17:53 UTC (Thu)
by Wol (subscriber, #4433)
AIUI, there is a Unicode character for a-acute. There is also the sequence <compose><acute><a>. What are you going to do when one string uses one encoding and another string uses the other? Apparently, the Unicode spec now says you are supposed to use the <compose> sequence, which Guile v2 implements.
Hence lilypond blows up when what it thinks is a string COPY is turned by v2 into a string TRANSLATION :-( Please note that BOTH the input AND the output in this case are not in some random encoding, but are quite explicitly Unicode character strings.
Cheers,
Wol
Posted Oct 22, 2015 20:18 UTC (Thu)
by wingo (guest, #26929)
So when Guile reads a byte sequence that, according to the given locale, decodes as U+0065 LATIN SMALL LETTER E followed by U+0301 COMBINING ACUTE ACCENT, those are the code points it stores internally. It does not normalize the code point sequence, although the string-normalize-nfc, string-normalize-nfd, string-normalize-nfkc, and string-normalize-nfkd procedures are there if the application chooses to normalize, for whatever reason.
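To make the distinction concrete, here is a Python sketch using `unicodedata.normalize` as a stand-in for Guile's string-normalize-nfc and string-normalize-nfd (an analogy, not Guile code). It shows that decoding keeps whichever spelling of "é" the input used, and that character offsets only shift if the application explicitly normalizes:

```python
import unicodedata

# Two spellings of "é": one precomposed code point, or e + combining acute.
precomposed = "\u00e9"  # LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"  # LATIN SMALL LETTER E + COMBINING ACUTE ACCENT

# Decoding bytes does not normalize: each spelling keeps its own code points.
assert decomposed.encode("utf-8").decode("utf-8") == decomposed
print(len(precomposed), len(decomposed))  # 1 2
print(precomposed == decomposed)          # False

# Normalization is an explicit, opt-in step.
nfc = unicodedata.normalize("NFC", decomposed)
nfd = unicodedata.normalize("NFD", precomposed)
print(nfc == precomposed, nfd == decomposed)  # True True

# Offsets change only after normalizing: the combining accent is folded away.
line = "caf" + decomposed + " au lait"
offset = line.index("au")
normalized_offset = unicodedata.normalize("NFC", line).index("au")
print(offset, normalized_offset)  # 6 5
```

A program that tracks character offsets into its input therefore keeps valid offsets across a plain read and write; they would only drift if something in the pipeline normalized the text.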
