The future of Emacs, Guile, and Emacs Lisp
Posted Oct 9, 2014 8:23 UTC (Thu) by ncm (guest, #165)
Parent article: The future of Emacs, Guile, and Emacs Lisp
Posted Oct 9, 2014 22:34 UTC (Thu)
by smurf (subscriber, #17840)
[Link] (4 responses)
However, it has the distinct advantage that your large ASCII text does not suddenly need eight times the storage space just because you insert a character with a smiling kitty face.
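To put rough numbers on the storage point, here is a minimal sketch that assumes the alternative is a fixed-width representation using 4 bytes per codepoint (UTF-32). The exact factor depends on what a given implementation widens its strings to, so treat the figures as illustrative only; the variable names are made up for the example.

```python
# 1000 ASCII characters followed by one cat-face emoji (U+1F63A, outside the BMP).
text = "a" * 1000 + "\U0001F63A"

# UTF-8: 1 byte per ASCII character, 4 bytes for the emoji.
utf8_size = len(text.encode("utf-8"))

# UTF-32 (little-endian, no BOM): 4 bytes for every codepoint.
utf32_size = len(text.encode("utf-32-le"))

print(utf8_size, utf32_size)
```

Under that assumption the UTF-8 form grows by four bytes when the emoji is inserted, while the fixed-width form costs four bytes for every character in the text.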
Posted Oct 9, 2014 22:49 UTC (Thu)
by mjg59 (subscriber, #23239)
[Link]
Posted Oct 10, 2014 16:08 UTC (Fri)
by lambda (subscriber, #40735)
[Link] (1 responses)
Except, when pattern matching UTF-8, you can generally just match on the bytes (code units) directly, rather than on the characters (codepoints); the algorithms that need to skip ahead by a fixed n characters are generally the exact string matching algorithms like Boyer-Moore and Knuth-Morris-Pratt. There's no reason to require that those be run on the codepoints instead of on the bytes.
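As a small illustration of matching on code units, the sketch below searches UTF-8 bytes directly with Python's `bytes.find` and never decodes the haystack during the search. The sample strings are invented for the example. It relies on UTF-8 being self-synchronizing: a valid UTF-8 needle can only match a valid UTF-8 haystack at a codepoint boundary.

```python
# Escapes are used so the accented letters are definitely precomposed codepoints.
haystack = "na\u00efve caf\u00e9, \u65e5\u672c\u8a9e text".encode("utf-8")
needle = "\u65e5\u672c\u8a9e".encode("utf-8")

# Plain byte-level substring search; no codepoint indexing involved.
byte_offset = haystack.find(needle)

# The match starts on a codepoint boundary, so the prefix decodes cleanly.
prefix = haystack[:byte_offset].decode("utf-8")
char_offset = len(prefix)

print(byte_offset, char_offset)
```

The byte offset and the codepoint offset differ because the two accented letters take two bytes each, but the search itself only ever needed the byte offset.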
If you're doing regular expression matching with Unicode data, even if you use UTF-32, you will need to consume variable length strings as single characters, as you can have decomposed characters that need to match as a single character.
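The decomposed-character case can be seen with Python's standard `unicodedata` module. This is only a sketch of the problem, not of how any particular regex engine handles it: the same user-perceived character is one codepoint in composed form and two in decomposed form, so even a UTF-32 buffer needs a variable number of code units for it.

```python
import unicodedata

composed = "\u00e9"                                  # LATIN SMALL LETTER E WITH ACUTE
decomposed = unicodedata.normalize("NFD", composed)  # "e" followed by U+0301

composed_len = len(composed)                          # codepoints in composed form
decomposed_len = len(decomposed)                      # codepoints in decomposed form
decomposed_utf32 = len(decomposed.encode("utf-32-le"))  # bytes, no BOM

# Normalizing back to NFC recovers the single-codepoint form.
round_trip = unicodedata.normalize("NFC", decomposed) == composed

print(composed_len, decomposed_len, decomposed_utf32, round_trip)
```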
People always bring up the lack of constant-time codepoint indexing when UTF-8 is mentioned, but I have never seen an example that actually needs codepoint indexing and that neither breaks in the face of other issues, such as combining sequences, nor can be solved just as well by indexing code units.
Posted Oct 12, 2014 6:12 UTC (Sun)
by k8to (guest, #15413)
[Link]
It's a little more tedious to cut a UTF-8 string safely at a size computed in bytes than it is in some other encodings, but not much more, and that's very rarely a fast path.
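For what it's worth, a safe cut can be sketched in a few lines. The helper name `truncate_utf8` is hypothetical, and the sketch assumes the input is already valid UTF-8. It only avoids splitting a codepoint; it does nothing about combining sequences, which can still be separated from their base character.

```python
def truncate_utf8(data: bytes, max_bytes: int) -> bytes:
    """Return at most max_bytes of data without splitting a codepoint."""
    if max_bytes >= len(data):
        return data
    if max_bytes <= 0:
        return b""
    cut = max_bytes
    # Continuation bytes have the form 0b10xxxxxx. If the byte just past the
    # cut is one, the cut lands inside a codepoint, so step back to its start.
    while cut > 0 and (data[cut] & 0xC0) == 0x80:
        cut -= 1
    return data[:cut]


# "a" (1 byte), e-acute (2 bytes), cat-face emoji (4 bytes): 7 bytes in total.
sample = "a\u00e9\U0001F63A".encode("utf-8")
print(truncate_utf8(sample, 2).decode("utf-8"))
print(truncate_utf8(sample, 6).decode("utf-8"))
```

The loop steps back at most three bytes, since a UTF-8 codepoint is never longer than four, which fits the point that this is rarely a performance concern.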
Posted Oct 14, 2014 17:29 UTC (Tue)
by Trelane (subscriber, #56877)
[Link]