Rustaceans at the border
Posted Apr 22, 2022 13:48 UTC (Fri) by khim (subscriber, #9252)
In reply to: Rustaceans at the border by smurf
Parent article: Rustaceans at the border
> That, or the precedence of Latin-1 with its mountain of composed characters proved too strong and nobody even thought about solving the problem some other way until it was too late.
It's not even about the “precedence of Latin-1”. It's about the simple practical need to keep parts of your data in Unicode and parts in some other encoding with constant conversions between these.
It took years (about 10 to 20, in fact) before people finally stopped using legacy encodings.
If Unicode had been impossible (or very hard and inefficient) to use in that fashion, it would never have taken off.
Size considerations were also quite real: Japan persisted with ISO-2022-JP for years, both because the round trip through Unicode is not perfect and because switching to Unicode made documents 50% larger.
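To put a rough number on that size claim, here is a minimal sketch. The sample text is my own assumption (eight characters that all exist in JIS X 0208, repeated so that the ISO-2022-JP escape sequences stop mattering), and I assume UTF-8 as the Unicode encoding being compared:

```python
# Hypothetical sample: 8 Japanese characters, repeated 100 times so the
# few bytes of ISO-2022-JP escape sequences become negligible.
SAMPLE = "日本語のテキスト" * 100


def encoded_sizes(text):
    """Return the encoded length in bytes for each encoding of interest."""
    return {name: len(text.encode(name)) for name in ("utf-8", "iso2022_jp")}


sizes = encoded_sizes(SAMPLE)
# UTF-8 needs 3 bytes for each of these characters, ISO-2022-JP needs 2.
ratio = sizes["utf-8"] / sizes["iso2022_jp"]
print(sizes, f"UTF-8 is {ratio:.2f}x the size")
```

For text that is almost entirely kanji and kana the ratio should land just under 1.5; documents with a lot of ASCII markup would see a smaller difference.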
The only big issue with Unicode was the initial assumption that 16 bits would be enough, after all: that prompted the useless and very costly trip to UCS-2, then UTF-16 and then, finally, to UTF-8.
UCS-2 made sense, but UTF-16 has all the problems of UTF-8 without giving you any benefits.
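A small sketch of why UTF-16 is no longer fixed-width: anything outside the Basic Multilingual Plane takes a surrogate pair, so code has to cope with variable-length characters just as it does with UTF-8. The helper names and the three sample characters are mine, chosen only for illustration:

```python
def utf16_units(s):
    """Number of 16-bit code units needed to encode s as UTF-16."""
    return len(s.encode("utf-16-le")) // 2


def utf8_bytes(s):
    """Number of bytes needed to encode s as UTF-8."""
    return len(s.encode("utf-8"))


# ASCII letter, a BMP kanji, and a character outside the BMP (U+1F600).
for ch in ("A", "語", "\U0001F600"):
    print(f"U+{ord(ch):04X}: {utf16_units(ch)} UTF-16 unit(s), "
          f"{utf8_bytes(ch)} UTF-8 byte(s)")
```

The first two characters fit in one 16-bit unit, which is all UCS-2 ever had to handle; the last one needs two units.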
If people had realized earlier that UCS-2 wouldn't work, then all that hoopla with two kinds of functions in Java, endless bugs with UTF-16 in browsers and other such things could have been avoided.
But oh, well, we can't change the past; we can only adopt UTF-8 for the future.
